Apple aims for on-device user intent understanding with UI-JEPA models

Last updated: September 14, 2024 4:26 pm
Published September 14, 2024


Understanding user intentions based on user interface (UI) interactions is a critical challenge in creating intuitive and helpful AI applications.

In a new paper, researchers from Apple introduce UI-JEPA, an architecture that significantly reduces the computational requirements of UI understanding while maintaining high performance. UI-JEPA aims to enable lightweight, on-device UI understanding, paving the way for more responsive and privacy-preserving AI assistant applications. This could fit into Apple's broader strategy of enhancing its on-device AI.

The challenges of UI understanding

Understanding user intents from UI interactions requires processing cross-modal features, including images and natural language, to capture the temporal relationships in UI sequences.

"While advancements in Multimodal Large Language Models (MLLMs), like Anthropic Claude 3.5 Sonnet and OpenAI GPT-4 Turbo, offer pathways for personalized planning by adding personal contexts as part of the prompt to improve alignment with users, these models demand extensive computational resources, huge model sizes, and introduce high latency," co-authors Yicheng Fu, Machine Learning Researcher interning at Apple, and Raviteja Anantha, Principal ML Scientist at Apple, told VentureBeat. "This makes them impractical for scenarios where lightweight, on-device solutions with low latency and enhanced privacy are required."

Meanwhile, existing lightweight models that can analyze user intent are still too computationally intensive to run efficiently on user devices.

The JEPA architecture

UI-JEPA draws inspiration from the Joint Embedding Predictive Architecture (JEPA), a self-supervised learning approach introduced by Meta AI Chief Scientist Yann LeCun in 2022. JEPA aims to learn semantic representations by predicting masked regions in images or videos. Rather than trying to recreate every detail of the input data, JEPA focuses on learning high-level features that capture the most important parts of a scene.
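As a toy illustration of this idea (not the paper's actual training code), a JEPA-style objective regresses the embeddings of masked patches from the embeddings of the visible context, rather than reconstructing pixels. All names, dimensions, and the pooled "predictor" below are made up for the sketch:

```python
import random

random.seed(0)

EMBED_DIM = 8

def encode(patch, weights):
    """Toy linear 'encoder': dot each weight row with the flattened patch."""
    return [sum(w * x for w, x in zip(row, patch)) for row in weights]

# Hypothetical data: 8 'patches' of dimension 4 (stand-ins for image regions).
patches = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
weights = [[random.gauss(0, 0.1) for _ in range(4)] for _ in range(EMBED_DIM)]

masked_idx = {0, 4}  # patches whose embeddings the model must predict
context = [p for i, p in enumerate(patches) if i not in masked_idx]
targets = [encode(patches[i], weights) for i in masked_idx]

# Predictor stub: pool the context embeddings (a real predictor is a small network).
ctx_embs = [encode(p, weights) for p in context]
pred = [sum(e[d] for e in ctx_embs) / len(ctx_embs) for d in range(EMBED_DIM)]

# The loss lives in embedding space, not pixel space -- unpredictable pixel
# detail is simply never modeled, which is what shrinks the problem.
loss = sum((pred[d] - t[d]) ** 2
           for t in targets for d in range(EMBED_DIM)) / len(targets)
```

The key design choice this sketch mirrors is that the regression target is a learned representation, so low-level detail the context cannot determine never contributes to the loss.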


JEPA significantly reduces the dimensionality of the problem, allowing smaller models to learn rich representations. Moreover, it is a self-supervised learning algorithm, which means it can be trained on large amounts of unlabeled data, eliminating the need for costly manual annotation. Meta has already released I-JEPA and V-JEPA, two implementations of the algorithm designed for images and video, respectively.

"Unlike generative approaches that attempt to fill in every missing detail, JEPA can discard unpredictable information," Fu and Anantha said. "This results in improved training and sample efficiency, by a factor of 1.5x to 6x as observed in V-JEPA, which is critical given the limited availability of high-quality, labeled UI videos."

UI-JEPA

UI-JEPA architecture (Credit: arXiv)

UI-JEPA builds on the strengths of JEPA and adapts it to UI understanding. The framework consists of two main components: a video transformer encoder and a decoder-only language model.

The video transformer encoder is a JEPA-based model that processes videos of UI interactions into abstract feature representations. The LM takes the video embeddings and generates a text description of the user intent. The researchers used Microsoft Phi-3, a lightweight LM with roughly 3 billion parameters, making it suitable for on-device experimentation and deployment.
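A minimal sketch of that two-stage flow, with stub functions standing in for the real JEPA encoder and the Phi-3 decoder (every name and number here is illustrative, not from the paper):

```python
def video_encoder(frames):
    """Stand-in for the JEPA-based video transformer: pool the frames into one
    abstract feature vector (the real model outputs a sequence of embeddings)."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def language_model(video_embedding, instruction):
    """Stand-in for the ~3B-parameter decoder-only LM (Phi-3 in the paper).
    A real LM would consume the embedding as a soft prefix and generate text."""
    strength = sum(abs(v) for v in video_embedding)
    return f"{instruction}: user intent (signal strength {strength:.2f})"

# Hypothetical UI recording: 3 frames, each reduced to a 4-dim feature vector.
frames = [[0.1, 0.2, 0.0, 0.4],
          [0.2, 0.1, 0.1, 0.3],
          [0.3, 0.0, 0.2, 0.2]]

embedding = video_encoder(frames)
intent = language_model(embedding, "Describe the user intent")
```

The division of labor is the point: the encoder compresses the expensive visual stream once, and the small LM only has to decode a compact embedding into text.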

This combination of a JEPA-based encoder and a lightweight LM enables UI-JEPA to achieve high performance with significantly fewer parameters and computational resources compared to state-of-the-art MLLMs.

To further advance research in UI understanding, the researchers introduced two new multimodal datasets and benchmarks: "Intent in the Wild" (IIW) and "Intent in the Tame" (IIT).

Examples from the IIT and IIW datasets for UI-JEPA (Credit: arXiv)

IIW captures open-ended sequences of UI actions with ambiguous user intent, such as booking a vacation rental. The dataset includes few-shot and zero-shot splits to evaluate the models' ability to generalize to unseen tasks. IIT focuses on more common tasks with clearer intent, such as creating a reminder or calling a contact.

"We believe these datasets will contribute to the development of more powerful and lightweight MLLMs, as well as training paradigms with enhanced generalization capabilities," the researchers write.

UI-JEPA in action

The researchers evaluated the performance of UI-JEPA on the new benchmarks, comparing it against other video encoders and private MLLMs such as GPT-4 Turbo and Claude 3.5 Sonnet.

On both IIT and IIW, UI-JEPA outperformed other video encoder models in few-shot settings. It also achieved performance comparable to the much larger closed models. But at 4.4 billion parameters, it is orders of magnitude lighter than the cloud-based models. The researchers found that incorporating text extracted from the UI using optical character recognition (OCR) further enhanced UI-JEPA's performance. In zero-shot settings, UI-JEPA lagged behind the frontier models.
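One plausible way to picture the OCR augmentation (the paper does not spell out its exact fusion mechanism, so the function and field names here are invented): the recognized on-screen text is attached to the textual side of the LM's input, alongside the video-embedding prefix.

```python
def build_lm_input(video_embedding, ocr_text):
    """Illustrative fusion: OCR'd on-screen text rides along in the prompt,
    giving the LM symbolic grounding the visual embedding may have lost."""
    prompt = "Describe the user intent."
    if ocr_text:
        prompt += f" On-screen text: {ocr_text}"
    return {"prefix_embedding": video_embedding, "prompt": prompt}

inp = build_lm_input([0.1, 0.2], "Vacation rentals - Oct 3-7 - 2 guests")
```

The intuition is that exact strings (dates, button labels, app names) are cheap to recover with OCR but hard for a compressed video embedding to preserve.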

Performance of UI-JEPA vs other encoders and frontier models on the IIW and IIT datasets (higher is better) (Credit: arXiv)

"This suggests that while UI-JEPA excels in tasks involving familiar applications, it faces challenges with unfamiliar ones," the researchers write.

The researchers envision several potential uses for UI-JEPA models. One key application is creating automated feedback loops for AI agents, enabling them to learn continuously from interactions without human intervention. This approach can significantly reduce annotation costs and preserve user privacy.


"As these agents gather more data through UI-JEPA, they become increasingly accurate and effective in their responses," the authors told VentureBeat. "Additionally, UI-JEPA's ability to process a continuous stream of onscreen contexts can significantly enrich prompts for LLM-based planners. This enhanced context helps generate more informed and nuanced plans, particularly when handling complex or implicit queries that draw on past multimodal interactions (e.g., gaze tracking to speech interaction)."

Another promising application is integrating UI-JEPA into agentic frameworks designed to track user intent across different applications and modalities. UI-JEPA could function as the perception agent, capturing and storing user intent at various points in time. When a user interacts with a digital assistant, the system can then retrieve the most relevant intent and generate the appropriate API call to fulfill the user's request.
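A perception agent of this kind could be sketched as a timestamped intent store queried by similarity; the record shape, the apps, and the cosine-similarity retrieval below are illustrative assumptions, not a described Apple design:

```python
import math
from dataclasses import dataclass

@dataclass
class IntentRecord:
    timestamp: float   # when the intent was captured
    app: str           # application the interaction occurred in
    intent: str        # text description produced by the model
    embedding: list    # vector summary of the interaction

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def most_relevant(store, query_embedding):
    """Return the stored intent most similar to the current request."""
    return max(store, key=lambda r: cosine(r.embedding, query_embedding))

store = [
    IntentRecord(1.0, "Calendar", "create a reminder", [1.0, 0.0, 0.0]),
    IntentRecord(2.0, "Safari", "book a vacation rental", [0.0, 1.0, 0.2]),
]
hit = most_relevant(store, [0.1, 0.9, 0.1])  # assistant query about travel
```

A production version would also weigh recency (the timestamps above) and route the retrieved intent to an API-call planner, but the retrieve-by-similarity core is the same.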

"UI-JEPA can enhance any AI agent framework by leveraging onscreen activity data to align more closely with user preferences and predict user actions," Fu and Anantha said. "Combined with temporal (e.g., time of day, day of the week) and geographical (e.g., at the office, at home) information, it can infer user intent and enable a broad range of direct applications."
UI-JEPA seems to be a good fit for Apple Intelligence, a suite of lightweight generative AI tools designed to make Apple devices smarter and more productive. Given Apple's focus on privacy, the low cost and added efficiency of UI-JEPA models could give its AI assistants an advantage over rivals that rely on cloud-based models.

