AI & Compute

From hallucinations to hardware: Lessons from a real-world computer vision project gone sideways

Last updated: June 28, 2025 9:15 pm
Published June 28, 2025

Computer vision projects rarely go exactly as planned, and this one was no exception. The idea was simple: build a model that could look at a photo of a laptop and identify any physical damage, such as cracked screens, missing keys or broken hinges. It seemed like a straightforward use case for image models and large language models (LLMs), but it quickly turned into something more complicated.

Along the way, we ran into issues with hallucinations, unreliable outputs and images that weren’t even laptops. To solve these, we ended up applying an agentic framework in an atypical way: not for task automation, but to improve the model’s performance.

In this post, we’ll walk through what we tried, what didn’t work and how a combination of approaches ultimately helped us build something reliable.

Where we started: Monolithic prompting

Our initial approach was fairly standard for a multimodal model. We passed an image into an image-capable LLM with a single, large prompt and asked it to identify visible damage. This monolithic prompting strategy is simple to implement and works decently for clean, well-defined tasks. But real-world data rarely plays along.
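
As a concrete illustration, here is a minimal sketch of the monolithic-prompt setup. The prompt wording, JSON schema and parsing helper are all hypothetical, and the actual model call is omitted; only the reply-handling logic is shown.

```python
# Sketch of the monolithic-prompt approach: one image-capable LLM call is
# asked to list every visible defect at once. Prompt and schema are assumed.
import json

MONOLITHIC_PROMPT = (
    "You are inspecting a photo of a laptop. "
    "List every visible physical defect (cracked screen, missing keys, "
    "broken hinge, dents). Respond as JSON: "
    '{"is_laptop": bool, "damages": [{"part": str, "issue": str}]}'
)

def parse_damage_report(raw_reply: str) -> dict:
    """Parse the model's JSON reply, falling back to an empty report."""
    try:
        report = json.loads(raw_reply)
    except json.JSONDecodeError:
        return {"is_laptop": False, "damages": []}
    report.setdefault("is_laptop", False)
    report.setdefault("damages", [])
    return report

# Example reply a vision LLM might return:
reply = '{"is_laptop": true, "damages": [{"part": "screen", "issue": "crack"}]}'
print(parse_damage_report(reply)["damages"][0]["part"])  # screen
```

Note that nothing here constrains the model to report only real damage, which is exactly where the trouble started.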

We ran into three major issues early on:

  • Hallucinations: The model would sometimes invent damage that didn’t exist or mislabel what it was seeing.
  • Junk image detection: It had no reliable way to flag images that weren’t laptops at all; photos of desks, walls or people occasionally slipped through and received nonsensical damage reports.
  • Inconsistent accuracy: The combination of these problems made the model too unreliable for operational use.

This was the point when it became clear we would need to iterate.

First fix: Mixing image resolutions

One thing we noticed was how much image quality affected the model’s output. Users uploaded all kinds of images, ranging from sharp and high-resolution to blurry. This led us to studies highlighting how image resolution impacts deep learning models.

We trained and tested the model using a mix of high- and low-resolution images. The idea was to make the model more resilient to the wide range of image qualities it would encounter in practice. This helped improve consistency, but the core problems of hallucination and junk image handling persisted.
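
The resolution-mixing idea can be sketched as a simple augmentation step: pair each training photo with artificially downscaled copies so the model sees the quality range it will meet in production. Images are plain nested lists here to keep the sketch dependency-free, and the downscale factors are assumptions.

```python
# Sketch of resolution-mixing augmentation (factors are illustrative).
def downscale(pixels, factor):
    """Nearest-neighbour downscale, simulating a low-resolution upload."""
    return [row[::factor] for row in pixels[::factor]]

def augment(pixels, factors=(2, 4)):
    """Return the original image plus one degraded copy per factor."""
    return [pixels] + [downscale(pixels, f) for f in factors]

photo = [[(r, c) for c in range(8)] for r in range(8)]  # stand-in 8x8 image
batch = augment(photo)
print([len(img) for img in batch])  # [8, 4, 2]
```

In a real pipeline the same idea would be applied with an image library’s resize operations rather than raw pixel lists.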

The multimodal detour: Text-only LLM goes multimodal

Encouraged by recent experiments combining image captioning with text-only LLMs, like the approach covered in The Batch, where captions are generated from images and then interpreted by a language model, we decided to give it a try.

Here’s how it works:

  • The LLM starts by generating several possible captions for an image.
  • Another model, a multimodal embedding model, checks how well each caption matches the image. In this case, we used SigLIP to score the similarity between the image and the text.
  • The system keeps the top few captions based on these scores.
  • The LLM uses these top captions to write new ones, trying to get closer to what the image actually shows.
  • It repeats this process until the captions stop improving, or it hits a set limit.
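
The loop above can be sketched as follows. The `generate` and `score` functions are stubs standing in for the LLM and a SigLIP-style image-text similarity model; the stopping rule and `k` are assumptions.

```python
# Minimal sketch of the caption-refinement loop (stubs replace real models).
def refine_captions(image, generate, score, k=3, max_rounds=5):
    """Iteratively keep the k best-scoring captions and ask for better ones."""
    captions = generate(image, seeds=None)
    best = 0.0
    for _ in range(max_rounds):
        ranked = sorted(captions, key=lambda c: score(image, c), reverse=True)
        top = ranked[:k]
        top_score = score(image, top[0])
        if top_score <= best:  # captions stopped improving
            break
        best = top_score
        captions = generate(image, seeds=top)  # rewrite using the best so far
    return top

# Toy stand-ins: the "similarity" score simply favours longer captions.
fake_generate = lambda img, seeds: (
    ["a laptop", "a laptop on a desk"] if seeds is None
    else [s + ", lid open" for s in seeds]
)
fake_score = lambda img, cap: len(cap) / 100
print(refine_captions("img.jpg", fake_generate, fake_score)[0])
```

In the real system, `score` would embed the image and caption with SigLIP and return their cosine similarity.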

While clever in theory, this approach introduced new problems for our use case:

  • Persistent hallucinations: The captions themselves sometimes included imaginary damage, which the LLM then confidently reported.
  • Incomplete coverage: Even with multiple captions, some issues were missed entirely.
  • Increased complexity, little benefit: The added steps made the system more complicated without reliably outperforming the previous setup.

It was an interesting experiment, but ultimately not a solution.

A creative use of agentic frameworks

This was the turning point. While agentic frameworks are usually used for orchestrating task flows (think agents coordinating calendar invites or customer service actions), we wondered if breaking the image interpretation task into smaller, specialized agents could help.

We built an agentic framework structured like this:

  • Orchestrator agent: It checked the image and identified which laptop components were visible (screen, keyboard, chassis, ports).
  • Component agents: Dedicated agents inspected each component for specific damage types; for example, one for cracked screens, another for missing keys.
  • Junk detection agent: A separate agent flagged whether the image was even a laptop in the first place.

This modular, task-driven approach produced far more precise and explainable results. Hallucinations dropped dramatically, junk images were reliably flagged and each agent’s task was simple and focused enough to control quality well.
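
A condensed sketch of this decomposition is below: an orchestrator routes visible components to focused agents, with a junk-detection gate up front. Each agent here is a plain function standing in for a narrow LLM prompt, and the image metadata fields are invented for illustration.

```python
# Sketch of the agentic decomposition (all agents are stub functions).
from typing import Callable

def junk_agent(image: dict) -> bool:
    """Flag images that do not show a laptop at all (stub heuristic)."""
    return image.get("subject") != "laptop"

def screen_agent(image: dict) -> list:
    return ["cracked screen"] if image.get("screen_cracked") else []

def keyboard_agent(image: dict) -> list:
    return ["missing keys"] if image.get("keys_missing") else []

COMPONENT_AGENTS: dict = {
    "screen": screen_agent,
    "keyboard": keyboard_agent,
}

def orchestrate(image: dict) -> dict:
    if junk_agent(image):
        return {"status": "rejected", "reason": "not a laptop"}
    findings = []
    # A real orchestrator agent would first detect visible components;
    # here we assume the image metadata already lists them.
    for part in image.get("visible_parts", []):
        agent = COMPONENT_AGENTS.get(part)
        if agent:
            findings += agent(image)
    return {"status": "ok", "damage": findings}

report = orchestrate({"subject": "laptop",
                      "visible_parts": ["screen", "keyboard"],
                      "screen_cracked": True})
print(report)  # {'status': 'ok', 'damage': ['cracked screen']}
```

Because each agent answers one narrow question, its output is easy to audit, which is where the precision and explainability gains came from.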

The blind spots: Trade-offs of an agentic approach

As effective as this was, it was not perfect. Two main limitations showed up:

  • Increased latency: Running multiple sequential agents added to the total inference time.
  • Coverage gaps: Agents could only detect issues they were explicitly programmed to look for. If an image showed something unexpected that no agent was tasked with identifying, it would go unnoticed.

We needed a way to balance precision with coverage.

The hybrid solution: Combining agentic and monolithic approaches

To bridge the gaps, we created a hybrid system:

  1. The agentic framework ran first, handling precise detection of known damage types and junk images. We limited the number of agents to the most essential ones to improve latency.
  2. Then, a monolithic image LLM prompt scanned the image for anything the agents might have missed.
  3. Finally, we fine-tuned the model using a curated set of images for high-priority use cases, like frequently reported damage scenarios, to further improve accuracy and reliability.

This combination gave us the precision and explainability of the agentic setup, the broad coverage of monolithic prompting and the confidence boost of targeted fine-tuning.
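
The hybrid pipeline can be sketched as two stages: a precise agentic pass that can reject junk early, followed by a broad monolithic sweep whose extra findings are routed for review. Both stages are stubbed, and the field names are assumptions.

```python
# Sketch of the hybrid pipeline (both detection stages are stubs).
def agentic_pass(image: dict):
    """Precise detection of known damage types, plus a junk gate (stub)."""
    if not image.get("is_laptop", True):
        return None  # junk image: stop the pipeline early
    return set(image.get("known_damage", []))

def monolithic_sweep(image: dict) -> set:
    """Broad single-prompt scan that may surface unexpected issues (stub)."""
    return set(image.get("all_damage", []))

def inspect_image(image: dict) -> dict:
    precise = agentic_pass(image)
    if precise is None:
        return {"status": "rejected"}
    extra = monolithic_sweep(image) - precise  # only what the agents missed
    return {"status": "ok",
            "confirmed": sorted(precise),
            "needs_review": sorted(extra)}

result = inspect_image({"is_laptop": True,
                        "known_damage": ["cracked screen"],
                        "all_damage": ["cracked screen", "dented corner"]})
print(result["needs_review"])  # ['dented corner']
```

Keeping the agentic findings and the sweep’s extras separate preserves the explainability of the first stage while still catching the unexpected.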

What we learned

Several things became clear by the time we wrapped up this project:

  • Agentic frameworks are more versatile than they get credit for: While they’re usually associated with workflow management, we found they could meaningfully improve model performance when applied in a structured, modular way.
  • Mixing different approaches beats relying on just one: The combination of precise, agent-based detection alongside the broad coverage of LLMs, plus a bit of fine-tuning where it mattered most, gave us far more reliable results than any single method on its own.
  • Visual models are prone to hallucinations: Even the more advanced setups can jump to conclusions or see things that aren’t there. It takes thoughtful system design to keep these errors in check.
  • Image quality variety makes a difference: Training and testing with both clean, high-resolution images and everyday, lower-quality ones helped the model stay resilient when faced with unpredictable, real-world photos.
  • You need a way to catch junk images: A dedicated check for junk or unrelated photos was one of the simplest changes we made, and it had an outsized impact on overall system reliability.

Final thoughts

What started as a simple idea, using an LLM prompt to detect physical damage in laptop images, quickly turned into a much deeper experiment in combining different AI techniques to tackle unpredictable, real-world problems. Along the way, we learned that some of the most useful tools were ones not originally designed for this kind of work.

Agentic frameworks, often seen as workflow utilities, proved surprisingly effective when repurposed for tasks like structured damage detection and image filtering. With a bit of creativity, they helped us build a system that was not just more accurate, but easier to understand and manage in practice.

Shruti Tiwari is an AI product manager at Dell Technologies.

Vadiraj Kulkarni is a data scientist at Dell Technologies.

