Galileo, a leading developer of generative AI for enterprise applications, has released its latest Hallucination Index.
The evaluation framework, which focuses on Retrieval Augmented Generation (RAG), assessed 22 prominent Gen AI LLMs from major players including OpenAI, Anthropic, Google, and Meta. This year's index expanded significantly, adding 11 new models to reflect the rapid growth in both open- and closed-source LLMs over the past eight months.
Vikram Chatterji, CEO and Co-founder of Galileo, said: "In today's rapidly evolving AI landscape, developers and enterprises face a critical challenge: how to harness the power of generative AI while balancing cost, accuracy, and reliability. Current benchmarks are often based on academic use-cases, rather than real-world applications."
The index employed Galileo's proprietary evaluation metric, context adherence, to check for output inaccuracies across various input lengths, ranging from 1,000 to 100,000 tokens. This approach aims to help enterprises make informed decisions about balancing price and performance in their AI implementations.
Key findings from the index include:
- Anthropic's Claude 3.5 Sonnet emerged as the best overall performing model, consistently scoring near-perfect across short, medium, and long context scenarios.
- Google's Gemini 1.5 Flash ranked as the best performing model in terms of cost-effectiveness, delivering strong performance across all tasks.
- Alibaba's Qwen2-72B-Instruct stood out as the top open-source model, particularly excelling in short and medium context scenarios.
The index also highlighted several trends in the LLM landscape:
- Open-source models are rapidly closing the gap with their closed-source counterparts, offering improved hallucination performance at lower costs.
- Current RAG LLMs demonstrate significant improvements in handling extended context lengths without sacrificing quality or accuracy.
- Smaller models sometimes outperform larger ones, suggesting that efficient design can be more important than scale.
- The emergence of strong performers from outside the US, such as Mistral's Mistral-large and Alibaba's Qwen2-72B-Instruct, indicates growing global competition in LLM development.
While closed-source models like Claude 3.5 Sonnet and Gemini 1.5 Flash maintain their lead due to proprietary training data, the index reveals that the landscape is evolving rapidly. Google's performance was particularly noteworthy: its open-source Gemma-7b model performed poorly, while its closed-source Gemini 1.5 Flash consistently ranked near the top.
As the AI industry continues to grapple with hallucinations as a major hurdle to production-ready Gen AI products, Galileo's Hallucination Index provides valuable insights for enterprises looking to adopt the right model for their specific needs and budget constraints.
The post Anthropic to Google: Who's winning against AI hallucinations? appeared first on AI News.