Data Center News

Will updating your AI agents help or hamper their performance? Raindrop's new tool Experiments tells you

Last updated: October 11, 2025 5:02 am
Published October 11, 2025

It seems like virtually every week for the last two years since ChatGPT launched, new large language models (LLMs) from rival labs, or from OpenAI itself, have been released. Enterprises are hard pressed to keep up with the rapid pace of change, let alone understand how to adapt to it: which of these new models should they adopt, if any, to power their workflows and the custom AI agents they're building to carry them out?

Help has arrived: AI application observability startup Raindrop has launched Experiments, a new analytics feature that the company describes as the first A/B testing suite designed specifically for enterprise AI agents, allowing companies to see and compare how updating agents to new underlying models, or changing their instructions and tool access, will impact their performance with real end users.

The release extends Raindrop's existing observability tools, giving developers and teams a way to see how their agents behave and evolve in real-world conditions.

With Experiments, teams can track how changes, such as a new tool, prompt, model update, or full pipeline refactor, affect AI performance across millions of user interactions. The new feature is available now for users on Raindrop's Pro subscription plan ($350 monthly) at raindrop.ai.

A Data-Driven Lens on Agent Development

Raindrop co-founder and chief technology officer Ben Hylak noted in a product announcement video that Experiments helps teams see "how literally anything changed," including tool usage, user intents, and issue rates, and to explore variations by demographic factors such as language. The goal is to make model iteration more transparent and measurable.

The Experiments interface presents results visually, showing when an experiment performs better or worse than its baseline. Increases in negative signals might indicate higher task failure or partial code output, while improvements in positive signals could reflect more complete responses or better user experiences.
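Raindrop has not published how its comparison view is computed; as a rough illustration only, the signal names, rates, and `signal_deltas` helper below are invented to show the kind of baseline-versus-experiment diff the article describes:

```python
# Illustrative sketch only: signal names, rates, and this helper are
# hypothetical, not Raindrop's actual implementation.

def signal_deltas(baseline: dict, experiment: dict) -> dict:
    """Per-signal rate change (experiment minus baseline), rounded."""
    return {name: round(experiment[name] - baseline[name], 4)
            for name in baseline}

# Fabricated per-interaction signal rates for two cohorts of users
baseline   = {"task_failure": 0.041, "partial_code": 0.012, "thumbs_up": 0.310}
experiment = {"task_failure": 0.055, "partial_code": 0.019, "thumbs_up": 0.298}

deltas = signal_deltas(baseline, experiment)
# Negative signals that rose under the experiment surface as regressions
regressions = {k: v for k, v in deltas.items()
               if k in ("task_failure", "partial_code") and v > 0}
print(regressions)  # {'task_failure': 0.014, 'partial_code': 0.007}
```

Here both negative signals rose under the experiment, the situation the article says would be flagged against the baseline.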

By making this data easy to interpret, Raindrop encourages AI teams to approach agent iteration with the same rigor as modern software deployment: tracking outcomes, sharing insights, and addressing regressions before they compound.

Background: From AI Observability to Experimentation

Raindrop's launch of Experiments builds on the company's foundation as one of the first AI-native observability platforms, designed to help enterprises monitor and understand how their generative AI systems behave in production.

As VentureBeat reported earlier this year, the company, originally known as Dawn AI, emerged to address what Hylak, a former Apple human interface designer, called the "black box problem" of AI performance, helping teams catch failures "as they happen and explain to enterprises what went wrong and why."

At the time, Hylak described how "AI products fail constantly, in ways both hilarious and terrifying," noting that unlike traditional software, which throws clear exceptions, "AI products fail silently." Raindrop's original platform focused on detecting these silent failures by analyzing signals such as user feedback, task failures, refusals, and other conversational anomalies across millions of daily events.

The company's co-founders, Hylak, Alexis Gauba, and Zubin Singh Koticha, built Raindrop after encountering firsthand the difficulty of debugging AI systems in production.

"We started by building AI products, not infrastructure," Hylak told VentureBeat. "But pretty quickly, we saw that to build anything serious, we needed tooling to understand AI behavior, and that tooling didn't exist."

With Experiments, Raindrop extends that same mission from detecting failures to measuring improvements. The new tool turns observability data into actionable comparisons, letting enterprises test whether changes to their models, prompts, or pipelines actually make their AI agents better, or just different.

Fixing the "Evals Pass, Agents Fail" Problem

Traditional evaluation frameworks, while useful for benchmarking, rarely capture the unpredictable behavior of AI agents operating in dynamic environments.

As Raindrop co-founder Alexis Gauba explained in her LinkedIn announcement, "Traditional evals don't really answer this question. They're great unit tests, but you can't predict your user's actions and your agent is running for hours, calling hundreds of tools."

Gauba said the company repeatedly heard a common frustration from teams: "Evals pass, agents fail."

Experiments is meant to close that gap by showing what actually changes when developers ship updates to their systems.

The tool allows side-by-side comparisons of models, tools, intents, or properties, surfacing measurable differences in behavior and performance.

Designed for Real-World AI Behavior

In the announcement video, Raindrop described Experiments as a way to "compare anything and measure how your agent's behavior actually changed in production across millions of real interactions."

The platform helps users spot issues such as task failure spikes, forgetting, or new tools that trigger unexpected errors.

It can also be used in reverse: starting from a known problem, such as an "agent stuck in a loop," and tracing back to which model, tool, or flag is driving it.

From there, developers can dive into detailed traces to find the root cause and ship a fix quickly.

Each experiment provides a visual breakdown of metrics like tool usage frequency, error rates, conversation duration, and response length.

Users can click on any comparison to access the underlying event data, giving them a clear view of how agent behavior changed over time. Shared links make it easy to collaborate with teammates or report findings.

Integration, Scalability, and Accuracy

According to Hylak, Experiments integrates directly with "the feature flag platforms companies know and love (like Statsig!)" and is designed to work seamlessly with existing telemetry and analytics pipelines.

For companies without those integrations, it can still compare performance over time, such as yesterday versus today, without additional setup.
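Neither Raindrop's nor Statsig's actual APIs are shown in the article; the sketch below only illustrates, with made-up names, the general pattern such feature-flag integrations rely on: deterministic flag bucketing plus telemetry events tagged with the variant.

```python
# Hypothetical sketch: the bucketing scheme, event fields, and emit() sink are
# invented for illustration; they are not Raindrop's or Statsig's real APIs.
import hashlib
import json

def assign_variant(user_id: str, rollout: float = 0.5) -> str:
    """Deterministically bucket a user so repeat visits see the same variant."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 1000 / 1000  # stable value in [0, 1)
    return "new_model" if bucket < rollout else "baseline"

def emit(event: dict) -> str:
    """Stand-in for a telemetry pipeline; real systems ship this downstream."""
    return json.dumps(event)

# Tag each logged interaction with its variant so an experiments tool
# can later split metrics like task failure by cohort.
record = emit({
    "user_id": "u-123",
    "variant": assign_variant("u-123"),
    "signal": "task_failure",
})
```

Hashing the user ID, rather than picking a variant at random per request, is what keeps a user's experience consistent across a long-running experiment.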

Hylak said teams typically need around 2,000 users per day to produce statistically meaningful results.

To ensure the accuracy of comparisons, Experiments monitors for sample size adequacy and alerts users if a test lacks enough data to draw valid conclusions.
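The article doesn't say which test Raindrop applies under the hood; a standard two-proportion z-test is one plausible way an adequacy check like this could work, and it shows why the roughly 2,000 users per day matter (the cohort sizes and failure counts below are invented):

```python
# Illustrative only: a textbook two-proportion z-test, not Raindrop's
# disclosed method. Counts and cohort sizes are made up.
import math

def two_proportion_z(fail_a: int, n_a: int, fail_b: int, n_b: int) -> float:
    """z-statistic for the difference in failure rates between two cohorts."""
    p_a, p_b = fail_a / n_a, fail_b / n_b
    pooled = (fail_a + fail_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# ~2,000 users/day per cohort, matching the figure Hylak cites
z = two_proportion_z(fail_a=80, n_a=2000, fail_b=110, n_b=2000)
significant = abs(z) > 1.96  # 95% confidence threshold
print(round(z, 2), significant)  # prints: 2.23 True
```

With much smaller cohorts (say 200 users each) the same 1.5-point shift in failure rate falls well below the 1.96 threshold, which is exactly the kind of situation a sample-size alert would flag.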

"We obsess over making sure metrics like Task Failure and User Frustration are metrics that you'd wake up an on-call engineer for," Hylak explained. He added that teams can drill into the exact conversations or events that drive those metrics, ensuring transparency behind every aggregate number.

Security and Data Protection

Raindrop operates as a cloud-hosted platform but also offers on-premise personally identifiable information (PII) redaction for enterprises that need additional control.

Hylak said the company is SOC 2 compliant and has launched a PII Guard feature that uses AI to automatically remove sensitive information from stored data. "We take protecting customer data very seriously," he emphasized.

Pricing and Plans

Experiments is part of Raindrop's Pro plan, which costs $350 per month or $0.0007 per interaction. The Pro tier also includes deep research tools, topic clustering, custom issue tracking, and semantic search capabilities.

Raindrop's Starter plan, at $65 per month or $0.001 per interaction, offers core analytics including issue detection, user feedback signals, Slack alerts, and user tracking. Both plans come with a 14-day free trial.

Larger organizations can opt for an Enterprise plan with custom pricing and advanced features like SSO login, custom alerts, integrations, edge-PII redaction, and priority support.

Continuous Improvement for AI Systems

With Experiments, Raindrop positions itself at the intersection of AI analytics and software observability. Its focus on "measuring reality," as stated in the product video, reflects a broader industry push toward accountability and transparency in AI operations.

Rather than relying solely on offline benchmarks, Raindrop's approach emphasizes real user data and contextual understanding. The company hopes this will let AI developers move faster, identify root causes sooner, and ship better-performing models with confidence.
