Runloop.ai and Fermatix.ai Took Aim at AI Agent Standards with Launch of Custom Benchmarks

NEWS 02 October 2025

With custom benchmarks, the two companies pave the way for smarter, safer, and more scalable AI agent deployment.

Runloop.ai, the leading enterprise infrastructure platform for AI agents, announced the launch of its Custom Benchmarks product. The new offering enables organizations to create highly specialized, private benchmarks that accurately measure and refine AI agents on their unique, proprietary codebases and business logic. To highlight the product's broad applications and strategic value, Runloop.ai is collaborating with Fermatix.ai, a specialist in full-cycle data generation, on a landmark pilot program.

The rapid spread of AI agents has created a critical need for rigorous, relevant evaluation and functional training. Public benchmarks are crucial for general model evaluation, but they often fail to capture the specific requirements of AI agents or the validation needs of enterprises. Runloop.ai's Custom Benchmarks address this gap by providing a secure, scalable platform on which companies can build benchmarks that test against their own internal business logic, tech stacks, and performance metrics.

Key features of Runloop.ai's Custom Benchmarks product include:

Private benchmarking: Securely test AI agents on proprietary code without exposing intellectual property.
Accurate performance evaluation: Measure agent effectiveness in real-world, business-specific conditions.
Scalable infrastructure: A reliable and isolated environment for running thousands of tests simultaneously.
Strategic model refinement: Obtain data for targeted improvement and retraining of AI agents for specific tasks.

"As AI agents move from prototypes to production, the benchmarks we use to evaluate them must evolve from generic tests to strategic assets," said Jonathan Wall, CEO of Runloop.ai. "Our new Custom Benchmarks product empowers enterprises to define what 'good' looks like for their unique business, enabling them to fine-tune and trust their AI agents in real-world scenarios. The pilot with Fermatix.ai is the perfect example of this in action, demonstrating the value of this approach in the most demanding environments."

Fermatix.ai is known for creating expert-level training data tailored to industry-critical tasks and highly specialized domains, with annotators who are practicing industry experts. By leveraging Runloop.ai's infrastructure, Fermatix.ai is expanding its capabilities to offer custom, in-house verification for its clients. The collaboration allows Fermatix.ai to move beyond its current offerings and provide a new level of assurance by creating benchmarks tailored to specific enterprise needs. The pilot program will demonstrate how Fermatix.ai's expertise in data engineering and expert-level annotation can be applied to create high-fidelity, multilingual benchmarks on Runloop.ai's platform.

"At Fermatix.ai, we've built our reputation on creating expert-level training data with practicing industry professionals as annotators," said Sergey Anchutin, CEO and Founder of Fermatix.ai. "This partnership with Runloop.ai represents a strategic evolution—moving beyond one-time data labeling to creating reusable benchmarks that deliver ongoing value to our clients. By leveraging our domain expertise and Runloop's infrastructure, we're not just providing data anymore; we're building the testing standards that will define how enterprises evaluate their AI agents across industry-critical tasks."

The Custom Benchmarks product is now available to all Runloop.ai Pro clients, with early results from the Fermatix.ai pilot program expected to be shared in the coming months.
