Case Study: Cohere improves agentic enterprise model performance with Invisible Technologies

Cohere outperforms competitors in agentic enterprise tasks with Invisible evaluations

Cohere, a leading security-first enterprise AI company, needed to evaluate the performance of its new Command A model on specialized, real-world enterprise tasks. Off-the-shelf benchmarks were insufficient for testing nuanced scenarios in areas like customer service and HR. Cohere turned to its partner, Invisible Technologies, to provide PhD-level experts for scalable, high-quality human evaluations and training data.

Invisible Technologies implemented a comprehensive evaluation solution staffed by expert human annotators. This enabled Cohere to fine-tune its model for 10 languages and for rare programming languages, leading to transformative improvements. The results showed that Command A matches or outperforms its larger competitors, achieving a 51.7% average win rate in head-to-head evaluations, while being dramatically more efficient and deployable on far fewer GPUs. Invisible's training data and evaluations were critical to this commercial success.

Cohere

Wojciech Galuba

Director of Data & Evaluations

Invisible Technologies
