Case Study: Patronus AI achieves state-of-the-art hallucination detection with Databricks

A Databricks Case Study


Patronus AI x Databricks: Training Models for Hallucination Detection

Patronus AI set out to address hallucinations in large language models, especially for RAG applications where responses must stay grounded in source documents. To build more reliable LLM evaluation, Patronus AI used Databricks and its Mosaic AI tools, including LLM Foundry, Composer, and Databricks Model Training, to develop its Lynx hallucination detection model.

Databricks helped Patronus AI train and monitor fine-tuning runs at scale, using 32 NVIDIA H100 GPUs, FSDP, and flash attention for the 70B model. The result was Lynx, which outperformed existing open- and closed-source LLM-as-a-judge evaluators; it beat GPT-4o by nearly 1% in average accuracy across HaluBench tasks and showed a 7.5% advantage on medical question-answering.
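The training setup described above (LLM Foundry, FSDP, flash attention, 32 H100s) can be sketched as an LLM Foundry-style YAML config. This is an illustrative assumption only: the base-model path, key names, and hyperparameter values below are placeholders, not Patronus AI's actual configuration.

```yaml
# Hypothetical LLM Foundry-style fine-tuning config (illustrative sketch;
# values are assumptions, not Patronus AI's actual settings)
model:
  name: hf_causal_lm
  pretrained_model_name_or_path: <70B-base-model>  # 70B model per the case study
  use_flash_attention_2: true                      # flash attention

# FSDP shards parameters, gradients, and optimizer state across GPUs,
# which is what makes 70B-scale fine-tuning fit on the cluster
fsdp_config:
  sharding_strategy: FULL_SHARD
  mixed_precision: PURE
  activation_checkpointing: true

precision: amp_bf16
global_train_batch_size: 32   # assumed: one micro-batch per GPU on 32 H100s
```

With Full Shard, no single GPU ever holds the whole 70B parameter set; activation checkpointing trades recompute for memory, which is a common pairing at this model scale.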


