A Databricks Case Study
Patronus AI set out to address hallucinations in large language models, especially in retrieval-augmented generation (RAG) applications, where responses must stay grounded in the source documents they retrieve. To build a more reliable LLM evaluator, Patronus AI used Databricks and its Mosaic AI tools, including LLM Foundry, Composer, and Databricks Model Training, to develop Lynx, its hallucination detection model.
Databricks helped Patronus AI train and monitor fine-tuning runs at scale, using 32 NVIDIA H100 GPUs, fully sharded data parallelism (FSDP), and FlashAttention for the 70B-parameter model. The result was Lynx, which outperformed existing open- and closed-source LLM-as-a-judge evaluators: it beat GPT-4o by nearly 1% in average accuracy across HaluBench tasks and showed a 7.5% advantage on medical question answering.
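To make the LLM-as-a-judge framing concrete, here is a minimal sketch of how a hallucination judge like Lynx is typically invoked: the evaluator receives a question, the retrieved document, and the candidate answer, and returns a PASS/FAIL verdict on whether the answer is faithful to the document. The prompt template, function names, and JSON schema below are illustrative assumptions, not Patronus AI's actual format, and the model call itself is left as a stub.

```python
import json

# Hypothetical judge prompt: the exact wording and output schema used by
# Lynx may differ; this only illustrates the question/document/answer shape.
JUDGE_TEMPLATE = """Given the QUESTION, DOCUMENT and ANSWER, decide whether \
the ANSWER is faithful to the DOCUMENT.

QUESTION: {question}
DOCUMENT: {document}
ANSWER: {answer}

Respond with JSON: {{"REASONING": "<step-by-step>", "SCORE": "PASS" or "FAIL"}}"""


def build_judge_prompt(question: str, document: str, answer: str) -> str:
    """Fill the evaluation template with one RAG example."""
    return JUDGE_TEMPLATE.format(question=question, document=document, answer=answer)


def parse_verdict(raw: str) -> bool:
    """Parse the judge model's JSON output; True means the answer is grounded."""
    verdict = json.loads(raw)
    return verdict["SCORE"] == "PASS"


prompt = build_judge_prompt(
    question="What year was the company founded?",
    document="Acme Corp was founded in 1999 in Austin, Texas.",
    answer="Acme Corp was founded in 2005.",
)
# In practice `prompt` would be sent to the fine-tuned judge model; here we
# parse a hypothetical FAIL response for the ungrounded answer above.
raw_response = '{"REASONING": "The document says 1999, not 2005.", "SCORE": "FAIL"}'
print(parse_verdict(raw_response))  # → False
```

Framing the verdict as a binary PASS/FAIL is what makes accuracy comparisons across HaluBench-style benchmarks straightforward: each example either matches the gold label or it does not.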