Case Study: University of Pécs achieves rapid, cost-effective Hungarian BERT-large training (200 hours, €1,000) with Microsoft Azure and ONNX Runtime

Preview of the University of Pécs Case Study

University of Pécs enables text and speech processing in Hungarian, builds a BERT-large model for just €1,000

The University of Pécs’ Applied Data Science and AI team in Hungary faced a common challenge for smaller languages: there were few off‑the‑shelf tools to process Hungarian text and speech, and commercial providers often overlook languages with limited market size. The team needed a high‑quality, cost‑effective way to build native‑language NLP models without investing in expensive hardware, and partnered with the Research Institute of Linguistics to prepare a clean, 3.5 billion‑word corpus for training.

Using Azure AI services, Azure Machine Learning, ONNX Runtime with DeepSpeed, and Azure Blob Storage, the team trained HILBERT, a Hungarian BERT‑large model, on multi‑GPU clusters. Training finished in about 200 hours for under €1,000, versus an estimated 1,500 hours without ONNX Runtime, a roughly 7.5‑fold reduction in training time. The open‑source model supports text and speech processing, intelligent search, named entity recognition (NER), question answering, summarization, and other NLP applications, and has already attracted interest from healthcare and government stakeholders.

University of Pécs

Ádám Feldmann

Head of Data Science and Artificial Intelligence research group

Microsoft Azure
