Microsoft Azure
2593 Case Studies
A Microsoft Azure Case Study
The University of Pécs’ Applied Data Science and AI team in Hungary faced a common challenge for smaller languages: there were few off‑the‑shelf tools to process Hungarian text and speech, and commercial providers often overlook languages with limited market size. The team needed a high‑quality, cost‑effective way to build native‑language NLP models without investing in expensive hardware, and partnered with the Research Institute of Linguistics to prepare a clean, 3.5 billion‑word corpus for training.
Using Azure AI services, Azure Machine Learning, ONNX Runtime with DeepSpeed, and Azure Blob Storage, the team trained HILBERT, a Hungarian BERT‑large model, on multi‑GPU clusters—completing training in about 200 hours for under €1,000 (vs. an estimated 1,500 hours without ONNX). The open‑source model enables text and speech processing, intelligent search, NER, Q&A, summarization and other NLP applications, and has already attracted interest from healthcare and government stakeholders.
Ádám Feldmann
Head of Data Science and Artifical Intelligence research group