A Nebius Case Study
Recraft, an AI company building a foundation model for designers, faced significant challenges in training its 20-billion-parameter model. Managing a large hardware cluster and debugging opaque, low-level errors from technologies such as NCCL and InfiniBand led to slow training and unpredictable stalls.
Nebius provided its Managed Kubernetes service and dedicated solution architects to address these issues. The Nebius team implemented critical fixes, including configuring GPUDirect RDMA and providing a custom NCCL patch. These fixes made the network four times faster and increased training speed roughly sixfold. With Nebius's support, Recraft successfully trained its state-of-the-art model, which later achieved a preference rate of over 50% on the PartiPrompts benchmark.
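
For readers unfamiliar with this kind of tuning, the sketch below shows how a training job might set NCCL environment variables to log which network transport is in use and to allow GPUDirect RDMA over InfiniBand. It is illustrative only: the case study does not say which settings Nebius applied, and the custom NCCL patch is not reproduced here. The helper names `nccl_rdma_debug_env` and `apply_env`, and the default values `mlx5` and `SYS`, are assumptions; the environment variable names themselves are standard NCCL settings.

```python
import os

# Topology distances accepted by NCCL_NET_GDR_LEVEL, from closest to farthest.
GDR_LEVELS = ("LOC", "PIX", "PXB", "PHB", "SYS")


def nccl_rdma_debug_env(hca_prefix="mlx5", gdr_level="SYS"):
    """Build NCCL settings for inspecting InfiniBand / GPUDirect RDMA usage.

    hca_prefix and gdr_level are hypothetical defaults, not values taken
    from the case study; adjust them to the actual cluster hardware.
    """
    if gdr_level not in GDR_LEVELS:
        raise ValueError(f"gdr_level must be one of {GDR_LEVELS}")
    return {
        "NCCL_DEBUG": "INFO",             # log transport selection at startup
        "NCCL_DEBUG_SUBSYS": "INIT,NET",  # limit logs to init and network
        "NCCL_IB_DISABLE": "0",           # keep the InfiniBand transport enabled
        "NCCL_IB_HCA": hca_prefix,        # select adapters by name prefix
        "NCCL_NET_GDR_LEVEL": gdr_level,  # max GPU-to-NIC distance for GPUDirect RDMA
    }


def apply_env(env, target=None):
    """Copy the settings into target (os.environ by default) and return it."""
    target = os.environ if target is None else target
    target.update(env)
    return target
```

These variables would need to be set before the communication backend initializes, for example by calling `apply_env(nccl_rdma_debug_env())` at the top of the training script or by exporting them in the pod specification. The resulting startup logs should indicate whether traffic uses the InfiniBand transport with GPUDirect RDMA or falls back to a slower path.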