Case Study: Toongether achieves 12× faster compilation and 40% faster inference with Pruna AI

A Pruna AI Case Study

Preview of the Toongether Case Study

Beating torch.compile: 12× faster compilation time at Toongether.ai

Toongether, a B2C app for AI-powered comics asset generation, faced significant performance limitations with its existing PyTorch optimization, torch.compile. Compilation warm-up took about 10 minutes and yielded only marginal efficiency gains of around 5%, which was impractical for a production environment. The team turned to Pruna AI for a solution that would improve productivity and inference speed.

Pruna AI implemented its optimization stack, providing full transparency into the hyperparameters and methods used. The solution delivered a 12× faster compilation time, cutting it from 10 minutes to under a minute, and achieved a 40% inference speed-up. This eliminated the deployment bottleneck, significantly cut latency, and gave Toongether a maintainable solution, all within two days.
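The two numbers quoted above, warm-up (compilation) time and steady-state inference latency, are best measured separately. The following is a minimal sketch of how that could be done with the Python standard library. It is not Pruna AI's or Toongether's benchmarking code; the helper names `measure_warmup_and_latency` and `speedup` are hypothetical, and the callable passed in stands for whatever model call is being timed.

```python
import time
from statistics import median
from typing import Callable, Dict


def measure_warmup_and_latency(fn: Callable[[], object],
                               steady_runs: int = 20) -> Dict[str, float]:
    """Time the first call separately from the calls that follow.

    With lazily compiled models, the first call includes compilation,
    so it is reported on its own as the warm-up time. Note: for GPU
    workloads the callable should block until the result is ready,
    otherwise the timings only cover kernel launch.
    """
    start = time.perf_counter()
    fn()
    warmup_s = time.perf_counter() - start

    samples = []
    for _ in range(steady_runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)

    # The median is less sensitive to occasional slow runs than the mean.
    return {"warmup_s": warmup_s, "steady_s": median(samples)}


def speedup(baseline_s: float, optimized_s: float) -> float:
    """Return how many times faster the optimized timing is."""
    return baseline_s / optimized_s


# Arithmetic check on the figures quoted in the case study:
# a 10-minute warm-up made 12x faster comes to 50 seconds.
baseline_compile_s = 10 * 60
optimized_compile_s = baseline_compile_s / 12
print(optimized_compile_s)  # 50.0
```

A 12× reduction of a 10-minute warm-up works out to 50 seconds, which is consistent with the "under a minute" figure above.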


