CoreWeave, the AI Hyperscaler™, today announced its MLPerf v5.0 results, setting a new industry benchmark in AI inference with NVIDIA GB200 Grace Blackwell Superchips. Using a CoreWeave instance with NVIDIA GB200, featuring two NVIDIA Grace CPUs and four NVIDIA Blackwell GPUs, CoreWeave delivered 800 tokens per second (TPS) on Llama 3.1 405B, one of the largest open-source models.
"CoreWeave is committed to delivering cutting-edge infrastructure optimized for large-model inference through our purpose-built cloud platform," said Peter Salanki, Chief Technology Officer at CoreWeave. "These benchmark MLPerf results reinforce CoreWeave's position as a preferred cloud provider for leading AI labs and enterprises."
CoreWeave also submitted new results for NVIDIA H200 GPU instances, achieving 33,000 TPS on the Llama 2 70B model, a 40 percent improvement in throughput over NVIDIA H100 instances.