Training a substantial language model can be an arduous process, lasting weeks, months, or even years depending on the hardware used. Recognizing the impracticality of such extended durations for businesses, NVIDIA introduced the latest version of its Eos supercomputer on Wednesday. This iteration, powered by more than 10,000 H100 Tensor Core GPUs, can train a 175-billion-parameter GPT-3 model on 1 billion tokens in under four minutes, three times faster than the benchmark NVIDIA set on the MLPerf AI industry standard just six months ago.
Eos harnesses a formidable computational capacity, employing 10,752 GPUs interconnected through NVIDIA’s InfiniBand networking (moving a petabyte of data per second) and 860 terabytes of high-bandwidth memory to deliver 40 exaflops of AI processing power. The system comprises 1,344 nodes, individual servers that companies can rent for approximately $37,000 a month, allowing them to enhance their AI capabilities without the need for extensive infrastructure development of their own.
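As a rough sanity check on those figures, the sketch below is plain back-of-the-envelope arithmetic using only the numbers quoted above; the eight-GPUs-per-node split and the roughly 80GB of memory per GPU are derived values rather than specs stated in this article, though they are consistent with NVIDIA’s usual eight-GPU H100 server layout.

```python
# Back-of-the-envelope arithmetic from the Eos figures quoted above.
# Derived values only; not separately published specs.

total_gpus = 10_752     # H100 Tensor Core GPUs
total_nodes = 1_344     # individual servers (nodes)
total_hbm_tb = 860      # terabytes of high-bandwidth memory

gpus_per_node = total_gpus / total_nodes              # 8.0
hbm_per_gpu_gb = (total_hbm_tb * 1_000) / total_gpus  # ~80 GB per GPU

print(f"GPUs per node: {gpus_per_node:.0f}")      # 8
print(f"HBM per GPU:   {hbm_per_gpu_gb:.0f} GB")  # 80 GB
```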
NVIDIA achieved groundbreaking results across nine benchmark tests, setting six records, including the notable 3.9-minute time for GPT-3 training. It’s crucial to highlight that the GPT-3 workload used in the benchmarking, while keeping all 175 billion parameters, is a scaled-down run: training on the full 3.7-trillion-token dataset is impractical for benchmarking due to its size and complexity. For instance, training the full GPT-3 on the older A100 system with 512 GPUs would take 18 months, while Eos could accomplish the task in just eight days.
Instead, NVIDIA and MLCommons, the entity overseeing the MLPerf standard, opted for a more streamlined version built around 1 billion tokens (a token being the smallest unit of data a generative AI system understands). This run employed a GPT-3 iteration with the same number of potential switches to flip, its 175 billion parameters, as the full-size model, but trained it on a far more manageable billion tokens rather than the impractical 3.7 trillion.
The remarkable improvement in performance stems from the use of 10,752 H100 GPUs in this recent test, a significant increase from the 3,584 Hopper GPUs used in June’s benchmarking trials. Despite tripling the number of GPUs, NVIDIA emphasizes that it maintained 2.8x scaling in performance, a 93 percent efficiency rate, achieved through extensive software optimization.
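As a quick check on that efficiency figure, the short sketch below recomputes it from the GPU counts and the 2.8x speedup reported above; it is plain arithmetic on the article’s own numbers, not anything taken from NVIDIA’s MLPerf submission.

```python
# Scaling efficiency implied by the figures above:
# roughly 3x the GPUs yielded roughly 2.8x the performance.

gpus_june = 3_584    # Hopper GPUs in June's MLPerf round
gpus_now = 10_752    # H100 GPUs in the latest round
speedup = 2.8        # performance scaling NVIDIA reports

gpu_ratio = gpus_now / gpus_june   # 3.0x the hardware
efficiency = speedup / gpu_ratio   # ~0.93

print(f"GPU count ratio:    {gpu_ratio:.1f}x")   # 3.0x
print(f"Scaling efficiency: {efficiency:.0%}")   # 93%
```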
“Scaling is a wonderful thing,” said Dave Salvator, Director of Accelerated Computing Products at NVIDIA. “But with scaling, you’re talking about more infrastructure, which can also mean things like more cost. An efficiently scaled increase means users are making the best use of their infrastructure, allowing them to complete their work as swiftly as possible and extract maximum value from their organizational investment.”
NVIDIA wasn’t alone in achieving these results. Microsoft’s Azure team also submitted a similar 10,752 H100 GPU system in this benchmarking round, achieving results within two percent of NVIDIA’s performance.
“The Azure team has achieved a performance level comparable to the Eos supercomputer,” Salvator shared during a prebrief on Tuesday. He emphasized that Azure’s result was achieved on commercially available instances rather than an isolated laboratory system, making the same performance accessible to actual customers.
NVIDIA intends to deploy these enhanced computing capabilities across various domains, including ongoing endeavors in foundational model development, AI-assisted GPU design, neural rendering, multimodal generative AI, and autonomous driving systems.
“Any reputable benchmark aiming to maintain market relevance must consistently update the workloads to reflect the evolving market it serves,” Salvator commented, highlighting MLCommons’ recent addition of a benchmark for assessing model performance on Stable Diffusion tasks. He added, “This is another exciting area of generative AI where we’re seeing all sorts of things being created, from programming code to discovering protein chains.”
These benchmarks hold significance because, as Salvator points out, the current landscape of generative AI marketing resembles a “Wild West.” The absence of rigorous oversight and regulation leads to uncertainty about the parameters behind certain AI performance claims. MLPerf addresses this by providing assurance that the benchmark numbers generated through its tests are reviewed and vetted, and sometimes even challenged, by other consortium members. According to Salvator, it’s this peer-review process that lends credibility to the results.
NVIDIA has consistently directed its attention toward enhancing its AI capabilities and applications in recent months. CEO Jensen Huang likened this focus to an “iPhone moment for AI” during his GTC keynote in March. In the same period, the company introduced its DGX Cloud system, which allocates slices of the supercomputer’s processing power, specifically instances of either eight H100 or A100 chips with 80GB of VRAM each (640GB in total). The supercomputing portfolio was further expanded with the introduction of the DGX GH200 at Computex in May.