NVIDIA and Partners Showcase Cutting-Edge AI Performance and Versatility in MLPerf
NVIDIA and Partners Continued to Deliver the Best Overall AI Training Performance and the Most Submissions Across All Benchmarks, With 90% of All Submissions Coming From the Ecosystem, According to MLPerf Benchmarks Released Today.
The NVIDIA AI platform covered all eight benchmarks of the MLPerf Training 2.0 cycle, highlighting its industry-leading versatility.
No other accelerator has run all the benchmarks, which represent popular AI use cases including speech recognition, natural language processing, recommender systems, object detection, and image classification. NVIDIA has done this consistently since its December 2018 submission to the first round of MLPerf, an industry-standard AI benchmark suite.
Key benchmark results and availability
In its fourth consecutive MLPerf training submission, the NVIDIA A100 Tensor Core GPU based on the NVIDIA Ampere architecture continued to excel.
Selene, our in-house AI supercomputer based on the modular NVIDIA DGX SuperPOD and powered by NVIDIA A100 GPUs, our software stack, and NVIDIA InfiniBand networking, achieved the fastest time to train in four out of eight tests.
NVIDIA A100 also maintained its per-chip leadership, proving fastest in six of the eight tests.
A total of 16 partners submitted results for this round using the NVIDIA AI platform. They include ASUS, Baidu, CASIA (Institute of Automation, Chinese Academy of Sciences), Dell Technologies, Fujitsu, GIGABYTE, H3C, Hewlett Packard Enterprise, Inspur, KRAI, Lenovo, MosaicML, Nettrix, and Supermicro.
Most of our OEM partners submitted results using NVIDIA-Certified Systems, servers validated by NVIDIA to deliver exceptional performance, manageability, security, and scalability for enterprise deployments.
Many models power real AI applications
An AI application may need to understand a user’s voice request, classify an image, make a recommendation, and provide a response as a voice message.
These tasks require multiple types of AI models working in sequence, an arrangement known as a pipeline. Users need to design, train, deploy, and optimize these models quickly and flexibly.
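The sequential flow described above can be sketched as a minimal pipeline. The stage functions below (`transcribe`, `recommend`, `synthesize`) are hypothetical stand-ins for real trained models, not part of any NVIDIA API; the point is only that each stage consumes the previous stage's output.

```python
# Minimal sketch of a multi-model AI pipeline: each stage is a model
# (stubbed here with plain functions) whose output feeds the next stage.
# All stage names and return values are hypothetical placeholders.

def transcribe(audio: bytes) -> str:
    # A speech-recognition model would run here.
    return "show me red sneakers"

def recommend(query: str) -> list[str]:
    # A recommender model would run here.
    return ["sneaker-a", "sneaker-b"]

def synthesize(items: list[str]) -> str:
    # A text-to-speech model would run here; we return text for brevity.
    return "I found: " + ", ".join(items)

def run_pipeline(audio: bytes) -> str:
    # The models run in sequence: recognize, recommend, respond.
    text = transcribe(audio)
    items = recommend(text)
    return synthesize(items)

print(run_pipeline(b"\x00\x01"))
```

Because every stage is a separate model, each one must be trained, deployed, and optimized on its own, which is why end-to-end versatility matters in practice.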
That’s why versatility — the ability to run every model in MLPerf and beyond — along with cutting-edge performance is key to bringing real-world AI to production.
Deliver ROI with AI
For customers, data science and engineering teams are their most valuable assets, and those teams' productivity determines the ROI of AI infrastructure. Customers should consider the cost of expensive data science teams, which often dominates the total cost of deploying AI, alongside the relatively low cost of deploying the AI infrastructure itself.
The productivity of AI researchers depends on the ability to quickly test new ideas, which requires both the versatility to train any model as well as the speed afforded by training those models at the largest scale. That’s why organizations are focusing on overall productivity per dollar to determine the best AI platforms — a more comprehensive view that more accurately represents the true cost of deploying AI.
Moreover, utilization of AI infrastructure depends on its fungibility: the ability to accelerate the entire AI workflow, from data preparation to training to inference, on a single platform.
With NVIDIA AI, customers can use the same infrastructure for the entire AI pipeline, reallocating it as demands shift between data preparation, training, and inference. This dramatically boosts utilization and leads to a very high return on investment.
And, as researchers uncover new advances in AI, supporting the latest model innovations is critical to maximizing the useful life of AI infrastructure.
NVIDIA AI delivers the highest productivity per dollar because it’s universal and powerful for every model, scales to any size, and accelerates end-to-end AI, from data preparation to training to inference.
Today’s results provide the latest demonstration of NVIDIA’s broad and deep AI expertise, shown in every MLPerf training, inference, and HPC round to date.
23 times more performance in 3.5 years
In the two years since our first MLPerf submission with A100, our platform has delivered 6 times the performance. Ongoing optimizations to our software stack have helped fuel these gains.
Since the advent of MLPerf, the NVIDIA AI platform has delivered 23x more performance on the benchmark in 3.5 years, the result of comprehensive innovation spanning GPUs, software, and at-scale enhancements. This ongoing commitment to innovation assures customers that the AI platform they invest in today, and keep in operation for three to five years, will continue to evolve to support the state of the art.
Additionally, the NVIDIA Hopper architecture, announced in March, promises another giant leap in performance in future MLPerf cycles.
How we did it
Software innovation continues to unlock more performance on the NVIDIA Ampere architecture.
For example, CUDA Graphs, software that minimizes the overhead of launching jobs that run across many accelerators, is used widely in our submissions. Optimized kernels in libraries like cuDNN and data preprocessing in DALI unlocked additional speedups. We also implemented full-stack improvements to hardware, software, and networking, such as NVIDIA Magnum IO and SHARP, which offload some AI functions to the network to further improve performance, especially at scale.
All the software we used is available in the MLPerf repository, so everyone can reproduce our world-class results. We continuously fold these optimizations into containers available on NGC, our software hub for GPU applications, and offer NVIDIA AI Enterprise for optimized software fully supported by NVIDIA.
Two years after the launch of the A100, the NVIDIA AI platform continues to deliver the highest performance in MLPerf 2.0 and is the only platform to submit on every benchmark. Our next-generation Hopper architecture promises another giant leap in future MLPerf cycles.
Our platform is universal, handling every model and framework at any scale, and it can accelerate every part of the AI workflow. It is available from every major cloud service provider and server maker.