Power Archives - MLCommons
https://mlcommons.org/category/power/

New MLPerf Inference v4.1 Benchmark Results Highlight Rapid Hardware and Software Innovations in Generative AI Systems
https://mlcommons.org/2024/08/mlperf-inference-v4-1-results/
Wed, 28 Aug 2024

New mixture of experts benchmark tracks emerging architectures for AI models

Today, MLCommons® announced new results for its industry-standard MLPerf® Inference v4.1 benchmark suite, which delivers machine learning (ML) system performance benchmarking in an architecture-neutral, representative, and reproducible manner. This release includes first-time results for a new benchmark based on a mixture of experts (MoE) model architecture. It also presents new findings on power consumption related to inference execution.

MLPerf Inference v4.1

The MLPerf Inference benchmark suite, which encompasses both data center and edge systems, is designed to measure how quickly hardware systems can run AI and ML models across a variety of deployment scenarios. The open-source and peer-reviewed benchmark suite creates a level playing field for competition that drives innovation, performance, and energy efficiency for the entire industry. It also provides critical technical information for customers who are procuring and tuning AI systems.

The benchmark results for this round demonstrate broad industry participation and include the debut of six newly available or soon-to-be-shipped processors:

  • AMD MI300x accelerator (available)
  • AMD EPYC “Turin” CPU (preview)
  • Google “Trillium” TPUv6e accelerator (preview)
  • Intel “Granite Rapids” Xeon CPUs (preview)
  • NVIDIA “Blackwell” B200 accelerator (preview)
  • UntetherAI SpeedAI 240 Slim (available) and SpeedAI 240 (preview) accelerators

MLPerf Inference v4.1 includes 964 performance results from 22 submitting organizations: AMD, ASUSTek, Cisco Systems, Connect Tech Inc, CTuning Foundation, Dell Technologies, Fujitsu, Giga Computing, Google Cloud, Hewlett Packard Enterprise, Intel, Juniper Networks, KRAI, Lenovo, Neural Magic, NVIDIA, Oracle, Quanta Cloud Technology, Red Hat, Supermicro, Sustainable Metal Cloud, and Untether AI.

“There is now more choice than ever in AI system technologies, and it’s heartening to see providers embracing the need for open, transparent performance benchmarks to help stakeholders evaluate their technologies,” said Mitchelle Rasquinha, MLCommons Inference working group co-chair.

New mixture of experts benchmark

Keeping pace with today’s ever-changing AI landscape, MLPerf Inference v4.1 introduces a new benchmark to the suite: mixture of experts. MoE is an architectural design for AI models that departs from the traditional approach of employing a single, massive model; it instead uses a collection of smaller “expert” models. Inference queries are directed to a subset of the expert models to generate results. Research and industry leaders have found that this approach can yield equivalent accuracy to a single monolithic model but often at a significant performance advantage because only a fraction of the parameters are invoked with each query.
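
To make the routing idea concrete, here is a minimal sketch of a mixture-of-experts layer with top-k gating, written in PyTorch. The layer sizes, expert count, and `top_k` value are illustrative assumptions for this sketch, not the configuration of the MLPerf reference model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Sketch of a mixture-of-experts layer with top-k routing.

    Only the top_k experts chosen by the gating network run for each
    token, so just a fraction of the layer's parameters are used per
    query. Sizes below are illustrative, not the Mixtral configuration.
    """

    def __init__(self, d_model=512, d_hidden=1024, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                          # x: (tokens, d_model)
        scores = self.gate(x)                      # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize top-k gate scores
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e              # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

if __name__ == "__main__":
    layer = TopKMoELayer()
    tokens = torch.randn(16, 512)
    print(layer(tokens).shape)                     # torch.Size([16, 512])
```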

The MoE benchmark is unique and one of the most complex implemented by MLCommons to date. It uses the open-source Mixtral 8x7B model as a reference implementation and performs inferences using datasets covering three independent tasks: general Q&A, solving math problems, and code generation.

“When determining to add a new benchmark, the MLPerf Inference working group observed that many key players in the AI ecosystem are strongly embracing MoE as part of their strategy,” said Miro Hodak, MLCommons Inference working group co-chair. “Building an industry-standard benchmark for measuring system performance on MoE models is essential to address this trend in AI adoption. We’re proud to be the first AI benchmark suite to include MoE tests to fill this critical information gap.”

Benchmarking Power Consumption

The MLPerf Inference v4.1 benchmark includes 31 power consumption test results across three submitted systems covering both datacenter and edge scenarios. These results demonstrate the continued importance of understanding the power requirements for AI systems running inference tasks, as power costs are a substantial portion of the overall expense of operating AI systems.

The Increasing Pace of AI Innovation

Today, we are witnessing an incredible groundswell of technological advances across the AI ecosystem, driven by a wide range of providers including AI pioneers; large, well-established technology companies; and small startups. 

MLCommons would especially like to welcome first-time MLPerf Inference submitters AMD and Sustainable Metal Cloud, as well as Untether AI, which delivered both performance and power efficiency results. 

“It’s encouraging to see the breadth of technical diversity in the systems submitted to the MLPerf Inference benchmark as vendors adopt new techniques for optimizing system performance such as vLLM and sparsity-aware inference,” said David Kanter, Head of MLPerf at MLCommons. “Farther down the technology stack, we were struck by the substantial increase in unique accelerator technologies submitted to the benchmark this time. We are excited to see that systems are now evolving at a much faster pace – at every layer – to meet the needs of AI. We are delighted to be a trusted provider of open, fair, and transparent benchmarks that help stakeholders get the data they need to make sense of the fast pace of AI innovation and drive the industry forward.”

View the Results

To view the results for MLPerf Inference v4.1, please visit the Datacenter and Edge benchmark results pages.

To learn more about the selection of the new MoE benchmark, read the blog.

About MLCommons

MLCommons is the world leader in building benchmarks for AI. It is an open engineering consortium with a mission to make AI better for everyone through benchmarks and data. The foundation for MLCommons began with the MLPerf benchmarks in 2018, which rapidly scaled as a set of industry metrics to measure machine learning performance and promote transparency of machine learning techniques. In collaboration with its 125+ members, global technology providers, academics, and researchers, MLCommons is focused on collaborative engineering work that builds tools for the entire AI industry through benchmarks and metrics, public datasets, and measurements for AI Safety.

For additional information on MLCommons and details on becoming a member, please visit MLCommons.org or contact participation@mlcommons.org.

New MLPerf Training Benchmark Results Highlight Hardware and Software Innovations in AI Systems
https://mlcommons.org/2024/06/mlperf-training-v4-benchmark-results/
Wed, 12 Jun 2024

Two new benchmarks added, highlighting language model fine-tuning and classification for graph data

Today, MLCommons® announced new results for the MLPerf® Training v4.0 benchmark suite, including first-time results for two benchmarks: LoRA fine-tuning of Llama 2 70B and GNN.

MLPerf Training v4.0

The MLPerf Training benchmark suite comprises full system tests that stress machine learning (ML) models, software, and hardware for a broad range of applications. The open-source and peer-reviewed benchmark suite provides a level playing field for competition that drives innovation, performance, and energy efficiency for the entire industry.

MLPerf Training v4.0 includes over 205 performance results from 17 submitting organizations: ASUSTeK, Dell, Fujitsu, Giga Computing, Google, HPE, Intel (Habana Labs), Juniper Networks, Lenovo, NVIDIA, NVIDIA + CoreWeave, Oracle, Quanta Cloud Technology, Red Hat + Supermicro, Supermicro, Sustainable Metal Cloud (SMC), and tiny corp.

MLCommons would like to especially welcome first-time MLPerf Training submitters Juniper Networks, Oracle, SMC, and tiny corp. 

Congratulations to first-time participant SMC for submitting the first-ever set of power results for MLPerf Training. These results highlight the impact of SMC’s immersion cooling solutions for data center systems. Our industry-standard power measurement works with MLPerf Training and is the first and only method to accurately measure full system power draw and energy consumption for both cloud and on-premise systems in a trusted and consistent fashion. These metrics are critical for the entire community to understand and improve the overall efficiency for training ML models – which will ultimately reduce the energy use and improve the environmental impact of AI in the coming years.

The Training v4.0 results demonstrate broad industry participation and showcase substantial performance gains in ML systems and software. Compared to the last round of results six months ago, this round brings a 1.8X speed-up in training time for Stable Diffusion. Meanwhile, the best results in the RetinaNet and GPT-3 tests are 1.2X and 1.13X faster, respectively, thanks to performance scaling at increased system sizes.

“I’m thrilled by the performance gains we are seeing especially for generative AI,” said David Kanter, executive director of MLCommons. “Together with our first power measurement results for MLPerf Training we are increasing capabilities and reducing the environmental footprint – making AI better for everyone.”

New LLM fine-tuning benchmark

The MLPerf Training v4.0 suite introduces a new benchmark to target fine-tuning a large language model (LLM). An LLM that has been pre-trained on a general corpus of text can be fine-tuned to improve its accuracy on specific tasks, and the computational costs of doing so can differ from pre-training.

A variety of approaches to fine-tuning an LLM at lower computational costs have been introduced over the past few years. The MLCommons Training working group evaluated several of these algorithms and ultimately selected LoRA as the basis for its new benchmark. First introduced in 2021, LoRA freezes the original pre-trained parameters in a network layer and injects trainable rank-decomposition matrices. Since LoRA fine-tuning trains only a small portion of the network parameters, this approach dramatically reduces the computational and memory demands compared to pre-training or supervised fine-tuning.
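
A minimal sketch of this idea in PyTorch is shown below, wrapping a frozen linear layer with trainable low-rank factors; the rank, scaling factor, and layer dimensions here are illustrative assumptions, not the benchmark's hyperparameters.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Sketch of LoRA applied to a frozen linear layer.

    The pre-trained weight W is frozen; only the low-rank factors A and B
    are trained, so the layer computes base(x) + scaling * x A^T B^T with
    far fewer trainable parameters than W itself. Rank and scaling are
    illustrative choices.
    """

    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():           # freeze pre-trained weights
            p.requires_grad = False
        in_f, out_f = base.in_features, base.out_features
        self.lora_a = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_f, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # frozen path + low-rank trainable update
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

base = nn.Linear(4096, 4096)
layer = LoRALinear(base)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.3%}")   # well under 1%
```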

“Fine-tuning LLMs is a notable workload because AI practitioners across many organizations make use of this technology. LoRA was the optimal choice for a state-of-the-art fine-tuning technique; it significantly reduces trainable parameters while maintaining performance comparable to fully fine-tuned models,” said Hiwot Kassa, MLPerf Training working group co-chair.

The new LoRA benchmark uses the Llama 2 70B general LLM as its base. This model is fine-tuned with the Scrolls dataset of government documents with a goal of generating more accurate document summaries. Accuracy is measured using the ROUGE algorithm for evaluating the quality of document summaries. The model uses a context length of 8,192 tokens, keeping pace with the industry’s rapid evolution toward longer context lengths.
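
As an illustration of this style of scoring, the sketch below averages ROUGE F-measures over a pair of hypothetical reference and generated summaries using the open-source rouge-score package; the example strings and choice of ROUGE variants are assumptions, and the benchmark's actual evaluation harness may differ.

```python
# pip install rouge-score
from rouge_score import rouge_scorer

# Illustrative reference/generated pairs; the benchmark's harness and exact
# ROUGE configuration may differ from this sketch.
references = [
    "The committee recommends increasing the agency's audit budget.",
    "The report finds the program met its 2020 enrollment targets.",
]
generated = [
    "The committee recommends a larger audit budget for the agency.",
    "According to the report, 2020 enrollment targets were met.",
]

metrics = ["rouge1", "rouge2", "rougeLsum"]
scorer = rouge_scorer.RougeScorer(metrics, use_stemmer=True)
totals = {m: 0.0 for m in metrics}
for ref, gen in zip(references, generated):
    scores = scorer.score(ref, gen)          # target first, prediction second
    for key, value in scores.items():
        totals[key] += value.fmeasure
for key, total in totals.items():
    print(f"{key}: {total / len(references):.3f}")
```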

The LLM fine-tuning benchmark is already achieving widespread adoption, with over 30 submissions in its initial round.

Learn more about the selection of the LoRA fine-tuning algorithm for the MLPerf Training benchmark in this blog.

New GNN benchmark for classification in graphs

MLPerf Training v4.0 also introduces a graph neural network (GNN) benchmark for measuring the performance of ML systems on problems that are represented by large graph-structured data, such as those used to implement literary databases, drug discovery applications, fraud detection systems, social networks, and recommender systems. 

“Training on large graph-structured datasets poses unique system challenges, demanding optimizations for sparse operations and inter-node communication. We hope the addition of a GNN-based benchmark in MLPerf Training broadens the challenges offered by the suite and spurs software and hardware innovations for this critical class of workload,” said Ritika Borkar, MLPerf Training working group co-chair.

The MLPerf Training GNN benchmark is used for a node classification task where the goal is to predict a label for each node in a graph. The benchmark uses an R-GAT model and is trained on the 2.2 terabyte IGBH full dataset, the largest available open-source graph dataset with 547 million nodes and 5.8 billion edges. The IGBH database is a graph showing the relationships between academic authors, papers, and institutes. Each node in the graph can be classified into one of 2,983 classes.
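
For readers unfamiliar with the task, the sketch below trains a small graph attention network for node classification on a toy graph using PyTorch Geometric; the graph, features, and use of plain GATConv layers (rather than the relational R-GAT model and IGBH dataset used by the benchmark) are illustrative assumptions.

```python
# pip install torch torch-geometric
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GATConv

# Toy graph: 6 nodes, a handful of edges, 3 classes. The real benchmark
# trains a relational GAT on the 547M-node IGBH graph; this sketch only
# illustrates the node-classification setup.
x = torch.randn(6, 16)                                   # node features
edge_index = torch.tensor([[0, 1, 1, 2, 3, 4, 5, 0],
                           [1, 0, 2, 1, 4, 3, 0, 5]])    # directed edges
y = torch.tensor([0, 1, 1, 2, 2, 0])                     # node labels
data = Data(x=x, edge_index=edge_index, y=y)

class NodeClassifier(torch.nn.Module):
    def __init__(self, in_dim=16, hidden=32, num_classes=3, heads=4):
        super().__init__()
        self.conv1 = GATConv(in_dim, hidden, heads=heads)
        self.conv2 = GATConv(hidden * heads, num_classes, heads=1)

    def forward(self, x, edge_index):
        h = F.elu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)                  # per-node logits

model = NodeClassifier()
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for epoch in range(50):
    opt.zero_grad()
    loss = F.cross_entropy(model(data.x, data.edge_index), data.y)
    loss.backward()
    opt.step()
pred = model(data.x, data.edge_index).argmax(dim=1)
print("train accuracy:", (pred == data.y).float().mean().item())
```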

The MLPerf Training team recently submitted the MLPerf R-GAT implementation to the Illinois Graph Benchmark (IGB) leaderboard, which helps the industry track the state of the art for GNN models and encourages reproducibility. We are pleased to announce that this submission is currently #1 with 72% test accuracy.

Learn more about the selection of the GNN benchmark in this blog.

View the results

To view the full results for MLPerf Training v4.0 and find additional information about the benchmarks, please visit the Training benchmark page.

About MLCommons

MLCommons is the world leader in building benchmarks for AI. It is an open engineering consortium with a mission to make AI better for everyone through benchmarks and data. The foundation for MLCommons began with the MLPerf benchmarks in 2018, which rapidly scaled as a set of industry metrics to measure machine learning performance and promote transparency of machine learning techniques. In collaboration with its 125+ members, global technology providers, academics, and researchers, MLCommons is focused on collaborative engineering work that builds tools for the entire AI industry through benchmarks and metrics, public datasets, and measurements for AI Safety.

For additional information on MLCommons and details on becoming a member or affiliate, please visit MLCommons.org or contact participation@mlcommons.org.

MLPerf Inference v1.0 Results with First Power Measurements
https://mlcommons.org/2021/04/mlperf-inference-v1-0-results-with-first-power-measurements/
Wed, 21 Apr 2021

The latest benchmark includes 1,994 performance and 862 power efficiency results for leading ML inference systems

Today, MLCommons®, an open engineering consortium, released results for MLPerf™ Inference v1.0, the organization’s machine learning inference performance benchmark suite. In its third round of submissions, the results measured how quickly a trained neural network can process new data for a wide range of applications on a variety of form factors and, for the first time, included results from a system power measurement methodology.

MLPerf Inference v1.0 is a cornerstone of MLCommons’ initiative to provide benchmarks and metrics that level the industry playing field through the comparison of ML systems, software, and solutions. The latest benchmark round received submissions from 17 organizations and released 1,994 peer-reviewed results for machine learning systems spanning from edge devices to data center servers. To view the results, please visit https://www.mlcommons.org/en/inference-datacenter-10/ and https://www.mlcommons.org/en/inference-edge-10/.

MLPerf Power Measurement – A new metric to understand system efficiency

The MLPerf Inference v1.0 suite introduces new power measurement techniques, tools, and metrics to complement the performance benchmarks. These new metrics enable reporting and comparing energy consumption, performance, and power for submitted systems. In this round, power measurement was optional, and 862 power results were released. The power measurement methodology was developed in partnership with the Standard Performance Evaluation Corp. (SPEC), the leading provider of standardized benchmarks and tools for evaluating the performance of today’s computing systems. MLPerf adopted and built on the industry-standard SPEC PTDaemon power measurement interface.
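
As a rough illustration of the kind of metric these measurements enable, the sketch below integrates sampled power readings over a run to estimate average power and energy per query; the sample values and simple trapezoidal integration are assumptions for illustration, not the SPEC PTDaemon methodology itself.

```python
# Hedged sketch: derive average power and energy-per-query from power
# readings sampled over a benchmark run. Numbers are made up; the actual
# methodology relies on SPEC PTDaemon-instrumented power analyzers.
power_samples_watts = [412.0, 418.5, 425.1, 430.2, 428.7, 421.3]  # 1 Hz samples
sample_interval_s = 1.0
queries_completed = 15_000

run_time_s = sample_interval_s * (len(power_samples_watts) - 1)
# Trapezoidal integration of power over time yields energy in joules.
energy_j = sum(
    (a + b) / 2 * sample_interval_s
    for a, b in zip(power_samples_watts, power_samples_watts[1:])
)
avg_power_w = energy_j / run_time_s
print(f"average power:    {avg_power_w:.1f} W")
print(f"total energy:     {energy_j / 1000:.2f} kJ")
print(f"energy per query: {energy_j / queries_completed * 1000:.1f} mJ")
```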

“As we look at the accelerating adoption of machine learning, artificial intelligence, and the anticipated scale of ML projects, the ability to measure power consumption in ML environments will be critical for sustainability goals all around the world,” said Klaus-Dieter Lange, SPECpower Committee Chair. “MLCommons developed MLPerf in the best tradition of vendor-neutral standardized benchmarks, and SPEC was very excited to be a partner in their development process. We look forward to widespread adoption of this extremely valuable benchmark.”

“We are pleased to see the ongoing engagement from the machine learning community with MLPerf,” said Sachin Idgunji, co-chair of the MLPerf Power Working Group. “The addition of a power methodology will highlight energy efficiency and bring a valuable, new level of transparency to the industry.”

“We wanted to add a metric that could showcase the power and energy cost from different levels of ML performance across workloads,” said Arun Tejusve, co-chair of the MLPerf Power Working Group. “MLPerf Power v1.0 is a monumental step toward this goal, and will help drive the creation of more energy-efficient algorithms and systems across the industry.”

Submitters this round include: Alibaba, Centaur Technology, Dell Technologies, EdgeCortix, Fujitsu, Gigabyte, HPE, Inspur, Intel, Lenovo, Krai, Mobilint, Neuchips, NVIDIA, Qualcomm Technologies, Supermicro, and Xilinx. Additional information about the Inference v1.0 benchmarks is available at www.mlcommons.org/en/inference-datacenter-10/ and www.mlcommons.org/en/inference-edge-10/.

About MLCommons

MLCommons is an open engineering consortium with a mission to accelerate machine learning innovation, raise all boats, and increase its positive impact on society. The foundation for MLCommons began with the MLPerf benchmark in 2018, which rapidly scaled as a set of industry metrics to measure machine learning performance and promote transparency of machine learning techniques. In collaboration with its 50+ founding member partners, including global technology providers, academics, and researchers, MLCommons is focused on collaborative engineering work that builds tools for the entire machine learning industry through benchmarks and metrics, public datasets, and best practices.

For additional information on MLCommons and details on becoming a member of the organization, please visit http://mlcommons.org/ or contact membership@mlcommons.org.

Press Contact:
press@mlcommons.org

PTDaemon® and SPEC® are trademarks of the Standard Performance Evaluation Corporation. All other product and company names herein may be trademarks of their respective owners.
