New MLPerf Storage v1.0 Benchmark Results Show Storage Systems Play a Critical Role in AI Model Training Performance
https://mlcommons.org/2024/09/mlperf-storage-v1-0-benchmark-results/
September 25, 2024

Storage system providers invent new, creative solutions to keep pace with faster accelerators, but the challenge continues to escalate.

Today, MLCommons® announced results for its industry-standard MLPerf® Storage v1.0 benchmark suite, which is designed to measure the performance of storage systems for machine learning (ML) workloads in an architecture-neutral, representative, and reproducible manner. The results show that as accelerator technology has advanced and datasets continue to increase in size, ML system providers must ensure that their storage solutions keep up with the compute needs. This is a time of rapid change in ML systems, where progress in one technology area drives new demands in other areas. High-performance AI training now requires storage systems that are both large-scale and high-speed, lest access to stored data becomes the bottleneck in the entire system. With the v1.0 release of MLPerf Storage benchmark results, it is clear that storage system providers are innovating to meet that challenge.

Version 1.0 storage benchmark breaks new ground

The MLPerf Storage benchmark is the first and only open, transparent benchmark to measure storage performance across a diverse set of ML training scenarios. It emulates the storage demands of several scenarios and system configurations covering a range of accelerators, models, and workloads. By simulating the accelerators’ “think time,” the benchmark can generate accurate storage access patterns without the need to run the actual training, making it more accessible to all. The benchmark focuses the test on a given storage system’s ability to keep pace, as it requires the simulated accelerators to maintain a required level of utilization.
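The think-time approach can be sketched in a few lines: real storage reads are issued, but the accelerator's compute step is replaced by a timed wait, and accelerator utilization (AU) is the fraction of wall-clock time spent in emulated compute. This is a deliberately serialized sketch, not the actual benchmark code; real harnesses overlap I/O with compute via prefetching, and the timings and threshold below are illustrative.

```python
import time

def emulate_epoch(load_batch, num_batches, think_time_s):
    """Run one emulated training epoch: real I/O, simulated compute.

    load_batch: callable performing the actual storage reads for one batch.
    think_time_s: measured per-batch compute time of the emulated accelerator.
    Returns the accelerator utilization (AU) achieved over the epoch.
    """
    io_time = 0.0
    compute_time = 0.0
    for _ in range(num_batches):
        t0 = time.perf_counter()
        load_batch()                      # real storage traffic in a real run
        io_time += time.perf_counter() - t0
        time.sleep(think_time_s)          # stand-in for accelerator compute
        compute_time += think_time_s
    # AU: fraction of wall-clock time the emulated accelerator was busy
    return compute_time / (compute_time + io_time)

# A submission passes only if AU stays above a threshold (e.g. 90%).
au = emulate_epoch(lambda: time.sleep(0.001), num_batches=20, think_time_s=0.02)
print(f"AU = {au:.1%}")
```

A slow storage system inflates `io_time`, dragging AU below the threshold, which is exactly how the benchmark exposes a storage bottleneck without touching a GPU.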

Three models are included in the benchmark to ensure diverse patterns of AI training are tested: 3D-UNet, ResNet-50, and CosmoFlow. These workloads offer a variety of sample sizes, ranging from hundreds of kilobytes to hundreds of megabytes, as well as wide-ranging simulated “think times,” from a few milliseconds to a few hundred milliseconds.

The benchmark emulates NVIDIA A100 and H100 models as representatives of the currently available accelerator technologies. The H100 accelerator reduces the per-batch computation time for the 3D-UNet workload by 76% compared to the earlier V100 accelerator in the v0.5 round, turning what was typically a bandwidth-sensitive workload into much more of a latency-sensitive workload. 
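A back-of-envelope calculation shows why that shift matters: if per-batch compute time falls by 76%, the storage system must deliver batches roughly four times as fast to keep the accelerator busy. In this sketch, only the 76% figure comes from the text; the V100 batch time and batch size are hypothetical.

```python
# Illustrative numbers: only the 76% reduction is from the post;
# the V100 per-batch time and bytes-per-batch are hypothetical.
v100_batch_s = 0.100                        # hypothetical per-batch compute time
h100_batch_s = v100_batch_s * (1 - 0.76)    # 76% reduction (from the post)
batch_bytes = 150e6                         # hypothetical sample bytes per batch

bw_v100 = batch_bytes / v100_batch_s        # bytes/s needed to keep V100 fed
bw_h100 = batch_bytes / h100_batch_s        # bytes/s needed to keep H100 fed
print(f"required bandwidth ratio: {bw_h100 / bw_v100:.2f}x")
```

The ratio (1 / 0.24, about 4.2x) is independent of the assumed batch size: shrinking the compute window proportionally raises the sustained bandwidth the storage system must supply.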

In addition, MLPerf Storage v1.0 includes support for distributed training. Distributed training is an important scenario for the benchmark because it represents a common real-world practice for faster training of models with large datasets, and it presents specific challenges for a storage system not only in delivering higher throughput but also in serving multiple training nodes simultaneously.

V1.0 benchmark results show performance improvement in storage technology for ML systems

The broad scope of workloads submitted to the benchmark reflects the wide range and diversity of storage systems and architectures evaluated. This is a testament to how important ML workloads are to all types of storage solutions and demonstrates the active innovation happening in this space.

“The MLPerf Storage v1.0 results demonstrate a renewal in storage technology design,” said Oana Balmau, MLPerf Storage working group co-chair. “At the moment, there doesn’t appear to be a consensus ‘best of breed’ technical architecture for storage in ML systems: the submissions we received for the v1.0 benchmark took a wide range of unique and creative approaches to providing high-speed, high-scale storage.”

The results in the distributed training scenario show the delicate balance needed between the number of hosts, the number of simulated accelerators per host, and the storage system in order to serve all accelerators at the required utilization. Adding more nodes and accelerators to serve ever-larger training datasets increases the throughput demands. Distributed training adds another twist, because historically different technologies – with different throughputs and latencies – have been used for moving data within a node and between nodes. The maximum number of accelerators a single node can support may not be limited by the node’s own hardware but instead by the ability to move enough data quickly to that node in a distributed environment (up to 2.7 GiB/s per emulated accelerator). Storage system architects now have few design tradeoffs available to them: the systems must be high-throughput and low-latency, to keep a large-scale AI training system running at peak load.
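The aggregate demand scales multiplicatively with cluster size. The sketch below assumes a hypothetical cluster of 4 nodes with 8 emulated accelerators each; only the 2.7 GiB/s per-accelerator figure comes from the text.

```python
# Only the per-accelerator figure is from the post; the cluster shape is assumed.
per_accel_gib_s = 2.7     # up to 2.7 GiB/s per emulated accelerator (from the post)
nodes = 4                 # hypothetical number of training nodes
accels_per_node = 8       # hypothetical emulated accelerators per node

per_node = per_accel_gib_s * accels_per_node   # GiB/s delivered into each node
aggregate = per_node * nodes                   # GiB/s the storage system must sustain
print(f"per-node: {per_node:.1f} GiB/s, aggregate: {aggregate:.1f} GiB/s")
```

Even this modest hypothetical cluster needs tens of GiB/s of sustained read bandwidth, which is why both the network path into each node and the backend storage system can become the limiting factor.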

“As we anticipated, the new, faster accelerator hardware significantly raised the bar for storage, making it clear that storage access performance has become a gating factor for overall training speed,” said Curtis Anderson, MLPerf Storage working group co-chair. “To prevent expensive accelerators from sitting idle, system architects are moving to the fastest storage they can procure – and storage providers are innovating in response.” 

MLPerf Storage v1.0

The MLPerf Storage benchmark was created through a collaborative engineering process across more than a dozen leading storage solution providers and academic research groups. The open-source and peer-reviewed benchmark suite offers a level playing field for competition that drives innovation, performance, and energy efficiency for the entire industry. It also provides critical technical information for customers who are procuring and tuning AI training systems.

The v1.0 benchmark results, from a broad set of technology providers, demonstrate the industry’s recognition of the importance of high-performance storage solutions. MLPerf Storage v1.0 includes over 100 performance results from 14 submitting organizations: DDN, Hammerspace, Hewlett Packard Enterprise, Huawei, IEIT SYSTEMS, Juicedata, Lightbits Labs, MangoBoost, Micron, Nutanix, Simplyblock, Volumez, WEKA, and YanRong Tech.

“We’re excited to see so many storage providers, both large and small, participate in the first-of-its-kind v1.0 Storage benchmark,” said David Kanter, Head of MLPerf at MLCommons. “It shows both that the industry is recognizing the need to keep innovating in storage technologies to keep pace with the rest of the AI technology stack, and also that the ability to measure the performance of those technologies is critical to the successful deployment of ML training systems. As a trusted provider of open, fair, and transparent benchmarks, MLCommons ensures that technology providers know the performance target they need to meet, and consumers can procure and tune ML systems to maximize their utilization – and ultimately their return on investment.”

We invite stakeholders to join the MLPerf Storage working group and help us continue to evolve the benchmark. Future work includes improving and increasing accelerator emulations and AI training scenarios.

View the Results

To view the results for MLPerf Storage v1.0, please visit the Storage benchmark results.

About MLCommons

MLCommons is the world leader in building benchmarks for AI. It is an open engineering consortium with a mission to make AI better for everyone through benchmarks and data. The foundation for MLCommons began with the MLPerf benchmarks in 2018, which rapidly scaled as a set of industry metrics to measure machine learning performance and promote transparency of machine learning techniques. In collaboration with its 125+ members, global technology providers, academics, and researchers, MLCommons is focused on collaborative engineering work that builds tools for the entire AI industry through benchmarks and metrics, public datasets, and measurements for AI Safety.

For additional information on MLCommons and details on becoming a member, please visit MLCommons.org or contact participation@mlcommons.org.

MLPerf Results Highlight Growing Importance of Generative AI and Storage
https://mlcommons.org/2023/09/mlperf-results-highlight-growing-importance-of-generative-ai-and-storage/
September 11, 2023

Latest benchmarks include an LLM in inference and the first results for the storage benchmark.

Today we announced new results from two MLPerf™ benchmark suites: MLPerf Inference v3.1, which delivers industry standard Machine Learning (ML) system performance benchmarking in an architecture-neutral, representative, and reproducible manner, and MLPerf Storage v0.5. This publication marks the first ever release of results from the MLPerf Storage benchmark, which measures the performance of storage systems in the context of ML training workloads.

MLPerf Inference v3.1 Introduces New LLM and Recommendation Benchmarks

MLPerf Inference v3.1 set a record for participation, with over 13,500 performance results (including over 2,000 power results) and performance gains of up to 40%, from 26 different submitters. Submitters include: ASUSTeK, Azure, cTuning, Connect Tech, Dell, Fujitsu, Giga Computing, Google, H3C, HPE, IEI, Intel, Intel-Habana-Labs, Krai, Lenovo, Moffett, Neural Magic, NVIDIA, Nutanix, Oracle, Qualcomm, Quanta Cloud Technology, SiMA, Supermicro, TTA, and xFusion. In particular, MLCommons® would like to congratulate first-time MLPerf Inference submitters Connect Tech, Nutanix, Oracle, and TTA.

“Submitting to MLPerf is not trivial. It’s a significant accomplishment, as this is not a simple point-and-click benchmark. It requires real engineering work and is a testament to our submitters’ commitment to AI, to their customers, and to ML,” said David Kanter, Executive Director of MLCommons.

The MLCommons MLPerf Inference benchmark suite measures how fast systems can run models in a variety of deployment scenarios. ML inference is behind everything from the latest generative AI chatbots to safety features in vehicles, such as automatic lane-keeping and speech-to-text interfaces. Improving performance and power efficiency is key to deploying more capable AI systems that benefit society.

MLPerf Inference v3.1 introduces two new benchmarks to the suite. The first is a large language model (LLM) using the GPT-J reference model to summarize CNN news articles, which garnered results from 15 different submitters, reflecting the rapid adoption of generative AI. The second is an updated recommender, modified to be more representative of industry practices, using the DLRM-DCNv2 reference model and a much larger dataset, with 9 submissions. These new tests help advance AI by ensuring that industry-standard benchmarks represent the latest trends in AI adoption to help guide customers, vendors, and researchers.

“The submissions for MLPerf inference v3.1 are indicative of a wide range of accelerators being developed to serve ML workloads. The current benchmark suite has broad coverage among ML domains and the most recent addition of GPT-J is a welcome addition to the generative AI space. The results should be very helpful to users when selecting the best accelerators for their respective domains,” said Mitchelle Rasquinha, MLPerf Inference Working Group co-chair.

MLPerf Inference benchmarks primarily focus on datacenter and edge systems. The v3.1 submissions showcase several different processors and accelerators across use cases in computer vision, recommender systems, and language processing. There are both open and closed submissions related to performance, power, and networking categories. Closed submissions use the same reference model to ensure a level playing field across systems, while participants in the open division are permitted to submit a variety of models.

First Results for New MLPerf Storage Benchmark

The MLPerf Storage Benchmark Suite is the first open-source AI/ML benchmark suite that measures the performance of storage for ML training workloads. The benchmark was created through a collaboration spanning more than a dozen leading industry and academic organizations and covers a variety of storage setups: parallel file systems, local storage, and software-defined storage. The MLPerf Storage Benchmark will be an effective tool for purchasing, configuring, and optimizing storage for machine learning applications, as well as for designing next-generation systems and technologies.

Training neural networks is both a compute and data-intensive workload that demands high-performance storage to sustain good overall system performance and availability. For many customers developing the next generation of ML models, it is a challenge to find the right balance between storage and compute resources while making sure that both are efficiently utilized. MLPerf Storage helps overcome this problem by accurately modeling the I/O patterns posed by ML workloads, providing the flexibility to mix and match different storage systems with different accelerator types.

“Our first benchmark round has over 28 performance results from five companies, which is a great start given this is the first submission round,” explains Oana Balmau, Storage Working Group co-chair. “We’d like to congratulate the MLPerf Storage submitters: Argonne National Laboratory (ANL), DDN, Micron, Nutanix, and WEKA for their outstanding results and accomplishments.”

The MLPerf Storage benchmark suite is built on the codebase of DLIO, a benchmark designed for I/O measurement in high-performance computing, adapted to meet current storage needs.

View the Results

To view the results for MLPerf Inference v3.1 and MLPerf Storage v0.5, and to find additional information about the benchmarks, please visit the respective benchmark results pages.

About MLCommons

MLCommons is an open engineering consortium with a mission to make machine learning better for everyone through benchmarks and data. The foundation for MLCommons began with the MLPerf benchmark in 2018, which rapidly scaled as a set of industry metrics to measure machine learning performance and promote transparency of machine learning techniques. In collaboration with its 50+ members (global technology providers, academics, and researchers), MLCommons is focused on collaborative engineering work that builds tools for the entire machine learning industry through benchmarks and metrics, public datasets, and best practices.

For additional information on MLCommons and details on becoming a Member or Affiliate, please visit MLCommons.org or contact participation@mlcommons.org.

Introducing the MLPerf Storage Benchmark Suite
https://mlcommons.org/2023/06/introducing-the-mlperf-storage-benchmark-suite/
June 20, 2023

The first benchmark suite that measures the performance of storage for machine learning workloads.

The MLCommons® Storage Working Group is excited to announce the availability of the MLPerf™ Storage Benchmark Suite. This is the first artificial intelligence/machine learning benchmark suite that measures the performance of storage for machine learning (ML) workloads. It was created through a collaboration across more than a dozen leading organizations. The MLPerf Storage Benchmark will be an effective tool for purchasing, configuring, and optimizing storage for ML applications, as well as designing next-generation systems and technologies.

Striking a Balance Between Storage and Compute

Training neural networks is both a compute and data-intensive workload that demands high-performance storage to sustain good overall system performance and availability. For many customers developing the next generation of ML models, it is a challenge to find the right balance between storage and compute resources while making sure that both are efficiently utilized. MLPerf Storage helps overcome this problem by accurately modeling the I/O patterns posed by ML workloads for various accelerator types, providing the flexibility to mix and match different storage systems with different accelerator types.
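The balancing act can be made concrete with a simple sizing calculation: given an accelerator's per-batch compute time and the dataset's average sample size, the required sustained read throughput falls out directly. All of the numbers below are hypothetical, chosen only to illustrate the arithmetic.

```python
# Hypothetical workload parameters for sizing storage against compute.
sample_mb = 2.0           # average training sample size in MB (assumed)
batch_size = 64           # samples per batch (assumed)
step_time_s = 0.050       # per-batch accelerator compute time (assumed)
num_accelerators = 16     # accelerators to keep fed (assumed)

# Samples the cluster consumes per second, and the read rate to sustain it.
samples_per_s = batch_size / step_time_s * num_accelerators
required_mb_s = samples_per_s * sample_mb
print(f"{samples_per_s:.0f} samples/s -> {required_mb_s:.0f} MB/s sustained read")
```

Running the numbers for different accelerator counts is exactly the mix-and-match exercise the benchmark is meant to support: the same storage system can be evaluated against several emulated accelerator types before any hardware is purchased.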

How it Works

MLPerf Storage measures the sustained performance of a storage system for MLPerf Training and HPC workloads on both PyTorch and TensorFlow without requiring the use of expensive accelerators. It instead relies on a novel and elegant emulation mechanism that captures the full realistic behavior of neural network training. The first round of MLPerf Storage includes the BERT model, which pioneered transformers for language modeling, and 3D-UNet, which performs segmentation of 3D medical images. Subsequent rounds will expand the number and variety of workloads, the variety of emulated accelerators, and other features.
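The division of labor described above can be sketched with a toy pipeline: a loader performs real reads and on-line preprocessing (as a framework data loader would), while the training step is emulated as a timed wait. This is a simplified stand-in, not the actual MLPerf Storage or DLIO code, and the in-memory buffers below are a hypothetical substitute for dataset files.

```python
import io
import time

def data_loader(raw_files, batch_size):
    """Stand-in for a framework data loader: real reads plus preprocessing."""
    batch = []
    for f in raw_files:
        data = f.read()                   # actual storage I/O in a real run
        batch.append(data.upper())        # trivial stand-in for preprocessing
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def emulated_training(loader, step_time_s):
    """Model training is emulated: each step is a timed wait, not real compute."""
    steps = 0
    for _batch in loader:
        time.sleep(step_time_s)           # emulated accelerator step
        steps += 1
    return steps

# Demo with in-memory buffers standing in for dataset files (assumption).
files = [io.BytesIO(b"sample-%d" % i) for i in range(8)]
n = emulated_training(data_loader(files, batch_size=4), step_time_s=0.001)
print(f"ran {n} emulated training steps")
```

Because only the training step is faked, the storage system under test sees the same read and preprocessing traffic it would in a real job, which is what makes the measurement representative without accelerators.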

[Figure] MLPerf Storage overview: data loading and on-line pre-processing are performed using the ML frameworks, while model training is emulated.

The MLPerf Storage Benchmark Suite marks a major milestone in MLCommons’ line-up of MLPerf benchmark suites. With the addition of MLPerf Storage, we hope to stimulate innovation in the academic and research communities to push the state of the art in storage for ML, as well as to provide a flexible tool for cluster designers to understand the interplay between storage and compute resources at scale. We are excited to see how the community will accelerate storage performance for some of the world’s most valuable and challenging emerging applications.

The MLPerf Storage benchmarks were created thanks to the contributions and leadership of our working group members over the last 18 months.

MLPerf Storage Benchmark timeline

  • MLPerf Storage open for submissions June 19, 2023
  • All submissions are due August 4, 2023
  • Benchmark competition results publish September 13, 2023

How to Get Involved

  • Submit a benchmark result.
  • Spread the word to companies who would be interested in submitting.
  • Help us continue the development of the MLPerf Storage Benchmark by joining the working group.
