A New Automotive Benchmark for MLPerf Inference v5.0
https://mlcommons.org/2025/04/auto-inference-v5/ | Wed, 02 Apr 2025

MLCommons has developed a new automotive benchmark based on established industry methods such as PointPainting, DeepLabv3+, and the Waymo Open Dataset.

Intro

MLPerf® Inference v5.0 introduced a new automotive benchmark. From the many potential workloads related to autonomous driving, the automotive benchmark task force chose 3D object detection as the benchmark task; experts in the autonomous driving space indicated that it is the most useful compute-intensive task to benchmark. Based on their input, the task force selected PointPainting as the benchmark model. PointPainting is a sensor-fusion technique, proposed by Vora et al. in 2019, that uses image-based semantic segmentation to classify the points generated by a lidar system. We selected PointPainting because it is reasonably accurate, fast at inference, and contains components representative of both classical and newer 3D detection models.

Model selection

Compared to other machine learning applications, autonomous driving adopts new techniques more cautiously: new models must go through extended review to ensure safety and reliability. We wanted a model that combines useful components from proven older models and from newer state-of-the-art models used in ongoing research. We also wanted a model that is representative of multiple sensor modalities while still favoring the most common case of using video/images for vision, reflecting a sensor suite that pairs affordable, lower-resolution lidar with higher-resolution cameras.

Figure 1: Overview of PointPainting using DeepLabv3+ for segmentation and PointPillars for 3D detection. On the left, five camera images are segmented by DeepLabv3+. In the middle, the segmentation scores are painted onto the lidar point cloud. Finally, the painted points are passed to PointPillars to produce the final 3D bounding boxes.

PointPainting is composed of two models: an image segmentation model, used to “paint” lidar points by concatenating segmentation scores to them, and a lidar object detection model that takes the painted points as input and produces 3D detection predictions. A high-level diagram is shown in Figure 1. We chose DeepLabv3+ with a ResNet-50 backbone as the segmentation model and PointPillars as the lidar detector. Both are popular for their respective tasks and fast enough for real-time applications. With these model choices, roughly 90% of the PointPainting workload is performed in the segmentation stage. Weighting the workload toward images makes the benchmark more representative of the most common case of vision tasks while still incorporating lidar components.
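The painting step itself is a simple feature concatenation. Below is a minimal NumPy sketch of the idea; this is not the benchmark's reference code, and the array layout and the calibration-dependent projection helper are assumptions for illustration:

```python
import numpy as np

def paint_points(points, seg_scores, project_to_image):
    """Append per-pixel segmentation scores to each lidar point.

    points:           (N, 4) lidar points (x, y, z, intensity) - assumed layout.
    seg_scores:       (H, W, C) per-pixel class scores from the segmenter.
    project_to_image: caller-supplied function mapping (N, 3) xyz points to
                      (N, 2) integer pixel coordinates plus an (N,) validity
                      mask; it depends on the camera calibration.
    """
    uv, valid = project_to_image(points[:, :3])
    num_classes = seg_scores.shape[-1]
    scores = np.zeros((points.shape[0], num_classes), dtype=points.dtype)
    # Gather the segmentation scores at each valid projected pixel location.
    scores[valid] = seg_scores[uv[valid, 1], uv[valid, 0]]
    # "Painted" points: original lidar features plus segmentation scores.
    return np.concatenate([points, scores], axis=1)   # shape (N, 4 + C)
```

The painted points, rather than the raw lidar points, then become the input features to PointPillars.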

Dataset and task selection

Autonomous vehicles must perform many computational tasks. Our goal was to pick one that is computationally intensive and representative of self-driving workloads, so we consulted experts in the autonomous driving field. There was unanimous agreement that 3D object detection would be the most representative task: 3D tasks are inherently more computationally expensive than 2D tasks and often incorporate 2D models within them, and 3D detection is also an important component of downstream tasks such as trajectory prediction.

We chose the Waymo Open Dataset (WOD) because it is a large, high-quality dataset. Importantly, we verified that MLPerf benchmarking is considered a non-commercial use under the terms of the WOD license. The WOD provides five camera images and one lidar point cloud per frame. The images have resolutions of 1920×1280 and 1920×886 pixels. The data was captured in San Francisco, Mountain View, and Phoenix. The WOD provides annotations for 3D bounding boxes as well as video panoptic segmentation, which is needed for the segmentation network.
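For intuition, each benchmark input frame can be pictured as the following structure; the field names and point layout are illustrative placeholders, not the actual WOD schema:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class WODFrame:
    """One input frame as described above. Field names are illustrative
    placeholders, not the actual WOD schema."""
    images: list              # five camera images (1920x1280 or 1920x886 RGB)
    point_cloud: np.ndarray   # (N, 4) lidar points, e.g. x, y, z, intensity
    boxes_3d: np.ndarray      # ground-truth 3D bounding-box annotations
    seg_labels: np.ndarray    # panoptic segmentation labels (used in training)
```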


There is no public implementation of PointPainting based on WOD, so we trained a model ourselves to determine an accuracy target and to generate a reference checkpoint for the benchmark. We built on the open-source work of PointPillars, DeepLabv3+, PointPainting, and MMDetection3D to create our implementation of PointPainting. As a baseline, we trained PointPillars on WOD. Our goal was not to produce the most accurate version of PointPillars but to obtain an accuracy representative of what can be achieved on WOD. Our PointPillars baseline achieved a mean average precision (mAP) of 49.91 across three classes (Pedestrians, Cyclists, and Vehicles).
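For readers less familiar with the metric: the headline mAP is the unweighted mean of the per-class average precision values, here on a 0-100 scale. A trivial sketch with placeholder numbers (not our measured per-class results):

```python
# Placeholder per-class AP values for illustration only; not our results.
per_class_ap = {"Vehicle": 55.0, "Pedestrian": 50.0, "Cyclist": 44.7}
m_ap = sum(per_class_ap.values()) / len(per_class_ap)
print(f"mAP over {len(per_class_ap)} classes: {m_ap:.2f}")   # 49.90
```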

Our trained PointPainting model is available to MLCommons® members by request here.

Performance metrics

MLPerf Inference’s single-stream scenario for edge systems issues one input at a time and measures the latency to produce each output. This scenario is the most relevant for an automotive workload, where frames arrive from sensors at a fixed rate. Because automotive workloads are safety-critical, we chose a 99.9% accuracy threshold. We report the 99.9th-percentile latency: the largest latency observed after excluding the worst 0.1% of the required thousands of inference samples. These requirements balance strong latency constraints against keeping the benchmark runtime on the order of one day rather than weeks.
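Concretely, the reported latency metric is a high-percentile statistic over per-sample latencies. A minimal sketch of the computation follows; in the actual benchmark, MLPerf LoadGen performs this bookkeeping, and the synthetic latency numbers here are illustrative only:

```python
import numpy as np

def percentile_latency(latencies_ms, pct=99.9):
    """Latency below which pct% of samples complete. With 10,000
    single-stream samples, p99.9 is roughly the 10th-worst latency."""
    return float(np.percentile(latencies_ms, pct))

# Synthetic per-sample latencies for illustration only.
rng = np.random.default_rng(0)
latencies = rng.gamma(shape=9.0, scale=5.0, size=10_000)   # fake values, in ms
print(f"p99.9 latency: {percentile_latency(latencies):.1f} ms")
```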

Conclusion

The automotive benchmark task force followed a rigorous process for choosing and building an automotive workload for MLPerf Inference. We made careful decisions regarding the task, dataset, and model to ensure a robust and representative benchmark, and we set accuracy targets and performance metrics that fit within MLPerf Inference while remaining meaningful for automotive tasks. MLCommons plans to continue developing automotive benchmarks across a suite of automotive-specific tasks. We welcome feedback from the community regarding this benchmark and future developments.

Additional technical contributors: Pablo Gonzalez, MLCommons; Anandhu Sooraj, MLCommons; Arjun Suresh, AMD (formerly at MLCommons).

MLCommons and AVCC Release Automotive Benchmark Proof-of-Concept
https://mlcommons.org/2024/06/automotive-benchmark-poc/ | Tue, 18 Jun 2024

The MLPerf Automotive Benchmark POC release is the first step toward a comprehensive benchmark suite for autonomous vehicles.

MLCommons® and the Autonomous Vehicle Computing Consortium (AVCC) have achieved the first step toward a comprehensive MLPerf® Automotive Benchmark Suite for AI systems in vehicles with the release of the MLPerf Automotive benchmark proof-of-concept (POC). The POC was developed by the Automotive benchmark task force (ABTF), which includes representatives from Arm, Bosch, Cognata, cKnowledge, Marvell, NVIDIA, Qualcomm Technologies, Inc., Red Hat, Sacramento State University, Samsung, Siemens EDA, Tenstorrent, and UC Davis amongst others.

The demand for AI-based systems in vehicles is exploding – and not just for controlling fully autonomous cars. Automotive manufacturers (OEMs) are incorporating AI in a number of other in-car systems to enhance the driving experience and increase vehicle safety, with features including:

  • Speech-controlled infotainment systems and online vehicle user manuals
  • Route/direction guidance systems
  • Optimizing stops for charging, refueling, and maintenance
  • Collision avoidance systems
  • Driver monitoring for drowsiness or lack of attention 

Each of these features requires trained AI models and appropriate input sensors, plus underlying computing infrastructure powerful enough to meet performance demands. 

Establishing common reference points 

Automotive OEMs must decide which combinations of features to include in their vehicles and choose where to source both the AI systems and the underlying computing infrastructure they will run on, all while ensuring that the systems can run simultaneously with acceptable performance. As they issue requests for information (RFIs) and requests for quotation (RFQs) to Tier 1 and other suppliers for system components, they need common reference points to understand the collective computing demand of the systems and the resources required to meet it. The MLPerf Automotive benchmark will provide those common reference points, enabling OEMs to select or design the most suitable solution for their system requirements and to provision resources accordingly.

“The adoption of AI in automotive is helping to enhance user experiences and enable safer vehicles, but it is also one of the most complex and compute intensive parts of an automotive software stack,” said Kasper Mecklenburg, ABTF co-chair and principal autonomous driving solution engineer, Automotive Line of Business, Arm. “The ABTF aims to help OEMs and Tier 1s determine what hardware and software is most suitable for these applications by providing a comprehensive benchmark suite, and this POC is the first step towards delivering on this goal.”  


Video demonstration of the new AVCC and MLCommons automotive benchmark running the SSD-ResNet50 object detection model on 8-megapixel image input. The model was trained using the large synthetic MLCommons Cognata dataset, rendered at 8-megapixel resolution.

The POC includes a subset of the full v1.0 MLPerf Automotive Benchmark Suite, which is targeted for release at the end of 2024. It focuses on a camera-based object detection capability, which is commonly found in collision-avoidance systems and autonomous driving systems. Reaching the POC is a critical milestone that allows the task force to gather additional feedback from the automotive industry to ensure a comprehensive approach for the v1.0 release.

The POC release includes a fully functioning reference implementation for a trained object recognition system, including:

  • An SSD-ResNet50 object detector model trained on the Cognata dataset, along with a runtime engine
  • The Collective Mind automation system for scripting and managing the execution of the model
  • A small subset of the Cognata dataset for demo purposes
  • Additional software components necessary to run the benchmark in a PC-based environment

The model is an adapted version of the SSD+ResNet50 architecture, first introduced in 2015, which is representative of the majority of visual object recognition systems in use today. SSD+ResNet50 provides a well-known, widely understood baseline for measuring performance. The version included in the POC has been retrained to work with 8-megapixel images, the emerging standard for camera-based systems in vehicles, to ensure the benchmark is future-proof for years to come. The model and runtime are paired with the software framework and dependencies necessary to execute conveniently in a Docker container, providing support for a wide range of computing environments.
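To put the 8-megapixel choice in perspective, here is a quick back-of-the-envelope comparison against a typical 1080p camera frame; the 3840×2160 resolution used below is one representative ~8.3 MP sensor format, assumed for illustration:

```python
# Rough pixel-count comparison: 8-megapixel input vs. a typical 1080p frame.
# 3840x2160 is one representative ~8.3 MP format, assumed for illustration.
px_8mp = 3840 * 2160      # ~8.3 million pixels
px_2mp = 1920 * 1080      # ~2.1 million pixels (1080p)
print(f"{px_8mp / px_2mp:.0f}x more pixels per frame")   # 4x
# For convolutional detectors, compute grows roughly with pixel count, so the
# higher-resolution input meaningfully raises the benchmark's compute demands.
```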

As part of the commitment to delivering a rigorous and robust benchmark suite, the POC includes a small demo subset of the MLCommons Cognata dataset; the full dataset of 120,000 8-megapixel images was used to train the model. The images are synthesized street-level views from a vehicle, representing scenarios that a typical collision-avoidance system would need to process. Using synthesized images allows for the inclusion of scenarios that are too dangerous to capture in reality, such as a child running in front of a car. Beyond the subset included in the POC, MLCommons members will have access to the full MLCommons Cognata dataset.

“AVCC and MLCommons have invested in dedicated compute to train the full suite of models planned for the v1.0 release,” said David Kanter, MLCommons executive director. “The release of the POC and the acquisition of the Cognata dataset and training resources demonstrate our full commitment to building a comprehensive automotive benchmark suite.”

Collective Mind, also included in the POC, is a portable, extensible framework for automation and reproducibility. It manages the process of running the benchmark, orchestrating the complex machine learning and AI applications involved.

Suppliers will run the automotive benchmark, optimized for their own products, and share the results with their potential customers. Automotive OEMs and suppliers can independently verify benchmark results and re-run the benchmark after assembling their own combination of system components. They can also substitute their own proprietary models and/or data and generate their own benchmark results.

Seeking comprehensive input for a v1.0 release

The POC release is an initial step toward a complete v1.0 release that allows for benchmarking a full set of AI components representative of those included in vehicles. The POC is not optimized for performance, as suppliers will have their own optimized solutions. It is intended to give automotive OEMs, Tier 1 and other suppliers, and tech industry stakeholders an opportunity to provide feedback on components and on the benchmark's key performance indicators.

“MLCommons and AVCC partnered to deliver a community-driven and open approach to benchmarking for automotive systems,” said James Goel, ABTF co-chair. “This affords all stakeholders an opportunity to access and run the POC, provide feedback and incorporate the latest industry requirements to create the best performance benchmark suite for AI systems in vehicles.”  

The ABTF invites the community of automotive OEMs, Tier 1s and other component suppliers, and tech industry stakeholders to join the conversation by downloading the Automotive POC, evaluating it, and providing feedback through the ABTF working group. Additional AVCC technical reports on benchmarking AI systems in an automotive environment are also available for reference.

About AVCC

AVCC is a global automated and autonomous vehicle (AV) consortium that specifies and benchmarks solutions for AV computing, cybersecurity, functional safety, and building block interconnects. AVCC is a not-for-profit membership organization building an ecosystem of OEMs, automotive suppliers, and semiconductor and software suppliers in the automotive industry. The consortium addresses the complexity of the intelligent-vehicle software-defined automotive environment and promotes member-driven dialogue within technical working groups to address non-differentiable common challenges. AVCC is committed to driving the evolution of autonomous and automated solutions up to L5 performance. For additional information on AVCC membership and technical reports, please visit www.avcc.org or email outreach@avcc.org.

About MLCommons

MLCommons is the world leader in building benchmarks for AI. It is an open engineering consortium with a mission to make AI better for everyone through benchmarks and data. The foundation for MLCommons began with the MLPerf benchmarks in 2018, which rapidly scaled as a set of industry metrics to measure machine learning performance and promote transparency of machine learning techniques. In collaboration with its 125+ members – global technology providers, academics, and researchers – MLCommons is focused on collaborative engineering work that builds tools for the entire AI industry through benchmarks and metrics, public datasets, and measurements for AI Safety.

For additional information on MLCommons and details on becoming a member or affiliate, please visit MLCommons.org or email participation@mlcommons.org.

AVCC and MLCommons Join Forces to Develop an Automotive Industry Standard
https://mlcommons.org/2023/05/avcc-and-mlcommons-join-forces-to-develop-an-automotive-industry-standard/ | Wed, 17 May 2023

Creating the industry's first Automotive Benchmark to move specifications and benchmarks through to open-source software and certification.

Today, AVCC, a global autonomous vehicle (AV) consortium that specifies and benchmarks solutions for AV computing in the automotive industry, and MLCommons®, an open, global engineering consortium dedicated to making machine learning (ML) better for everyone, are announcing the industry's first Automotive Benchmark, an initiative to carry specifications and benchmarks through to open-source software and certification. Initial participants include Arm (co-chair), Bosch, cTuning Foundation, KPIT, Mobileye, NVIDIA, Red Hat, Qualcomm, Inc. (co-chair), Samsung Electronics, and other industry leaders.

The use of ML, especially for the perception system in autonomous vehicles, has increased dramatically over the last few years, unlocking new innovations such as automatic lane-keeping that make roads safer. As vehicles are becoming more intelligent, the industry needs a common set of ML benchmarks to enable fair and accurate comparisons between different technologies.

The goal of the partnership is to develop an industry-standard Automotive Benchmark Suite for use by OEMs and automotive suppliers deploying AI/ML deep neural network (DNN) technology. The Suite will build upon the AVCC AI/ML Benchmark Technical Reports and the MLPerf™ benchmark suites developed by MLCommons. This common set of benchmarks will also help guide the industry's collective engineering around future platforms, accelerating the development of new capabilities. Development of the automotive benchmark suite will proceed in multiple phases, with the goal of delivering open-source software by the end of the year.

“We are excited to bring our expertise in machine learning to the automotive industry, working together with the AVCC,” said David Kanter, executive director of MLCommons. “We believe this opportunity will help spur innovation and encourage standards around increasingly intelligent and capable vehicles.”

“OEMs and automotive suppliers are currently challenged to understand a solution’s compute performance and system resource requirements,” said Armando Pereira, president of AVCC. “The work of our joint task force will finally give the industry an easy and certified source of information these players need to make significant decisions on their selection of suppliers and project investment.”

The technical work has begun, and broad participation from the industry is encouraged. The Automotive Benchmark initiative seeks input from all AVCC and MLCommons members, along with other industry players. If your organization is using (or looking to use) automotive AI technology, you are encouraged to participate in this joint initiative to support the development of the new Standard Automotive Benchmark Suite.

To join the Standard Automotive Benchmark Suite initiative, visit the task force's page.
For more information about AVCC, visit www.avcconsortium.org.
For more information about MLCommons, visit MLCommons.org.

About AVCC
AVCC is a global autonomous vehicle (AV) consortium that specifies and benchmarks solutions for AV computing, cybersecurity, functional safety, and building block interconnects. The AVCC is a not-for-profit membership organization building an ecosystem of OEMs, automotive suppliers, and semiconductor and software suppliers in the automotive industry. The Consortium addresses the complexity of the AV environment and promotes member-driven dialogue within technical working groups to address non-differentiable common challenges. AVCC is committed to driving the evolution of autonomous and automated solutions up to L5 performance over the next decades. www.avcconsortium.org

About MLCommons
MLCommons is an open engineering consortium with a mission to make machine learning better for everyone through benchmarks and data. The foundation for MLCommons began with the MLPerf benchmark in 2018, which rapidly scaled as a set of industry metrics to measure machine learning performance and promote transparency of machine learning techniques. In collaboration with its 50+ founding partners – global technology providers, academics, and researchers – MLCommons is focused on collaborative engineering work that builds tools for the entire machine learning industry through benchmarks and metrics, public datasets, and best practices.

Press Contacts
Sarah LaLiberte
Marketing Chair AVCC
pr@avcconsortium.org

Kelly Berschauer
Marketing Director MLCommons
kelly@mlcommons.org
