Medical Archives - MLCommons
https://mlcommons.org/category/medical/

MLCommons Medical Working Group Co-authors Book Chapter on Collaborative Evaluation for Medical Imaging
https://mlcommons.org/2025/01/mlc-medical-collab-eval-medical-imaging/ | January 27, 2025

Authors explore best practices for surmounting organizational, technical, and logistical challenges

Several members of the MLCommons® Medical working group have co-authored a chapter for a new book on AI in medical imaging. The chapter, “Collaborative evaluation for performance assessment of medical imaging applications”, appears in the book Trustworthy AI in Medical Imaging, part of a book series produced by the Medical Image Computing and Computer Assisted Intervention Society (MICCAI) and Elsevier. It provides an introduction to the concept of collaborative evaluation: how healthcare stakeholders form a community to jointly evaluate AI medical imaging systems.

Collaborative Evaluation: organizing to overcome barriers to evaluating medical AI applications

AI has been used in medical imaging for years, but its adoption in clinical settings has been slow due to a lack of evaluation thorough and robust enough to earn the trust of healthcare stakeholders. The rise of “generative AI”, or simply “GenAI”, in this field makes comprehensive evaluation even more crucial. Such evaluation requires a large, diverse set of test data, which no single organization usually has. Organizations therefore need to pool their data, but privacy and security regulations can make this difficult. A collaborative structure can help overcome these barriers.

Such a collaborative structure can organize essential evaluation activities: defining data specifications, preparing data, specifying evaluation metrics, conducting evaluations, and publishing results. All of these steps should happen under a clear governance structure, with a high level of integrity and transparency enforced by open technology, to earn stakeholders’ trust and sustain collaboration. Teams can decide how much to automate and whether to centralize or distribute these tasks. For example, a method called federated evaluation can be used to avoid data sharing entirely, preserving data ownership at the cost of more automation to handle the logistics.

The book chapter lays out a typical workflow for a collaborative evaluation process, and includes a case study of its implementation. It also delves into some of the key considerations in designing and managing collaborative evaluation, including:

  • Orchestration and scalability: how to design, provision, and coordinate the evaluation process efficiently across multiple sites;
  • Security and data/model protection: validating that key resources are secure, free from tampering, and compliant with regulatory requirements for medical data;
  • Integrity and transparency: ensuring that the evaluation process is secure, confidential, and free from tampering, for accountability and traceability;
  • Data quality: avoiding “garbage in, garbage out” issues by ensuring that the data used to evaluate AI imaging systems is of consistent, high quality.

The chapter closes by touching on some of the challenges and opportunities that lie ahead for collaborative evaluation. It discusses sustainability: given the high cost of acquiring and preparing both data resources and AI technology systems, it is critical to ensure that stakeholders have sufficient financial incentives to continue participating. It also speaks to the need to “democratize” collaborative evaluation so that smaller industry stakeholders can join, contribute, and benefit, and to the need to ensure that collaborative evaluation aligns with clinical workflow processes.

Order the book today

More information about the book Trustworthy AI in Medical Imaging, published by Elsevier, can be found here. Our chapter “Collaborative evaluation for performance assessment of medical imaging applications” is available to download here.

Comprehensive Open Federated Ecosystem (COFE) showcased at MICCAI 2024
https://mlcommons.org/2024/11/medical-miccai-cofe/ | November 21, 2024

MLCommons Medical working group showcases new extensions to Comprehensive Open Federated Ecosystem (COFE) at MICCAI 2024

During the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), the MLCommons® Medical working group presented a “Federated Learning in Healthcare” tutorial. Previously presented at ISBI 2024, the tutorial taught participants about the Comprehensive Open Federated Ecosystem (COFE). COFE is an open ecosystem of tools and best practices, distilled from years of industrial and academic research experience, that aims to streamline the research and development of AI/ML models within the clinical setting.

Figure 1. Updated diagram of the latest COFE iteration, with the Hugging Face Hub integrated into the ecosystem.

During the tutorial the team shared the latest updates in the COFE ecosystem:

  1. Integration of the Hugging Face Hub to give users a one-click solution for model dissemination, enabling easier research collaboration in the community. The Hub also enables a streamlined approach to a GaNDLF Model Zoo, where users can select appropriate licensing terms (including non-commercial use) for their models.
  2. Full integration with the Medical Open Network for Artificial Intelligence (MONAI) enables developers to reuse engineering and research efforts from different groups while allowing a more cohesive pathway for code reproducibility.
  3. GaNDLF-Synth is a general-purpose synthesis extension of GaNDLF. It provides a vehicle for researchers to train multiple synthesis approaches (autoencoders, GANs, and diffusion models) for the same task and to quantify the efficacy of each for their use case. Its robust abstraction layer also gives algorithm developers a common framework for distributing their methods, making benchmarking against other approaches easier.
  4. Privacy-enabled training across various tasks using Opacus. By integrating with Opacus, GaNDLF enables the training of models using differential privacy (DP). DP is one of the most widely used methods to incorporate privacy in trained models, and with GaNDLF, users can now tune their models across various DP settings to achieve the best model utility while training private models (see the sketch after this list).
  5. Single-command model distribution through Hugging Face.
  6. MedPerf’s new web user interface makes evaluating and benchmarking healthcare AI models easier than ever. Designed for seamless interaction, it allows dataset owners and benchmark stakeholders to engage with the platform without requiring technical expertise. This update ensures that anyone involved in the process can contribute to benchmarking effortlessly.
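
To make the differential-privacy point in item 4 concrete, here is a minimal sketch of DP training with Opacus on a toy PyTorch model. It illustrates only the mechanism; the model, data, and privacy parameters are placeholders rather than GaNDLF's actual integration.

```python
# Minimal sketch: wrapping a standard PyTorch training setup with Opacus
# so that gradients are clipped per sample and noised (DP-SGD).
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

model = nn.Sequential(nn.Flatten(), nn.Linear(64, 2))  # toy classifier
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
dataset = TensorDataset(torch.randn(256, 8, 8), torch.randint(0, 2, (256,)))
loader = DataLoader(dataset, batch_size=32)

privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.1,  # more noise -> stronger privacy, lower utility
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)

criterion = nn.CrossEntropyLoss()
for x, y in loader:
    optimizer.zero_grad()
    criterion(model(x), y).backward()
    optimizer.step()

# Privacy budget (epsilon) spent so far, at a fixed delta.
print(privacy_engine.get_epsilon(delta=1e-5))
```

Tuning noise_multiplier and max_grad_norm is exactly the utility-versus-privacy trade-off described in item 4: more noise and tighter clipping strengthen the guarantee at some cost in accuracy.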

Siddhesh Thakur, a Data Engineer at Indiana University School of Medicine, showcased optimization workflows and their practical application in deploying AI models for whole slide image analysis in resource-constrained environments. Thakur shared key concepts of model optimization and demonstrated how GaNDLF facilitates this process by integrating with Intel’s OpenVINO toolkit. Through online examples, attendees learned how to leverage these tools to create efficient, deployable AI solutions for digital pathology, even in settings with limited computational resources.
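
As a rough illustration of the optimization workflow described above, the sketch below converts a toy PyTorch model with Intel's OpenVINO toolkit for CPU-only deployment. The model and input shape are placeholders, and GaNDLF's integration automates these steps rather than exposing them like this.

```python
# Hypothetical post-training optimization sketch using the OpenVINO API.
import torch
import openvino as ov

# Stand-in for a trained model (e.g., a patch classifier for slide analysis).
torch_model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1), torch.nn.ReLU()
).eval()

# Convert to OpenVINO's intermediate representation and save it to disk.
example = torch.randn(1, 3, 224, 224)
ov_model = ov.convert_model(torch_model, example_input=example)
ov.save_model(ov_model, "model.xml")  # writes model.xml + model.bin

# Run inference with the optimized runtime on a CPU-only workstation.
compiled = ov.Core().compile_model(ov_model, "CPU")
outputs = compiled(example.numpy())
```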

Cyril Zakka, a physician and head of ML Research in Health at Hugging Face, demonstrated Hugging Face’s commitment to the democratization of state-of-the-art machine learning and highlighted various open-source tools and offerings for model development, fine-tuning, and deployment across different computing budgets and infrastructures. Dr. Zakka outlined the newly developed Health Team’s mission and ongoing projects, highlighting the tight integration with the GaNDLF ecosystem to enable one-click deployment and checkpointing of scientific models, further emphasizing COFE’s goal of reproducibility and scientific accountability.

The community was invited to learn more through a self-guided hands-on session, made possible through GitHub Codespaces with generous technical support from their engineers and program managers working in close collaboration with the tutorial organizers. The tutorial is available to everyone here. To learn more about the Medical WG and its work in real-world clinical studies, please email medperf@mlcommons.org.

The team would also like to thank the organizing committee of MICCAI 2024 and the conference organizers.

Participants attending the “Federated Learning in Healthcare” tutorial at MICCAI 2024 in Marrakesh, Morocco.

Join Us

COFE addresses key challenges that the research community faces in the federated healthcare AI space through open community tools. As an organization, we strive to collect feedback from the broader healthcare research community to help build tools that advance AI/ML for everyone. If you want to participate in the MLCommons Medical working group, please join us here.

MLCommons Medical WG Supports FeTS 2.0 Clinical Study with MedPerf and GaNDLF
https://mlcommons.org/2024/09/mlc-medicalwg-fets-2-0-study-update/ | September 11, 2024

Partnering with global researchers to advance brain tumor research with AI

Some of the most exciting applications of AI currently being developed are in the medical domain, including models that analyze and label X-rays, MRIs, and CT scans. As digital radiography has become ubiquitous, the amount of imaging data generated has exploded. This is fueling the development of AI tools that help experts analyze health-related issues more accurately and in less time. At the same time, global regulations protecting consumers’ health data have made it more challenging for AI developers to access enough aggregated data to build generalizable AI models and to evaluate their quality and performance.

The MLCommons® Medical working group has been developing best practices and open tools for the development and evaluation of such AI applications in healthcare, while conforming to health data protection regulations. In particular, it has used an approach that encompasses both federated learning and federated evaluation.

Federated learning and federated evaluation in healthcare

Federated learning is a machine learning approach that enables increased data access while reducing privacy concerns. Rather than aggregating training data from multiple contributing organizations into a unified dataset in a single location, federated learning allows the data to remain physically distributed across the organizations’ sites and under their control. Instead of bringing the data to the model, the model is brought to the data and trained in installments using the data available at each location. The resulting model is comparable to one trained on all the data in a single location, but none of the raw data is ever shared, keeping the system compliant with data protection regulations.
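
As a minimal sketch of that idea (the core of federated averaging, not the actual implementation used in production frameworks), each site trains a copy of the global model locally, and only the resulting weights travel back for averaging:

```python
# Illustrative federated-averaging round: raw data never leaves a site.
import copy
import torch

def local_update(global_model, site_loader, epochs=1):
    """Train a copy of the global model on one site's private data."""
    local = copy.deepcopy(global_model)
    opt = torch.optim.SGD(local.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in site_loader:
            opt.zero_grad()
            loss_fn(local(x), y).backward()
            opt.step()
    return local.state_dict()  # only weights are shared

def federated_round(global_model, site_loaders):
    """Fan the model out to every site, then average the returned weights."""
    updates = [local_update(global_model, dl) for dl in site_loaders]
    averaged = {k: torch.stack([u[k].float() for u in updates]).mean(dim=0)
                for k in updates[0]}
    global_model.load_state_dict(averaged)
    return global_model
```

In practice the average is usually weighted by each site's dataset size, and frameworks such as OpenFL layer security and orchestration on top of this core loop.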

Successful clinical translation requires further evaluation of the AI model’s performance. Borrowing from federated learning, federated evaluation is an approach to evaluating AI models on diverse datasets. With federated evaluation, the test data is physically distributed across contributing organizations, and a fully-trained model, called a “system under test” (SUT), is passed from site to site. Evaluation results are then aggregated and shared across all participating organizations.

A medical research team wishing to do a complete medical study using AI will need to do both: federated learning to train an AI model, and federated evaluation to assess the results. 

Using federated evaluation to benchmark medical AI models

Federated evaluation is a core part of MLCommons’ MedPerf approach to benchmarking AI models for medical applications. MedPerf is an open benchmarking platform, which provides infrastructure, technical support and organizational coordination for developing and managing benchmarks of AI models across multiple data sources. It improves medical AI by enabling benchmarking on diverse data from across the world to identify and reduce bias, offering the potential to improve generalizability and clinical impact.

Through MedPerf, a SUT can be packaged in a portable, platform-independent container called MLCube® and distributed to each participating data owner’s site, where it can be executed and evaluated against local data. MLCubes are also used to distribute and execute a module that prepares test data according to the benchmark specifications, and an evaluation module that manages the evaluation process itself. With MedPerf’s orchestration capabilities, researchers can evaluate multiple AI models through the same collaborators in hours instead of months, making the evaluation of AI models more efficient and less error-prone.
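
Schematically, the per-site flow looks something like the sketch below; the function names and structure are illustrative only, not MedPerf's actual MLCube task interface.

```python
# Hypothetical sketch of the federated-evaluation flow MedPerf coordinates.
def run_benchmark_at_site(raw_data_dir, prepare, run_inference, evaluate):
    """Executed on a data owner's premises; only metrics leave the site."""
    prepared = prepare(raw_data_dir)        # data-preparation container
    predictions = run_inference(prepared)   # system-under-test container
    return evaluate(prepared, predictions)  # metrics container

def federated_evaluation(sites, containers):
    """Aggregate per-site metric reports; raw data is never collected."""
    return {site.name: run_benchmark_at_site(site.data_dir, *containers)
            for site in sites}
```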

Federated evaluation on MedPerf. Machine learning models are distributed to data owners for local evaluation on their premises without the need or requirement to move their data to a central location. MedPerf relies on the benchmark committee to define the requirements and assets of federated evaluation for a particular use case (https://www.nature.com/articles/s42256-023-00652-2)

“It turns out that data preparation is a major bottleneck when it comes to training or evaluating AI. This issue is further amplified in federated settings, where quality and consistency between contributing datasets are harder to assure due to the decentralized setup,” said Alex Karargyris, MLCommons Medical working group co-chair. To address this, the MLCommons Medical working group focused on developing best practices and tools that alleviate much of the preparation burden by enabling data sanity checking and offering pre-annotation through pre-trained models during data preparation. This is tremendously important when running global clinical studies such as FeTS.

Partnering with RANO

The MLCommons Medical Working Group, Indiana University, and Duke University have partnered with the Response Assessment in Neuro-Oncology (RANO) cooperative group to pursue a large-scale study demonstrating the clinical impact of AI in assessing postoperative or post-treatment outcomes for patients diagnosed with glioblastoma, a rare brain tumor. The second global Federated Tumor Segmentation study (FeTS 2.0) is an ambitious project involving 50+ participating institutions, and it is investigating important questions that will, hopefully, make it a role model for running continuous AI-based global studies. For the MLCommons Medical working group, the FeTS 2.0 project is an opportunity to gain frontline experience with federated evaluation in a real-world medical study.

World map of institutions participating in the clinical study that have completed dataset registration on MedPerf. As of August 2024, fifty-one (51) institutions have joined. Map generated using Plotly Express and Google’s Places API.

For the FeTS 2.0 study, MLCommons is contributing its open-source tools MedPerf and GaNDLF (the Generally Nuanced Deep Learning Framework), along with engineering resources and technical expertise, through its contributing technical partners. MedPerf enables orchestration of AI experiments on federated datasets, and GaNDLF delivers scalable end-to-end clinical workflows. Combined, they make AI model development, training, and inference more stable, reproducible, interpretable, and scalable, without requiring an extensive technical background.

Since the FeTS 2.0 project requires both federated learning to train AI models and federated evaluation to test them, MedPerf has been extended to integrate with OpenFL, an open-source framework for secure federated AI developed by Intel. OpenFL is an easy-to-use Python-based library that enables organizations to train and validate machine learning models on private data without sharing that data with a central server. Data owners retain custody of their data at all times.

Dr. Evan Calabrese, Principal Investigator for the FeTS 2.0 study and Assistant Professor of Radiology at Duke University, said, “Global community efforts, like the FeTS study, are essential for pushing the boundaries of clinically relevant AI. Moreover, getting the support and expertise of open community organizations, such as MLCommons, can tremendously improve the outcome and impact of global studies such as this one.”

“The partnership with RANO on FeTS 2.0 has been strong and mutually beneficial,” said Karargyris. “It’s enabled us to ensure that we are on the right path as we continue to develop MedPerf to become a standard for benchmarking AI systems in the medical domain. It has also given us key insights as to where we can add even more value in the future by solving hard problems.” One such challenge is ensuring the quality and consistency of annotations created by human experts; in the case of the FeTS 2.0 project, for example, the marking and segmentation of tumors on an image. Because the image data is held separately at several institutions and can’t be shared, each participating institution has typically tasked its own expert to annotate its portion of the data. However, having people of different experience levels across organizations invites inconsistencies and varying levels of quality. The group is exploring multiple alternatives to alleviate this, from deploying pretrained models to assess annotations, to measuring annotator expertise, all the way to enabling a pool of experts in a central location to annotate data remotely across institutions, improving both quality and consistency.

Looking forward

“Our ambition extends well beyond the FeTS 2.0 study; through MedPerf we are building open-source benchmarks and solutions to broadly enable evaluation of AI models across healthcare,” said Karargyris. “Everyone in the field benefits from independent, objective benchmarks of AI’s application to medicine that let us measure how much and how quickly the systems are getting better, faster, and more powerful. The path to advancing the state of the art in medical AI is by opening avenues for institutions to collaborate and contribute data to medical studies as well as to collective efforts such as MLCommons benchmarking, while ensuring that they remain compliant with regulations protecting health data.”

Read the Full FeTS study report

In the spirit of transparency and openness, the group created a full report that describes current progress, technical details, challenges, and opportunities. We believe that sharing our experiences with the broader community can help facilitate future improvements in current practices.

We aim to share more updates with the community as the study progresses.

Join Us

MLCommons welcomes additional community participation and feedback on its ongoing work. You can join the AI Medical working group here.

Comprehensive Open Federated Ecosystem (COFE) Presented at ISBI 2024
https://mlcommons.org/2024/08/medperf-cofe-isbi-2024/ | August 5, 2024

MLCommons Medical working group demonstrates COFE at ISBI

The MLCommons® Medical working group recently presented a “Federated Learning in Healthcare” tutorial at the IEEE International Symposium on Biomedical Imaging (ISBI) 2024. The tutorial gave participants an introduction to, and hands-on experience with, the Comprehensive Open Federated Ecosystem (COFE). COFE is an open ecosystem of tools and best practices distilled from years of industrial and academic research experience. It aims to streamline the research and development of AI/ML models within the clinical setting.

An AI-enabled clinical study under COFE. Each step involves actions carried out in the tools offered within COFE.

In a nutshell, COFE consists of the following tools working seamlessly:

  1. The Generally Nuanced Deep Learning Framework (GaNDLF) – https://gandlf.org/
  2. MedPerf – https://medperf.org/
  3. OpenFL – https://openfl.io/

A fundamental principle of COFE is its federated approach, which enables researchers to train and evaluate AI across multiple institutions without sharing sensitive patient data. In this way COFE facilitates clinical studies on real-world data in a more efficient and secure way.

During the tutorial session, Spyridon (Spyros) Bakas, the MLCommons Medical working group’s vice chair for benchmarking & clinical translation and director of the Center for Federated Learning in Medicine and of the Computational Pathology Division at Indiana University, introduced attendees to the benefits of federated AI, which enables large global efforts such as FeTS and advances the generalization of AI on diverse datasets.

Running federated AI experiments requires simplicity and efficiency, given the technical complexities introduced by multi-institute participation. Sarthak Pati, software architect at Indiana University and technical lead at MLCommons, shared new insights on how to easily train and design custom models using MLCommons’ GaNDLF in real-world settings. Specifically, attendees were presented with examples of how GaNDLF provides an easy-to-use solution to create end-to-end clinical workflows for training and deploying AI solutions across multiple problem domains (e.g., segmentation, classification, and regression).

Overview of computational tasks available with GaNDLF

To support smooth data preparation and efficient orchestration and monitoring of federated AI experiments, MedPerf was introduced in Nature Machine Intelligence in 2023. Hasan Kassem, ML engineer at MLCommons, demonstrated through hands-on examples how MedPerf simplifies data preparation and experiment orchestration in a real-world clinical study. To accomplish this, MedPerf is built on a simple architecture relying on transparent actions performed via key roles and responsibilities of stakeholders in a medical AI experiment.

Orchestration of a federated AI experiment using MedPerf

Siddhesh Thakur, data engineer at Indiana University School of Medicine, showcased optimization workflows and their practical application in deploying AI models for whole slide image analysis in resource-constrained environments. Thakur’s presentation explored key concepts of model optimization, demonstrating how GaNDLF facilitates this process by integrating with Intel’s OpenVINO toolkit. Through online examples, attendees learned how to leverage these tools to create efficient, deployable AI solutions for digital pathology, even in settings with limited computational resources.

When it comes to federated AI experiments, a key step is training AI models in a federated learning fashion. Federated learning can take place in various topologies (vertical vs. horizontal), which adds technical complexity in real-world federations, such as handling data splits. Meanwhile, securing the process is paramount in federated settings, adding further complexity. Intel’s OpenFL, hosted by The Linux Foundation, is a flexible, extensible, and easily learnable tool designed for data scientists that aims to address these issues when building federated learning settings. Walter Riviera, AI technical lead at Intel and Ph.D. candidate at the University of Verona, demonstrated running federated learning on various topologies and shared the challenges of implementing FL across both axes. In particular, new challenges with data, model, hardware, and framework-architecture heterogeneity reflect the need for the research community to continue to expand the utility of aggregation functions, improve security, and explore better ways to enhance communication speed.
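
For readers unfamiliar with the two topologies, the toy split below shows the difference: horizontal federations partition patients (same features everywhere), while vertical federations partition features (same patients everywhere).

```python
# Toy illustration of horizontal vs. vertical federation topologies.
import numpy as np

X = np.random.randn(100, 10)  # 100 patients, 10 features

# Horizontal FL: sites hold different patients with the same feature set.
site_a_horizontal, site_b_horizontal = X[:50, :], X[50:, :]

# Vertical FL: sites hold different features for the same patients
# (e.g., one site has imaging-derived features, another has lab values).
site_a_vertical, site_b_vertical = X[:, :5], X[:, 5:]
```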

The Federated Learning in Healthcare tutorial was made possible through GitHub Codespaces with generous technical support from their engineers and program managers working in close collaboration with the tutorial organizers. More information about the tutorial can be found here.

We would also like to thank the organizing committee of ISBI 2024 and in particular the conference organizers: Konstantina S. Nikita, Christos Davatzikos, Spyretta Golemati, and Elisa E. Konofagou.

Federated healthcare AI experts presenting components of COFE in Athens at ISBI 2024. From left to right: Hasan Kassem presented MedPerf, a federated evaluation platform for running benchmarks of medical AI models on real-world data securely and privately. Siddhesh Thakur presented an example of federated learning in digital pathology in low-resource environments using GaNDLF. Spyridon Bakas introduced the audience to the importance of federated datasets in healthcare as well as federated learning concepts.

Join Us

COFE addresses key challenges that the research community faces in the federated healthcare AI space through open community tools. As an organization, we strive to collect feedback from the broader healthcare research community to help build tools that advance AI/ML for everyone. If you are interested in participating in the MLCommons Medical working group, please join here.

Announcing MedPerf Open Benchmarking Platform for Medical AI
https://mlcommons.org/2023/07/announcing-medperf-open-benchmarking-platform-for-medical-ai/ | July 17, 2023

Bridging the gap between Medical AI research and real-world clinical impact

Medical Artificial Intelligence (AI) has tremendous potential to advance healthcare and improve the lives of everyone across the world, but successful clinical translation requires evaluating the performance of AI models on large and diverse real-world datasets. MLCommons®, an open global engineering consortium dedicated to making machine learning better for everyone, today announces a major milestone towards addressing this challenge with the publication of Federated Benchmarking of Medical Artificial Intelligence with MedPerf in the Nature Machine Intelligence journal.

MedPerf is an open benchmarking platform that efficiently evaluates AI models on diverse real-world medical data and delivers clinical efficacy while prioritizing patient privacy and mitigating legal and regulatory risks. The Nature Machine Intelligence publication is the result of a two-year global collaboration spearheaded by the MLCommons Medical Working Group, with participation of experts from 20+ companies, 20+ academic institutions, and nine hospitals across 13 countries.

Validating Medical AI Models Across Diverse Populations

Medical AI models are usually trained with data from limited and specific clinical settings, which may lead to unintended bias with respect to specific patient populations. This lack of generalizability can reduce the real world impact of Medical AI. However, getting access to train models on larger diverse datasets is difficult because data owners are constrained by privacy, legal, and regulatory risks. MedPerf improves medical AI by making data across the world easily and safely accessible to AI researchers, which reduces bias and improves generalizability and clinical impact. “Our goal is to use benchmarking as a tool to enhance Medical AI,” said Dr. Alex Karargyris, MLCommons Medical co-chair. “Neutral and scientific testing of models on large and diverse datasets can improve effectiveness, reduce bias, build public trust, and support regulatory compliance.”

Federated Evaluation Enables Validation of AI While Ensuring Data Privacy

Critically, MedPerf enables healthcare organizations to assess and validate AI models in an efficient and human-supervised process without accessing patient data. The platform’s design relies on federated evaluation in which medical AI models are remotely deployed and evaluated within the premises of data providers. This approach alleviates data privacy concerns and builds trust among healthcare stakeholders, leading to more effective collaboration.

Intel Labs’ Research Scientist and MedPerf technical project lead Micah Sheller said, “Transparency lies at the core of the MedPerf security and privacy design. Information security officers must know about every bit of information they are being asked to share and with whom it will be shared. This requirement makes an open-source, non-profit consortium like MLCommons the right place to build MedPerf.”

Federated evaluation on MedPerf. Machine learning models are distributed to data owners for local evaluation on their premises without the need or requirement to extract their data to a central location.

MedPerf’s Orchestration Capabilities Streamline Research from Months to Hours

MedPerf’s orchestration and workflow automation capabilities can significantly accelerate federated learning studies. “With MedPerf’s orchestration capabilities we can evaluate multiple AI models through the same collaborators in hours instead of months,” explains Dr. Spyridon Bakas, Assistant Professor at the University of Pennsylvania’s Perelman School of Medicine and the vice chair for benchmarking & clinical translation for the MLCommons Medical working group.

This efficiency was demonstrated in the Federated Tumor Segmentation (FeTS) Challenge, the largest federated experiment on glioblastoma. The FeTS Challenge spanned 32 sites across six continents and successfully employed MedPerf to benchmark 41 different models. Thanks to active involvement by teams from Dana-Farber, IHU Strasbourg, Intel, Nutanix, and the University of Pennsylvania, MedPerf was also validated through a series of pilot studies representative of academic medical research. These studies involved public and private data across on-prem and cloud technology, including brain tumor segmentation, pancreas segmentation, and surgical workflow phase recognition.

“It’s exciting to see the results of MedPerf’s Medical AI pilot studies, where all the models ran on hospitals’ systems, leveraging pre-agreed data standards, without sharing any data,” said Dr. Renato Umeton, Director of AI Operations and Data Science Services in the Informatics & Analytics department of Dana-Farber Cancer Institute and Co-Chair of the MLCommons Medical Working Group. “The results reinforce that benchmarks through federated evaluation are a step in the right direction toward more inclusive AI-enabled medicine.”

MedPerf partnered with Sage Bionetworks to build ad-hoc components required for the FeTS 2022 and BraTS 2023 challenges atop the Synapse platform serving 30+ hospitals around the world, as well as with Hugging Face to leverage its Hub platform and demonstrate how new benchmarks can utilize the Hugging Face infrastructure. To enable wider adoption, MedPerf supports popular ML libraries that offer ease of use, flexibility, and performance, e.g., fast.ai. It also supports private AI models, and AI models available only through an API, such as Microsoft Azure OpenAI Services, Epic Cognitive Computing, and Hugging Face inference endpoints.

Extending MedPerf to Other Biomedical Tasks

While initial uses of MedPerf focused on radiology, it is a flexible platform that supports any biomedical task. Through its sister project GaNDLF, which focuses on quickly and easily building ML pipelines, MedPerf can accommodate multiple tasks such as digital pathology and omics. In support of the open community, MedPerf is developing examples for specialized low-code libraries in computational pathology, such as PathML and SlideFlow, as well as Spark NLP and MONAI, to fill the data engineering gap and provide access to state-of-the-art pre-trained computer vision and natural language processing models.

A Foundation to Evaluate and Advance Medical AI

MedPerf is a foundational step towards the MLCommons Medical Working Group’s mission to develop benchmarks and best practices to accelerate Medical AI through an open, neutral, and scientific approach. The team believes that such efforts will increase trust in Medical AI, accelerate ML adoption in clinical settings, and ultimately enable Medical AI to personalize patient treatment, reduce costs, and improve both healthcare provider and patient experience.

The team would like to acknowledge the publication co-authors for their valuable contributions.

Call for Participation

To continue to drive Medical AI innovation and bridge the gap between AI research and real-world clinical impact, there is a critical need for broad collaboration, reproducible, standardized, and open computation, and a passionate community that spans academia, industry, and clinical practice. We invite healthcare professionals, patient advocacy groups, AI researchers, data owners, and regulators to join the MedPerf effort.

Publication

Karargyris, A., Umeton, R., Sheller, M.J. et al. Federated benchmarking of medical artificial intelligence with MedPerf. Nat Mach Intell 5, 799–810 (2023).

Launching GaNDLF for Scalable End-to-End Medical AI Workflows
https://mlcommons.org/2023/05/launching-gandlf-for-scalable-end-to-end-medical-ai-workflows/ | May 16, 2023

A powerful framework combining AI with medical research to enable medical professionals to more effectively diagnose and treat patients

By Sarthak Pati, Alexandros Karargyris, Spyridon Bakas

Today we are excited to share the publication of the Generally Nuanced Deep Learning Framework (GaNDLF) in the Nature Communications Engineering journal. This is a major milestone for the MLCommons® Medical Working Group, as GaNDLF describes the outcome of a global collaboration by distinct research groups spanning industry and academia. GaNDLF is a powerful framework that follows zero/low-code principles, alleviating programming requirements so that medical researchers can easily take advantage of the latest AI developments and analyze healthcare data. It enables analysis of multiple data types (e.g., radiology, histology) and tasks (e.g., segmentation, classification), while remaining agnostic to anatomies and pathologies. GaNDLF is a first step toward a standardized mechanism for integrating end-to-end healthcare AI applications into larger clinical workflows.

Creating a General-Purpose Framework for Healthcare AI

GaNDLF was inspired by our work on the Federated Tumor Segmentation (FeTS) initiative. We realized that we needed a general-purpose framework for healthcare AI that could define end-to-end clinical workflows for various healthcare use-cases such as tumor segmentation and molecular classification. To address this need, we built GaNDLF, which is the first healthcare-focused framework that can handle multiple types of data and AI workloads (see Figure 1).

Figure 1: Current functionality of GaNDLF.

A Zero/Low Code Approach

There is tremendous potential for AI to solve problems in healthcare such as tumor segmentation, but adoption in a clinical setting is very challenging. And while clinical researchers have a deep understanding of different problems in healthcare, they often lack expertise in AI and may not be able to design robust solutions.

GaNDLF addresses this issue through a zero/low-code approach that helps clinical researchers focus on the healthcare challenges and abstracts away the complexity of AI tools. For example, using GaNDLF, a medical researcher can efficiently train a model without writing a single line of code. GaNDLF performs automatic post-training optimization, so trained models do not require any specialized hardware during inference, easing deployment. Additionally, GaNDLF seamlessly integrates with OpenFL, enabling researchers to easily use federated learning across multiple institutions and patient populations, greatly increasing the quality and diversity of data for training and evaluation and ultimately leading to better and more robust treatments.
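
As a schematic of what such a zero-code setup looks like, the configuration sketch below stands in for GaNDLF's declarative experiment description (GaNDLF actually consumes a YAML file plus a CSV listing the cases; the key names here are illustrative, not the exact schema).

```python
# Illustrative zero-code experiment description for a GaNDLF-style run.
config = {
    "problem_type": "segmentation",  # or classification / regression
    "model": {"architecture": "unet", "dimension": 3},
    "loss_function": "dice",
    "metrics": ["dice"],
    "patch_size": [64, 64, 64],
    "batch_size": 2,
    "num_epochs": 50,
    "data_augmentation": ["flip", "rotate"],  # applied by the framework
}
# The framework builds the data pipeline, trains, validates, and applies
# post-training optimization; the researcher writes no model code.
```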

Faster Prototyping and Baseline Results

GaNDLF’s focus on zero/low code principles and state-of-the-art algorithms enables fast prototyping and generation of baseline results in a fraction of the time of traditional methods. The ability to quickly and easily deploy trained models on a variety of host systems without requiring specialized hardware ensures that models trained using GaNDLF are usable in low resource environments. GaNDLF has been extensively tested and validated to ensure its accuracy, and the team is confident that it will quickly become an essential tool in medical research and practice.

Call for Participation in the GaNDLF Community-Driven Effort

Open, inclusive, collaborative efforts, such as GaNDLF, can drive innovation and bridge the gap between AI research and real-world clinical impact. To achieve these benefits, there is a critical need for broad collaboration, reproducible, standardized and open computation, and a passionate community that spans academia, industry, and clinical practice. We welcome additional participation and look forward to your joining us.

We call for the following:

  • Clinical researchers interested in training AI models without having to write a single line of code. GaNDLF enables them to generate baseline results for their data in a quick and reproducible manner. Additionally, models trained using GaNDLF do not require any specialized hardware for deployment.
  • Computational researchers interested in increasing the applicability and reach of their algorithms. Developing their methods on GaNDLF ensures they can be applied across different data types and application domains in healthcare.
  • Group Leaders in academia who want to ensure reproducibility and continuity of research. GaNDLF provides a common integration API for different computational and processing pipelines, ensuring that clinical workflows can be maintained after individual researchers leave the group.
  • Anyone interested in federated learning research can leverage GaNDLF’s seamless integration with the Linux Foundation’s OpenFL and MLCommons’ MedPerf to facilitate no-code design of federations.

We would like to thank our co-authors of the Nature Communications Engineering article for their valuable contributions throughout the development and research phase of GaNDLF. This effort would not be possible without the generous support from members of the following organizations (in alphabetical order): Dana-Farber, Google, Intel, MLCommons, University of Edinburgh, and University of Pennsylvania. For more information, please read the Nature Communications Engineering paper and consider joining the MLCommons Medical working group to contribute to our open-source efforts, including GaNDLF.
