ARPA-H announces effort to develop single data system for biomedical research

ARPA-H announces effort to develop an integrated data toolbox that improves access to biomedical research data to bolster innovation

Multi-organization teams selected to create bridges across data silos to make health data more accessible and usable  

The Advanced Research Projects Agency for Health (ARPA-H), an agency within the U.S. Department of Health and Human Services (HHS), today announced performer teams to develop the Biomedical Data Fabric (BDF) Toolbox. The BDF Toolbox will consolidate research data from a variety of health disciplines and resolve inconsistencies in how research data is currently stored and shared.   

What does this mean? Biomedical research data will be more accessible for the development of advanced health innovations and breakthroughs.   

Clinical care centers and biomedical research labs generate large volumes of data, but the ability to quickly combine data to make informed decisions and accelerate the development of next-generation health interventions remains elusive.   

“We launched this program to enable advancements in medical research that will improve health outcomes for Americans. The BDF Toolbox products will allow doctors to have easier access to a vast breadth of data so that they can make the best-informed decisions possible, with their patients,” said ARPA-H Director Renee Wegrzyn, Ph.D. 

The Toolbox’s newly discoverable data and associated algorithms will be freely available on platforms such as GitHub and other web applications. 

“Patients will be able to be more proactive in seeking the data they need by searching for information to questions they might have,” Wegrzyn added.   

The BDF Toolbox effort aims to improve patients’ health outcomes by democratizing access to biomedical data and creating open-source tools developed with rigorous metrics to ensure the algorithms can help overcome technical barriers to data integration and use.    

The large-scale program includes transition partners such as the National Cancer Institute (NCI), National Institute of Biomedical Imaging and Bioengineering (NIBIB), and National Heart, Lung, and Blood Institute (NHLBI), as well as a diversity of business and academic groups to ensure the research tools developed can be adopted for patient benefit. 

“ARPA-H's BDF program leverages research from decades of NIH's investment in data science and will develop tools that rapidly scale use and integration of data so that health care givers can apply it to patient care plans,” added Wegrzyn.    

ARPA-H's support of a broad diversity of performers enables the teamwork required to bridge data silos and enhance the usability of biomedical research data for patient use cases.

The following performer teams were selected. They will focus initially on cancer and rare disease data and eventually expand to other diseases to maximize the Toolbox’s application to a broader population:

  • The University of Alabama at Birmingham is leading a team to create an end-to-end toolchain for collecting, harmonizing, and transforming complex series of cancer and rare disease biomedical data, with real-time digital simulations of patients to predict outcomes, customize treatments, and monitor treatment effectiveness. This research will generalize across a multitude of disease types.
  • Boston Children’s Hospital is leading a team to build a care delivery system at scale that provides rapid and accurate detection of adverse drug events for patients during treatment, using computer-friendly electronic health record data collection.
  • DNA HIVE*, with sites in Rockville, Md., New York City, Pasadena, Calif., Nashville, Tenn., and Hinxton, England, is leading a team that will provide infrastructure and tools to enable secure federated query and access to cancer health records and medical device data to investigate disease complexity and outcomes.   
  • Stanford University, Stanford, Calif., is leading a team to develop AI-augmented clinical decision support tools based on foundational multimodal models that can be used by clinicians to quickly diagnose and predict effective treatments for cancer patients, as well as help patients understand and manage their outcomes.   
  • Harvard Medical School, Boston, is leading a team aiming to build a data-centric, visual user interface designed to support dynamic exploration and discovery of any biomedical data resource, focused on cancer research.   
  • Charles River Analytics* in Cambridge, Mass., is leading a team that aims to apply testing and evaluation methods to foster user-centered iterative development of the BDF tools.   
  • Insilicom LLC*, based in Tallahassee, Fla., is developing tools for automatic FAIR (Findable, Accessible, Interoperable, Reusable) data collection from scientific literature and published research data using AI-assisted query with natural language inputs.   
  • Sage Bionetworks, based in Seattle, Wash., is building a toolset for automated privacy- and attribution-preserving data collection and curation, enabling data use transparency and digital dignity.    
  • Netrias*, based in the Washington, D.C. area, is developing an AI-assisted data curation tool with standard ontology terms to build an intuitive user interface and enable cancer data exploration.
  • ICF Incorporated LLC, with offices around the U.S. and globally, is leading a team focusing on bridging data silos with intuitive, no-code, AI-enabled query tools to allow diverse stakeholders to easily explore proteomic, clinical, and other data types.    
  • A team led by the University of Chicago is developing a medical imaging data hub and tools that will enable bias and equity measurements and multi-modal knowledge integration for radiology and clinical data.   
  • A team led by University of North Carolina at Chapel Hill and the Johns Hopkins University is harmonizing and linking data from clinical, claims, mortality, imaging, phenotypes, and other modalities to facilitate clinical research.
  • A team led by Northeastern University is building tools to interrogate cell signaling pathways for semantic integration and knowledge assembly to enhance human-in-the-loop interpretation of research data.   
  • A team led by Renaissance Computing Institute and University of North Carolina at Chapel Hill is integrating large language models (LLMs) with drug knowledge graphs to enable federated cross-repository queries and connections for cancer therapies.
  • A team led by New York University is creating a new kind of AI search engine for datasets that supports meta-analysis across published manuscripts, data, supplementary materials, and data repositories.
  • A team led by the Massachusetts Institute of Technology is developing automated software for data extraction from scientific papers, medical images, and source code to formulate computational-experiment trees to accelerate new discoveries from already published data.    
  • A team led by Jataware* is building an AI-assisted notebook environment that allows users to search for data and perform tasks from within the same environment, leading to enhanced capabilities for analysis of multimodal data. Use cases focus on retrieval of relevant cancer data and differentiation of tumor classes.    

*Small business