The ARPA-H Biomedical Data Fabric (BDF) Toolbox program aims to consolidate research data from a variety of health disciplines and resolve inconsistencies in how research data is currently stored and shared.
Funding for awardees varies in amount and is contingent upon the recipient meeting aggressive milestones specific to their project.
The BDF Toolbox performers are:
- The University of Alabama, Birmingham, is leading a team to create an end-to-end toolchain for collecting, harmonizing, and transforming complex series of cancer and rare disease biomedical data, with real-time digital simulations of patients to predict outcomes, customize treatments and monitor treatment effectiveness. This research will generalize across a multitude of disease types.
- Boston Children’s Hospital is leading a team to build an at-scale care delivery system for rapid and accurate detection of adverse drug events in patients during treatment, using computer-friendly electronic health record data collection.
- DNA HIVE*, with sites in Rockville, Md., New York City, Pasadena, Calif., Nashville, Tenn., and Hinxton, England, is leading a team that will provide infrastructure and tools to enable secure federated query and access to cancer health records and medical device data to investigate disease complexity and outcomes.
- Stanford University, Stanford, Calif., is leading a team to develop AI-augmented clinical decision support tools based on foundational multimodal models that can be used by clinicians to quickly diagnose and predict effective treatments for cancer patients, as well as help patients understand and manage their outcomes.
- Harvard Medical School, Boston, is leading a team aiming to build a data-centric, visual user interface designed to support dynamic exploration and discovery of any biomedical data resource, focused on cancer research.
- Charles River Analytics* in Cambridge, Mass., is leading a team that aims to apply testing and evaluation methods to foster user-centered iterative development of the BDF tools.
- Insilicom LLC*, based in Tallahassee, Fla., is developing tools for automatic FAIR (Findable, Accessible, Interoperable, Reusable) data collection from scientific literature and published research data using AI-assisted query with natural language inputs.
- Sage Bionetworks, based in Seattle, Wash., is building a toolset for automated privacy- and attribution-preserving data collection and curation, enabling data use transparency and digital dignity.
- Netrias*, based in the Washington, D.C. area, is developing an AI-assisted data curation tool with standard ontology terms to build an intuitive user interaction interface and enable cancer data exploration.
- ICF Incorporated LLC, with offices around the U.S. and globally, is leading a team focusing on bridging data silos with intuitive, no-code, AI-enabled query tools to allow diverse stakeholders to easily explore proteomic, clinical, and other data types.
- A team led by the University of Chicago is developing a medical imaging data hub and tools that will enable measurements and multi-modal knowledge integration for radiology and clinical data.
- A team led by University of North Carolina at Chapel Hill and the Johns Hopkins University is harmonizing and linking data from clinical, claims, mortality, imaging, phenotypes, and other modalities to facilitate clinical research.
- A team led by Northeastern University is building tools to interrogate cell signaling pathways for semantic integration and knowledge assembly to enhance human-in-the-loop interpretation of research data.
- A team led by Renaissance Computing Institute and University of North Carolina at Chapel Hill is integrating Large Language Models (LLMs) with drug knowledge graphs to enable federated cross-repository queries and connections for cancer therapies.
- A team led by New York University is creating a new kind of AI search engine for datasets, supporting meta-analysis across published manuscripts, data, supplementary materials, and data repositories.
- A team led by the Massachusetts Institute of Technology is developing automated software for data extraction from scientific papers, medical images, and source code to formulate computational-experiment trees to accelerate new discoveries from already published data.
- A team led by Jataware* is building an AI-assisted notebook environment that allows users to search for data and perform tasks from within the same environment, leading to enhanced capabilities for analysis of multimodal data. Use cases focus on retrieval of relevant cancer data and differentiation of tumor classes.
*Small business