Biomedical Data Fabric Toolbox Teaming Profiles

Thank you for your interest in the ARPA-H Biomedical Data Fabric Toolbox. This page is designed to help facilitate connections between prospective proposers. If you or your organization is interested in teaming, please submit your information via the form below. Your details will then be added to the publicly available list.

ARPA-H anticipates that teaming will be necessary to achieve the goals of the ARPA-H Biomedical Data Fabric Toolbox. Prospective performers are encouraged (but not required) to form teams with varied technical expertise to submit a research proposal.

BDF Toolbox Teaming Profile Form

Please note that by publishing the teaming profiles list, ARPA-H is not endorsing, sponsoring, or otherwise evaluating the qualifications of the individuals or organizations included here. Submissions to the teaming profiles list are reviewed and updated periodically.

Interested in learning more about the ARPA-H Biomedical Data Fabric Toolbox?

Teaming Profiles List

To narrow the results in the Teaming Profiles List, please use the input below to filter results based on your search term. The list will filter as you type.

Each profile in the list includes the following fields:
- Organization
- Contact Information
- Location
- Describe your organization's current research focus areas
- Tell us what your organization can add to the ARPA-H BDF Toolbox and potential teaming partners
- Tell us about your organization's strengths and experience
- Tell us what your organization is looking for in potential teaming partners
- Which technical areas within the ARPA-H BDF Toolbox does your organization have the capacity to address?
Peraton Labs Ritu Chadha (rchadha@peratonlabs.com)
Additional: rizmailov@peratonlabs.com
Basking Ridge, NJ Peraton Labs employees focus on R&D for a broad set of customers across the DoD and IC, including DARPA and IARPA. We have deep expertise in AI/ML, both in basic research and in the development of robust AI models applied to areas like cybersecurity, networking, machine vision, healthcare, natural language processing, autonomy, etc. We are leaders in the space of adversarial machine learning applied to several of the domains listed above. Peraton Labs has strengths in AI/ML research and development. We have extensive experience in data processing, curation, and the development of ML pipelines for building ML applications. We also have strong software engineering expertise. With a distinguished heritage tracing back to Bell Labs, Bellcore, DHPC, and Telcordia, our experts build the future. We have created foundational technologies for diverse applications, from resilient high-speed networking, critical infrastructure cybersecurity, and quantum communications, to AI-powered network management, multimedia information analytics, and laser-based countermeasures.

Our legacy includes:
- Development and refinement of key information and communication technologies, including SONET, DSL, ATM, MIME (email), Video-on-Demand, DRM, Voice over IP, 3G / 4G / LTE, IPv6, and more
- Primary contributions to ‘firsts’ of the past, including the first multimedia email attachment, the first wide-area ATM / SONET gigabit network, the oldest blockchain in the world, and the Army’s first state-of-the-art laboratory for testing infrared laser-based countermeasures
- Significant and leading roles in the development of open architectures and national and international standards

We continue to build on this foundation of innovation to deliver mission-critical technology research and high-value solutions to customers across defense, telecom, energy, finance, government, transportation, life sciences, and the intelligence community.
We are seeking to complement our expertise in AI/ML and software development with experts in the domain of healthcare.
  • Technical area 2: AI-assisted, multi-source data preparation and curation for analysis at scale
Andromeda Tech Chris Pope (chris.pope@androm.tech)
Additional: omar.tabbaa@androm.tech
Columbus, OH Andromeda Tech is a medtech start-up developing advanced analytics to support the biopharma, med device, and healthcare industries with evidence-based decision-making for new therapy development and implementation. We leverage our advanced analytics platforms and unique datasets to conduct primary and secondary research to validate the clinical need, understand existing treatment workflows and patient sequencing, and provide microsegmentation to inform new and emerging therapy development. Our experience working at the interface between biopharma, medical device, and healthcare systems provides a unique perspective on the challenges and use cases associated with biomedical data, as well as the ability to draw on a breadth of potential teaming partners through our network. At Andromeda Tech we are pushing the State of the Art (SoA) through the standardization of survey design, development of a common data model that is integrable with other data sources (e.g., medical claims, EMR, clinical trials, etc.), and automation around fielding, analysis, and triangulation of data sources in our Market Intelligence Engine platform. While our current platform is tailored more towards the needs of our clients, we believe that it is scalable and generalizable to reliably generate the biomedical data required to efficiently and effectively extract the clinical insights required to improve healthcare for the American people. Andromeda Tech is looking to operationalize a breadth and depth of biomedical data and seeks to partner with data generators/holders and subject matter experts.
  • Technical area 1: High-fidelity, automated data collection
Sparta Science Tim Clark (tim@spartascience.com)
Additional: scott@spartascience.com
Menlo Park, CA Sparta Science is a commercial healthcare technology company focused on providing solutions for understanding and enabling Movement Health - the patterns of human movement such as physical activity, sleep, and sedentary behaviors, and how these are associated with and potentially cause downstream health and performance outcomes. We currently work with several DoD organizations looking to improve warfighter human performance through an enhanced understanding of how neuromuscular readiness, fatigue, recovery, and stressors impact individuals and groups. We focus on robust collection and management of movement health data from wearable sensors, force plates, subjective questionnaires, and laboratory measurements, and use these data to support the development of machine learning models for predicting health-related outcomes.

We have several academic and industry partners with whom we work to validate our findings with various populations. For example, our research evaluating the associations between machine learning-generated balance measures using force plates and assertions of neuromuscular state has been applied across domains ranging from human performance (e.g. warfighter readiness) to elder care (e.g. fall risk modeling).
Movement health data and analytics are currently underutilized when it comes to understanding and managing public health challenges. Many of the most critical public health issues, including the impacts of many cancers, are in some way driven by individual movement health patterns and profiles. We intend to support the BDF Toolbox effort by providing both technical expertise focused on collecting and managing movement health at scale, as well as deep domain knowledge about the research and clinical value of these types of data and analytic outputs.

Our particular interest in BDF Toolbox participation is to extend the value of movement health data by building a more inclusive participation environment. Data from commercial wearables is currently limited to a small subset of the American public. We would like to challenge this limitation through the application of privacy-preserving and enhancing technologies - at the platform level - to promote buy-in for individuals to participate in a public health data enterprise.
Sparta Science offers a unique balance of commercial product development and deep research and development capabilities, particularly in support of government R&D organizations. Our focus on all dimensions of movement health data and analytics, together with our commercial objectives for building and maintaining a robust Movement Health Platform, has contributed to our commercial and R&D success since our founding in 2014. We are specifically looking for partners with a desire to leverage movement health data to understand the role movement profiles play in the prevention and treatment of various cancers. We seek to join a team with the level of cancer research experience expressed in the BDF Toolbox topic description.
  • Technical area 2: AI-assisted, multi-source data preparation and curation for analysis at scale
DataBiologics Leah Braddell (leah@databiologics.com)
Additional: luke@databiologics.com
Gilbert, AZ DataBiologics is addressing a major problem in the emerging field of Regenerative Medicine: the lack of evidence and transparency around safety and efficacy. Today, these therapies are primarily cash-pay and use the patient’s own tissue healing response or cells as the source of treatment, such as those found in blood, fat, and bone marrow.

Our software provides physicians with a simple yet highly tailored data capture solution to personalize the treatment decisions and experience for patients, and also to ethically drive adoption of new techniques by physicians.

Our proprietary database, generated exclusively by our software application, combined with machine learning and AI algorithms, is delivering never-before-accessed insights to advance patient access to emerging treatments.
DataBiologics has broad experience across emerging technologies and therapies in medicine and in how to extract real-world data that better enables adoption and scale. Medicine is notoriously slow to adopt new therapies due to access, belief, and coordination barriers. Today, we have built the world's largest comprehensive registry in the field of Regenerative Medicine with our simple and automated data collection application.

Our simple and intuitive app has successfully driven a 72% capture rate for long-term patient-reported outcomes and combined that with nuanced treatment details, including cellular composition data, product utilization, and patient profile characteristics. Due to the cash-pay nature of these treatments, there is virtually no unified, high-volume data available in existing sources such as EHRs or claims.

We can provide both unique data and expertise in developing useful registries for emerging treatments to any of our partners.
Our founders are physicians who have dedicated their entire careers to studying, teaching, and advancing alternative regenerative treatments for musculoskeletal conditions, one of the broadest categories of disease and dysfunction affecting the population today. Their networks, expertise, and real-world workflow and adoption experience have directly influenced our software and database development.

Our CEO & Chief Data Officer brings 20 years of broad experience in healthcare, with deep expertise in developing novel databases and making insights simple and accessible to broad stakeholders ranging from clinicians to hospital administrators, payers, and patients. She has vast experience marketing, selling, and technically implementing disruptive and emerging technologies, as well as adapting critical change-management principles to drive evidence-based adoption.
Today, our strengths are in the areas of automated data collection and intuitive exploration. We would like to partner with others who have expertise in AI-assisted exploration. We have so far curated our database to deploy supervised machine learning, but would like to quickly expand our capabilities to ingest additional data and augment these processes so that full-scale AI can become a reality.
  • Technical area 3: Enhance data usability with intuitive exploration
Illumina Mike Lelivelt (mlelivelt@illumina.com)
Additional: nmagallanes@illumina.com
San Diego, CA To integrate next-generation genotypic data with population-level phenotypic data using a common technical infrastructure for the purposes of discovery, diagnostic development, and population use cases. Illumina's informatics infrastructure - collectively, Illumina Connected Software - offers both researchers and clinicians a shared infrastructure to support lab operations at scale and to integrate multiomic data analysis at the primary, secondary, and tertiary analysis stages. Leverage Illumina's infrastructure, with direct integration with the largest fleet of sequencers, to power bioinformatics discovery. Illumina was founded in 1998 in San Diego, CA, USA, based on microarrays and the genome. Twenty-five years later, we are a company with more than $4.5B in annual revenue serving a wide variety of sequencing applications and the informatics associated with them. Parts of our system are highly functional but could still grow and develop. The UK Biobank solved the population genomics problem by centralizing data in a single location. For multi-omics data to scale, systems of federated data sharing need to be developed. We are looking for the resources and customer base to drive that development. We are willing for others to leverage our current systems to help drive our mutual improvements. Our current systems are well set up for multi-omics with AI capabilities.
  • Technical area 2: AI-assisted, multi-source data preparation and curation for analysis at scale
New York University Juliana Freire (juliana.freire@nyu.edu)
Additional: csilva@nyu.edu
New York, NY An overarching goal of our research is to develop methods and systems that enable a wide range of users to obtain trustworthy insights from data. Our research spans topics in large-scale data analysis, curation and integration, visualization, HCI, machine learning, provenance management, and data discovery. Some of our recent work related to BDF includes the use of foundation models to perform column type annotation and enable automated data integration. In the area of data discovery, we have been developing sketching strategies to support data discovery in large, distributed data repositories through data-driven queries (e.g., given a dataset D, discover other datasets that are correlated with D). Links to some of our papers can be found at https://scholar.google.com/citations?user=sSzAlq0AAAAJ&hl=en, and tools can be found in our GitHub repository https://github.com/VIDA-NYU. We have done research on many of the challenges involved in TA2 and TA3.
In computer science, we bring the interdisciplinary expertise of the NYU VIDA Center, including data management and engineering, machine learning, and visualization, as well as the development of open-source tools. We have been performers and lead investigators in several DARPA projects, including Memex, D3M, and PTG. We also have a history of successful collaborations with biomedical researchers. On the biomedical research side, we bring expertise in computational biology and genomics, and also a long track record of developing methods for integrating data from multiple technologies—including mass spectrometry, sequencing, and microscopy—which have provided a wide array of powerful tools to discover and verify biomarkers and therapeutic targets in cancer.
We can contribute to both TA2 and TA3.
see above
  • Technical area 2: AI-assisted, multi-source data preparation and curation for analysis at scale
MyLigo Jim St.Clair (jim.stclair@myligo.io)
Additional: ladd.hanson@myligo.io
Austin, TX MyLigo offers a patient-centric identity and privacy application to verify identity and support privacy-preserving data exchange in a JSON/XML format, using internationally supported open-source components. We provide provenance-preserving and verifiable patient data to support federated learning and privacy-preserving machine learning (PPML) models. Our platform is based on the research effort that created the Medilinker project at the University of Texas (Austin). We need additional strength in UI/UX development for our existing back-end infrastructure. We wish to partner to support privacy preservation and data provenance as part of the data fabric toolkit.
  • Technical area 1: High-fidelity, automated data collection
MDIX, Inc. Kenneth (Ken) Lord (klord@mdixinc.com)
Additional: smuir@mdixinc.com
Boston Area, MA MDIX focuses on lowering the time, resources, and costs of providing information exchange and semantic normalization in the healthcare domain. Recently, through work with the Administration for Children and Families and the HL7 Human and Social Services Working Group, we have extended these efforts to include the human and social services domains.
We, along with others, have recently completed work for Version 2.0 of the Object Management Group’s Model Driven Message Interoperability (MDMI) Standard. The focus of this Version 2 standard is to link the data models with semantic models to provide clear, precise, unambiguous concepts and terms required for interoperability and the development of stakeholder-defined contextual views.
Behind all of these efforts, MDIX applies a model-driven approach using the appropriate modeling languages. Our research continues to link different contextual models with the goal of providing simple, seamless, federated views driven by a specific situational context.
MDIX has tooling based on OMG’s MDMI, other OMG, and W3C standards. We have contextualized this tooling with HL7 and other healthcare and human services Standards. Our model-based approach has been integrated into and with other frameworks.
MDIX’s tooling automates the production of computable transformation models while providing the extensible content and contextual knowledge base necessary, along with reusable, configurable software components, to produce lightweight, custom software services and interfaces.
Teaming partners who have been and could be involved in the above efforts are the VA, Mayo Clinic, Oracle / Cerner, Google, SmileCDR, Model Driven Solutions, HHS ACF, HHS SAMHSA, the ONC, and Apex Evaluation. With these partners, MDIX has delivered open-source solutions for organizations ranging from large healthcare systems to Health Information Exchanges to small community-based organizations.
MDIX has intellectual capital in model-driven application development in the healthcare and human services domains. The two founders of MDIX originally worked together on different charter projects for the Open Health Tools consortium. One was the MDMI project and the other was the MDHT (Model Driven Health Tools) project. The latter was sponsored by the VA and used by NIST and the ONC for validation testing of HL7 exchange files. MDIX has integrated these two model-driven technologies in our tooling.
Just as important, MDIX has demonstrated its ability to collaborate with standards organizations, government and research organizations, and large and small commercial and non-profit organizations. We have found that collaboration is necessary. A specific organization with which we have been collaborating, and which offers complementary skills and experience, is Model Driven Solutions, which has also submitted a BDF Toolbox Teaming Profile.
MDIX is hoping to team with:
- Subject matter experts with domain-specific knowledge and experience in defining requirements for solutions that will help them produce better outcomes for their patients or populations.
- Informaticists who can help us extend or leverage the open-source semantic models with new concepts as required.
- Experts in machine learning and artificial intelligence technologies who can build solutions using views and interfaces we can generate as part of the solutions defined by the above subject matter experts.
- User interface design experts who can help define easy-to-use, end-user software interfaces.
  • Technical area 2: AI-assisted, multi-source data preparation and curation for analysis at scale
Virginia Tech Applied Research Corporation Matt Wolfe (matt.wolfe@vt-arc.org)
Additional: charles.joesten@vt-arc.org
Arlington, VA VT-ARC delivers tailored analysis, research, and engineering to address problems of national and global importance. Specializing in forming highly collaborative, trusted partnerships across government, industry, and academia, our teams develop technical and operational approaches that lead to the innovation of technologies with enduring and highly contextual impact. We are passionate about achieving impact with our work through the application of science to the development of commerce, trade, and industry, and to the improvement of the human condition nationally and globally. Using a convergence approach, we integrate our expertise in information and decision science, sensor and communications engineering, cyberphysical systems security, and DevSecOps with the domain-specific expertise of our partners (e.g., agriculture, medical devices, and supply chains) to develop demonstrable solutions for our sponsors' mission challenges. VT-ARC brings expertise in requirements gathering, decision support tool development, data science and automation, and testing and evaluation capabilities to its project teams. For the ARPA-H BDF Toolbox, VT-ARC can develop requirements-based data management approaches, build automated collection, processing, and visualization pipelines, and provide testing environments for prototype evaluations. Further, we apply programmatic rigor to large teams with diverse research and development perspectives - academia's priority on scholarly research, industry's focus on product-market fit, and the government's need to achieve mission objectives - to ensure alignment across the team and that deliverables satisfy mission requirements. VT-ARC excels at building teams of diverse thought leaders and practitioners driven by a common understanding of a problem and a passion for developing a solution. We use a convergence approach to team building and project execution that leverages the different perspectives and experiences of the team to derive technical solutions that are mission-specific. Our team has developed and is developing communications, decision support, and data analytic solutions for federal government customers addressing national security and food security challenges, and for industry customers pursuing innovative business solutions and technology roadmaps. VT-ARC seeks partners from the public and private sectors with domain knowledge in medical data access, management, and protection, and those familiar with the shortcomings of current approaches to medical data collection and analysis. We also seek partners with experience in IRB requirements and processes.
  • Technical area 1: High-fidelity, automated data collection
Datavant Nick Messina (nickmessina@datavant.com)
Additional: EmmaWyllie@datavant.com
San Francisco, CA Real World Data linkage. Spans all healthcare research domains to bring fit-for-purpose data to various research initiatives. Ecosystem of Real World Data to assess research applicability prior to acquiring data for use. NIH NCATS N3C, VA VINCI, and NIH All of Us: data linkage for various research initiatives across these programs. For TA3, we would like to bring researchers the ability to explore Real World Data, as well as data from health systems and government, to assess its usability for various healthcare research initiatives.
  • Technical area 3: Enhance data usability with intuitive exploration
Model Driven Solutions, Inc. Cory Casanave (cory-c@modeldriven.com)
Additional: ed-h@modeldriven.com
Morganton, NC Model Driven Solutions focuses on extracting simplicity from complexity using models, semantics, and context.

What problems do we address? Our world and the data describing it are complex and colored by stakeholders’ needs, contexts, and backgrounds. There are multiple dimensions describing anything or any event we encounter, yet no one individual can or needs to comprehend this multi-dimensional complexity. Models tend to be either overly complex to use or too simple to apply to multiple uses, resulting in stovepipes and data islands.

A key research focus is how to have both: simplified contextual views applicable to a problem or stakeholder, as well as cross-domain semantics that federate the contextual views. Contextual views are derived from and federated by common cross-domain semantic models but retain stakeholders’ simplified terminology and use cases. We focus on modeling business concepts and the relationships between business concepts and problem-centric data, with the goal of leveraging these relationships to provide business solutions.

We have contributed to multiple standards in the Object Management Group including “MDMI” (Model Driven Message Interchange), which is focused on the healthcare domain.
Our foundation of strong semantics, standards, and development of models and modeling languages provides a unique combination of skills that directly serves the needs of the Data Fabric. This expertise has already been applied to a foundational healthcare semantic model to serve as the basis for both the simplified views and the foundational fabric.

We anticipate teaming with subject matter experts and bio-informatics organizations to ground the theory with practice and practical applications. Our expertise in modeling, semantics, context, and stakeholder views complements existing data and systems to bring that data into the fabric and then make it relevant for specific use cases and users.

Teaming partners who have been and could be involved in the above efforts are Oracle, Mayo Clinic, MDIX, SmileCDR, HHS ACF, HHS SAMHSA, and the ONC.
Our strengths include the ability to combine “top down” theory with “bottom up” pragmatics and leverage the result for solutions. Our strength is in our ability to cross domains and relate domains with semantic integrity. Our weakness is that we have yet to fully apply these methods to the biomedical domains; we need to partner for this knowledge and to integrate these solutions into existing systems.

Our modeling and modeling language expertise has been applied to other domains such as finance, systems engineering, government, and defense, as well as to the above-referenced OMG MDMI standard. We have contributed to multiple standards, including UML (Unified Modeling Language), ODM (Ontology Definition Metamodel), and SysML (Systems Modeling Language).

We sponsor and contribute to open source efforts as well as provide software development of modeling, semantic, and model driven architecture tools and models for enterprise clients.
MDS is hoping to team with:
- Subject matter experts with domain-specific knowledge and experience in defining requirements for solutions that will help them produce better outcomes for their patients or populations.
- Informaticists who can help us extend or leverage the open-source semantic models for new concepts as required.
- Organizations with systems and sample data sources to prove and refine both semantic models and their mapping to real-world data.
  • Technical area 2: AI-assisted, multi-source data preparation and curation for analysis at scale
  • Technical area 3: Enhance data usability with intuitive exploration
Karlsgate Regina Gray (regina.gray@karlsgate.com)
Additional: brian.mullin@karlsgate.com
Budd Lake, NJ Karlsgate offers a revolutionary, privacy-first approach to data connectivity and delivery with automation at its core:

Innovative Privacy Protection:

Groundbreaking distributed cryptographic protocol enables Zero Trust data matching, eliminating PII propagation from all data flows and mitigating re-identification risks.

Actionable De-Identification allows you to de-identify data while still retaining its value and utility. It's a game-changer for those looking to comply with data privacy regulations and best practices without sacrificing the insights and analysis that quality data can provide.

Privacy-preserving data integration allows you to automatically bring data together from any number of data sources, match at an individual level, and consolidate into a single record for each individual, without any data source sharing any of the identifiable information.

Unparalleled Automation:

Karlsgate automates complex data processes while ensuring security and privacy. Karlsgate Identity Exchange (KIE) acts as an agent, facilitating communication with the privacy-enhancing protocol.

Our tools reduce and simplify the workload involved in preparing data to be connected. From data format detection and normalization to the ease of data refresh, the lengthy and costly steps required before data is connected are all automated.
Karlsgate addresses the first of the four technical area requirements - Automated data collection that lowers barriers to high-fidelity, timely, and automated data collection across labs and health record systems, in a privacy-protecting way.

Karlsgate technology was designed to provide a privacy-enhancing layer that is easily integrated into all data operations by embedding directly into existing workflows. Designed to solve the real-world complexity of data processing, our tools provide fully automated data connectivity - including normalization, standardization, robust matching, de-identification, de-duplication, and consolidation/integration. All of this is done with a Privacy-by-Design architecture, which allows data sources to maintain complete control of their PII or PHI, with no personal data ever leaving their environment and no residual data ever left with any data partner, third-party platform, or service provider. In short, we make tools that make data connectivity scalable, safe, and easy.
At our core, the Karlsgate team is made up of veteran data technologists and scientists. Each member has over 25 years of experience in data matching, data consolidation, master data management processing, and data analytics. We are experts in privacy technology and are pioneering new cryptographic protocols with innovations on our product being developed weekly.

Karlsgate is a client-centric and product-first organization, where innovative design comes before positioning and politics. We believe that truly great organizations deliver transformational solutions by operating with client service at the forefront of everything they do and fostering a culture of collaborative problem solving and rapid solution development.
Karlsgate’s focus is on privacy-preserving data connectivity. As such, we are perfectly positioned to deliver on the first of the four technical areas as well as the preparation and connectivity of data at scale for the second technical requirement.

We would like to identify teaming partners who match our rapid development cycles and who are innovative and collaborative in their approach to solution design and development.

The ideal partner for Karlsgate would be one focused on semantic analysis of attributes to prove k-anonymity.
  • Technical area 1: High-fidelity, automated data collection
RTI International Melissa McPheeters (mmcpheeters@rti.org)
Additional: rboyles@rti.org
RTP, NC • Health Research: Expertise spans quantitative and qualitative analysis methods, basic and applied research, health economics, epidemiology, genetics, and bioinformatics.
• Statistics: Biostatisticians conduct statistical analyses to support research programs across laboratory, social, and health sciences in single- and multi-site projects, with designs that ensure the quality, validity, and reliability of the research results.
• Genomics: Our experts study the application of biomarkers and tools and the impact of molecular events on health and disease; independent evaluation of tools and tests in clinical and health practice; and surveillance of conditions and health outcomes.
• Data Science: RTI approaches data science through a combination of thoughtful design, problem solving, and analytic rigor to extract, merge, analyze, and visualize historic and real-time data.
• Technology: RTI is a leader in developing computer technology and software applications, including those for wearables. Our computer and data management professionals specialize in developing efficient and user-friendly systems for research data acquisition, storage, and transfer; data management and access; study monitoring; and communications, offering expertise for projects that span outcome collection, clinical trials, and data collection from electronic health records.
• Dissemination/Translation: Working collaboratively with implementers, communities, advocacy organizations, and researchers to develop contextually and culturally appropriate approaches.
RTI offers multi-disciplinary expertise with advanced methods such as informatics, statistics, and data science, with a deep bench of staff to support our experts. We bring a breadth of domain expertise in clinical and public health across a wide variety of domains that are pivotal to our nation’s health, such as substance abuse, infectious diseases, and cardiovascular disease. Our staff are committed to objective and independent work in pursuit of our clients’ missions with science-based solutions, unbeholden to any other interests. RTI is adept at bringing the right experts to bear on a problem in rapid fashion, working with academia and industry alike. RTI currently leads 329 projects with a university on the team. RTI also currently leads projects where a contractor (i.e., not a university) is a subcontractor to RTI. RTI also serves on more projects as a subrecipient to universities, commercial for-profit entities, and not-for-profit organizations. RTI’s headquarters in Research Triangle Park is conveniently located minutes from Raleigh-Durham Airport and is accessible by public transportation. RTI owns or leases more than 1 million square feet of space in RTP, with laboratory, computer, and related facilities for all RTI programs. Class A office space, conference rooms, collaboration spaces, state-of-the-art audio/visual equipment, an excellent onsite restaurant, and extensive parking will all be accessible to ARPA-H staff.

RTI has a stellar reputation as a trusted partner with a track record of creating effective consortia (e.g., North Carolina Consortium for Optimizing Military Performance) and public-private partnerships (e.g., TB Alliance) that bridge academia, industry, non-profit, and public entities. RTI has managed more than $800M through OTAs, including executing more than 80 project-OTs since 2018. Our established processes, procedures, quality system, and expertise can dynamically meet the needs of ARPA-H in rapidly issuing and managing sub-OTs and supporting spokes in delivering for the government. RTI has identified common risk indicators and has established a dynamic continuous risk assessment and mitigation process that reduces impact on program and project performance.
Vendors with pre-developed open-source tools for data harmonization, data standardization, data linkage, and data de-identification.
  • Technical area 2: AI-assisted, multi-source data preparation and curation for analysis at scale
Ginkgo Bioworks Sbuckhoutwhite@ginkgobioworks.com
Additional: dbayley@ginkgobioworks.com
Boston, MA Ginkgo Bioworks is currently developing new capabilities related to the automation of mammalian laboratory workflows. Further, we are interested in the development of tools that can synthesize data across experimental data types. Ginkgo Bioworks provides a unique capability for the development of automated tools and data synthesis methodologies. Our data-connected, high-throughput workflows allow for rapid prototyping of new tools and methodologies related to data management and synthesis. Regarding distributed laboratory networks, Ginkgo represents the industry laboratory model, with the benefit of experience in high-throughput work related to the development of cancer therapeutics. Further, our extensive existing data collections represent a unique prototyping environment for the ingest and synthesis of multiple data types. Ginkgo Bioworks is the leading horizontal platform for cell programming. Ginkgo innovates in pharmaceuticals and other industries by deploying reusable, standardized Design–Build–Test–Learn workflows in high-throughput, automated laboratories (>550K sq. ft.). Its proprietary sequence database has ~20x more gene editors than GenBank. Ginkgo has >100 programs with >40 USG and commercial customers, and is a current and past performer on >20 DoD, DARPA, IARPA, NIH, and CDC programs. From a data perspective, we have ~2 billion protein sequences in our proprietary DNA database and 5 million+ enzyme designs built and tested in our foundry. Applied to therapeutic development, we have ~720,000 data points generated in a single recent CAR-T experiment and 100,000+ AAV capsids partially or extensively characterized for use in gene therapy. Ginkgo is looking for partners interested in the ability to prototype their tools using high-throughput experimental data. We are interested in collaborators who can prototype their tools on our breadth of foundry capabilities and on our breadth of existing tracked data. Specific TA1 automation capabilities desired include flow cytometry, high-content imaging, and multi-omics data. TA2 data synthesis capabilities of interest include analytical chemistry tools as well as omics data.
  • Technical area 1: High-fidelity, automated data collection
Resilience Analytics LLC Josh Trump (josh@resilienceanalytics.com)
Additional: eugene@resilienceanalytics.com
Manassas, VA We are focusing on the use of graph databases, AI, and LLMs to support stress testing of complex systems to enhance resilience. We are working on integrating social and physical science in connecting data with the missions of decision makers. Current projects include the integration of LLMs and AI to visualize the critical functions and needs of decision makers and to guide AI and ML. We are a small business. We have projects with DOE and FDA, as well as with industry. We can connect LLMs and AI with decision analytics to address data governance and medical/drug discovery processes.
  • Technical area 3: Enhance data usability with intuitive exploration
Duality Technologies Kurt Rohloff (krohloff@dualitytech.com)
Additional: info@dualitytech.com
Hoboken, NJ Duality Technologies develops software-based capabilities to securely share highly sensitive PII in a privacy-protected manner. We came from the DARPA community, where we drove the development of privacy technologies for applied privacy-protected data collaboration using homomorphic encryption. Duality Technologies has experience supporting data collaboration, and we have extensive success over multiple decades supporting ARPA-style programs that enable secure, privacy-protected collaboration. Duality has strengths in software engineering, data collaboration environments, and applied privacy technologies. We have supported past projects for the CDC and NIH related to oncology and rare cancer research collaborations. We are looking to support team members where we can bring value in secure data collaboration capabilities. We can also support technology transition.
  • Technical area 1: High-fidelity, automated data collection
MuleSoft Darrell Lee (darrell.lee@salesforce.com)
Additional: avneet.bakshi@salesforce.com
McLean, VA In order to make data FAIR (Findable, Accessible, Interoperable, Reusable) and easier to share and reuse, the MuleSoft platform can be leveraged to enable an integration approach that abstracts and insulates researchers from the arduous task of automating the ingest and extraction of data, including EHR data, via out-of-the-box connectors. Enabling researchers and scientists to seamlessly access, ingest, and expose data with embedded governance and security will be key in facilitating ARPA-H programs and initiatives. MuleSoft is the leader in the Enterprise Integration and API Management space, working with more than 200 government customers including NIH, FDA, CDC, CMS, and other HHS operating divisions. MuleSoft is ideally suited to support the ARPA-H BDF Toolbox with a proven, highly scalable, and secure solution to enable automated data collection, including laboratory and research data capture, as well as out-of-the-box connectors to easily integrate EHR data.
MuleSoft offers data lifecycle management capabilities, including validation of data (i.e., de-identification and deduplication), enrichment, transformation, and harmonization. Our platform can be leveraged as a data integration layer across multiple cloud providers as well as on-prem systems.
MuleSoft supports the full API lifecycle, from designing, cataloging, and implementing APIs and integrations to testing, deploying, securing, and analyzing/monitoring on a single platform. Our embedded API Management allows you to manage, secure, and govern both MuleSoft and non-MuleSoft APIs. For years, MuleSoft has been a Gartner leader for both Enterprise Integration Platform as a Service and API Management functionalities.
MuleSoft is the leader in the Enterprise Integration Platform and API Management space, working with more than 200 government customers including many HHS OpDivs such as NIH, FDA, CDC, and CMS, among others. MuleSoft provides the most widely used integration platform (40+ ATOs from US Federal Agencies) for connecting enterprise applications in the cloud and on-premises. MuleSoft is delivered as a single platform (Anypoint Platform) and built on proven open-source technology for the fastest, most reliable on-premises and cloud integration without vendor lock-in. MuleSoft can play an integral role in meeting ARPA-H’s requirements as stated in the BDF Toolbox RFP. MuleSoft’s approach to alleviating challenges around data integration, data usability, and data standardization is reflected in the capabilities that it brings within Integration Platform as a Service and a full-lifecycle API Management platform. We welcome the opportunity to work closely with teaming partners interested in providing ARPA-H with a proven, modern approach to address the Biomedical Data Fabric requirements.
  • Technical area 1: High-fidelity, automated data collection
Amazon Web Services (AWS) Tyler Willis (tylrwill@amazon.com)
Additional: aws-arpah-team@amazon.com
Arlington, VA AWS is unifying two distinct data architecture approaches, data mesh and data fabric. Data Mesh advocates for a decentralized and domain-specific data architecture that promotes agility and empowers data producers and consumers to harness the full value of organizational data. Data Fabric, in contrast, emphasizes creating a unified environment for disparate systems to enhance data value. Thus, a modern biomedical data fabric merges the intra-organizational or intra-domain goal of the data fabric with the interconnectivity and interoperability facilitated by a data mesh. For example, the automation capabilities of a data fabric can accelerate and enhance data preparation stages within a domain for publication to the data mesh, resulting in more accurate data harmonization and machine learning models. AWS is also investing heavily in the use of ML tools, such as large language models. We provide customers a straightforward way to find and access high-performing foundation models (FMs) that give outstanding results and are best suited for their purposes, to seamlessly integrate applications without having to manage huge clusters of infrastructure or incur large costs, and to take base FMs and build differentiated apps using their own data, while ensuring complete protection, security, privacy, and control over how data is shared and used. AWS seeks to support TA1, TA2, and TA3. AWS is focused on helping customers access data from various federated sources; utilize consistent application programming interfaces (APIs); build curation services for semantic interoperability and annotation services for enriched datasets; and create intuitive exploration tools for persons of various technical and scientific backgrounds. At AWS we refer to this as the Modern Data Community. Further, our modern data architecture uses machine learning and automation for end-to-end integration of various environments and data pipelines by building a technology layer over customers' underlying infrastructure that cohesively integrates and presents data to technical and non-technical users. Thus, we build a biomedical data fabric by integrating the Modern Data Community with ML-powered data services to eliminate low-value, effort-intensive tasks from data teams. Further, we free capacity to focus on high-value, differentiating work, and enable researchers to leverage vast amounts of data without the need for specialist skills. These capabilities empower the research, healthcare, and life science communities to more rapidly reduce costs and deliver innovative solutions that save lives. AWS seeks to support TA1, TA2, and TA3. AWS has significant strengths in building scalable, accessible, and well-managed data systems, addressing key data management principles such as easy data access, rapid data growth, diverse data sources, and efficient data governance. We are experts in reducing friction in data access, empowering users with varying technical expertise, and fostering a culture of data sharing and ownership. Equally, we leverage automation and technology stacks to accelerate connectivity to data sources. We enable decentralized data ownership to accelerate data access and insights, as data is managed by those who know its value, while also centralizing data management, fostering a unified view.
Further, AWS has market-leading ML capabilities to employ native language tooling; automate the process of data ingestion, cleaning, and transformation; identify patterns and trends in data; and automate the selection of models to fit desired outcomes, tuning models, and detecting drift and bias. Additionally, AWS builds collaborative workspaces with various technical and functional benefits to improve productivity, efficiency, and collaboration. This includes ensuring data is shared between producers and consumers in this environment in a secure way, including aggregated and anonymized user information to protect user privacy, while providing non-personally identifiable information. AWS seeks to support TA1, TA2, and TA3. AWS is seeking systems integrators and independent software vendor partners experienced in: 1) human-centered and product-centric perspectives to tackle the hurdles arising from the heterogeneous nature of contemporary data sources, 2) decentralized and domain-specific data architecture that promotes agility and empowers data producers and consumers to harness the full value of organizational data, 3) cohesive and metadata-driven strategy to connect various data sources under a single virtual layer that is oftentimes managed by a single governing entity to ease governance, enhance access, and support integration, and 4) preparing data and building, training, and deploying machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows. We seek partners that, like AWS, are pioneering scalable, accessible, and well-managed data systems, addressing key data management principles such as easy data access, rapid data growth, diverse data sources, efficient data governance, and innovative data analysis. We also seek partners that understand that decentralized data architectures can be complex because they place significant focus on organizational and people/process change around data ownership, quality, and compliance.
  • Technical area 1: High-fidelity, automated data collection
Lifebit Biotech Inc. Nate Raine, Global Director of Federal Health & Life Sciences (nate@lifebit.ai)
Additional: hadley@lifebit.ai
New York, New York Lifebit's federated technology serves as a versatile tool for Federal Health agencies, offering unparalleled flexibility in data integration and analytics. It is designed to be data-agnostic, meaning it can seamlessly interact with various types of health data. This technology acts as a catalyst for establishing robust connectivity across diverse health data ecosystems. By facilitating data interoperability, it empowers these agencies to conduct more comprehensive and efficient analyses, thereby enhancing decision-making and ultimately patient care. Overall, Lifebit aims to be a linchpin in optimizing the health data landscape for federal organizations. Lifebit offers a suite of capabilities that can significantly enhance the ARPA-H BDF Toolbox, making us an ideal partner for collaborative initiatives. Our competencies include:

1. EHR Integration and Harmonization: We excel at integrating Electronic Health Records (EHRs) and harmonizing them across formats for seamless interoperability.
2. Federated Secure Data Enclave: Our federated technology ensures the secure storage and transmission of sensitive health data across different platforms.
3. Public Surveillance: Our tools can facilitate data collection and analysis for monitoring public health trends and emergencies.
4. Population Genomics: We offer robust analytics to unlock insights from genomic data at a population level.
5. Patient Registries: We create and manage centralized databases that enable detailed patient tracking and outcomes measurement.
6. Data Management: Our systems simplify the lifecycle management of data, from collection to analytics to storage.
7. Secure Collaboration: Our platform allows secure, collaborative work environments for researchers and healthcare providers.

Trusted by global leaders, Lifebit equips Federal Health Agencies with the necessary tools to turn health data into actionable insights. By ensuring secure, interoperable, and usable data, we help pave the way for groundbreaking therapeutics and improved patient outcomes.
Lifebit stands as a frontrunner in the field of precision medicine, consistently meeting and exceeding benchmarks set by national healthcare initiatives, academic research bodies, and nonprofit health organizations. Originating from the minds behind Nextflow, Lifebit brings unparalleled expertise to the table. We have successfully harmonized an extensive range of datasets, including Real-World Evidence, Real-World Data, and clinical trial data, aggregating them from diverse sources into millions of harmonized sets. Our proficiency lies not just in data integration but also in transforming this complex data into actionable insights. Overall, Lifebit combines technological innovation with deep healthcare domain knowledge to deliver solutions that drive meaningful advances in both research and patient care. Notable clients and endeavors include Flatiron Health, the Orien Network, Genomics England, and the Jackson Laboratory, among other global life sciences organizations and federal agencies. We aim to support multi-source data preparation and curation for analysis at scale. We are seeking partners to complement our expertise in data harmonization and platform infrastructure in areas such as machine learning and subject matter expertise within the cancer healthcare ecosystem.
  • Technical area 2: AI-assisted, multi-source data preparation and curation for analysis at scale
CorrDyn, Inc. Ross Katz (ross.katz@corrdyn.com)
Additional: james.winegar@corrdyn.com
Nashville, TN We develop custom data infrastructure for biotech firms to get the most value from their data. Our data infrastructure includes data pipelines, data lakes, data warehouses, platforms for analysis and machine learning model development, custom frontend applications, machine learning operations, and natural language processing-based research automation tools. We excel at partnering with domain experts to understand requirements and at delivering data infrastructure and tools that meet their needs while embodying best practices in data architecture, maximizing performance, and minimizing cost. We have developed public-facing data tools for researchers outside healthcare in the past. We bring a breadth of healthcare and biotech experience, but we are not medical researchers. Our primary experience is in developing data infrastructure for biotech firms, especially high-throughput data pipelines, low-latency reporting, machine learning models, and machine learning operations platforms. We also developed the data infrastructure for the National Labor Exchange (NLx) https://nlxresearchhub.org/, which solves a similar problem at a smaller scale. Our leadership brings data science, data architecture, and machine learning operations expertise, with enough healthcare and biotech experience to bridge the gap from domain experts to scalable data solutions. We are looking for partners who have medical research expertise, who can define requirements for which datasets need to be targeted and why, and who can serve as the primary drafters of the proposal. We bring extensive expertise in requirements gathering, data architecture development, and data solution development, along with the ability to understand scientific context, draft high-quality data solution proposals, and deliver solutions within budget and time constraints. We believe we could support technical areas 1-3, even though we are only encouraged to select one.
  • Technical area 2: AI-assisted, multi-source data preparation and curation for analysis at scale
Palantir Technologies Inc. Philomena Fritz (pfritz@palantir.com)
Additional: dkucz@palantir.com
Headquarters in Denver, CO. US Government Office in Washington, DC. Research at Palantir focuses on developing and improving innovative software tools and methodologies for working with data on a mass scale. In particular, our published work in health informatics is focused on navigating the challenges of working with large-scale datasets, particularly real-world data (RWD), and improving methodologies for gleaning insights from such datasets, including by leveraging emerging technologies such as artificial intelligence (AI)/machine learning (ML). Overall, this work aims to help researchers and informaticians better leverage large-scale data for applications such as population health, identifying therapeutic targets, drug discovery, evaluating the impact of therapeutics, and more. Most recently, our team has published papers on topics such as:

* Applying algorithms to electronic health records (EHR) data to infer pregnancy timing for more accurate maternal health research
* A novel method for resolving EHR data quality issues for clinical encounters in a large scale dataset
* Improving transparency of AI in phenotyping for Long COVID research by demonstrating reproducible machine learning computable phenotypes
* Navigating data quality challenges for multisite RWD databases
* An EHR-based target trial evaluating the effect of Nirmatrelvir/Ritonavir (Paxlovid) on hospitalization among adults with COVID-19
Palantir is a technology partner that offers a commercial software solution—the Foundry configuration of the Palantir Platform (“Foundry”)—that integrates and makes data usable for research on the mass scale required for ARPA-H’s BDF Toolbox. Foundry provides flexible data integration, harmonization, mapping, analysis, and artificial intelligence/machine learning capabilities that are used by researchers across the federal health landscape and beyond. For this project, we envision providing a data infrastructure that offers ultimate interoperability with any system and solution required for biomedical research. This includes interoperating with the innovations and/or novel tools of our proposed partners. Palantir brings extensive experience configuring and leveraging our software solution—Foundry—to support critical health research missions across the public and private sectors. Most relevant to BDF, Palantir collaborated with NCATS to configure Foundry as the backing infrastructure for the National COVID Cohort Collaborative (N3C) Data Enclave, which integrates EHR data of COVID-19 patients from 77 provider sites and improves the accessibility and usability of data for medical research on COVID-19. N3C provides a secure, national resource of EHR and related data that can serve as a foundation and approach for future multi-stakeholder, multi-system medical research efforts. N3C provides an expanding data and research asset comprised of harmonized data representing over 20 million patients, powering team science among a community of more than 3,700 researchers from 320 institutions. In addition to the unique technical experience of standing up N3C, Palantir is a thought leader in AI with experience applying AI and ML to improve the speed and accuracy of research to prevent and treat diseases more effectively. We anticipate applying our expertise to achieve high-impact results for ARPA-H with cancer and other diseases. Palantir has a rich history of collaboration with a wide range of government and commercial partners and is open to different partnering scenarios. We are seeking partners with clinical and modeling expertise and/or entities that can offer novel approaches and methodologies and hard-to-access data that we can operationalize in our highly interoperable data infrastructure. In addition, we seek partners from research institutions such as academic medical centers that can help provide academic research experience and framework for rigorous investigation. We are also open to partnerships within the commercial healthcare sector, and believe that a combination of all of the above will be required to create a comprehensive approach to the ARPA-H BDF Toolbox effort.
  • Technical area 2: AI-assisted, multi-source data preparation and curation for analysis at scale
L7 Informatics Oana Lungu (oana.lungu@l7informatics.com)
Additional: joshua.boyles@l7informatics.com
Austin, TX L7 Informatics is focused on transforming traditional laboratory operations by breaking down information silos in the Diagnostics, Therapeutics, and Research spaces. Our software, L7|ESP, provides solutions through a single data and process automation platform. We enable connectivity throughout the laboratory, from unified data models to instrument integrations, automated workflows, and sample management. L7 Informatics software provides tools for laboratories to break down information silos by connecting data, including complex instrument data, sample management, data models, and ELN, all connected with an automated workflow engine. We would like to work with teaming partners on how to best utilize these tools in the BDF Toolbox project. Our strength comes from our experience in helping laboratories automate their complex scientific processes to create streamlined data flows and move from pen-and-paper systems to digitalization. We are looking for potential teaming partners who have expertise in complex data capture, such as natural language processing. We are also seeking partners who are interested in providing complex inputs, including data, processes, instrumentation, and SOPs, for tedious laboratory processes, and who would like to partner in the laboratory automation space.
  • Technical area 1: High-fidelity, automated data collection
Global Action Alliance Shazveen Saleem (ssaleem@globalactionalliance.net)
Additional: halaigh@globalactionalliance.net
Ashburn, Virginia Our research areas are in the field of life science. We focus on ensuring that all life science data can be categorized accurately, at the highest quality, by disease family. Our technical capabilities span data collection through to visualization. We are currently agnostic to disease and are confident handling any and all types of data. Within the WISE-R platform, the process of Data Aggregation, Cleaning, Analysis, Insights, and Visualization (ACIAV) unfolds seamlessly to empower data-driven decision-making in the biomedical field.
Data Aggregation: WISE-R aggregates data from diverse sources, consolidating patient records, clinical trial data, research publications, and more. This unified data repository simplifies the accessibility of critical information, serving as the foundation for subsequent analysis.
Data Cleaning: Clean, high-quality data is essential for reliable insights. WISE-R employs advanced algorithms and validation techniques to ensure data cleanliness by identifying and rectifying inconsistencies, duplicates, and errors.
Data Analysis: WISE-R's AI-powered analytics dive deep into the data, uncovering hidden patterns, trends, and correlations. These analyses provide invaluable insights across various biomedical domains, from optimizing clinical trial protocols to identifying potential drug candidates.
Insights Generation: WISE-R doesn't stop at analysis; it generates actionable insights. Researchers and healthcare professionals can leverage these findings to make informed decisions, advancing healthcare and research.
Data Visualization: Complex data is transformed into interactive visualizations within WISE-R. These visuals, including charts, graphs, and dashboards, simplify the communication of findings, enabling stakeholders to grasp information swiftly.
Our strengths and experience are within the realm of data collection, aggregation, and cleaning (AI and natural language processing), followed by analysis and insight generation. Furthermore, we have a very talented UI/UX team for the design of dashboards. We are actively seeking disease domain expertise in potential teaming partners. If you have specialized knowledge and experience in diseases, whether in epidemiology, diagnostics, treatment, or patient care, we invite you to collaborate with us. This expertise is invaluable in shaping the development of data-driven solutions that can have a profound impact on disease management, prevention, and research. In addition to seeking disease domain expertise, we are actively looking for individuals and organizations interested in participating in user testing. Valuable input and feedback will play a critical role in refining and validating our offerings to ensure they meet the highest standards of accuracy, usability, and effectiveness.
  • Technical area 1: High-fidelity, automated data collection
HealthVerity Jason Meyer (jmeyer@healthverity.com)
Additional: dkwasny@healthverity.com
Philadelphia, PA HealthVerity leverages Privacy Preserving Record Linkage (PPRL) technology to support de-identification and linkage of patient-level Real World Data in a HIPAA-compliant fashion to support public health. HealthVerity's Privacy Preserving Record Linkage (PPRL) technology can help ARPA-H link data and break down data silos. Additionally, HealthVerity hosts the largest fully interoperable RWD ecosystem in the US, covering over 330M patients across over 75 unique data sources. HealthVerity has strong past performance supporting PPRL- and RWD-focused projects across NIH, CDC, CMS, FDA, and NSF. HealthVerity's PPRL technology has a FedRAMP Moderate ATO. HealthVerity supports data linkages across siloed datasets to provide a longitudinal and interoperable view of patients across healthcare and consumer data.
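
For context, a minimal sketch of the general idea behind privacy-preserving record linkage as described above (illustrative only, not HealthVerity's proprietary implementation; the shared secret, normalization rules, and field names are assumptions): each contributing site normalizes patient identifiers and converts them into keyed-hash tokens locally, so records can be joined on tokens across silos without raw identifiers ever leaving the source.

    # Illustrative PPRL sketch (Python); not HealthVerity's implementation.
    # The shared secret, normalization rules, and field names are assumptions.
    import hashlib
    import hmac

    SHARED_SECRET = b"example-secret-distributed-out-of-band"  # hypothetical

    def tokenize(first, last, dob):
        """Hash normalized identifiers into an irreversible linkage token."""
        normalized = f"{first.strip().lower()}|{last.strip().lower()}|{dob.strip()}"
        return hmac.new(SHARED_SECRET, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

    # Each site tokenizes locally and shares only (token, payload) pairs.
    site_a = {tokenize("Ada", "Lovelace", "1815-12-10"): {"claims": 12}}
    site_b = {tokenize(" ada", "LOVELACE", "1815-12-10"): {"ehr_visits": 3}}

    # A linkage service joins on tokens without ever seeing names or birth dates.
    linked = {token: {**payload, **site_b.get(token, {})} for token, payload in site_a.items()}
    print(linked)
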
  • Technical area 3: Enhance data usability with intuitive exploration
Velsera Nadav Weinberg (nadav.weinberg@velsera.com)
Additional: tony.patelunas@velsera.com
Boston, MA Velsera is focused on accelerating precision medicine from R&D through clinical applications. Our software ecosystems support scalable, reproducible, and interoperable low-code to full-code analysis environments; secure multi-source / multi-cloud data aggregation and management; sample track-and-trace, audit, and data aggregation; genomic analysis and reporting for clinical decision support. Today, our focus is on secure, managed access to data and democratizing discoverability and analysis of multi-omic (genomic, transcriptomic, proteomic, etc), clinical, and imaging (light and radiographic) data. With >10 years of experience driving the evolution of NIH data ecosystems, Velsera brings innovative data management and interoperability experience. As an active participant in the GA4GH and CWL communities we bring experience developing open-source standards for biomedical data management and analysis.

We also have an experienced researcher education and engagement team that works with the breadth of ARPA-H BDF archetypes (academic PIs, early career researchers, pharmaceutical scientists, government researchers, government administrators, etc) to identify use cases, test products, and generate rapid adoption.
Velsera is a science- and medicine-first company, with >30% of our company holding terminal degrees. Our focus has been on driving discovery and democratization of biomedical data for >10 years with an eye on what researchers will need in the future. We develop and drive standards for the Global Alliance for Genomics and Health (GA4GH) and the Common Workflow Language that support our analysis platforms for the National Cancer Institute’s Cancer Research Data Commons (Cancer Genomics Cloud), National Heart, Lung, and Blood Institute’s BioData Catalyst (BioData Catalyst Powered by Seven Bridges), and NIH Common Fund’s Gabriella Miller Kids First (Cavatica). Additionally, we've built data management and analysis ecosystems for leading patient advocacy groups and top pharmaceutical companies. We are seeking partners with deep expertise in AI/ML development and partners with innovative approaches to developing interoperable data models.
  • Technical area 3: Enhance data usability with intuitive exploration
Clairity, Inc. Rosemary Shults (rshults@clairity.com)
Additional: civers@clairity.com
Austin, TX Clairity is developing a machine-learning algorithm that unlocks pixel-level image data embedded in screening mammograms to predict each patient’s risk of developing breast cancer within the next 5 years. Clairity will be on the frontline in making data available to a broad range of stakeholders who will ultimately adopt the technology and optimize its use in clinical practice to improve outcomes. Tools include: 1) the development of a centralized data repository that will bring together disparate datasets for use by a broad range of stakeholders, including patients, physicians, researchers, care guideline organizations, and payers; 2) creation of novel image-based tools that will radically enhance those activities and enable completely new analyses and insights; 3) development of AI-generated narrative data exploration
  • Technical area 3: Enhance data usability with intuitive exploration
Cairn Biosciences Inc. Amin Fehri (afehri@cairnbio.com)
Additional: sribeiro@cairnbio.com
San Francisco, CA Cairn Biosciences has built a scalable drug discovery platform that leverages the power of live cell imaging to deliver translationally relevant simulations of dynamic disease biology. Our platform is applicable across therapeutic areas and our team’s current focus is in oncology.
We leverage innovations in cell-based modeling, microscopy, and machine learning to build and learn from experimental simulations of live disease biology. We automate the collection and curation of multiscale live cell microscopy datasets that encode time-varying disease signaling and phenotypic profiles that underpin therapeutic responses. Our current data collection priority is to scale our datasets to enable a broader map of the functional consequences of cancer treatment. Our current data curation priorities include further scaling and standardization of our multiscale cell feature, analysis and storage framework and its extension to enable integration of additional data sources, e.g. sequencing, and incorporation of new capabilities, e.g. knowledge graphs.
Our multidisciplinary team has deep expertise and significant prior accomplishments in the fields of live-cell assay data generation, management and analysis, and oncology drug discovery. Our extensive experience encompasses the following skills that align with TA1 and TA2:
- Data knowledge in cell biology and oncology drug development.
- Data collection: cell line engineering and advanced cell-based disease models; fluorescent microscopy and live-cell microscopy; and capture of high-fidelity live-cell imaging data and metadata and streamlined ingestion in computational systems.
- Our computational team has successfully tackled multiple challenges of working with large-scale live-cell imaging datasets by establishing data standards and scalable workflows, and by leveraging cutting-edge advances in Machine Learning (ML), Computer Vision (CV) and High Performance Computing (HPC) technologies.
Our team has unique experience in the generation and analysis of high-throughput microscopy datasets for oncology drug discovery. We are also experienced performers on multiple NIH SBIR contracts. We have significant strengths and deep, hands-on expertise in the following areas:
- Software: software engineering, machine learning and computer vision research and development for high-content imaging (HCI) analysis, data engineering to streamline data and metadata structuring, curation, processing and saving from the bench to cloud databases, web-based dashboards and applications, familiarity with capabilities and limitations of proprietary commercial microscope and HCI data acquisition and analysis packages;
- Hardware: microscopes (confocal, widefield, HCI), cameras, instrumentation, microfluidics, instrument automation and control;
- Wetware: cell-based disease modeling, live-cell imaging and immunofluorescence microscopy;
- Drug discovery and oncology: extensive cell-based assay development experience for drug discovery with a focus on oncology.
In the context of TA2, we are actively seeking to team up with partners that have:
- Expertise in open-source data standards to enhance the interoperability within the microscopy ecosystem
- Expertise in knowledge graphs, biomedical ontologies and system modeling for integration of single-cell phenotypic data with genetic data
- Expertise in live cell imaging data generation focused on any disease modality

In the context of TA1, we are also interested in teaming up with partners seeking sources of live-cell/HCI data.
  • Technical area 2: AI-assisted, multi-source data preparation and curation for analysis at scale
OHSU Emek Demir (demire@ohsu.edu)
Additional: saulm@ohsu.edu
Portland, OR We are working towards building an AI-assisted curation system for regular and multiplexed pathology imaging modalities and applying it to early cancer detection. We have extensive capabilities in imaging pipelines and deep learning technologies, and we have the samples and capacity to produce large imaging datasets. We also have deep expertise in data standardization, integration, and the construction of data centers and portals for biomedical data. Our focus is currently on gland segmentation, annotation, and classification for ductal cancers, with an emphasis on early cancer detection. We have traditionally been very strong in different imaging modalities and systems biology of cancer. We have substantial strength, experience, and resources in biological data science, cancer early detection, and precision oncology. We are part of a large early detection collaboration network with Stanford, CRUK, and others. We are looking for partners that can collaborate on advanced AI-assisted curation aspects, including smart assistants, detection of uncertainty, and integration of foundation models into the UX. We are also interested in integration of imaging and other single-cell modalities, as well as standardization of single-cell data.
  • Technical area 2: AI-assisted, multi-source data preparation and curation for analysis at scale
Clarivate Mark Hyer (mark.hyer@clarivate.com)
Additional: faith.garrison@clarivate.com
Alexandria, VA Clarivate data and analytics teams help the pharmaceutical/medical device industries:
- Understand the right R&D strategy to best drive innovation
- Plan clinical trials to best address patient needs
- Improve the probability and speed of regulatory approvals
- Align IP strategy with R&D strategy
- Pursue the right portfolio strategy to maximize investment returns
- Build the right commercial strategy to gain market access and drive adoption
Clarivate Real-World Data Product is a comprehensive and integrated solution, providing a robust and longitudinal view of patient care and ensuring you have recent, reliable data reflective of the prevalence of disease and treatment utilization.

Track post-launch drug performance against patient outcomes to make timely changes to your strategy and improve drug adoption and adherence in target patient populations.

- 46B+ medical claims with a view into 300M+ U.S. lives over 3 years
- 32B+ unique patient claims and EHR records
- 120M+ electronic health records
- 1B+ patients across 1M+ HCPs accessible for direct messaging at the point of care
- 1,000+ distinct specialty drug formulations captured for valuable perspective into the treatment journeys of over 12M patients with chronic, complex, and rare diseases
- 2M+ practitioners and their affiliations across 3,200+ health systems
- 100% of population enrollment data by coverage type
- Deep biopharma experience: accelerate development with consultation from our team of subject matter experts in epidemiology, pharmacology, and specific therapy areas.
- Powerful predictive analytics: capture reliable insights by incorporating predictive analytics and AI models.
- Data science excellence: obtain higher-quality data harmonization across your internal and external data sources from our team of highly experienced data scientists.
- Ontologies API: 22 life sciences controlled vocabularies.
Connected datasets:
- Preclinical and Drug Discovery
- Systems Biology
- Clinical Trials Optimization
- Market landscape intelligence: Drugs, Deals, Ontologies, Drug Timeline and Success Rates
- Generics: Active Pharmaceutical Ingredients Manufacturers
- Disease Surveillance
- Regulatory / Product Approvals / CMC
- Disease Landscape and Forecast, Epidemiology
- Patents, Trademarks, Case Law
- Published Material (Web of Science)

Medtech data repository includes claims / EHR, healthcare provider affiliations, purchase order data, hospital device usage, epidemiology and more.
Clarivate is looking for a prime contractor to work with, as well as teaming partners with expertise in data aggregation, AI, and working with ontologies.
  • Technical area 2: AI-assisted, multi-source data preparation and curation for analysis at scale
BlueHalo Erin Gibson (erin.gibson@bluehalo.com)
Additional: elliot.otchet@bluehalo.com
Arlington, VA We provide custom software development, specializing in secure, adaptive integrations and cloud-scalable solutions; our research focus areas evolve from close collaborations with our research partners/clients, for example:
A highly specialized solution that aggregates and federates large datasets to spearhead cancer research analytics
Cardiac research development support
Development of software solutions for remote patient monitoring of cognitive and physical readiness in patients including but not limited to patients with cancer and patients recovering from traumatic brain injury
An adaptive, flexible solution that will overcome data silos and fetch and aggregate data from multiple diverse data sources, creating/supporting fast, customizable, easy queries to locate and return valuable information in a computable format.
Reference implementation of FHIR that supports Query, Bulk Data, and Subscriptions
Next-generation approach that harnesses the power of large language models (LLMs) and knowledge bases (KBs) to democratize multidimensional diverse biomedical data access and empower a wider range of users.
Investigating trade-offs to manage the cost of infrastructure for running LLMs via informed selection of dimensionality reduction methods, specialized LLMs, and cloud-based solutions.
Expertise and past performance for innovation in real-time data aggregation across billions of data elements per day
Experience and past performance working with the US intel community to counter terrorism and with large hospitals to detect and deter cybersecurity threats
Member/Leader of healthcare interoperability community, driving innovation through open-source software including contributions to our award-winning open source FHIR implementation, Cancer Data Aggregator, and NIH’s AnVIL via contributions to TERRA
Expertise in big data querying, use of large language models in explainable machine learning, and advanced geometric multi-resolution analysis of high-dimensional omics data
Our nxtHeath experience brings secure, GovCloud, private cloud, and edge device end-to-end processing of PHI to train AI models on specimen data to advise clinicians of possible health concerns.
Shares our drive for healthcare innovation - to not just collect data, but to intelligently analyze data to support the overall health of the public
Cloud providers to enable data lake activities needed to store, process and update federated data
Developer Outreach to help facilitate the usage of our software that will enable integration at the data supplier
Data anonymization analysts and scientists to help develop algorithms that ensure that there is sufficient anonymization occurring on all data that egresses the data lake or is used in AI training.
Clinical / Research fellows/experts/SMEs
  • Technical area 1: High-fidelity, automated data collection
  • Technical area 2: AI-assisted, multi-source data preparation and curation for analysis at scale
  • Technical area 3: Enhance data usability with intuitive exploration
Platypus Leigh McCormack (leigh@platypus.health)
Additional: corey.todaro@platypus.health
Chattanooga, Tennessee Our mission is to connect diverse data, representative algorithms, and unique data scientists across the industry to collaborate to build effective, unbiased analytics that improve health outcomes, increase quality and safety, and lower costs for diverse populations. Platypus is an AI enablement platform that removes healthcare organizations' most common barriers to consuming and creating equitable data and analytics. We leverage privacy-preserving technology to address organizations' need for robust data to move towards equitable algorithms. This underlying technology allows organizations to collaborate, create, and consume without exposing data or algorithmic intellectual property. Our strengths are in working to improve data science efficiencies and increase the accessibility of equitable algorithms. The Platypus team comprises individuals with strong data science and healthcare backgrounds. Our operational technology allows organizations to execute and update algorithms in their data environment without exposing their data to the algorithm owner. We are currently working with multiple types of organizations, data structures, and algorithms. Platypus is focused on data and knowledge sharing to improve algorithmic insights. We are interested in augmenting our capabilities with teaming partners that can provide expertise in linking individual-level data across sources. Additionally, we are looking to be an asset for teaming partners who are interested in ways to enhance their algorithms without having to move, store, or be exposed to external data.
  • Technical area 2: AI-assisted, multi-source data preparation and curation for analysis at scale
SoftServe Kimberly Schaefer RN, BSN and Joby Kennedy (kscha@softserveinc.com)
Additional: jkenne@softserveinc.com
Austin, TX SoftServe is an IT consulting and digital services provider with 30+ years of achieving meaningful outcomes for our clients as advisors, engineers, and designers. We solve business challenges with innovative technology solutions on a global scale. We take pride in our 7 Centers of Excellence, which include Intelligent Enterprises, Innovation, Strategy, Digital Platforms, and Research & Development. Our R&D team has a strong focus on healthcare and life science solutions and comprises 100+ cross-disciplinary engineers, researchers, and PhDs who work with cutting-edge technology backed by scientific publications. SoftServe experts design advanced deep learning pipelines, on-device model compression, and optimal training approaches. We deliver cutting-edge AI science to our clients’ production projects, ranging from analysis of network infrastructure stability to applying generative AI to find novel targets for therapeutics and generate novel molecular structures with desired properties. Our cross-disciplinary experience helps us to overcome complex industry challenges in life science, computational physics, and human-computer interaction. We have commercial verticals dedicated to Healthcare, Life Science, and Health Tech that are supported by industry domain experts to assist our clients in identifying innovative ideas with the greatest probability of success. SoftServe transforms these ideas into reality with a proof of concept (PoC), helps define the solution vision, and builds the foundation of the implementation roadmap. SoftServe empowers our clients to find the most promising use cases for commercialization. With our clients’ strategic initiatives in mind, we identify the technologies and use cases with the greatest benefit for their business, spanning UX, risk and time management, and quality assurance. We leverage our user research expertise for improved focus group design and early concept validation. We run the experiments, collect data, and perform studies to ensure that the technology aligns with our clients’ business needs. SoftServe seeks to support TA1, TA2, and TA3. SoftServe is focused on helping clients access and make sense of data coming from a myriad of sources, often siloed and unstructured, using our AI/ML capabilities. Leveraging our technical expertise, we work to build “fit for purpose” solutions using composable architecture and pre-built accelerators that help our clients realize their concept quickly while minimizing cost. In addition to our healthcare and life science industry domain experts and our certified technical engineers, we work with partners such as AWS and Mulesoft, and countless others, to create the best solutions for our clients. For over 30 years, we have been the trusted advisor and provider enabling our clients to build transformative patient experiences, gain insight from their data, and accelerate business outcomes across the healthcare continuum.

SoftServe delivers innovative approaches across the interconnected healthcare ecosystem. We strive to improve efficiency while reducing costs to help our clients provide better patient value and exceptional health outcomes. We understand the need for solutions that can work with an immense variety and quantity of data quickly and efficiently to enable clinical experts such as researchers, healthcare professionals, and life science communities to rapidly uncover insights that will lead to better processes, improved treatments, and new discoveries that will ultimately save lives.
SoftServe seeks to support TA1, TA2, and TA3 by leveraging its robust capabilities in constructing scalable, user-friendly, and efficiently maintained data systems. These capabilities encompass essential data management principles, including facilitating data accessibility, accommodating rapid data expansion, integrating various data origins, and optimizing data governance, while specializing in minimizing barriers to accessing data, enabling users with diverse technical backgrounds, and promoting a culture of data sharing and ownership. Additionally, we harness automation and technology stacks to expedite connections to data sources. We facilitate decentralized data ownership to expedite data accessibility and understanding, allowing data to be overseen by the individuals who recognize its significance, while consolidating data administration to promote a harmonized perspective. We have worked on numerous AI/ML projects in the past 3 years with hundreds of industry technology partners, including industry leaders such as AWS, a premier partner. We use AI/ML technologies for everything from process automation to predictive and prescriptive capabilities, based on client project goals. We have a 98% customer retention rate and an 84% NPS, and we have completed over 20,000 projects for clients. We are certified and comply with all regulatory and compliance standards, with a strong commitment to facilitating the secure sharing of data between producers and consumers in this context. This includes the sharing of aggregated and anonymized user information to safeguard user privacy, all while offering non-personally identifiable data. SoftServe seeks to support TA1, TA2, and TA3. SoftServe is a systems integrator (SI) and custom software developer looking to implement our own architecture as well as be the glue that helps connect partner solutions together. SoftServe excels at working with clients’ existing systems and agnostically developing the best solutions for our clients.
1) We endorse adopting human-centered and product-centric approaches to address the challenges posed by the diverse nature of modern data sources
2) We employ a decentralized and domain-specific data architecture that fosters flexibility and enables both data creators and users to unlock the complete potential of organizational data
3) We endorse a unified approach, driven by metadata, to link diverse data repositories within a singular virtual framework, typically overseen by a single governing body. This approach simplifies governance, improves accessibility, and facilitates integration.
4) We specialize in data preparation and the development, training, and deployment of machine learning (ML) models for any application using fully managed infrastructure, tools, and workflows.
We are looking for partners that align with SoftServe's philosophy of constructing scalable, easily accessible, and efficiently managed data systems to address fundamental data management principles. These principles encompass facilitating seamless data access, accommodating rapid data expansion, integrating diverse data sources, ensuring effective data governance, and promoting innovative data analysis. Additionally, we are interested in collaborators who grasp the intricacies of decentralized data architectures, as they introduce a significant emphasis on reshaping organizational structures and processes related to data ownership, quality, and compliance.
  • Technical area 1: High-fidelity, automated data collection
  • Technical area 2: AI-assisted, multi-source data preparation and curation for analysis at scale
  • Technical area 3: Enhance data usability with intuitive exploration
Certara Brennan Murphy (brennan.murphy@certara.com)
Additional: robin.braun@certara.com
Princeton, NJ Certara currently supports 2,300 global pharmaceutical companies in developing drugs and other medical interventions, covering every stage of the development lifecycle with software and services. Since 2014, 90% of FDA-approved drugs have been advanced by Certara customers. Eighteen global regulatory agencies, including the FDA, use Certara technology to evaluate drug applications. Among many other things, we help organizations locate critical datasets for medical research, and we have a collection spanning two decades of biomedical research. Lately, we've been developing a no-code data fabric with generative AI and large language model capabilities that can accelerate the traditional process of utilizing data for discovery of medical interventions. Certara has deep domain knowledge covering every aspect of gaining FDA approval for medical interventions. The BDF Toolbox is designed to streamline the process of getting data to researchers. We understand how this process works today because we offer it as a service to many organizations around the globe. At the same time, we have a novel data fabric capability that could enable data sourcing at scale, along with some of the platforms and toolsets researchers use today when they have access to that data. Large language models and generative AI are key enablers we offer. We have over two decades of experience helping 2,300 companies pursue FDA approval of drugs, medical devices, and other interventions. We have an encyclopedic understanding of every stage of gaining FDA approval: from research, clinical trials, bioinformatics, and simulation/modeling, to pricing drugs for specific markets around the globe. We're interested in partnering with organizations who have developed expertise in integrating many different types of data into consumable formats for research. Additionally, we're interested in organizations who are deeply focused on particular types of data, such as genomics. We can help focus top-level integration and incorporate new data types into areas where new patterns can be discovered, based on our experience helping companies do this today. We would also welcome potential partners with more experience supporting government research, NIH, or other organizations.
  • Technical area 1: High-fidelity, automated data collection
Guava Health, Inc. Dylan Wenzlau (dylan@guavahealth.com)
Additional: hello@guavahealth.com
Santa Barbara, CA Guava builds technology primarily for consumers, including tools to easily and automatically import and merge their health data from many sources, including EHRs, devices, personal logs/journaling (meds, symptoms, etc.), PDFs, images/scans, DICOM, etc. The Guava team can bring novel algorithms and processes for extracting and normalizing unstructured/erroneous data from EHR systems or from scans/PDFs/paper that were once part of EHR systems. We have a testbed of tens of thousands of patients already actively using some of our early technology, which can help accelerate quality feedback and adoption. We think Guava has strong impact potential on both TA1 and TA2. Guava can build user-friendly software very quickly. Our team has deep expertise in large, complex, and structured/unstructured datasets as well as modern AI / machine learning. The Guava founders' previous company was acquired by Amazon, and the tech they built powered ~90% of Amazon Alexa's informational answers. Guava is interested in partnering with organizations who can widen the reach of the technology we invent to positively impact more Americans in a shorter time period.
  • Technical area 1: High-fidelity, automated data collection
Thoughtworks Federal Jonah Czerwinski (jonah.czerwinski@thoughtworks.com)
Additional: david.marsh@thoughtworks.com
Chicago, IL Thoughtworks emphasizes the importance of federated data architectures in creating an effective data ecosystem that spans several domains and technologies. This approach enables users to self-discover data products, collaborate, and understand how to comply with data governance policies, ultimately fostering data-driven decision-making.

A key advantage of this approach to a Biomedical Data Fabric Toolbox is its ability to create an ecosystem offering an effective balance between the autonomy of decentralized solution development and centralized governance.

Thoughtworks' real-world experience building user-friendly, modern, and intuitive front-end systems on top of federated data architectures allows researchers to create data products that meet the DATSIS standards of being Discoverable, Addressable, Trustworthy, Secure, Interoperable, and Self-describing.
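
To make the DATSIS properties above concrete, a data product can carry a small self-describing descriptor alongside its data; the sketch below is purely hypothetical (the field names and values are illustrative assumptions, not a Thoughtworks or ARPA-H schema):

    # Hypothetical data-product descriptor illustrating the DATSIS properties (Python).
    data_product = {
        "name": "tumor-registry-monthly",             # Discoverable: indexed in a catalog
        "address": "https://example.org/products/tumor-registry-monthly",  # Addressable: stable URI
        "owner": "oncology-domain-team@example.org",  # Trustworthy: accountable owner
        "quality_checks": ["row_count > 0", "no_null_patient_id"],  # Trustworthy: published checks
        "access_policy": "IRB-approved researchers only",           # Secure: governed access
        "schema": {"patient_id": "string", "diagnosis_code": "ICD-10", "diagnosis_date": "date"},  # Interoperable
        "format": "parquet",                          # Self-describing: format and cadence declared
        "update_cadence": "monthly",
    }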

Thoughtworks' approach to data discovery and assessment has proven beneficial in various sectors. For instance, it has enabled healthcare companies to refine their data-driven patient-centric services rapidly, improving the visibility of their clinical research. Similarly, it has been instrumental in enhancing the development of new patient care applications in commercial health organizations.

Thoughtworks' experience in building federated data architectures has delivered flexible, expandable, and manageable solutions to evolutionary data ecosystems with a focus on delivering user-friendly, efficient, and effective solutions that can adapt to the varying needs of different stakeholders.
At Thoughtworks, our expertise in agile software development and continuous delivery can significantly contribute to the ARPA-H Biomedical Data Fabric (BDF) Toolbox. Our approach to software architecture emphasizes evolutionary design, enabling us to build flexible systems that can adapt to changing requirements, a necessity for ARPA-H's cross-disciplinary goals.

We are adept at creating robust automated data collection systems, which aligns with TA1 objectives. Our experience with privacy-centric design ensures that we meet the privacy-protecting mandate of ARPA-H. For TA2, our data scientists and AI specialists can develop advanced algorithms for AI-assisted curation, ensuring that multisource data is harmonized effectively for large-scale analysis.

Our human-centered design philosophy underpins our potential contributions to TA3, ensuring intuitive exploration of complex data sets. Additionally, our global presence and broad industry experience mean we have a diverse pool of users for TA4, enabling effective user testing to evaluate data usability.

For teaming partners, we seek collaborations with organizations that have complementary strengths, such as domain-specific knowledge in biomedicine, expertise in cloud computing and big data platforms, and a proven track record in delivering healthcare technology solutions. This would ensure a comprehensive and innovative approach to developing the BDF Toolbox.
Thoughtworks is a pioneer in software design and delivery, renowned for its revolutionary methodologies in agile software development and its robust approach to enterprise architecture. Our strengths lie in our ability to blend cutting-edge technology with strategic business acumen to drive digital transformation across various industries.

Our global team is composed of some of the industry's most talented developers, designers, and consultants who are dedicated to creating impactful software that meets complex business challenges. We have a rich history of delivering scalable, resilient, and sustainable systems that are designed to evolve with our clients' needs. Thoughtworks has consistently led the charge in advocating for ethical tech and inclusive practices, ensuring that the solutions we architect benefit all stakeholders equitably.

Our experience spans multiple sectors, including healthcare, finance, retail, and more, giving us a diverse and comprehensive perspective on technological innovation. We have also been instrumental in contributing to and leading open-source projects, showcasing our commitment to collaborative advancement and shared knowledge within the tech community. At Thoughtworks, we don't just build systems—we build the strategies and the teams that transform organizations.
At Thoughtworks, we seek teaming partners who prioritize the end-user experience as the cornerstone of innovation. Potential partners should exhibit a strong commitment to creating ethically designed technology solutions that are inclusive, accessible, and have a positive impact on society.

We value partners with diverse perspectives that challenge conventional thinking and help us push the boundaries of what's possible in creating user-centric designs. They should bring deep domain expertise, whether in advanced data analytics, AI, healthcare, or other fields, to complement our interdisciplinary approach.

A collaborative spirit is essential, with a willingness to engage in open and continuous dialogue, co-creation, and knowledge sharing. We appreciate partners who are agile, able to adapt to the evolving needs of projects, and who share our passion for excellence and innovation in crafting cutting-edge solutions.

Experience in iterative development, rapid prototyping, and testing with real-world users is critical. We look for partners who are not just looking to meet the current standards but are eager to set new ones, driving the industry forward with responsible and impactful technology.
  • Technical area 2: AI-assisted, multi-source data preparation and curation for analysis at scale
EQTY Life Sciences Michael Kolzet (Mike@eqtylab.io)
Additional: alistair.dootson@eqtylab.io
Chapel Hill, NC A start-up that specializes in applied cryptography to authenticate existing systems and data repositories, providing immutable provenance, seamless traceability, and lineage to govern the development of models. We can mathematically prove the integrity of datasets, models, and responsible SOPs. Furthermore, we can provide third parties the capability to validate this authenticity themselves without accessing the dataset or reproducing the analysis results. By generating proof of authenticity and separating it from the authenticated datasets, data and processes can be trusted and federated learning can occur, as you do not need to move the underlying training data outside HIPAA-compliant ecosystems. We automate the collection, authentication, and cataloging of multi-source datasets and provide governance controls over existing systems and processes to eliminate black boxes. Our parent company's core technology has been in development for four years and has been proven in other industries (defense, media, and climate). Our Life Sciences company, a joint venture, was recently founded. We have created, and are now entering beta testing with a global sponsor for, our first authentication solution, which solves operational inefficiencies and integrity challenges surrounding the FDA New Drug Application. Specifically, we are eliminating the cycles currently required to trace a clinical analysis and reproduce its results solely to establish trust in the data. We are partnering with the FDA, SAS, and sponsors to design our second solution, enabling real-world data to be submitted as evidence for drug approval. Our company does not conduct any machine learning, and we do not have domain expertise in bioinformatics or drug development, which are areas where we are seeking to develop future partnerships and solutions.
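
As a rough illustration of the detached proof-of-authenticity pattern described above (this is not EQTY's cryptographic stack, which the profile does not detail; the Ed25519 signature scheme, manifest layout, and key handling below are assumptions), a custodian can hash a dataset and sign the digest, and a third party can then verify that signed digest without ever accessing the underlying data:

    # Illustrative detached integrity proof (Python); not EQTY's implementation.
    # Requires the third-party "cryptography" package; all names are hypothetical.
    import hashlib
    import json
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # Custodian side: the raw data never leaves its HIPAA-compliant environment.
    dataset = b"...patient-level records stay inside the custodian's systems..."
    proof = {"sha256": hashlib.sha256(dataset).hexdigest()}
    proof_bytes = json.dumps(proof, sort_keys=True).encode("utf-8")

    signing_key = Ed25519PrivateKey.generate()
    signature = signing_key.sign(proof_bytes)   # published alongside the proof
    public_key = signing_key.public_key()       # published for verifiers

    # Verifier side: checks authenticity of the proof without touching the dataset.
    try:
        public_key.verify(signature, proof_bytes)
        print("proof is authentic; dataset digest:", proof["sha256"])
    except InvalidSignature:
        print("proof has been tampered with")
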
  • Technical area 1: High-fidelity, automated data collection
Progress | MarkLogic Abby Potter (Abby.Potter@progress.com)
Additional: abby.potter@marklogic.com
McLean, VA MarkLogic's main research focus area is data management and integration, with an emphasis on unstructured and semi-structured data. MarkLogic has expertise in areas such as data governance, metadata management, search and discovery, and semantic technology. MarkLogic's metadata AI capabilities can align with the ARPA-H BDF Toolbox in several ways. For instance, they can help in managing large amounts of metadata and enable better search and discovery of data. Additionally, MarkLogic can facilitate data integration and interoperability across different systems, making it easier for organizations to share and collaborate on data. This can be particularly useful for the ARPA-H BDF Toolbox, which aims to accelerate biomedical innovation by enabling data-driven research and development. By leveraging MarkLogic's capabilities, the toolbox can potentially enhance its data management and analysis capabilities, leading to more efficient and effective research outcomes. One of MarkLogic's key strengths is its ability to handle large volumes of complex and diverse data, making it ideal for use cases in government agencies that need to manage and analyze vast amounts of information. MarkLogic has helped agencies such as DoD and CMS improve their data management and analysis capabilities, resulting in better decision-making, increased efficiency, and improved outcomes. MarkLogic is seeking prime partners to support TA2 and to partner on a future TA5, where it is positioned to offer additional capabilities.
  • Technical area 2: AI-assisted, multi-source data preparation and curation for analysis at scale
Pluto Bioinformatics Rani Powers (arpa-h-bdf@pluto.bio)
Additional: rani.powers@pluto.bio
Denver, CO Launched in 2021 from the Wyss Institute at Harvard University, Pluto’s mission is to make it possible for every researcher to gain immediate insight from their data and run complex computational biology algorithms without any of their own infrastructure or code. To this end, we are building a modern platform empowering researchers to run bioinformatics analyses, customize publication-ready plots, and collaborate on biological interpretation in an intuitive and user-friendly experience. Drawing on decades of combined experience in software development across the life sciences and other industries, Pluto’s interdisciplinary team of scientists and software engineers has pioneered a novel “infinite canvas” user experience that allows scientists to tell their scientific story via flexible application of different algorithms for a variety of -omics assays. We currently work with scientific teams from small academic labs, top-tier institutes and departments, biotech companies, and large public pharma doing research on a wide range of topics including cancer, autoimmune and neurodegenerative diseases, diabetes, and metabolic disorders. Additional research focus areas internally at Pluto include novel algorithm development for context-specific gene expression and pathway analysis, and AI-assisted biological interpretation (i.e. a scientific “co-pilot” embedded in the online user experience). Pluto is well-positioned to support the mission of the Biomedical Data Fabric Toolbox by offering both consulting expertise and a white-labelable, modern, cloud-based software platform built specifically to connect scientific data. Pluto’s team brings expertise in -omics data analysis and a cutting-edge, user-friendly platform to enable novel insight generation at scale. Pluto’s platform is API-driven and built to streamline bioinformatics workflows, including data management, analysis, and communicating final results everywhere that science needs to get shared. Pluto already integrates with S3 and GCS, GraphPad Prism, Benchling, Microsoft Office, and other tools commonly used by scientists, and provides natural integration points for interacting with large data repositories. Plots generated with Pluto are extensible, shareable via link for real-time commenting and collaboration, accessible via API for meta-analysis, and can be embedded on third-party platforms, where the plot and its dynamic methods automatically stay up to date. If included in the BDF, Pluto would be a valuable “front end” to science, enabling seamless visualization for a wide variety of bioinformatics projects. Founded by Dr. Rani Powers out of the Wyss Institute at Harvard University, Pluto has brought together a team of interdisciplinary experts with decades of combined experience working at the intersection of life sciences, software, product, and design. Pluto’s science team is composed of PhDs in Computational Biology, Molecular Biology, and Pharmacology. Our software engineers have successfully built and launched numerous successful mobile- and web-based applications in a variety of industries. At the intersection of the two disciplines, Pluto’s team brings years of experience developing, testing, deploying, and executing large-scale computational pipelines using auto-scaling cloud infrastructure. By prioritizing performance and user experience in our current platform, Pluto has built a unique scientific software product recognized by leading industry voices (e.g. Novo Nordisk, Andreessen Horowitz) that integrates seamlessly with data storage systems, pipelines, and other scientific tools. We look forward to the opportunity to partner with large data consortiums, especially those that store next-generation sequencing data (e.g. RNA-seq, scRNA-seq, ChIP-seq, CUT&RUN, ATAC-seq, scATAC-seq) and are interested in making the data more accessible and intuitive to biologists. We are also interested in collaborating on novel pipeline development, large data “portal” creation, and novel research projects, such as those including meta-analysis of the aforementioned data types.
  • Technical area 3: Enhance data usability with intuitive exploration