ARPA-H launches Exploration Topic to improve chatbots for patient-facing applications


The Advanced Research Projects Agency for Health (ARPA-H) today launched the Chatbot Accuracy and Reliability Evaluation (CARE) Exploration Topic (ET) to fund the development of novel technical approaches to improve the testing and evaluation of chatbot outputs for patient-facing applications. More than half of American households use the internet for health-related activities, including researching health information. However, up to 50% of the outputs from state-of-the-art chatbots contain at least one serious instance of incorrect or unsubstantiated content.

The CARE ET aims to produce tools and technology to better evaluate medical chatbots for patient-facing applications with the efficiency of computational methods and the accuracy of human experts. The CARE ET will develop and test proof-of-concept evaluation technologies for large language models (LLMs) across a variety of use cases, providing a critical resource for those developing LLM applications for health care and helping to de-risk ARPA-H’s future LLM portfolio. The technologies created through the CARE ET will not only assess safety by detecting hallucinations but will also address previous biases of medical chatbots by considering stakeholder desires and concerns.

“Despite their growing use and promising applications, large language models suffer from reliability and accuracy issues that prevent them from being more widely adopted,” said ARPA-H Resilient Systems Office Director Jennifer Roberts. “By funding technologies to improve the testing and evaluation of medical chatbot outputs, ARPA-H is taking a big step in its mission by helping improve access to accurate online health information for everyone.” 

To develop this technology, the CARE ET will pursue innovation within a single Technical Area divided into two parallel and interconnected subsections: (1) improve prompt generation technology to effectively examine the full range of LLM responses and criteria that inform the evaluation of trustworthiness of LLM outputs, and (2) develop novel chatbot evaluation technologies that perform at the speed of computational methods with the accuracy of expert human review.  

The CARE ET will focus on public-facing chatbots, but the tools and procedures developed will have far broader applications, including clinician support, biomedical research, input for regulatory guidance, and other health-related areas. 

ETs are fast-paced efforts that pursue topics strategically aligned with ARPA-H Mission Offices and provide foundational proofs-of-concept to be used in future research. ETs allow for a streamlined solicitation and acquisition approach. For more information, view the CARE ET on