April 30, 2024
CrossCare is a research initiative that examines large language models (LLMs), focusing specifically on their applications in healthcare.
Benchmarks play a crucial role in evaluating the performance, limitations, and robustness of LLMs. Well-known benchmarks like GLUE and SuperGLUE have been foundational in assessing language understanding and task performance. However, today's challenges go beyond those scopes, touching on domain knowledge, safety, hallucinations, and biases, especially in sensitive areas like healthcare. These issues matter because they can contribute to disparities in healthcare outcomes and the quality of care delivered.
Our research specifically targets representational biases in LLMs concerning medical information. We analyze how biases in the data used to train these models can affect their outputs, particularly how diseases are associated with different demographic groups. By studying data from "The Pile," a large dataset used for training LLMs, we examine these biases and their impact on model behavior.
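As a rough sketch of what such a co-occurrence analysis can look like (the term lists, context window, and toy corpus below are illustrative assumptions, not the exact CrossCare pipeline), one can count how often disease mentions appear near demographic mentions in the training text:

```python
from collections import Counter
import re

# Hypothetical term lists for illustration; the actual CrossCare keyword sets are larger.
DISEASES = ["diabetes", "hypertension", "asthma"]
DEMOGRAPHICS = ["black", "white", "asian", "hispanic"]
WINDOW = 250  # characters of context around each disease mention (assumed value)

def cooccurrence_counts(documents):
    """Count how often each (disease, demographic) pair co-occurs within the window."""
    counts = Counter()
    for doc in documents:
        text = doc.lower()
        for disease in DISEASES:
            for match in re.finditer(re.escape(disease), text):
                start = max(0, match.start() - WINDOW)
                end = match.end() + WINDOW
                context = text[start:end]
                for group in DEMOGRAPHICS:
                    if group in context:
                        counts[(disease, group)] += 1
    return counts

# Usage: stream documents from a corpus (e.g. a local slice of The Pile) and tally pairs.
docs = ["A 54-year-old Black woman presented with hypertension and diabetes."]
print(cooccurrence_counts(docs))
```

Aggregating such counts over the full corpus gives a picture of which demographic groups are textually associated with which diseases, before any model is ever queried.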
We compare model-assigned likelihoods of diseases across demographic groups to the actual prevalence of those diseases among the same groups in the United States. This comparison helps us understand the discrepancies between how models perceive the world and real epidemiological data.
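A minimal sketch of this kind of comparison, assuming a templated prompt and a small open model such as GPT-2 rather than the specific models and prompts used in CrossCare, scores a disease statement for each demographic group and ranks the groups by model likelihood:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical template and terms; CrossCare's actual prompts and models may differ.
TEMPLATE = "{group} patients have {disease}."
GROUPS = ["Black", "White", "Asian", "Hispanic"]
DISEASE = "hypertension"

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence):
    """Total log-probability the model assigns to a sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, labels=inputs["input_ids"])
    # outputs.loss is the mean negative log-likelihood over the predicted tokens,
    # so multiply by the number of predicted positions to recover the total.
    n_predicted = inputs["input_ids"].shape[1] - 1
    return -outputs.loss.item() * n_predicted

scores = {g: sentence_logprob(TEMPLATE.format(group=g, disease=DISEASE)) for g in GROUPS}
for group, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{group}: {score:.2f}")
```

The resulting ranking can then be set against observed prevalence for the same groups to surface where the model's implied epidemiology diverges from reality.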
Our work contributes to the field by characterizing these representational biases in LLM training data and by measuring how model-assigned disease likelihoods diverge from real-world prevalence across demographic groups.
Our website crosscare.net allows users to explore this data further and download detailed findings for use in further research on model interpretability and robustness.