April 30, 2024
CrossCare is a research initiative that examines large language models (LLMs), focusing specifically on their applications in healthcare.
Benchmarks play a crucial role in evaluating the performance, limitations, and robustness of LLMs. Well-known benchmarks like GLUE and SuperGLUE have been foundational in assessing language understanding and task performance. However, today's challenges go beyond those scopes, touching on domain knowledge, safety, hallucinations, and biases, especially in sensitive areas like healthcare. These issues matter because they can contribute to disparities in healthcare outcomes and the quality of care delivered.
Our research specifically targets representational biases in LLMs concerning medical information. We analyze how biases in the data used to train these models can affect their outputs, particularly how diseases are associated with different demographic groups. By studying data from "The Pile," a large dataset used for training LLMs, we examine these biases and their impact on model behavior.
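As a rough sketch of what such a co-occurrence analysis can look like (the term lists, context window, and toy corpus below are illustrative assumptions, not the exact CrossCare pipeline), one can count how often disease mentions appear near demographic mentions in the training text:

```python
from collections import Counter
import re

# Hypothetical term lists for illustration; the actual CrossCare keyword sets are larger.
DISEASES = ["diabetes", "hypertension", "asthma"]
DEMOGRAPHICS = ["black", "white", "asian", "hispanic"]
WINDOW = 250  # characters of context around each disease mention (assumed value)

def cooccurrence_counts(documents):
    """Count how often each (disease, demographic) pair co-occurs within the window."""
    counts = Counter()
    for doc in documents:
        text = doc.lower()
        for disease in DISEASES:
            for match in re.finditer(re.escape(disease), text):
                start = max(0, match.start() - WINDOW)
                end = match.end() + WINDOW
                context = text[start:end]
                for group in DEMOGRAPHICS:
                    if group in context:
                        counts[(disease, group)] += 1
    return counts

# Usage: stream documents from a corpus (e.g. a local slice of The Pile) and tally pairs.
docs = ["A 54-year-old Black woman presented with hypertension and diabetes."]
print(cooccurrence_counts(docs))
```

Aggregating such counts over the full corpus gives a picture of which demographic groups are textually associated with which diseases, before any model is ever queried.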
We compare model-assigned likelihoods of diseases across demographic groups to the actual prevalence of those diseases among the same groups in the United States. This comparison helps us understand the discrepancies between how models perceive the world and real epidemiological data.
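A minimal sketch of this kind of comparison, assuming a templated prompt and a small open model such as GPT-2 rather than the specific models and prompts used in CrossCare, scores a disease statement for each demographic group and ranks the groups by model likelihood:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical template and terms; CrossCare's actual prompts and models may differ.
TEMPLATE = "{group} patients have {disease}."
GROUPS = ["Black", "White", "Asian", "Hispanic"]
DISEASE = "hypertension"

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence):
    """Total log-probability the model assigns to a sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, labels=inputs["input_ids"])
    # outputs.loss is the mean negative log-likelihood over the predicted tokens,
    # so multiply by the number of predicted positions to recover the total.
    n_predicted = inputs["input_ids"].shape[1] - 1
    return -outputs.loss.item() * n_predicted

scores = {g: sentence_logprob(TEMPLATE.format(group=g, disease=DISEASE)) for g in GROUPS}
for group, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{group}: {score:.2f}")
```

The resulting ranking can then be set against observed prevalence for the same groups to surface where the model's implied epidemiology diverges from reality.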
Our work contributes to the field by characterizing these representational biases in LLM training data and by measuring how model-assigned disease likelihoods diverge from real-world prevalence across demographic groups.
Our website crosscare.net allows users to explore this data further and download detailed findings for use in further research on model interpretability and robustness.