Bitterman Lab

RESEARCH

Our Purpose.

Our research focuses on preparing clinical AI for real-world impact through rigorous evaluation, alignment, and clinical testing. As an interdisciplinary lab of clinicians and computer scientists, we work across foundation model safety and oversight, natural language processing methods development, and clinical translational AI. Our core research areas are described below. You can find our full publication list here.

Foundation Model Evaluation, Alignment, and Oversight

Foundation models such as large language models learn an implicit knowledge representation from their pretraining data, but that representation is imperfect in quality, factuality, and alignment, and it can encode biases. Our lab is interested in evaluating gaps and inaccuracies in the clinical knowledge and behavior of large language models and vision-language models, and in how we can improve oversight of these models within the clinical workflow.

Selected Papers:

Gallifant J, Afshar M, Ameen S, Aphinyanaphongs Y, Chen S, Cacciamani G, Demner-Fushman D, Dligach D, Daneshjou R, Fernandes C, Hansen L, Landman A, Lehmann LS, McCoy L, Miller T, Moreno A, Munch N, Restrepo D, Savova G, Umeton R, Gichoya JW, Collins GS, Moon KGM, Celi LA, Bitterman DS. The TRIPOD-LLM reporting guideline for studies using large language models. Nature Medicine. 2025 Jan;31(1):60-69. doi: 10.1038/s41591-024-03425-5.

Gallifant J *, Chen S *, Moreira P, Munch N, Gao M, Pond J, Celi LA, Aerts H, Hartvigsen T, Bitterman DS. Language models are surprisingly fragile to drug names in biomedical benchmarks. EMNLP Findings 2024.

Chen S *, Gallifant J *, Gao M, Moreira P, Munch N, Muthukkumar A, Rajan A, Kolluri J, Fiske A, Hastings J, Aerts H, Anthony B, Celi LA, La Cava WG, Bitterman DS. Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias. NeurIPS 2024.

Chen S, Kann B, Foote MB, Aerts HJWL, Savova GK, Mak RH, Bitterman DS. Use of artificial intelligence chatbots for cancer treatment information. JAMA Oncol. 2023 Aug 24;e232954. doi: 10.1001/jamaoncol.2023.2954. PMID: 37615976; PMCID: PMC10450584

Mining Data from the Electronic Health Records

Our lab develops natural language processing (NLP) methods to automate the extraction of clinical data from electronic health records. Much of the information collected on patients is entered only in the unstructured free text of clinical notes, radiology reports, and pathology reports, rendering it inaccessible for large-scale analyses and automated monitoring. Informed by our clinical expertise, we develop models that extract information that tends to be documented only in these texts. We are also interested in data augmentation methods that reduce the resources needed to develop these models.

Selected Papers:

Chen S, Gallifant J, Guevara M, Gao Y, Afshar M, Miller T, Dligach D, Bitterman DS. Improving clinical NLP performance through language model-generated synthetic clinical data. arXiv:2403.19511.

Guevara M *, Chen S *, Thomas S, Chaunzwa TL, Franco I, Kann BH, Moningi S, Qian J, Goldstein M, Harper S, Aerts HJWL, Savova GK, Mak RH, Bitterman DS. Large language models to identify social determinants of health in electronic health records. NPJ Digit Med. 2024 Jan 11;7(1):6. doi: 10.1038/s41746-023-00970-0. PMID: 38200151; PMCID: PMC10781957

Chen S, Guevara M, Ramirez R, Murray A, Warner JL, Aerts HJWL, Miller TA, Savova GK, Mak RH, Bitterman DS. Natural language processing to automatically extract the presence and severity of esophagitis in notes of patients undergoing radiotherapy. JCO Clin Cancer Inform. 2023 Jul;7:e2300048. doi: 10.1200/CCI.23.00048. PMID: 37506330

Bitterman DS, Goldner E*, Finan S, Harris D, Durbin EB, Hochheiser H, Warner JL, Mak RH, Miller T, Savova GK. An end-to-end natural language processing system for automatically extracting radiotherapy events from clinical texts. Int J Radiat Oncol Biol Phys. 2023 Mar 26;117(1):262-273. doi: 10.1016/j.ijrobp.2023.03.055. PMID: 36990288; PMCID: PMC10522797

Translating AI into the Clinic

We are committed to the evaluation and safe, ethical translation of AI advances into the clinic. We approach this from three angles: Measure, Monitor, and Translate.

Measure: We create new benchmarks to measure model bias and performance, and to assess the impact of AI assistance on clinical decision-making.

Monitor: We develop new metrics and methods for monitoring models' performance and their impact on clinical decision-making.

Translate: We investigate the impact of AI on health outcomes via clinical evaluations, including clinical trials.

Selected Papers:

Chen S, Guevara M, Moningi S, Hoebers F, Elhalawani H, Kann BH, Chipidza FE, Leeman J, Aerts HJWL, Miller T, Savova GK, Gallifant J, Celi LA, Mak RH, Lustberg M, Afshar M, Bitterman DS. The effect of using a large language model to respond to patient messages. Lancet Digit Health. 2024 Apr 24:S2589-7500(24)00060-8. doi: 10.1016/S2589-7500(24)00060-8. PMID: 38664108.

Moore AC, Bitterman DS. Toward clinical-grade evaluation of large language models. Int J Radiat Oncol Biol Phys. 2024 Mar 15;118(4):916-920. PMID: 38401979.

Bitterman DS, Kamal A, Mak RH. An Oncology Artificial Intelligence Fact Sheet for Cancer Clinicians. JAMA Oncol. 2023 May 1;9(5):612-614. doi: 10.1001/jamaoncol.2023.0012. PubMed PMID: 36951824.

Hosny A, Bitterman DS, Guthier CV, Qian JM, Roberts H, Perni S, Saraf A, Peng LC, Pashtan I, Ye Z, Kann BH, Kozono DE, Christiani D, Catalano PJ, Aerts HJWL, Mak RH. Clinical validation of deep learning algorithms for radiotherapy targeting of non-small cell lung cancer: an observational study. Lancet Digit Health. 2022 Sep 1;4(9):e657-e666. doi: 10.1016/S2589-7500(22)00129-7. PMID: 36028289; PMCID: PMC9435511.

Bitterman DS, Cagney DN, Singer LL, Nguyen PL, Catalano PJ, Mak RH. Master Protocol Trial Design for Efficient and Rational Evaluation of Novel Therapeutic Oncology Devices. J Natl Cancer Inst. 2020 Mar 1;112(3):229-237. doi: 10.1093/jnci/djz167. PMID: 31504680; PMCID: PMC7073911.

Ethical Implementation of Health Technologies

A common thread throughout our research is promoting the socially responsible use of advanced healthcare technologies. We work to define ethical standards for AI implementation that is safe and that empowers patients and clinicians.

Selected Papers:

Gallifant J, Celi LA, Sharon E, Bitterman DS. Navigating the Complexities of Artificial Intelligence-Enabled Real-World Data Collection for Oncology Pharmacovigilance. JCO Clin Cancer Inform. 2024 May;8:e2400051. doi: 10.1200/CCI.24.00051. PMID: 38713889.

Perni S, Lehmann LS, Bitterman DS. Patients should be informed when AI systems are used in clinical trials. Nat Med. 2023 May 23. doi: 10.1038/s41591-023-02367-8. PMID: 37221381.

Bitterman DS, Aerts HJWL, Mak RH. Approaching autonomy in medical artificial intelligence. Lancet Digit Health. 2020 Sep;2(9):e447-e449. doi: 10.1016/S2589-7500(20)30187-4. PubMed PMID: 33328110.

Bitterman DS, Bona K, Laurie F, Kao PC, Terezakis SA, London WB, Haas-Kogan DA. Race Disparities in Proton Radiotherapy Use for Cancer Treatment in Patients Enrolled in Children's Oncology Group Trials. JAMA Oncol. 2020 Sep 1;6(9):1465-1468. doi: 10.1001/jamaoncol.2020.2259. PMID: 32910158; PMCID: PMC7411938.