ARTICLE
30 November 2023

GPT-4 Outperformed Simulated Human Readers In Diagnosing Complex Clinical Cases

Foley & Lardner

Contributor


OpenAI's GPT-4 correctly diagnosed 52.7% of complex challenge cases, compared with 36% for medical journal readers, and outperformed 99.98% of simulated human readers, according to a study published by the New England Journal of Medicine.

The evaluation, conducted by researchers in Denmark, used GPT-4 to diagnose 38 complex clinical case challenges whose text was published online between January 2017 and January 2023. GPT-4's responses were compared with 248,614 answers from online medical journal readers.

Each complex clinical case included a medical history and a poll with six options for the most likely diagnosis. The prompt asked GPT-4 to select a diagnosis by answering the multiple-choice question based on the full, unedited text of the clinical case report. Each case was presented to GPT-4 five times to evaluate reproducibility.
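
As a rough illustration of how repeated runs can be scored for reproducibility, the sketch below computes the share of correct answers in each run and the average across runs. The function name, the letter labels for the poll options, and the toy data are hypothetical and are not taken from the study.

```python
from statistics import mean


def accuracy_per_run(runs, answer_key):
    """Return the fraction of correctly answered cases for each run.

    runs: one list per run, holding the option chosen for each case.
    answer_key: the correct option for each case, in the same order.
    """
    return [
        sum(chosen == correct for chosen, correct in zip(run, answer_key))
        / len(answer_key)
        for run in runs
    ]


if __name__ == "__main__":
    # Hypothetical toy data: four cases, two runs (not the study's data).
    key = ["A", "B", "C", "D"]
    toy_runs = [["A", "B", "C", "D"], ["A", "B", "C", "A"]]
    per_run = accuracy_per_run(toy_runs, key)
    print("per-run accuracy:", per_run)
    print("mean accuracy:", mean(per_run))
```

Per-run figures that sit close to one another suggest the answers are stable across repeated presentations of the same case.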

For comparison, researchers collected the votes cast on each case by medical journal readers and used them to simulate 10,000 sets of answers, producing a pseudopopulation of 10,000 human participants.
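
The article does not spell out the sampling procedure, so the following is only a minimal sketch of one way such a pseudopopulation could be built. It assumes each simulated reader answers every case by drawing an option at random in proportion to the votes that option received. The function names and the toy vote counts are hypothetical.

```python
import random


def simulate_readers(vote_counts, correct_index, n_readers=10_000, seed=0):
    """Return the number of correct answers for each simulated reader.

    vote_counts: one list per case with the votes cast for each poll option.
    correct_index: position of the correct option for each case.
    Assumption: a reader's answer is sampled from the case's vote distribution.
    """
    rng = random.Random(seed)
    scores = [0] * n_readers
    for votes, correct in zip(vote_counts, correct_index):
        # Draw one answer per simulated reader, weighted by observed votes.
        picks = rng.choices(range(len(votes)), weights=votes, k=n_readers)
        for i, pick in enumerate(picks):
            if pick == correct:
                scores[i] += 1
    return scores


def share_outperformed(scores, model_score):
    """Fraction of simulated readers scoring strictly below the model."""
    return sum(s < model_score for s in scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical votes for three six-option cases (not the study's data).
    toy_votes = [
        [50, 20, 10, 10, 5, 5],
        [10, 60, 10, 10, 5, 5],
        [30, 30, 20, 10, 5, 5],
    ]
    toy_correct = [0, 1, 2]
    reader_scores = simulate_readers(toy_votes, toy_correct)
    print(share_outperformed(reader_scores, model_score=3))
```

Comparing the model's count of correct diagnoses against the distribution of simulated reader scores yields a percentile-style figure of the kind the study reports.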

The most common diagnostic fields were infectious disease (15 cases, 39.5%), endocrinology (five cases, 13.1%), and rheumatology (four cases, 10.5%). Patients in the clinical cases ranged from newborn to 89 years of age, and 37% were female.

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.
