TechEchelon
№01 / Anchor·ARTIFICIAL INTELLIGENCE

Harvard Study Finds OpenAI's o1 Model Outperformed Physicians on Emergency Room Diagnoses

A study published in Science by Harvard Medical School and Beth Israel researchers found OpenAI's o1 model correctly diagnosed emergency room patients more often than two internal medicine physicians, though the authors cautioned the findings are far from a clinical green light.

Sara Montes de Oca
MAY 4, 2026 · 01:04 AM ET · 3 MIN READ
Editorial

A study published this week in the journal Science found that OpenAI's o1 large language model reached the correct or near-correct diagnosis more frequently than two internal medicine attending physicians across a set of real emergency room cases, raising questions about the role AI could play in clinical decision-making.

The research was led by physicians and computer scientists at Harvard Medical School and Beth Israel Deaconess Medical Center. The team drew on the cases of 76 patients seen in the Beth Israel emergency room, presenting the same electronic medical record data to the human physicians and to OpenAI's o1 and 4o models.

Two separate attending physicians then evaluated the diagnoses without knowing which came from a human and which came from an AI.

On initial triage — the point at which the least information is available and the urgency is highest — the o1 model reached "the exact or very close diagnosis" in 67% of cases. One of the two physicians hit that mark 55% of the time, while the other did so 50% of the time.

"We tested the AI model against virtually every benchmark, and it eclipsed both prior models and our physician baselines," said Arjun Manrai, who leads an AI lab at Harvard Medical School and is one of the study's lead authors, in a press release.

The researchers stressed that the AI models received no pre-processed data — they worked from the same records available to clinicians at each diagnostic moment.

The study stopped well short of arguing that AI is prepared to make autonomous treatment decisions. Its authors instead called the findings evidence of "an urgent need for prospective trials to evaluate these technologies in real-world patient care settings."

The researchers also acknowledged a significant constraint: the study examined only text-based inputs, and existing research suggests current AI models perform less reliably when reasoning over non-text data such as imaging.

Adam Rodman, a Beth Israel physician and co-lead author, said there is "no formal framework right now for accountability" around AI diagnoses and that patients still "want humans to guide them through life or death decisions [and] to guide them through challenging treatment decisions."

The findings also drew scrutiny from practicing clinicians. Kristen Panthagani, an emergency physician, described the study as "an interesting AI study that has led to some very overhyped headlines," pointing out that the AI was compared to internal medicine physicians rather than emergency room specialists.

"If we're going to compare AI tools to physicians' clinical ability, we should start by comparing to physicians who actually practice that specialty," Panthagani said. She also challenged the framing of the diagnostic task itself: "As an ER doctor seeing a patient for a first time, my primary goal is not to guess your ultimate diagnosis. My primary goal is to determine if you have a condition that could kill you."

The study adds to a growing body of research examining where large language models may augment or, in some contexts, match clinical judgment — while underscoring that regulatory and accountability infrastructure has yet to catch up with the technology's demonstrated capabilities.

━ ABOUT THE REPORTER
Sara Montes de Oca

Sara Montes de Oca is the Editor in Chief of TechEchelon. She was previously a correspondent and producer in Washington, D.C., covering business, finance, and politics.
