Harvard physician and computer scientist Dr. Isaac Kohane and his colleagues put GPT-4, the latest AI model from OpenAI, to the test in a medical setting. The results, detailed in the forthcoming book “The AI Revolution in Medicine,” are astonishing: Kohane found that GPT-4 performed better than many doctors in certain respects.
In the book, Kohane reveals that GPT-4 answered US medical licensing exam questions correctly over 90% of the time, surpassing earlier ChatGPT models and even some licensed doctors. GPT-4 not only excelled at test-taking but also demonstrated prowess in translation, in simplifying technical jargon, and in offering suggestions on bedside manner. Additionally, GPT-4 can swiftly summarize lengthy reports or studies and exhibit human-like intelligence in problem-solving.
Despite its impressive capabilities, GPT-4 acknowledges its limitations in understanding and intentionality. In a clinical thought experiment based on a real-life case, GPT-4 correctly diagnosed a rare condition called congenital adrenal hyperplasia, leaving Kohane both impressed and horrified. The challenge lies in ensuring GPT-4’s advice is safe and effective for millions of families.
However, GPT-4 is not always reliable: the book documents errors ranging from simple clerical mistakes to incorrect arithmetic. Such errors, if undetected, could lead to serious consequences in prescribing or diagnosis. The authors suggest several error-catching techniques, such as having GPT-4 verify its own work in a new session or asking it to show its work.
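The "verify in a new session" idea can be sketched as a small helper that builds a review prompt to be sent in a brand-new chat, so the model checks the draft answer without the bias of its original conversation. The function name, prompt wording, and the example question are illustrative assumptions, not taken from the book:

```python
# Sketch of the authors' "verify in a new session" error-catching idea.
# Everything here (names, prompt text, the sample question) is
# illustrative, not the book's or OpenAI's actual wording.

def build_verification_prompt(question: str, draft_answer: str) -> str:
    """Build a prompt for a *fresh* session that asks the model to
    independently re-derive the answer and show its work."""
    return (
        "You are reviewing another assistant's work.\n"
        f"Question: {question}\n"
        f"Proposed answer: {draft_answer}\n"
        "Independently re-derive the answer, showing your work step by step, "
        "then state whether the proposed answer is correct."
    )

# The returned string would be sent as the only message in a new chat
# session via any chat-completion client.
prompt = build_verification_prompt(
    "What is the recommended adult dose of drug X?",  # hypothetical question
    "500 mg twice daily",                             # hypothetical draft answer
)
print(prompt)
```

Running the check in a separate session matters because, within one conversation, the model tends to stay consistent with its earlier answers rather than scrutinize them.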
While GPT-4 holds the potential to free up time and resources in clinics, it also raises ethical concerns. The authors urge readers to consider a world of increasingly intelligent machines and what such advances mean for society.