AI chatbots defeated doctors at diagnosing illness | A small study found that ChatGPT outperformed human physicians when assessing medical case histories, even when those physicians were using a chatbot.
https://www.nytimes.com/2024/11/17/health/chatgpt-ai-doctors-diagnosis.html
From the article: “In a [study](https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2825395), doctors who were given ChatGPT-4 along with conventional resources did only slightly better than doctors who did not have access to the bot. And, to the researchers’ surprise, ChatGPT alone outperformed the doctors.
The chatbot, from the company OpenAI, scored an average of 90 percent when diagnosing a medical condition from a case report and explaining its reasoning. Doctors randomly assigned to use the chatbot got an average score of 76 percent. Those randomly assigned not to use it had an average score of 74 percent.
The study showed more than just the chatbot’s superior performance. It unveiled doctors’ sometimes unwavering belief in a diagnosis they made, even when a chatbot suggested a potentially better one.
The experiment involved 50 doctors, a mix of residents and attending physicians recruited through a few large American hospital systems, and was published last month in the journal JAMA Network Open.
The test subjects were given six case histories and were graded on their ability to suggest diagnoses and explain why they favored or ruled them out. Their grades also included getting the final diagnosis right.
The graders were medical experts who saw only the participants’ answers, without knowing whether they were from a doctor with ChatGPT, a doctor without it or from ChatGPT by itself.
The case histories used in the study were based on real patients and are part of a set of 105 cases that has been used by researchers since the 1990s. The cases intentionally have never been published so that medical students and others could be tested on them without any foreknowledge. That also meant that ChatGPT could not have been trained on them.”
EDIT: added more details on the experiment design since the article is soft-paywalled
Damn, quite damning research in favor of AI. I was surprised that the doctors with the bot barely did better than those without it. It’s like they barely took it into account. Was it bias toward their own limited experience, or bias against the chatbot’s output?
I can’t say I’m surprised; the article itself notes that we don’t really understand how doctors make the decisions they do in the first place, and as such it’s a very difficult skill to teach. Ultimately, doctors are human and can’t be expected to hold the entire body of medical knowledge in their heads at all times. You can see that in how the doctors used the chatbot: asking it isolated questions instead of feeding it the entire case text for analysis. There’s a good training opportunity there.
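For what it’s worth, here’s a minimal sketch of the “feed it the whole case” approach, using the OpenAI Python SDK. This is my own illustration, not the study’s protocol; the model name, prompt wording, and case text are all assumptions.

```python
# Minimal sketch (not the study's protocol): hand the model the full case
# history in one prompt instead of asking it piecemeal questions.
# Requires the `openai` package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Placeholder for a complete case write-up (history, exam, labs, imaging).
case_text = """62-year-old presenting with ... (full case history here)"""

response = client.chat.completions.create(
    model="gpt-4",  # the study used GPT-4; this exact model name is an assumption
    messages=[
        {
            "role": "user",
            "content": (
                "Read the following case history. Give a ranked differential "
                "diagnosis, and for each candidate explain which findings "
                "support it and which argue against it.\n\n" + case_text
            ),
        },
    ],
)

print(response.choices[0].message.content)
```

The point is just that the model sees every data point at once, the way the ChatGPT-alone arm of the study effectively did, rather than whatever fragments the doctor happens to think to ask about.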
On top of that, you only need to look at how commonly people with chronic inflammatory conditions and mental illnesses complain that their symptoms and experiences are brushed off. It’s still possible in this day and age to find doctors who simply don’t believe well-documented, established conditions exist at all. Bias is a bigger factor in some medical sectors than we may be willing to admit.
Isn’t this the whole point of proper AI?
Having friends who work in healthcare, this doesn’t shock me. At the end of the day a doctor is just a person, and as the old joke goes, “What do you call a guy who was last in his class in med school? You call him doctor.”
My point is that diagnostic medicine is basically taking the totality of data about the patient and the totality of diagnosable conditions, and finding the match. That’s something GenAI is great at. It’s also something a human will especially struggle with if they have biases, ignorant ideas about certain types of people, outdated knowledge, or just plain laziness, all of which enough doctors have for it to be a problem.
I’m no AI evangelist, and I’d be hesitant to just trust AI on its own to do this; this shouldn’t become WebMD’s “guess what, you have cancer” 2.0. But doctor is a job like any other, and anyone can suck at any job.
another small study, another spurious result
As a physician, the part AI is missing, and the reason I’m not worried it will take my job anytime soon, is that diagnosing a condition once you have the data is the *easy* part. Getting the data is the challenging part of making a diagnosis.
Patients do not come in with a script of relevant data points; they bring a puzzle. You have to find the important needle in a haystack of “it all started after Jim Bob’s wedding 27 years ago when I just didn’t feel right,” “it’s not a pain, it’s a discomfort,” and “I’m pretty sure it’s my kidneys because I ate some onions yesterday.”