Research out of the UK’s Oxford University finds that consumers asking an LLM for a diagnosis and treatment recommendation aren’t getting the right results – because they’re human.

New research out of the UK is tempering expectations that consumers could use AI to diagnose their health issues. The study, conducted by the University of Oxford’s Nuffield Department of Primary Care Health Sciences and the Oxford Internet Institute, finds that consumers using LLMs for medical advice fared no better at getting the right advice than a control group accessing “traditional sources of information.” The problem? According to researchers, it’s the human element.

In what is billed as the largest user study of LLMs to date, researchers conducted a randomized trial involving nearly 1,300 participants. One group was tasked with asking an AI chatbot to diagnose a health complaint and offer a recommended course of action. The complaints, developed by doctors, ranged from a young man developing a severe headache after a night out with friends to a new mother feeling constantly out of breath and exhausted.

After analyzing the results and comparing them with those of the control group, as well as with standard LLM testing strategies that do not involve human users, researchers found that the AI chatbots fell short precisely because they were interacting with real people.

Specifically, the study identified three challenges: consumers often don’t know what information they should provide to a chatbot in order to get a good diagnosis and treatment recommendation; any variation in…