HARDER THAN JEOPARDY
In 2011, just two days after Watson beat two human champions at Jeopardy!, IBM announced that their brilliant Artificial Intelligence (AI) would turn its considerable brainpower towards transforming medicine. Stand aside, Sherlock: Dr. Watson is on the case.
Yet by 2018, the fanfare surrounding Watson had evaporated. While capable of analysing significant quantities of data, Watson’s natural language processing struggled to understand medical texts and to make the intuitive leaps, based on subtle clues, that a human doctor would. Nor could it mine patients’ medical records as planned, which undermined the accuracy of time-dependent information such as therapy timelines. Even relatively straightforward genomics applications in oncology proved a struggle, with Watson producing recommendations that were either obvious or questionable. As a result, confidence in Watson was low: “The discomfort that I have—and that others have had with using it—has been the sense that you never know what you’re really going to get…and how much faith you can put in those results,” said Lukas Wartman of the McDonnell Genome Institute at the Washington University School of Medicine in St. Louis.
Rather than delivering the preeminent oncology diagnostics tool they had promised, IBM ran into myriad challenges and, at significant financial loss, ended up with what amounts to an AI diagnostics assistant or medical librarian.
While IBM Watson may have been the highest-profile AI to struggle in the medical field in 2018, it was not alone. Babylon Health’s diagnostic triage chatbot came under scrutiny for potential misdiagnoses within its app, which the NHS had planned to adopt in hopes of easing pressure on emergency services. Despite claims that its computerized diagnostic decision support (CDDS) program scored higher on a medical exam than doctors and achieved “equivalent accuracy” to human physicians, it missed symptoms of a hypothetical pulmonary embolism during external testing, prompting an inquiry by the Medicines and Healthcare products Regulatory Agency (MHRA). Babylon Health subsequently conducted and passed four separate audits of its Quality Management System to confirm compliance with UK and European regulations, but the episode demonstrated once again how high the bar to success in medical AI remains.
Here is the crux of the matter: People do not trust AI, and certainly not with their lives. Bad press, combined with limited understanding of this relatively new technology, leads sceptical users to seek human opinions instead. To make AI an effective healthcare tool, people must trust it, and that trust requires an appropriate level of regulation.