Recognizing the Risks of Speech-to-Text in Medicine and Acting on Them
The power of Generative AI is seductive. We want so much to improve our work, our processes, and our lives that we rush to bring these new technologies into our workflows.
For example, many hospitals have begun using speech-to-text technologies such as OpenAI’s Whisper to transcribe patient consultations with doctors. Their ability to turn speech into text that can be summarized and shared is phenomenal. Of course, we know these AI tools often make up what they think they're hearing, yet they're still used as though hallucinations are not a major concern. But we want to deploy them and innovate so badly that we push them forward even when we know what might go wrong.
The problems are clear. First, these systems can simply get words wrong because they cannot always match sounds accurately to words. Second, when there is a pause or silence, they may fill the gap by predicting what comes next from prior context, inserting information that was never said. These issues extend beyond basic transcription into the summaries built on top of it, where the same wrong or invented details persist.
OpenAI's Whisper stands out for its speech-to-text abilities, but it is still not perfect. Its error rate is small but significant, especially when lives may depend on accurate medical communication. Even knowing the risks, we convince ourselves that doctors will double-check the output and correct any errors. That mindset is risky: we grow accustomed to systems, and our scrutiny of them fades over time. There is a human in the loop, but after a while that human is simply not paying attention, and perfunctory review lets false medical advice slip through.
To tackle this, we should use multiple systems that check one another and involve people at key points to catch discrepancies. We need to build the checks and balances into the systems themselves, and when we want a human in the loop, we have to make sure that person plays a substantive role.
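To make the idea concrete, here is a minimal Python sketch, not a production design, of one way to cross-check two independently produced transcripts and route disagreements to a human reviewer. The example transcripts, the sentence splitting, and the use of simple string similarity are all illustrative assumptions; a real pipeline would align segments by audio timestamps and compare them with medically aware tooling.

    import difflib

    def flag_discrepancies(transcript_a: str, transcript_b: str):
        """Compare two independently produced transcripts sentence by
        sentence and return the pairs that disagree, for human review."""
        # Naive sentence splitting; a real pipeline would align segments
        # by audio timestamps rather than by punctuation.
        sents_a = [s.strip() for s in transcript_a.split(".") if s.strip()]
        sents_b = [s.strip() for s in transcript_b.split(".") if s.strip()]

        flagged = []
        # zip() drops trailing sentences if the two transcripts differ in
        # length; acceptable for a sketch, not for a deployed system.
        for a, b in zip(sents_a, sents_b):
            if a.lower() != b.lower():
                # In a medical setting even a one-token difference (a dosage,
                # a drug name) matters, so every mismatch goes to a reviewer.
                # The similarity score is only there to help prioritize.
                score = difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()
                flagged.append((a, b, score))
        return flagged

    if __name__ == "__main__":
        # Hypothetical outputs from two independent speech-to-text systems.
        system_a = "Patient reports chest pain. Prescribed 20 mg of lisinopril daily."
        system_b = "Patient reports chest pain. Prescribed 40 mg of lisinopril daily."

        for a, b, score in flag_discrepancies(system_a, system_b):
            print(f"REVIEW NEEDED (similarity {score:.2f}):")
            print(f"  System A: {a}")
            print(f"  System B: {b}")

The point of the sketch is the workflow rather than the string matching: each system acts as a check on the other, and the reviewer's attention is directed to the specific places where they disagree, which keeps the human role substantive rather than perfunctory.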
In the end, while these technologies are impressive, their allure shouldn't blind us to their faults. We must demand high accuracy and effective error management to stop misleading tech-generated advice from sneaking into the healthcare sector. Recognizing the problem isn't enough. We need action to prevent these tools from spreading potential misinformation.
Kristian Hammond
Bill and Cathy Osborn Professor of Computer Science
Director of the Center for Advancing Safety of Machine Intelligence (CASMI)
Director of the Master of Science in Artificial Intelligence (MSAI) Program