A closer look at AI-powered speech recognition in radiology

Pixel-based artificial intelligence (AI) has dominated market attention in radiology in recent years. However, a more familiar but less heralded technology has been evolving for at least two decades, and its recent major advances have been fueled by the growth of cloud computing.

What is the technology? Speech recognition based on artificial intelligence.

To put it more informally, today’s radiology speech recognition solutions are not your parents’ speech recognition technology. In fact, they have far surpassed the ones you may have been using just a few years ago.

Speech recognition is so embedded in clinical workflows that many radiologists and other clinicians take it for granted. Indeed, many have only a peripheral sense of how far the technology has progressed. Advances in deep learning and natural language processing, trained on massive amounts of speech data, have vastly improved the speed and accuracy of speech recognition engines. The rapid expansion of cloud-hosted AI has further fueled the growth and evolution of speech technology.

Early software required users to train the speech recognition engine by reciting prepared training text. Users also had to be diligent about flagging and correcting recognition errors. Accuracy depended on input device quality, background noise, and other factors. Accents and specialized vocabularies were often problematic. Fortunately, capabilities steadily improved as machine learning technology evolved and developers continually refined the software based on user feedback.

The widespread adoption of cloud computing over the past five years has accelerated the development of neural network and deep learning techniques. Because the speech recognition engine is continuously trained on securely anonymized speech data, it gets “smarter” as more users interact with it. Nuance Communications’ next-generation speech recognition technology extracts information from thousands of terabytes of speech data while predicting what the user will say next. The technology anticipates and prepares to render what is being spoken based on context, user patterns and speech characteristics such as accent. Nuance Communications’ cloud-based radiology reporting system is hosted in Microsoft Azure and lets users take immediate advantage of this continuous learning process in ways never before possible.
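To make the idea of predicting the next word from context more concrete, here is a minimal sketch of a bigram model, one of the simplest forms of next-word prediction. It is a toy illustration only and is not how Nuance's engine works; production systems rely on deep neural networks trained on far larger data. The function names `train_bigrams` and `predict_next` and the sample phrases are invented for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, which words follow it in the training text."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following


def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical dictation phrases, invented purely for illustration.
corpus = [
    "no acute intracranial abnormality",
    "no acute cardiopulmonary process",
    "no acute intracranial hemorrhage",
]

model = train_bigrams(corpus)
print(predict_next(model, "acute"))  # the follower seen most often after "acute"
```

The more phrases the model sees, the better its counts reflect how a given user or specialty actually speaks, which is the same intuition behind an engine that improves as more users interact with it.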

Speech recognition will be the new UX for radiologists. In fact, ambient speech is the current state-of-the-art speech technology used in solutions such as Nuance Dragon Ambient eXperience (DAX) and PowerScribe. The ambient capabilities recognize and understand the relevant clinical context of conversational speech and convert it into structured, organized output for radiology reports and other applications.

Advances in natural language understanding automatically turn free-form dictation into structured data. Structured data supports the American College of Radiology’s Common Data Elements initiative, which aims to create a common ontological framework that standardizes meaning from the point of read to the point of care. In PowerScribe One, it helps to create organized, consistent reports from narrative dictation and provides real-time support for clinical decisions and evidence-based follow-up recommendations. Structured data also enhances interoperability with other systems, including PACS, viewers, and EHRs, through bi-directional, real-time data exchange.
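As a rough illustration of what turning free-form dictation into structured data means, the sketch below pulls a size, a finding type, and a location out of a single dictated sentence. It is deliberately simplistic and makes several assumptions: the `Finding` fields, the regular expression, and the sample sentence are all hypothetical, and a fixed pattern like this stands in for the learned natural language understanding models that real products use. It does not represent PowerScribe One or the Common Data Elements schema.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    size_mm: float   # size normalized to millimeters
    finding: str     # e.g. "nodule"
    location: str    # e.g. "right upper lobe"


# Hypothetical pattern: "<number> mm|cm <finding> in the <location>"
PATTERN = re.compile(
    r"(?P<size>\d+(?:\.\d+)?)\s*(?P<unit>mm|cm)\s+"
    r"(?P<finding>nodule|mass|cyst)\s+in\s+the\s+"
    r"(?P<location>[a-z ]+?)(?:[.,;]|$)",
    re.IGNORECASE,
)


def extract_finding(dictation: str) -> Optional[Finding]:
    """Return the first finding matched in the dictation, or None."""
    match = PATTERN.search(dictation)
    if match is None:
        return None
    size = float(match.group("size"))
    if match.group("unit").lower() == "cm":
        size *= 10  # convert centimeters to millimeters
    return Finding(
        size_mm=size,
        finding=match.group("finding").lower(),
        location=match.group("location").strip().lower(),
    )


print(extract_finding("There is a 6 mm nodule in the right upper lobe."))
```

Once a sentence has been reduced to named fields like these, downstream systems can compare, filter, and exchange the values without re-reading the prose, which is what makes structured output useful for decision support and interoperability.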

While pixel-based AI models and other technologies often make headlines, cloud-hosted, AI-powered speech recognition is quietly and effectively ushering in a new generation of radiology reporting. Instead of wondering about the accuracy of speech recognition, users are seeing improvements in everyday radiology workflows and new ways to apply the technology to improve efficiency and patient outcomes.

Dr. Agarwal is the Chief Medical Information Officer for Diagnostic Imaging and AI at Nuance Communications.
