Biggest Voice AI Opportunity: Electronic Health Records?


Splashy voice AI products like Alexa and Google Assistant continue to make headlines, but quietly, the biggest generator of voice AI-driven software revenue over the next 5 years will be (trumpet, please) … clinical documentation.

Didn’t see that coming, right?

In our recent report on Voice and Speech Recognition, Tractica explores eight key use cases that will drive significant revenue through 2025: healthcare (clinical documentation), automotive virtual digital assistants (VDAs), authentication & identification, voice search, voice commerce & customer service, smart home controls, analysis, and consumer robot controls. The scope of the study is the global market for speech recognition and voice recognition software (the latter sometimes referred to as speaker recognition), for the forecast period from 2016 through 2025. More specifically, the report looks at use cases where the spoken word is the input, the output, or both. Speech recognition is defined as the use of AI to interpret the spoken word.

So why is clinical documentation the biggest revenue opportunity? Part of the answer is the general lack of monetization arcs for most voice and speech recognition use cases.

Voice user interfaces (UIs), powered by voice and speech recognition software, are, with minor exceptions, in the early days of commercial deployment. While voice UI has attracted a lot of media attention, it is not practical for a wide range of use cases because of ambient noise, privacy concerns, and current limitations of natural language understanding (NLU). Use cases such as voice search and voice commerce, which could reach a massive number of users, will be hard to monetize. Clinical documentation, by contrast, does not share that monetization problem.

Market Need for a Better Solution

Healthcare documentation is not only required, it is essential for delivering appropriate care and understanding individual medical histories and contexts. A significant portion of health record data is unstructured text, particularly the context-rich physicians’ and nurses’ notes, lab reports, and discharge summaries that are collectively called clinical notes. Clinical notes are a subset of broader medical paperwork, but they pose a host of challenges for digital processing aimed at extracting longer-term value.

For many years, there has been a market for companies providing dictation and transcription services to doctors and other healthcare professionals to help smooth clinical note documentation. Speech recognition and natural language processing (NLP) have been leveraged to help automate healthcare documentation for more than 20 years. Yet these solutions were not sufficiently improving documentation efficiency. Healthcare documentation grew increasingly complex with insurance coding requirements and the introduction and market adoption of electronic health records (EHRs). According to a 2018 study by the American Hospital Association, hospitals are spending an increasing amount of time documenting quality indicators for regulators like Medicare and Medicaid and for payers.

In an average-sized hospital:

  • 59 full-time equivalents (FTEs) are dedicated to regulatory compliance
  • About 25% of these FTEs are doctors, nurses, and allied health staff
  • Administrative aspects of quality reporting cost about $709,000 annually
  • It costs approximately $760,000 per year to meet meaningful use administrative requirements

A 2017 Annals of Family Medicine article reported that U.S. physicians spend about 6 hours per day on EHR data entry. In 2015, as machine learning (ML) became increasingly affordable and scalable, speech recognition and NLP began to be combined with ML, producing better results, greater efficiencies, and economies of scale.

Speech recognition combined with NLP, ML, and deep learning (DL) software is being used to transform spoken clinical notes into EHRs that can then be mined to extract and connect information, providing much improved patient analytics and a more comprehensive view of healthcare. EHR providers, such as Epic Systems, Meditech, and eClinicalWorks, are tapping speech recognition specialists to expand voice-enabled features in EHRs.

A growing number of players provide solutions that leverage hybrid speech recognition technologies for healthcare, including well-known names like Nuance, M*Modal, 3M, and NextGen. Other providers include nVoq, Suki, and Nuvodia.

Key Industry Player: M*Modal

The story of M*Modal illustrates the history, market potential, and market reality for the healthcare speech recognition use case. Legacy healthcare-focused speech recognition and NLU company M*Modal merged with troubled clinical documentation company MedQuist in 2012, then, unable to shoulder $750 million in debt or to handle inherited market assumptions, filed for bankruptcy in 2014. The company rebooted, leveraging the advances in hybrid speech recognition/NLP/ML. In early 2018, the company claimed that more than 250,000 physicians use its clinical documentation solutions.

The company says it was the “first cloud-based solution to provide speech recognition for physician documentation directly in the EHR” beginning in 2012. According to a recent press release, “Where legacy speech recognition was focused on enabling doctors to ‘type with their tongue’, M*Modal Fluency Direct was designed to truly understand the actual patient story. M*Modal Fluency Direct continues to be the fastest to provide differentiating context awareness and market-leading accuracy when compared to competitive products.” Based on this, M*Modal has recently launched a Third Generation Computer-Assisted Physician Documentation (CAPD) solution with virtual assistant functionality. The M*Modal Virtual Provider Assistant provides help with scheduling, chart search, patient summaries, Hierarchical Condition Category (HCC) management, ordering of labs and medication, and medication monitoring. The new solution is compatible with more than 150 EHRs.

Boost Revenue Forecasts by Resolving Market Issues

According to Tractica’s forecasts, voice and speech recognition revenue for clinical documentation in healthcare will grow from $512 million in 2016 to more than $2.2 billion in 2025, at a compound annual growth rate (CAGR) of 17.9%. While clinical documentation represents the largest revenue opportunity of the use cases analyzed, Tractica believes it will be an underperforming category. The reasons for this are: 1) market-to-market regulatory issues, which cause fragmentation, 2) the continued slow adoption of automation for EHRs, and 3) the continuing evolution of NLP technologies to sort out spoken, unstructured data. Should some of these issues be resolved sooner, voice and speech recognition revenue will likely surpass these projections.
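
For readers who want to sanity-check the forecast arithmetic, here is a minimal sketch of how a CAGR relates a starting value to an ending value. It assumes 2016 is the base year and that growth compounds over nine annual periods through 2025; the function names and the rounded $2,200 million end value are illustrative and do not come from the Tractica report.

```python
def project(start_value, rate, years):
    """Grow start_value at a fixed annual rate for the given number of years."""
    return start_value * (1 + rate) ** years


def cagr(start_value, end_value, years):
    """Compound annual growth rate (as a fraction) implied by two values."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the article, in millions of U.S. dollars.
start_musd = 512.0
quoted_rate = 0.179

# Assumption: nine compounding periods between 2016 and 2025.
periods = 2025 - 2016

# Revenue implied by compounding the quoted rate from the 2016 base.
projected_2025 = project(start_musd, quoted_rate, periods)

# Growth rate implied by a rounded end value of exactly $2,200 million.
implied_rate = cagr(start_musd, 2200.0, periods)

print(f"Projected 2025 revenue: ${projected_2025:,.0f} million")
print(f"CAGR implied by a $2,200 million end value: {implied_rate:.1%}")
```

Under these assumptions, compounding the quoted rate lands somewhat above $2.2 billion, which fits the article's "more than $2.2 billion" wording.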
