Page 13 - AeM_November_2020
RESEARCH
ANALYSIS
TRENDS
AI outperforms humans in speech recognition
Following a conversation and transcribing it precisely is one of the biggest challenges in artificial intelligence (AI) research. For the first time, researchers at Karlsruhe Institute of Technology (KIT) have succeeded in developing a computer system that outperforms humans in recognizing such spontaneously spoken language with minimum latency, as reported on the Internet platform arXiv.org.

“When people talk to each other, there are stops, stuttering, hesitations, such as ‘er’ or ‘hmmm’, laughs and coughs,” says Alex Waibel, Professor of Informatics at KIT. “Often, words are pronounced unclearly.” This makes it difficult even for people to take accurate notes of a conversation. “And so far, this has been even more difficult for AI,” the speech recognition expert adds.

KIT scientists and staff of KITES, a start-up company from KIT, have now programmed a computer system that performs this task better than humans and faster than other systems.

Waibel had already developed an automatic live translator that directly translates university lectures from German or English into the languages spoken by foreign students. This “Lecture Translator” has been used in the lecture halls of KIT since 2012.

“Recognition of spontaneous speech is the most important component of this system,” Waibel explains, “as errors and delays in recognition make the translation incomprehensible. On conversational speech, the human error rate amounts to about 5.5%. Our system now reaches 5.0%.”

Apart from precision, however, the speed at which the system produces output is just as important, so that students can follow the lecture live. The researchers have now succeeded in reducing this latency to one second. This is the smallest reported latency reached by a speech recognition system of this quality to date, according to Waibel.

Error rate and latency are measured using the standardized, internationally recognized scientific “Switchboard benchmark” test. This benchmark (defined by US NIST) is widely used by international AI researchers in their competition to build a machine that comes close to humans in recognizing spontaneous speech under comparable conditions, or even outperforms them.

Waibel concludes that fast, high-accuracy speech recognition is an essential step for further downstream processing. It enables dialog, translation, and other AI modules to provide better voice-based interaction with machines. (Source: Karlsruher Institut für Technologie (KIT)). ◊

By MediaBUZZ
November 2020: voice search & digital voice assistants as storyteller