Page 13 - AeM_November_2020

RESEARCH, ANALYSIS, TRENDS

AI outperforms humans in speech recognition

Following a conversation and transcribing it precisely is one of the biggest challenges in artificial intelligence (AI) research. For the first time, researchers at Karlsruhe Institute of Technology (KIT) have succeeded in developing a computer system that outperforms humans in recognizing such spontaneously spoken language with minimum latency, as reported on the Internet platform arXiv.org.

“When people talk to each other, there are stops, stuttering, hesitations, such as ‘er’ or ‘hmmm’, laughs and coughs,” says Alex Waibel, Professor of Informatics at KIT. “Often, words are pronounced unclearly.” This makes it difficult even for people to take accurate notes of a conversation. “And so far, this has been even more difficult for AI,” the speech recognition expert adds.

KIT scientists and staff of KITES, a start-up company from KIT, have now programmed a computer system that executes this task better than humans and faster than other systems.

Waibel had already developed an automatic live translator that directly translates university lectures from German or English into the languages spoken by foreign students. This “Lecture Translator” has been used in the lecture halls of KIT since 2012.

“Recognition of spontaneous speech is the most important component of this system,” Waibel explains, “as errors and delays in recognition make the translation incomprehensible. On conversational speech, the human error rate amounts to about 5.5%. Our system now reaches 5.0%.”

Apart from precision, however, the speed at which the system produces output is just as important, so that students can follow the lecture live. The researchers have now succeeded in reducing this latency to one second. This is the lowest latency reported to date for a speech recognition system of this quality, according to Waibel.

Error rate and latency are measured using the standardized, internationally recognized scientific “Switchboard” benchmark test. This benchmark (defined by US NIST) is widely used by international AI researchers in their competition to build a machine that comes close to humans in recognizing spontaneous speech under comparable conditions, or even outperforms them.

Waibel concludes that fast, high-accuracy speech recognition is an essential step for further downstream processing. It enables dialog, translation, and other AI modules to provide better voice-based interaction with machines. (Source: Karlsruher Institut für Technologie (KIT)) ◊

By MediaBUZZ
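
For readers wondering how an error rate such as the 5.5% and 5.0% quoted in the article might be calculated, here is a minimal sketch. It assumes the figure is a word error rate, the measure conventionally reported for conversational speech benchmarks: the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The function name and the example sentences are illustrative only and do not come from the KIT system.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one dropped word and one misheard word in nine.
reference = "the lecture starts at nine in the main hall"
hypothesis = "the lecture starts at nine in main hole"
print(f"{word_error_rate(reference, hypothesis):.1%}")
```

On this reading, an error rate of 5.0% corresponds to about five word errors per 100 words of reference transcript. Official benchmark scoring typically applies additional text normalization rules that this sketch leaves out.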




November 2020: voice search & digital voice assistants as storyteller