Microsoft has announced that its speech recognition technology has reached human parity. In other words, the system now transcribes speech about as accurately as professional human transcribers do.
The word error rate of Microsoft's speech recognition system is now 5.9 percent, according to the company's researchers. That figure puts the system roughly on par with the professional transcribers who took part in the tests. The transcribers and the system were asked to transcribe the same recordings, and their error rates were close to each other.
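For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. A rate of 5.9 percent therefore means about 59 errors per 1,000 reference words. The sketch below is only an illustration of that standard definition, not Microsoft's scoring code; the function name `word_error_rate` and the sample sentences are made up for this example, and real evaluations usually normalize casing and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative helper (hypothetical name); words are split on whitespace
    and compared exactly, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example: one substitution ("sat" -> "sit") and one deletion ("the").
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"{wer:.1%}")  # prints 33.3%
```

Because the denominator is the reference length, a transcript with many inserted words can score above 100 percent, which is one reason the figure is reported as a rate rather than as an accuracy.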
The findings led Xuedong Huang, the company's chief speech scientist, to declare that the speech recognition system has "reached human parity." He further described the feat as "a historic achievement."
According to The Verge, the system uses neural language models. These models group similar words together, which lets the system generalize more efficiently from one word to others like it.
Microsoft's Speech and Dialog research group did admit that the technology is still far from understanding semantics or having contextual awareness. The group also indicated that the real test for the speech recognition system is whether it can understand conversations in real-life situations. This will likely include the ability to recognize facial expressions and to understand different languages. The system also needs to work well when faced with a wider range of voices.
Microsoft is planning to use this technology in its personal voice assistant, Cortana. The company's first foray into speech recognition involved Skype. Since the release of Skype Translator in 2015, users have been able to talk to people who speak a different language. Skype Translator recognizes what is said in the conversation and translates it from the other speaker's language.
In describing the work, Microsoft AI Research head Harry Shum said, "We are moving away from a world where people must understand computers to a world in which computers must understand us." Shum also said, as MacRumors reported, that "true artificial intelligence is still on the distant horizon."