“When machine learning works at its best, you really don’t see the effort. It’s just so natural. You see the result,” says Harry Shum, the executive vice president in charge of Microsoft’s Technology and Research group.
These days, a game console that understands voice commands, apps that can translate your conversation in real time and a virtual assistant that provides you with the numbers of nearby pizza places are all fact, not fiction.
These systems not only exist but are getting better every day, thanks to improvements in data availability, computing power and a subfield of artificial intelligence called machine learning, in which systems improve as they take in more data.
Microsoft recently unveiled a speech recognition engine that transcribes conversational speech as accurately as humans do. The company's next goal is to improve the engine's robustness so it can be used in real-life situations, such as on crowded city streets or while driving. The team also hopes to eventually make it work with multiple speakers simultaneously.
"We've reached human parity," Xuedong Huang, Microsoft's chief speech scientist, told the company's blog. "This is a historic achievement."
Microsoft first combined speech recognition with real-time translation in Skype Translator.
The ability to carry on a conversation in two or more languages while seeing the other person's gestures and facial expressions in real time is one of several ways instant translation is going mainstream.
"The language barrier is going to be essentially nonexistent in four years, for the major languages and the major scenarios," said Arul Menezes, who heads the Machine Translation team at Microsoft Research.