Microsoft has introduced Speller100, a new spelling-correction system that the company describes as the most advanced in terms of language coverage and accuracy.
In a blog post on its AI research, the company explained that the tool is built on a set of AI models that together understand more than 100 languages, and that it already handles all spelling correction on Bing.
Building the tool was a major challenge for languages with little web presence, because there was not enough data to train a spelling-correction model comprehensively.
Developers also cannot rely on training data alone to learn correct spellings across languages. At a deeper level, spelling correction combines an error model, which estimates how likely a typed string is given an intended word, with a language model, which estimates how likely that word is in the first place. Not all errors are of the same kind, either. Non-word errors occur when a word does not exist in the vocabulary of a given language, whereas real-word errors involve valid words that do not fit the larger context.
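The split between an error model and a language model is the classic noisy-channel formulation of spelling correction: pick the candidate word that maximizes P(typed | candidate) × P(candidate). The Python sketch below illustrates that idea under heavy simplifying assumptions and is not Speller100's implementation. A made-up eleven-word corpus stands in for the language model, the error model simply assigns a fixed probability to any candidate one edit away, and the names `edits1`, `correct`, and `is_non_word` are hypothetical.

```python
from collections import Counter

# Toy language model data: a tiny, made-up corpus (hypothetical, for illustration only).
CORPUS = "the cat sat on the mat the cat ate the rat".split()
COUNTS = Counter(CORPUS)
TOTAL = sum(COUNTS.values())
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, replacement, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def language_model(word: str) -> float:
    """P(word): relative frequency of the word in the toy corpus."""
    return COUNTS[word] / TOTAL


def error_model(typed: str, intended: str) -> float:
    """P(typed | intended): assumed fixed probabilities, not learned ones."""
    if typed == intended:
        return 0.95
    if typed in edits1(intended):
        return 0.05
    return 0.0


def correct(typed: str) -> str:
    """Noisy-channel choice: argmax over known candidates of P(typed | c) * P(c)."""
    candidates = {typed} | edits1(typed)
    known = [c for c in candidates if c in COUNTS]
    if not known:
        return typed  # nothing in the vocabulary is close enough
    return max(known, key=lambda c: error_model(typed, c) * language_model(c))


def is_non_word(word: str) -> bool:
    """A non-word error: the string is not in the vocabulary at all."""
    return word not in COUNTS


print(correct("teh"), is_non_word("cta"))
```

Here `correct("teh")` returns `"the"` and `is_non_word("cta")` flags a non-word error. Because this toy language model scores single words without any context, it cannot catch real-word errors; those require a model that looks at the surrounding words.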
Speller100 is built around the concept of language families: groups of languages that share similarities. It also relies on zero-shot learning, a technique that lets the model learn to correct spelling without additional language-specific labeled training data.
To make Speller100 work for more than 100 languages, Microsoft adopted a spelling-correction pre-training method: it first extracted text from webpages and then generated errors such as deletions, additions, rotations, and replacements. This removed the need for a huge dataset of misspelled searches, and it allowed Speller100 to reach 50% correction recall for top candidates in languages for which no training data existed. Microsoft later deployed the system on its Bing search engine, where 15% of searches are misspelled, and as a result the number of misspellings went down by 7.5%.
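The pre-training idea can be illustrated by turning clean text into (noisy, clean) pairs through injected character-level typos. The Python sketch below is an assumption-laden illustration, not Microsoft's pipeline: the 30% error rate, the uniform choice of operation, and the function names `corrupt` and `make_training_pairs` are hypothetical, and "rotation" is interpreted here as swapping two adjacent characters.

```python
import random

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def corrupt(word: str, rng: random.Random) -> str:
    """Apply one random character-level edit to simulate a typo."""
    if len(word) < 2:
        return word  # leave very short words alone
    i = rng.randrange(len(word))
    op = rng.choice(["delete", "insert", "swap", "replace"])
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "insert":
        return word[:i] + rng.choice(LETTERS) + word[i:]
    if op == "swap":
        # Assumed meaning of "rotation": swap characters j and j + 1.
        j = min(i, len(word) - 2)
        return word[:j] + word[j + 1] + word[j] + word[j + 2:]
    # "replace": substitute one character (may occasionally pick the same letter).
    return word[:i] + rng.choice(LETTERS) + word[i + 1:]


def make_training_pairs(text: str, error_rate: float = 0.3,
                        seed: int = 0) -> list[tuple[str, str]]:
    """Turn clean text into (noisy, clean) pairs for a correction model."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    pairs = []
    for word in text.split():
        noisy = corrupt(word, rng) if rng.random() < error_rate else word
        pairs.append((noisy, word))
    return pairs


for noisy, clean in make_training_pairs("spelling correction for many languages"):
    print(noisy, "->", clean)
```

A correction model trained on pairs like these learns to map the noisy side back to the clean side, which is why no log of real misspelled searches is needed. Because a replacement can pick the same letter, or a swap can exchange two identical letters, a few "noisy" words come out unchanged.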
To further boost performance, Microsoft took advantage of the orthographic, morphological, and semantic similarities between the selected languages to create language family-based models. Combined with zero-shot learning, this makes Speller100 efficient enough for runtime use and well suited to low-resource languages such as Afrikaans and Luxembourgish.
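One way to picture family-based models is as a routing step: a language with enough data uses its own model, while a low-resource language falls back to a model shared by its language family. The Python sketch below is purely hypothetical; Microsoft has not described how Speller100 groups or selects models, so the family table and the model names such as `speller-family-germanic` are invented for illustration (Afrikaans and Luxembourgish are both Germanic languages).

```python
# Hypothetical family table keyed by ISO 639-1 language codes.
LANGUAGE_FAMILY: dict[str, str] = {
    "af": "germanic",  # Afrikaans
    "lb": "germanic",  # Luxembourgish
    "nl": "germanic",  # Dutch
    "de": "germanic",  # German
    "es": "romance",   # Spanish
    "pt": "romance",   # Portuguese
}


def pick_model(lang: str, dedicated: set[str]) -> str:
    """Return a hypothetical model name for a language code."""
    # Languages with enough data get their own model.
    if lang in dedicated:
        return f"speller-{lang}"
    # Low-resource languages reuse their family's shared model (zero-shot).
    family = LANGUAGE_FAMILY.get(lang)
    if family is not None:
        return f"speller-family-{family}"
    # Languages outside the table fall back to a generic multilingual model.
    return "speller-generic"


print(pick_model("lb", dedicated={"de", "es"}))  # speller-family-germanic
```

Sharing one model across a family means Luxembourgish can benefit from patterns learned in better-resourced relatives such as German and Dutch, without a model of its own.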
So far, Speller100 has cut the number of pages with no results by 30% and the number of times users have to correct their own spelling in a search by 5%. Meanwhile, the click-through rate on Bing's spelling suggestions rose from 8% to 67%.
Microsoft will bring Speller100 to more of its products very soon!