Analyzing Word Pronunciation Using Open Source Libraries




We developed this application for a medical doctor, author, and online course instructor specializing in ADHD.

What was the problem?


Many speech recognition tools and libraries are available on the market, but no ready-made tool can judge whether a word has been pronounced correctly. The major challenge is creating and training language dictionaries and models for different languages, because it is very difficult to gather enough data to train the application for good results. Other issues, such as noise reduction, audio file formats, and real-time pronunciation recognition, make pronunciation analysis harder still.



The basic requirement of the project was to determine how accurately a user pronounces a word and to display a success score based on how closely the utterance matches the standard pronunciation. The user speaks a word, and the utterance is captured through the built-in microphone of their mobile phone.
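As an illustration of how such a success score could be derived, the sketch below compares the phoneme sequence recognized from the user's recording with the reference sequence for the word, using edit distance. This is an assumption made for illustration only: the scoring formula actually used in the project is not described here, and the function names and ARPAbet-style phoneme labels are hypothetical.

```python
def edit_distance(reference, spoken):
    """Levenshtein distance between two phoneme sequences."""
    previous = list(range(len(spoken) + 1))
    for i, ref_phone in enumerate(reference, 1):
        current = [i]
        for j, spoken_phone in enumerate(spoken, 1):
            current.append(min(
                previous[j] + 1,                               # phoneme missing from the utterance
                current[j - 1] + 1,                            # extra phoneme in the utterance
                previous[j - 1] + (ref_phone != spoken_phone), # substituted phoneme
            ))
        previous = current
    return previous[-1]


def pronunciation_score(reference, spoken):
    """Hypothetical success score from 0 to 100 (100 means an exact match)."""
    longest = max(len(reference), len(spoken))
    if longest == 0:
        return 100.0
    distance = edit_distance(reference, spoken)
    return round(100.0 * (1 - distance / longest), 1)


# Reference pronunciation of "hello" versus an utterance with one wrong vowel.
reference = ["HH", "AH", "L", "OW"]
spoken = ["HH", "AH", "L", "AW"]
print(pronunciation_score(reference, spoken))
```

A score built this way only reflects which sounds were recognized; it says nothing about timing or intonation, which would need separate handling.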

Our Approach


We used an open source library for speech recognition and then tokenized the stream of spoken sentences into small, recognizable sequences of phones and letters. We combined technologies such as Java, PHP, JavaScript, ffmpeg, and other utilities to reduce noise and make speech recognition more efficient. We approached the problem by creating custom language models and dictionaries. With this technique we were able to train our app to work with any language, such as English or German. To create dictionaries for other languages, we used the CMUSphinx libraries and determined the phonemes of each word used in our application for the given language.
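To make the dictionary and audio preparation steps more concrete, here is a minimal sketch under stated assumptions. A CMUSphinx pronunciation dictionary is a plain-text file with one entry per line: the word followed by its phones, with alternative pronunciations marked as word(2), word(3), and so on. The default CMUSphinx acoustic models expect 16 kHz, 16-bit mono audio, so recordings are typically converted first. The helper names, file names, and sample phoneme sequences below are hypothetical, and the commands actually used in the project (including its noise reduction step) are not given here.

```python
def format_dictionary(pronunciations):
    """Render {word: [phone list, ...]} as CMUSphinx dictionary text."""
    lines = []
    for word in sorted(pronunciations):
        for n, phones in enumerate(pronunciations[word], 1):
            # The first pronunciation uses the bare word; alternatives get (2), (3), ...
            key = word if n == 1 else f"{word}({n})"
            lines.append(f"{key} {' '.join(phones)}")
    return "\n".join(lines) + "\n"


def ffmpeg_to_wav_command(source_path, target_path):
    """Build an ffmpeg argument list that converts a recording to 16 kHz, 16-bit mono WAV."""
    return [
        "ffmpeg", "-y",          # overwrite the target file if it exists
        "-i", source_path,       # input recording in any format ffmpeg can read
        "-ac", "1",              # one audio channel (mono)
        "-ar", "16000",          # 16 kHz sample rate
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        target_path,
    ]


# Illustrative entries only; real phoneme sets depend on the language and acoustic model.
entries = {
    "hallo": [["HH", "AA", "L", "OW"]],
    "either": [["IY", "DH", "ER"], ["AY", "DH", "ER"]],
}
print(format_dictionary(entries), end="")
print(" ".join(ffmpeg_to_wav_command("recording.m4a", "recording.wav")))
```

The argument list can be passed to `subprocess.run` on a machine where ffmpeg is installed; it is kept as a list rather than a single string so that file names containing spaces need no shell quoting.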

Difference We Made


The system lets the user hear the standard pronunciation of a particular word and then try to emulate it in their own recording. The recording and its details are then sent to the backend for processing, and the result is displayed to the user for evaluation. The system checks not only the individual sounds for a pronunciation match but also the associated prosody, to ensure the nearest match to the standard sound for the language.



iOS, Android, Java, REST APIs, FFmpeg, SoX command, PHP, JavaScript, CMUSphinx