When I was learning a second language, I always found it really hard to know whether I was pronouncing everything correctly. The IPA pronunciations of foreign words are widely available, but a way of checking whether your own pronunciation matches the IPA isn't! That's why I built Enunciaition: it analyzes your voice and shows you the IPA symbol for the vowel you are saying.

Enunciate is built with React, Pixi.js, Pyodide, SciPy, and NumPy. The Web Audio API is used to launch an AudioWorklet that reads raw PCM audio frames from the microphone. Those raw samples are then sent to a Python script running inside WASM, which uses SciPy to compute a short-time Fourier transform and find the dominant frequencies in the signal. That data is piped into buffers, Pixi.js sprites are created from those buffers, and a WebGL fragment shader is applied to the sprites to draw the spectrogram onto the canvas. Once the recording is stopped, the frequency data is analyzed and the vowels are classified. The spectrogram can be panned and zoomed so that you can inspect the whole recording.
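
To give a feel for the analysis step, here is a minimal sketch (not the project's actual code) of what the SciPy part running inside Pyodide might look like: incoming PCM samples go through a short-time Fourier transform, and the resulting magnitudes become the columns of the spectrogram. The sample rate, window length, and overlap values are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft

SAMPLE_RATE = 48_000  # typical Web Audio sample rate (assumption)

def spectrogram_columns(pcm: np.ndarray) -> np.ndarray:
    """Return a (freq_bins x time_frames) magnitude array for rendering."""
    freqs, times, Zxx = stft(
        pcm,
        fs=SAMPLE_RATE,
        nperseg=1024,   # analysis window length; value is illustrative
        noverlap=768,   # 75% overlap between consecutive windows
        window="hann",
    )
    return np.abs(Zxx)  # magnitudes become pixel intensities in the shader
```

In the real pipeline these magnitude columns would be written into a shared buffer and handed back to JavaScript, where Pixi.js turns them into sprite textures for the fragment shader to draw.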

The challenges mostly stemmed from my determination to generate the spectrogram in real time from the audio samples, which involved streaming data between different threads and using a GPU fragment shader to render the spectrogram. Figuring out how to get the Fourier transform windowing right with live data also took some time. There is also no backend (everything happens in the browser), which further constrained the design. A lot of buffers get passed between different parts of the code (Python, JavaScript, and the fragment shader), and there is multithreading to manage (the AudioWorklet and the Python WASM runtime each run on their own thread).
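
The windowing problem with live data comes from the AudioWorklet delivering audio in small fixed-size frames (128 samples at a time), while each Fourier window needs many more samples than that, with overlap between consecutive windows. The sketch below shows one common way to handle this, not the project's actual code; the window length and hop size are illustrative assumptions.

```python
import numpy as np

WINDOW = 1024  # analysis window length (illustrative)
HOP = 256      # new samples per emitted window, i.e. 75% overlap

class StreamWindower:
    """Accumulate small live PCM frames and emit full, overlapping windows."""

    def __init__(self):
        self.buffer = np.zeros(0, dtype=np.float32)

    def push(self, frame: np.ndarray):
        """Append one incoming frame; yield each complete Hann-weighted window."""
        self.buffer = np.concatenate([self.buffer, frame])
        while self.buffer.size >= WINDOW:
            yield self.buffer[:WINDOW] * np.hanning(WINDOW)
            self.buffer = self.buffer[HOP:]  # slide forward by the hop size

# Usage sketch: for each frame from the audio thread,
#   for win in windower.push(frame): run the FFT on `win`.
```

Because the emitted windows overlap, the spectrogram updates smoothly even though the microphone delivers audio in tiny chunks.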

Team Members: Serena Lynas (solo)
Software Track

Built With

React, Pixi.js, Pyodide, SciPy, NumPy, Web Audio API, WebGL, WebAssembly