News feed

  • Codec2 vs. ETSI ACELP – spectral distortion

    After a few days of tweaking Codec2 guts, one question came up: how much worse is the LSP quantizer in ACELP (EN 300 395-2) compared to the one in Codec2? Both use the same excitation->filter model. The latter has excellent spectral distortion, well below the threshold of what could be called “transparent”. I remember comparing the two back in 2022, but the results were never published.

    Now, back to theory. As per Wai Chu “Speech Coding Algorithms”, the formant filter’s transparency criteria are as follows:

    • average spectral distortion of less than 1 dB
    • less than 2% outliers having a spectral distortion above 2 dB
    • no outliers with spectral distortion larger than 4 dB
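
    The three criteria above can be checked mechanically once per-frame SD values are available. Here is a minimal sketch in C; the names sd_stats_t, sd_summarize() and sd_is_transparent() are made up for illustration and do not come from Codec2 or the ETSI code. It counts every frame above 2 dB as an outlier, including those above 4 dB.

```c
#include <stdbool.h>
#include <stddef.h>

/* Summary of a set of per-frame spectral distortion values (in dB). */
typedef struct {
    double mean_db;          /* average SD over all frames */
    double frac_above_2db;   /* fraction of frames with SD > 2 dB */
    size_t count_above_4db;  /* number of frames with SD > 4 dB */
} sd_stats_t;

/* Compute the three statistics used by the transparency criteria. */
sd_stats_t sd_summarize(const double *sd_db, size_t n)
{
    sd_stats_t s = {0.0, 0.0, 0};
    size_t above2 = 0;

    for (size_t i = 0; i < n; i++) {
        s.mean_db += sd_db[i];
        if (sd_db[i] > 2.0) above2++;
        if (sd_db[i] > 4.0) s.count_above_4db++;
    }
    if (n > 0) {
        s.mean_db /= (double)n;
        s.frac_above_2db = (double)above2 / (double)n;
    }
    return s;
}

/* True only if all three criteria are met. */
bool sd_is_transparent(const sd_stats_t *s)
{
    return s->mean_db < 1.0 &&
           s->frac_above_2db < 0.02 &&
           s->count_above_4db == 0;
}
```

    Feeding it one SD value per analysed frame gives a single yes/no answer, plus the three numbers behind it.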

    Spectral distortion is defined as a “distance metric” between frequency responses of two filters (we are using log magnitude spectra here):

    $$SD=\sqrt{\frac{1}{n_1-n_0}\sum_{n=n_0}^{n_1-1}\left[20\log{H_1(n)} - 20\log{H_0(n)}\right]^2}$$

    $H$ (frequency responses) are based on discrete Fourier transforms of the measured and reference LPC filters. The FFT size is usually set to 256 points. Typically, at an 8 kHz sample rate, $n_1=100$ and $n_0=4$, which with 31.25 Hz wide bins gives a 125..3,125 Hz range. Distortion outside this band is not considered perceptually critical.

    Now back to Codec2 and ETSI ACELP. We are only going to focus on the 3,200 bps rate of the former (20 ms frames, 64 bits each). ACELP runs at around 4,567 bps (30 ms frames, 137 bits each).

    Codec2’s bit allocation for quantized LSPs is generous at 5 bits per LSP, yielding 50 bits total. ACELP spends only about half of that: 26 bits.

    Codec2 uses delta-encoded line spectral frequencies in the frequency domain, $\omega$. ACELP uses a split vector quantizer with LSPs in the cosine domain, $\cos(\omega)$.

    Let’s look at the spectral distortion of both codecs:

    Now, the cumulative distribution (probability of SD below given threshold) for both:

    Pretty much the same information, just from another angle. Both plots were generated using the LibriSpeech English speech corpus.

    Conclusion? Codec2 outperforms ACELP in terms of formant reconstruction, but ACELP offers a much more sophisticated excitation model. Excitation is Codec2’s biggest weakness. I really hope to be able to propose a better model in the coming months.

  • Codec2-mod – statistical tests

    I have used N=200 random files from the LibriSpeech corpus to run a comparison between Codec2 and Codec2-mod. I used the ViSQOL perceptual quality estimator to calculate the Mean Opinion Score (MOS) for original-decompressed pairs for both implementations. Results are shown in the table at the bottom of this post (copied verbatim from terminal).

    The few mismatches are most likely caused by slightly different floating point maths (comparisons and fast cosine and arc cosine functions). Those may diverge slightly given edge-case input signal vectors. Still, Codec2-mod is functionally bit-exact with reference Codec2 for almost all inputs.

    The +0.006 difference in MOS is not significant (and even imperceptible). This number does not signify any perceptual improvement. I’m pretty sure that the value would converge to 0.0, given enough samples. A close to zero value means that there is no regression (which is good!).

    This is the histogram for both implementations:

  • Codec2-mod released for testing

    (See both Quick follow-ups down below)

    After a few days of optimizing Codec2’s code (3200 bps mode), it is time to share the results (and the code itself!).

    What the goals were:

    • provide the 3200 bps mode through a separate, clean repository (only C code, no Octave test benches, no modems etc.)
    • prepare an easy experimenting ground for further improvements (beyond the bit-exactness constraint)
    • code clean-up: remove all the unnecessary and obsolete constructs, apply optimizations
    • fully static memory allocation (including KISS FFT)

    After all of that has been done, the resulting code is still fully compatible with the original Codec2, but executes faster and with a much smaller memory footprint. Sounds like an excellent drop-in replacement for OpenRTX and other embedded projects. See the readme file for more details.

    GitHub repository: https://github.com/M17-Project/Codec2-mod

    As always: have fun testing the code. Feedback is welcome!


    Quick follow-up:

    I have tested the modified code on an STM32F405RGTx running at 168 MHz, compiled with the -Os flag:

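    codec2_t c2;
    codec2_init(&c2);
    
    uint8_t encoded[CODEC2_BYTES_PER_FRAME] = {0};
    int16_t speech[CODEC2_SAMPLES_PER_FRAME];
    
    // test signal: half-scale sine with an 80-sample period (100 Hz at 8 kHz),
    // scaled to the int16_t range so that it does not truncate to zero
    for (uint8_t i=0; i<CODEC2_SAMPLES_PER_FRAME; i++)
        speech[i] = (int16_t)(0.5f * 32767.0f * sinf(i/80.0f * TWO_PI));
    
    // time 1,000 consecutive encoder calls (HAL_GetTick() counts milliseconds)
    uint32_t tick = HAL_GetTick();
    for (uint16_t i=0; i<1000; i++)
    {
        codec2_encode(&c2, encoded, speech);
    }
    uint32_t tock = HAL_GetTick();
    
    dbg_print("Time: %lums\n", tock-tick);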

    The execution time for the 1,000 frames was 7.021 s with the standard cosf() / acosf() pair in the LPC-LSP path and 7.019 s with their optimized fast_ counterparts (see util.c for details), with no significant ViSQOL MOS change. With the -O2 flag, the gain is much larger: the execution time was 6.909 s.
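
    To put these numbers in perspective, here is the arithmetic per frame. It uses the 20 ms frame length of the 3,200 bps mode, and it assumes that the -O2 figure is compared against the 7.021 s result, which is my reading of the measurements above.

```latex
\begin{align*}
t_{\text{frame}} &= \frac{7.021\ \text{s}}{1000} = 7.021\ \text{ms} \\
\text{CPU load} &= \frac{7.021\ \text{ms}}{20\ \text{ms}} \approx 35.1\,\% \\
\Delta_{\text{fast}} &= \frac{7.021 - 7.019}{7.021} \approx 0.03\,\% \\
\Delta_{\text{-O2}} &= \frac{7.021 - 6.909}{7.021} \approx 1.6\,\%
\end{align*}
```

    In other words, the encoder alone takes roughly a third of the real-time budget on this MCU.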

    Quick follow-up #2:

    After replacing the decimating FIR inside nlp() with a polyphase equivalent, the execution time dropped to 6.754s (-Os). Code is upstream.
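
    For those wondering why a polyphase structure helps here: a decimating FIR only needs the outputs that survive decimation, so the taps can be split into M phases and the filter evaluated once per output sample. The sketch below is a generic illustration and not the actual nlp() code. Both function names are hypothetical, and samples before the start of the input are assumed to be zero.

```c
#include <stddef.h>

/* Reference: run the FIR at the full rate, keep every M-th output.
 * y must hold ceil(nx / M) samples. */
void fir_then_decimate(const float *x, size_t nx, const float *h, size_t nh,
                       size_t M, float *y)
{
    for (size_t n = 0; n < nx; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < nh && k <= n; k++)
            acc += h[k] * x[n - k];
        if (n % M == 0)
            y[n / M] = acc;  /* every other output is thrown away */
    }
}

/* Polyphase form: compute only the outputs that are kept.
 * Phase p uses taps h[p], h[p + M], h[p + 2M], ... */
void polyphase_decimate(const float *x, size_t nx, const float *h, size_t nh,
                        size_t M, float *y)
{
    size_t ny = (nx + M - 1) / M;

    for (size_t m = 0; m < ny; m++) {
        float acc = 0.0f;
        for (size_t p = 0; p < M; p++)
            for (size_t k = p; k < nh && k <= m * M; k += M)
                acc += h[k] * x[m * M - k];
        y[m] = acc;
    }
}
```

    Both functions produce the same samples (up to floating point summation order), but the second one does about 1/M of the multiply-accumulates.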

  • M17 packet mode support in gr-m17

    M17 GNU Radio Out-of-Tree blocks (gr-m17) now support text messaging. The development is still ongoing, but single-frame text messages can already be successfully transferred between the encoder and decoder blocks. The latter emits a special message at its output, signalling successful packet text message decode.

    The test code is available in the gr-m17’s ‘dev’ branch.

    Happy holidays! Have fun playing with M17 packet mode!

  • libm17 – 1.1.2 update

    libm17 has just been updated with polyphase square root raised cosine filter taps, for both 24 and 48 kHz sample rates. Polyphase filters offer a great speed improvement over classic FIR filter implementations. This approach is also being implemented in OpenRTX.
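
    The speed-up comes from skipping the zeros. With L-times upsampling, a classic implementation inserts L-1 zeros between symbols and then runs the full FIR, so most multiplications are by zero. A polyphase interpolator only touches the real input samples. Below is a generic C sketch of both approaches. It is not the libm17 code, and the function names are hypothetical.

```c
#include <stddef.h>

/* Reference: equivalent to zero-stuffing x by L and filtering with h
 * at the high rate. y must hold nx * L samples. */
void zero_stuff_then_fir(const float *x, size_t nx, const float *h,
                         size_t nh, size_t L, float *y)
{
    for (size_t n = 0; n < nx * L; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < nh && k <= n; k++) {
            size_t idx = n - k;      /* index into the zero-stuffed signal */
            if (idx % L == 0)        /* all other positions hold zeros */
                acc += h[k] * x[idx / L];
        }
        y[n] = acc;
    }
}

/* Polyphase form: output phase p uses taps h[p], h[p + L], h[p + 2L], ...
 * applied to x[m], x[m - 1], x[m - 2], ... */
void polyphase_interpolate(const float *x, size_t nx, const float *h,
                           size_t nh, size_t L, float *y)
{
    for (size_t m = 0; m < nx; m++) {
        for (size_t p = 0; p < L; p++) {
            float acc = 0.0f;
            for (size_t j = 0; j * L + p < nh && j <= m; j++)
                acc += h[j * L + p] * x[m - j];
            y[m * L + p] = acc;
        }
    }
}
```

    The inner loop of the polyphase version runs over about nh/L taps instead of nh, which is where a gain on the order of the upsampling factor can come from.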

    Tested on an STM32F405 (modified Nokia 3310) – filtered 1,000 frames with 10x upsampling (-Os optimization):

    New code is more than 10 times faster!

  • Nokia 3310 firmware 1.2.0 is out

    Major code rewrite:

    • much better baseband handling (and DSP in general)
    • optimized T9 text entry method
    • multi-line, word wrapping message composer
    • 2 ringtones
    • various code optimizations and cleanups

    WIP: text message reception!

    Sources: https://github.com/M17-Project/M17_3310-fw

  • Running OpenWebRX on LinHT?

    Hell yeah! Vlastimil, OK5VAS, managed to prepare an example LinHT SoapySDR driver for the SX1255. This allows OpenWebRX to be run on the device, fetching IQ baseband samples directly from the ZMQ proxy (described a few posts back).

    OpenWebRX decodes M17, along with other digi modes.

    Astonishing work! This is the innovation amateur radio world needs. Can your radio do this? 😉

  • CC1200 hotspot firmware v2.0 is out

    We prepared a major rewrite, fixing most (all?) baseband transfer problems. The blue Service LED blinks happily now 🙂

    This code is not compatible with older versions of rpi-interface or the Go M17 client. Make sure to update those!

    Hotspot firmware: https://github.com/M17-Project/CC1200_HAT-fw
    Updated rpi-interface: https://github.com/M17-Project/rpi-interface

  • Improved T9 text entry library

    Our T9 predictive text entry implementation has just been updated with a binary search. With just a 6 kB overhead for a 22 kB dictionary (about 3,000 entries), the search time for 1,000 operations drops from 6.3 to about 0.27 milliseconds. The previous version required over 11 milliseconds to perform the same task (using a linear search).

    That extra space is used to store word locations within the sorted dictionary (array of uint16_t).
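
    The idea can be sketched as follows. This is not the actual M17_T9 code: the dictionary layout (NUL-separated lowercase words, sorted by their key sequence) and all function names are assumptions made for the example. The uint16_t array holds the byte offset of every word, which lets the search jump straight to the middle entry. At about 3,000 entries and 2 bytes each, this matches the roughly 6 kB overhead.

```c
#include <stddef.h>
#include <stdint.h>

/* Map a lowercase letter to its keypad digit ('2'..'9'). */
static char t9_digit(char c)
{
    static const char map[] = "22233344455566677778889999"; /* a..z */
    return (c >= 'a' && c <= 'z') ? map[c - 'a'] : '?';
}

/* Compare the key sequence of a word with a typed digit string.
 * Returns <0, 0 or >0, like strcmp(). */
static int t9_compare(const char *word, const char *digits)
{
    while (*word && *digits) {
        char d = t9_digit(*word);
        if (d != *digits)
            return (d < *digits) ? -1 : 1;
        word++;
        digits++;
    }
    if (*word) return 1;    /* word is longer than the typed sequence */
    if (*digits) return -1; /* word is shorter */
    return 0;
}

/* Binary search for the first word matching the digit string.
 * Returns its index in offsets[], or -1 if there is no match. */
int t9_find(const char *dict, const uint16_t *offsets, size_t n,
            const char *digits)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (t9_compare(dict + offsets[mid], digits) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo < n && t9_compare(dict + offsets[lo], digits) == 0)
        return (int)lo;
    return -1;
}
```

    Words sharing a key sequence (for example "good" and "home", both 4663) sit next to each other in the sorted dictionary, so the remaining candidates can be read out by stepping forward from the returned index.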

    GitHub: https://github.com/M17-Project/M17_T9

    Search time for 1,000 operations:
    Old, linear: 11.5ms
    New, linear: 6.3ms (no additional overhead)
    New, binary: 0.27ms

    The code is yet to be tested on STM32 (modified Nokia 3310, OpenRTX targets?) and the LinHT.

    Enjoy!

  • CC1200 HAT – new firmware release

    After a few days of intense work, version 2.0 of the CC1200 RPi HAT firmware is now available on GitHub (through the dev branch). Feel free to give it a spin. Don’t forget to leave some feedback afterwards.

    The new version requires rpi-interface (again – use the dev branch), and does not yet work with Jim N1ADJ’s m17-gateway. Work in progress!

    Happy experimenting 🙂