Motivation

Homebrewing in the 21st century is going to involve a lot of software. Most free software SDR programs out there today rely on the wdsp library written by Dr. Warren Pratt, NR0V. Dr. Pratt has also written a fine manual for WDSP aimed at programmers who want to use wdsp in their SDR programs. It is mostly API documentation along with some details of the algorithms, but much of the math is missing from the docs and buried in the code.

My motivation is to understand how the "NR2" noise reduction algorithm works. I have mostly achieved that. Rather than describe the algorithm here, I want to give references to the papers so that curious minds can find them.

The big idea

WDSP has an "NR" algorithm. I haven't explored it yet, but it is the popular Wiener filtering approach, which can be implemented using gradient descent (in practice, via the Least Mean Squares algorithm).

The main idea in all noise reduction algorithms is this:

The "observable" is y(n) which has the speech signal with noise. Our motivation is to extract an approximation of the signal buried in noise. i.e. we need to estimate noise and subtract it. This is done either in time domain or frequency domain.

y(n) = s(n) + w(n)
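
As a toy illustration of that model (synthetic signals of my own, nothing from wdsp):

    import numpy as np

    fs = 8000                               # sample rate in Hz, arbitrary for this sketch
    t = np.arange(fs) / fs                  # one second of samples
    s = np.sin(2 * np.pi * 440 * t)         # stand-in for the clean "speech" s(n)
    w = 0.3 * np.random.randn(fs)           # additive noise w(n)
    y = s + w                               # the observable y(n)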

I believe NR using LMS is a time domain algorithm.
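
I have not checked how wdsp's NR is actually structured; the sketch below is just a generic LMS adaptive line enhancer, the textbook way to do single-channel time-domain noise reduction when there is no separate noise reference: feed a delayed copy of y(n) into an LMS predictor, and the correlated part of the signal ends up in the prediction while the noise stays in the error. The function name and constants are mine.

    import numpy as np

    def lms_line_enhancer(y, taps=64, delay=16, mu=0.01):
        """Generic LMS adaptive line enhancer: predict the correlated part of y(n)
        from a delayed copy of itself; the prediction error is mostly noise."""
        h = np.zeros(taps)                                    # adaptive filter weights
        out = np.zeros_like(y)
        for n in range(delay + taps, len(y)):
            x = y[n - delay - taps + 1:n - delay + 1][::-1]   # delayed, time-reversed reference
            est = h @ x                                       # predicted (correlated) component
            err = y[n] - est                                  # prediction error
            h += mu * err * x                                 # LMS weight update
            out[n] = est                                      # the enhanced output
        return out

Trying it on the synthetic y above pulls the steady tone out of the noise; real speech is far less predictable than a tone, which is part of why this simple time-domain approach has limits.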

The other property is that noise is uncorrelated with the speech signal, so the power spectra of the two are additive as well, i.e.

|Y(f)|² = |S(f)|² + |W(f)|²

If we can "estimate" the noise power spectral density, then we can get an estimate of the signal power spectral density as follows:

|Ŝ(f)|² = |Y(f)|² - |Ŵ(f)|²

Now, if we find the time domain values corresponding to Ŝ(f), i.e. ŝ(n), we get back the speech signal with the noise removed. Of course, I am grossly simplifying what "noise" is here. The estimate of speech reconstructed this way is not without problems: there are musical noise artifacts that one can hear in such reconstructed speech, and the quality very much depends on the signal to noise ratio.
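
Here is a toy spectral subtraction sketch of that whole chain (frame the signal, estimate |Ŵ(f)|², subtract, rebuild ŝ(n)). The function name and parameters are mine, not wdsp's, and real implementations smooth the noise estimate over time and do much more to suppress exactly those musical-noise artifacts:

    import numpy as np

    def spectral_subtract(y, frame=512, hop=256, noise_frames=10, floor=0.05):
        """Toy power spectral subtraction with overlap-add; the first noise_frames
        frames are assumed to be noise-only and give the |W(f)|^2 estimate."""
        win = np.hanning(frame)
        n_frames = 1 + (len(y) - frame) // hop
        noise_psd = np.mean(
            [np.abs(np.fft.rfft(win * y[i * hop:i * hop + frame]))**2
             for i in range(noise_frames)], axis=0)
        out = np.zeros(len(y))
        wsum = np.zeros(len(y))
        for i in range(n_frames):
            Y = np.fft.rfft(win * y[i * hop:i * hop + frame])
            # |S(f)|^2 = |Y(f)|^2 - |W(f)|^2, floored so power never goes negative
            s_psd = np.maximum(np.abs(Y)**2 - noise_psd, floor * noise_psd)
            S = np.sqrt(s_psd) * np.exp(1j * np.angle(Y))     # keep the noisy phase
            out[i * hop:i * hop + frame] += win * np.fft.irfft(S, frame)
            wsum[i * hop:i * hop + frame] += win**2
        return out / np.maximum(wsum, 1e-12)

    # synthetic y(n): half a second of noise only, then a tone plus noise
    fs = 8000
    t = np.arange(2 * fs) / fs
    y = np.sin(2 * np.pi * 440 * t) * (t > 0.5) + 0.3 * np.random.randn(len(t))
    s_hat = spectral_subtract(y)

The spectral floor (floor * noise_psd) is the crude knob here: set it to zero and the musical noise gets noticeably worse.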

NR2 algorithm

NR2 uses a frequency domain method that has its origins in the famous paper by Ephraim and Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator". Here the speech and noise signals are modelled statistically, in terms of the a priori SNR (clean signal to noise ratio) and the a posteriori SNR (observed signal to noise ratio). A crucial step is finding an estimate of the noise; this step is largely left out of the paper.
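
In that framework, for each time-frequency bin the a posteriori SNR is γ = |Y|²/λ_w, the a priori SNR ξ is usually estimated with the paper's "decision-directed" recursion, and the spectral amplitude is then scaled by a gain that depends only on ξ and γ. Below is a per-bin sketch of how I read those two pieces; it is not the wdsp code, the constants are just typical textbook values, and the variable names in the usage comments are hypothetical.

    import numpy as np
    from scipy.special import i0e, i1e   # exponentially scaled Bessel functions I0, I1

    def mmse_stsa_gain(xi, gamma):
        """Per-bin Ephraim-Malah MMSE-STSA gain, as I read the 1984 paper."""
        v = xi * gamma / (1.0 + xi)
        # exp(-v/2) * I0(v/2) and exp(-v/2) * I1(v/2), via the scaled Bessel functions
        return (np.sqrt(np.pi) / 2.0) * (np.sqrt(v) / gamma) * (
            (1.0 + v) * i0e(v / 2.0) + v * i1e(v / 2.0))

    def decision_directed_xi(prev_clean_power, noise_psd, gamma, alpha=0.98):
        """A priori SNR estimate: mix last frame's clean-power estimate with the
        instantaneous estimate max(gamma - 1, 0) (the decision-directed rule)."""
        return alpha * prev_clean_power / noise_psd + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0)

    # Per frame (Y, noise_psd and prev_clean_power are hypothetical arrays, one value per bin):
    #   gamma = np.abs(Y)**2 / noise_psd                 # a posteriori SNR
    #   xi = decision_directed_xi(prev_clean_power, noise_psd, gamma)
    #   S_hat = mmse_stsa_gain(xi, gamma) * Y            # enhanced spectrum for this frame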

One popular way to track the noise (in a time-frequency analysis) is the minimum statistics method with optimal smoothing. This method is detailed in the paper "Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics" by Rainer Martin.
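
To get a feel for the idea, here is a deliberately stripped-down sketch of mine: a fixed smoothing constant, a sliding minimum, and no bias compensation, so it is not Martin's actual method, just the shape of it.

    import numpy as np

    def minimum_statistics_noise(periodograms, alpha=0.85, search=80):
        """Heavily simplified minimum-statistics noise tracker: fixed smoothing instead
        of Martin's optimal time-varying smoothing, and no bias compensation."""
        n_frames, n_bins = periodograms.shape
        smoothed = np.zeros_like(periodograms)
        noise = np.zeros_like(periodograms)
        p = periodograms[0].copy()
        for l in range(n_frames):
            p = alpha * p + (1.0 - alpha) * periodograms[l]   # recursive smoothing per bin
            smoothed[l] = p
            lo = max(0, l - search + 1)
            noise[l] = smoothed[lo:l + 1].min(axis=0)         # track the minimum over a window
        return noise

    # periodograms: array of shape (frames, bins) holding |Y(k, l)|^2, e.g. from an STFT
    # noise_psd_track = minimum_statistics_noise(periodograms)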

WDSP Implementations

WDSP implements both of the above approaches to Noise Power Estimation (NPE in wdsp parlance). For the OSMS method I could find a clear mapping between the equations in the code and the equations in the paper, whereas for the MMSE NPE I could not fully correlate (pun intended) the equations in the code with a paper.

Any help is welcome. Specifically, what I am looking for is a reference to the paper that details the steps in the MMSE noise estimator. (Edit: I found some answers in section 4 of the paper "Noise Power Estimation Based on the Probability of Speech Presence" by Gerkmann and Hendriks. In the temporal smoothing step, it uses a smoothing factor β = 0.8. However, in the code, the variable alpha_pow is calculated differently. That is the next piece to dig into. It seems like the wdsp author is trying to scale the fixed 0.8 value to work with a different overlap and sampling rate; the MATLAB code from the authors of the paper uses a fixed 0.8.)
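
To make the β question concrete, here is how I read the smoothing step, plus my guess at the kind of rescaling alpha_pow might be doing to keep the effective time constant fixed when the frame advance changes. Both function names are mine, and the 16 ms reference hop is purely an assumption.

    import numpy as np

    def rescale_beta(beta_ref=0.8, hop_ref_s=0.016, hop_s=0.008):
        """Rescale a per-frame smoothing factor to a different frame advance so the
        effective time constant stays the same. This is only my guess at the kind of
        thing alpha_pow computes."""
        tau = -hop_ref_s / np.log(beta_ref)   # time constant implied by beta_ref
        return np.exp(-hop_s / tau)           # equivalent beta for the new hop

    def update_noise_psd(noise_psd, Y2, spp, beta=0.8):
        """Temporal smoothing step as I understand section 4 of the Gerkmann-Hendriks
        paper: weight the noisy periodogram Y2 and the old estimate by the speech
        presence probability spp, then smooth with beta."""
        noise_periodogram = (1.0 - spp) * Y2 + spp * noise_psd
        return beta * noise_psd + (1.0 - beta) * noise_periodogram

For instance, halving the frame advance from 16 ms to 8 ms gives rescale_beta(0.8, 0.016, 0.008) ≈ 0.89, i.e. β raised to the power of the hop ratio. Whether that is actually what alpha_pow computes in wdsp is exactly the thing I still need to verify.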

Next steps

I want to understand how to introduce a new block into wdsp and start playing with variants of these algorithms.