Click before (L) and after (R) MP3 encoding. The after signal has pre-/post-ringing and a messy spectrum.
All these things happen in MP3s. This is what causes holes in the spectrum, and if you only listen to the left minus right difference signal, it can sound extremely bad.
Processing lossy encoded files
Note that all of this also explains (see my earlier post) why you should always try to avoid performing processing after encoding to a lossy codec: all the assumptions that the codec made ("there's not much L-R signal, so we can encode it with only a few bits"; "this area of the spectrum is very quiet, so we can just skip it") can be invalidated by processing such as stereo widening and multiband compression.
(Two quick examples: If you use multiband compression on the first image, the darker (softer) areas are boosted, but the gaps in them will still be gaps - they just become far more noticeable. In the second image, if you use wideband compression, the click will push the level down at the point of the click, so the pre-ringing audio before it could end up several times louder relative to the click than it is in this image.)
This might be the place where I should mention why stacking codecs is a really bad idea. AAC, for example, makes different tradeoffs than MP3, and suffers far less from the 'holes in the spectrum' issue. But if you feed MP3 encoded audio to an AAC codec, you force it to try to recreate those holes anyway (because it also tries to avoid adding sounds that weren't there), and hence to spend its precious bits describing and recreating them. This means that you retain the badness created by the MP3 codec, and add more AAC badness on top, because it has fewer bits left to encode the sound that you actually want it to encode.
What a codec does - part 2
Normal music very rarely consists of pure tones. Even a single instrument rarely produces pure tones. A frequency analysis will hence not show a single tone but a myriad of frequencies, even if you just play a flute or a trumpet.
Let's go back to our little chunk of audio. Let's assume that someone played a trumpet. First of all, a single trumpet tone consists of many different sine waves at different frequencies - but that's fine, we can handle a whole bunch of frequencies. But it gets worse. If the frequency changes inside the chunk - and when have you ever heard a trumpet that sounds completely constant over time, unless it was generated by a computer - or if the volume changes even marginally - who can blow with a completely constant airflow? - that information has to go somewhere too! So where does it go?
Well, since the only thing we have is frequencies with phases, changes in frequency are described as extra frequencies, which at some places in the chunk cancel each other out and at other places add up. (Any waveform can be described as a sum of a finite number of sine waves; in fact, you'll never need more than half the number of samples in the chunk*.) Neat. The bad news is that for just one sine wave that changes a bit, due to the way the analysis works, chances are that you already need all of those sine waves to describe it. And the phase and amplitude information must match precisely for this to work. But... there's no way to store all that information in a low bitrate format. Aaaand... we have an oops!
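To make this concrete, here's a small sketch of my own (not code from any codec) using numpy: a steady tone that lines up with the analysis fits in a single frequency bin, while the same tone with a bit of vibrato smears across a large number of bins. The chunk size, frequencies and thresholds are arbitrary choices for the illustration.

```python
# Hypothetical illustration: how a changing frequency "smears" across
# the analysis bins of one chunk. Not actual codec code.
import numpy as np

sr = 44100                      # sample rate in Hz
n = 2048                        # one analysis chunk
t = np.arange(n) / sr

# A steady tone, chosen to line up exactly with FFT bin 46 (~990 Hz).
f0 = 46 * sr / n
steady = np.sin(2 * np.pi * f0 * t)

# The same tone with a 6 Hz vibrato of +/- 30 Hz.
beta = 30 / 6                   # modulation index = depth / rate
vibrato = np.sin(2 * np.pi * f0 * t + beta * np.sin(2 * np.pi * 6 * t))

def significant_bins(x):
    """Number of bins louder than 1% of the strongest bin."""
    mag = np.abs(np.fft.rfft(x))
    return int(np.sum(mag > 0.01 * mag.max()))

bins_steady = significant_bins(steady)
bins_vibrato = significant_bins(vibrato)
print(bins_steady, bins_vibrato)   # the vibrato tone needs far more bins
```

The steady tone occupies one bin; the vibrato tone spreads over dozens, and all of their amplitudes and phases would have to be stored to reproduce the vibrato exactly - which is exactly what a low bitrate doesn't allow.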
So the only thing we can do is throw away some information. You'll probably not notice it unless the bitrate gets very low - in that case instruments tend to sound more "robot like". (Try encoding a trumpet with strong vibrato at an extremely low bitrate, such as 32 kbit/s or even lower - much of the details in the vibrato will disappear).
This research isn't the best I've ever seen, to put it mildly, but based on what I just wrote their conclusions seem to make sense: http://www.aes.org/e-lib/browse.cfm?elib=18523
What does this mean? (Or: Why did you tell me all this?)
So we can basically conclude that the simpler the sound spectrum of each chunk is, the better that chunk can be encoded. If the volume or frequency of a sound changes during the chunk, it becomes harder to encode without introducing artifacts - and the harder you push the audio, the more encoding artifacts you'll get.
Limiting and fast compression
Limiting causes very brief jumps in volume, and hence lots of extra frequencies. So using limiting before lossy encoding is a bad idea. The same is true for fast compression.
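A rough way to see this (my own sketch, not taken from any real limiter): apply an instantaneous gain drop halfway through a chunk of a pure tone - which is effectively what a very fast limiter or compressor does - and count how many frequency bins light up.

```python
# Hypothetical sketch: a sudden gain change on a pure tone creates
# lots of extra frequencies. Not code from any actual limiter.
import numpy as np

sr = 44100
n = 2048
t = np.arange(n) / sr

f0 = 46 * sr / n                         # tone aligned to FFT bin 46
tone = np.sin(2 * np.pi * f0 * t)

gain = np.ones(n)
gain[n // 2:] = 0.5                      # instantaneous 6 dB gain drop
limited = tone * gain

def significant_bins(x):
    """Number of bins louder than 1% of the strongest bin."""
    mag = np.abs(np.fft.rfft(x))
    return int(np.sum(mag > 0.01 * mag.max()))

bins_clean = significant_bins(tone)      # just the tone's own bin
bins_limited = significant_bins(limited) # the gain step splatters wide
print(bins_clean, bins_limited)
```

All those extra bins are information the codec now has to spend bits on - or throw away, audibly.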
Using fast compression - either in wideband or multiband mode - will cause problems for lossy codecs, especially at lower bitrates. For example in Stereo Tool, many FM presets use the second multiband compressor to add some 'sparkle' and warmth to the sound. This works great on FM but it's really not a good idea for streaming.
Multiband compression tends to make the spectrum 'fuller', it basically lifts up the quieter parts which makes it impossible for the codec to throw away certain parts. This will mainly affect the MP3 codecs, AAC for example doesn't throw away bigger areas anyway.
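As a toy illustration (my own construction, not how any specific compressor works): take a loud tone plus a very quiet tone in another band, lift the quiet band the way multiband compression effectively does, and see how much less of the spectrum the codec could safely skip afterwards. The -40 dB "discard threshold" here is an arbitrary stand-in for the codec's real psychoacoustic model.

```python
# Hypothetical sketch: lifting quiet spectral regions leaves the codec
# with less it can safely throw away. Not real compressor code.
import numpy as np

sr = 44100
n = 2048
t = np.arange(n) / sr

loud = np.sin(2 * np.pi * (46 * sr / n) * t)            # ~990 Hz at 0 dB
quiet = 0.001 * np.sin(2 * np.pi * (400 * sr / n) * t)  # ~8.6 kHz at -60 dB

# Pretend the codec discards anything more than 40 dB below the peak.
def audible_bins(x, threshold_db=-40.0):
    mag = np.abs(np.fft.rfft(x))
    return int(np.sum(mag > mag.max() * 10 ** (threshold_db / 20)))

original = loud + quiet
lifted = loud + quiet * 10 ** (30 / 20)  # quiet band boosted by 30 dB

bins_before = audible_bins(original)     # quiet tone falls below threshold
bins_after = audible_bins(lifted)        # now it has to be encoded too
print(bins_before, bins_after)
```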
I'm not suggesting to not use multiband compression at all (the tradeoff might be worth it), but you should avoid fast moving compressors.
Pure digital clipping causes distortion over the whole spectrum, and really has dramatic effects on lossy codecs. Note that the clipper in Stereo Tool and Omnia SST can be set up to be completely clean; in that case the effect is probably smaller than that of limiting or of any other clipper. But reduced dynamics still means that the spectrum is slightly fuller than before, so the effect is not completely absent. All other clippers that I'm aware of also introduce distortion and should be avoided.
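To show the distortion itself (again a sketch of mine, not the clipper from Stereo Tool or any other product): hard-clip a pure tone and look at the harmonics that appear out of nowhere.

```python
# Hypothetical sketch: pure digital clipping of a single tone creates
# distortion products across the spectrum. Not real clipper code.
import numpy as np

sr = 44100
n = 2048
t = np.arange(n) / sr

k0 = 40                                  # fundamental at FFT bin 40
tone = np.sin(2 * np.pi * (k0 * sr / n) * t)
clipped = np.clip(tone, -0.5, 0.5)       # hard clip at half amplitude

mag_clean = np.abs(np.fft.rfft(tone))
mag_clipped = np.abs(np.fft.rfft(clipped))

def harmonic_db(mag, k):
    """Level of bin k relative to the fundamental, in dB."""
    return 20 * np.log10(mag[k] / mag[k0])

print(harmonic_db(mag_clean, 3 * k0))    # essentially absent
print(harmonic_db(mag_clipped, 3 * k0))  # a strong distortion product
```

And the third harmonic is only the start: the fifth, seventh and so on are there too, all of them new content that the codec has to deal with.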
Noise gating can be a good idea, but be careful that you don't cause lossy compression-like artifacts by overusing it!
For low bitrates, avoid extreme stereo separation, especially phase differences. If you use Stereo Tool, enable "Multipath Stereo" (I added it a while ago to reduce FM multipath issues, and only recently realized that besides that, it can be extremely useful for streaming as well - I'll rename it soon).
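Here's why phase differences are so costly, sketched in a few lines (my own illustration; real codecs are more sophisticated, but joint stereo modes do code the stereo image as a mid (L+R) and a side (L-R) signal):

```python
# Hypothetical sketch: joint stereo stores mid = L+R and side = L-R.
# Phase differences between the channels push energy into the side
# signal, which then needs real bits to encode.
import numpy as np

sr = 44100
n = 2048
t = np.arange(n) / sr
tone = np.sin(2 * np.pi * 1000 * t)

def side_energy_fraction(left, right):
    """Fraction of the total energy that ends up in the side signal."""
    mid = (left + right) / 2
    side = (left - right) / 2
    return float(np.sum(side ** 2) / (np.sum(mid ** 2) + np.sum(side ** 2)))

# Identical channels: everything fits in the cheap mid signal.
print(side_energy_fraction(tone, tone))            # 0.0

# A 90 degree phase shift between the channels: half the energy
# lands in the side signal.
shifted = np.sin(2 * np.pi * 1000 * t + np.pi / 2)
print(side_energy_fraction(tone, shifted))         # ~0.5
```

With fully phase-inverted channels the fraction reaches 1.0 - the "cheap" mid signal is then completely empty.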
This also means that declipping helps a lot, because it cleans up the audio before it goes into the processing.
What does this mean for low bitrate STLs?
If you want to use a lossy codec for an STL, everything that's described here still applies. It might seem to be a good idea to do the processing after the STL, but that's actually a very bad idea - see one of my earlier posts for an elaborate discussion of that.
If possible, use µMPX for your STL! µMPX (MicroMPX) is based on completely different techniques than MP3, AAC, Ogg, Opus and similar codecs. It cannot reach very low bitrates but at higher bitrates it very quickly surpasses the quality of those other codecs, and it doesn't suffer from the same types of artifacts. If anything, it adds white noise, which is very easily masked on FM. And at higher bitrates (320 kbit/s and up) the total effect on the audio is also smaller than that of many other codecs (in comparison, the artifact level is about 9 dB lower than that of MP3s at 320 kbit/s, and again, it's only white noise).
If you really need a very low bitrate STL, then the same things that apply to low bitrate streams apply here as well. There's no real way around it - and again, doing processing after the STL is even worse. Try to get a higher bitrate STL if possible...
In general, for streaming you need to aim for a more natural (some people might call it boring :) ) sound. Except in some extreme cases, avoid limiting, clipping and fast compression.
* Due to how analysis methods work, it probably won't be this efficient - but that's not relevant for this article.