Audio processing and lossy audio transport to transmitter sites

I'm writing this post because FM radio stations quite often ask me how they can improve their sound, and when they tell me how they have set up their audio processing and connections, it turns out that things are - to put it mildly - sub-optimal. This post does not contain any brilliant new insights or things that nobody else knows. If you're sure that you're an expert in this field, you can stop reading now. :)

The problem

I probably don't need to tell you this, but lossy audio encoding such as MP2 or MP3 has a bad effect on audio quality. In some cases, though, a station has no other option for transporting audio from the studio to its transmitter site(s).

If you take a nicely processed and strictly limited or clipped signal, which you need for FM, and encode it to - for example - MP3, the result will sound nearly the same, but the peak levels will change quite dramatically: depending on the bitrate, you can expect the peaks to overshoot by several dB. This makes the decoded signal unsuitable for feeding straight into an FM transmitter; it needs to be limited or clipped again first. That, or the volume must be lowered by several dB to make room for the extra peaks, which is unacceptable in most markets.
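To get a feel for why the peaks grow, here is a small Python sketch. It is not an MP3 encoder. It uses a stand-in effect: a fully clipped tone (a square wave) is rebuilt from only its lower harmonics, which is roughly what happens when high-frequency detail gets discarded. The function name, the 1 kHz tone and the 15 kHz cutoff are assumptions chosen for illustration only.

```python
import math

def band_limited_square_peak(fundamental_hz=1000.0, cutoff_hz=15000.0, steps=20000):
    """Peak of a unit square wave rebuilt from its harmonics up to cutoff_hz.

    A perfect square wave (a fully clipped tone) peaks at exactly 1.0.
    Dropping the harmonics above the cutoff makes the waveform overshoot.
    """
    # A square wave contains only odd harmonics: 1, 3, 5, ...
    harmonics = list(range(1, int(cutoff_hz // fundamental_hz) + 1, 2))
    peak = 0.0
    for i in range(steps):
        x = 2.0 * math.pi * i / steps  # position within one full period
        sample = (4.0 / math.pi) * sum(math.sin(k * x) / k for k in harmonics)
        peak = max(peak, abs(sample))
    return peak

peak = band_limited_square_peak()
overshoot_db = 20.0 * math.log10(peak)
print(f"peak = {peak:.3f} ({overshoot_db:+.2f} dB over the clip level)")
```

With these numbers the rebuilt wave overshoots the clip level by about 1.4 dB, even though nothing was made louder. A real lossy encoder behaves differently and can add more than that at low bitrates.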

Because of this, some stations choose to do all of the processing at the transmitter site. They take the (wildly varying) audio from the studio, encode it to MP2, MP3 or another format, and at the transmitter site, decode it and process it. This may seem to make sense, but it's actually really bad. While encoding to a lossy format, the encoder analyses the sound and tries to determine which sounds you can hear and which you can't. That means that the result will sound kind of OK as long as you don't change the sound afterwards. Processing, which greatly changes the volume levels, spectral balance and stereo separation, voids many of the assumptions that the lossy encoder made, and so dramatically increases the audibility of lossy encoding artifacts. Just try playing a lower bitrate MP3 on a surround system and you'll know what I mean.

Why is processing lossy encoded audio bad?

Here's a simple example to clarify what happens if you encode to MP3, and then process the audio.

If you have a sudden, complex, loud sound like a hi-hat, encoding to MP3 tends to cause something called pre-ringing: the sound is already audible (at a much lower level) before it should actually start. There's also post-ringing, but our ears can handle that (it's similar to reverb). Pre-ringing sounds very unnatural, and it makes music sound very non-dynamic because it smears transients out over time. Now take a compressor with a very fast attack and release. If you feed the audio through that compressor first and then encode to MP3, you get a low level of pre-ringing. If you do the opposite, encoding to MP3, decoding (now you have pre-ringing) and then feeding the result through the compressor, the compressor lowers the transient itself but not the pre-ringing, and the pre-ringing can easily end up 6-12 dB louder than in the first case.
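The dB bookkeeping behind that comparison can be sketched in a few lines of Python. This is a deliberately simplified model with made-up numbers: it assumes that the encoder places its pre-ringing a fixed number of dB below the transient, and that the compressor only pulls down the transient and leaves the quieter pre-ringing alone. The function and parameter names are hypothetical.

```python
def pre_ringing_levels(transient_db, pre_ring_offset_db, gain_reduction_db):
    """Return the pre-ringing level in dB for both processing orders.

    transient_db       : level of the transient before compression
    pre_ring_offset_db : how far below the transient the encoder's
                         pre-ringing sits (assumed fixed)
    gain_reduction_db  : how much the compressor pulls the transient down
    """
    # Compress first: the encoder sees the reduced transient, so the
    # pre-ringing it creates is relative to that lower level.
    compress_then_encode = (transient_db - gain_reduction_db) - pre_ring_offset_db
    # Encode first: the pre-ringing is created relative to the full
    # transient, and the compressor afterwards does not reduce it.
    encode_then_compress = transient_db - pre_ring_offset_db
    return compress_then_encode, encode_then_compress

before, after = pre_ringing_levels(
    transient_db=0.0, pre_ring_offset_db=30.0, gain_reduction_db=9.0
)
print(f"compress -> encode: pre-ringing at {before:.1f} dB")
print(f"encode -> compress: pre-ringing at {after:.1f} dB ({after - before:.1f} dB louder)")
```

In this model the difference between the two orders is simply the amount of gain reduction, so a compressor that takes 6-12 dB off the transient makes the pre-ringing 6-12 dB louder when it runs after the codec.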

The solution

The solution is pretty simple:

All the processing that has a big effect on the audio should be done before lossy encoding.
All the processing that has a big effect on the peak level should be done after lossy encoding.

This means that the processor needs to be split into two parts.

There is an added advantage to doing this: depending on the processor, the processing might actually make things easier for the lossy encoder. For example, if you use Stereo Tool, the Declipper removes harmonics, which makes the signal simpler (many harmonics that the lossy encoder would otherwise have had to encode are gone), and the noise gate removes very soft sounds. Performing declipping at the transmitter site would work less well because the distinction between good (non-clipped) and bad (clipped) samples is no longer that strict after lossy encoding, and the noise gate would be removing sounds that the encoder has just spent an enormous effort encoding, at the cost of degrading other sounds. (Note however that using the noise gate afterwards might help to reduce pre-ringing a bit. See below.)

How to do it in Stereo Tool

The clipper settings in a Stereo Tool preset are tweaked to match the output of the other filters in the preset. You don't want to lose that. Fortunately, the solution is pretty simple:

At the studio:

Load an FM preset, turn the FM part off and lower the Advanced Clipper drive to a level where the audio never or almost never hits the clipper (example value: 0.50).
Lower the Post Amp slider by a few dB to protect the MP3 decoder from clipping (it needs a few dB of headroom; example value: 0.70).

At the transmitter site:

Load the same preset.
Turn all the filters before the clipper off: Declipper, PNR Noise & Hum, Natural Dynamics, Phase Rotation and Phase Delay, Noise Gate, AGC, Multiband Compressor, Stereo Widener, Singleband Compressor, Bandpass filter.
Turn the Delossifier on. This filter detects and removes pre-ringing caused by lossy encoding.
Multiply the values of the Post Amp slider and Advanced Clipper drive at the studio side. With the example values above, 0.70 * 0.50 = 0.35. Divide 1 by this number (1 / 0.35 = 2.86). Set the Pre Amp at the transmitter site to that value. Doing this makes the level going into the Advanced Clipper equal to that in the original preset if you hadn't split it into two parts.

Together this will give you a sound that (except for the lossy encoding step) is identical to the sound that you would get when running a single Stereo Tool instance with the same preset.
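The Pre Amp arithmetic from the last transmitter-site step is easy to get wrong by hand, so here it is as a small Python helper. It assumes, as the example calculation above does, that the slider values are plain multiplication factors. The function name is made up for this post and is not part of Stereo Tool.

```python
def transmitter_pre_amp(post_amp, clipper_drive):
    """Pre Amp factor that undoes the two studio-side level reductions."""
    if post_amp <= 0 or clipper_drive <= 0:
        raise ValueError("slider values must be positive")
    return 1.0 / (post_amp * clipper_drive)

# Example values from the studio-side steps above.
pre_amp = transmitter_pre_amp(post_amp=0.70, clipper_drive=0.50)
print(f"Pre Amp at the transmitter site: {pre_amp:.2f}")
```

If you change either studio-side slider later, rerun the calculation and update the Pre Amp at the transmitter site, otherwise the level going into the Advanced Clipper no longer matches the original preset.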

A single Stereo Tool FM license can be split up like this, so to do this you don't need to buy two licenses (but you do need two computers).

Optimizing for lossy encoding

To finish, here is a list of filters in Stereo Tool that you should really use when you need to send audio through a lossy encoding step (that also includes streaming, by the way!):

Declipper
Removes harmonics, making things a lot easier for the encoder.
PNR Noise & Hum
Removes constant sounds and reduces hiss. Should also make things a bit easier for the encoder, especially when combined with the Noise Gate.
Noise Gate
Reduces hiss and very soft sounds.
Stereo: AZIMUTH
Removes phase differences between left and right. Since most lossy encoders, at lower bitrates, encode a mono and (with much less data) difference channel, and the mono channel sounds pretty bad if there's a phase shift between the channels, this can have a dramatic effect on the audio quality for tracks with AZIMUTH problems.
Bandpass
It would be quite useless to encode audio up to, say, 18 kHz if you only broadcast the audio up to about 15-16 kHz on FM.
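Going back to the Stereo: AZIMUTH item in the list: the damage that a phase difference does to the mono channel can be estimated with a little trigonometry. The Python sketch below assumes that the azimuth error behaves like a small time delay between left and right, and that the mono channel is simply (L + R) / 2. The 50 microsecond delay is a made-up example value and the function name is hypothetical.

```python
import math

def mid_gain(freq_hz, delay_s):
    """Amplitude of (L + R) / 2 for a unit sine when R is L delayed by delay_s.

    sin(a) + sin(a - t) = 2 * sin(a - t/2) * cos(t/2), with t = 2*pi*f*delay,
    so the mono sum is scaled by |cos(pi * f * delay)|.
    """
    return abs(math.cos(math.pi * freq_hz * delay_s))

delay = 50e-6  # assumed inter-channel delay of 50 microseconds
for freq in (1000.0, 5000.0, 10000.0):
    print(f"{freq:7.0f} Hz: mono level x {mid_gain(freq, delay):.3f}")
```

With this delay the mono channel keeps almost all of its level at 1 kHz, loses about 3 dB at 5 kHz and cancels almost completely at 10 kHz. Everything lost from the mono channel ends up in the difference channel, which is the one that gets much less data.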
