Click before (L) and after (R) MP3 encoding. The after signal has pre-/post-ringing and a messy spectrum.
All these things happen in MP3s. This is what causes holes in the spectrum, and if you only listen to the left minus right difference signal, it can sound extremely bad.
Processing lossy encoded files
Note that all of this also explains (see my earlier post) why you should always try to avoid performing processing after encoding to a lossy codec: all the assumptions that the codec made ("there's not much L-R signal, so we can encode it with only a few bits"; "this area of the spectrum is very quiet, so we can just skip it") can be invalidated by processing such as stereo widening and multiband compression.
(Two quick examples: If you use multiband compression on the first image, the darker (softer) areas are boosted, but the gaps in them will still be gaps - they just become far more noticeable. In the second image, if you use wideband compression, the click will push the level down at the point of the click, so the pre-ringing audio before it could end up several times louder relative to the click than it is in this image.)
This might be the place where I should mention why stacking codecs is a really bad idea. AAC, for example, makes different tradeoffs than MP3, and suffers far less from the 'holes in the spectrum' issue. But if you feed MP3 encoded audio to an AAC codec, you force it to try to recreate those holes anyway (because it also tries to avoid adding sounds that weren't there), and hence to spend its precious bits describing and recreating them. This means that you retain the badness created by the MP3 codec, and add more AAC badness on top, because it has fewer bits left to encode the sound that you actually want it to encode.
What a codec does - part 2
Normal music very rarely consists of pure tones. Even a single instrument rarely produces pure tones. A frequency analysis will hence not show a single tone but a myriad of frequencies, even if you just play a flute or a trumpet.
Let's go back to our little chunk of audio. Let's assume that someone played a trumpet. First of all, a single trumpet tone consists of many different sine waves at different frequencies - but that's fine, we can handle a whole bunch of frequencies. But it gets worse. If the frequency changes inside the chunk - and when have you ever heard a trumpet that sounds completely constant over time, unless it was generated by a computer - or if the volume changes even marginally - who can blow with a completely constant airflow? - that information has to go somewhere too! So where does it go?
Well, since the only thing we have is frequencies with phases, changes in frequency are described as extra frequencies, which at some places in the chunk cancel each other out and at other places add up. (Any waveform can be described as a sum of a finite number of sine waves; in fact, you'll never need more than half the number of samples in the chunk*.) Neat. The bad news is that for just one sine wave that changes a bit, due to the way the analysis works, chances are that you already need all of those sine waves to describe it. And the phase and amplitude information must match precisely for this to work. But... there's no way to store all that information in a low bitrate format. Aaaand... we have an oops!
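To make this concrete, here's a small sketch of my own (not code from any codec) using numpy: a steady tone that lines up with the analysis fits in a single frequency bin, while the same tone with a bit of vibrato smears across a large number of bins. The chunk size, frequencies and thresholds are arbitrary choices for the illustration.

```python
# Hypothetical illustration: how a changing frequency "smears" across
# the analysis bins of one chunk. Not actual codec code.
import numpy as np

sr = 44100                      # sample rate in Hz
n = 2048                        # one analysis chunk
t = np.arange(n) / sr

# A steady tone, chosen to line up exactly with FFT bin 46 (~990 Hz).
f0 = 46 * sr / n
steady = np.sin(2 * np.pi * f0 * t)

# The same tone with a 6 Hz vibrato of +/- 30 Hz.
beta = 30 / 6                   # modulation index = depth / rate
vibrato = np.sin(2 * np.pi * f0 * t + beta * np.sin(2 * np.pi * 6 * t))

def significant_bins(x):
    """Number of bins louder than 1% of the strongest bin."""
    mag = np.abs(np.fft.rfft(x))
    return int(np.sum(mag > 0.01 * mag.max()))

bins_steady = significant_bins(steady)
bins_vibrato = significant_bins(vibrato)
print(bins_steady, bins_vibrato)   # the vibrato tone needs far more bins
```

The steady tone occupies one bin; the vibrato tone spreads over dozens, and all of their amplitudes and phases would have to be stored to reproduce the vibrato exactly - which is exactly what a low bitrate doesn't allow.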
So the only thing we can do is throw away some information. You'll probably not notice it unless the bitrate gets very low - in that case instruments tend to sound more "robot like". (Try encoding a trumpet with strong vibrato at an extremely low bitrate, such as 32 kbit/s or even lower - much of the details in the vibrato will disappear).
This research isn't the best I've ever seen, to put it mildly, but based on what I just wrote their conclusions seem to make sense: http://www.aes.org/e-lib/browse.cfm?elib=18523
What does this mean? (Or: Why did you tell me all this?)
So we can basically conclude that the simpler the sound spectrum of each chunk is, the better that chunk can be encoded. If the volume or frequency of a sound changes during the chunk, it becomes harder to encode without introducing artifacts - and the harder you push the audio, the more encoding artifacts you'll get.
Limiting and fast compression
Limiting causes very brief jumps in volume, and hence lots of extra frequencies. So using limiting before lossy encoding is a bad idea. The same is true for fast compression.
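A rough way to see this (my own sketch, not taken from any real limiter): apply an instantaneous gain drop halfway through a chunk of a pure tone - which is effectively what a very fast limiter or compressor does - and count how many frequency bins light up.

```python
# Hypothetical sketch: a sudden gain change on a pure tone creates
# lots of extra frequencies. Not code from any actual limiter.
import numpy as np

sr = 44100
n = 2048
t = np.arange(n) / sr

f0 = 46 * sr / n                         # tone aligned to FFT bin 46
tone = np.sin(2 * np.pi * f0 * t)

gain = np.ones(n)
gain[n // 2:] = 0.5                      # instantaneous 6 dB gain drop
limited = tone * gain

def significant_bins(x):
    """Number of bins louder than 1% of the strongest bin."""
    mag = np.abs(np.fft.rfft(x))
    return int(np.sum(mag > 0.01 * mag.max()))

bins_clean = significant_bins(tone)      # just the tone's own bin
bins_limited = significant_bins(limited) # the gain step splatters wide
print(bins_clean, bins_limited)
```

All those extra bins are information the codec now has to spend bits on - or throw away, audibly.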
Using fast compression - either in wideband or multiband mode - will cause problems for lossy codecs, especially at lower bitrates. For example in Stereo Tool, many FM presets use the second multiband compressor to add some 'sparkle' and warmth to the sound. This works great on FM but it's really not a good idea for streaming.
Multiband compression tends to make the spectrum 'fuller', it basically lifts up the quieter parts which makes it impossible for the codec to throw away certain parts. This will mainly affect the MP3 codecs, AAC for example doesn't throw away bigger areas anyway.
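As a toy illustration (my own construction, not how any specific compressor works): take a loud tone plus a very quiet tone in another band, lift the quiet band the way multiband compression effectively does, and see how much less of the spectrum the codec could safely skip afterwards. The -40 dB "discard threshold" here is an arbitrary stand-in for the codec's real psychoacoustic model.

```python
# Hypothetical sketch: lifting quiet spectral regions leaves the codec
# with less it can safely throw away. Not real compressor code.
import numpy as np

sr = 44100
n = 2048
t = np.arange(n) / sr

loud = np.sin(2 * np.pi * (46 * sr / n) * t)            # ~990 Hz at 0 dB
quiet = 0.001 * np.sin(2 * np.pi * (400 * sr / n) * t)  # ~8.6 kHz at -60 dB

# Pretend the codec discards anything more than 40 dB below the peak.
def audible_bins(x, threshold_db=-40.0):
    mag = np.abs(np.fft.rfft(x))
    return int(np.sum(mag > mag.max() * 10 ** (threshold_db / 20)))

original = loud + quiet
lifted = loud + quiet * 10 ** (30 / 20)  # quiet band boosted by 30 dB

bins_before = audible_bins(original)     # quiet tone falls below threshold
bins_after = audible_bins(lifted)        # now it has to be encoded too
print(bins_before, bins_after)
```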
I'm not suggesting to not use multiband compression at all (the tradeoff might be worth it), but you should avoid fast moving compressors.
Pure digital clipping causes distortion over the whole spectrum, and really has dramatic effects on lossy codecs. Note that the clipper in Stereo Tool and Omnia SST can be set up to be completely clean; in that case the effect is probably smaller than that of limiting or of any other clipper. But reduced dynamics still means that the spectrum is slightly fuller than before, so the effect is not completely absent. All other clippers that I'm aware of also introduce distortion and should be avoided.
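To show the distortion itself (again a sketch of mine, not the clipper from Stereo Tool or any other product): hard-clip a pure tone and look at the harmonics that appear out of nowhere.

```python
# Hypothetical sketch: pure digital clipping of a single tone creates
# distortion products across the spectrum. Not real clipper code.
import numpy as np

sr = 44100
n = 2048
t = np.arange(n) / sr

k0 = 40                                  # fundamental at FFT bin 40
tone = np.sin(2 * np.pi * (k0 * sr / n) * t)
clipped = np.clip(tone, -0.5, 0.5)       # hard clip at half amplitude

mag_clean = np.abs(np.fft.rfft(tone))
mag_clipped = np.abs(np.fft.rfft(clipped))

def harmonic_db(mag, k):
    """Level of bin k relative to the fundamental, in dB."""
    return 20 * np.log10(mag[k] / mag[k0])

print(harmonic_db(mag_clean, 3 * k0))    # essentially absent
print(harmonic_db(mag_clipped, 3 * k0))  # a strong distortion product
```

And the third harmonic is only the start: the fifth, seventh and so on are there too, all of them new content that the codec has to deal with.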
Noise gating can be a good idea, but be careful that you don't cause lossy compression-like artifacts by overusing it!
For low bitrates, avoid extreme stereo separation, especially phase differences. If you use Stereo Tool, enable "Multipath Stereo" (I added it a while ago to reduce FM multipath issues, and only recently realized that besides that, it can be extremely useful for streaming as well - I'll rename it soon).
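Here's why phase differences are so costly, sketched in a few lines (my own illustration; real codecs are more sophisticated, but joint stereo modes do code the stereo image as a mid (L+R) and a side (L-R) signal):

```python
# Hypothetical sketch: joint stereo stores mid = L+R and side = L-R.
# Phase differences between the channels push energy into the side
# signal, which then needs real bits to encode.
import numpy as np

sr = 44100
n = 2048
t = np.arange(n) / sr
tone = np.sin(2 * np.pi * 1000 * t)

def side_energy_fraction(left, right):
    """Fraction of the total energy that ends up in the side signal."""
    mid = (left + right) / 2
    side = (left - right) / 2
    return float(np.sum(side ** 2) / (np.sum(mid ** 2) + np.sum(side ** 2)))

# Identical channels: everything fits in the cheap mid signal.
print(side_energy_fraction(tone, tone))            # 0.0

# A 90 degree phase shift between the channels: half the energy
# lands in the side signal.
shifted = np.sin(2 * np.pi * 1000 * t + np.pi / 2)
print(side_energy_fraction(tone, shifted))         # ~0.5
```

With fully phase-inverted channels the fraction reaches 1.0 - the "cheap" mid signal is then completely empty.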
This also means that declipping helps a lot, because it cleans up the audio before it goes into the processing.
What does this mean for low bitrate STLs?
If you want to use a lossy codec for an STL, everything that's described here still applies. It might seem to be a good idea to do the processing after the STL, but that's actually a very bad idea - see one of my earlier posts for an elaborate discussion of that.
If possible, use µMPX for your STL! µMPX (MicroMPX) is based on completely different techniques than MP3, AAC, Ogg, Opus and similar codecs. It cannot reach very low bitrates but at higher bitrates it very quickly surpasses the quality of those other codecs, and it doesn't suffer from the same types of artifacts. If anything, it adds white noise, which is very easily masked on FM. And at higher bitrates (320 kbit/s and up) the total effect on the audio is also smaller than that of many other codecs (in comparison, the artifact level is about 9 dB lower than that of MP3s at 320 kbit/s, and again, it's only white noise).
If you really need a very low bitrate STL, then the same things that apply to low bitrate streams apply here as well. There's no real way around it - and again, doing processing after the STL is even worse. Try to get a higher bitrate STL if possible...
In general, for streaming you need to aim for a more natural (some people might call it boring :) ) sound. Except in some extreme cases, avoid limiting, clipping and fast compression.
* Due to how analysis methods work, it probably won't be this efficient - but that's not relevant for this article.