Separating Processing from its Effects

Presentation by David P. Reaves, III
Co-Owner, TransLanTech Sound, LLC
NAB 2006, Las Vegas, April 26, 2006


Introduction

For a long time, I've been trying to figure out ways to make processing less audible. I figure a lot of people would like to do the same, so that's been the focus of TransLanTech Sound's development since we formed nearly a decade ago.

But just lately it occurred to me that there are people who still assume that processing has to have a "sound," and that perhaps a few have never even considered that there might be ways to usefully and competitively process audio with few or no side effects.

And then, there are also those people who really like a "processed" sound, and this presentation is for you, too, because once you minimize any arbitrary effects, you are then free to add back in just those effects you really want, with better consistency of effect.

And that is where I got the idea for this presentation.


So, what do we mean by "Effect"?

The effects we're going to talk about are those that are, or at least were originally, unintended byproducts of control processes such as limiting, AGC compression and clipping.

Any electronic manipulation of an audio signal may result in some kind of audible change. These changes, good or bad, intentional or not, are side effects, sometimes called 'artifacts.' These effects can be audible, sometimes to the point that they draw attention to themselves. They may be caused by any of a multitude of mechanisms. Ultimately, what we must care about is what the listener eventually hears.


Effects: Transmission and Conversion

Certain effects are caused by transmission, signal conversion or modulation systems. Typical of such effects are those resulting from bandwidth-limiting on AM, pre-emphasis on FM and data reduction in digital signals. These effects are static, built into the system, and they are equal opportunity, similarly affecting all stations within a particular service.

We can't get rid of those effects or any negative impact they may have. Where careful processing can help to make these transmission or conversion effects less noticeable, in many cases aggressive processing will do the opposite, exaggerating them.


Effects: Processing

Dynamic processing effects are much more variable than the static effects of transmission. This variation is influenced by the programming material, the choice of equipment and its setup. While transmission effects are out of our control, we have great latitude in controlling Processing effects.

Before we go further, let's look at how we got to where we are.


A Brief History of Broadcast Processing: AM

In the early years of broadcasting, there wasn't any processing. All level control was manual. The technical requirements for early AM radio processing were to maximize the signal to overcome background noise while preventing overmodulation of the transmitter. So, the first processing for Broadcast consisted of light peak limiting, with fairly slow release times and fairly open compression ratios.

Early examples of automatic level processing created the first automatic processing artifacts. They probably weren't obvious, because the process wasn't attempting to do much. Meanwhile, over in the slightly more mature recording industry, automatic processing was likewise being gradually integrated into popular music production techniques.


A Brief History of Broadcast Processing: FM

Since by nature FM is much more free of noise than AM, its processing requirements were not so obvious. While FM processing systems were initially similar to those for AM, ultimately the use of pre-emphasis presented a new, serious problem, and attempts to control pre-emphasized audio created a new set of effects. Once again, though, these effects were not noticeable if the processing was used very lightly.


A Brief History of Broadcast Processing: TV

Television introduced yet a different set of complications. For one, until the late 1970s the networks fed most of their affiliates very narrow band audio, not much better than telephone quality. In order for advertisements to 'stick out,' special processing was adopted by the TV advertisement producers to build up the loudness energy in the ads. The contrast between relatively unprocessed programming and highly-processed commercials became an issue in the 1960s, and continues today.


At any rate, after automatic processing had been in use for some time and the boundaries of control parameters had been pushed a bit, some people realized that there is a certain attraction to the process effect itself for its own sake, and the modern concept of processing for effect was born, very nearly simultaneously in radio and in the recording industry.


A Brief History of Broadcast Processing: The Processors

In the half-century-plus of automated level processing, we've seen many phases of evolution.
One way to look at this evolution is that there are roughly three sometimes overlapping paths that have been taken by processor makers:

Power: The path that creates a more powerful signal with the application of aggressive processing control, and with secondary regard for sound quality;

Quality: The path that attempts to find ways to refine and apply a similar amount of power in a more elegant manner to allow control with as little noticeable effect as possible, with primary concern for natural sound and secondary concern for loudness, and

Effect: Processing generally for the sake of the audible effect.

The Power and Quality paths are near mirror images. You might even say that they complement each other, leveraging upon each other's accomplishments in the field. Because every time steps are made to increase processing power to the next level, someone else will come along and finesse that technique until any side effects are greatly reduced. Then the first guy comes back and uses that new technique to escalate his design. It goes back and forth.

Meanwhile, someone falls in love with the side effect that the Power people created and the Quality people are trying to eliminate. Which brings us up to date:

Today the Broadcasting and Recording industries are both married to sophisticated audio processing.

But with all this increasing technological complexity, one important factor seems to have been left out of the equation:


Oh, the Humanity!

For the moment, let's try to consider processing and its effects viewed from the perspective of the audience, the ultimate arbiter and consumer of our product. At the outset, it's important to state the obvious:

The audience reacts to sound on a primitive, very nearly pure subconscious level.

If you ask them to explain in detail what they like, they probably can't tell you. Their conscious mind is much more focused on the programming.

As far as the majority of the audience is concerned, radio or TV sound is simply either acceptable, or not. All of the multiple ways sound can be modified that are of so much concern to us are simply not relevant to them. What is important boils down to this:

While their conscious mind is not paying attention to the sound, their subconscious is always ready to make a decision if the sound goes outside their comfort range.


Job One

Therefore, our job, and I think this is probably the most important thing I will say today, should be to:

Because as long as they are satisfied, the audience members won't reach for the dial or remote control. And they will stay tuned to that which is entertaining or informing them.

So, working from that premise, one way to enhance our chances of keeping the audience might be to re-focus our processing efforts less on bells and whistles available to us and more on the two root needs that are of importance to the consumer:

With processing, these two goals can be at odds with each other, and it's our job as engineers to finesse the two.


If we accept that there is a limit to the amount of processing effects that the audience finds acceptable, then it follows that there is a relationship between the amount of processing effects and audience maintenance over time.

It's my contention that processing for source to source level consistency improves the listening comfort level, as long as its effects don't become excessive. However, as you increase the audibility of processing, there comes a point where more processing does not increase the audience interest but rather becomes an irritant, and begins to turn them away. Our industry calls this Listener Fatigue.


The trick to avoiding listener fatigue is to find the sweet spot that fits your audience and your programming, because there is a relation between content, audience and medium that can help determine the amount of processing required, and separately, the amount of effect tolerated.


General opinion has it that females, especially adults, are more sensitive to fatigue than males, especially young males, so if you program to 18-year-old men your considerations regarding listener fatigue are not the same as for those who provide programming to 35-year-old women.

Because listener fatigue increases with time, if you only expect your listener to devote a maximum of five or ten minutes to your radio station, you can afford to throw subtlety to the wind. But if you expect and desire longer listening, then attention to subtle, even subliminal, processing effects becomes much more important.


Reducing Effects, One by One

If we could reduce the effects of processing while not giving up the competitive advantages offered by the processing itself, we could use that as a 'zero base.' Then we could add in a subjective amount of effect to keep our signal interesting, while avoiding getting to the intensity where it becomes merely irritating.

Let's look at some typically problematic effects of processing (pumping, breathing, the 'busy' sound and clipping distortion) and see how modern processing techniques have been able to reduce or eliminate them.


Pumping

The best example of processor "pumping" I can think of is that of spectral modulation, where the higher intensity of a certain range of frequencies modulates the level of the entire bandwidth. See animation, below. Note that the spectrum is dominated by energy in the low frequency range. Imagine a throbbing kick drum...


Pumping is especially noticeable when the modulating frequencies, usually bass, are subsequently removed by band-limiting in transmission and/or reception, and only the modulation upon the rest of the program remains. Note in the animation (below) that while the original, processed low frequency range (red) has been reduced, its 'pumping' artifact still pushes the overall level (green) up and down.


Eliminating pumping is usually accomplished by processing with multiple bands:

If one portion of the spectrum is louder than another, control of it will not affect the level in the weaker frequency ranges so long as they are in separate level control bands.
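
The difference between one wideband control loop and separate level control bands can be sketched in a few lines. This is only a toy model under stated assumptions: levels are given directly in dB per time step, the wideband detector is approximated as simply following the louder band, and the names `gain_reduction_db`, `wideband` and `multiband`, along with the -10 dB threshold, are hypothetical choices made for illustration.

```python
def gain_reduction_db(level_db, threshold_db=-10.0):
    """Gain (0 dB or less) needed to hold a level at or below the threshold."""
    return min(0.0, threshold_db - level_db)


def wideband(bass_db, mid_db):
    """One detector for everything: the louder band sets the gain for both."""
    out = []
    for b, m in zip(bass_db, mid_db):
        g = gain_reduction_db(max(b, m))  # simplification: detector follows the louder band
        out.append((b + g, m + g))
    return out


def multiband(bass_db, mid_db):
    """Separate detectors: each band is controlled only by its own level."""
    return [(b + gain_reduction_db(b), m + gain_reduction_db(m))
            for b, m in zip(bass_db, mid_db)]


bass = [0.0, -30.0, 0.0, -30.0]      # a throbbing kick drum
mid = [-20.0, -20.0, -20.0, -20.0]   # perfectly steady midrange

wide_mid = [m for _, m in wideband(bass, mid)]
multi_mid = [m for _, m in multiband(bass, mid)]
print("midrange after wideband control: ", wide_mid)
print("midrange after multiband control:", multi_mid)
```

In the wideband case the steady midrange is pulled down on every kick and recovers between kicks, which is the pumping described above; with separate bands the midrange is left alone.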


Breathing

Breathing (once again, my definition) is when the attack and release of automatic level control become noticeable. While pumping is noticeable over a relatively small change in level, breathing usually involves gross level changes. Multiple bands won't reduce or eliminate breathing, but multiple time constants or a release algorithm with a window-hysteresis characteristic will. Using intelligence to decide when to make a change is the key to eliminating breathing.


In dynamic control, it's the transitions that create the effect. After all, if there is no change, there is no effect. So, by opportunistically stopping the processor from making a change you reduce the amount of time that the processor is making level transitions.


Windowing Release

In the above animation, the incoming audio level is represented by the blue line.

When programming is changing level, one can correct it almost with impunity. As a level goes down, a counter upward correction, if needed, is not noticeable. As a level goes up, a downward level correction is also not readily audible. About the only time you can't change levels inaudibly is when they are extremely steady. Window release exploits this fact.
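
A minimal sketch of the windowing idea follows. It is an illustration under assumptions, not the algorithm used in any actual product: levels arrive as one dB value per time step, the gain is held while the corrected level stays inside a window around the target, and it moves by a fixed step only while the level is outside that window. The function name `window_gain` and the target, window and step values are all hypothetical.

```python
def window_gain(levels_db, target_db=-10.0, window_db=3.0, step_db=1.0):
    """Return the gain (in dB) chosen at each time step.

    The gain is left alone while the corrected level sits within
    +/- window_db of the target, and is moved by at most step_db per
    step only while the corrected level is outside that window.
    """
    gain = 0.0
    gains = []
    for level in levels_db:
        error = target_db - (level + gain)
        if abs(error) > window_db:
            gain += max(-step_db, min(step_db, error))
        gains.append(gain)
    return gains


steady = window_gain([-10.0, -11.0, -9.0, -12.0])   # small wobble around the target
dropped = window_gain([-20.0] * 8)                  # program level has fallen 10 dB
print("steady program: ", steady)
print("dropped program:", dropped)
```

With the steady program the gain never moves, so there is nothing to hear. After the drop, the gain rises until the corrected level re-enters the window and then holds.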


Audibility of Attack/Release

One of the interesting things I discovered while developing our Ariane processors was the way that hysteresis release seems to force a psychological disconnect between attack and release. When I say psychological disconnect, what I mean is the listener does not get a reinforcing cue that they have just heard an artifact.

With traditional processing the attack of a high level segment of program is immediately followed by a release and both actions follow, rather closely, the overall envelope of the audio. The gain is in a constant state of change, up or down, a constant audible reminder that something is happening.
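
To make the "constant state of change" concrete, here is a rough sketch of a conventional attack/release gain follower. It assumes one dB level per time step and uses hypothetical names (`traditional_gain`, `count_changes`) and hypothetical attack and release rates; it is a generic textbook-style model, not a description of any specific processor.

```python
def traditional_gain(levels_db, threshold_db=-10.0, attack_db=6.0, release_db=1.0):
    """Conventional attack/release: the gain always chases the level."""
    gain = 0.0
    gains = []
    for level in levels_db:
        wanted = min(0.0, threshold_db - level)     # gain reduction needed right now
        if wanted < gain:
            gain = max(wanted, gain - attack_db)    # attack: pull the gain down quickly
        elif wanted > gain:
            gain = min(wanted, gain + release_db)   # release: let it recover slowly
        gains.append(gain)
    return gains


def count_changes(gains):
    """Number of steps on which the gain differs from the previous step."""
    return sum(1 for a, b in zip(gains, gains[1:]) if a != b)


program = [-4.0, -20.0, -20.0, -20.0, -4.0, -20.0, -20.0, -20.0]  # two loud hits
gains = traditional_gain(program)
print(gains)
print("gain changes:", count_changes(gains))
```

On this short example the gain moves on every single step: each hit triggers an attack, and the release starts immediately afterward.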

(See below; black line = envelope of unprocessed audio; blue = attack; green = release)

As illustrated above, the traditional processing model looks for opportunities to process, filling every gap when possible. What if we instead look for opportunities to NOT process?


When using a window-hysteresis release algorithm, the release does not immediately follow attack:

The above illustration is another way of looking at the previous window-hysteresis animation. The upper section of the drawing represents the audio waveform (in black) as it adjusts the "window." The lower stripe represents the amplitude of the resultant control signal. It's clear there are far fewer attacks (blue) and releases (green) than in the previous illustration.

When you can have large lengths of time of no activity between the attack and release, the sonic reinforcement phenomenon is greatly diminished, and along with it the sensation of effect is greatly reduced.

I believe that this type of processing not only allows the most faithful reproduction of the short-term dynamics in the program, it allows previously processed or over-processed material to avoid further processing, and it has this bonus of also neatly avoiding the usual audible reminder that processing is going on.


So Breathing is eliminated by using some form of intelligence to analyze the existing dynamics with the goal of preventing gain change unless it is actually needed.


Reducing the "Busy" Sound

Most everyone in our business has seen this chart of equal loudness curves (below), casually referred to as Fletcher-Munson curves, named after the authors of a groundbreaking study in 1933. The chart describes a kind of inverted frequency response of human hearing over frequency and intensity ranges, and clearly illustrates the human sensitivity to midrange and relative insensitivity to bass and treble.

There is another set of equal loudness curves that I find equally interesting, and especially pertinent today, from a study by Plomp and Bouman, in 1959:

It illustrates human hearing's special relationship with RMS energy, correlating perceived loudness with a given RMS energy level of increasing duration. Within a certain time range, less than about one-third of a second, the longer the length of the audio at a given level, the louder it sounds.

This phenomenon is true because the ear integrates energy over time. As the chart shows, a sound that lasts, say, 5 ms would have to be 15 dB higher in level to sound as loud as a similar sound that lasts for 300 ms. (It is clear why the designers of the VU meter chose a response time of 300 ms: that value of integration time makes the meter correlate well with human hearing.)

You can also look at it from the opposite way: If a processor reacted to a signal of 5 ms duration the same way it reacted to a 300 ms signal, as far as the ear is concerned it would be overreacting to the shorter duration signal. This is why I prefer RMS detection for level control: because peak processing, while very good for protection from overload, does not naturally correlate with the human ear. Peak processing is for the meters, RMS is for the ears.

So if we are searching for a less-busy AGC, here is another good ingredient: RMS detection, over a time period that at least approaches hundreds of milliseconds.
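
The contrast between peak and RMS detection can be sketched numerically. The example below assumes an idealized detector that averages energy over a fixed 300 ms window, and it uses constant-amplitude bursts as a stand-in for the envelope of a tone burst; the helper names (`burst`, `peak_db`, `rms_db`) and the 1 kHz sample rate are hypothetical choices made only to keep the arithmetic easy to follow.

```python
import math


def burst(duration_ms, total_ms=300, rate_hz=1000, amplitude=1.0):
    """A constant-amplitude burst followed by silence, filling a fixed window."""
    n_on = int(duration_ms * rate_hz / 1000)
    n_total = int(total_ms * rate_hz / 1000)
    return [amplitude] * n_on + [0.0] * (n_total - n_on)


def peak_db(samples):
    """Level seen by a peak detector, in dB."""
    return 20.0 * math.log10(max(abs(s) for s in samples))


def rms_db(samples):
    """Level seen by an RMS detector averaging over the whole window, in dB."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


short, long_ = burst(5), burst(300)
peak_difference = peak_db(short) - peak_db(long_)
rms_difference = rms_db(short) - rms_db(long_)
print("peak detector, 5 ms vs 300 ms: %.2f dB" % peak_difference)
print("RMS detector,  5 ms vs 300 ms: %.2f dB" % rms_difference)
```

The peak detector reads the two bursts as identical, while the idealized 300 ms RMS detector reads the 5 ms burst about 17.8 dB lower, which is 10·log10(5/300). That is in the same neighborhood as the roughly 15 dB the chart gives for the ear, so a detector of this kind reacts to short events far more like a listener does than a peak detector would.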


Clipping Distortion

Distortion can be caused by unintentional or even intentional non-linearities in the audio path, or timing-related distortions resulting from AGC or limiting operating faster than the audio waveform.

Unprocessed audio:

Clipping distortion is the result of intentional hard limiting, which generates dense harmonics, and especially in FM with its pre-emphasis, unnaturally dense high frequency energy.
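
The harmonic generation can be checked with a small experiment: hard-clip one cycle of a sine wave and measure individual harmonics with a single-bin DFT. This is a sketch under simple assumptions (an ideal symmetric clipper, a pure sine input, no pre-emphasis), and the function names `hard_clip` and `harmonic_amplitude` and the 0.5 clipping ceiling are hypothetical.

```python
import math


def hard_clip(x, ceiling):
    """Ideal symmetric hard clipper."""
    return max(-ceiling, min(ceiling, x))


def harmonic_amplitude(samples, harmonic):
    """Amplitude of one harmonic of a signal spanning exactly one cycle."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * harmonic * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * harmonic * i / n) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


N = 1024
sine = [math.sin(2 * math.pi * i / N) for i in range(N)]
clipped = [hard_clip(s, 0.5) for s in sine]   # clip at half the peak amplitude

for h in (1, 3, 5, 7):
    print("harmonic %d: unclipped %.4f, clipped %.4f"
          % (h, harmonic_amplitude(sine, h), harmonic_amplitude(clipped, h)))
```

The unclipped sine has energy only at the fundamental, while the clipped version shows a series of odd harmonics extending upward in frequency. With pre-emphasis ahead of the clipper, more of the high frequency content reaches the clipping point, which is consistent with the dense high frequency energy described above.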

Clipped audio:

There have been many clever methods to reduce the audibility of clipping within the clipper itself, and some algorithms work quite well.

But many of today's recordings incorporate clipping as an effect, and this should be taken into consideration. There is presently no known way to reliably and consistently remove clipping from live program, so the best way to avoid making things worse is to avoid the use of peak limiting and clipping throughout the system unless absolutely necessary for protection, using only an intelligent averaging AGC to keep the processed material out of further distortion and to maintain loudness consistency.

By relying more on the RMS AGC rather than the peak limiting for overall level control, a given amount of clipping is less noticeable at a given loudness level. But there are limits even to this. Clipping IS after all distortion, and any amount you use should be weighed carefully against your particular audience requirements and acceptance threshold.


Data Reduction

I mentioned in passing before that there are ways to minimize artifacts when processing for media that use data compression, such as IBOC. The simplest way to reduce effect here is the same as with clipping: make greater use of average level control, less reliance on peak control for loudness consistency, and intelligently and automatically reduce or disable those portions of the processing that are redundant whenever the programming is already severely processed. There are certainly other methods that are being investigated, but this one is known to help out right now.
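
One conceivable way to detect "already severely processed" material is to measure its crest factor, the ratio of peak to RMS level. The talk does not say how the detection is done, so everything in this sketch is an assumption: the use of crest factor as the indicator, the 6 dB threshold, and the names `crest_factor_db` and `should_bypass_peak_stage` are hypothetical and purely illustrative.

```python
import math


def crest_factor_db(samples):
    """Peak-to-RMS ratio in dB; heavily limited or clipped audio tends to score low."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return math.inf  # silence: nothing to measure, so never flag it
    return 20.0 * math.log10(peak / rms)


def should_bypass_peak_stage(samples, min_crest_db=6.0):
    """Hypothetical rule: skip further peak control when the peaks are already flattened."""
    return crest_factor_db(samples) < min_crest_db


dynamic = [1.0] + [0.1] * 99    # one strong peak over a quiet background
squashed = [0.9, -0.9] * 50     # square-like, as if already clipped hard

print("dynamic:  %.1f dB crest, bypass=%s"
      % (crest_factor_db(dynamic), should_bypass_peak_stage(dynamic)))
print("squashed: %.1f dB crest, bypass=%s"
      % (crest_factor_db(squashed), should_bypass_peak_stage(squashed)))
```

A practical detector would need to work on short running blocks and smooth its decisions over time; this version only shows the basic measurement.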


Some Conclusions

The sensation of any process effect and its level of acceptance in the universe of the intended audience is, by definition, highly subjective.

The more of this subjective effect our processing adds to the original signal, the further we move away from the original. Whether this is overall positive, negative or neutral is the heart of our discussion today.

I conclude today with a few bullet points:


Even though much of what I have said today involves my opinion, I would hope that what I've offered is food for thought to those among you who are in the position of controlling the nature and extent of processing for listeners.

There's no way to avoid audible artifacts when extremely aggressive amounts of processing are used. But if the desire is to present a product which will attract an audience and encourage longer time spent listening (TSL), then careful use of modern no-effect or low-effect processing with just a touch of effects would seem to be a very sure route to success.