This is only a preview of the March 2009 issue of Silicon Chip. You can view 32 of the 96 pages in the full issue, including the advertisements.
Digital Radio
Part 2: AAC+ encoders & decoders
Digital Radio broadcasts will use Advanced Audio Coding (AAC+) of the digital audio
signals. AAC+ is a complex compression process which greatly reduces the amount of
RF bandwidth and transmission power necessary to broadcast a high quality signal.
When we introduced Digital
Radio last month, we explained how this completely
new technology will commence in
Australia in just a few weeks.
The panel below summarises what
is required to broadcast high quality
audio signals. Analog hifi systems and
uncompressed digital systems (eg,
compact disc) aim to produce a close
replica of the original sound but this
uses wide bandwidth and can require
high transmission power.
The AAC+ compression process
aims to transmit what your ears and
brain perceive but employs much
reduced bandwidth and transmission
power.
Digital Radio – The System
In an AAC+ encoder, the following
information is sent to the receiver:
• A loudness signal
• Pitch/timbre signal
• Spectral Band Replication signal
• Parametric Stereo signal.
Let’s take a general view of how
these signals are produced and then
we will have a more detailed look at
the AAC+ encoder.
Your ear measures – at the ear drum
1. Total acoustic power (volume)
2. Fundamental frequency
3. The power of each harmonic frequency
4. The sound in this ear is less powerful at high
frequencies than the other ear
5. The sound in this ear is delayed compared to
the other ear at lower frequencies
6. Reverberation or echo delay
Analog systems and uncompressed digital
systems aim to produce an exact replica of the
original sound. This uses large bandwidth and can
require high power.
Your brain hears –
1. Loudness: soft or loud
2. Pitch: low or high (the note played)
3. Timbre: which instrument is being played
4. Sound direction: left, centre, right, above, below, in front or behind
5. Distance of the sound source: close or distant
AAC and AAC+ systems transmit what your brain hears and then
recreate the sound needed to produce the same result in your brain.
Much reduced bandwidth and lower transmission power are required.
At the bottom of the diagram (Fig.1)
at right is an 88-key music keyboard
but with grey “extensions” added at
either end. The grey keys do not exist
on a real keyboard but mark the frequencies
they would produce if they did. The highest real key has a fundamental
of about 4.2kHz, so in music all frequencies above 4kHz are harmonics of the
key used. The power of these frequencies is used to control the SBR signal.
You will also notice that the frequency difference between keys at
the low frequencies is much less than
at high frequencies. This follows the
brain’s ability to detect a change in
pitch. AAC uses a “comb” filter and
the bandwidth of each of the “teeth”
(ie, individual filters) is not equal – it is
much wider at the higher frequencies.
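The widening gap between keys can be checked with the standard equal-temperament formula. The sketch below assumes the usual tuning of key 49 (A4) at 440Hz, which the article does not state, and the function name is purely illustrative.

```python
def key_frequency(n: int) -> float:
    """Frequency in Hz of piano key n (1..88), assuming key 49 (A4) = 440 Hz."""
    return 440.0 * 2.0 ** ((n - 49) / 12.0)

# Spacing between adjacent keys at the bottom and the top of the keyboard
low_step = key_frequency(2) - key_frequency(1)
high_step = key_frequency(88) - key_frequency(87)

print(f"Key 1:  {key_frequency(1):.1f} Hz, step up to key 2:   {low_step:.2f} Hz")
print(f"Key 88: {key_frequency(88):.1f} Hz, step up from key 87: {high_step:.1f} Hz")
```

Each key is a fixed ratio above the one before it, so the step in hertz grows with frequency. A filter bank that follows our sense of pitch therefore needs wider "teeth" at the top of the band.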
Loudness Signal
The average ear is most sensitive
at about 4kHz and least sensitive at
the extremes of the audio band. So
to measure the loudness of the sound
the comb filter characteristic in Fig.1
is used.
The sound power entering the microphone can vary from just audible
to the threshold of pain. This is a
dynamic range of 1:1,000,000,000,000
or 120dB.
It is accepted that if the sound is
twice as loud, the measured power increase is +10dB or 10 times the original
power. It is also true that if the sound
is half as loud the power is one tenth
of the original or -10dB.
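The figures above can be checked with a few lines of arithmetic. This is only a sketch: the function names are illustrative, and the loudness function simply applies the article's rule of thumb that each 10dB step doubles or halves the perceived loudness.

```python
import math

def power_ratio_to_db(ratio: float) -> float:
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(ratio)

def db_to_loudness_ratio(db: float) -> float:
    """Perceived loudness ratio, assuming +10dB sounds twice as loud."""
    return 2.0 ** (db / 10.0)

# Just audible to the threshold of pain: a power ratio of 1:1,000,000,000,000
dynamic_range_db = power_ratio_to_db(1e12)

print(dynamic_range_db)              # the 120dB quoted in the text
print(db_to_loudness_ratio(10.0))    # ten times the power: twice as loud
print(db_to_loudness_ratio(-10.0))   # one tenth of the power: half as loud
```

Under the same rule, the full 120dB range amounts to 12 doublings of loudness, even though the power changes by a factor of a million million.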
The human ear has a logarithmic response to sound power. This allows
us to hear sounds from the faint rustling of leaves to the roar of jet
engines. The digital audio signal is handled the same way, ie,
logarithmically. The end result is the loudness signal which is called
the scaling factor.
by Alan Hughes
Fig.1: the comb filter in an AAC+ encoder has 132 centre frequencies and these
are used to generate the pitch/timbre information.
Fig.2: the huge dynamic range of audio signals must be compressed before
being transmitted. The compression information becomes the loudness signal or
scaling factor produced by the AAC+ decoder. The incoming sound power is shown
referenced to 1W but the exact value depends on the microphone amplifier’s
gain and the setting of the listener’s volume control.
Pitch/Timbre Signal
Another characteristic of our hearing is “masking”. This is where a strong
single frequency is heard but softer frequencies in a frequency band either
side of this single frequency cannot be heard.
The comb filtering effect is created by taking samples of the signal at the
centre frequency of the “teeth”. This sampling will cause additional alias
signals to be generated.
To use this effect the digital signal is analysed by a Fast Fourier Transform
(FFT). It converts a waveshape into the frequencies used to create this shape.
The difference in level between the FFT signal and the level from the comb
filter is calculated.
This difference is measured and sent to the decoder. Any signals where
the output of the FFT is less than the comb filter output will not be sent, ie,
they are discarded on the assumption that they will be masked and could
not be heard.
The resulting quantised filter samples are then sent to the encoder.
Spectral Band Replication (SBR)
The frequency range above 11kHz has less effect on the perceived quality
of the sound and requires a lot of data to reproduce. As a result the sample
rate has been halved to 24 kilosamples/second. This is done by averaging
every pair of 48kHz samples. This will reduce the maximum frequency
to 11.3kHz. The missing harmonics in the sound will be simulated in the
decoder.
Unless the sound is a pure tone, the frequencies above 11kHz are harmonics
of lower frequencies. The encoder measures and sends the level of the
sound frequencies above 11kHz. This level is sent (within the SBR signal) to
the decoder to control the level of the regenerated harmonics above 11kHz.
If the sound is random, then random high frequencies will be used instead
of harmonics.
Parametric Stereo
Most people prefer stereo sound of reasonable quality to mono sound.
At high rates of compression the addition of direction greatly improves
the perceived quality of the sound. But rather than transmitting two
high-quality channels of sound to create stereo, it is more efficient to
transmit a mono sound signal and add direction information.
Direction information consists of time differences between the left
and right ears at mid-frequencies and strength differences at higher
frequencies.
This is due to the sound having to travel around the head. So when a
transient occurs, the time difference is measured between the left and right
channel and this is encoded along with signal volume differences at high
frequencies.
Normally only the strongest signals are transmitted and as a result, the
subtle sound reflections which cause reverberation are removed. These can
be restored by measuring the level of reverberant sound in the encoder so
that it can be recreated in the decoder. The fact that the sound has been
steered in the right direction will also affect the recreated reverberant
sound, making it more realistic.
Fig.3: an AAC+ encoder takes 48kHz sampled data from the studio, averages
it to 24ks/s, removes info below 10Hz, averages it to mono, passes it to an
FFT and generates masking info and SBR info. The comb filter produces the
loudness weighting info. Also added is direction info for stereo and
parametric 5.1 info.
[Fig.3 block diagram labels: Microphones (L, R, C, Lr, Rr); Studio; Analog to
Digital Converter; 20 bit; 48kHz; AES Serial Data; 3.072 Mbits/s; Average all
channels to mono; FFT; It splits the signal into individual frequencies;
Strongest Frequency Detect; Detect Sub-band < Mask; 132 Tooth Comb; Loudness
Weighting; Direction Detection; 2.5kb/s max for stereo; SBR; 1-3 kb/s;
Quantisation of Sub-bands Above Mask; Calc of Required Bit Rate; Set Bit Rate;
Bit Rate Adjust; Multiplexer; AAC+ Encoded Data; DAB+ Generator; Transmitter.]
AAC+ encoder details
Fig.3 shows the schematic of an
AAC+ encoder. The 48 kilosamples/
second digital audio from the studio
has every pair of samples per channel
averaged to reduce the sample rate to
24ks/s. The encoder will only use the
most significant 16 bits of each sample. The signal also has any frequency
below 10Hz, including DC, removed.
Otherwise it’s hard to keep the decoder
synchronised.
The main digital audio signal has all
time-coincident samples averaged to
produce a monophonic signal.
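The two averaging steps described above can be sketched in a few lines. The function names and sample values are invented for illustration, and a practical encoder would normally use a proper low-pass decimation filter rather than plain pair averaging.

```python
def halve_sample_rate(samples):
    """Average each consecutive pair of samples (48ks/s down to 24ks/s)."""
    return [(samples[i] + samples[i + 1]) / 2
            for i in range(0, len(samples) - 1, 2)]

def mix_to_mono(channels):
    """Average the time-coincident samples of all channels."""
    return [sum(frame) / len(frame) for frame in zip(*channels)]

# Hypothetical sample values for two channels at 48ks/s
left = [0, 2, 4, 6, 8, 10]
right = [10, 8, 6, 4, 2, 0]

left_24k = halve_sample_rate(left)           # [1.0, 5.0, 9.0]
right_24k = halve_sample_rate(right)         # [9.0, 5.0, 1.0]
mono = mix_to_mono([left_24k, right_24k])    # [5.0, 5.0, 5.0]
print(left_24k, right_24k, mono)
```

In this deliberately extreme example the two channels cancel each other's variation, so the mono signal is flat. This is exactly the information the direction signal has to restore.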
A comb filter samples the audio at
the frequency of each “tooth” of the
comb as shown in the AAC+ comb
filter diagram (Fig.1). The level of each
filter is modified by the loss shown on
the vertical axis of the filter diagram
and added together. This is the loudness signal.
The samples are stored so that the
difference between the current sample
and the previous sample is sent to
the decoder. So the resulting signal
for transmission simply says “make it louder”
or “make it softer”.
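Sending only the change from one loudness value to the next is a form of differential coding. The sketch below uses hypothetical loudness values in dB. It also assumes that the first value is sent in full so the decoder has a starting point, which the article does not say.

```python
def delta_encode(levels):
    """Return the first level followed by the change from each level to the next."""
    deltas = [levels[0]]
    for previous, current in zip(levels, levels[1:]):
        deltas.append(current - previous)
    return deltas

def delta_decode(deltas):
    """Rebuild the original levels by accumulating the changes."""
    levels = [deltas[0]]
    for change in deltas[1:]:
        levels.append(levels[-1] + change)
    return levels

loudness_db = [-30, -28, -28, -31, -25]    # hypothetical loudness values
encoded = delta_encode(loudness_db)        # [-30, 2, 0, -3, 6]
decoded = delta_decode(encoded)
print(encoded, decoded)
```

The changes are usually much smaller numbers than the levels themselves, so they need fewer bits.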
The mono signal is also fed through
a Fast Fourier Transform (FFT) which
will separate the waveform into individual frequencies. The strongest
frequency is used to raise the
masking level of the surrounding frequencies,
since softer sounds close to it cannot be heard. This characteristic is added to
the loudness weighting. Each
frequency’s level is then compared with
the masked signal level for the frequency band
concerned.
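Separating a waveform into its component frequencies and picking out the strongest one can be illustrated as follows. This is a sketch only: it uses a slow, direct Fourier sum in place of a true FFT (the result is the same), and the block length and test tones are chosen arbitrarily.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitude of each frequency bin from 0 Hz up to half the sample rate."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        magnitudes.append(abs(total))
    return magnitudes

SAMPLE_RATE = 24000   # from the article: 24ks/s after averaging
N = 240               # illustrative block length, giving 100 Hz per bin

# Test signal: a strong 1kHz tone plus a much weaker 3kHz tone
signal = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
          + 0.1 * math.sin(2 * math.pi * 3000 * t / SAMPLE_RATE)
          for t in range(N)]

mags = dft_magnitudes(signal)
strongest_bin = max(range(len(mags)), key=mags.__getitem__)
strongest_hz = strongest_bin * SAMPLE_RATE / N
print(strongest_hz)
```

With these test tones the strongest component is the 1kHz one, so the masking level would be raised in the bands on either side of it.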
If the incoming frequency is higher
in level than the masked signal, then
the difference in level will be quantised for transmission. The bit rate
calculation ranks these differences in order, from
the largest down to the
smallest non-zero value. If the bit
rate produced is higher than what is
available, the lowest values will not
be transmitted.
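The selection process can be sketched as follows. All band names and levels are hypothetical, and the bit budget is simplified to a maximum number of components, whereas a real encoder counts bits.

```python
def select_components(levels_db, mask_db, max_components):
    """Keep the components that exceed their mask, largest excess first."""
    excess = {band: level - mask_db[band]
              for band, level in levels_db.items()
              if level > mask_db[band]}
    ranked = sorted(excess.items(), key=lambda item: item[1], reverse=True)
    return ranked[:max_components]

# Hypothetical signal and masking levels in dB for four bands
levels_db = {"500Hz": -20, "1kHz": -6, "2kHz": -30, "4kHz": -25}
mask_db = {"500Hz": -35, "1kHz": -40, "2kHz": -28, "4kHz": -32}

kept = select_components(levels_db, mask_db, max_components=2)
print(kept)   # [('1kHz', 34), ('500Hz', 15)]
```

Here the 2kHz component falls below its mask and is never considered. The 4kHz component is audible but has the smallest excess, so it is the first to be dropped when the budget is tight.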
The SBR (Spectral Band Replication) circuit measures the slope of
the decreasing levels of harmonics.
It transmits this slope value and the
starting level.
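One way to picture the slope and starting level is as a straight line fitted through the harmonic levels in dB. The sketch below assumes a least-squares fit and uses made-up levels. The article does not say how the slope is actually measured.

```python
def fit_slope(levels_db):
    """Least-squares line through the harmonic levels: (start_level, slope)."""
    n = len(levels_db)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(levels_db) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, levels_db))
             / sum((x - mean_x) ** 2 for x in xs))
    start = mean_y - slope * mean_x
    return start, slope

def regenerate(start, slope, count):
    """Decoder side: rebuild harmonic levels from the two transmitted values."""
    return [start + slope * i for i in range(count)]

measured = [-20.0, -26.0, -32.0, -38.0]   # hypothetical: falls 6dB per harmonic
start, slope = fit_slope(measured)
rebuilt = regenerate(start, slope, len(measured))
print(start, slope, rebuilt)
```

Only two numbers are sent, however many harmonics there are, which is why the SBR signal needs so little data.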
The direction detection measures
the level difference between the input channels and the mono signal for
frequencies above 2.5kHz. The same
applies to the phase differences for
signals between 250Hz and 2.5kHz.
Lastly, the time difference of signal
transients in each channel is compared to the mono channel. A multiplexer will select the output signals in
the pattern required by the standard.
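The level and time comparisons made by the direction detection can be sketched as follows. This is a simplification under stated assumptions: the signals are not split into the frequency ranges given above, the transient is located simply as the largest peak, and all names and sample values are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a block of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def level_difference_db(channel, mono):
    """Level of one channel relative to the mono signal, in dB."""
    return 20.0 * math.log10(rms(channel) / rms(mono))

def peak_index(samples):
    """Position of the largest transient, taken as the biggest absolute sample."""
    return max(range(len(samples)), key=lambda i: abs(samples[i]))

def transient_delay(channel, mono):
    """Delay in samples of the channel's transient relative to the mono signal."""
    return peak_index(channel) - peak_index(mono)

# Hypothetical click that arrives first and louder in the left channel
left = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.5, 0.0, 0.0]
mono = [(l + r) / 2 for l, r in zip(left, right)]

print(level_difference_db(left, mono), transient_delay(left, mono))
print(level_difference_db(right, mono), transient_delay(right, mono))
```

The left channel comes out louder than the mono signal and in step with it, while the right channel is quieter and one sample late. The decoder could use these values to steer the sound to the left.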
So in conclusion, the following information is produced by the encoder and
sent, via the transmission system, to the decoder:
• Loudness (Dynamic Range Control
or Scaling Factor signal)
• Pitch/timbre (Quantised filter output signals & SBR signal)
• Direction (Parametric Stereo/5.1
signal).
Next month we will look at digital
radio transmission and the subsequent
program decoding in the receiver. SC