Digital Audio File
Formats Explained
By NICHOLAS VINEN
In the digital world, there are a lot more ways to store and transmit
audio data than there are analog forms. The differences, advantages
and disadvantages of these formats are not obvious. Here are some
details on the various formats and their differences.
PCM, MP3, WAV, FLAC, AAC, OGG, WMA – these are all pretty cryptic names for common audio file formats. So if you want to store audio on your computer, phone etc, which format should you use and why? It depends on a number of factors as each format comprises a different set of compromises.
Digital audio basics

All digital audio ultimately starts as an analog signal (eg, from a microphone) and also ends up as an analog signal; typically, going to an amplifier to drive a set of speakers or headphones. An analog-to-digital converter (ADC) converts the original analog signal to a digital format while at the other end of the chain, a digital-to-analog converter (DAC) turns it back into analog.

20 Silicon Chip

Fig.1: how a 1kHz analog sinewave (red) is converted to PCM values (sequence of green numbers). This shows 32 voltage steps when a real PCM waveform normally uses at least 65,536. The voltage is sampled at a constant rate (here, 44.1kHz) and the nearest value for the voltage at that point (blue dots) is stored as the next number in the sequence. (Axes: sample value 0-32 and input signal voltage -3V to +3V, against sample number 0-44 and time 0us-1ms at 44.1kHz.)

The digital output of the ADC is usually some form of Pulse-Code
Modulation (PCM) and this can be
regarded as the most basic form of
digital audio. Two common examples
of digital audio formats based on PCM
are the Microsoft WAV file format and
CD audio.
Fig.1 shows how a continuously
varying analog signal (red) is converted to a set of data points (blue).
The horizontal axis is the timebase and
the vertical lines represent sampling
periods which occur at fixed intervals.
The number of such sampling intervals
per second is known as the sampling
rate and is typically at least 44.1kHz
for good-quality sound reproduction.
The vertical axis represents voltage and this too is divided up into a number of equal intervals. For CD audio, there are 65,536 such intervals representing a total voltage of about 6V, to handle signals up to about 2.1V RMS. Each interval therefore covers a range of about 0.1mV. The number of voltage steps is the resolution and since 65,536 = 2^16, this is known as 16-bit audio (note: Fig.1 has far fewer steps for the purposes of illustration).
At each sampling period, the analog-to-digital converter finds the horizontal line closest to the input signal voltage at that time and since each line is numbered, by storing this series of numbers, we form a digital approximation of the waveform shape (blue dots).

Fig.2: compressing digital audio adds an extra step into both recording and playback. Once analog data has been digitised (and possibly mixed), it passes through an encoder which produces an output bitstream at a lower rate than its input. This can then be more easily transmitted or stored. Later, before being played back, the data passes through a decoder which reconstructs the PCM audio data (or a close facsimile) to be sent to the DAC. (Signal chain: left/right inputs → ADCs → I2S [PCM or DSD, 1.5-9.2Mbps] → 'lossy' or 'lossless' audio encoder [software/hardware] → MP3, FLAC, AAC etc [0.1-5Mbps] → storage medium [hard disk/flash] and/or broadcast → decoder → I2S → DACs → left/right outputs. If a 'lossy' CODEC is used, some information is lost in the encoding stage and the output waveform is not identical to the input waveform.)
The storage required for this data is
the product of the number of bits of
vertical resolution, the sampling rate
and the number of channels. So for CD
quality audio this is 16 x 44,100 x 2 =
1.4Mbit/s or 172.266KB/s.
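For readers who like to check the arithmetic, the calculation can be expressed in a few lines of Python (illustrative only):

```python
bit_depth = 16                         # bits per sample
sample_rate = 44_100                   # samples per second
channels = 2                           # stereo

# PCM data rate = resolution x sampling rate x number of channels
bits_per_second = bit_depth * sample_rate * channels
print(bits_per_second)                         # 1411200, ie, ~1.4Mbit/s
print(round(bits_per_second / 8 / 1024, 3))    # 172.266KB/s (binary kilobytes)
```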
Why 44.1kHz? This frequency was
chosen for historical reasons as it
divides evenly into PAL and NTSC
frequencies, allowing easy synchronisation with video tape. The important
thing is that it’s more than twice the
highest frequency humans can hear
(~20kHz) so the Nyquist limit is high
enough (more on that later) and the
anti-aliasing filter has a sufficiently
large 4.1kHz transition band.
At this point, you may have noticed
that some of the blue dots in Fig.1 are
not precisely on the red curve due to
the limited voltage and time resolution. As a result, if you drew a line
through these points, it would not be
a pure sinewave like the input.
But the DAC has a ‘reconstruction
filter’ which smooths the output to give
a result very close to the input signal
despite this, as long as the input only
contains frequencies up to the Nyquist
limit, which is half the sampling rate.
In this case, that means frequencies
up to about 22kHz are reconstructed
almost perfectly.
Experience tells us that with a good-quality ADC and DAC, most people can't hear any imperfections in CD-quality audio recordings. There are arguments to be made that higher sampling rates and bit depths (eg, 96kHz and 24-bit) have certain advantages, hence formats such as DVD-Audio and SACD, as described below.
However, there is a further issue
to consider and that is that 172KB/s
of data results in quite large storage
requirements. A typical CD holds up
to 72 minutes of audio with a 720MB
capacity. If you have a 32GB SD card,
that would hold about 40 full CDs
worth of audio; more if the CDs weren’t
entirely full (but some CDs can contain
up to 80 minutes/800MB).
In order to fit more recordings into
the same amount of space, newer formats which require less storage have
been developed in the last 20 years.
With those formats which gain the most
dramatic reductions though, small
errors are introduced in the reconstruction process. These are known as
‘lossy’ compression formats, because
there is some loss in audio quality.
However, ‘lossless’ audio compression
can also be used, to reduce storage
requirements modestly without this
drawback.
Fig.2 shows a typical audio chain,
from an analog recording (eg, an old
tape master) through to a digital stream
which is then compressed, stored and/
or transmitted, then decompressed
and converted back into analog audio
for amplification.
Fig.3: at top is a 90ms snippet of audio from a music CD. Below this we have copied and pasted each cycle over the subsequent cycle as a crude method of 'predicting' the shape of the audio based on previous samples. The difference, at bottom, shows that the error ('residual') with even this crude prediction method has significantly less amplitude than the audio signal itself and thus requires less storage space.
August 2014 21

Fig.4: an example of quantisation, which can be applied to residuals or other less important signals to reduce the amount of storage space they require. The number of vertical divisions is reduced; in this case from 32 to 8 and the nearest points are selected instead. The resulting quantisation error is shown at bottom. Normally, though, this is applied to spectral (frequency domain) data rather than time domain data as the deleterious effects are less severe.
So just how much does lossy compression affect the sound and are you
willing to put up with that in exchange
for fitting more audio? And if you are
very fussy, just how much audio can
you fit into a limited amount of space?
Lossless compression
There are a few common ‘lossless’
audio CODECs (encoders/decoders)
available. Perhaps the most common
is called FLAC, or Free Lossless Audio
Codec. Like most lossless CODECs,
FLAC typically achieves a 40-45%
reduction in storage requirement over
PCM without affecting audio quality
at all. Other similar CODECs include
Apple Lossless, WavPack, TAK and
APE but the differences are minor.
If you aren’t interested in the details
of how this is achieved, you can skip
right ahead to the next section.
Many readers will be familiar with
compression programs such as “ZIP”
and “RAR” which can reduce the size
of many types of file, sometimes quite
substantially. Unfortunately, if you try
to ZIP up a PCM audio file (eg, WAV format), you will save little if any space.
That’s because despite PCM audio
containing quite a bit of redundancy,
the general-purpose pattern matching
algorithms used in archiving software
such as ZIP are not effective at identifying and eliminating it.
To realise lossless compression for
audio, we have to consider the nature
of the signal itself. For a start, while
the vertical scale of the PCM data has
to be large enough to encompass the
peak signal amplitude, much of the
time the signal level is much lower
than this; in other words, most audio
has significant dynamic range. During
the quieter passages, PCM ‘wastes bits’
(or fractions of bits) which are always
zero but are nevertheless stored.
So one simple (but limited) approach
to lossless audio compression is to create a PCM-like file format where the
bit depth (resolution) can vary over
time, to save bits by taking advantage
of amplitude variations in the signal.
FLAC effectively does this, using a
technique called “Rice coding”. But
that only gives modest gains in space
efficiency; perhaps 10% at best.
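The idea behind Rice coding can be sketched in a few lines of Python. Note that this is a simplified illustration of the principle (a unary quotient followed by a k-bit remainder, so small values take few bits), not FLAC's actual implementation:

```python
def rice_encode(value, k):
    """Rice-encode a non-negative integer: unary quotient + k-bit remainder."""
    q, r = value >> k, value & ((1 << k) - 1)
    return "1" * q + "0" + format(r, f"0{k}b")   # '0' terminates the unary part

def rice_decode(bits, k):
    q = bits.index("0")                # count the leading 1s
    r = int(bits[q + 1:q + 1 + k], 2)
    return (q << k) | r

# Small values encode into few bits; the parameter k is tuned per block
encoded = rice_encode(9, k=2)          # 9 = 2*4 + 1 -> '110' + '01'
print(encoded)                         # '11001'
print(rice_decode(encoded, k=2))       # 9
```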
A more advanced approach is to
realise that audio signals tend to be
quite repetitive and while subsequent
cycles of a waveform are almost never
identical to the previous cycle, they
are often quite similar. So if you can
use one cycle of audio to ‘predict’ the
next, you only have to store the error
between the prediction and reality. By
then applying that difference during
decoding, you can reconstruct the next
part of the waveform exactly.
This error signal is known as the ‘residual’ and it is normally much lower
in amplitude than the signal itself. The
compressor can store the ‘prediction’
parameters, which take little space,
and since the residual’s amplitude is
low, the aforementioned dynamic bit
depth coding will give a much greater
reduction in storage space.
This process is illustrated with a
typical audio fragment in Fig.3. All
we have done is taken a 90ms snippet
from a music CD (top), then created a
predicted version (middle) by simply
copying each previous cycle to form
the next; something that a decoder can
easily do, since it has access to past
audio samples.
The bottom panel shows the difference between these two signals.
As you can see, it’s much lower in
amplitude than the original and can
thus be stored much more efficiently,
despite the crudeness of our approach.
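The effect can also be demonstrated numerically. The Python sketch below uses a synthetic 1kHz tone rather than real CD audio, 'predicting' each sample by copying the one a full cycle earlier and measuring the size of the residual:

```python
import math

RATE, FREQ = 44100, 1000               # 1kHz tone sampled at 44.1kHz
period = round(RATE / FREQ)            # ~44 samples per cycle

# Three cycles of a near-full-scale 16-bit sinewave standing in for real audio
signal = [round(30000 * math.sin(2 * math.pi * FREQ * n / RATE))
          for n in range(3 * period)]

# Predict each sample from the one a cycle earlier; keep only the difference
residual = [signal[n] - signal[n - period] for n in range(period, len(signal))]

print(max(abs(s) for s in signal))     # close to full scale (30000)
print(max(abs(r) for r in residual))   # far smaller, so it needs fewer bits
```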
Of course, what FLAC and other
lossless compressors do is much more
advanced than this. For example, they can try multiple different prediction methods for each section of audio and store the result of whichever takes up the fewest bits. But the general principle is the same.
Other lossless methods
Meridian Lossless Packing (MLP) is
a commercial lossless CODEC that is
used for DVD-Audio. This is slightly
more space-efficient than FLAC but
it is a proprietary, patented system,
compared to FLAC which is free to
download and use (including the
source code). FLAC is also somewhat
faster, especially to decode. There’s
a lot of detailed information on how
MLP operates at www.meridian-audio.
com/w_paper/mlp_jap_new.PDF
Ultimately, there isn’t much practical difference between FLAC and
the other lossless CODECs, including
Apple Lossless (ALAC), except for
popularity. Apple computer users will
find their software has better support
for ALAC while Windows/Linux users
will be better off with FLAC. Apple
audio player hardware such as iPods
and iPhones also support ALAC while
other devices, including Android
phones, typically use FLAC.
Lossy compression
With lossless compression, you can
fit around 60 CDs worth of audio onto
a 32GB SD card rather than 40 CDs
worth. That’s an improvement but it
is possible to do better, using a lossy
compression method.
One way to do this would be to use the same method as FLAC but 'quantise' the residual. That means effectively rounding some of the residual values up or down, in order to create fewer possible sample values. Fewer values take fewer bits to store, saving space (see Fig.4). This is not a particularly good approach but it would work; since the residual values are usually small, the error signal introduced (shown in red on Fig.4) would also be small.
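The quantisation scheme of Fig.4 can be expressed in a few lines of Python (an illustrative sketch; the step size and clamping here are our own choices):

```python
def quantise(sample, step=4):
    """Map a 5-bit sample (0-31) to one of 8 coded values, as in Fig.4."""
    code = min(7, round(sample / step))    # the 3-bit code actually stored
    return code, code * step               # code plus its reconstructed value

for s in (3, 5, 14, 22, 29):
    code, approx = quantise(s)
    print(s, code, approx, s - approx)     # the quantisation error stays small
```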
To get really good compression
ratios (ie, increase the ratio between
the original PCM data size and the
compressed data size), more advanced
techniques are required. The first lossy compression method with very high compression and good audio quality was MPEG (Moving Picture Experts Group) Audio Layer III, better known as "MP3". This was developed by the Fraunhofer-Gesellschaft research institute in Germany in the early 90s.
MP3 followed on the heels of MPEG-1 Audio Layer 2 (MP2), formalised in
the late 1980s. MP2 is a less advanced
coding method which has a worse
compression ratio. However, it also
results in less audible degradation and
is still in use today for broadcasting, as
it is simpler (and thus faster) to encode
and decode than MP3.
The added complexity of MP3 is in
its heavy reliance on a psychoacoustic
model. This takes advantage of psychoacoustic masking, a property of human hearing whereby tones at certain
frequencies can make simultaneous
tones at other frequencies but of lower
amplitudes inaudible (‘masked’). In
other words, if both tones are present
in a signal, depending on the relative
levels, the human brain will perceive
the louder one but not the other.
Thus, it is possible to do a spectral
analysis of the audio data and ‘chop
out’ certain signal components (frequencies) without (in theory, at least)
affecting how it sounds. This is illustrated in Fig.5. Note that the tones
don’t necessarily need to be simultaneous; if one comes immediately after the
other it may still be masked and this
is referred to as ‘temporal masking’.
The resulting simplified spectrum
can then be quantised (as explained
earlier), re-ordered and compressed
using a standard ‘entropy coding’
method (such as Huffman) to give a
much smaller amount of data than the
original PCM. This is usually done by
chopping the PCM audio data up into
variable-sized overlapping blocks and
then compressing them separately.
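Entropy coding assigns short bit patterns to common values and longer patterns to rare ones. The Python sketch below builds a bare-bones Huffman code table; real CODECs are far more sophisticated but the principle is the same:

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a Huffman table: frequent symbols get the shortest bit codes."""
    heap = [(freq, i, {sym: ""})
            for i, (sym, freq) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    tiebreak = len(heap)               # keeps tuple comparisons off the dicts
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)    # merge the two rarest subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# Quantised spectral values cluster around zero, so entropy coding pays off
data = [0, 0, 0, 0, 0, 1, 1, 2]
codes = huffman_codes(data)
print(codes)                               # the five 0s share the shortest code
print(sum(len(codes[v]) for v in data))    # 11 bits vs 16 for fixed 2-bit codes
```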
Fig.5: an illustration of psychoacoustic masking. Tones with amplitudes below the threshold of audibility (mauve) are always inaudible but when loud tones are present (eg, 300Hz @ +65dB as shown in green), even tones above the audibility threshold can be masked and generally not perceived as audible. In this case, the 150Hz +39dB tone (shown in red) is within the other signal's "shadow" and thus could theoretically be removed without changing the overall sound.

Another option to achieve good compression with reasonable sound quality is to separate the spectrum out into the important parts, which the listener is expected to hear, and the unimportant parts which will be partially or completely masked, and then decimate the latter more heavily using a more severe quantisation scheme.
During playback, the compressed
frequency-domain audio blocks are
unpacked and converted back into
time domain data. The snippets of
reconstructed sound are then joined
back together using a ‘windowing’
method to get rid of any discontinuities caused by the imprecise storage
method, ie, where the signal at the
end of one block wouldn’t necessarily
end up at the same voltage as the start
of the next block. Windowing takes
advantage of the overlap to smooth
these transitions.
MP3 also uses the similarities between the two channels in a stereo
recording to reduce the size, storing
them as a sum and (lossy) difference
with a technique known as “joint stereo” (using a method such as ‘intensity
coding’ or ‘mid/side coding’), thereby
saving further space.
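The sum-and-difference idea can be sketched in a few lines of Python. This toy example uses made-up time-domain sample values for clarity, whereas real encoders apply the technique to spectral data:

```python
left  = [100, 120, 90, 80]             # invented sample values for illustration
right = [98, 121, 88, 83]              # very similar to the left channel

# Store the sum and the (small) difference instead of two full channels
mid  = [l + r for l, r in zip(left, right)]
side = [l - r for l, r in zip(left, right)]
print(side)                            # [2, -1, 2, -3]: tiny values, few bits

# Decoding is exact: left = (mid + side)/2, right = (mid - side)/2
dec_left  = [(m + s) // 2 for m, s in zip(mid, side)]
dec_right = [(m - s) // 2 for m, s in zip(mid, side)]
print(dec_left == left and dec_right == right)   # True
```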
The overall amount of compression
varies, depending on how aggressively
the psychoacoustic model removes
‘redundant’ signals and also by controlling the amount of quantisation
of the resulting data. In practice, for
MP3, the reduction in size ranges from
about 77% (320kbps) to 93% (96kbps).
At 96kbps, there will be a rather noticeable impact on audio quality; at 320kbps, not so much.
Note that "kbps" refers to kilobits per second and may also be written "kbit/s". To convert from kilobits per second to kilobytes per second, divide by eight, ie, 128kbps = 16kB/s.
MP3 compression thus allows for
something like 4-10 times more audio
to be stored in the same amount of
space as raw PCM. Or to put it another
way, 250-500 full CDs can fit onto a
32GB SD card with reasonable sound
quality. That’s quite an improvement!
What's the catch?

So what's the catch? Well, if you're listening to relatively high bit-rate MP3 files on a noisy bus or in a car, you probably won't be able to tell the difference between them and the original recordings – even with a decent car audio system. But with a proper hifi set-up, the difference between an MP3 and a CD can be stark for critical listeners. Some more recent lossy CODECs claim to do a better job of reproducing CD quality; more on this below. But if you're a discerning listener with reasonable hearing acuity, lossless compression is still your best choice.

Fig.6: this illustrates the difference between constant bit rate (CBR) and variable bit rate (VBR) encoding. The encoder can either vary the quality factor to maintain a constant bit rate or use a fixed quality factor which results in the bit rate varying with signal complexity. As shown here, both methods can produce a file of the same size but the CBR file will have the more complex passages encoded with a low quality factor which could result in poor sound quality.
Variable vs fixed bit rate
When a lossy compression algorithm is applied to normal audio data,
even if each block of raw audio data
processed is the same size, it will
generally produce compressed data
blocks of varying size. That’s because
the complexity of the audio signal
varies over time.
For example, a cymbal clash
contains a wide range of frequency
components and so will not compress
anywhere near as well as, say, a bass
guitar by itself. Many audio files also
contain short gaps of (near) silence,
which may not be obvious during
listening.
So while the PCM data is recorded or played at a fixed rate, the compressed data stream naturally has a varying rate.
Sometimes, this is undesirable – for
example, in a broadcast, there will be
a fixed amount of bandwidth allocated
to audio. The maximum compressed
data rate must not exceed this and
while smaller blocks could be padded to fit, that would simply waste
bandwidth.
In this case, the best solution is to
adjust the ‘lossiness’ of the compression algorithm block-by-block, in order
to produce compressed data with a
more-or-less fixed bit rate and then
use padding to make up the difference;
see Fig.6.
This also has the advantage that the
ratio between the uncompressed and
compressed data is fixed. For example,
if you are compressing CD-quality
WAV files to 192kbps MP3 files, you
know that the MP3 files will be exactly
192kHz ÷ (44kHz x 16 bits x 2 channels) = 13.6% the size of the originals.
However, there’s little reason to do
this if you are simply creating files to
store on, say, a phone or PC. In this
case, it would make more sense to use
a fixed quality level and let the bit rate
vary. This is known as ‘variable bit
rate’ encoding or VBR.
With VBR, some files will have a
higher compression ratio and some
lower, depending on the content. However, for a given quality setting (which
determines psychoacoustic masking
aggressiveness, quantisation factors,
etc), the variation is generally only of
the order of 25%.
MP3s ain’t identical
You might think that if you used
two different pieces of software to
produce similarly sized MP3 files from
the same CD or WAV file source, they
would sound essentially the same. But
this isn’t necessarily the case. During
MP3 encoding (or indeed, any lossy
encoding), the encoder has thousands
of decisions to make for each block of
audio processed in order to produce
the smallest output which loses the
least information.
For example, during the psychoacoustic modelling process, there are
many signals which could be removed
from the sound in different combinations and different encoders may
choose to remove different frequencies
to achieve the desired reduction in
signal complexity.
There is also a speed trade-off as
encoders which take longer may have
more time to ‘explore’ all the possible
combinations of masking, quantisation etc and determine the best
combination to achieve the required
compressed data size. There are also
many different metrics which the encoder can use to determine which is
the ‘best’ outcome. Therefore, a more
carefully designed MP3 encoder can
produce significantly better sounding
MP3 files at the same size (or even
smaller!) as a poorly written encoder.
So if you’re going to compress
hundreds of CDs to MP3 format, it
pays to do your research first and pick
encoding software which gives the best
quality output. This may even allow
you to use a lower ultimate bit rate for
the same sound quality, thus fitting
more data on to your storage medium.
While this is a subjective evaluation
(and readers are invited to do their own
research via Google), some encoders
are generally considered superior. One
of the better-regarded MP3 encoders is
the free, open source, multi-platform
“LAME”. Even this, though, has
many different settings which give
different results. Suggested quality
levels for a good size/sound quality
trade-off are the "-V0" (~245kbps), "-V1" (~225kbps), "-V2" (~190kbps) or "-V3" (~175kbps) options; see http://wiki.hydrogenaud.io/index.php?title=Lame
Advanced compression
Since MP3 was formalised in 1995,
a number of improved CODECs have
been developed. In some cases, the
aim was to produce an algorithm with a similar compression ratio to MP3 but with better audio quality. In other cases, the aim was to produce better compression without such objectionable artefacts as are present in low bit rate MP3 files (<128kbps).

One of the more successful codecs has been AAC and its variations, AAC+ and HE-AAC. These were developed as a successor to MP3 for the MPEG-4 standard and have also been adopted by Apple for use with iTunes. AAC is generally regarded as having better audio quality than MP3 at the same bit rate while AAC+ is optimised for lower bit rates and gives little or no benefit at settings of 128kbps and above.

In fact, AAC+ is generally inferior to both MP3 and AAC at higher bit rates (192kbps+). While many consider 128kbps AAC to give good sound quality, we feel that as with MP3, you really need 192kbps to even get close to CD quality. Note that DAB digital radio uses AAC encoding and DAB+ uses AAC+, so the same comments apply. Unfortunately, few DAB+ radio stations in Australia are encoded at rates above 64kbps! The result is that they sound inferior to a simultaneous FM broadcast of the same program (assuming good reception).

Ogg Vorbis

Another post-MP3 format is Ogg Vorbis. This was developed specifically because MP3 is a patented algorithm and Fraunhofer charge a fee for using the technology. In contrast, Ogg Vorbis is a free, open source alternative which can give superior audio quality to MP3 at some (usually higher) bit rates.

One source of subjective comparisons of lossy audio CODECs and encoders is http://soundexpert.org/encoders We have graphed the information from this website and smoothed it considerably, to give Fig.7. This suggests that AAC is the best choice above 224kbps, Vorbis the best between 112kbps and 224kbps, and AAC+ the best choice below 112kbps.

Note the large increase in perceived quality above 224kbps, suggesting that if you want to play lossily compressed files through a hifi system, the best compromise between quality and size is probably somewhere around 256-288kbps and thus AAC is the CODEC to use. This gives a compression ratio relative to CD-quality PCM audio data of around 5.5:1 – still very worthwhile.

Fig.7: a comparison of the subjective scores awarded to audio samples compressed with different audio CODECs (AAC, AAC+, MP3, Vorbis, MPC and WMA) at varying stereo bit rates from 32kbps to 320kbps. While this can only be considered a guide, it shows the perceived audio quality of most lossy CODECs goes up significantly above 256kbps and also that certain CODECs seem to sound better than others for particular ranges of bit rates.
To get an idea of what these bit
rates really mean, refer to Table 1.
This shows how much audio you can
fit, in terms of hours or average CDs
per GB. This applies both to storage
(ie, how much you can fit on an xGB
flash drive) and transmission (ie, how
much bandwidth you will use streaming digital radio at a specific bit rate).
Encapsulation
When you have a digital audio
stream, it needs to be “encapsulated”
somehow to be stored. For example,
PCM audio can be encapsulated in
the WAV format which in addition to
the PCM audio data itself, includes
information at the start of the file indicating the sampling rate, bit resolution,
number of channels and length.
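Python's standard library can demonstrate this encapsulation. The sketch below writes one second of silent 16-bit stereo PCM into an in-memory WAV file, then reads the format information back out of the header:

```python
import io
import wave

# Write one second of silent 16-bit stereo PCM into an in-memory WAV file
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)                  # stereo
    w.setsampwidth(2)                  # 16 bits = 2 bytes per sample
    w.setframerate(44100)
    w.writeframes(b"\x00\x00" * 2 * 44100)

# The header at the start of the file records the format, as described above
buf.seek(0)
with wave.open(buf, "rb") as r:
    params = (r.getframerate(), r.getsampwidth() * 8,
              r.getnchannels(), r.getnframes())
print(params)                          # (44100, 16, 2, 44100)
```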
CD audio (‘redbook’) also uses the
PCM format but it involves a more
complex encapsulation for two reasons: (1) it adds error checking and
correction (ECC) information so that
small scratches or divots on the surface
of the CD do not render it unplayable
(and hopefully won’t affect the sound
at all); and (2) it provides feedback
to the user as to which track they are
listening to, how far they are into the
track and allows seeking and skipping
to specific tracks.
As a result, each ‘sector’ of a CD,
containing 2352 bytes of PCM audio
data, is actually 3234 bytes in size. The
extra 882 bytes per sector includes two
392-byte ECC blocks and 98 bytes of
side-channel/control data. There are
75 sectors of data for each second of
audio. A redbook audio CD also has a
table of contents, listing the location of
up to 99 tracks along with their length,
the duration of any pauses between
tracks etc.
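The sector arithmetic can be verified in a few lines of Python:

```python
AUDIO_BYTES = 2352                     # PCM audio payload per sector
ECC_BYTES = 2 * 392                    # two error-correction blocks
CONTROL_BYTES = 98                     # side-channel/control data
SECTOR_BYTES = AUDIO_BYTES + ECC_BYTES + CONTROL_BYTES
SECTORS_PER_SECOND = 75

print(SECTOR_BYTES)                        # 3234 bytes on disc per sector
print(AUDIO_BYTES * SECTORS_PER_SECOND)    # 176400 bytes of PCM per second...
print(16 * 44100 * 2 // 8)                 # ...matching 16 bits x 44.1kHz x 2
```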
Encapsulation can also include the
ability to store track names, authors,
composers, genre etc. For example,
CD audio includes the ability to store
track names using the CD-Text extension although few discs contain such
information.
Other types of digital audio encapsulation for storage include:
• FLAC: this can be encapsulated in
its own simple container format (.flac),
or it can be stored within an “Ogg”
file, which is the same encapsulation
as used for the Vorbis CODEC which
also supports metadata (track name,
author, etc).
• MP3: can either be stored in an
“elementary stream” (.mp3 file), with
optional ID3 metadata tag at the beginning or end, or in an MPEG stream,
possibly along with video data.
• AAC: can be encapsulated in an
MPEG-2 or MPEG-4 stream. Also used
for DAB+, DVB-H or can be contained
in an “ISO base media file” (.aac file).
• Vorbis: generally either appears in
an Ogg file (with or without Theora
Video) or in a Matroska file, which is
intended to be a flexible multimedia
container format (akin to Microsoft's AVI).

Table 1: Storage Required For Typical Audio Bit Rates

Bit rate        Hours/CDs per GB    Hours/CDs per 32GB    Data per hour/CD
64kbps          37                  1165                  28MB
96kbps          25                  775                   41MB
128kbps         19                  580                   55MB
160kbps         15                  466                   69MB
192kbps         12                  390                   82MB
224kbps         10                  333                   96MB
256kbps         9                   291                   110MB
288kbps         8                   259                   124MB
320kbps         7                   233                   137MB
1.4Mbps (CD)    1.7                 53                    606MB
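The rows of Table 1 can be reproduced with simple arithmetic. The Python sketch below assumes the table uses binary megabytes and gigabytes, which lands close to (though not always exactly on) the published figures:

```python
def table_row(bit_rate_bps):
    """Hours of audio per (binary) GB and MB per hour for a given bit rate."""
    bytes_per_hour = bit_rate_bps / 8 * 3600
    mib_per_hour = bytes_per_hour / 2**20
    hours_per_gb = 2**30 / bytes_per_hour
    return round(hours_per_gb, 1), round(mib_per_hour)

print(table_row(128_000))       # (18.6, 55): ~19 hours/GB, 55MB/hour
print(table_row(1_411_200))     # (1.7, 606): CD-quality PCM
```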
Having been read from the source
file or media, the same data may then
be transmitted to a different piece of
equipment or a different IC within the
same device. This is generally done by
re-encapsulating the extracted digital
audio data in one of several transmission formats:
• S/PDIF: a two-wire format using biphase encoding, intended for transmitting audio data between media players, amplifiers, receivers and so on.
S/PDIF can carry linear PCM, Dolby
Digital, DTS and other formats, along
with metadata describing the contents of the data and its source. The
optical version of S/PDIF is known
as TOSLINK.
• I2S or one of its variants: a simple
method for transmitting PCM audio
data between ICs within a device,
similar to SPI. Typically involves a
bit clock line (typically 32 or 64 times
the sampling rate), word clock (at the
sampling rate), data bit transmit and/
or receive lines, plus a master clock
which is typically between 128 and
1024 times the sampling rate.
• MPEG transport stream: while
this is used as a file format (with an
extension such as .mpg or .mp4) it is
also intended to be used as a transmission format and is used for digital TV,
among other purposes. MPEG streams
can contain video, audio or both and
can also include subtitles and other
metadata.
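For a 44.1kHz stereo stream, the I2S clock relationships described above work out as follows (a sketch; the exact multiples vary from one device to another):

```python
fs = 44_100                            # sampling rate (Hz)

word_clock = fs                        # one cycle per stereo sample frame
bit_clock = 64 * fs                    # eg, 32 bits per channel x 2 channels
master_clock = 256 * fs                # a common multiple, between 128x and 1024x

print(bit_clock, master_clock)         # 2822400 11289600
```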
Multi-channel formats
Multi-channel formats compress
three or more channels of audio for
“surround sound”. They usually have a
relatively high bit rate (eg, 384kbps+) as
they are intended for use with movies
where significant degradation in sound
quality is not acceptable. However,
multi-channel formats are also sometimes used for music recordings, to
give a more ‘immersive’ or ‘live’ sound.
With some exceptions, these formats
generally have inferior sound quality
to CD-quality PCM. Of the two most
common 5.1 channel formats, DTS is
usually considered to have superior
quality to Dolby Digital (AC3) at the
same bit rate.
As with stereo CODECs, multi-channel formats take advantage of the
similarity in content between channels
to achieve good compression. They
also use the fact that some channels
only operate over a limited range of
frequencies, especially the subwoofer
or “low frequency effects” channel (the
“.1” in 5.1 or 7.1).
The sound quality of the left and
right channels is generally the most
critical as these carry most of the
music; centre is used mainly for voice
while surround channels mostly carry
effects so degradation on those channels is less objectionable. Thus, the bit
rate of a 5.1-channel audio stream is
usually no more than about twice that
of a stereo recording.
Dolby Digital 5.1 and DTS 5.1 were
the most common multi-channel formats in the early days of DVDs. More
recently, with the introduction of
HD-DVD (now obsolete) and Blu-ray,
both Dolby Labs and Digital Theatre
Systems have come up with higher
quality formats that support even more
channels, eg, 7.1 surround sound with
a total of eight channels.
More recent multi-channel formats
such as Dolby Digital Plus, Dolby TrueHD, DTS Neo, DTS 96/24 and DTS-HD
increase audio quality through higher
bit rates and in some cases, use lossless
compression. However, the general
principle remains the same.
DVDs use an MPEG-2 stream and
allow linear PCM, MP2, AC3 or DTS
compressed audio data to be interleaved with the video. Multiple audio
streams can be interleaved, to support different numbers of channels or
languages.
DVD-audio adds the ability to carry
Meridian Lossless Packing (MLP) audio
data at higher sampling rates and bit
depths such as 24-bit 96kHz or 24-bit
192kHz. DVD-audio players thus generally have higher-quality DACs plus
the ability to decode these streams. In
addition, DVD-audio discs can contain
Dolby Digital and DTS tracks.
Non-PCM audio data
While virtually every digital audio
format is either based around PCM
or derived from PCM, there are other
formats. Super Audio CD or SACD is
one of these and it is based on Pulse-Density Modulation Encoding (PDME)
which Sony and Philips refer to as
Direct-Stream Digital (DSD).
Rather than using a sampling rate of
44.1kHz, they use 2.8224MHz (ie, 64
times higher) but each sample is just
a single bit. Noise shaping is used to
allow the one-bit data stream to accurately encode an analog signal at a
much lower frequency.
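The principle can be illustrated with a first-order delta-sigma modulator in a few lines of Python. This is a toy sketch; real DSD converters use higher-order noise shaping at the full 2.8224MHz rate:

```python
def pdm_modulate(samples):
    """First-order delta-sigma modulator: one output bit per input sample.
    The accumulated error is fed back so the density of 1s tracks the input."""
    bits, integrator = [], 0.0
    for x in samples:                  # inputs in the range -1.0 to +1.0
        bit = 1 if integrator >= 0 else 0
        integrator += x - (1.0 if bit else -1.0)
        bits.append(bit)
    return bits

# A steady +0.5 input produces 1s about 75% of the time: the one-bit
# stream encodes the analog level as a pulse density
bits = pdm_modulate([0.5] * 1000)
print(sum(bits) / len(bits))           # ~0.75
```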
The reason for using PDME rather
than PCM is that most modern DACs
are the Delta-Sigma type, which typically comprise a 4-bit DAC operating at
a similar frequency, ie, some multiple
of the incoming PCM data sampling
rate. The advantage of this approach
is that it’s much cheaper to fabricate
a 4-bit DAC with good linearity than
a 16-bit DAC. In addition, the much
higher noise frequency means that the
output analog filter doesn’t need to be
anywhere near as steep and so it can
be much simpler.
The logic therefore is this: if the DAC
is going to have to convert the PCM to
some form of PDME internally, why
not simply store and transmit the data
in this format? It certainly is a valid
approach but one criticism levelled at
DSD is that it’s much more difficult to
process audio in this format than PCM
data, and converting between PDME
and PCM is not simple.
Perhaps it is for this reason that
DVD-audio uses traditional PCM encoding, although with higher sampling
rates and bit depths.
SC