Audio
The world's first broadcast automation audio compression system was developed
by Oscar Bonello, an engineering professor at the University of Buenos Aires. In
1983, he started researching the subject using the recently introduced IBM PC.
The problems that he faced were creating a good-quality audio PC card, inventing
a bit compression algorithm, and writing the automation software. The audio card
was designed with standard CMOS logic ICs and performed, in hardware, a custom ECAM
bit compression algorithm based on the psychoacoustic principle of the masking of
critical bands first published by Eberhard Zwicker and Richard Feldtkeller in 1967.
Today the same principle is used in most lossy audio bit compression systems. Bonello's
audio card was designed for the then-current ISA slots of the PC and worked using
direct access to PC memory. The driver for this software was developed by Gustavo
Pesci, a young software engineer, formerly a pupil of Bonello at the University of
Buenos Aires. The card hardware was designed by Ricardo Sidoti and Elio Demaria.
Solidyne 922: the world's first bit compression card for PC, 1990
Digital audio technologies
* Digital Audio Broadcasting (DAB)
* Digital audio workstation
* Digital audio player
Storage technologies:
* Digital Audio Tape (DAT)
* Compact disc (CD)
* DVD and DVD-Audio (DVD-A)
* MiniDisc
* Super Audio CD
* various audio file formats
Audio-specific interfaces include:
* AC97 (Audio Codec 1997) interface between integrated circuits on PC motherboards
* Intel High Definition Audio, a modern replacement for AC97
* ADAT interface
* AES/EBU interface with XLR connectors
* AES47, Professional AES3 digital audio over Asynchronous Transfer Mode networks
* I²S (Inter-IC sound) interface between integrated circuits in consumer electronics
* MADI, Multichannel Audio Digital Interface
* MIDI, low-bandwidth interconnect for carrying instrument data; cannot carry sound
* S/PDIF, either over coaxial cable or TOSLINK
* TDIF, Tascam proprietary format with D-sub cable
Naturally, any digital bus (e.g., USB, FireWire, and PCI) can carry digital audio.
Transform domain methods
In order to determine what information in an audio signal is perceptually irrelevant,
most lossy compression algorithms use transforms such as the modified discrete cosine
transform (MDCT) to convert time domain sampled waveforms into a transform domain.
Once transformed, typically into the frequency domain, component frequencies can
be allocated bits according to how audible they are. Audibility of spectral components
is determined by first calculating a masking threshold, below which it is estimated
that sounds will be beyond the limits of human perception. The masking threshold
is calculated using the absolute threshold of hearing and the principles of simultaneous
masking (the phenomenon wherein a signal is masked by another signal separated
by frequency) and, in some cases, temporal masking (where a signal is masked by
another signal separated by time). Equal-loudness contours may also be used to weight
the perceptual importance of different components. Models of the human ear-brain
combination incorporating such effects are often called psychoacoustic models.
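The first step of the threshold calculation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the threshold in quiet uses a widely quoted curve fit attributed to Terhardt, the rule of roughly 6.02 dB of signal-to-noise ratio per bit is a standard approximation, and the function names and the example spectrum are hypothetical. Simultaneous and temporal masking, which real psychoacoustic models add on top, are left out.

```python
import math

def absolute_threshold_db(freq_hz: float) -> float:
    """Approximate threshold in quiet (dB SPL), using a common curve fit."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

def allocate_bits(components, db_per_bit: float = 6.02):
    """Give each (freq_hz, level_db_spl) component enough bits to push
    quantization noise down to the threshold; inaudible ones get zero."""
    bits = []
    for freq_hz, level_db in components:
        headroom_db = level_db - absolute_threshold_db(freq_hz)
        bits.append(max(0, math.ceil(headroom_db / db_per_bit)))
    return bits

# Hypothetical spectrum: (frequency in Hz, level in dB SPL).
spectrum = [(100.0, 15.0), (1000.0, 60.0), (3300.0, 20.0), (16000.0, 30.0)]
allocation = allocate_bits(spectrum)
print(allocation)  # [0, 10, 5, 0]
```

The 100 Hz and 16 kHz components fall below the threshold in quiet and receive no bits, while the quieter 3.3 kHz component still receives some because the ear is most sensitive in that region.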
Time domain methods
Other types of lossy compressors, such as the linear predictive coding (LPC) used
with speech, are source-based coders. These coders use a model of the sound's generator
(such as the human vocal tract with LPC) to whiten the audio signal (i.e., flatten
its spectrum) prior to quantization. LPC may also be thought of as a basic perceptual
coding technique; reconstruction of an audio signal using a linear predictor shapes
the coder's quantization noise into the spectrum of the target signal, partially
masking it.
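The whitening step can be illustrated with a small linear predictor. The sketch below assumes the autocorrelation method with the Levinson-Durbin recursion, which is one common way to obtain LPC coefficients; the function names and the test signal (a pure sinusoid, an easy case for a second-order predictor) are illustrative choices, not taken from any particular coder.

```python
import math

def autocorrelation(x, max_lag):
    """r[lag] = sum of x[n] * x[n - lag] over the available samples."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] so that
    x[n] is approximated by sum(a[k] * x[n - k])."""
    a = [0.0] * (order + 1)      # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k       # remaining prediction error energy
    return a[1:]

def residual(x, coeffs):
    """Prediction error: the whitened signal that would be quantized."""
    p = len(coeffs)
    return [x[n] - sum(coeffs[k] * x[n - 1 - k] for k in range(p))
            for n in range(p, len(x))]

signal = [math.sin(0.3 * n) for n in range(200)]
coeffs = levinson_durbin(autocorrelation(signal, 2), 2)
error = residual(signal, coeffs)
signal_energy = sum(s * s for s in signal)
residual_energy = sum(e * e for e in error)
print(coeffs, residual_energy / signal_energy)
```

Because the predictor removes most of the predictable structure, the residual carries far less energy than the input and can be quantized with fewer bits.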
Applications
Due to the nature of lossy algorithms, audio quality suffers when a file is decompressed
and recompressed (generational losses). This makes lossy compression unsuitable
for storing the intermediate results in professional audio engineering applications,
such as sound editing and multitrack recording. However, lossy formats (particularly
MP3) are very popular with end users, as a megabyte can store about a minute's worth
of music at adequate quality.
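The "megabyte per minute" figure follows from simple arithmetic. The sketch below assumes a constant bit rate of 128 kbit/s for the compressed stream and CD-style PCM (44.1 kHz, 16 bits, stereo) for comparison; both are illustrative assumptions rather than values stated above, and the function names are hypothetical.

```python
def compressed_bytes(bitrate_kbps: float, seconds: float) -> float:
    """Size of a constant-bit-rate stream, in bytes."""
    return bitrate_kbps * 1000 / 8 * seconds

def pcm_bytes(sample_rate_hz: int, bits: int, channels: int, seconds: int) -> int:
    """Size of uncompressed PCM audio data, in bytes."""
    return sample_rate_hz * bits // 8 * channels * seconds

mp3_minute = compressed_bytes(128, 60)    # assumed 128 kbit/s
cd_minute = pcm_bytes(44100, 16, 2, 60)   # assumed CD-style PCM
ratio = cd_minute / mp3_minute

print(mp3_minute)  # 960000.0 bytes, a little under one megabyte
print(cd_minute)   # 10584000 bytes
print(ratio)       # 11.025
```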
Usability
Usability of lossy audio codecs is determined by:
* Perceived audio quality
* Compression factor
* Speed of compression and decompression
* Inherent latency of algorithm (critical for real-time streaming applications; see
below)
* Software and hardware support
Lossy formats are often used for the distribution of streaming audio, or interactive
applications (such as the coding of speech for digital transmission in cell phone
networks). In such applications, the data must be decompressed as the data flows,
rather than after the entire data stream has been transmitted. Not all audio codecs
can be used for streaming applications, and for such applications a codec designed
to stream data effectively will usually be chosen.
Latency results from the methods used to encode and decode the data. Some codecs
will analyze a longer segment of the data to optimize efficiency, and then code it
in a manner that requires a larger segment of data at one time in order to decode.
(Codecs often divide the data into segments called "frames" to create discrete units
for encoding and decoding.) The inherent latency of the coding algorithm can be critical;
for example, when there is two-way transmission of data, such as with a telephone
conversation, significant delays may seriously degrade the perceived quality.
In contrast to the speed of compression, which is proportional to the number of operations
required by the algorithm, here latency refers to the number of samples which must
be analyzed before a block of audio is processed. In the minimum case, latency is
zero samples (e.g., if the coder/decoder simply reduces the number of bits used to
quantize the signal). Time domain algorithms such as LPC also often have low latencies,
hence their popularity in speech coding for telephony. In algorithms such as MP3,
however, a large number of samples have to be analyzed in order to implement a psychoacoustic
model in the frequency domain, and latency is on the order of 23 ms (46 ms for two-way
communication).
Speech encoding
Speech encoding is an important category of audio data compression. The perceptual
models used to estimate what a human ear can hear are generally somewhat different
from those used for music. The range of frequencies needed to convey the sounds of
a human voice is normally far narrower than that needed for music, and the sound
is normally less complex. As a result, speech can be encoded at high quality using
relatively low bit rates. This is accomplished, in general, by some combination of
two approaches:
* Only encoding sounds that could be made by a single human voice.
* Throwing away more of the data in the signal, keeping just enough to reconstruct
an "intelligible" voice rather than the full frequency range of human hearing.
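The latency figures in the Usability discussion can be related to block size with a one-line formula: the number of samples that must be buffered, divided by the sample rate. In the sketch below, the block size of 1024 samples at 44.1 kHz is an assumption chosen only because it reproduces a delay of about 23 ms; it is not a statement about how any particular codec frames its data. The 160-sample speech block at 8 kHz is likewise a hypothetical example.

```python
def block_latency_ms(block_samples: int, sample_rate_hz: int) -> float:
    """Delay incurred by waiting for a full block of samples."""
    return 1000.0 * block_samples / sample_rate_hz

one_way = block_latency_ms(1024, 44100)     # assumed block size and rate
round_trip = 2 * one_way                    # two-way communication
sample_by_sample = block_latency_ms(0, 44100)
speech_block = block_latency_ms(160, 8000)  # hypothetical speech coder block

print(round(one_way, 1), round(round_trip, 1))  # 23.2 46.4
print(sample_by_sample, speech_block)           # 0.0 20.0
```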
Audio engineering is a part of audio science dealing with the recording and reproduction
of sound through mechanical and electronic means. The field draws on many disciplines,
including electrical engineering, acoustics, psychoacoustics, and music. Unlike
acoustical engineering, audio engineering generally does not deal with noise control
or acoustical design. However, an audio engineer is often closer to the creative
and technical aspects of audio than to formal engineering. An audio engineer
must be proficient with different types of recording media, such as analog tape and
digital multitrack recorders and workstations, and must have computer knowledge. With
the advent of the digital age, it is becoming increasingly important for the audio
engineer to understand software and hardware integration, from synchronization to
analog-to-digital transfers.
Audio file format
An audio file format is a container format for storing audio data on a computer system.
The general approach to storing digital audio is to sample the audio voltage of each
channel (which, on playback, corresponds to a certain position of the membrane in
a speaker) with a certain resolution (the number of bits per sample) at regular intervals
(forming the sample rate). This data can then be stored uncompressed or compressed
to reduce the file size.
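As a minimal sketch of the sampling approach just described, the following code evaluates a sine wave at regular intervals and rounds each value to a signed integer of the chosen resolution. The tone frequency, the 8 kHz sample rate and the function name are arbitrary choices for illustration.

```python
import math

def sample_pcm(freq_hz: float, sample_rate_hz: int, bits: int, n_samples: int):
    """Sample a full-scale sine wave and quantize it to signed integers."""
    full_scale = 2 ** (bits - 1) - 1   # 32767 for 16-bit samples
    return [round(full_scale
                  * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz))
            for n in range(n_samples)]

# One cycle of a 1 kHz tone sampled at 8 kHz with 16 bits per sample.
samples = sample_pcm(1000.0, 8000, 16, 8)
print(samples)
# [0, 23170, 32767, 23170, 0, -23170, -32767, -23170]
```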
Types of formats
It is important to distinguish between a file format and a codec.
A codec performs the encoding and decoding of the raw audio data while the data
itself is stored in a file with a specific audio file format. Though most audio
file formats support only one audio codec, a file format may support multiple codecs,
as AVI does. There are three major groups of audio file formats:
* uncompressed audio formats, such as WAV, AIFF and AU;
* formats with lossless compression, such as FLAC, Monkey's Audio (filename extension
APE), WavPack (filename extension WV), Shorten, Tom's lossless Audio Kompressor
(TAK), TTA, Apple Lossless and lossless Windows Media Audio (WMA);
* formats with lossy compression, such as MP3, Vorbis, Musepack, lossy Windows Media
Audio (WMA) and AAC.
Uncompressed audio format
There is one major uncompressed audio format, PCM, which is usually stored as a
.wav on Windows or as .aiff on Mac OS. WAV is a flexible file format designed to
store more or less any combination of sampling rates or bitrates. This makes it
an adequate file format for storing and archiving an original recording. A lossless
compressed format would require more processing for the same time recorded, but
would be more efficient in terms of space used. WAV, like any other uncompressed
format, encodes all sounds, whether they are complex sounds or absolute silence,
with the same number of bits per unit of time. As an example, a file containing
a minute of playing by a symphonic orchestra would be the same size as a minute
of absolute silence if they were both stored in WAV. If the files were encoded with
TTA, the first file would be smaller, and the second file would take up almost
no space at all. However, to encode the files to TTA would take significantly more
time than encoding the files to the WAV format. The WAV format is based on the RIFF
file format, which is similar to the IFF format.
BWF (Broadcast Wave Format) is a standard audio format created by the European Broadcasting
Union as a successor to WAV. BWF allows metadata to be stored in the file. (See European
Broadcasting Union, "Specification of the Broadcast Wave Format: A format for audio
data files in broadcasting", EBU Technical document 3285, July 1997.) BWF is the
primary recording format used in many professional audio workstations in the television
and film industries. Stand-alone, file-based, multi-track recorders from Sound Devices,
Zaxcom, HHB USA, Fostex, and Aaton all use BWF as their preferred file format for
recording multi-track audio files with SMPTE time code reference. This standardized
time stamp in the Broadcast Wave File allows for easy synchronization with a separate
picture element.
Lossless audio formats
Lossless audio formats (such as TTA and FLAC) provide a compression ratio of about 2:1.
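The point that an uncompressed file spends the same number of bytes on silence as on a complex signal can be checked with Python's standard `wave` module. In the sketch below, `zlib` is used purely as a stand-in for a lossless audio codec such as TTA or FLAC (it is a general-purpose compressor, not an audio codec), and seeded random noise stands in for a complex signal. Noise is a worst case that barely compresses at all, whereas real music typically lands nearer the 2:1 figure quoted above.

```python
import io
import random
import struct
import wave
import zlib

def wav_bytes(samples, sample_rate_hz: int = 8000) -> bytes:
    """Return a mono 16-bit PCM WAV file, built in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)                 # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate_hz)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()

rng = random.Random(0)                    # fixed seed for repeatability
silence = wav_bytes([0] * 8000)           # one second of silence
noise = wav_bytes([rng.randint(-32768, 32767) for _ in range(8000)])

sizes = (len(silence), len(noise))
packed = (len(zlib.compress(silence)), len(zlib.compress(noise)))
print(sizes)   # both files have the same size
print(packed)  # silence shrinks to almost nothing; noise hardly shrinks
```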
Free and open file formats
* wav – standard audio file container format used mainly in Windows PCs. Commonly
used for storing uncompressed (PCM), CD-quality sound files, which means that they
can be large in size — around 10 MB per minute. Wave files can also contain data
encoded with a variety of codecs to reduce the file size (for example the GSM or
mp3 codecs). Wav files use a RIFF structure.
* ogg – a free, open source container format supporting a variety of codecs, the
most popular of which is the audio codec Vorbis. Vorbis offers better compression
than MP3 but is less popular.
* mpc – Musepack or MPC (formerly known as MPEGplus, MPEG+ or MP+) is an open source
lossy audio codec, specifically optimized for transparent compression of stereo
audio at bitrates of 160–180 kbps. Musepack and Ogg Vorbis are rated as the two
best available codecs for high-quality lossy audio compression in many double-blind
listening tests. Nevertheless, Musepack is even less popular than Ogg Vorbis and
nowadays is used mainly by audiophiles.
* flac – a lossless compression codec. Lossless compression works like zip, but for
audio: if a PCM file is compressed to flac and then restored, the result is a perfect
copy of the original. (Lossy codecs, by contrast, discard a small part of the quality.)
The cost of this losslessness is a lower compression ratio than lossy codecs achieve.
Flac is recommended for archiving PCM files where quality is important (e.g. broadcast
or music use).
* aiff – the standard audio file format used by Apple. It is like a wav file for
the Mac.
* raw – a raw file can contain audio in any codec but is usually used with PCM audio
data. It is rarely used except for technical tests.
* au – the standard audio file format used by Sun, Unix and Java. The audio in au
files can be PCM or compressed with the μ-law, A-law or G.729 codecs.
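The RIFF structure mentioned in the wav entry above is a sequence of chunks, each introduced by a four-character identifier and a little-endian 32-bit size. The following sketch builds a small WAV file in memory with the standard `wave` module and then walks its chunks; the helper name `riff_chunks` is hypothetical, and the parser handles only the simple layout shown here.

```python
import io
import struct
import wave

def riff_chunks(data: bytes):
    """List (chunk id, size) pairs found inside a RIFF/WAVE file."""
    riff_id, _riff_size, form = struct.unpack_from("<4sI4s", data, 0)
    if riff_id != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks, pos = [], 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        chunks.append((chunk_id.decode("ascii"), size))
        pos += 8 + size + (size & 1)   # chunks are padded to even length
    return chunks

buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(b"\x00" * 400)       # 100 silent stereo 16-bit frames

chunks = riff_chunks(buf.getvalue())
print(chunks)  # [('fmt ', 16), ('data', 400)]
```

The `fmt ` chunk describes the encoding (codec tag, channels, sample rate, bits per sample), which is how a wav file can carry codecs other than PCM.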
Open file formats
* gsm – designed for telephony use in Europe, gsm is a very practical format for
telephone quality voice. It makes a good compromise between file size and quality.
Note that wav files can also be encoded with the gsm codec.
* dct – A variable codec format designed for dictation. It has dictation header
information and can be encrypted (often required by medical confidentiality laws).
* vox – the vox format most commonly uses the Dialogic ADPCM (Adaptive Differential
Pulse Code Modulation) codec. Similar to other ADPCM formats, it compresses to 4-bits.
Vox format files are similar to wave files except that the vox files contain no
information about the file itself so the codec sample rate and number of channels
must first be specified in order to play a vox file.
* aac – the Advanced Audio Coding format is based on the MPEG-2 and MPEG-4 standards.
aac files are usually ADTS or ADIF containers.
* mp4/m4a – MPEG-4 audio, most often AAC but sometimes MP2/MP3.
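The vox entry above notes that a headerless file cannot be played until its parameters are supplied from outside. A small calculation shows why: with 4 bits per sample, the same bytes correspond to different playing times depending on the sample rate the listener assumes. The file size and both sample rates below are hypothetical values chosen for illustration, as is the function name.

```python
def headerless_adpcm_seconds(file_bytes: int, sample_rate_hz: int,
                             channels: int = 1,
                             bits_per_sample: int = 4) -> float:
    """Playing time of a headerless file under assumed parameters."""
    samples = file_bytes * 8 // bits_per_sample
    return samples / (sample_rate_hz * channels)

# The same 48,000-byte file interpreted at two assumed sample rates.
as_8k = headerless_adpcm_seconds(48000, 8000)
as_6k = headerless_adpcm_seconds(48000, 6000)
print(as_8k, as_6k)  # 12.0 16.0
```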
Proprietary formats
* mp3 – the MPEG-1 Audio Layer 3 format is the most popular format for downloading and storing
music. By eliminating portions of the audio file that are essentially inaudible,
mp3 files are compressed to roughly one-tenth the size of an equivalent PCM file
while maintaining good audio quality.
* wma – the popular Windows Media Audio format owned by Microsoft. Designed with
Digital Rights Management (DRM) abilities for copy protection.
* atrac (.wav) – the older style Sony ATRAC format. It always has a .wav file extension.
Opening these files requires the ATRAC3 drivers.
* ra – a Real Audio format designed for streaming audio over the Internet. The .ra
format allows files to be stored in a self-contained fashion on a computer, with
all of the audio data contained inside the file itself.
* ram – a text file that contains a link to the Internet address where the Real
Audio file is stored. The .ram file contains no audio data itself.
* dss – Digital Speech Standard files are an Olympus proprietary format. It is a
fairly old and poor codec, so gsm or mp3 is preferable where the recorder allows.
The format allows additional data to be held in the file header.
* msv – a Sony proprietary format for Memory Stick compressed voice files.
* dvf – a Sony proprietary format for compressed voice files; commonly used by Sony
dictation recorders.
* mp4 – A proprietary version of AAC in MP4 with Digital Rights Management developed
by Apple for use in music downloaded from their iTunes Music Store.