
ATPM 6.02
February 2000



How To




Digital Audio and the Mac

by David Ozab


Before we can discuss the specifics of digital sampling and synthesis, we need to ask some fundamental questions. What is sound? What causes us to perceive pitch, loudness, and timbre? How do we record and reproduce music? What are the differences between analog and digital recordings?


The phenomenon we call sound has three parts, all of which are necessary for its existence. First, a medium must be present that can transmit sound. This medium is usually air, but it could also be water or a solid object, as sound passes through all three. Second, something must disturb this medium in order to create waves. These waves are generally alternations in air pressure, but they behave just like waves in water. Third, a receiver must be present to perceive the disturbance. The transmitter and receiver might be the same person or two different individuals, but either way, sound is perceived by anyone present.


If any of these parts is absent, like an exploding TIE fighter in deep space (no air) or a tree falling in the forest (no one to hear it), there is no sound.

A Complex Phenomenon Simplified

All the sounds we hear are a complex mixture of waves of all different sizes travelling in all different directions. Yet we only hear three things: frequency, amplitude, and phase. Frequency depends on the length of each wave: since the speed of sound is constant within a given medium, the shorter the wave, the higher the frequency. Amplitude is the literal strength of the wave, measured in air pressure. The relationship can be shown easily by drawing a graph of amplitude over time.
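The frequency-wavelength relationship can be sketched numerically. In the snippet below, the 343 m/s figure for the speed of sound in air and the sample wavelengths are illustrative assumptions, not values from the article:

```python
# Frequency from wavelength: f = v / wavelength, where v is the speed of
# sound in the medium. 343 m/s (air at roughly room temperature) is an
# assumed round figure for illustration.

SPEED_OF_SOUND_AIR = 343.0  # m/s, approximate

def frequency_from_wavelength(wavelength_m: float, speed: float = SPEED_OF_SOUND_AIR) -> float:
    """Shorter wave -> higher frequency, since speed is fixed in a medium."""
    return speed / wavelength_m

# A 0.78 m wave in air is roughly 440 Hz; halving the wavelength doubles
# the frequency.
print(round(frequency_from_wavelength(0.78), 1))  # ≈ 439.7 Hz
print(round(frequency_from_wavelength(0.39), 1))  # ≈ 879.5 Hz
```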


High and low pressure alternate at regular intervals, resulting in a simple periodic waveform (in this case, a sine wave). The length of the wave in time, 1/440 sec., also tells us the frequency indirectly: since there are 440 cycles of pressure every second, we know the frequency is 440 Hz. The marked amplitudes (1 and -1) simply represent an arbitrary maximum and minimum.
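A minimal sketch of that wave, computed at discrete points in time. The 44,100 points-per-second grid is an assumption chosen only for illustration:

```python
import math

# A sketch of the sine wave described above: amplitude alternates between
# +1 and -1, with one full cycle every 1/440 of a second, i.e. 440 cycles
# per second.
FREQ_HZ = 440.0
SAMPLE_RATE = 44100  # points per second, an arbitrary grid for the sketch

def sine_sample(n: int) -> float:
    """Amplitude of a 440 Hz sine wave at point index n."""
    t = n / SAMPLE_RATE
    return math.sin(2 * math.pi * FREQ_HZ * t)

period_sec = 1 / FREQ_HZ
print(period_sec)  # ≈ 0.00227 s: the length of one cycle in time
```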


If we add a second wave at the same amplitude and frequency and begin both waves at exactly the same time, they will reinforce each other, and the result will be a doubling of amplitude. These waves are said to be ‘in phase.’ We can delay the onset of one wave, though, and shift it out of phase. In the extreme case, with the two waves 180° out of phase, they cancel each other out altogether.
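The reinforcement and cancellation can be demonstrated in a few lines of code; the time scale and number of points are arbitrary choices for the sketch:

```python
import math

# Two equal sine waves summed point by point: in phase they double,
# 180 degrees (pi radians) out of phase they cancel.
def mix(phase_offset_rad: float, n_points: int = 100) -> list[float]:
    """Sum a sine wave with a phase-shifted copy of itself."""
    out = []
    for n in range(n_points):
        t = n / 100.0  # arbitrary time scale
        a = math.sin(2 * math.pi * t)
        b = math.sin(2 * math.pi * t + phase_offset_rad)
        out.append(a + b)
    return out

in_phase = mix(0.0)          # peaks reach ±2: reinforcement
out_of_phase = mix(math.pi)  # every point is ~0: total cancellation
```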


These examples are the clearest ones, but they never occur in nature. Waves are constantly cancelling and reinforcing parts of each other, but in much more complex ways. Not only are there many more than two waves present on most occasions, but the waves themselves are complex and contain many frequencies.


All natural sounds (and most artificial sounds, with one notable exception) are made up of multiple frequencies, each of which has a unique amplitude. These frequencies are called partials (partial frequencies). In the special case of musical notes, though, the partials have a particular “harmonic” relationship that allows us to perceive a unified pitch.


Some of you may recognize this construction as the harmonic series. The frequencies are all related to the fundamental, or first harmonic, by whole-number relationships. In this case, all the other waveforms will line up with the fundamental wave at each of its cycles, resulting in a clear pitch. Any time a harmonic number is doubled, the interval is an octave. The other intervals are pure, and are approximated by the tempered scale (a discussion of tuning systems is way beyond the scope of this article—maybe later). The same structure exists in a piano note, a cello note, a bassoon note, etc. The amplitudes and phase relationships differ in every case, though, and these differences help us to distinguish between timbres.
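As a sketch, the whole-number relationships of the harmonic series are easy to compute. The 110 Hz fundamental below is an assumed value, not tied to any instrument in the article:

```python
# Each harmonic is a whole-number multiple of the fundamental, so the
# entire series follows from one frequency.
FUNDAMENTAL_HZ = 110.0

def harmonic_series(n_harmonics: int) -> list[float]:
    """Frequencies of the first n harmonics (harmonic 1 = the fundamental)."""
    return [FUNDAMENTAL_HZ * k for k in range(1, n_harmonics + 1)]

freqs = harmonic_series(8)
print(freqs)  # [110.0, 220.0, 330.0, ..., 880.0]
# Doubling a harmonic number always lands an octave higher:
print(freqs[1] / freqs[0])  # 2.0
```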

Aside: Why Harmonics Instead of Overtones

The construct above is also referred to as the overtone series. I avoid the word overtone for the following reason: the overtone that determines the pitch is still called the fundamental, but it isn’t the first overtone. The frequency an octave higher is the first overtone, and the other overtones are numbered upward from there. It’s like stepping into an elevator in London (ok, a lift), pressing the button for the first floor and getting out one floor too high. Like a British lift, the overtone series has a ground floor that complicates matters. If I asked you “What’s the relationship between the 24th overtone and the 49th overtone?” you would just scratch your head, but if I asked you “What’s the relationship between the 25th harmonic and the 50th harmonic?” you would know immediately. They’re an octave apart.
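The off-by-one bookkeeping is trivial to express in code, which also shows why harmonic numbers make interval arithmetic immediate:

```python
# The numbering mismatch in code: overtone n is harmonic n + 1. Two
# overtone numbers hide the simple 2:1 (octave) ratio that the harmonic
# numbers show directly.
def overtone_to_harmonic(overtone: int) -> int:
    return overtone + 1

# 24th overtone = 25th harmonic; 49th overtone = 50th harmonic.
# Their harmonic numbers are in a 2:1 ratio, so they are an octave apart.
print(overtone_to_harmonic(49) / overtone_to_harmonic(24))  # 2.0
```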

Analog Recording

Before there were CDs and MiniDiscs and MP3s, there were LPs and cassette tapes. Even if you’ve long since parted with your record player, and haven’t used your cassette deck since you picked up that cool MP3 player, understanding how analog works is essential. It is still the intermediate step between you and your digital recording, and if you take it for granted you’ll regret it later. The term analog is short for analogous representation. The alternations in air pressure that we perceive as sound are transformed into analogous electrical voltages, sent up and down wires, through amplifiers and other components, and stored on magnetic tape. Let’s go back to our first diagram and add the analog stage.



In the diagram above, the microphone and the loudspeaker are the doors in and out of the analog world. Both are transducers: devices that change one kind of energy to another. The only difference between the two is the direction of the process.


Have you ever used a set of headphones as a really cheap microphone? Now you know why it works. Each contains three parts, a membrane (diaphragm or speaker cone), a coil, and a magnet. The only difference is whether the membrane moves the coil or the coil moves the membrane. Well, that and the size. Loudspeakers are much larger than microphones because they have to handle much stronger signals. Next time you lift a speaker, remember that you’re only lifting a box, some paper, and a really big magnet.

Digital Recording

A digital recording is simply a series of numbers representing an analog waveform. The trick is in making the measurements. A continuous waveform can be measured at an infinite number of places, yet we can only make and store a finite number of measurements, so the result will always be approximate. The solution is to make many measurements at regular intervals.

What “Sampling” Really Means

The first step in making a digital representation is an analog component called a sample/hold generator. This device reads the continuous signal, and outputs fixed voltages at regular intervals. A strobe light is a good visual metaphor. The next step is an analog to digital converter (ADC), which measures each voltage output by the sample/hold generator. The measurements are output as binary data, which can then be stored on any appropriate medium. CDs, DAT tapes, and hard drives can store any kind of digital information including digital audio.
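Here is a rough sketch of that chain. The 1 kHz sample rate, 8-bit depth, and the 50 Hz test tone standing in for the analog input are all arbitrary assumptions chosen to keep the numbers small:

```python
import math

# Sampling chain sketch: a sample/hold stage reads the continuous signal
# at regular intervals, and an ADC turns each held voltage into an integer.
SAMPLE_RATE = 1000  # samples per second
BIT_DEPTH = 8       # integer range: -128 .. 127

def analog_signal(t: float) -> float:
    """Stand-in for the continuous input voltage, scaled to -1..1."""
    return math.sin(2 * math.pi * 50 * t)  # a 50 Hz test tone

def sample_and_convert(duration_sec: float) -> list[int]:
    max_code = 2 ** (BIT_DEPTH - 1) - 1  # 127 for 8 bits
    samples = []
    for n in range(int(duration_sec * SAMPLE_RATE)):
        held = analog_signal(n / SAMPLE_RATE)   # sample/hold: one fixed reading per interval
        samples.append(round(held * max_code))  # ADC: the measurement becomes a binary number
    return samples

data = sample_and_convert(0.02)  # one 50 Hz cycle = 20 samples
print(data)  # a series of integers: the digital recording
```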


The Advantages of Digital

Once a signal enters the digital domain, it is simply a series of numbers. These numbers can be transferred between media repeatedly without degradation. Tape, on the other hand, accumulates noise with each generation. The other advantage is flexibility. Digital audio can be easily edited on a hard drive. It can also be modified in any way that can be mathematically modelled. A sound source can be placed in a bathroom, a gymnasium, or the Taj Mahal just by pressing a few buttons.
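One concrete example of a mathematical modification is a single echo, made by adding a delayed, quieter copy of the signal to itself. The delay length and gain below are arbitrary illustrative values:

```python
# A single echo as a mathematical model: output = input + delayed,
# attenuated copy of the input.
def add_echo(samples: list[float], delay: int, gain: float) -> list[float]:
    out = list(samples) + [0.0] * delay  # room for the echo tail
    for i, s in enumerate(samples):
        out[i + delay] += s * gain       # add the quieter, delayed copy
    return out

dry = [1.0, 0.0, 0.0, 0.0]               # a single impulse
wet = add_echo(dry, delay=2, gain=0.5)
print(wet)  # [1.0, 0.0, 0.5, 0.0, 0.0, 0.0]
```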

Digital Reproduction

No matter how you manipulate them, though, they’re still numbers. At some point the sound must be reconstructed out of all those numbers. A digital to analog converter (DAC) produces voltages for each number. As long as the DAC and the ADC run at the same rate (like a movie camera and a projector must run at the same rate), the original wave is reproduced. Well, not quite. Just as a sample/hold generator is needed to break up a continuous signal, a low pass filter (more on filters next month) smooths out the digital stairsteps.
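A toy sketch of the reconstruction stage: one voltage per stored number, then a crude moving average standing in for the low-pass filter. A real reconstruction filter is far more sophisticated; this only illustrates the idea of smoothing the stairsteps:

```python
# DAC sketch: each stored integer becomes a voltage, producing a stairstep
# waveform; a simple moving average then smooths the steps.
def dac(codes: list[int], max_code: int = 127) -> list[float]:
    """One voltage per number, scaled back to the -1..1 range."""
    return [c / max_code for c in codes]

def smooth(voltages: list[float], width: int = 3) -> list[float]:
    """Crude low-pass filter: average each sample with its recent neighbours."""
    out = []
    for i in range(len(voltages)):
        window = voltages[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out

stairsteps = dac([0, 64, 127, 64, 0, -64, -127, -64])
reconstructed = smooth(stairsteps)  # the steps are evened out
```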


Put all the pieces together, and it looks like this.


Next month: Digital Audio and the Mac—Part Two: The Specifics of Sampling.

Copyright © 2000 David Ozab. David Ozab is a Ph.D. student at the University of Oregon, where he teaches electronic music courses and assists in the day-to-day operation of Future Music Oregon.

Reader Comments (7)

Debbie Brooks · October 7, 2001 - 11:04 EST #1
Do you know the relationship between harmonic content and waveform of sound waves?
David Ozab (ATPM Staff) · October 7, 2001 - 15:17 EST #2
Good question. I had planned to cover this subject in a "Synthesis How To," but never got around to it. It's still in the works, though. Time permitting, I'll get it started in November.
Roy Griffin · January 17, 2002 - 18:18 EST #3
I'm using inexpensive audio editing shareware that will calculate FFTs. It will generate a graph that shows the relative amplitude of component frequencies and specifically indicates the frequency with the greatest amplitude. I understand that it is generally the case that the frequency with the greatest amplitude is the fundamental. My shareware will, supposedly, often indicate the fundamental in this manner. However, it is clear to me that it usually doesn't indicate the fundamental (I'm speaking particularly of segments of vocal recordings). The shareware author's claims in this regard are quite modest, but he was unable to tell me the systemic reason the FFT calculator in this shareware often won't identify the fundamental correctly. I'm wondering why so that I can systematically correct the readings, if possible. A related concern is--how does one arrive at good values to input into the virtual digital notch and bandpass filters in order to eliminate unwanted frequencies and emphasize desired ones? Where can I find this out? From what I can tell from my own researches, the process may be different from that used in dealing with hardware filters. Thanks, Roy G.
Dave Maye · December 27, 2002 - 19:56 EST #4
Why is it that some people say a sampling rate of 196 kbps is needed to truly reproduce an analog sound digitally? Do you agree?
George Halsey · April 1, 2003 - 09:04 EST #5
I put the wrong fuse in my 16-bit ADAT XT and burnt the whole power board up. Does anyone know where I can get another one? Alesis doesn't seem to know much about it. Please e-mail me. Thanks.
Jeff Grossman · January 21, 2005 - 10:04 EST #6
Roy - the local spectral content of a signal changes with time, but the concept of a "local spectrum" is really subjective. It relies on a time window, which can have arbitrary length and shape. The longer the window, the better we can resolve the frequency content of the signal; the shorter the window, the poorer the frequency resolution becomes. This time-frequency resolution tradeoff is a special case of the well-known Heisenberg uncertainty principle.

So your shareware may not be windowing your audio signals in a "natural" way. That is, not the way your ears and brain perceive that signal. The window spacing, shape, and length all contribute here. Incidentally, the Gabor transform (or short time Fourier transform) is the appropriate mathematical analysis tool to use here.

My best guess is that the windows are spaced too far apart for the given window size/shape.
umwonya eunice · October 21, 2007 - 16:02 EST #7
i have learnt a great deal. am actually doing a course work on radio and television production. thanks
