WHAT’S THE FAST FOURIER TRANSFORM GOT TO DO WITH IT?

BRIAN CRUZ

The Fourier Transform, which breaks a signal down into its component frequencies, has wide application in many fields. In particular, the Fast Fourier Transform (FFT) has made it computationally feasible and practical to apply the discrete version of the Fourier Transform (the Discrete Fourier Transform, or DFT) and its inverse to any signal, most ubiquitously the digital music signals that many of us in the modern world enjoy from our iPods. This paper will begin to dissect this most essential tool of the digital music era, one that, for most people, lives on quietly in the background. This article will accomplish several things. First, it will explain why Fourier theory, perhaps more so than any other theory, is so important for working with music. Second, it will discuss the mathematics of the DFT that underlies the FFT. Last, we will unshroud the FFT in all its glory.

So Why the Fourier Transform?

First we must ask, "what is music?" The Merriam-Webster dictionary defines it as the "art of ordering tones or sounds in succession, in combination, and in temporal relationships to produce a composition having unity and continuity." This helps some, but what exactly are "tones" and "sounds"? To figure out what exactly we are dealing with, we must delve a little deeper. Imagine this. A long, long time ago, when the first song was sung by a human being, the vibrating motion of this neolithic popstar’s vocal cords hit the cold cave air, inducing a mechanical chain reaction of tiny vibrations that propagated throughout the damp cavern. These vibrations (really, tiny variations in air pressure) travelled as waves, bouncing off the stone walls and into the ears of the first musical audience. The vibrations were transmitted deep into the inner ear, where they met the spiral-shaped cochlea. Then something really amazing happened. Different parts along the length of the cochlea resonated with different frequencies, shaking many tiny hair cells and sending the whole spectrum of frequencies as electrical impulses to the brain[1]. Julius O. Smith III, who has written prolifically on the topic of digital signal processing, explained this process bluntly: "The ear is a kind of Fourier analyzer"[3], and the ears of our cave audience were no different. What this means is that in order to study, manipulate, and even synthesize the "tones" and "sounds" that we have come to appreciate as music, the best place to go is the way our own brains interpret them: as a "spectrum that evolves through time"[3]. It is the mathematics of the Fourier Transform that allows us to do just that.


Mathematics of the Discrete Fourier Transform

An excellent way to describe the Fourier Transform (FT) and its discrete variety, the Discrete Fourier Transform, is in the language of linear algebra. If our signal follows the variations in air pressure over time, then we can model it using a function $x : \mathbb{R} \to \mathbb{R}$, where the domain is time and the codomain is pressure (zero can represent the ambient air pressure). The FT of $x(t)$ is a complex-valued function given by

$$X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-i\omega t}\, dt$$

where $\omega \in (-\infty, \infty)$ is the frequency in radians per unit of time[2]. How this works is most easily digestible in the language of inner product spaces. If $V$ is some subspace of the functions that map $\mathbb{R}$ to $\mathbb{C}$, and if we define the inner product on $V$ as

$$\langle f, g \rangle = \int_{-\infty}^{\infty} f(t)\, \overline{g(t)}\, dt,$$

then the continuum of functions $b_\omega : t \mapsto e^{i\omega t}$ are orthogonal, and so $B = \{b_\omega \mid \omega \in \mathbb{R}\}$ is a suitable basis of $V$. The FT now becomes $X(\omega) = \langle x, b_\omega \rangle$ and its inverse,

$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega)\, e^{i\omega t}\, d\omega$$

which can be seen as a "sum" of the projections of $x(t)$ onto the $b_\omega$[3]. An important question is which functions exactly lie within the subspace $V$, and which real-valued functions such as $x$ above are among them. A key problem is that of convergence, for we must expect all convergent infinite linear combinations of the $b_\omega$ to be candidates for our vector space (the real-valued ones forming a subspace). This actually does get us a long way, but fortunately we don’t need to worry about it, because the conundrum is banished by two other key observations. First, in the digital age, continuous signals are interpreted by computers in the digital, not analog, domain as discrete-time signals. Second, music is interpreted by our brains as a series of tones and sounds in time. That means we not only care that a certain frequency exists within our signal and want to know its amplitude, but we also need to know when it happens. Fortunately for us, the discrete version of the FT handles this nicely. Instead of taking a function with a continuous domain and mapping it to another with a continuous domain, the DFT deals with discrete domains. Also, rather than analyzing an entire, potentially infinite-length signal, it analyzes only a portion of the signal. This can be repeated many times along the length of the signal to get a sense of the spectral envelope over time[3]. Consider a signal $x$ sampled at $N$ distinct and equidistant times $t_j$ in seconds, and let $\mathbf{x} \in \mathbb{C}^N$ be the vector composed of the $N$ samples $x(t_j)$, so that $x[j] = x(t_j)$. This means that if we can find $N$ orthogonal vectors, they should form a basis, and constructing our discrete transform will follow from using the inner product as we did in the continuous case. Of course, the whole point is to use sampled sinusoids to mimic the workings of the ear. The trick is to generate them using the $N$th roots of unity. Let $w_N = e^{2\pi i/N}$ be the primitive $N$th root of unity. Then let

$$\begin{aligned}
d_0 &= \left(w_N^0,\; w_N^0,\; w_N^0,\; \ldots,\; w_N^0\right) \\
d_1 &= \left(w_N^0,\; w_N^1,\; w_N^2,\; \ldots,\; w_N^{N-1}\right) \\
d_2 &= \left(w_N^0,\; w_N^2,\; w_N^4,\; \ldots,\; w_N^{2(N-1)}\right) \\
&\;\;\vdots \\
d_{N-1} &= \left(w_N^0,\; w_N^{1(N-1)},\; w_N^{2(N-1)},\; \ldots,\; w_N^{(N-1)^2}\right)
\end{aligned}$$

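These basis vectors are easy to experiment with numerically. Below is a minimal Python sketch (the variable names are our own, not from the text) that builds the $d_j$ from powers of $w_N$ and checks that $d_0$ is constant and that $d_j$ steps around the unit circle $j$ times as fast as $d_1$:

```python
import cmath

# Build the vectors d_j from powers of the primitive Nth root of unity.
N = 8
wN = cmath.exp(2j * cmath.pi / N)  # w_N = e^(2*pi*i/N)

d = [[wN ** (j * m) for m in range(N)] for j in range(N)]

# d_0 is the constant vector of all ones.
assert all(abs(entry - 1) < 1e-9 for entry in d[0])

# d_1 "wraps around": w_N^N equals w_N^0 = 1, completing one full cycle.
assert abs(wN ** N - 1) < 1e-9

# d_j visits the same points as d_1, but j steps at a time (j cycles total).
for j in range(N):
    for m in range(N):
        assert abs(d[j][m] - d[1][(j * m) % N]) < 1e-9
```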
What is happening here? Well, every entry of $d_0$ is one, and so it is similar to $b_0$ from before. If we imagine that $d_1$, after ending with $w_N^{N-1}$, wraps back around to $w_N^N = w_N^0$, then we see it completes one cycle over the $N$ samples. Likewise, $d_2$ completes two cycles over the $N$ samples. In general, $d_j$ completes $j$ cycles over the $N$ samples. If $s$ is the duration of one sample in seconds, it isn’t too hard to see the relationship with the continuous case: $d_j$ is the sampled sinusoid $b_{2\pi j/(Ns)}$. That is, the frequencies present in the DFT are all multiples of $\frac{1}{Ns}$ cycles per second, or Hertz. So how can we show that the vectors $d_j$ are in fact orthogonal? Consider the inner product $\langle d_j, d_k \rangle$:

$$\langle d_j, d_k \rangle = \sum_{m=0}^{N-1} w_N^{jm}\, \overline{w_N^{km}} = \sum_{m=0}^{N-1} w_N^{m(j-k)}$$

If $j = k$, then all the summands are one, and so $\langle d_j, d_k \rangle = N$. On the other hand, if $j \neq k$, then this is a geometric series which can be rewritten as

$$\frac{1 - w_N^{N(j-k)}}{1 - w_N^{j-k}}$$

in which case we get zero because the numerator vanishes[2]. Therefore our vectors $d_j$ are orthogonal, and so they form a basis. At last, we have our Discrete Fourier Transform:

$$X[k] = \langle x, d_k \rangle = \sum_{j=0}^{N-1} x[j]\, \overline{d_k[j]} = \sum_{j=0}^{N-1} x[j]\, w_N^{-jk}$$

where $k = 0, 1, \ldots, N-1$ and the $k$th entry of $X$ is the Fourier coefficient for the frequency $\frac{k}{Ns}$ Hertz[2]. Taking the inverse is just as easy if we use the language of projections[2]:

$$x[j] = \sum_{k=0}^{N-1} \frac{\langle x, d_k \rangle}{\langle d_k, d_k \rangle}\, d_k[j] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, d_k[j]$$
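This transform pair can be transcribed directly into code. Here is a small Python sketch using only the standard cmath module (the function names dft and idft are our own); it confirms that projecting onto the $d_k$ and summing the projections back recovers the original samples:

```python
import cmath

# X[k] = sum_j x[j] * w_N^(-j*k): a direct transcription of the DFT sum.
def dft(x):
    N = len(x)
    wN = cmath.exp(2j * cmath.pi / N)
    return [sum(x[j] * wN ** (-j * k) for j in range(N)) for k in range(N)]

# x[j] = (1/N) * sum_k X[k] * w_N^(j*k): the inverse via projections.
def idft(X):
    N = len(X)
    wN = cmath.exp(2j * cmath.pi / N)
    return [sum(X[k] * wN ** (j * k) for k in range(N)) / N for j in range(N)]

# Round trip: the inverse DFT of the DFT recovers the signal.
x = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0, -2.0, -1.0]
x_back = idft(dft(x))
assert all(abs(a - b) < 1e-9 for a, b in zip(x, x_back))
```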

There is another interesting thing taking place here. Looking closer at $d_{N-1}$, if we remember that $w_N^N = 1$, we see that

$$d_{N-1} = \left(w_N^0,\; w_N^{1(N-1)},\; w_N^{2(N-1)},\; \ldots,\; w_N^{(N-1)^2}\right) = \left(w_N^0,\; w_N^{-1},\; w_N^{-2},\; \ldots,\; w_N^{-(N-1)}\right)$$

which is exactly like $d_1$ except with the multiplicative inverses. This implies that $d_{N-1}$ is the sampled version of $b_{-2\pi/(Ns)}$. But what does it mean to have a sinusoidal component with negative frequency? With complex sinusoids it simply means the spiral is going in the other direction, but to our ears a middle A (440 Hz) is a middle A no matter which way the air is vibrating; it only matters how quickly. Furthermore, since our input signal is real, the amplitudes of the negative-frequency sinusoids must match those of their positive-frequency counterparts as complex conjugates. This means that $X[k] = \overline{X[N-k]}$ when $x$ is real, and about half of these entries are redundant. This effectively reduces the frequencies we need to think about to $0, \frac{1}{Ns}, \ldots, \frac{\lfloor N/2 \rfloor}{Ns}$, which means that the highest frequency we can capture through sampling is $\frac{1}{2s}$, or half the sampling rate of the signal in samples per second. The frequency which is half the sampling rate is referred to as the Nyquist frequency or Nyquist limit after Harry Nyquist, who studied this relationship[2]. Since we are studying linear algebra, of course it would behoove us to write the DFT in matrix form, so here it is[4]:

$$\begin{pmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & w_N^{-1} & w_N^{-2} & \cdots & w_N^{-(N-1)} \\
1 & w_N^{-2} & w_N^{-4} & \cdots & w_N^{-2(N-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & w_N^{-(N-1)} & w_N^{-2(N-1)} & \cdots & w_N^{-(N-1)^2}
\end{pmatrix}
\begin{pmatrix} x[0] \\ x[1] \\ x[2] \\ \vdots \\ x[N-1] \end{pmatrix}
=
\begin{pmatrix} X[0] \\ X[1] \\ X[2] \\ \vdots \\ X[N-1] \end{pmatrix}$$

(The exponents are negative so that the $(k, j)$ entry is $w_N^{-jk}$, matching our definition $X[k] = \sum_j x[j]\, w_N^{-jk}$.)
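The conjugate symmetry noted above, $X[k] = \overline{X[N-k]}$ for real signals, is easy to confirm numerically. A small Python sketch (the helper names are our own):

```python
import cmath

# Direct DFT sum, X[k] = sum_j x[j] * w_N^(-j*k).
def dft(x):
    N = len(x)
    wN = cmath.exp(2j * cmath.pi / N)
    return [sum(x[j] * wN ** (-j * k) for j in range(N)) for k in range(N)]

x = [0.3, 1.7, -0.5, 2.0, 0.0, -1.1, 0.9, 0.4]  # arbitrary real samples
X = dft(x)
N = len(x)

# X[0] is (numerically) real, and X[k] is the conjugate of X[N-k].
assert abs(X[0].imag) < 1e-9
for k in range(1, N):
    assert abs(X[k] - X[N - k].conjugate()) < 1e-9
```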

Remember, our goal here was to find a tool we could use to analyze the spectrum of our sound over time, and we found one. The basic method is to divide our longer, sampled signal into smaller chunks, so that the chunks are small enough to provide the temporal information we’re looking for. But in order to analyze the frequencies, we need enough spectral breadth and resolution, which is provided by choosing large enough chunks (large $N$). For large $N$, however, this becomes cumbersome, because the matrix gives us $N^2$ multiplications to carry out. That’s where an algorithm called the Fast Fourier Transform comes in.
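The chunking idea can be sketched in a few lines of Python. The frame size and helper names here are our own illustrative choices, not prescribed by the text: split the sampled signal into frames of $N$ samples and transform each frame, yielding a spectrum that evolves in time.

```python
import math
import cmath

# Direct DFT sum, X[k] = sum_j x[j] * w_N^(-j*k).
def dft(x):
    N = len(x)
    wN = cmath.exp(2j * cmath.pi / N)
    return [sum(x[j] * wN ** (-j * k) for j in range(N)) for k in range(N)]

# Cut the signal into consecutive N-sample frames and transform each one.
def chunked_spectra(signal, N):
    frames = [signal[i:i + N] for i in range(0, len(signal) - N + 1, N)]
    return [dft(frame) for frame in frames]

# 64 samples of a 440 Hz cosine at an (assumed) 8000 Hz sampling rate.
signal = [math.cos(2 * math.pi * 440 * t / 8000) for t in range(64)]
spectra = chunked_spectra(signal, 16)
assert len(spectra) == 4 and len(spectra[0]) == 16
```

In practice the frames usually overlap and are windowed, but even this plain version gives one spectrum per chunk of time.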

The Fast Fourier Transform

There’s a wonderful pattern that emerges from the Fourier matrix above when $N$ is highly composite, and here we’ll discuss the case where $N$ is a power of 2. As we noticed earlier, the time it takes to compute a DFT of size $N$ is on the order of $N^2$. The algorithm I am about to present, first described by J. W. Cooley and John Tukey, will reduce this to $\frac{1}{2} N \log_2 N$[4]. If $N$ is even, we can separate the samples $x[j]$ into evens $x_e = (x[0], x[2], \ldots, x[N-2])$ and odds $x_o = (x[1], x[3], \ldots, x[N-1])$ and then calculate the DFT as follows:

$$\begin{aligned}
X[k] &= \sum_{j=0}^{N-1} x[j]\, w_N^{-jk} \\
&= \sum_{j=0}^{N/2-1} x[2j]\, w_N^{-2jk} + w_N^{-k} \sum_{j=0}^{N/2-1} x[2j+1]\, w_N^{-2jk} \\
&= \sum_{j=0}^{N/2-1} x_e[j]\, w_{N/2}^{-jk} + w_N^{-k} \sum_{j=0}^{N/2-1} x_o[j]\, w_{N/2}^{-jk}
\end{aligned}$$

where the last step uses $w_N^2 = w_{N/2}$.
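This decimation identity can be checked numerically. In the Python sketch below (the helper names are our own), E and O denote the half-size DFTs of the even- and odd-indexed samples, and we verify $X[k] = E[k] + w_N^{-k} O[k]$ for $k < N/2$:

```python
import cmath

# Direct DFT sum, X[k] = sum_j x[j] * w_N^(-j*k).
def dft(x):
    N = len(x)
    wN = cmath.exp(2j * cmath.pi / N)
    return [sum(x[j] * wN ** (-j * k) for j in range(N)) for k in range(N)]

x = [0.5, 2.0, -1.0, 3.0, 0.0, 1.5, -2.0, 1.0]
N = len(x)
wN = cmath.exp(2j * cmath.pi / N)

X = dft(x)
E = dft(x[0::2])  # half-size DFT of the even-indexed samples
O = dft(x[1::2])  # half-size DFT of the odd-indexed samples

# The full DFT is the even part plus a "twiddled" odd part.
for k in range(N // 2):
    assert abs(X[k] - (E[k] + wN ** (-k) * O[k])) < 1e-9
```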

We see that we might be able to get away with calculating two DFT’s of $N/2$ samples each, which would save us a lot of multiplication, because two matrices that are each $N/2 \times N/2$ together still have only half the entries of one matrix that is $N \times N$[4]. But at this point, there are still $N^2$ multiplications taking place if we run $k$ from 0 to $N - 1$. However, notice that the first $N/2$ components of $X[k]$ differ from the last $N/2$ only by a sign, because

$$\begin{aligned}
X\!\left[k + \tfrac{N}{2}\right] &= \sum_{j=0}^{N/2-1} x_e[j]\, w_{N/2}^{-j(k+N/2)} + w_N^{-(k+N/2)} \sum_{j=0}^{N/2-1} x_o[j]\, w_{N/2}^{-j(k+N/2)} \\
&= \sum_{j=0}^{N/2-1} x_e[j]\, w_{N/2}^{-jk} - w_N^{-k} \sum_{j=0}^{N/2-1} x_o[j]\, w_{N/2}^{-jk}
\end{aligned}$$

where $w_N^{-N/2} = -1$ and $w_{N/2}^{-jN/2} = 1$[2]. In essence, this means that we only need to calculate the two smaller DFT’s once to get the first $N/2$ components of $X[k]$, and then subtract instead of add what we have stored in memory to recreate all the others. Then we apply this process recursively until we have only groups of two samples each, for which the transform is simply the matrix $\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$, which requires no multiplications[2]. An Octave implementation of this algorithm, sometimes referred to as the Radix-2 FFT because of how it divides the work into two smaller FFT’s of half the size, is included in the appendix. According to MIT’s Gilbert Strang, using this process to compute an FFT of 4096 samples, versus the full matrix multiplication of the DFT, takes only $6 \cdot 2^{12}$ multiplications as opposed to $2^{24}$: about 0.15% of the time[4]. No wonder they call it fast! In fact, there are many algorithms for calculating FFT’s, all of which depend on the factorability of $N$ (not only into twos), though they all follow the same general reasoning. Sometimes, algorithms that use a combination of these methods at the various recursion depths achieve more efficiency than the one presented here[2].

Conclusion

This is just an introduction to some of the ideas behind the Discrete Fourier Transform, the Fast Fourier Transform, and what those have to do with music. What we’ve seen is that these tools are so useful because they actually employ the


same action that our own ears do. They tune into and distinguish many frequencies simultaneously, and the DFT and the FFT let us see what our ears hear. In terms of music and the music industry, nearly all music is recorded digitally nowadays. The computers we use to equalize sounds to achieve the spectral balance we’re looking for, or to read and modify pitches so we can all have pitch-perfect singing, employ the algorithm above or some variation of it. Even if you just listen to music, the MP3 files many people have of their favorite artists actually store the spectral data of the songs. The truth is that, behind the scenes, the Fast Fourier Transform is working very hard; and none of it would exist were it not for some of the mathematical concepts presented here.

References

[1] Guenter Ehret. Stiffness gradient along the basilar membrane as a way for spatial frequency analysis within the cochlea. J. Acoust. Soc. Am., 1978.
[2] Julius O. Smith. Mathematics of the Discrete Fourier Transform (DFT). W3K Publishing, http://www.w3k.org/books, 2007.
[3] Julius O. Smith. Spectral Audio Signal Processing, October 2008 Draft. Online book, http://ccrma.stanford.edu/~jos/sasp/, accessed November 17, 2010.
[4] Gilbert Strang. Linear Algebra and Its Applications. Brooks Cole, February 1988.

Brian’s Radix-2 FFT Function for Octave

function out = briansfftrad2(in)
  N = length(in);
  switch (N)
    case 1
      out = in;
    case 2
      # NO MULTIPLICATIONS NECESSARY FOR LENGTH 2
      out = [in(1) + in(2), in(1) - in(2)];
    otherwise
      evenfft = briansfftrad2(in(1:2:N));
      wN = exp(-2*pi*i/N) .^ (0:(N/2-1));
      wNoddfft = wN .* briansfftrad2(in(2:2:N));
      # EVENS, ODDS
      out = [evenfft + wNoddfft, evenfft - wNoddfft];
  end
end
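For readers without Octave, the same radix-2 recursion translates almost line for line into Python. This is a sketch (function names are our own), assuming the input length is a power of two, and it is checked against the direct $N^2$ sum:

```python
import cmath

# Radix-2 FFT: recursively combine half-size FFTs of the even- and
# odd-indexed samples, reusing each twiddled odd term with a sign flip.
def fft_radix2(x):
    N = len(x)
    if N == 1:
        return list(x)
    if N == 2:
        # no multiplications necessary for length 2
        return [x[0] + x[1], x[0] - x[1]]
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    twiddled = [cmath.exp(-2j * cmath.pi * k / N) * odd[k]
                for k in range(N // 2)]
    return ([even[k] + twiddled[k] for k in range(N // 2)] +
            [even[k] - twiddled[k] for k in range(N // 2)])

# Direct O(N^2) DFT sum, for comparison.
def dft(x):
    N = len(x)
    wN = cmath.exp(2j * cmath.pi / N)
    return [sum(x[j] * wN ** (-j * k) for j in range(N)) for k in range(N)]

x = [0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft_radix2(x), dft(x)))
```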