Journal of Signal Processing Systems 51, 245–256, 2008
© 2007 Springer Science + Business Media, LLC. Manufactured in The United States.
DOI: 10.1007/s11265-007-0063-8
An Area-Efficient Design of Variable-Length Fast Fourier Transform Processor
SHUENN-SHYANG WANG AND CHIEN-SUNG LI
Department of Electrical Engineering, Tatung University, 40 Sec. 3, Chungshan N. Road, Taipei, 103 Taiwan, Republic of China
Received: 13 September 2006; Revised: 28 January 2007; Accepted: 17 March 2007
Abstract. The fast Fourier transform (FFT) plays an important role in orthogonal frequency division multiplexing (OFDM) communication systems. In this paper, we propose an area-efficient design of a variable-length FFT processor that can perform FFTs of 512/1,024/2,048/4,096/8,192 points, as used in OFDM-based communication systems such as digital audio broadcasting (DAB), digital video broadcasting-terrestrial (DVB-T) and digital video broadcasting-handheld (DVB-H). To reduce computational complexity and chip area, we develop a new variable-length FFT architecture by devising a mixed-radix algorithm that consists of radix-2, radix-2² and radix-2/4/8 algorithms and by optimizing the realization through substructure sharing. Based on this architecture, an area-efficient design of a variable-length FFT processor is presented. Synthesized using the UMC 0.18 μm process, the processor occupies 2.9 mm², and the 8,192-point FFT operates correctly at up to 50 MHz with a power consumption of 823 mW under a 1.8 V supply voltage.
Keywords: variable length FFT, Fast Fourier Transform, OFDM, substructure sharing
1. Introduction
Owing to advances in VLSI technology, the fast Fourier transform (FFT) has been applied to a wide range of digital signal processing (DSP) applications. In modern communication systems, the most important FFT application is orthogonal frequency division multiplexing (OFDM) [1–13]. OFDM is an advanced modulation technique that effectively expands channel utilization and reduces the inter-symbol interference (ISI) and inter-carrier interference (ICI) caused by multipath effects. It is mainly used in digital audio broadcasting (DAB), digital video broadcasting-terrestrial (DVB-T) and digital video broadcasting-handheld (DVB-H). The key specifications of some OFDM-based communication standards are shown in Table 1 [4].
Recently, with the increased IC design resources offered by chip technology, complex algorithms such as the FFT can be implemented on a single chip [14–20]. In particular, it is desirable to develop a single chip that can handle FFTs of different lengths to meet the requirements of various OFDM communication standards. Because of its more complex control, a variable-length FFT (VL-FFT) processor may incur a small extra hardware cost compared with a fixed-length FFT processor [14–17]. In this paper, we propose an area-efficient design of a variable-length FFT processor that can perform FFTs of 512/1,024/2,048/4,096/8,192 points. To reduce computational complexity and chip area, we develop a new variable-length FFT architecture by devising a mixed-radix algorithm that consists of radix-2, radix-2² and radix-2/4/8 algorithms and by optimizing the realization through substructure sharing. Based on this architecture, an area-efficient design of a variable-length FFT processor is presented. Synthesized using the UMC 0.18 μm process, the processor occupies 2.9 mm², and the 8,192-point FFT operates correctly at up to 50 MHz with a power consumption of 823 mW under a 1.8 V supply voltage.
The rest of this paper is organized as follows. Section 2 briefly reviews the radix-2, radix-2² and radix-2/4/8 FFT algorithms. Section 3 proposes a new VL-FFT architecture and compares it with other architectures. Section 4 presents the detailed design of our VL-FFT processor and gives the simulation results. Finally, Section 5 concludes the paper.

Table 1. FFT sizes and sampling rates needed in various OFDM systems.
Communication system | FFT size (sampling rate)
802.11a | 64 (20 MHz)
DAB | 2,048 (2 MHz), 1,024 (2 MHz), 512 (2 MHz), 256 (2 MHz)
DVB-T | 8,192 (8 MHz), 2,048 (8 MHz)
DVB-H | 4,096 (8 MHz)
ADSL | 512 (2.2 MHz)
VDSL | 8,192 (34.5 MHz), 4,096 (17.3 MHz), 2,048 (8.6 MHz), 1,024 (4.3 MHz), 512 (2.2 MHz)

Figure 2. The radix-2 butterfly graph.

2. The FFT Algorithms
This section describes the mathematical basis and some FFT algorithms which are applied to develop our VL-FFT processor. An N-point discrete Fourier transform of a sequence x[n] is defined as

$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \tag{1}$$

where the twiddle factor is $W_N^{nk} = e^{-j2\pi nk/N} = \cos\frac{2\pi nk}{N} - j\sin\frac{2\pi nk}{N}$. In general, FFT algorithms are derived by taking advantage of the symmetry properties of the twiddle factor as shown in Fig. 1 [12].
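As a small illustration of the symmetry properties mentioned above, the following Python sketch checks three standard identities of the twiddle factor numerically. The function names `twiddle` and `check_symmetries` are hypothetical helpers introduced here for illustration; they are not part of the proposed design.

```python
import cmath


def twiddle(N: int, k: int) -> complex:
    """Twiddle factor W_N^k = exp(-j*2*pi*k/N), following Eq. (1)."""
    return cmath.exp(-2j * cmath.pi * k / N)


def check_symmetries(N: int, tol: float = 1e-12) -> bool:
    """Return True if three standard symmetries hold for every exponent k (N even)."""
    for k in range(N):
        w = twiddle(N, k)
        periodic = abs(twiddle(N, k + N) - w) < tol              # W^(k+N)   = W^k
        half_turn = abs(twiddle(N, k + N // 2) + w) < tol        # W^(k+N/2) = -W^k
        mirrored = abs(twiddle(N, N - k) - w.conjugate()) < tol  # W^(N-k)   = conj(W^k)
        if not (periodic and half_turn and mirrored):
            return False
    return True


symmetries_hold = check_symmetries(16)
```

These identities are what allow a butterfly to reuse one twiddle value for two outputs, and they are also the reason only a fraction of the unit circle needs to be stored.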
2.1. Radix-2 DIF FFT Algorithm
The radix-2 DIF (decimation-in-frequency) FFT algorithm is obtained by decomposing the output frequency sequence X[k] into even-numbered points and odd-numbered points [12]. By separating X[k] into X[2r] and X[2r+1], we obtain

$$X[2r] = \sum_{n=0}^{N/2-1} g[n]\, W_{N/2}^{rn} \tag{2}$$

$$X[2r+1] = \sum_{n=0}^{N/2-1} h[n]\, W_N^{n}\, W_{N/2}^{rn} \tag{3}$$

where $g[n] = x[n] + x[n + N/2]$ and $h[n] = x[n] - x[n + N/2]$. We can combine Eqs. (2) and (3) to form the butterfly graph shown in Fig. 2, which can be represented in simplified form as shown in Fig. 3.

Figure 1. The twiddle factors of an N-point DFT.
Figure 3. The simplified radix-2 butterfly graph.
Figure 4. The radix-2² butterfly graph.
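The radix-2 DIF split of Eqs. (2) and (3) can be checked numerically. The sketch below is illustrative only: it applies a single decomposition stage and evaluates the two half-length sums with a direct DFT rather than recursing, and the names `dft` and `radix2_dif_stage` are assumptions made for this example.

```python
import cmath


def dft(x):
    """Direct N-point DFT of Eq. (1); O(N^2), used only as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def radix2_dif_stage(x):
    """One radix-2 DIF split: build X[k] from Eqs. (2) and (3)."""
    N = len(x)
    if N % 2:
        raise ValueError("length must be even")
    half = N // 2
    # g[n] = x[n] + x[n + N/2] feeds the even-indexed outputs, Eq. (2)
    g = [x[n] + x[n + half] for n in range(half)]
    # h[n] * W_N^n feeds the odd-indexed outputs, Eq. (3)
    h_tw = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / N)
            for n in range(half)]
    X = [0j] * N
    X[0::2] = dft(g)     # X[2r]
    X[1::2] = dft(h_tw)  # X[2r+1]
    return X


x = [complex(n % 5, -(n % 3)) for n in range(16)]
max_err = max(abs(a - b) for a, b in zip(radix2_dif_stage(x), dft(x)))
```

In a pipelined SDF realization the two half-length transforms are of course not computed by a direct DFT; the same split is applied again in the following stage.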
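Figure 6. (a) Variable-length FFT structure in [4]. (b) Variable-length FFT structure in [5].

2.2. Radix-2² DIF FFT Algorithm
The reason for developing the radix-2² algorithm is that the number of nontrivial multiplications can be further reduced in implementation [2]. The radix-2² algorithm has the same multiplication complexity as the radix-4 algorithm but still retains the radix-2 butterfly structure. The radix-2² DIF FFT algorithm can be derived by recursively decimating the frequency series into four subsets. By substituting k by $4r + 2s_2 + s_1$, with $s_1, s_2 \in \{0, 1\}$ and $r = 0, 1, \ldots, N/4 - 1$, it follows from Eq. (1) that

$$
\begin{aligned}
X[4r + 2s_2 + s_1]
&= \sum_{n=0}^{N/4-1} \Big\{ x[n] + x\big[n + \tfrac{N}{4}\big] W_4^{2s_2+s_1} + x\big[n + \tfrac{N}{2}\big] W_4^{4s_2+2s_1} + x\big[n + \tfrac{3N}{4}\big] W_4^{6s_2+3s_1} \Big\}\, W_N^{n(2s_2+s_1)}\, W_{N/4}^{nr} \\
&= \sum_{n=0}^{N/4-1} \Big\{ \Big( x[n] + (-1)^{s_1} x\big[n + \tfrac{N}{2}\big] \Big) + (-1)^{s_2} (-j)^{s_1} \Big( x\big[n + \tfrac{N}{4}\big] + (-1)^{s_1} x\big[n + \tfrac{3N}{4}\big] \Big) \Big\}\, W_N^{n(2s_2+s_1)}\, W_{N/4}^{nr}
\end{aligned}
\tag{4}
$$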
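A brief numerical check of Eq. (4) is sketched below. It evaluates the second form of Eq. (4) for every output index and compares it with a direct DFT. The helper names `w`, `dft` and `radix22_bin` are hypothetical and chosen only for this example.

```python
import cmath


def w(N, k):
    """Twiddle factor W_N^k = exp(-j*2*pi*k/N)."""
    return cmath.exp(-2j * cmath.pi * k / N)


def dft(x):
    """Direct DFT of Eq. (1), used as the reference."""
    N = len(x)
    return [sum(x[n] * w(N, n * k) for n in range(N)) for k in range(N)]


def radix22_bin(x, r, s2, s1):
    """Evaluate X[4r + 2*s2 + s1] using the second form of Eq. (4)."""
    N = len(x)
    if N % 4:
        raise ValueError("length must be a multiple of 4")
    q = N // 4
    acc = 0j
    for n in range(q):
        # First butterfly level: samples N/2 apart, sign chosen by s1
        upper = x[n] + (-1) ** s1 * x[n + 2 * q]
        lower = x[n + q] + (-1) ** s1 * x[n + 3 * q]
        # Second butterfly level: trivial factor (-1)^s2 * (-j)^s1
        bf = upper + (-1) ** s2 * (-1j) ** s1 * lower
        # Remaining nontrivial twiddle and the N/4-point DFT kernel
        acc += bf * w(N, n * (2 * s2 + s1)) * w(q, n * r)
    return acc


x = [complex(n % 7, (3 * n) % 5) for n in range(16)]
reference = dft(x)
worst = max(abs(radix22_bin(x, r, s2, s1) - reference[4 * r + 2 * s2 + s1])
            for r in range(4) for s2 in (0, 1) for s1 in (0, 1))
```

The only multiplications inside the two butterfly levels are by ±1 and −j, which is why the structure keeps radix-2 butterflies while matching the multiplication count of radix-4.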
Equation (4) yields the butterfly graph shown in Fig. 4. It is seen that the radix-2² PE (processing element) consists of the radix-2² PE1 and the radix-2² PE2.

2.3. Radix-2/4/8 DIF FFT Algorithm
The radix-2/4/8 FFT algorithm is developed by taking advantage of the radix-2, radix-2/4 and radix-2/8 algorithms [3]. Figure 5 shows the butterfly graph for the radix-2/4/8 FFT algorithm. The processing element (PE) of the radix-2/4/8 butterfly consists of three processing elements: PE1, PE2 and PE3. The radix-2/4/8 algorithm is not only regular but also has lower complexity, and is thus suitable for VLSI implementation.

Figure 5. The radix-2/4/8 butterfly graph.

3. The Proposed VL-FFT Architecture
In this section, we propose a new variable-length FFT (VL-FFT) architecture that operates on variable data lengths of 512, 1,024, 2,048, 4,096 and 8,192 points, which are used in the DAB, DVB-T and DVB-H systems.

Figure 7. (a) Variable-length FFT pipelined architecture in [4]. (b) Variable-length FFT pipelined architecture in [5].
Figure 8. The proposed variable-length FFT structure.
Figure 9. The SFG of mixed-radix PE.
Figure 10. Detailed structure of mixed-radix PE.
It is known that the split-radix algorithm and the high-radix algorithm have lower computational complexity. However, the high-radix algorithm cannot be applied to all FFT sizes used in OFDM applications: some radix-2 stages must be inserted to deal with the 2^n-point or 4^n-point FFT. This limitation does not exist in the split-radix FFT algorithm, but compared with a fixed-radix FFT algorithm it is more difficult to implement in hardware because of the irregularity of its signal flow graph (SFG). In view of these facts, we devise a mixed-radix FFT algorithm that combines the radix-2, radix-2² and radix-2/4/8 algorithms to develop our VL-FFT architecture.

By comparing three kinds of popular pipeline architectures for implementing FFT algorithms [3–6], it is found that the single-path delay feedback (SDF) architecture has a lower hardware requirement and a higher utilization rate. We therefore adopt the SDF architecture to design our pipelined VL-FFT architecture.

In [4, 5], variable-length FFT structures are suggested as shown in Fig. 6a, b, and their detailed architectures are shown in Fig. 7a, b. The structure of Fig. 6a, consisting of radix-4 PEs and a radix-2 PE, can deal only with FFTs of 2,048 and 8,192 points, whereas the structure of Fig. 6b, consisting of radix-2 PEs and radix-2/4/8 PEs, can deal with FFTs of 512, 1,024, 2,048, 4,096 and 8,192 points. From Fig. 7b, we can see that four multiplexers (MUXs) are needed for the pipelined FFT architecture to handle the lengths of 512, 1,024, 2,048, 4,096 and 8,192 points. If input data enter the first stage of the FFT architecture, it performs an 8,192-point FFT. On the other hand, if input data enter the second stage, bypassing the first stage, it performs a 4,096-point FFT. By the same analysis for the remaining stages, the VL-FFT architecture of Fig. 7b can perform 2,048-, 1,024- and 512-point FFTs by proper switching of the multiplexers.

Now, we propose a new VL-FFT structure, shown in Fig. 8. By cascading a radix-2 PE, a mixed-radix PE and three radix-2/4/8 PEs, the VL-FFT structure can deal with all the FFT computations of 512, 1,024, 2,048, 4,096 and 8,192 points. We use the mixed-radix PE to perform the radix-2, radix-2² and radix-2/4/8 FFT algorithms in different modes, as shown in Fig. 9. By proper switching, the mixed-radix PE1 can act as either the radix-2² PE1 or the radix-2/4/8 PE1 in stage 2. Similarly, the mixed-radix PE3 can act as the radix-2 PE, the radix-2² PE2 or the radix-2/4/8 PE3 in stage 4. Based on the observation that $W_{1024}^{0} = W_{4096}^{0}$, $W_{1024}^{n} = W_{4096}^{4n}$, $W_{2048}^{0} = W_{4096}^{0}$, $W_{2048}^{2n} = W_{4096}^{4n}$, $W_{2048}^{n} = W_{4096}^{2n}$ and $W_{2048}^{3n} = W_{4096}^{6n}$, we can reduce the hardware complexity by substructure sharing between the radix-2² PE1 and the radix-2/4/8 PE1 in stage 2, and among the radix-2 PE, the radix-2² PE2 and the radix-2/4/8 PE3. The operation of the mixed-radix PE can be effectively realized by proper switching to share the same substructure, as shown in Fig. 10. Table 2 shows the action of the switches in the different modes used to achieve the different FFT lengths. In this way, we reduce the complexity of the multiplications and the number of complex multipliers.

Summing up the above analysis, we develop our pipelined VL-FFT architecture, shown in Fig. 11, which can perform FFTs of 512, 1,024, 2,048, 4,096 and 8,192 points. The multiplexers are used to switch between the different FFT lengths, and the number of delay elements needed decreases by half at every stage. To evaluate the efficiency of the proposed VL-FFT architecture, we compare it with other architectures in Table 3. Compared with [4] and [5], our VL-FFT architecture requires fewer nontrivial complex multiplications for FFT lengths of 2,048 points and above; for N = 8,192, for example, it needs 24,430, against 30,722 in [4] and 29,538 in [5].

Table 2. Action of switches in different modes.
Switch | N = 512 | N = 1,024 | N = 2,048 | N = 4,096 | N = 8,192
S1 | 0 | 0 | 1 | 1 | 1
S2 | 0 | 0 | 0 | 1 | 1
S3 | 0 | 0 | 0 | 1 | 1
S4 | 0 | 1 | 1 | 1 | 1
S5 | 0 | 0 | 1 | 1 | 1
S6 | 0 | 0 | 0 | 1 | 1

Figure 11. The proposed VL-FFT pipelined architecture.

Table 3. Comparison of VL-FFT architectures.
Characteristics | Wang [4] | Lin [5] | Proposed
FFT size (N) | 2,048/8,192 | 512/1,024/2,048/4,096/8,192 | 512/1,024/2,048/4,096/8,192
Algorithm | Radix-2, Radix-4 | Radix-2, Radix-2/4/8 | Radix-2, Radix-2², Radix-2/4/8
Architecture | SDF | SDF | SDF
Data memory | N−1 | N−1 | N−1
Number of nontrivial twiddle factor multiplications, N = 512 | - | 824 | 824
Number of nontrivial twiddle factor multiplications, N = 1,024 | - | 2,158 | 2,158
Number of nontrivial twiddle factor multiplications, N = 2,048 | 6,146 | 5,338 | 4,826
Number of nontrivial twiddle factor multiplications, N = 4,096 | - | 12,722 | 10,160
Number of nontrivial twiddle factor multiplications, N = 8,192 | 30,722 | 29,538 | 24,430

4. The Design of VL-FFT Processor
In the following, we present the detailed realization of the processing elements and components of the proposed VL-FFT architecture that are used to design the VL-FFT processor.

4.1. Radix-2 PE
Figure 12 shows the realization of the radix-2 PE, which is composed of an FFT butterfly, delay elements, a ROM storing twiddle factors and a complex multiplier.

Figure 12. The radix-2 PE.

4.2. Radix-2/4/8 PE
Figure 13 shows the realization of the radix-2/4/8 PE, which contains radix-2/4/8 PE 1, radix-2/4/8 PE 2 and radix-2/4/8 PE 3. It is realized by three FFT butterflies, delay elements, a constant multiplier, a ROM storing twiddle factors and a complex multiplier.

Figure 13. The radix-2/4/8 PE.

4.3. Mixed-radix PE
As shown in Fig. 14, the mixed-radix PE consists of mixed-radix PE 1, mixed-radix PE 2 and mixed-radix PE 3. A multiplexer (MUX 2) is used to select the sizes of the delay elements in the mixed-radix PE 1. By multiplexing the output labeled A (B) to the input of the FFT butterfly, the mixed-radix PE 1 can act as the radix-2² PE 1 (radix-2/4/8 PE 1). Because they share the same twiddle factors, the mixed-radix PE 3 can efficiently act as the radix-2 PE, the radix-2² PE 2 or the radix-2/4/8 PE 3, as depicted in Fig. 9.

Figure 14. The mixed-radix PE.
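The twiddle-factor sharing used by the mixed-radix PE rests on identities such as $W_{1024}^{n} = W_{4096}^{4n}$ and $W_{2048}^{3n} = W_{4096}^{6n}$. The sketch below illustrates the idea by mapping an exponent of a shorter transform to the equivalent exponent of a 4,096-point table. The single full-length table and the function name `shared_exponent` are simplifying assumptions made for illustration; the actual ROM organization of the processor is not modeled here.

```python
import cmath

ROM_SIZE = 4096  # assumed length of the common twiddle table (illustrative)


def w(N, k):
    """Twiddle factor W_N^k = exp(-j*2*pi*k/N)."""
    return cmath.exp(-2j * cmath.pi * k / N)


def shared_exponent(N: int, k: int, rom_size: int = ROM_SIZE) -> int:
    """Exponent m such that W_N^k equals W_rom_size^m (N must divide rom_size)."""
    if rom_size % N != 0:
        raise ValueError("N must divide the table length")
    return (k * (rom_size // N)) % rom_size


# Spot-check the identities for a range of n.
identities_hold = all(
    abs(w(1024, n) - w(4096, shared_exponent(1024, n))) < 1e-12
    and abs(w(2048, 2 * n) - w(4096, 4 * n)) < 1e-12
    and abs(w(2048, 3 * n) - w(4096, 6 * n)) < 1e-12
    for n in range(256)
)
```

Because every twiddle factor needed in the shorter modes coincides with an entry of the longer mode, the same multiplier and coefficient source can serve several FFT lengths, with only the addressing changed by the switches.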
4.4. FFT Butterfly
There are FFT butterflies in every PE of the proposed architecture. Figure 15 shows the realization of the FFT butterfly, which outputs the sum and the difference of its two inputs at the upper and lower branches, respectively.

Figure 15. FFT butterfly.

4.5. Complex Multiplier
There are three types of multiplication in the VL-FFT architecture: multiplication by −j, multiplication by a constant twiddle factor, and multiplication by a complex twiddle factor.

(a) Multiplication by −j: The multiplication by −j can be realized with no extra multiplier by simply interchanging the real and imaginary parts of the operand and changing one sign, as shown in Fig. 16, based on Eq. (5).

$$(a + bj)(-j) = b - aj \tag{5}$$

Figure 16. Block diagram of multiplication by −j.

(b) Multiplication by constant twiddle factor: The complex multiplications with $W_N^{N/8}$ and $W_N^{3N/8}$ can be expressed, respectively, by the following equations.

$$(a + jb)\, W_N^{N/8} = (a + jb)\left(\frac{\sqrt{2}}{2} - j\frac{\sqrt{2}}{2}\right) = \frac{\sqrt{2}}{2}\,\big[(a + b) + j(b - a)\big] \tag{6}$$

$$(a + jb)\, W_N^{3N/8} = (a + jb)\left(-\frac{\sqrt{2}}{2} - j\frac{\sqrt{2}}{2}\right) = \frac{\sqrt{2}}{2}\,\big[(b - a) - j(b + a)\big] \tag{7}$$

Figure 17. (a) Multiplication with $W_N^{N/8}$. (b) Multiplication with $W_N^{3N/8}$.
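Equations (5) to (7) can be verified against ordinary complex arithmetic. The following sketch is a behavioral check only, written in floating point rather than the fixed-point arithmetic of the processor; the function names are hypothetical.

```python
import cmath
import math

SQRT2_OVER_2 = math.sqrt(2) / 2


def mul_minus_j(a, b):
    """(a + jb)(-j) = b - ja: swap the parts and negate one, Eq. (5)."""
    return (b, -a)


def mul_w_n8(a, b):
    """(a + jb) * W_N^(N/8), Eq. (6): two real additions, two real multiplications."""
    return (SQRT2_OVER_2 * (a + b), SQRT2_OVER_2 * (b - a))


def mul_w_3n8(a, b):
    """(a + jb) * W_N^(3N/8), Eq. (7): two real additions, two real multiplications."""
    return (SQRT2_OVER_2 * (b - a), -SQRT2_OVER_2 * (b + a))


def as_complex(pair):
    """Convert a (real, imag) tuple to a Python complex number."""
    return complex(pair[0], pair[1])


a, b = 3.0, -2.0
z = complex(a, b)
w_n8 = cmath.exp(-1j * cmath.pi / 4)       # W_N^(N/8)  = exp(-j*pi/4)
w_3n8 = cmath.exp(-3j * cmath.pi / 4)      # W_N^(3N/8) = exp(-j*3*pi/4)
err_j = abs(as_complex(mul_minus_j(a, b)) - z * (-1j))
err_n8 = abs(as_complex(mul_w_n8(a, b)) - z * w_n8)
err_3n8 = abs(as_complex(mul_w_3n8(a, b)) - z * w_3n8)
```

All three errors are at the level of floating-point rounding, which confirms that the constant-twiddle products need only the real constant √2/2 rather than a general complex multiplier.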
Thus, each of these complex multiplications can be realized with two real additions and two real multiplications, as shown in Fig. 17a, b. In addition, the real multiplication by $\frac{\sqrt{2}}{2}$ can be rewritten as

$$\frac{\sqrt{2}}{2} = 0.70710678 \approx 2^{-1} + 2^{-3} + 2^{-4} + 2^{-6} + 2^{-8} + 2^{-9}.$$

This implies that the multiplication by $\frac{\sqrt{2}}{2}$ can be replaced by shifters and five real adders, as depicted in Fig. 18.

Figure 18. Implementation hardware of multiplication with $\frac{\sqrt{2}}{2}$.

(c) Multiplication by complex twiddle factor: Based on Eq. (8), one complex multiplier can be realized by four real multiplications and two real additions, as shown in Fig. 19.

$$(a + bj)(c + dj) = (ac - bd) + j(bc + ad) \tag{8}$$

Figure 19. Complex multiplier with four real multiplications and two real additions.

This complex multiplier occupies a large chip area in hardware implementations. Fortunately, it can be realized more efficiently with three real multiplications and five real additions, as shown in Fig. 20, based on Eq. (9).

$$(a + bj)(c + dj) = \{c(a - b) + b(c - d)\} + j\{d(a + b) + b(c - d)\} \tag{9}$$
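Two of the simplifications in this subsection can be sketched in a few lines of Python: the shift-and-add replacement for the constant √2/2, and the three-multiplication complex product of Eq. (9). This is a behavioral illustration under simplifying assumptions (non-negative integer operands with truncating shifts for the first, exact integer arithmetic for the second). The names are hypothetical, and the six shift amounts are those of the sum above, which evaluates to 0.708984375.

```python
import math

SHIFTS = (1, 3, 4, 6, 8, 9)               # exponents of the six-term sum
COEFF = sum(2.0 ** -s for s in SHIFTS)    # exact value of the sum: 0.708984375


def shift_add_scale(x: int) -> int:
    """Approximate x * sqrt(2)/2 for non-negative integer x with six shifts, five additions."""
    return sum(x >> s for s in SHIFTS)


def complex_mul_4(a, b, c, d):
    """(a + jb)(c + jd) with four real multiplications, Eq. (8)."""
    return (a * c - b * d, b * c + a * d)


def complex_mul_3(a, b, c, d):
    """(a + jb)(c + jd) with three real multiplications, Eq. (9)."""
    shared = b * (c - d)                   # term common to both parts
    return (c * (a - b) + shared, d * (a + b) + shared)


approx_error = abs(COEFF - math.sqrt(2) / 2)
same_product = complex_mul_3(1, 2, 3, 4) == complex_mul_4(1, 2, 3, 4)
```

The shared term b(c − d) is computed once and reused in both the real and imaginary parts, which is where the saving of one multiplier comes from, at the cost of three extra additions.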
Figure 20. Complex multiplier with three real multiplications and five real additions.

4.6. ROM Table
All the required twiddle factors are stored in a ROM table in advance for the FFT computations. To reduce the size of the ROM table, only the cosine and sine values of the angles between 0 and π/4 are stored. Based on symmetry, the cosine and sine values of the other angles can be generated from the stored samples in the cosine and sine tables.

The proposed VL-FFT processor has been modeled in Verilog HDL and synthesized with the Synopsys cell library. Two kinds of memory model are used: shift register and SRAM. Table 4 shows the synthesis results of our VL-FFT design. The SRAM-based design is better than the shift-register-based design: its area is 2.9 mm² and it functions correctly at up to 50 MHz with a power consumption of 823 mW under a 1.8 V supply voltage. Using SRAM instead of shift registers therefore yields a design with lower power consumption. The comparison with other FFT designs is given in Table 5, including FFT size, word length, algorithm, FFT architecture, technology process, supply voltage, area, clock rate and power consumption. Because these FFT designs use different CMOS technologies and the FFT sizes are also different, we adopt the normalized area to evaluate the cost of silicon [4]:

$$\text{Normalized area} = \frac{\text{Area}}{(\text{Technology}/0.18\,\mu\text{m})^{2}} \tag{10}$$

It is seen that our VL-FFT design is area-efficient since it requires the least normalized area.

Table 4. The synthesis results of our VL-FFT design.
Characteristics | Values
Process | UMC 0.18 μm
FFT size | 512/1,024/2,048/4,096/8,192
Word length | 12 bits
Voltage | 1.8 V
Maximum speed | 50 MHz
Area | Shift register based: 19 mm²; SRAM based: 2.9 mm²
Power | Shift register based: 1.51 W (50 MHz); SRAM based: 823 mW (50 MHz)

5. Conclusions
This paper proposes an area-efficient design of a VL-FFT processor. The proposed VL-FFT processor is programmable in selecting FFT lengths of 512/1,024/2,048/4,096/8,192 points, which makes it suitable for various OFDM-based communication systems, such as digital audio broadcasting (DAB), digital video broadcasting-terrestrial (DVB-T) and digital video broadcasting-handheld (DVB-H). Synthesized using the UMC 0.18 μm process, our VL-FFT processor occupies 2.9 mm², and the 8,192-point FFT operates correctly at up to 50 MHz with a power consumption of 823 mW under a 1.8 V supply voltage. Our VL-FFT processor is area-efficient because it reduces the number of nontrivial multipliers and optimizes the realization by substructure sharing. Our future research will aim at a low-power design using other low-power techniques.
Table 5. Comparison with other FFT designs.
Characteristics | Bidet [10] | He [2] | Jia [3] | Wang [4] | Lin [5] | Proposed design
FFT size | 8,192 | 1,024 | 8,192 | 2,048/8,192 | 512/1,024/2,048/4,096/8,192 | 512/1,024/2,048/4,096/8,192
Word length (bits) | 12 | 12 | 12 | 8 | 12 | 12
Algorithm | Radix-2, Radix-4 | Radix-2² | Radix-2/4/8 | Radix-2, Radix-4 | Radix-2, Radix-2/4/8 | Radix-2, Radix-2², Radix-2/4/8
FFT architecture | SDF | SDF | SDF | SDF | SDF | SDF
Process (μm) | 0.5 | 0.5 | 0.6 | 0.35 | 0.35 | 0.18
Voltage (V) | 3.3 | 3.3 | 3.3 | 3.3 | 3.3 | 1.8
Area (mm²) | 100 | 40 | 107 | 33.75 | 13.05 | 2.9
Normalized area (mm²) | 12.96 | 5.18 | 9.63 | 8.93 | 3.45 | 2.9
Clock rate (MHz) | 20 | 20 | 20 | 16 | 45.45 | 50
Power (mW) (N=8,192) | 600 | - | 650 | 535 | 279 | 823
Power (mW) (N=2,048) | - | - | - | 640 | 198 | 581
Power (mW) (N=1,024) | - | 480 | - | - | 160 | 466.5
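The normalized-area figures in Table 5 follow directly from Eq. (10). As a quick cross-check, the sketch below recomputes them from the area and process values listed in the table. The function name `normalized_area` and the dictionary layout are illustrative assumptions.

```python
def normalized_area(area_mm2: float, technology_um: float,
                    reference_um: float = 0.18) -> float:
    """Eq. (10): scale the area by the square of the process ratio to 0.18 um."""
    return area_mm2 / (technology_um / reference_um) ** 2


# (area in mm^2, process in um), taken from the Area and Process rows of Table 5
designs = {
    "Bidet [10]": (100.0, 0.5),
    "He [2]": (40.0, 0.5),
    "Jia [3]": (107.0, 0.6),
    "Wang [4]": (33.75, 0.35),
    "Lin [5]": (13.05, 0.35),
    "Proposed": (2.9, 0.18),
}

normalized = {name: normalized_area(area, tech)
              for name, (area, tech) in designs.items()}
smallest = min(normalized, key=normalized.get)
```

Rounded to two decimals, the recomputed values agree with the normalized-area row of Table 5, and the proposed design has the smallest value.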
References
1. J.W. Cooley and J.W. Tukey, "An Algorithm for Machine Computation of Complex Fourier Series," Math. Computation, vol. 19, 1965, pp. 297–301, April.
2. S. He and M. Torkelson, "Designing Pipeline FFT Processor for OFDM (de)Modulation," IEEE Sig Syst Electron, 1998, pp. 257–262, Oct.
3. L. Jia, Y. Gao, J. Isoaho and H. Tenhunen, "A New VLSI-Oriented FFT Algorithm and Implementation," IEEE ASIC Conference, 1998, pp. 337–341.
4. C.C. Wang, J.M. Huang, and H.C. Cheng, "A 2K/8K Mode Small-Area FFT Processor for OFDM Demodulation of DVB-T Receivers," IEEE Trans. Consum. Electron., vol. 51, no. 1, 2005, pp. 28–32, February.
5. Y.T. Lin, P.Y. Tsai, and T.D. Chiueh, "Low-Power Variable-Length Fast Fourier Transform Processor," IEE Proc. Comput. Digit. Tech., vol. 152, no. 4, 2005, pp. 499–506, July.
6. S. He and M. Torkelson, "A New Approach to Pipeline FFT Processor," IEEE Parallel Processing Symposium, 1996, pp. 766–770, April.
7. K.K. Parhi, "VLSI Digital Signal Processing Systems Design and Implementation," Wiley, 1999.
8. W. Li and L. Wanhammar, "A Pipeline FFT Processor," IEEE Signal Processing Systems, 1999, pp. 654–662, Oct.
9. L. Jia, B. Li, Y. Gao and H. Tenhunen, "Implementation of A Low Power 128-Point FFT," IEEE Solid-State and Integrated Circuit Technology, 1998, pp. 369–372, Oct.
10. E. Bidet, D. Castelain, C. Joanblanq, and P. Senn, "A Fast Single-Chip Implementation of 8192 Complex Point FFT," IEEE J. Solid-State Circuits, vol. 30, no. 3, 1995, March.
11. B.M. Bass, "A Low-Power, High-Performance, 1024-Point FFT Processor," IEEE J. Solid-State Circuits, vol. 34, no. 3, 1999, March.
12. A.V. Oppenheim and R.W. Schafer, "Discrete-Time Signal Processing," Prentice-Hall, 1999.
13. P. Duhamel and H. Hollmann, "Split-radix FFT algorithm," Electron. Lett., vol. 20, no. 1, 1984, pp. 14–16, Jan.
14. S.Y. Park, N.I. Cho, S.U. Lee, K. Kim, and J. Oh, "Design of 2K/4K/8K-point FFT processor based on CORDIC algorithm in OFDM receiver," IEEE Communications, Computers and Signal Processing, vol. 2, 2001, pp. 457–460, Aug.
15. C.P. Hung, S.G. Chen, and K.L. Chen, "Design of an efficient variable-length FFT processor," IEEE ISCAS, vol. 2, 2004, pp. 833–836, May.
16. A. Sadat and W.B. Mikhael, "Fast Fourier Transform for high speed OFDM wireless multimedia system," Proceedings of the 44th IEEE 2001 Midwest Symposium on Circuits and Systems, vol. 2, 2001, pp. 938–942, 14–17 Aug.
17. M.B. Bevan, "A low-power, high-performance, 1024-point FFT processor," IEEE J. Solid-State Circuits, vol. 34, 1999, pp. 380–387, March.
18. A.M. El-Khashab and E.E. Swartzlander, Jr., "The modular pipeline fast Fourier transform algorithm and architecture," Conference Record of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, vol. 2, 2003, pp. 1463–1467.
19. S. Yu and E.E. Swartzlander, Jr., "A pipelined architecture for the multidimensional DFT," IEEE Trans. Signal Process., vol. 49, no. 9, 2001, pp. 2096–2102.
20. A.M. El-Khashab and E.E. Swartzlander, Jr., "An architecture for a radix-4 modular pipeline fast Fourier transform," Proceedings of the 14th International Conference on Application-Specific Systems, Architectures and Processors, 2003.
Shuenn-Shyang Wang received the M.S. and Ph.D. degrees in Electrical Engineering from Tatung Institute of Technology, Taipei, in 1985 and 1987, respectively. He was an Instructor at the Military Police School, Taipei, from 1987 to 1989. Since 1989, he has been with the Department of Electrical Engineering, Tatung University, Taipei, where he is currently a Professor. His research interests include digital signal processing, image/video processing, VLSI implementation for DSP and network security.
Chien-Sung Li received the B.Sc. degree in Electrical Engineering from Chung Hua University, Hsinchu, Taiwan, in 2002 and the M.S. degree in Electrical Engineering from Tatung University, Taipei, in 2005. His research interests include digital signal processing and VLSI implementation for DSP.