An Area-Efficient Design of Variable-Length Fast Fourier Transform Processor

Journal of Signal Processing Systems 51, 245–256, 2008
© 2007 Springer Science + Business Media, LLC. Manufactured in The United States.




DOI: 10.1007/s11265-007-0063-8

SHUENN-SHYANG WANG AND CHIEN-SUNG LI
Department of Electrical Engineering, Tatung University, 40 Sec. 3, Chungshan N. Road, Taipei, 103 Taiwan, Republic of China

Received: 13 September 2006; Revised: 28 January 2007; Accepted: 17 March 2007

Abstract. The fast Fourier transform (FFT) plays an important role in orthogonal frequency division multiplexing (OFDM) communication systems. In this paper, we propose an area-efficient design of a variable-length FFT processor that can perform FFTs of 512, 1,024, 2,048, 4,096, and 8,192 points, as used in OFDM-based communication systems such as digital audio broadcasting (DAB), digital video broadcasting-terrestrial (DVB-T), and digital video broadcasting-handheld (DVB-H). To reduce computational complexity and chip area, we develop a new variable-length FFT architecture by devising a mixed-radix algorithm that combines the radix-2, radix-2², and radix-2/4/8 algorithms, and we optimize its realization by substructure sharing. Based on this architecture, an area-efficient variable-length FFT processor is presented. Synthesized in the UMC 0.18 µm process, the processor occupies 2.9 mm², and it performs the 8,192-point FFT correctly at up to 50 MHz with a power consumption of 823 mW under a 1.8 V supply voltage.

Keywords: variable-length FFT, fast Fourier transform, OFDM, substructure sharing

1. Introduction

Owing to advances in VLSI technology, the fast Fourier transform (FFT) has been applied to a wide range of digital signal processing (DSP) applications. In modern communication systems, the most important FFT application is orthogonal frequency division multiplexing (OFDM) [1–13]. OFDM is an advanced modulation technique that improves channel utilization and reduces the inter-symbol interference (ISI) and inter-carrier interference (ICI) caused by multipath propagation. It is mainly used in digital audio broadcasting (DAB), digital video broadcasting-terrestrial (DVB-T), and digital video broadcasting-handheld (DVB-H). The key specifications of several OFDM-based communication standards are shown in Table 1 [4].

Recently, with the growth of IC design resources, complex algorithms such as the FFT can be implemented on a single chip [14–20]. In particular, it is desirable to develop a single chip that can compute FFTs of different lengths to meet the requirements of various OFDM communication standards. Owing to its more complex control, a variable-length FFT (VL-FFT) processor may incur a small amount of extra hardware cost compared with a fixed-length FFT processor [14–17]. In this paper, we propose an area-efficient design of a variable-length FFT processor that can perform FFTs of 512, 1,024, 2,048, 4,096, and 8,192 points. To reduce computational complexity and chip area, we develop a new variable-length FFT architecture by devising a mixed-radix algorithm that combines the radix-2, radix-2², and radix-2/4/8 algorithms, and we optimize its realization by substructure sharing. Based on this architecture, an area-efficient VL-FFT processor is presented. Synthesized in the UMC 0.18 µm process, the processor occupies 2.9 mm², and it performs the 8,192-point FFT correctly at up to 50 MHz with a power consumption of 823 mW under a 1.8 V supply voltage.

The rest of this paper is organized as follows. Section 2 briefly reviews the radix-2, radix-2², and radix-2/4/8 FFT algorithms. Section 3 proposes a new VL-FFT architecture and compares it with other architectures. Section 4 presents the detailed design of our VL-FFT processor and the simulation results. Finally, Section 5 concludes the paper.

Table 1. FFT sizes and sampling rates needed in various OFDM systems.

| Communication system | FFT size (sampling rate) |
|---|---|
| 802.11a | 64 (20 MHz) |
| DAB | 2,048 (2 MHz), 1,024 (2 MHz), 512 (2 MHz), 256 (2 MHz) |
| DVB-T | 8,192 (8 MHz), 2,048 (8 MHz) |
| DVB-H | 4,096 (8 MHz) |
| ADSL | 512 (2.2 MHz) |
| VDSL | 8,192 (34.5 MHz), 4,096 (17.3 MHz), 2,048 (8.6 MHz), 1,024 (4.3 MHz), 512 (2.2 MHz) |

Figure 2. The radix-2 butterfly graph.

2. The FFT Algorithms

This section describes the mathematical basis of the FFT algorithms that are used to develop our VL-FFT processor. The N-point discrete Fourier transform (DFT) of a sequence x[n] is defined as

$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \tag{1}$$

where the twiddle factor is $W_N^{nk} = e^{-j2\pi nk/N} = \cos\left(\frac{2\pi nk}{N}\right) - j\sin\left(\frac{2\pi nk}{N}\right)$. In general, FFT algorithms are derived by exploiting the symmetry properties of the twiddle factor shown in Fig. 1 [12].
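To make Eq. (1) concrete, the following Python sketch evaluates the DFT directly with O(N²) operations and checks the twiddle-factor symmetries that FFT algorithms exploit. The names `twiddle` and `dft` are illustrative, not taken from the paper; a direct DFT like this is mainly useful as a reference model for checking faster algorithms.

```python
import cmath


def twiddle(N: int, e: int) -> complex:
    """Twiddle factor W_N^e = exp(-j*2*pi*e/N), as defined after Eq. (1)."""
    return cmath.exp(-2j * cmath.pi * e / N)


def dft(x: list[complex]) -> list[complex]:
    """Direct O(N^2) evaluation of Eq. (1): X[k] = sum_n x[n] * W_N^(n*k)."""
    N = len(x)
    return [sum(x[n] * twiddle(N, n * k) for n in range(N)) for k in range(N)]


if __name__ == "__main__":
    N = 16
    # Symmetries of the twiddle factor exploited by FFT algorithms (cf. Fig. 1):
    # W_N^(N/2) = -1, W_N^(N/4) = -j, and periodicity W_N^(e+N) = W_N^e.
    assert abs(twiddle(N, N // 2) + 1) < 1e-12
    assert abs(twiddle(N, N // 4) + 1j) < 1e-12
    assert abs(twiddle(N, 3 + N) - twiddle(N, 3)) < 1e-12
    print(dft([0, 1, 2, 3]))
```

For the input `[0, 1, 2, 3]` the result is approximately `[6, -2+2j, -2, -2-2j]`, which the faster algorithms below must reproduce.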

2.1. Radix-2 DIF FFT Algorithm

The radix-2 decimation-in-frequency (DIF) FFT algorithm is obtained by decomposing the output frequency sequence X[k] into even-numbered and odd-numbered points [12]. Separating X[k] into X[2r] and X[2r+1], with $r = 0, 1, \ldots, N/2 - 1$, we obtain

$$X[2r] = \sum_{n=0}^{N/2-1} g[n]\, W_{N/2}^{rn} \tag{2}$$

$$X[2r+1] = \sum_{n=0}^{N/2-1} h[n]\, W_N^{n}\, W_{N/2}^{rn} \tag{3}$$

where $g[n] = x[n] + x[n + N/2]$ and $h[n] = x[n] - x[n + N/2]$. Combining Eqs. (2) and (3) gives the butterfly graph shown in Fig. 2, which can be represented in simplified form as shown in Fig. 3.

Figure 1. The twiddle factors of an N-point DFT.

Figure 3. The simplified radix-2 butterfly graph.

Figure 4. The radix-2² butterfly graph.
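Eqs. (2) and (3) translate directly into a recursive procedure. The Python sketch below is a behavioral model only (the name `fft_radix2_dif` is hypothetical, and it is not the paper's hardware): it applies the butterfly of Fig. 2 to the two halves of the input, then repeats the same decomposition on the two N/2-point sequences. It assumes N is a power of two.

```python
import cmath


def fft_radix2_dif(x: list[complex]) -> list[complex]:
    """Recursive radix-2 DIF FFT per Eqs. (2)-(3); len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    if N % 2:
        raise ValueError("length must be a power of two")
    half = N // 2
    # Butterfly of Fig. 2, sum branch: g[n] = x[n] + x[n + N/2].
    g = [x[n] + x[n + half] for n in range(half)]
    # Difference branch h[n] = x[n] - x[n + N/2], multiplied by W_N^n.
    h_tw = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / N) for n in range(half)]
    X = [0j] * N
    X[0::2] = fft_radix2_dif(g)     # Eq. (2): even-indexed outputs X[2r]
    X[1::2] = fft_radix2_dif(h_tw)  # Eq. (3): odd-indexed outputs X[2r+1]
    return X


if __name__ == "__main__":
    print(fft_radix2_dif([0, 1, 2, 3]))
```

Each recursion level performs N/2 butterflies and at most N/2 twiddle multiplications, which is where the O(N log N) cost of the FFT comes from.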

Figure 6. (a) Variable-length FFT structure in [4]. (b) Variable-length FFT structure in [5].

2.2. Radix-2² DIF FFT Algorithm

The motivation for the radix-2² algorithm is that the number of nontrivial multiplications can be further reduced in implementation [2]. The radix-2² algorithm has the same multiplicative complexity as the radix-4 algorithm but retains the radix-2 butterfly structure. The radix-2² DIF FFT algorithm is derived by recursively decimating the frequency sequence into four subsets. Substituting $k = 4r + 2s_2 + s_1$, with $s_1, s_2 \in \{0, 1\}$ and $r = 0, 1, \ldots, N/4 - 1$, into Eq. (1) gives

$$\begin{aligned}
X[4r + 2s_2 + s_1] &= \sum_{n=0}^{N/4-1} \Big\{ x[n] + x\big[n + \tfrac{N}{4}\big] W_4^{2s_2+s_1} + x\big[n + \tfrac{N}{2}\big] W_4^{4s_2+2s_1} + x\big[n + \tfrac{3N}{4}\big] W_4^{6s_2+3s_1} \Big\}\, W_N^{n(2s_2+s_1)}\, W_{N/4}^{nr} \\
&= \sum_{n=0}^{N/4-1} \Big\{ \big[x[n] + (-1)^{s_1} x\big[n + \tfrac{N}{2}\big]\big] + (-1)^{s_2}(-j)^{s_1} \big[x\big[n + \tfrac{N}{4}\big] + (-1)^{s_1} x\big[n + \tfrac{3N}{4}\big]\big] \Big\}\, W_N^{n(2s_2+s_1)}\, W_{N/4}^{nr}
\end{aligned} \tag{4}$$

The factor $(-1)^{s_2}(-j)^{s_1}$ involves only sign changes and exchanges of the real and imaginary parts, so the only nontrivial multiplication in each step is by $W_N^{n(2s_2+s_1)}$. Eq. (4) leads to the butterfly graph shown in Fig. 4. The radix-2² processing element (PE) consists of radix-2² PE1 and radix-2² PE2.

2.3. Radix-2/4/8 DIF FFT Algorithm

The radix-2/4/8 FFT algorithm is developed by combining the advantages of the radix-2, radix-2/4, and radix-2/8 algorithms [3]. Figure 5 shows the butterfly graph of the radix-2/4/8 FFT algorithm. The radix-2/4/8 PE consists of three processing elements: PE1, PE2, and PE3. The radix-2/4/8 algorithm is regular and has low complexity, which makes it suitable for VLSI implementation.

Figure 5. The radix-2/4/8 butterfly graph.
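The two decompositions above can be checked numerically. The Python sketch below implements one radix-2² step following Eq. (4), and one radix-2/4/8-style step modeled on Fig. 5: three layers of radix-2 butterflies with only trivial or constant factors (1, $W_8^1$, $-j$, $W_8^3$), followed by the twiddles $W_N^{qn}$ on outputs taken in bit-reversed order. The index mapping used for Fig. 5, $X[8s+q] = \sum_{n=0}^{N/8-1} \big(\sum_{m=0}^{7} x[n + mN/8]\, W_8^{mq}\big) W_N^{nq}\, W_{N/8}^{ns}$, is an assumption read off the figure's output ordering and twiddles rather than a formula stated in the text. The remaining N/4- and N/8-point transforms are evaluated with a direct DFT for brevity, and all function names are hypothetical.

```python
import cmath


def w(N: int, e: int) -> complex:
    """Twiddle factor W_N^e."""
    return cmath.exp(-2j * cmath.pi * e / N)


def dft(x: list[complex]) -> list[complex]:
    """Direct DFT, Eq. (1), used here for the remaining short transforms."""
    N = len(x)
    return [sum(x[n] * w(N, n * k) for n in range(N)) for k in range(N)]


def radix22_dif(x: list[complex]) -> list[complex]:
    """One radix-2^2 DIF step per Eq. (4); len(x) must be a multiple of 4."""
    N = len(x)
    q = N // 4
    X = [0j] * N
    for s1 in (0, 1):
        for s2 in (0, 1):
            sign = (-1) ** s1
            trivial = (-1) ** s2 * (-1j) ** s1  # only sign changes / re-im swaps
            y = [((x[n] + sign * x[n + 2 * q])
                  + trivial * (x[n + q] + sign * x[n + 3 * q]))
                 * w(N, n * (2 * s2 + s1))      # the only nontrivial twiddle
                 for n in range(q)]
            for r, v in enumerate(dft(y)):      # the W_{N/4}^{nr} sum
                X[4 * r + 2 * s2 + s1] = v
    return X


def bitrev3(p: int) -> int:
    """Reverse the 3-bit binary representation of p (0 <= p < 8)."""
    return int(f"{p:03b}"[::-1], 2)


def radix248_dif(x: list[complex]) -> list[complex]:
    """One radix-2/4/8-style DIF step modeled on Fig. 5; len(x) must be a multiple of 8."""
    N = len(x)
    e = N // 8
    ys = [[0j] * e for _ in range(8)]  # ys[q][n] feeds the outputs X[8s + q]
    for n in range(e):
        a = [x[n + m * e] for m in range(8)]
        # PE1: butterflies at distance 4; lower outputs times 1, W_8^1, -j, W_8^3.
        u = a[:]
        for m in range(4):
            u[m], u[m + 4] = a[m] + a[m + 4], (a[m] - a[m + 4]) * w(8, m)
        # PE2: butterflies at distance 2 in each half; lower outputs times 1 or -j.
        v = u[:]
        for g in (0, 4):
            for m in range(2):
                v[g + m], v[g + m + 2] = (u[g + m] + u[g + m + 2],
                                          (u[g + m] - u[g + m + 2]) * w(4, m))
        # PE3: butterflies at distance 1, then the twiddle W_N^(q*n) for output q.
        for p in range(0, 8, 2):
            q0, q1 = bitrev3(p), bitrev3(p + 1)
            ys[q0][n] = (v[p] + v[p + 1]) * w(N, q0 * n)
            ys[q1][n] = (v[p] - v[p + 1]) * w(N, q1 * n)
    X = [0j] * N
    for q in range(8):
        for s, val in enumerate(dft(ys[q])):
            X[8 * s + q] = val
    return X


if __name__ == "__main__":
    x = [complex(n % 5, (3 * n) % 7) for n in range(64)]
    ref = dft(x)
    for f in (radix22_dif, radix248_dif):
        print(f.__name__, max(abs(a - b) for a, b in zip(f(x), ref)))
```

Comparing both functions against `dft` on a 64-point input is a quick way to confirm the index mapping $k = 4r + 2s_2 + s_1$ and the bit-reversed output order X[8s], X[8s+4], X[8s+2], ..., X[8s+7] of Fig. 5.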

3. The Proposed VL-FFT Architecture

In this section, we propose a new variable-length FFT (VL-FFT) architecture that operates on data lengths of 512, 1,024, 2,048, 4,096, and 8,192 points, which are used in the DAB, DVB-T, and DVB-H systems.

Figure 7. (a) Variable-length FFT pipelined architecture in [4]. (b) Variable-length FFT pipelined architecture in [5].

Figure 8. The proposed variable-length FFT structure.

Figure 9. The SFG of the mixed-radix PE.

Figure 10. Detailed structure of the mixed-radix PE.

It is known that split-radix and high-radix algorithms have lower computational complexity. However, a high-radix algorithm cannot be applied to all the FFT sizes used in OFDM applications: some radix-2 stages must be inserted to handle $2^n$-point or $4^n$-point FFTs whose lengths are not a power of the high radix. The split-radix FFT algorithm has no such limitation, but compared with fixed-radix FFT algorithms it is more difficult to implement in hardware because of the irregularity of its signal flow graph (SFG). In view of these facts, we devise a mixed-radix FFT algorithm that combines the radix-2, radix-2², and radix-2/4/8 algorithms to develop our VL-FFT architecture.

Comparing three popular pipeline architectures for FFT implementation [3–6], we find that the single-path delay feedback (SDF) architecture has lower hardware requirements and a higher utilization rate. We therefore adopt the SDF architecture for our pipelined VL-FFT architecture. The variable-length FFT structures proposed in [4, 5] are shown in Fig. 6a, b, and their detailed architectures are shown in Fig. 7a, b. The structure of Fig. 6a, consisting of radix-4 PEs and a radix-2 PE, can compute FFTs of only 2,048 and 8,192 points, whereas the structure of Fig. 6b, consisting of radix-2 PEs and radix-2/4/8 PEs, can compute FFTs of 512, 1,024, 2,048, 4,096, and 8,192 points. Figure 7b shows that four multiplexers (MUXs) are needed for the pipelined architecture to handle these five lengths. If the input data enter the first stage, the architecture performs an 8,192-point FFT; if they bypass the first stage and enter the second, it performs a 4,096-point FFT. By the same reasoning for the remaining stages, the VL-FFT architecture of Fig. 7b can perform 2,048-, 1,024-, and 512-point FFTs with appropriate multiplexer settings.

We now propose the new VL-FFT structure shown in Fig. 8. By cascading a radix-2 PE, a mixed-radix PE, and three radix-2/4/8 PEs, the structure can perform all the FFT computations of 512, 1,024, 2,048, 4,096, and 8,192 points. The mixed-radix PE performs the radix-2, radix-2², and radix-2/4/8 FFT algorithms in different modes, as shown in Fig. 9. With appropriate switching, mixed-radix PE1 acts as either radix-2² PE1 or radix-2/4/8 PE1 in stage 2. Similarly, mixed-radix PE3 acts as the radix-2 PE, radix-2² PE2, or radix-2/4/8 PE3 in stage 4. Based on the observation that $W_{1024}^{0} = W_{4096}^{0}$, $W_{1024}^{n} = W_{4096}^{4n}$, $W_{2048}^{0} = W_{4096}^{0}$, $W_{2048}^{2n} = W_{4096}^{4n}$, $W_{2048}^{n} = W_{4096}^{2n}$, and $W_{2048}^{3n} = W_{4096}^{6n}$, we reduce the hardware complexity by substructure sharing between radix-2² PE1 and radix-2/4/8 PE1 in stage 2, and among the radix-2 PE, radix-2² PE2, and radix-2/4/8 PE3 in stage 4. The mixed-radix PE is realized effectively by switching so that the modes share the same substructure, as shown in Fig. 10. Table 2 shows the switch settings for each FFT length. In this way, we reduce both the multiplicative complexity and the number of complex multipliers.

Summing up the above analysis, we develop the VL-FFT pipelined architecture shown in Fig. 11, which can perform FFTs of 512, 1,024, 2,048, 4,096, and 8,192 points. The multiplexers switch between the FFT lengths, and the number of delay elements decreases by half in every stage. To evaluate the efficiency of the proposed VL-FFT architecture, we compare it with other architectures in Table 3. Table 3 shows that our VL-FFT architecture requires fewer nontrivial complex multiplications than [4] and [5] for the longer FFT lengths, and the same number as [5] for 512 and 1,024 points.

Table 2. Action of switches in different modes (1 = ON, 0 = OFF).

| Switch | 512 | 1,024 | 2,048 | 4,096 | 8,192 |
|---|---|---|---|---|---|
| S1 | 0 | 0 | 1 | 1 | 1 |
| S2 | 0 | 0 | 0 | 1 | 1 |
| S3 | 0 | 0 | 0 | 1 | 1 |
| S4 | 0 | 1 | 1 | 1 | 1 |
| S5 | 0 | 0 | 1 | 1 | 1 |
| S6 | 0 | 0 | 0 | 1 | 1 |

Figure 11. The proposed VL-FFT pipelined architecture.

Table 3. Comparison of VL-FFT architectures.

| Characteristics | Wang [4] | Lin [5] | Proposed |
|---|---|---|---|
| FFT size (N) | 2,048/8,192 | 512/1,024/2,048/4,096/8,192 | 512/1,024/2,048/4,096/8,192 |
| Algorithm | Radix-2, Radix-4 | Radix-2, Radix-2/4/8 | Radix-2, Radix-2², Radix-2/4/8 |
| Architecture | SDF | SDF | SDF |
| Data memory | N − 1 | N − 1 | N − 1 |
| Nontrivial twiddle-factor multiplications, N = 512 | n/a | 824 | 824 |
| Nontrivial twiddle-factor multiplications, N = 1,024 | n/a | 2,158 | 2,158 |
| Nontrivial twiddle-factor multiplications, N = 2,048 | 6,146 | 5,338 | 4,826 |
| Nontrivial twiddle-factor multiplications, N = 4,096 | n/a | 12,722 | 10,160 |
| Nontrivial twiddle-factor multiplications, N = 8,192 | 30,722 | 29,538 | 24,430 |

4. The Design of VL-FFT Processor

In the following, we present the detailed realization of the processing elements and other components of the proposed VL-FFT architecture.

4.1. Radix-2 PE

Figure 12. The radix-2 PE.

Figure 12 shows the realization of the radix-2 PE, which is composed of an FFT butterfly, delay elements, a ROM storing twiddle factors, and a complex multiplier.

4.2. Radix-2/4/8 PE

Figure 13 shows the realization of the radix-2/4/8 PE, which contains radix-2/4/8 PE1, radix-2/4/8 PE2, and radix-2/4/8 PE3. It is realized with three FFT butterflies, delay elements, a constant multiplier, a ROM storing twiddle factors, and a complex multiplier.

Figure 13. The radix-2/4/8 PE.

4.3. Mixed-radix PE

As shown in Fig. 14, the mixed-radix PE consists of mixed-radix PE1, mixed-radix PE2, and mixed-radix PE3. A multiplexer (MUX 2) selects the size of the delay elements in mixed-radix PE1. By routing the output labeled A (B) to the input of the FFT butterfly, mixed-radix PE1 performs the function of radix-2² PE1 (radix-2/4/8 PE1). Because these modes share the same twiddle factors, mixed-radix PE3 can efficiently perform the function of the radix-2 PE, radix-2² PE2, or radix-2/4/8 PE3, as depicted in Fig. 9.

Figure 14. The mixed-radix PE.
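The twiddle-factor identities used for substructure sharing in Section 3, the switch settings of Table 2, and the halving of delay lengths from stage to stage can be summarized in a small Python model. It is a hypothetical sketch under stated assumptions: a single 4,096-entry twiddle table (`ROM_SIZE`) that every mode whose length divides 4,096 could address, and a plain radix-2 SDF delay chain of N/2, N/4, ..., 1 words that ignores the delay-size multiplexing (MUX 2) inside the mixed-radix PE. The names `shared_rom_address` and `sdf_delays` are illustrative only.

```python
import cmath
import math

ROM_SIZE = 4096  # hypothetical shared twiddle table holding W_4096^0 .. W_4096^4095

# Table 2: switch states (S1, ..., S6) for each FFT length; 1 = ON, 0 = OFF.
SWITCHES = {
    512: (0, 0, 0, 0, 0, 0),
    1024: (0, 0, 0, 1, 0, 0),
    2048: (1, 0, 0, 1, 1, 0),
    4096: (1, 1, 1, 1, 1, 1),
    8192: (1, 1, 1, 1, 1, 1),
}


def w(length: int, e: int) -> complex:
    """Twiddle factor W_length^e = exp(-j*2*pi*e/length)."""
    return cmath.exp(-2j * cmath.pi * e / length)


def shared_rom_address(length: int, e: int) -> int:
    """Address a with W_length^e == W_4096^a, e.g. W_1024^n = W_4096^(4n).

    Valid whenever `length` divides ROM_SIZE.
    """
    if ROM_SIZE % length:
        raise ValueError("length must divide ROM_SIZE")
    return (e * (ROM_SIZE // length)) % ROM_SIZE


def sdf_delays(n_points: int) -> list[int]:
    """Delay-line lengths of an n_points radix-2 SDF pipeline: N/2, N/4, ..., 1."""
    stages = int(math.log2(n_points))
    return [n_points >> s for s in range(1, stages + 1)]


if __name__ == "__main__":
    for n in (1, 7, 300):
        # Identities quoted in Section 3 for substructure sharing.
        assert abs(w(1024, n) - w(4096, shared_rom_address(1024, n))) < 1e-12
        assert abs(w(2048, 3 * n) - w(4096, 6 * n)) < 1e-12
    for n_points, switches in SWITCHES.items():
        print(n_points, switches, "delay words:", sum(sdf_delays(n_points)))
```

Summing `sdf_delays(N)` gives N − 1 for every supported length, which agrees with the data-memory row of Table 3.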

4.4. FFT Butterfly

Every PE of the proposed architecture contains FFT butterflies. Figure 15 shows the realization of the FFT butterfly, which outputs the sum and the difference of its two inputs on the upper and lower branches, respectively.

Figure 15. FFT butterfly.

4.5. Complex Multiplier

There are three types of multiplication in the VL-FFT architecture: multiplication by −j, multiplication by a constant twiddle factor, and multiplication by a general complex twiddle factor.

(a) Multiplication by −j: Based on Eq. (5), multiplication by −j can be realized without a multiplier, simply by interchanging the real and imaginary parts of the operand and negating the new imaginary part, as shown in Fig. 16.

$$(a + bj)(-j) = b - aj \tag{5}$$

Figure 16. Block diagram of multiplication by −j.

(b) Multiplication by a constant twiddle factor: The complex multiplications by $W_N^{N/8}$ and $W_N^{3N/8}$ can be expressed, respectively, as

$$(a + jb)\, W_N^{N/8} = (a + jb)\left(\frac{\sqrt{2}}{2} - j\frac{\sqrt{2}}{2}\right) = \frac{\sqrt{2}}{2}\big[(a + b) + j(b - a)\big] \tag{6}$$

$$(a + jb)\, W_N^{3N/8} = (a + jb)\left(-\frac{\sqrt{2}}{2} - j\frac{\sqrt{2}}{2}\right) = \frac{\sqrt{2}}{2}\big[(b - a) - j(b + a)\big] \tag{7}$$

Thus, each of these complex multiplications can be realized with two real additions and two real multiplications, as shown in Fig. 17a, b. Moreover, the real multiplication by $\sqrt{2}/2$ can be approximated as

$$\frac{\sqrt{2}}{2} = 0.70710678 \approx 2^{-1} + 2^{-3} + 2^{-4} + 2^{-6} + 2^{-8} + 2^{-9}$$

so the multiplication by $\sqrt{2}/2$ can be replaced by shifters and five real adders, as depicted in Fig. 18.

Figure 17. (a) Multiplication with $W_N^{N/8}$. (b) Multiplication with $W_N^{3N/8}$.

Figure 18. Implementation hardware of multiplication with $\sqrt{2}/2$.

(c) Multiplication by a complex twiddle factor: Based on Eq. (8), a complex multiplier can be realized with four real multiplications and two real additions, as shown in Fig. 19.

$$(a + bj)(c + dj) = (ac - bd) + j(bc + ad) \tag{8}$$

Figure 19. Complex multiplier with four real multiplications and two real additions.

This complex multiplier occupies a large chip area in hardware implementations. Fortunately, it can be realized more efficiently with three real multiplications and five real additions, as shown in Fig. 20, based on Eq. (9).

$$(a + bj)(c + dj) = \big[c(a - b) + b(c - d)\big] + j\big[d(a + b) + b(c - d)\big] \tag{9}$$
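The three multiplication types of Section 4.5 can be modeled behaviorally as below. This is an illustrative Python sketch, not the paper's datapath: the function names are hypothetical, and the constant multipliers use a floating-point $\sqrt{2}/2$ except in `shift_add_sqrt2_2`, which applies the shift exponents listed above to an integer input using arithmetic right shifts (which truncate, as an assumed fixed-point datapath would).

```python
import cmath
import math

SQRT2_2 = math.sqrt(2) / 2
SHIFTS = (1, 3, 4, 6, 8, 9)  # exponents of the shift-add expansion of sqrt(2)/2 above


def mul_minus_j(a: float, b: float) -> tuple[float, float]:
    """Eq. (5): (a + jb)(-j) = b - ja; swap the parts and negate one of them."""
    return b, -a


def mul_w_n8(a: float, b: float) -> tuple[float, float]:
    """Eq. (6): (a + jb) W_N^(N/8) = (sqrt2/2)[(a + b) + j(b - a)]."""
    return SQRT2_2 * (a + b), SQRT2_2 * (b - a)


def mul_w_3n8(a: float, b: float) -> tuple[float, float]:
    """Eq. (7): (a + jb) W_N^(3N/8) = (sqrt2/2)[(b - a) - j(b + a)]."""
    return SQRT2_2 * (b - a), -SQRT2_2 * (b + a)


def shift_add_sqrt2_2(v: int) -> int:
    """Approximate v * sqrt(2)/2 with six shifted copies and five adders (cf. Fig. 18)."""
    return sum(v >> s for s in SHIFTS)


def cmul3(a: float, b: float, c: float, d: float) -> tuple[float, float]:
    """Eq. (9): (a + jb)(c + jd) with three real multiplications and five additions."""
    t = b * (c - d)  # product shared by the real and imaginary parts
    return c * (a - b) + t, d * (a + b) + t


if __name__ == "__main__":
    z = 1.5 - 0.25j
    assert abs(complex(*mul_minus_j(z.real, z.imag)) - z * -1j) < 1e-12
    assert abs(complex(*mul_w_n8(z.real, z.imag)) - z * cmath.exp(-1j * math.pi / 4)) < 1e-12
    assert abs(complex(*mul_w_3n8(z.real, z.imag)) - z * cmath.exp(-3j * math.pi / 4)) < 1e-12
    assert abs(complex(*cmul3(1.5, -0.25, 0.6, -0.8)) - z * (0.6 - 0.8j)) < 1e-12
    print(shift_add_sqrt2_2(4096) / 4096, SQRT2_2)
```

Running the script checks Eqs. (5) to (9) against ordinary complex multiplication; the printed ratio shows how closely the listed shift terms approximate $\sqrt{2}/2$ for a 12-bit-scale input.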

Figure 20. Complex multiplier with three real multiplications and five real additions.

4.6. ROM Table

All the twiddle factors required for the FFT computations are stored in advance in a ROM table. To reduce the ROM size, only the cosine and sine values of angles between 0 and π/4 are stored. Using the symmetry of the twiddle factors, the cosine and sine values of the other angles are generated from the stored samples.

The proposed VL-FFT processor has been modeled in Verilog HDL and synthesized with the Synopsys cell library. Two kinds of memory model are used: shift registers and SRAM. Table 4 shows the synthesis results of our VL-FFT design. The SRAM-based design is better than the shift-register-based design in both area and power: it occupies 2.9 mm² and functions correctly at up to 50 MHz with a power consumption of 823 mW under a 1.8 V supply voltage, whereas the shift-register-based design occupies 19 mm² and consumes 1.51 W at 50 MHz.

Table 4. The synthesis results of our VL-FFT design.

| Characteristics | Values |
|---|---|
| Process | UMC 0.18 µm |
| FFT size | 512/1,024/2,048/4,096/8,192 |
| Word length | 12 bits |
| Voltage | 1.8 V |
| Maximum speed | 50 MHz |
| Area | Shift register based: 19 mm²; SRAM based: 2.9 mm² |
| Power | Shift register based: 1.51 W (50 MHz); SRAM based: 823 mW (50 MHz) |

The comparison with other FFT designs is given in Table 5, including FFT size, word length, algorithm, FFT architecture, technology process, supply voltage, area, clock rate, and power consumption. Because these FFT designs use different CMOS technologies and different FFT sizes, we adopt the normalized area to evaluate the silicon cost [4]:

$$\text{Normalized area} = \frac{\text{Area}}{\left(\text{Technology}/0.18\,\mu\text{m}\right)^2} \tag{10}$$

Our VL-FFT design is area-efficient, since it requires the smallest normalized area.

5. Conclusions

This paper proposes an area-efficient design of a VL-FFT processor. The proposed processor is programmable to select FFT lengths of 512, 1,024, 2,048, 4,096, and 8,192 points, which makes it suitable for various OFDM-based communication systems such as digital audio broadcasting (DAB), digital video broadcasting-terrestrial (DVB-T), and digital video broadcasting-handheld (DVB-H). Synthesized in the UMC 0.18 µm process, our VL-FFT processor occupies 2.9 mm², and it performs the 8,192-point FFT correctly at up to 50 MHz with a power consumption of 823 mW under a 1.8 V supply voltage. The processor is area-efficient because it reduces the number of nontrivial multiplications and optimizes the realization by substructure sharing. Our future research will pursue a low-power design using other low-power techniques.

Table 5. Comparison with other FFT designs.

| Characteristics | Bidet [10] | He [2] | Jia [3] | Wang [4] | Lin [5] | Proposed design |
|---|---|---|---|---|---|---|
| FFT size | 8,192 | 1,024 | 8,192 | 2,048/8,192 | 512/1,024/2,048/4,096/8,192 | 512/1,024/2,048/4,096/8,192 |
| Word length (bits) | 12 | 12 | 12 | 8 | 12 | 12 |
| Algorithm | Radix-2, Radix-4 | Radix-2² | Radix-2/4/8 | Radix-2, Radix-4 | Radix-2, Radix-2/4/8 | Radix-2, Radix-2², Radix-2/4/8 |
| FFT architecture | SDF | SDF | SDF | SDF | SDF | SDF |
| Process (µm) | 0.5 | 0.5 | 0.6 | 0.35 | 0.35 | 0.18 |
| Voltage (V) | 3.3 | 3.3 | 3.3 | 3.3 | 3.3 | 1.8 |
| Area (mm²) | 100 | 40 | 107 | 33.75 | 13.05 | 2.9 |
| Normalized area (mm²) | 12.96 | 5.18 | 9.63 | 8.93 | 3.45 | 2.9 |

Clock rate (MHz): 20, 20, 20, 16, 45.45
Power (mW), N = 8,192: 600, 650, 535, 20, 50, 279, 823
Power (mW), N = 2,048: 640, 198, 581
Power (mW), N = 1,024: 480, 160, 466.5
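Eq. (10) can be applied to the raw areas and process nodes in Table 5 to reproduce the normalized-area row. The short Python sketch below does this; the dictionary simply transcribes the table, and `normalized_area` is an illustrative name.

```python
# Area (mm^2) and process node (um) for each design, transcribed from Table 5.
DESIGNS = {
    "Bidet [10]": (100.0, 0.5),
    "He [2]": (40.0, 0.5),
    "Jia [3]": (107.0, 0.6),
    "Wang [4]": (33.75, 0.35),
    "Lin [5]": (13.05, 0.35),
    "Proposed": (2.9, 0.18),
}


def normalized_area(area_mm2: float, tech_um: float, ref_um: float = 0.18) -> float:
    """Eq. (10): Area / (Technology / 0.18 um)^2, i.e. the area scaled to 0.18 um."""
    return area_mm2 / (tech_um / ref_um) ** 2


if __name__ == "__main__":
    for name, (area, tech) in DESIGNS.items():
        print(f"{name:<11} {normalized_area(area, tech):6.2f} mm^2")
```

Each computed value rounds to the corresponding entry of the normalized-area row (for example, 13.05 mm² at 0.35 µm scales to about 3.45 mm²).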

References

1. J.W. Cooley and J.W. Tukey, “An Algorithm for the Machine Calculation of Complex Fourier Series,” Math. Computation, vol. 19, 1965, pp. 297–301, April.
2. S. He and M. Torkelson, “Designing Pipeline FFT Processor for OFDM (de)Modulation,” IEEE Sig. Syst. Electron., 1998, pp. 257–262, Oct.
3. L. Jia, Y. Gao, J. Isoaho, and H. Tenhunen, “A New VLSI-Oriented FFT Algorithm and Implementation,” IEEE ASIC Conference, 1998, pp. 337–341.
4. C.C. Wang, J.M. Huang, and H.C. Cheng, “A 2K/8K Mode Small-Area FFT Processor for OFDM Demodulation of DVB-T Receivers,” IEEE Trans. Consum. Electron., vol. 51, no. 1, 2005, pp. 28–32, February.
5. Y.T. Lin, P.Y. Tsai, and T.D. Chiueh, “Low-Power Variable-Length Fast Fourier Transform Processor,” IEE Proc. Comput. Digit. Tech., vol. 152, no. 4, 2005, pp. 499–506, July.
6. S. He and M. Torkelson, “A New Approach to Pipeline FFT Processor,” IEEE Parallel Processing Symposium, 1996, pp. 766–770, April.
7. K.K. Parhi, “VLSI Digital Signal Processing Systems: Design and Implementation,” Wiley, 1999.
8. W. Li and L. Wanhammar, “A Pipeline FFT Processor,” IEEE Signal Processing Systems, 1999, pp. 654–662, Oct.
9. L. Jia, B. Li, Y. Gao, and H. Tenhunen, “Implementation of a Low Power 128-Point FFT,” IEEE Solid-State and Integrated Circuit Technology, 1998, pp. 369–372, Oct.
10. E. Bidet, D. Castelain, C. Joanblanq, and P. Senn, “A Fast Single-Chip Implementation of 8192 Complex Point FFT,” IEEE J. Solid-State Circuits, vol. 30, no. 3, 1995, March.
11. B.M. Baas, “A Low-Power, High-Performance, 1024-Point FFT Processor,” IEEE J. Solid-State Circuits, vol. 34, no. 3, 1999, March.
12. A.V. Oppenheim and R.W. Schafer, “Discrete-Time Signal Processing,” Prentice-Hall, 1999.
13. P. Duhamel and H. Hollmann, “Split-radix FFT algorithm,” Electron. Lett., vol. 20, no. 1, 1984, pp. 14–16, Jan.
14. S.Y. Park, N.I. Cho, S.U. Lee, K. Kim, and J. Oh, “Design of 2K/4K/8K-point FFT processor based on CORDIC algorithm in OFDM receiver,” IEEE Communications, Computers and Signal Processing, vol. 2, 2001, pp. 457–460, Aug.
15. C.P. Hung, S.G. Chen, and K.L. Chen, “Design of an efficient variable-length FFT processor,” IEEE ISCAS, vol. 2, 2004, pp. 833–836, May.
16. A. Sadat and W.B. Mikhael, “Fast Fourier Transform for high speed OFDM wireless multimedia system,” Proceedings of the 44th IEEE 2001 Midwest Symposium on Circuits and Systems, vol. 2, 2001, pp. 938–942, 14–17 Aug.
17. B.M. Baas, “A low-power, high-performance, 1024-point FFT processor,” IEEE J. Solid-State Circuits, vol. 34, 1999, pp. 380–387, March.
18. A.M. El-Khashab and E.E. Swartzlander, Jr., “The modular pipeline fast Fourier transform algorithm and architecture,” Conference Record of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, vol. 2, 2003, pp. 1463–1467.
19. S. Yu and E.E. Swartzlander, Jr., “A pipelined architecture for the multidimensional DFT,” IEEE Trans. Signal Process., vol. 49, no. 9, 2001, pp. 2096–2102.
20. A.M. El-Khashab and E.E. Swartzlander, Jr., “An architecture for a radix-4 modular pipeline fast Fourier transform,” Proceedings of the 14th International Conference on Application-Specific Systems, Architectures and Processors, 2003.

Shuenn-Shyang Wang received the M.S. and Ph.D. degrees in Electrical Engineering from Tatung Institute of Technology, Taipei, in 1985 and 1987, respectively. He was an Instructor at the Military Police School, Taipei, from 1987 to 1989. Since 1989, he has been with the Department of Electrical Engineering, Tatung University, Taipei, where he is currently a Professor. His research interests include digital signal processing, image/video processing, VLSI implementation for DSP, and network security.

Chien-Sung Li received the B.Sc. degree in Electrical Engineering from Chung Hua University, Hsinchu, Taiwan, in 2002 and the M.S. degree in Electrical Engineering from Tatung University, Taipei, in 2005. His research interests include digital signal processing and VLSI implementation for DSP.