IDENTIFICATION OF GROUND VEHICLE’S AERODYNAMIC DERIVATIVES USING NEURAL NETWORK

NABILAH BINTI RAMLI

A thesis submitted in fulfilment of the requirements for the award of the degree of Master of Engineering (Mechanical)

Faculty of Mechanical Engineering Universiti Teknologi Malaysia

March 2008


To my beloved mother, father, sisters and brothers


ACKNOWLEDGEMENTS

In the name of Allah, the Most Gracious and the Most Merciful. I thank Allah for the ability to conduct this research and to finally complete this thesis.

I wish to express my sincere gratitude and appreciation to all my supervisors for their efforts in guiding me throughout the research period. To Professor Dr. Hishamuddin bin Jamaluddin for his supervision and criticism, and for sharing his expertise and his time in reviewing this thesis. To my co-supervisor, Dr. Shuhaimi Mansor, for his assistance, comments and guidance, and for his permission to reproduce his experimental data. Special thanks are extended to Dr. Waleed Fekry Faris, my external supervisor from International Islamic University Malaysia, for his continuous motivation, support and attention. I would also like to thank all the personnel I have been in contact with for their opinions and advice, which contributed to my understanding and thoughts.

In addition, I would like to thank Universiti Teknologi Malaysia for the opportunity to conduct my research here, International Islamic University Malaysia for the opportunity to further my education, and the Malaysian Ministry of Higher Education for funding my Master's degree.

Finally, my warmest appreciation to my beloved family: my parents, Ramli Muda and Nik Shiham Wan Mashhor, and my brothers and sisters, for their support, motivation and prayers in successfully completing the degree.


ABSTRACT

The stability of a ground vehicle when it encounters sudden crosswind disturbances depends on its aerodynamic characteristics. The aerodynamic side force and yaw moment have been identified as the main influences on a vehicle's sensitivity to crosswind, which is largely shape related, and a reliable identification technique is a prerequisite for estimating them. One recent technique in wind tunnel testing uses a pure yawing motion test rig to simulate the transient behavior of a simple vehicle model in crosswind conditions. Following the stiffness and damping approach, the lateral aerodynamic derivatives are evaluated from the identified frequency and damping of the pure yawing motion. This research explores an alternative to the conventional identification method, in which the system frequency is identified from a spectral density plot and the damping is estimated from the logarithmic decrement of peak amplitude of recorded impulse response data. The present study designs a multilayer feedforward neural network, trained with the Bayesian regularization algorithm, to estimate the natural frequency and damping ratio. The network properties studied give insight into the optimum network architecture, the suitable input representation and the effect of noise, and the possibility of using principal component analysis to reduce the network input dimension is also explored. The results show that the neural network can approximate the natural frequency and damping ratio of impulse response data and can handle noisy input data. Applying principal component analysis reduces the network input dimension while maintaining good estimation results and shortening the network training period. This study demonstrates that the frequency and damping of the system can be identified using a neural network, and that the approach can be applied to other similar systems.


ABSTRAK

The stability of a ground vehicle depends on its aerodynamic characteristics when it meets sudden crosswind disturbances. The aerodynamic side force and yaw moment are the main influences determining a vehicle's sensitivity to crosswind, and their values are related to the vehicle's shape. A reliable technique for identifying the aerodynamic force and moment values is therefore a prerequisite. One of the latest techniques in wind tunnel testing is the use of a pure yawing motion rig to simulate the transient behavior of a simple vehicle model under crosswind conditions. Using the stiffness and damping approach, the lateral aerodynamic derivative values are obtained from the system's stiffness and damping. This research explores an alternative identification technique apart from the conventional one, in which the stiffness value is identified through a spectral density plot and the damping value is obtained from the logarithmic decrement of peak amplitude; this identification method is based on the recorded impulse response of the system. The study aims to develop a multilayer feedforward neural network, trained with the Bayesian regularization algorithm, to identify the natural frequency and damping ratio values. The network properties studied are needed to give insight into producing the optimum network architecture, the suitable input representation, and the effect of noise. The study also examines the possibility of using principal component analysis to reduce the neural network input dimension. The results show that the neural network can estimate the natural frequency and damping ratio of an impulse response and can also handle noise-corrupted input data. The application of principal component analysis reduces the network input dimension while maintaining good estimates and shortening the training period. This study has demonstrated a method of identifying the frequency and damping of a system by neural network, and it can be applied to other similar systems.


TABLE OF CONTENTS

DECLARATION
DEDICATION
ACKNOWLEDGEMENTS
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS
LIST OF ABBREVIATIONS
LIST OF APPENDICES

1. INTRODUCTION
   1.1 Motivation
   1.2 Problem Statement
   1.3 Research Objectives
   1.4 Research Methodology
   1.5 Scope of work
   1.6 Organization of the thesis

2. LITERATURE REVIEW
   2.1 Introduction
   2.2 Overview of research area in ground vehicle's crosswind stability study
   2.3 Development in dynamic experimental approach
   2.4 System identification in the area of aerodynamic derivatives
   2.5 Neural network
   2.6 Principal component analysis for input optimization
   2.7 Summary

3. IDENTIFICATION TECHNIQUE: CONVENTIONAL METHOD
   3.1 Introduction
   3.2 Overview of the Investigation
   3.3 Background on Experimental Work
       3.3.1 Oscillation frequency selection
       3.3.2 Equation of motion of the dynamic test rig
       3.3.3 Moment of Inertia Estimation
       3.3.4 Estimation of side force derivatives
   3.4 Parameter Identification: Conventional Technique
       3.4.1 The general solution for natural frequency, ωn, and damping ratio, ζ
       3.4.2 Estimation of yaw stiffness and damping derivatives
       3.4.3 Estimation of the damped frequency and time to half amplitude
   3.5 Identification Technique: Results and Discussions
   3.6 Conclusion

4. IDENTIFICATION TECHNIQUE: NEURAL NETWORK
   4.1 Introduction
   4.2 Training and testing data set
       4.2.1 Training samples distribution
   4.3 Multilayer Neural Network
       4.3.1 The selection of activation function
   4.4 Training Algorithm
       4.4.1 Procedure for Bayesian Regularization
   4.5 Network Properties
       4.5.1 Assignment of initial weights and biases
       4.5.2 Number of input cycles
       4.5.3 Number of input nodes
       4.5.4 Training samples population
       4.5.5 Effect of number of nodes in the hidden layer
       4.5.6 Effect of number of hidden layers
       4.5.7 Effect of noise
   4.6 Application of Neural Network
       4.6.1 Aerodynamic derivatives estimation procedure
   4.7 Results and Discussion
       4.7.1 Simulation of the Network with Measured Data
   4.8 Conclusion

5. NEURAL NETWORK: INPUT NODE OPTIMIZATION USING PCA
   5.1 Introduction
   5.2 Principles of PCA
       5.2.1 Procedures for PCA
       5.2.2 Selection of the principal components
   5.3 MLNN Design
       5.3.1 Number of training samples
       5.3.2 Input node selection
       5.3.3 Hidden layer and hidden nodes
       5.3.4 Effect of noise
       5.3.5 Results
       5.3.6 Discussion
   5.4 Application with the Measured Data
       5.4.1 Resampling of the data
       5.4.2 Preprocessing of the input data
       5.4.3 Estimation of Aerodynamic Derivatives
   5.5 Results and Discussion
       5.5.1 The recommended network design
       5.5.2 Estimation of the aerodynamic derivatives
   5.6 Conclusion

6. CONCLUSION
   6.1 Conclusion
   6.2 Contribution of the Research Work
   6.3 Recommendations for Future Research

REFERENCES

Appendices A-D


LIST OF TABLES

Table 3.1  Specification of Davis Model (Mansor, 2006)
Table 3.2  Measured reduced frequency for the respective 8 springs at 10 m/s wind speed (Mansor, 2006)
Table 3.3  The aerodynamic derivatives results at tunnel speed of 10 m/s
Table 3.4  Measurement of yaw moment derivative at two measurement axes
Table 4.1  Estimation error by MLNN for the training set and test set
Table 4.2  Generalization of the network with two hidden layers
Table 4.3  Neural network architectures used to study the effect of hidden layer structure on neural network performance
Table 4.4  Neural network structures for number of input nodes investigation
Table 4.5  Results of neural network training with different number of hidden nodes
Table 4.6  Recommended input representation and MLNN design parameters
Table 4.7  The estimation of noisy time response data by neural network and conventional method
Table 4.8  Aerodynamic derivatives for side force and yaw moment obtained from conventional method and neural network
Table 5.1  Single and two hidden layers network structure
Table 5.2  Recommended MLNN design for input optimization with PCA
Table 5.3  Results for aerodynamic derivatives from MLNN utilizing PCA for input optimization compared with the results obtained conventionally
Table 5.4  Yaw moment derivatives estimated by MLNN with and without application of PCA


LIST OF FIGURES

Figure 1.1   Overall research process
Figure 1.2   Training process of the MLNN
Figure 2.1   The six components of positive direction of aerodynamic forces and moments on simple automotive body type
Figure 3.1   Oscillating model in the wind tunnel working section (Mansor, 2006)
Figure 3.2   Free body diagram for the free oscillation system
Figure 3.3   Two axis measurement tests conducted to estimate the side force derivatives
Figure 3.4   Impulse response of a simple oscillator
Figure 3.5   Graphical representation of natural frequency, damping ratio and damped frequency
Figure 3.6   Power spectral density obtained by Fourier transform of the measured time response data
Figure 3.7   The time series response with the curve fitted peak amplitude
Figure 3.8   Torsional stiffness versus measured wind-off damped frequency to estimate the moment of inertia of the oscillating model rig for middle reference axis and front reference axis
Figure 3.9   Damped frequency obtained from PSD
Figure 3.10  The determination of time to half amplitude
Figure 3.11  Aerodynamic yaw moment derivatives, Cnβ and Cnr
Figure 3.12  Aerodynamic side force derivatives, Cyβ and Cyr
Figure 4.1   Standard plot for second order system impulse response
Figure 4.2   Three layer feedforward network
Figure 4.3   Commonly used nonlinear sigmoidal function
Figure 4.4   Neural network architecture
Figure 4.5   Network performance index with five different initial values
Figure 4.6   Network performance with increasing number of impulse response cycles used
Figure 4.7   Performance index of the network as the input nodes increase
Figure 4.8   Random training samples distribution
Figure 4.9   Results from neural network after being trained with different numbers of training samples
Figure 4.10  The improvement in network performance with increasing number of hidden nodes
Figure 4.11  Network performance with increasing number of hidden nodes in the second hidden layer
Figure 4.12  Noise injected into the generated impulse response, y
Figure 4.13  Percentage error on the test set for ωn values
Figure 4.14  Percentage error on the test set for ζ values
Figure 4.15  The percentage error for ωn values for networks with increasing number of input nodes
Figure 4.16  The percentage error for ζ values for networks with increasing number of input nodes
Figure 4.17  Neural network percentage error in estimating ωn values for the test set
Figure 4.18  Neural network percentage error in estimating ζ values for the test set
Figure 4.19  Training results of the recommended MLNN design
Figure 4.20  The variation of ωn and ζ throughout the response
Figure 4.21  Validation test for conventional method and neural network
Figure 4.22  Mean squared error (MSE) from both techniques
Figure 4.23  The simulated data compared with the real measured data for spring K4
Figure 5.1   Input representation (a) 8 s input data (b) 3 cycles input data and period of 3 cycles
Figure 5.2   Scree graph of the correlation matrix. The arrows show the number of PCs that are probably retained (Gordon et al., 2007)
Figure 5.3   Data transformation into principal components
Figure 5.4   The revised training samples
Figure 5.5   The performance of the network designed with different numbers of principal components used as input nodes
Figure 5.6   (a) The variance in each component number (b) The percentage of total variance of the component number used
Figure 5.7   Samples of two impulse responses in time sequence
Figure 5.8   The transformation of impulse responses into their principal components
Figure 5.9   Network performance in estimating ωn values
Figure 5.10  Network performance in estimating ζ values
Figure 5.11  The application of PCA in reducing the input dimension of the neural network
Figure 5.12  MLNN estimation of ωn values
Figure 5.13  MLNN estimation of ζ values


LIST OF SYMBOLS

A     - Frontal model area
a     - Neuron output
b     - Biases
Ca    - Aerodynamic damping
cg    - Center of gravity
Cnβ   - Aerodynamic yaw moment derivatives
Cnr   - Aerodynamic yaw damping derivatives
cp    - Center of pressure
Cr    - Mechanical damping
Cy    - Side force coefficient
Cyβ   - Aerodynamic side force stiffness derivatives
Cyr   - Aerodynamic side force damping derivatives
D     - Aerodynamic Drag
Freg  - Regularization objective function
F     - Force
Fw    - Sum of squared weights
f     - Neuron activation function
f     - Frequency
fd    - Damped frequency
H     - Hessian matrix
I     - Identity matrix
Izz   - Model rig yaw moment of inertia
J     - Jacobian matrix
Ka    - Aerodynamic stiffness
Kr    - Mechanical stiffness
Ks    - Spring linear stiffness
km    - Reduced frequency
L     - Aerodynamic Lift
l     - Characteristic model length
lcp   - Distance between cp and cg
M     - Aerodynamic Pitching moment
N, Na - Aerodynamic yaw moment
Nβ    - Aerodynamic yaw moment stiffness
N̂β    - Normalized aerodynamic yaw moment stiffness
Nr    - Aerodynamic yaw moment damping
N̂r    - Normalized aerodynamic yaw moment damping
n     - Induced local field
R     - Aerodynamic Rolling moment
Re    - Reynolds number
r     - Yaw rate
T     - Torque
t     - Time
t1/2  - Time to half amplitude
U     - Freestream air velocity
V     - Air velocity
V     - Neuron performance function
w     - Weights
Y     - Aerodynamic Side force
Yβ    - Aerodynamic side force stiffness
α, β  - Regularization parameters
β     - Yaw angle
β̇     - Yaw rate
β̈     - Yaw acceleration
βo    - Initial yaw angle
δ     - Sensitivity of network performance index to changes in the net input
γ     - Effective number of parameters
λ     - Eigenvalue
μ, φ  - Marquardt parameter
ρ     - Air density
ωd    - Damped frequency
ωn    - Natural frequency
ζ     - Damping ratio


LIST OF ABBREVIATIONS

BP    - Backpropagation
MLNN  - Multilayer neural network
MSE   - Mean square error
PC    - Principal component
PCA   - Principal component analysis
PSD   - Power spectral density
PSNR  - Peak signal-to-noise ratio
rms   - Root mean square
SSE   - Sum square error


LIST OF APPENDICES

A - Neural Network
B - Bayesian Regularization
C - Levenberg-Marquardt Optimization Technique
D - Network Input Optimization with PCA

CHAPTER 1

INTRODUCTION

1.1 Motivation

Crosswind stability is an important area of study in vehicle aerodynamic design because it bears directly on safety. For years, the main concern in aerodynamic design was reducing drag for fuel efficiency; it was later found that streamlined vehicle shapes are sensitive to crosswind disturbance. The styling trend towards rounder shapes, especially at the rear of the vehicle, and the continuing reduction in aerodynamic drag are suspected to contribute to crosswind sensitivity (Howell, 1993).

Theoretical and computational fluid dynamics methods have yet to prove their reliability in investigating vehicle behavior under crosswind disturbance, so ground vehicle aerodynamicists resort to experimental techniques, where wind tunnel testing has been widely used to simulate the transient condition. The primary motivation of this work is to design an alternative parameter identification technique for estimating a ground vehicle's aerodynamic derivatives. One of the early uses of parameter estimation was to validate wind tunnel or analytical predictions of aircraft stability and control derivatives (Ming-Der, 1990). Quantitative analyses of ground vehicle stability and handling qualities make direct use of these parameter estimates, so a reliable parameter identification technique is essential for such analyses.

1.2 Problem Statement

The oscillating test rig for wind tunnel testing developed by Mansor (2006) captures the transient response of a simple automotive body to crosswind disturbances. The accompanying mathematical analysis of the oscillating test rig model enables the aerodynamic derivatives to be determined from the system's stiffness and damping, which are governed by the natural frequency and damping ratio. These have conventionally been identified indirectly: the damping ratio is calculated from the time to half amplitude, and the frequency is obtained by peak-picking on the power spectral density.
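The conventional route can be sketched numerically. The snippet below is an illustration only, not the thesis code, and the parameter values are invented: it synthesizes a lightly damped impulse response, picks the damped frequency from the power spectral density peak, and recovers the damping from the decay envelope using ζωn = ln 2 / t½ together with ωd = ωn√(1 − ζ²).

```python
import numpy as np

# Synthetic impulse response of a second order system (illustrative values).
wn, zeta = 12.0, 0.05            # true natural frequency [rad/s] and damping ratio
wd = wn * np.sqrt(1 - zeta**2)   # damped frequency
t = np.arange(0, 20, 0.01)
y = np.exp(-zeta * wn * t) * np.sin(wd * t)

# 1) Damped frequency from the peak of the power spectral density.
Y = np.fft.rfft(y)
freqs = np.fft.rfftfreq(len(t), d=0.01)              # [Hz]
wd_est = 2 * np.pi * freqs[np.argmax(np.abs(Y)**2)]  # peak-picking

# 2) Damping from the decay envelope: locate the response peaks and fit
#    ln A(t) = ln A0 - zeta*wn*t; the amplitude halves every t_half seconds.
mask = (y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])        # local maxima
peaks_t, peaks_y = t[1:-1][mask], y[1:-1][mask]
decay_slope = np.polyfit(peaks_t, np.log(peaks_y), 1)[0]  # = -zeta*wn
t_half = np.log(2) / -decay_slope                    # time to half amplitude

# Combine zeta*wn = ln2/t_half with wd = wn*sqrt(1 - zeta^2).
zw = np.log(2) / t_half
wn_est = np.sqrt(wd_est**2 + zw**2)
zeta_est = zw / wn_est
```

The spectral resolution (here 0.05 Hz for a 20 s record) limits the accuracy of the frequency estimate, which is one of the shortcomings that motivates the neural network alternative.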

In the current work, a multilayer neural network was developed to carry out the function approximation task, in which the natural frequency and damping ratio are approximated from the recorded impulse time response data. The study investigates the effectiveness of the neural network with respect to the input representation, the network architecture, the training sample distribution and size, and the application of principal component analysis in reducing the network input dimension. The natural frequency and damping ratio estimated by the designed network are used to calculate the aerodynamic derivatives, and the results are compared with the derivatives obtained through the conventional identification process. To validate both techniques, impulse responses were generated from the model system's transfer function and compared with the responses actually recorded during the wind tunnel test.

1.3 Research Objectives

The first objective of this research is to design an alternative scheme for identifying a ground vehicle's aerodynamic derivatives. The work proposes the use of an artificial neural network that can identify the natural frequency and damping ratio given the impulse response of the automotive body recorded from the oscillating test rig. Secondly, this neural network approach is intended to provide an alternative technique for identifying the natural frequency and damping ratio; the performance of the two techniques, conventional and neural network, is compared. Through this modern computational approach, the number of steps in estimating the modal parameters is intended to be reduced.

The properties of the network were studied to construct an optimum neural network design that gives good estimates of the damping ratio and natural frequency. In this identification work, the aerodynamic damping acting on the bluff body is considerably low, and it is crucial that the designed network give good estimates in this range. In addition, the network must generalize well, since the responses measured during the wind tunnel test correspond to arbitrary pairs of natural frequency and damping ratio that the network has not encountered during training.

To optimize the network size, which is proportional to the number of input nodes, the proper input representation to the network was investigated. This work explores the possibility of applying principal component analysis to reduce the number of input nodes. The technique, well established in pattern recognition with neural networks, is extended here to function approximation, since the identification is conducted offline on previously recorded time responses.
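The dimension-reduction idea can be sketched as follows. This is a hedged illustration, not the thesis procedure: the sample count, time grid and variance threshold are invented, and the parameter ranges are taken from the scope of work. Each impulse response becomes a row of a data matrix, PCA is computed by SVD of the mean-centred matrix, and only the leading components are kept as network inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 200)

# Illustrative training matrix: 300 impulse responses, 200 samples each,
# with (wn, zeta) drawn from the ranges stated in the scope of work.
wn = rng.uniform(2.5, 26.5, 300)        # natural frequency [rad/s]
zeta = rng.uniform(0.001, 0.1, 300)     # damping ratio
X = np.exp(-zeta[:, None] * wn[:, None] * t) * np.sin(
    wn[:, None] * np.sqrt(1 - zeta[:, None]**2) * t)

# PCA via SVD of the mean-centred data matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
var_ratio = S**2 / np.sum(S**2)         # variance explained per component

# Keep enough principal components to explain, say, 95% of the variance.
k = int(np.searchsorted(np.cumsum(var_ratio), 0.95)) + 1
Z = Xc @ Vt[:k].T                       # reduced inputs: 300 x k, not 300 x 200
```

The network then receives `Z` (k scores per response) instead of the full 200-sample time series, which shrinks the input layer and the training cost.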

1.4 Research Methodology

The estimation of the aerodynamic derivatives was based on the time response data recorded during the dynamic wind tunnel test conducted by Mansor (2006). The impulse response was recorded within the linear range of oscillation (below ±20°) and within the frequency range of influence on the vehicle's crosswind sensitivity, given by reduced frequencies of 0.09 to 0.9. The estimation was based on the response amplitude range from 10° to 1°, as this range is less affected by the initial release and by noise.

The estimation of the aerodynamic derivatives is based on identifying the frequency and damping of the measured response. The identification was carried out with two tools, the conventional technique and the multilayer neural network, as shown in Figure 1.1. Both were used to identify the natural frequency and damping ratio from the measured impulse response data; the aerodynamic derivatives were then calculated from the identified parameters and the results from the two approaches compared.

For the neural network approach, a multilayer feedforward neural network (MLNN) was first trained on data generated from the standard second order system transfer function. The network was trained as an inverse model using the Bayesian regularization training algorithm. Two input representations were introduced: the first is the standard plot form of a second order system, while the second feeds the whole impulse response to the network. Principal component analysis was used to optimize the number of input nodes in the second representation. The neural network properties were investigated before the network architecture was selected, and the network was trained iteratively until its output coincided with the targeted output. Figure 1.2 shows the training process for the two input representations.
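The inverse-model training can be sketched in a few lines. This is a sketch under stated assumptions, not the thesis implementation: the thesis trains with Bayesian regularization (as in MATLAB's trainbr), which scikit-learn does not provide, so plain L2 weight decay (the `alpha` parameter) is substituted as a stand-in regularizer, and the network size, sample count and time grid are all illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
t = np.linspace(0, 8, 80)

def impulse(wn, zeta):
    """Impulse response of a standard second order system."""
    wd = wn * np.sqrt(1 - zeta**2)
    return np.exp(-zeta * wn * t) * np.sin(wd * t)

# Inverse mapping: response samples -> (wn, zeta) targets.
wn = rng.uniform(2.5, 26.5, 400)
zeta = rng.uniform(0.001, 0.1, 400)
X = np.array([impulse(w, z) for w, z in zip(wn, zeta)])
Y = np.column_stack([wn, zeta])

# Scale targets so both outputs carry comparable weight in the loss.
Y_scaled = (Y - Y.mean(axis=0)) / Y.std(axis=0)

# L2 weight decay (alpha) as a stand-in for Bayesian regularization.
net = MLPRegressor(hidden_layer_sizes=(20,), activation="tanh",
                   alpha=1e-3, max_iter=3000, random_state=0)
net.fit(X, Y_scaled)

# Undo the target scaling to recover wn and zeta estimates.
pred = net.predict(X) * Y.std(axis=0) + Y.mean(axis=0)
```

A measured impulse response, resampled onto the same time grid, would then be passed through `net.predict` to obtain its natural frequency and damping ratio.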

Figure 1.1 Overall research process: the measured impulse response data are processed by the conventional method (power spectral density and logarithmic decrement methods) and by the MLNN, each yielding ωn and ζ values; the aerodynamic yaw moment and side force derivatives computed from each are then compared.

Figure 1.2 Training process of the MLNN

1.5 Scope of work

The current research work is limited to the following:

(i) Using available experimental data. The data were generated from a free oscillation test using an oscillating test rig to capture the transient behavior of a simple model in crosswind.

(ii) The identification of yaw moment and side force derivatives for a ground vehicle in crosswind. The derivative values give the rate of change of the aerodynamic force or moment acting on the body with respect to yaw angle.

(iii) Identification based on the damped response of a second order system, given that the system is a pure yawing motion with a single degree of freedom.

(iv) The damping ratio ranges from 0.001 to 0.1 and the natural frequency from 2.5 to 26.5 rad/s.

(v) Using a multilayer feedforward neural network.

(vi) Training algorithm: Bayesian regularization.

1.6 Organization of the thesis

This thesis is divided into six main chapters. The introduction in this chapter gives background on the research work; the purpose of the study and the methodology used to achieve the research objectives are described, and the thesis content is overviewed. Previous research related to the study is presented in Chapter 2. The literature review starts with developments in the study of crosswind sensitivity of ground vehicles, followed by the identification process. The problems and shortcomings of the existing techniques are presented, and the alternative neural network approach is discussed.

Chapter 3 presents the conventional method of identifying the aerodynamic derivatives. The chapter starts with the background and the mathematical analysis of the oscillating test rig that leads to the determination of the aerodynamic derivatives. The conventional modal parameter identification process is reviewed and reapplied to the published measured data, both to establish the principles involved and to provide benchmark results for the alternative method proposed in this study. The neural network as an alternative approach to modal parameter identification is presented in Chapter 4; that chapter investigates the network properties and the optimum input representation. The aerodynamic derivatives obtained from the parameters identified by the neural network are compared with the results of Chapter 3, and a comparison study is conducted to validate both sets of results.

Further investigation of input optimization is conducted in Chapter 5 by introducing principal component analysis to reduce the input dimension. The mathematical description of the technique is presented to show how the principal components are extracted and how the input vectors are subsequently reduced. The results of the proposed approach are compared with those of the conventional technique. Chapter 6 concludes the research work and suggests possible future research.

CHAPTER 2

LITERATURE REVIEW

2.1 Introduction

The crosswind stability of a ground vehicle is an important factor in car handling because it bears on safety. For a car traveling along the road and subjected to crosswind disturbances, the ride and handling characteristics of the vehicle are affected. Analyses by Baker and Reynolds (1992) show that accidents may occur when a vehicle is subjected to crosswind disturbances. Few wind-induced accidents involve overturning, because passenger cars are unlikely to blow over (Barnard, 1996); rather, the accidents are mostly associated with excessive path deviation that results in impact with other vehicles or roadside objects. Overturning is usually associated with trucks, buses and light vans; for passenger cars it may occur as an indirect result of very large course deviations in a sudden side wind.

Crosswind behavior depends on the characteristics of the suspension, tires and steering mechanism as well as the aerodynamic characteristics. This work concerns the aerodynamic characteristics of a car, that is, the aerodynamic forces. The complete system of moments and forces is illustrated in Figure 2.1; the aerodynamic components identified as influencing a vehicle's crosswind sensitivity are the side force, Y, and the yaw moment, N.

Figure 2.1 The six components of positive direction of aerodynamic forces and moments on a simple automotive body type: lift L, drag D, side force Y, rolling moment R, pitching moment M and yawing moment N, with the yaw angle β defined relative to the wind direction V∞.

An important factor in crosswind stability analysis is the rate at which the forces and moments vary with yaw angle, so it is necessary to measure quantities known as the aerodynamic derivatives, such as dCy/dβ, where Cy is the side force coefficient and β is the yaw angle. A large aerodynamic derivative means that the force or moment changes rapidly with angle, and hence that the vehicle is sensitive to yaw angle changes.
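To make the definition concrete, here is a small numerical illustration (the coefficient values below are invented, not measured data): with the side force coefficient tabulated against yaw angle over the linear range, dCy/dβ is simply the fitted slope.

```python
import numpy as np

# Hypothetical tunnel data: side force coefficient vs yaw angle (degrees).
beta_deg = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])
Cy = np.array([-0.42, -0.21, 0.0, 0.21, 0.42])

# Work in radians so the derivative has the usual per-radian units.
beta_rad = np.radians(beta_deg)

# Least-squares slope through the linear range = dCy/dbeta [1/rad].
dCy_dbeta = np.polyfit(beta_rad, Cy, 1)[0]
```

Here the slope is 0.042 per degree, i.e. about 2.41 per radian; a vehicle with a larger dCy/dβ develops side force more rapidly as the wind yaws and is therefore more crosswind sensitive.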

This chapter starts with a review of the background of the ground vehicle crosswind stability study and the subsequent development of wind tunnel tests to simulate the aerodynamic effects on a vehicle body in crosswind disturbance. The next section reviews the available methods for identifying the aerodynamic characteristics. The aim is to give an overview of the investigation of crosswind stability and the role of system identification in estimating the aerodynamic characteristics. Since the identification of aerodynamic derivatives in automotive applications depends heavily on the experimental approach, the system identification tools used must be able to extract accurate results from the experimental data.

2.2 Overview of research area in ground vehicle's crosswind stability study

The dominant factors that affect the crosswind characteristics can be categorized mainly into three categories (Yoshida et al., 1977):

(i) The aerodynamic characteristics determined by the body shape of the vehicle;

(ii) Characteristics of the body structural system determined by the basic dimensions of the vehicle;

(iii) Characteristics of the mechanical systems, including suspensions, steering, tires, etc.

In investigating the aerodynamic characteristics alone, a simple body type has been used by most researchers to eliminate the other crosswind influences. A review by Le Good and Garry (2004) lists three categories of vehicle shapes used in experimental and computational automotive aerodynamics research: simple bodies, mainly for research purposes; basic car shapes used for calibration, correlation and research; and production (series) cars used for a variety of specific investigations and correlation studies.

The Davis model used in Mansor's (2006) work is of the simple body type. The model originated from PhD work by Davis (1983) investigating the road vehicle wake. Research by Bearman and Mullarkey (1994), using the Davis model with systematic changes of backlight angle to investigate the effects of side winds and gusts, led to the conclusion that measuring steady forces and moments at fixed yaw angles and assuming quasi-steady flow gives conservative estimates of the unsteady quantities.

Other simple body types include the Ahmed model, the NRSCC/SAE model, the Rover model and the Docton model. The use of the Ahmed model in aerodynamic research was among the first studies to investigate the significance of the backlight angle on aerodynamic characteristics (Ahmed et al., 1984). The NRSCC/SAE geometry was devised to approximate the overall dimensions of the average North American automobile and to exhibit the main characteristics of the flow fields associated with contemporary cars and trucks. The Rover model, devised by Windsor and Howell in the late 1980s, was intended to assist in fundamental investigations of shape effects, while the Docton model was devised by researchers at Durham University to investigate transient effects (Le Good and Garry, 2004). A complete review of these and other model types can be found in Le Good and Garry (2004).

The study of aerodynamic characteristics provides knowledge of vehicle behavior in crosswind conditions. It requires knowledge of the aerodynamic side force and yaw moment, usually given in terms of aerodynamic coefficients or aerodynamic derivatives. There are three main approaches to identifying the aerodynamic derivatives: the theoretical approach, computational fluid dynamics (CFD), and the experimental approach. The identification of aerodynamic derivatives of ground vehicles is still new relative to aircraft identification. The subject of ground vehicle aerodynamics does not lend itself readily to mathematical analysis; there is no straightforward method to predict airflow behavior around a given vehicle shape. The difficulty arises because the flow around a ground vehicle is highly three-dimensional: the air does not follow the contours of the body everywhere, and there is almost always an unsteady wake (Barnard, 1996). The ground vehicle aerodynamic designer has very few mathematical tools and depends highly on the experimental approach, which remains superior to the theoretical and computational fluid dynamics (CFD) approaches (Hucho, 1997).

Hucho and Emmelmann (1978) and Tran (1991) developed theoretical predictions for the lateral aerodynamic coefficients. Both techniques attempted to predict the transient aerodynamic derivatives. Hucho and Emmelmann (1978) made the first attempt by applying slender body theory, with the vehicle simulated by a flat plate, to enable engineering estimates of transient effects. Tran (1991) derived a calculation method on the basis of a plate model and took into account, to some extent, the influences of partial flow, vehicle side area and pressure distribution over the vehicle length. Both calculation techniques make considerable simplifications in developing the model and cannot account for the effects of styling details or for a unique vehicle shape. This theoretical approach, based on the theoretical knowledge of fluid flow around a flat plate, is too constrained to be applied to the variety of vehicle shapes for which the flow is highly three-dimensional and the flow field is dominated by the effects of separation.

Barnard (1996) dedicated the last chapter of his book to discussing the computational fluid dynamics (CFD) method in the application of ground vehicle aerodynamics. Developed from the theoretical knowledge of fluid dynamics, CFD offers a number of numerical approaches to determine the fluid flow around vehicle models. However, the results from CFD can sometimes be misleading, being sensitive to the numerical scheme used, the turbulence model, and the choice of parameters used in it. In practice, the use of CFD requires a great deal of experience and specialist knowledge in the area of automotive applications.

Up till now, the experimental approach has been the most reliable technique in the aerodynamic study of ground vehicles. This empirical approach is more promising and gives a better understanding of the flow features. Some experimental techniques use wind blowers to simulate the crosswind effects, while others use wind tunnel facilities. Although setting up a large, well-equipped wind tunnel is costly, wind tunnels almost never become obsolete. A wind tunnel can accommodate any geometrical shape and condition, and it can produce high quality, reliable data in a very short simulation time, provided that the sources of error in the wind tunnel test have been considered.

2.3  Development in dynamic experimental approach

The conventional way of conducting a wind tunnel test is to yaw the model statically relative to the freestream flow about a fixed ground plane. A series of quasi-static measurements of the aerodynamic loads on the vehicle can then be made over a range of fixed angles. However, the quasi-static approach is not an accurate simulation of the full-scale flow, since the effective speed and direction of the incident flow (set by the vehicle's forward speed, the ambient wind speed and the wind direction) are constantly changing.

Garry and Cooper (1986) showed in their study that the static wind tunnel test is conservative in quantifying the aerodynamic loads. The approach is nevertheless still used in recent studies, mainly for comparing the aerodynamic characteristics of different vehicle shapes, as in Howell (1993), and for improving the existing aerodynamic design of current models. Aerodynamic derivatives were found to be sensitive not only to the angle of yaw but also to the rate of yaw. The data from static wind tunnel tests are usually used for simulation of vehicle motion at high speed and of crosswind stability when investigating the aerodynamic effects on car handling (Yip et al., 1992; Zengqi et al., 2001).

Some efforts have been reported to simulate the transient response in dynamic wind tunnel tests using reduced-scale models. These include moving models running along a track that traverses a wind tunnel section (Howell and Everitt, 1983), the use of an oscillating aerofoil gust generator to simulate the transient aerodynamic effects produced on a car-type bluff body during a simplified sinusoidal side gust interaction (Passmore et al., 2001), and oscillating test rig models that produce dynamic yaw angles (Mansor, 2006).

In an early study of the transient aerodynamic forces and moments during crosswind disturbances, Yoshida et al. (1977) conducted an experimental study of a 1/10-scale vehicle model passing through a crosswind region. The model was constructed to be a geometrically similar aerodynamic model, neglecting the effects of Reynolds number and Mach number. However, in investigating structures in fluid flow using scale models, it is important to model certain fine structural details to be more nearly "aerodynamically equivalent" rather than only "geometrically faithful" (Scanlan, 1997). This can be achieved by taking into consideration the Reynolds number and also the whole spectrum of eddy wavelengths when turbulence is to be duplicated in the flow.

To match the actual motorway condition during experimentation requires placing the vehicle in a wind tunnel with a system capable of producing an aerodynamic force of known characteristics in the transverse direction (frequency, wavelength, shape, spectra) (Fillipone, 2003). For consistency with other published work, Bearman and Mullarkey (1994), Passmore et al. (2001) and Mansor (2006) used the following definition for the reduced frequency:

k = \pi f l / V

where f is the frequency in Hz, l is the characteristic length of the vehicle model in meters and V is the freestream air velocity (m/s). A study conducted by Watkins and Saunders (1995) shows that the frequencies that may influence the stability of vehicles under crosswind disturbance vary between 0.25 and 2.5 Hz, with the peak energy at approximately 1 Hz. At motorway speed, the corresponding reduced frequency is between 0.09 and 0.9.
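As a rough worked check of this definition, the quoted range can be reproduced using assumed representative full-scale values (a characteristic length of about 4 m and a motorway speed of about 35 m/s; neither value is given in the text):

```python
import math

def reduced_frequency(f_hz: float, length_m: float, v_ms: float) -> float:
    """Reduced frequency k = pi * f * l / V, as defined above."""
    return math.pi * f_hz * length_m / v_ms

# Assumed representative full-scale values (not stated in the text):
# characteristic length l = 4 m, motorway speed V = 35 m/s (~126 km/h)
k_low = reduced_frequency(0.25, 4.0, 35.0)   # lowest gust frequency of interest
k_high = reduced_frequency(2.5, 4.0, 35.0)   # highest gust frequency of interest
print(f"k range: {k_low:.2f} to {k_high:.2f}")  # approximately 0.09 to 0.90
```

With these assumed values the gust-frequency band of 0.25 to 2.5 Hz maps onto the stated reduced frequency band of roughly 0.09 to 0.9.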

A few methods can be used to simulate the transient condition in a wind tunnel and to access the transient aerodynamic data. Humphreys and Baker (1992) installed an experimental arrangement consisting of a model mounted on a trolley on a track; the apparatus was fired across the tunnel by a catapult arrangement. The forces and moments were recorded by a miniature six-component balance and data acquisition system mounted on the trolley. Since the balance is connected to the moving model, the induced noise and vibration may contribute to major interference and design problems.

Bearman and Mullarkey (1994) and Passmore et al. (2001) reported measurements of the forces and moments on a model in an oscillating flow generated by sinusoidally moving vanes placed upstream in the wind tunnel. The former reported their results as an admittance function: the ratio of the dynamic power to the power predicted from quasi-static data. Although the original intention was to measure the side force and yaw moment admittance using an internal balance, the setup suffered from a low signal-to-noise ratio, especially at low frequencies. To overcome this problem, Passmore et al. (2001) measured the side force and yaw moment using surface pressure tappings, allowing the calculation of the pressure components of side force and yaw moment. The results were reported as aerodynamic magnification factors, the ratio of the transient to the quasi-steady predicted force or moment, instead of as aerodynamic admittances.

Another approach to simulating the transient condition is to rotate the model around its vertical axis in the wind tunnel test section. This approach was used by Garry and Cooper (1986) when they examined the conventional results of the static wind tunnel test. However, their approach also suffered from the signal-to-noise ratio problem, and the oscillation was set at a controlled rotational speed. Mansor (2006) developed a dynamic test rig whose motion is a free oscillation rotating about the z-axis, controlled by a pair of springs. This recent development does not measure the side force and yaw moment directly; instead, the response is characterized by calculating the aerodynamic derivatives (i.e. the side force and yaw moment derivatives) from the yaw angle position recorded by a potentiometer mounted on the rig. His experimental setup produces measurements with a high signal-to-noise ratio.

2.4  System identification in the area of aerodynamic derivatives

System identification is becoming an indispensable tool for many applications in engineering. It is the process of developing or improving a mathematical representation of a physical system using experimental data. In the aerospace field, there are basically three types of identification work: modal parameter identification, structural-model parameter identification, and control-model identification (Juang, 1994).

Modal parameter identification is the identification of modal parameters (damping, frequencies, mode shapes and modal participation factors) from the measured signals produced by a structure. From these identified modal parameters, further identification process can be carried out to adjust the structural-model parameters. For the control-model identification, the purpose of identification is to develop a model to represent the system for control design.

The range of identification techniques is quite wide, from conventional approaches that depend on the theoretical framework of the problem at hand to modern computational approaches. Identification in aerodynamics has been making steady progress in aerospace applications for some time (Wharington et al., 1993), as can be seen in the work of Garcia-Velo and Walker (1995) using the extended Kalman filter technique, the estimation-before-modeling method of Sri-Jayantha and Stengel (1987), the fuzzy logic approach of Yaacob and Jamaluddin (2002) and neural networks (Raol and Jategaonkar, 1995; El-Badawy, 2000). However, there are very few reports of identification work in the automotive field, where most works concentrate on experimental techniques to produce a good simulation of vehicle behavior in transient conditions.

In the maximum likelihood method, the most widely used identification method in aeronautics, the model structure of the unknown aerodynamic forces and moments must be specified beforehand in the parameter identification algorithm. If the model structure is to be modified to accommodate specific maneuvers or aerodynamic assumptions, the identification algorithm must be reformulated. Because of this constraint, Sri-Jayantha and Stengel (1987) introduced the estimation-before-modeling (EBM) approach, in which the state estimation and the aerodynamic model determination are performed independently. EBM consists of two important steps. The first estimates the time histories of the forces and moments corresponding to each of 15 maneuvers using an extended Kalman-Bucy filter (EKF) and a modified Bryson-Frazier smoother. In the second step, the optimal estimates are merged and sorted into subspaces, and modeling is performed in each subspace. By validating the model against full-scale Schweizer 2-32 sailplane flight test data, EBM was shown to be able to generate a nonlinear dynamic aerodynamic model.

Garcia-Velo and Walker (1995) used extended Kalman filtering (EKF) to estimate the aerodynamic coefficients in aircraft dynamic models from flight test data. The designed EKF, tested on NASA's X-31 Drop Model and High Angle-of-Attack Research Vehicle (HARV), showed poor lateral force estimates for the X-31 and varied estimation quality for the HARV. The result depends on the parameter pseudonoise assumption, which substantially affects the filter's effectiveness.

A more modern approach using fuzzy logic was proposed by Yaacob et al. (2002) for the estimation of longitudinal aircraft parameters for the short-period and phugoid modes of motion. From the overall results, they concluded that the fuzzy logic system has good potential as an alternative approach to aircraft parameter identification and shares an advantage with the feedforward multilayer neural network (MLNN): it does not need any guesses of the initial values of the parameters.

The neural network is often considered a black-box modeling approach. It can learn the mapping between input and output data purely from observation of the training examples presented to it. A review by Wharington et al. (1993) showed the neural network technique to be a highly feasible concept, with performance exceeding the capability of the more traditional methods used. This is also supported by Raol and Jategaonkar (1995) in their critical appraisal, where a recurrent neural network (RNN) retaining the state space model representation was used to estimate aircraft parameters. In their view, however, a feedforward neural network may prove to be more flexible and have wider applications, although it leads to a black-box model structure without physical interpretation of the estimated weights.

2.5  Neural network

Neural networks have emerged as one of the promising tools in the area of system identification of nonlinear systems. Their popularity is due to their ability to learn from their environment in supervised as well as unsupervised ways, plus the universal approximation property that makes them highly suited to solving difficult signal processing problems. A neural network is a model-free estimator, since its functional properties do not relate specifically to the exact internal network structure but to the overall network behavior (Wharington et al., 1993).

A neural network is a computing paradigm that imitates the operation of biological neurons. The network is developed so that it abstracts the complex details of the human brain and simplifies them by building a network of simple processors. The main theme is to develop a computer model that can learn from examples. Neural networks gain their popularity from their ability to infer functions from observations.

In supervised learning, the untrained network is provided with a set of examples of proper network behavior, called the training set. As the inputs are applied to the network, the network outputs are compared with the targets. An error-correction learning rule then adjusts the weights and biases of the network so as to move the network's current output closer to the target value. Unlike supervised training, unsupervised learning provides only input samples to the network, and the network weights and biases are modified in response to the network inputs alone. Most unsupervised training algorithms perform clustering operations: the algorithm categorizes the input patterns into a finite number of classes, which is useful, for example, in vector quantization applications.
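The error-correction update described above can be illustrated with a minimal sketch: a single linear neuron trained with the delta rule. The data, target function and learning rate below are illustrative inventions, not material from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative training set: inputs X and targets t for y = 2*x1 - x2 + 0.5
X = rng.uniform(-1, 1, size=(200, 2))
t = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5

w = np.zeros(2)   # weights
b = 0.0           # bias
eta = 0.1         # learning rate

for epoch in range(50):
    for x, target in zip(X, t):
        y = w @ x + b          # network output for this input
        err = target - y       # compare output with target
        w += eta * err * x     # error-correction (delta rule) weight update
        b += eta * err         # bias update moves output toward the target

print(np.round(w, 2), round(b, 2))  # approaches [2., -1.] and 0.5
```

Each update nudges the weights in proportion to the output error, which is the essence of supervised error-correction learning; backpropagation generalizes the same idea to multilayer networks.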

A number of recent works using neural networks in aircraft aerodynamic identification have been reported. They can be categorized into two major groups: modeling the entire dynamic system (El-Badawy, 2000; Ahmad et al., 2002; Shaheed, 2005), and parameter estimation of aerodynamic derivatives (Raol and Jategaonkar, 1995; Richardson, 2000; Levinski, 2001). The training or tuning of the network free parameters (weights and biases) benefits from the many available methods of analysis in aircraft dynamics, which are lacking in automotive aerodynamics.

Since the experimental techniques for automotive aerodynamics are still under a considerable amount of research, the identification techniques are developing slowly. As the physical interpretation of the flow around a ground vehicle body is still difficult to understand, the neural network technique, with its black-box modeling properties, seems an appropriate identification method.

An approach more similar to the present work has been reported, in which coefficients of structural properties are estimated from recorded displacement time histories. Liu et al. (2002) used a neural network to determine elastic constants from the measured displacement of an anisotropic laminated plate. An MLNN was trained with a modified backpropagation training algorithm to estimate the elastic constants from the input dynamic displacement responses on the surface. The estimation results proved stable even in the presence of noise, and the network accuracy increased with the number of training cycles.

The properties of neural networks have been studied by Billings et al. (1992) for neural network applications in general and for modeling nonlinear dynamic systems in particular. The study addresses the issues of input node assignment, the degree of network complexity, validation tests and the effect of noise, and provides general guidelines for conducting such an investigation.

2.6  Principal component analysis for input optimization

Principal component analysis (PCA), also known as the Karhunen-Loève transform, is an unsupervised statistical technique for extracting information from a multivariate data set. The solution is based on computing the eigenvectors and eigenvalues of the data covariance matrix. The main aim of PCA is to reduce dimensionality with a minimum loss of information. Basically, the extraction of principal components amounts to a variance-maximizing rotation of the original variable space. "This type of rotation is called variance maximizing because the criterion for (goal of) the rotation is to maximize the variance of the 'new' variable, while minimizing the variance around the new variable" (StatSoft Inc., 2007). The original data are never moved; instead, the axes are rotated. Thus, PCA decides which, amongst all possible projections, is the best to represent the structure of the data.
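The covariance eigendecomposition described above can be sketched as follows; the data set is an illustrative invention, not taken from any of the cited works:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative 3-D data whose variance lies mostly along one direction
z = rng.normal(size=(500, 1))
X = np.hstack([z, 0.5 * z, 0.1 * rng.normal(size=(500, 1))])

Xc = X - X.mean(axis=0)                 # centre the data (axes rotate, data do not move)
cov = np.cov(Xc, rowvar=False)          # data covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigendecomposition (ascending eigenvalues)
order = np.argsort(eigvals)[::-1]       # reorder by descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Fraction of total variance explained by each principal component
explained = eigvals / eigvals.sum()

# Dimensionality reduction: project onto the first principal component only
scores = Xc @ eigvecs[:, :1]
print(f"variance explained by PC1: {explained[0]:.3f}")
```

For correlated data like this, the first component captures nearly all the variance, so the 3-D input can be replaced by the 1-D `scores` with minimal information loss, which is exactly the input-size reduction exploited later in the thesis.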

Many pattern recognition applications using neural networks apply principal component techniques to reduce the dimension of the input. Examples can be found in the medical field (Ya, 2007; Stasiak et al., 2007; Ceylan and Özbay, 2007), food technology (Lewis et al., 2007; O'Farrell et al., 2004; Liu et al., 2006), industrial engineering (Mirapeix et al., 2007), chemical engineering (Zhu and Li, 2006), etc.

2.7  Summary

Most research works identifying the ground vehicle's aerodynamic derivatives rely on wind tunnel testing because of the limitations of the theoretical approach and the CFD method. The majority of the wind tunnel tests conducted measure the forces and moments acting on the bluff body directly, and some of the results suffer from noisy measurements. A different approach was used by Mansor (2006), in which the yaw position of the vehicle body was measured in free oscillation, the motion being initiated by an impulse input. The aerodynamic force and moment derivatives were then estimated from the equation of motion of the system.

The network properties that need to be studied have been outlined by Billings et al. (1992). In the area of aircraft as well as structural aerodynamics, the neural network has proved its capability as a universal function approximator and has good potential in the identification area. Neural networks have been reported to identify the aerodynamic derivatives of aircraft. This provides the basis for using neural networks to identify the aerodynamic derivatives of ground vehicles. Another new approach will also be explored, in which the neural network is trained to approximate the function from data of reduced dimension obtained using principal component analysis.

The review has shown that there is a big gap between the advancement of the identification of aircraft aerodynamic derivatives and that of automotive aerodynamic derivatives. Neural networks have shown promising results in the field of aircraft studies, and their ability to infer a function just from observed input-output pairs makes them attractive in the automotive area, where the actual physical airflow is still not fully understood. Apart from the conventional method of identifying the frequency and damping of an impulse response, a new alternative approach can be advantageous in reducing the calculation steps and in providing a comparison with the existing method. The combination of the oscillating test rig and the neural network can further improve the existing identification technique and open up new opportunities for incorporating advanced identification tools in automotive aerodynamic studies.

CHAPTER 3

IDENTIFICATION TECHNIQUE: CONVENTIONAL METHOD

3.1  Introduction

The term conventional method in this identification process refers to the "tried and proven" methods of long standing, to differentiate them from the more recently developed "modern" methods (Sage and Melsa, 1971). Generally, the conventional method approaches the problem in an indirect manner. The parameters to be identified from the recorded impulse response are the natural frequency and the damping ratio. The frequency of the impulse response can be obtained by examining the spectral density of the response through the frequency response function, and the damping ratio can be estimated by the logarithmic decrement method.
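Both estimation steps can be sketched on a synthetic impulse response; the second-order system parameters below are illustrative and are not the rig's values:

```python
import numpy as np

# Synthetic impulse response of a second-order system (illustrative parameters)
zeta, wn, fs = 0.05, 2 * np.pi * 2.0, 1000.0   # damping ratio, natural freq (rad/s), sample rate
t = np.arange(0, 10, 1 / fs)
wd = wn * np.sqrt(1 - zeta**2)                  # damped frequency
y = np.exp(-zeta * wn * t) * np.sin(wd * t)     # impulse response

# (1) Damped frequency from the peak of the power spectral density
Y = np.abs(np.fft.rfft(y))**2
freqs = np.fft.rfftfreq(len(y), 1 / fs)
f_d = freqs[np.argmax(Y)]                       # should be close to wd / (2*pi)

# (2) Damping ratio from the logarithmic decrement of successive positive peaks
peaks = [i for i in range(1, len(y) - 1)
         if y[i] > y[i - 1] and y[i] > y[i + 1] and y[i] > 0]
delta = np.log(y[peaks[0]] / y[peaks[5]]) / 5   # log decrement over 5 cycles
zeta_est = delta / np.sqrt(4 * np.pi**2 + delta**2)

print(f"f_d = {f_d:.2f} Hz, zeta = {zeta_est:.3f}")
```

The ratio of successive same-sign peaks of a linear second-order response decays by a constant factor, so averaging the log decrement over several cycles, as here, reduces the effect of measurement noise; this is the same indirect route (frequency plus damping, rather than forces) followed in the rest of the chapter.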

The purpose of this chapter is to apply a conventional approach to estimating the aerodynamic derivatives from a recorded transient time series response. Apart from validating the identification technique proposed by Mansor (2006), this chapter also provides the platform for understanding the underlying principle behind the identification of the ground vehicle's lateral aerodynamic derivatives. Furthermore, the resulting aerodynamic derivatives are used for benchmarking purposes.

Chapter 3 is organized into several sections. The overview of the work is explained in section 3.2. Section 3.3 describes the background to the experimental work, where the selection of experimental parameters is explained and the mathematical analysis that governs the derivative equations is derived. Section 3.4 presents the conventional approach for identification of the aerodynamic derivatives, and the results are presented and discussed in section 3.5. The chapter is concluded in section 3.6.

3.2  Overview of the Investigation

Static wind tunnel tests have been widely used to quantify the static aerodynamic derivatives. However, the static test cannot give insight into the aerodynamic loads in transient conditions. Alternatively, the dynamic wind tunnel test offers the capability of measuring the static derivatives as well as the dynamic derivatives. One dynamic wind tunnel test method was developed by Mansor (2006) to estimate the vehicle aerodynamic derivatives.

Following the experimental work, the interpretation and analysis of the recorded response was carried out. The objective of this chapter is to estimate the aerodynamic derivatives from a recorded transient response of a simple ground vehicle model in a conventional manner. The damping ratio is estimated by the logarithmic decrement method, and the damped frequency is obtained from the power spectral density of the time series data. These two parameters are then used to calculate the aerodynamic derivatives of the vehicle model.

The aerodynamic derivatives, as functions of the damped frequency and the time to half amplitude, were derived from the characteristic equation of the oscillating test rig and the general solution of the second-order system. The logarithmic decrement method is perhaps the most popular method of damping measurement. The measurement was carried out on the generated impulse response of a single-degree-of-freedom oscillatory system. Usually, in measuring the damping ratio from experimental data, the analytical approach is based on the assumption that the dynamic system behavior is linear (de Silva, 2000).

3.3  Background on Experimental Work

The free oscillation test was carried out using the Davis simple vehicle model. The test, conducted by Mansor (2006), was performed in the low speed wind tunnel facility of the Department of Aeronautical and Automotive Engineering at Loughborough University. Figure 3.1 shows the experimental setup in the 1.9 × 1.3 m wind tunnel test section. The one-degree-of-freedom system was constrained to oscillate in yawing motion only, i.e. moment about the z-axis. The oscillating model was mounted on a rigid support structure outside the test section, connected by a 20 mm circular-section steel rod passed through a clearance hole in the ceiling. A low friction potentiometer installed at the top of the support rod recorded the angular position of the model at a 1 kHz sampling rate. The free oscillation was initiated by giving the model an initial yaw displacement and releasing it. The system's oscillation frequency was controlled by a pair of springs of equal stiffness attached to the support and the steel rod. The specification of the Davis model used is listed in Table 3.1.

[Figure omitted] Figure 3.1  Oscillating model in the wind tunnel working section, showing the model axes (x, y, z), the pair of springs, the potentiometer connected to the data acquisition system, the rigid support structure and the wind direction (Mansor, 2006)

Table 3.1  Specification of the Davis model (Mansor, 2006)

Parameter                                    Value
Width (m)                                    0.225
Height (m)                                   0.160
Length (m)                                   0.625
Ground clearance (m)                         0.040
Frontal area (m²)                            0.036
Side area (m²)                               0.063
Mass (model + oscillating mechanism) (kg)    4.689
Moment of inertia (kg·m²)                    0.11
Material                                     GRP/composite

3.3.1  Oscillation frequency selection

To cover the range of frequencies that influence the crosswind stability of vehicles, a set of springs of different stiffnesses was used to vary the system's oscillation frequency. Labeling the springs K1 to K8, Table 3.2 shows the linear stiffness of each spring and the resulting reduced frequency measured at a tunnel speed of 10 m/s. The linear stiffness Ks of each spring is given by the manufacturer, Lee Spring Limited, UK. During the test rig design, the selection of the springs was based on the calculated reduced frequency; for the analysis of the test results, the measured reduced frequency was used instead.

Table 3.2  Measured reduced frequency for the eight springs at a wind speed of 10 m/s (Mansor, 2006)

Spring code    Spring stiffness Ks (N/m)    Reduced frequency at V = 10 m/s (Re = 4.28×10⁵)
K1             119                          0.13
K2             214                          0.19
K3             306                          0.25
K4             806                          0.39
K5             1051                         0.44
K6             1751                         0.55
K7             2242                         0.66
K8             2594                         0.70

3.3.2  Equation of motion of the dynamic test rig

The equation of motion of the model mounted on the rig and free to oscillate in pure yawing motion is given by:

I_{zz}\ddot{\beta} + C_r\dot{\beta} + K_r\beta = \sum N_a(t)                (3.1)

where β, \dot{\beta} and \ddot{\beta} are the sideslip angle, yaw rate and yaw acceleration respectively. Izz, Cr and Kr represent the model moment of inertia, the mechanical damping and the mechanical stiffness respectively, with Na(t) being the input function, i.e. the total aerodynamic yaw moment. Following the stiffness and damping approach applied by Hemon and Noger (2004), the aerodynamic load is written as:

N_a(t)_{dynamic} = N_\beta\,\beta + N_r\,r                (3.2)

where r = \dot{\beta}. Nβ is termed the aerodynamic stiffness derivative, measuring the rate of change of the aerodynamic yawing moment with respect to yaw angle, while Nr, termed the aerodynamic damping derivative, measures the rate of change of yawing moment with respect to yaw rate. The aerodynamic stiffness affects the natural frequency of the system and can be viewed as the elastic property of the system, while the aerodynamic damping is a fluid-type damping that dissipates the energy of this dynamic system and hence affects its damping ratio. The aerodynamic derivatives are given by:

N_\beta = \frac{1}{2}\rho V^2 A l\, Cn_\beta                (3.3)

and

N_r = \frac{1}{2}\rho V A l^2\, Cn_r                (3.4)

where Cnβ is the derivative of yaw moment with respect to the angular position, β, while Cnr is the derivative of yaw moment with respect to the angular velocity, r. The air velocity (m/s), model frontal area (m2), model characteristic length (m) and air density (kg/m3) are each represented by V, A, l and ρ respectively.

Incorporating equation (3.2) into (3.1) and rearranging it,

I_{zz}\ddot{\beta} + (C_r - C_a)\dot{\beta} + (K_r - K_a)\beta = 0                (3.5)

where Ka = Nβ and Ca = Nr. Normalizing equation (3.5) by the moment of inertia, the system's characteristic equation is given by:

\ddot{\beta} + \frac{(C_r - C_a)}{I_{zz}}\dot{\beta} + \frac{(K_r - K_a)}{I_{zz}}\beta = 0                (3.6)

Comparing equation (3.6) with the standard second-order dynamic system,

s^2 + 2\zeta\omega_n s + \omega_n^2 = 0                (3.7)

where ζ is the damping ratio and ωn the natural frequency of the system in rad/s, the system stiffness and damping are given by:

\frac{(K_r - K_a)}{I_{zz}} = \omega_n^2                (3.8)

and

\frac{(C_r - C_a)}{I_{zz}} = 2\zeta\omega_n                (3.9)

The mechanical terms need to be extracted from the overall system damping and stiffness to obtain the aerodynamic damping and stiffness from equations (3.8) and (3.9). The mechanical terms can be evaluated alone by conducting a wind-off oscillation test, with the assumption that in the wind-off condition the aerodynamic loads do not exist.

3.3.3  Moment of Inertia Estimation

In the wind-off condition, the aerodynamic loads are assumed to be zero, leading to:

\ddot{\beta} + \frac{C_r}{I_{zz}}\dot{\beta} + \frac{K_r}{I_{zz}}\beta = 0                (3.10)

From the natural frequency and damping ratio of the rig and the system's moment of inertia, the mechanical stiffness and mechanical damping are given by:

K_r = \omega_{n,\text{wind-off}}^2\, I_{zz}                (3.11)

and

C_r = 2\zeta_{\text{wind-off}}\,\omega_{n,\text{wind-off}}\, I_{zz}                (3.12)
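The extraction of the aerodynamic terms from equations (3.8), (3.9), (3.11) and (3.12) can be sketched numerically. The wind-on and wind-off frequencies and damping ratios below are assumed illustrative values, not measured data from the thesis; only Izz is taken from Table 3.1:

```python
# Illustrative identified values (NOT measured data from the thesis)
I_zz = 0.11                       # moment of inertia, kg m^2 (Table 3.1)
wn_off, zeta_off = 12.0, 0.02     # assumed wind-off natural freq (rad/s) and damping ratio
wn_on, zeta_on = 11.5, 0.05       # assumed wind-on values

# Mechanical terms from the wind-off test (aerodynamic loads assumed zero)
K_r = wn_off**2 * I_zz                  # mechanical stiffness, eq. (3.11)
C_r = 2 * zeta_off * wn_off * I_zz      # mechanical damping, eq. (3.12)

# Aerodynamic terms from the wind-on test via eqs. (3.8) and (3.9):
# (K_r - K_a)/I_zz = wn^2  and  (C_r - C_a)/I_zz = 2*zeta*wn
K_a = K_r - wn_on**2 * I_zz             # aerodynamic stiffness, K_a = N_beta
C_a = C_r - 2 * zeta_on * wn_on * I_zz  # aerodynamic damping, C_a = N_r

print(f"K_a = {K_a:.4f} N m/rad, C_a = {C_a:.4f} N m s/rad")
```

With these assumed numbers the wind-on damping ratio is larger than the wind-off one, so Ca comes out negative, i.e. the airflow adds damping to the rig; the signs of the recovered derivatives follow directly from the sign convention of equation (3.5).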

The moment of inertia of the system (model and support system), Izz, is determined experimentally from the wind-off free oscillation tests. The moment of inertia can be written as:

I_{zz} = \frac{K_r}{\omega_{n,\text{wind-off}}^2}                (3.13)

where Kr is the torsional stiffness of the springs and ω_{n,wind-off} is the free oscillation natural frequency from the wind-off test. The free body diagram of the system of Figure 3.1 is shown in Figure 3.2.

Figure 3.2  Free body diagram for the free oscillation system

The distance b = 0.1 m is defined as the lateral distance from the spring to the axis of rotation. The spring displacement, Δx, is given by:

Δx = b sin β

For a small angle β, sin β ≈ β, thus Δx = bβ. The torsional stiffness can be defined as:

Kr = T/β = 2Fb/β = 2Ks Δx b/β = 2Ks b²                           (3.14)

F and T are the force and torque respectively, and Ks is the spring linear stiffness, with values given in Table 3.2. By plotting the torsional stiffness for the range of springs against the corresponding squared natural frequency, ωn,wind-off², the moment of inertia of the system is given by the gradient of the graph.
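The gradient extraction described above can be sketched numerically; a minimal sketch with numpy, where the spring stiffnesses and frequencies are illustrative values, not the thesis data:

```python
import numpy as np

# Hypothetical wind-off results: a range of spring torsional stiffnesses
# K_r (N m/rad) and the corresponding squared natural frequencies.
I_zz_true = 0.101                                    # kg m^2 (mid-axis value quoted later)
omega_n_sq = np.array([50.0, 120.0, 240.0, 400.0, 650.0])   # (rad/s)^2
K_r = I_zz_true * omega_n_sq                         # ideal K_r = I_zz * omega_n^2

# The moment of inertia is the gradient of K_r plotted against omega_n^2.
slope, intercept = np.polyfit(omega_n_sq, K_r, 1)
I_zz_est = slope
```

With noise-free synthetic data the fitted gradient recovers the assumed inertia exactly; with measured data the least-squares slope averages out random error across the spring set.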

3.3.4  Estimation of side force derivatives

To estimate the side force derivatives, the free oscillation test was conducted at two different locations along the longitudinal axis: one at the middle of the wheelbase and the other at an arbitrary longitudinal distance from the mid axis, selected as 50 mm fore. The estimation is made possible by assuming that the flow characteristics are the same for the two longitudinal reference positions and that the same side force vector is generated and acts at the same centre of pressure. The reference axes and the assumed location of the centre of pressure, cp, are depicted in Figure 3.3.

Figure 3.3  Two-axis measurement tests conducted to estimate the side force derivatives

Let Figure 3.3 (a) represent the first axis of rotation. The yaw moment about the reference point O1 is given by:

Yβ lcp1 = Nβ1                                                    (3.15)

For the second axis of rotation in Figure 3.3 (b):

Yβ lcp2 = Nβ2                                                    (3.16)

Subtracting equation (3.16) from equation (3.15):

Yβ (lcp1 − lcp2) = Nβ1 − Nβ2                                     (3.17)

In coefficient terms:

Cyβ (lcp1 − lcp2)/l = Cnβ1 − Cnβ2                                (3.18)

Cyβ = (Cnβ1 − Cnβ2) l / Δlcp                                     (3.19)

Similarly, the damping derivative of the side force is given by:

Cyr = (Cnr1 − Cnr2) l / Δlcp                                     (3.20)

where Δlcp = lcp1 − lcp2.
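Equations (3.19) and (3.20) reduce to a one-line computation; a minimal sketch, where the characteristic length l = 0.625 m is an assumed value chosen only for illustration (the Cnβ pair is taken from the first row of Table 3.4, and the 0.050 m shift is the 50 mm axis offset described above):

```python
def side_force_derivative(Cn_axis1, Cn_axis2, l, delta_l_cp):
    """Eqs (3.19)/(3.20): Cy = (Cn1 - Cn2) * l / (l_cp1 - l_cp2)."""
    return (Cn_axis1 - Cn_axis2) * l / delta_l_cp

# Middle-axis and front-axis stiffness derivatives at reduced frequency 0.13.
Cy_beta = side_force_derivative(0.5120, 0.3542, l=0.625, delta_l_cp=0.050)
```

The same call with the pair of Cnr values gives Cyr. The result is sensitive to Δlcp, which is why the axis offset must be known precisely.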

3.4  Parameter Identification: Conventional Technique

Following the mathematical analysis described in section 3.3.2, the two parameters that need to be identified from the recorded time series data are the natural frequency, ωn, and the damping ratio, ζ. The conventional method relies on the theoretical framework of the problem at hand. When a single-degree-of-freedom oscillatory system with viscous damping is excited by an impulse input, its response takes the form of a time decay, as in Figure 3.4, given by:

y(t) = yo e^(−ζωn t) sin ωd t                                    (3.21)

Figure 3.4  Impulse response of a simple oscillator

3.4.1  The general solution for natural frequency, ωn, and damping ratio, ζ

To represent the response from the oscillation test rig, equation (3.21) can be written in terms of yaw angle, β, as:

β(t) = βo e^(−ζωn t) sin ωd t                                    (3.22)

where βo is the initial yaw angle and ωd is the damped frequency in rad/s. The time taken for the maximum amplitude to decay to half of its value is termed the time to half amplitude, t1/2. Thus, half of the maximum amplitude can be written as:

β1/2 = βo e^(−ζωn t1/2)

β1/2/βo = 1/2 = e^(−ζωn t1/2)

Solving for the damping ratio, ζ:

ζ = ln(1/2)/(−t1/2 ωn) = 0.6931/(t1/2 ωn)                        (3.23)

From the graphical representation of the natural frequency, ωn, damped frequency, ωd, and damping ratio, ζ, on the s-plane, as in Figure 3.5,

Figure 3.5  Graphical representation of natural frequency, damping ratio and damped frequency on the s-plane

the relationship between the three parameters can be written as:

ωn = √(ωd² + ζ²ωn²)                                              (3.24)

where ωd = 2πfd, with fd the damped frequency in Hz. The damped frequency is obtained from the frequency analysis of the system. Substituting equation (3.23) into (3.24), the solution for ωn is given by:

ωn = √(4π² fd² + (0.6931/t1/2)²)                                 (3.25)

3.4.2  Estimation of yaw stiffness and damping derivatives

From equations (3.8) and (3.11), the normalized yaw stiffness derivative, N̂β, is obtained as:

N̂β = Ka/Izz = Kr/Izz − (Kr − Ka)/Izz = ωn,wind-off² − ωn,wind-on²

Therefore,

N̂β = −{4π²(fd,wind-on² − fd,wind-off²) + 0.6931²[1/(t1/2,wind-on)² − 1/(t1/2,wind-off)²]}     (3.26)

Similarly, for the normalized yaw damping derivative, N̂r:

N̂r = Ca/Izz = Cr/Izz − (Cr − Ca)/Izz = (2ζωn)wind-off − (2ζωn)wind-on

so that

N̂r = −1.3863[1/t1/2,wind-on − 1/t1/2,wind-off]                  (3.27)

Thus, by estimating the time to half amplitude and the damped frequency from the time responses of the free oscillation tests in wind-on and wind-off modes, the normalized yaw moment derivatives can be evaluated.
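The two identification formulas of equations (3.26) and (3.27) can be sketched as a small function. The numeric inputs below are illustrative only; the signs are chosen so that an airflow which lowers the wind-on natural frequency and shortens the wind-on time to half amplitude yields a positive stiffness derivative and a negative damping derivative, consistent with Table 3.3:

```python
import numpy as np

LN2 = np.log(2.0)   # 0.6931...

def normalized_yaw_derivatives(fd_on, t_half_on, fd_off, t_half_off):
    """Equations (3.26) and (3.27): normalized yaw stiffness and damping
    derivatives from wind-on/wind-off damped frequencies (Hz) and times
    to half amplitude (s)."""
    N_beta_hat = -(4.0 * np.pi**2 * (fd_on**2 - fd_off**2)
                   + LN2**2 * (1.0 / t_half_on**2 - 1.0 / t_half_off**2))
    N_r_hat = -2.0 * LN2 * (1.0 / t_half_on - 1.0 / t_half_off)  # -1.3863[...]
    return N_beta_hat, N_r_hat

# Illustrative values: wind-on oscillates more slowly and decays faster.
N_beta_hat, N_r_hat = normalized_yaw_derivatives(fd_on=1.25, t_half_on=8.0,
                                                 fd_off=1.36, t_half_off=22.0)
```

Both outputs are still divided by Izz; step 5 of the procedure in section 3.5 converts them to the dimensionless coefficients.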

3.4.3  Estimation of the damped frequency and time to half amplitude

The damped frequency, fd, can be estimated from the maximum peak of the power spectral density calculated from the signal. According to Kay and Marple (1981), the power spectral density, PSD, describes how the power (or variance) of a time series is distributed with frequency, as shown in Figure 3.6. Mathematically, it is defined as the Fourier transform of the autocorrelation sequence of the time series. An equivalent definition of the PSD is the squared modulus of the Fourier transform of the time series, scaled by a proper constant term. The damped frequency of the response is given by the frequency with the highest power (density) in the graph.
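As a sketch of this estimate, the PSD of a synthetic decaying oscillation (standing in for the measured yaw angle; the sampling rate and all parameters are assumed) can be computed as the squared modulus of the FFT and its peak picked:

```python
import numpy as np

fs = 100.0                                    # sampling rate, Hz (assumed)
t = np.arange(0.0, 40.0, 1.0 / fs)
fd_true = 1.36                                # damped frequency, Hz (illustrative)
beta = 10.0 * np.exp(-0.0315 * t) * np.sin(2.0 * np.pi * fd_true * t)

# PSD as the squared modulus of the Fourier transform of the time series.
spectrum = np.abs(np.fft.rfft(beta))**2
freqs = np.fft.rfftfreq(len(beta), d=1.0 / fs)

# The damped frequency is the frequency with the highest power.
fd_est = float(freqs[np.argmax(spectrum)])
```

The frequency resolution here is 1/40 s = 0.025 Hz, so the picked bin falls within one bin of the true damped frequency.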

Figure 3.6  Power spectral density obtained by Fourier transform of the measured time response data

The time to half amplitude, t1/2, is defined as the time taken for the maximum amplitude to reach half of its value and can be estimated by interpolating the rate of decay of the peak amplitudes. The rate of decay of the peak amplitudes is described by a polynomial used to curve fit the identified peak amplitudes. The range of the time response used for identification is between ±10° and ±1°. This range is selected for its low noise-to-signal ratio and because it is free from the effects associated with the initial release (Mansor, 2006).

The degree of the polynomial was selected based on a measure of the goodness of fit: the residual, the difference between the observed and predicted data. By curve fitting the data with polynomials of degree 2 to 6, the residuals were calculated. Increasing the polynomial degree from 2 to 5 shows a significant improvement in the residual, while further increases improve it only slightly. Thus, a polynomial of degree 5 was used for curve fitting the peak amplitudes, as shown in Figure 3.7. Through interpolation, the time at which the amplitude has reached half of its maximum value, t1/2, was obtained.
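The peak identification, degree-5 curve fit and interpolation can be sketched as follows on a synthetic response; the decay constant is chosen so the half-amplitude time lands near the measured 21.97 s, purely for illustration:

```python
import numpy as np

fs = 100.0                                   # sampling rate, Hz (assumed)
t = np.arange(0.0, 40.0, 1.0 / fs)
beta = 10.0 * np.exp(-0.0315 * t) * np.sin(2.0 * np.pi * 1.36 * t)

# Identify the positive peak amplitude in each cycle (local maxima above zero).
is_peak = (beta[1:-1] > beta[:-2]) & (beta[1:-1] > beta[2:]) & (beta[1:-1] > 0)
idx = np.where(is_peak)[0] + 1
pk_t, pk_a = t[idx], beta[idx]

# Keep the range used in the text: from the maximum peak down to 1 degree.
keep = pk_a >= 1.0
pk_t, pk_a = pk_t[keep], pk_a[keep]

# Degree-5 polynomial describing the rate of decay of the peak amplitudes.
coeffs = np.polyfit(pk_t, pk_a, 5)

# Interpolate the time at which the fitted envelope reaches half the maximum.
t_fine = np.linspace(pk_t[0], pk_t[-1], 20000)
env = np.polyval(coeffs, t_fine)
t_half = float(t_fine[np.argmax(env <= pk_a.max() / 2.0)])
```

Selecting the maximum value per cycle, as done in this work, is robust to single noisy samples; with measured data a light low-pass filter before peak picking (as in step 1 of section 3.5) helps further.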

Figure 3.7  The time series response with the curve fitted peak amplitudes

3.5  Identification Technique: Results and Discussions

The estimation for the system’s moment of inertia was carried out according to section 3.3.3 and the resulting plot is shown in Figure 3.8.

Figure 3.8  Torsional stiffness versus measured wind-off natural frequency squared, ωn², to estimate the moment of inertia of the oscillating model rig for the middle and front reference axes

The moment of inertia is given by the gradient of the plotted graph for both the mid and front axes of rotation. Their values are 0.101 kg·m² and 1.205 kg·m² respectively.

From section 3.4, the process of identification of the aerodynamic derivatives can be summarized as follows:

1. From the wind-off test, the data were filtered with a low-pass filter with a cut-off frequency of 10 Hz.

2. The range of time series data selected was between ±10° and ±1°. The wind-off damped frequency, fd,wind-off, was obtained from the power spectral density, and the wind-off time to half amplitude, t1/2,wind-off, was obtained from the rate of decay of the peak amplitudes. An example of the identified fd,wind-off and t1/2,wind-off for spring K3 is shown in Figures 3.9 and 3.10.

Figure 3.9  Damped frequency obtained from PSD (maximum power occurs at f = 1.3592 Hz)

Figure 3.10  The determination of time to half amplitude (t1/2 = 21.97 s)

3. Steps 1 and 2 were repeated for the wind-on test to get fd,wind-on and t1/2,wind-on.

4. From equations (3.26) and (3.27), the normalized yaw stiffness and yaw damping derivatives were calculated respectively.

5. From equations (3.3) and (3.4), the coefficients of the aerodynamic derivatives, Cnβ and Cnr, are given by:

Cnβ = 2 N̂β Izz / (ρ V² A l)

and

Cnr = 2 N̂r Izz / (ρ V A l²)

6. To calculate the side force derivatives, another set of tests was conducted with the reference axis moved 50 mm ahead of the middle axis. Following steps 1 to 5, the aerodynamic side force derivatives were obtained by applying equations (3.19) and (3.20).
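Step 5 can be sketched as a small helper; ρ, A, l and the normalized derivative values below are assumed placeholder values, while V = 10 m/s and Izz = 0.101 kg·m² come from the text:

```python
def yaw_coefficients(N_beta_hat, N_r_hat, I_zz, rho, V, A, l):
    """Invert equations (3.3) and (3.4) for the normalized derivatives:
    Cn_beta = 2*N_beta_hat*I_zz/(rho*V^2*A*l),
    Cn_r    = 2*N_r_hat*I_zz/(rho*V*A*l^2)."""
    Cn_beta = 2.0 * N_beta_hat * I_zz / (rho * V**2 * A * l)
    Cn_r = 2.0 * N_r_hat * I_zz / (rho * V * A * l**2)
    return Cn_beta, Cn_r

# Illustrative inputs only (rho, A, l are not the thesis values).
Cn_beta, Cn_r = yaw_coefficients(N_beta_hat=11.33, N_r_hat=-0.110,
                                 I_zz=0.101, rho=1.2, V=10.0, A=0.03, l=0.6)
```

Note the different velocity and length powers in the two denominators: the stiffness coefficient scales with V²l, the damping coefficient with Vl².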

The results from the above procedure are tabulated in Table 3.3: the estimated aerodynamic yaw derivatives and aerodynamic side force derivatives for the range of reduced frequency investigated. Using a different degree of polynomial to curve fit the peak amplitudes when estimating t1/2 does not significantly affect the calculated derivatives.

Table 3.3  The aerodynamic derivatives results at tunnel speed of 10 m/s

Reduced      Yaw moment derivatives      Side force derivatives
frequency    Cnβ         Cnr             Cyβ         Cyr
0.13         0.5120      -0.0769         1.9719      0.7967
0.20         0.5842      -0.0898         3.2180      0.5764
0.25         0.5321      -0.0901         2.1192      0.5729
0.40         0.5462      -0.1009         2.4591      0.2913
0.46         0.5202      -0.1002         2.0465      0.4492
0.56         0.5847      -0.1062         3.4285      0.8613
0.68         0.5615      -0.1027         2.6974      0.5731
0.72         0.4426      -0.1091         1.4104      0.3870

The yaw damping derivatives obtained through the derived mathematical analysis are negative. The negative sign indicates that the aerodynamic loads add to the system damping, while a positive sign would indicate that the aerodynamic loads reduce it. Physically, the obtained results show that the aerodynamic loads provide a positive fluid damping, hence improving the dynamic stability of the system.

As expected, the yaw moment measured at the front axis is lower than that measured at the middle axis. The center of pressure, where the side force acts, is normally ahead of the center of gravity of the vehicle. By moving the reference axis in front of the middle axis, the moment arm is reduced, hence reducing the yaw moment produced. The yaw moment derivatives measured at both axes are tabulated in Table 3.4.

Table 3.4  Measurement of yaw moment derivative at two axes

Reduced      Yaw moment stiffness derivative, Cnβ
frequency    Middle axis     Front axis
0.13         0.5120          0.3542
0.20         0.5842          0.3268
0.25         0.5321          0.3626
0.40         0.5462          0.3495
0.46         0.5202          0.3564
0.56         0.5847          0.3104
0.68         0.5615          0.3457
0.72         0.4426          0.3297

For consistency purposes, the resulting aerodynamic derivatives from this work were compared with the values obtained by Mansor (2006). The aerodynamic stiffness values are expected to be the same, since they are governed by the natural frequency of the system, which was estimated through the power spectral density method. However, some differences are expected in the values of aerodynamic damping, which is governed by the damping ratio estimated from the time to half amplitude; that estimation depends on the interpolation of the identified peak amplitudes. In Mansor (2006), the peak amplitudes were selected based on the change of the gradient sign of the plotted time series data, while in this work the peak amplitudes were identified by selecting the maximum amplitude value in each cycle. The latter approach can reduce the chance of selecting a wrong amplitude as the peak amplitude since, when dealing with real data, the measurements are always influenced by noise. The resulting aerodynamic derivatives from this research and from Mansor (2006) are shown in Figures 3.11 and 3.12.

Figure 3.11  Aerodynamic yaw moment derivatives, Cnβ and Cnr, versus reduced frequency, km, for the current work and Mansor (2006)

Figure 3.12  Aerodynamic side force derivatives, Cyβ and Cyr, versus reduced frequency, km, for the current work and Mansor (2006)

3.6  Conclusion

In this chapter, the lateral aerodynamic derivatives, yaw moment and side force, have been identified from the analysis of the angular displacement of the car model measured in the dynamic wind tunnel test. The dynamic test was conducted using an oscillating test rig, where the frequency of oscillation of the model represents the frequency of the disturbing wind input.

The dynamic wind tunnel test for estimating the aerodynamic derivatives of a ground vehicle is still a new approach, and as a first step the parameter identification was carried out in a conventional manner. This approach is only valid under the assumption that the dynamic system behaviour is linear.

Following the conventional technique, the identification was achieved in an indirect manner. From the recorded time series data, the power spectral density had to be calculated first to obtain the damped frequency, and the identified peak amplitudes had to be curve fitted to enable the interpolation of the time to half amplitude. From these two intermediate parameters, the natural frequency and damping ratio of the response were obtained and the aerodynamic derivatives were calculated.

An improvement to the identification technique is proposed in Chapter 4, where the natural frequency and damping ratio of the time response will be approximated using a neural network. This computational approach is proposed to reduce the number of identification steps, and the identification error can be minimized by improving the network performance.

CHAPTER 4

IDENTIFICATION TECHNIQUE: NEURAL NETWORK

4.1  Introduction

An alternative identification scheme to the conventional parameter identification technique discussed in Chapter 3 is proposed in this chapter. A multilayer feedforward neural network (MLNN) trained with the backpropagation algorithm was used to approximate the natural frequency, ωn, and damping ratio, ζ, from recorded impulse time response data. This approach estimates ωn and ζ from the recorded time series data in a more direct manner. The MLNN model was trained using training data containing a set of impulse responses of the general second order system with the corresponding range of ωn and ζ values. The trained MLNN model was used to identify the natural frequency and damping ratio of the system by feeding in the measured yaw angle response. The identified ωn and ζ values are required to calculate the aerodynamic derivatives using the equations derived in section 3.3.2. The training of the neural network is an iterative process: the network is retrained until the estimated ωn and ζ values coincide with the target values.

Following the introduction in this section, section 4.2 describes the training and test data sets that were used to train the network. An introduction to the multilayer neural network is given in section 4.3, and section 4.4 explains the training algorithm proposed for training the MLNN. Jones (1991) and Montgomery (1996) recommended using simpler models at the start of a study, with the design parameters improved iteratively throughout the experimentation. Hassan (2002) stated that it is usually a major mistake to design a single large comprehensive experiment at the start of a study. Thus, each property of the network was investigated while holding the other important parameters constant, as reported in section 4.5. The recommended design and the application of the MLNN in estimating the aerodynamic derivatives are given in section 4.6, and the results are presented and discussed in section 4.7. Chapter 4 is concluded in section 4.8.

4.2  Training and testing data set

For function approximation purposes, the network should be taught with proper input-output pairs. The training data set is the set of example input-output pairs used by the network to learn to approximate the function properly and to adapt to the new environment. The test data set is a separate set, not seen by the network during training, that is used to test the generalization of the network.

In this study, the data sets used to train and validate the neural network were generated from an equivalent system transfer function. Given that the motion of the oscillating test rig is pure yawing motion, as described in section 3.3.2, the training data can be generated from an equivalent second order mass-spring-damper system. The impulse responses used as the training and test data sets can be generated from the general second order system transfer function given by:

G(s) = ωn² / (s² + 2ζωn s + ωn²)                                 (4.1)

The time response data of the impulse excitation were used as the network inputs. One of the necessary conditions for successfully utilizing a neural network model is that the sought outputs have a significant dependence on the input data (Liu and Han, 2003); for this work, the suitable outputs for the network are the natural frequency and damping ratio.
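A sketch of how one training pair could be generated from equation (4.1), using the closed-form underdamped impulse response rather than a numerical simulation; the (ωn, ζ) values and time grid are illustrative:

```python
import numpy as np

def impulse_response(wn, zeta, t):
    """Analytic impulse response of G(s) = wn^2/(s^2 + 2*zeta*wn*s + wn^2)
    for the underdamped case 0 < zeta < 1."""
    wd = wn * np.sqrt(1.0 - zeta**2)                 # damped frequency, rad/s
    return (wn / np.sqrt(1.0 - zeta**2)) * np.exp(-zeta * wn * t) * np.sin(wd * t)

# One training pair: the sampled response is the network input vector,
# (wn, zeta) is the target output vector.
t = np.linspace(0.0, 10.0, 500)
wn, zeta = 8.0, 0.05
y = impulse_response(wn, zeta, t)
```

Sweeping (ωn, ζ) over their assumed ranges and stacking the resulting response vectors yields the full training set described in this section.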

The neural network was trained based on the standard plot of the impulse response of a second order system, as shown in Figure 4.1. The amplitudes as a function of time determine the damping ratio of the system, and from the known period of the cycles the natural frequency of the system can be determined from the ωnt axis.

Figure 4.1  Standard plot for second order system impulse response (ζ = 0.01, 0.02, 0.04, 0.06, 0.08, 0.1)

4.2.1  Training samples distribution

Apart from a proper network architecture and an efficient training algorithm, the selection of training samples also plays an important role in obtaining a reliable MLNN model for the studied problem (Rogers, 1994). Generally, an ideal set of training samples should be adequate to represent the total sample space.

One method of selecting samples is to use the complete set of possible combinations of input-output pairs (Manson et al., 1989), but the resulting number of samples is impractically large for a complex engineering problem (Liu and Han, 2003). Rogers (1994) introduced the linear method, where the training samples are generated by starting at the lower bound of each parameter and then stepping through the sample space at a given increment until reaching the upper bound. A more recent approach by Atalla and Inman (1998) is the random generation of the characteristic parameters within their variation ranges, which can produce good training results.
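The two sampling strategies can be sketched side by side; the bounds and increments below are illustrative, not the ranges used in the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear method (Rogers, 1994): step each parameter from its lower to its
# upper bound at a fixed increment, then take all combinations.
wn_grid = np.arange(2.0, 12.0 + 1e-9, 2.0)            # rad/s
zeta_grid = np.arange(0.01, 0.10 + 1e-9, 0.03)
grid = np.array([(w, z) for w in wn_grid for z in zeta_grid])

# Random method (Atalla and Inman, 1998): draw each parameter uniformly
# within the same variation ranges.
n = 50
random_samples = np.column_stack([rng.uniform(2.0, 12.0, n),
                                  rng.uniform(0.01, 0.10, n)])
```

The grid guarantees coverage of the corners of the sample space, while random sampling avoids the combinatorial growth when more parameters are added.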

4.3  Multilayer Neural Network

A multilayer feed-forward neural network (MLNN) typically consists of a set of sensory neurons that constitute the input layer, one or more hidden layers of computation nodes, and an output layer of computation nodes. The input signals propagate through the network in a forward direction. Nodes receive their inputs exclusively from the outputs of nodes in the previous layer, and their outputs are passed exclusively to nodes in the following layer. The synaptic weights and thresholds are adjusted to move the actual response of the network closer to the desired response; this process is called training or learning.

Multilayer neural networks can be used to approximate almost any function if there are enough neurons in the hidden layers. Consider a three layer MLNN such as in Figure 4.2, where p is the input to the network, f is the transfer function or activation function of the neurons, w are the weights, and b are the thresholds or biases. The net input to neuron i in layer k+1 is given by:

n^(k+1)(i) = Σ(j=1 to sk) w^(k+1)(i,j) a^k(j) + b^(k+1)(i)       (4.2)

The quantity n^(k+1)(i) is also termed the induced local field of neuron i (Haykin, 1999). The output of neuron i is then:

a^(k+1)(i) = f^(k+1)(n^(k+1)(i))                                 (4.3)

Thus, for an M layer network, the system equations in matrix form are given by:

a^0 = p                                                          (4.4)

a^(k+1) = f^(k+1)(W^(k+1) a^k + b^(k+1)),   k = 0, 1, ..., M−1   (4.5)

Figure 4.2  Three layer feedforward network

The assigned task for the network is to learn the associations between a specified set of Q input-output pairs {(p1,t1), (p2,t2), …, (pQ,tQ)}. The performance of the network can be measured by introducing a performance index. A typical performance index for a learning-based rule network is given by:

Fe = Σ(q=1 to Q) (tq − aq^M)ᵀ(tq − aq^M) = Σ(q=1 to Q) eqᵀ eq    (4.6)

4.3.1  The selection of activation function

The choice of activation function depends on the training algorithm used. Ideally, a hard-limited activation function that mimics the natural biological system is desirable. However, if the backpropagation algorithm is used, the computation of δ for each neuron of the MLNN requires the derivative of the activation function f^(k+1) associated with the neuron, where δ is defined as the sensitivity of the performance index to changes in the net input. Therefore, the activation function chosen for the neurons should be differentiable. The most commonly used continuously differentiable nonlinear activation function in MLNN is the sigmoidal nonlinearity, one form of which is the logistic function.

The general form of the logistic function is defined by:

f(nj) = 1 / (1 + exp(−nj))                                       (4.7)

where nj is the induced local field of neuron j. With this nonlinear activation function, the amplitude of the neuron's output is squashed into the range 0 ≤ aj ≤ 1.

Another form of activation function, the hyperbolic tangent function, is also among the commonly used forms of sigmoidal nonlinearity; this activation function squashes the output into the range [−1,1]. The two nonlinear sigmoidal activation functions are described in Figure 4.3.

Figure 4.3  Commonly used nonlinear sigmoidal functions: the logistic function, f(n) = 1/(1 + e^(−n)), and the hyperbolic tangent, tanh(n) = (e^n − e^(−n))/(e^n + e^(−n))

The output layer’s activation function determines the range of the network output. If the logistic function is used, the network output is restrained in the range of

56 [0,1] while the tangent sigmoidal function will squash the network output in the range of [-1,1] and a pure linear activation function will give an unlimited output range, [-∞,+∞] where aj = nj.
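The two squashing functions, and the derivative that backpropagation needs, can be sketched directly:

```python
import numpy as np

def logistic(n):
    """Eq (4.7): output squashed into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-n))

def logistic_deriv(n):
    """Derivative needed by backpropagation: f'(n) = a * (1 - a)."""
    a = logistic(n)
    return a * (1.0 - a)

n = np.linspace(-5.0, 5.0, 101)
a_log = logistic(n)      # range (0, 1)
a_tanh = np.tanh(n)      # range (-1, 1); a pure linear layer would return n unchanged
```

The closed-form derivative in terms of the activation itself is one reason the logistic and tanh functions are convenient for backpropagation.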

4.4  Training Algorithm

The most popular training algorithm for multilayer networks is the backpropagation algorithm (BP), in which the network error is propagated in a backward direction, hence the name. The BP algorithm is the basic and most widely used weight updating method for MLNN (Gupta et al., 2003). There are several different backpropagation training algorithms, to name a few: the resilient backpropagation algorithm, which performs well on pattern recognition problems; the Levenberg-Marquardt algorithm, which generally has the fastest convergence on function approximation problems and is advantageous if accurate training is required; and the conjugate gradient algorithm, which works well for both pattern recognition and function approximation problems. In general, however, the available training algorithms for MLNN have an issue in determining the optimal numbers of input and hidden neurons as well as hidden layers, which usually have to be determined by trial and error.

Foresee and Hagan (1997) proposed the Bayesian regularization training algorithm. This algorithm is a modification of the Levenberg-Marquardt training algorithm that consistently produces networks with good generalization. Furthermore, it helps reduce the difficulty of determining the optimum network architecture by introducing a quantity termed the effective number of parameters, which measures the number of parameters actually useful to the MLNN. The Bayesian regularization and the Levenberg-Marquardt optimization techniques are described in detail in APPENDIX B and APPENDIX C. In this study, for the above-mentioned attributes, Bayesian regularization was selected as the training algorithm for the MLNN.

4.4.1  Procedure for Bayesian Regularization

The procedure of Bayesian regularization with Levenberg-Marquardt optimization involves two phases. In the first phase, the input signals are propagated in the forward direction: the inputs are presented to the first hidden layer and propagated to the output layer, and the network outputs are calculated. The difference between the actual network output and the desired target output is calculated to produce the prediction error, and the sum of squares of the weights is calculated as well. Regularization adds an additional term to the network objective function: the sum of squares of the network weights, Fw. Thus, the objective function for regularization, Freg, becomes:

Freg = βFe + αFw                                                 (4.8)

where α and β are the regularization parameters and Fw is the sum of squares of the network weights. The α and β are given by:

α^MP = γ / (2Fw(w^MP))   and   β^MP = (n − γ) / (2Fe(w^MP))      (4.9)

where γ is the effective number of parameters and the superscript MP denotes the minimum point of the objective function.

The second phase involves the backpropagation of the error signals from the output layer to the first layer. Weights and biases are updated according to these error signals, and the process is repeated until the weights and biases converge or the performance goal is met. The procedure for Bayesian optimization of the regularization parameters, with the Gauss-Newton approximation to the Hessian matrix (Foresee and Hagan, 1997) and the Levenberg-Marquardt updating technique (Hagan and Menhaj, 1994), is summarized below:

1. Initialize α, β and the weights: α is set to 0 and β to 1, and the weights and biases are initialized according to the Nguyen-Widrow method (Nguyen and Widrow, 1990). The Marquardt parameters are set to the initial values μ = 0.05 and φ = 10.

2. Present all inputs to the network and calculate the corresponding network outputs (using (4.4) and (4.5)) and the resulting errors, e = tq − aq^M. Compute the sum of squares of the errors over all inputs and the sum of squares of the weights. Use the Levenberg-Marquardt algorithm to minimize the objective function F(w) = βFe + αFw = V(x), and compute the Jacobian matrix.

3. Compute the effective number of parameters, γ = N − 2α tr(H⁻¹), making use of the Gauss-Newton approximation to the Hessian available in the Levenberg-Marquardt training algorithm: H = ∇²F(w) ≈ 2βJᵀJ + 2αI_N, where J is the Jacobian matrix of the training set errors.

4. Compute the new estimates for the objective function parameters α and β using equation (4.9).

5. Calculate Δx, given by:

Δx = [Jᵀ(x)J(x) + μI]⁻¹ Jᵀ(x)e(x)

where J is the Jacobian matrix. Recompute the sum of squares of the errors using x + Δx. If the sum of squares is not reduced, increase μ by the factor φ and recompute Δx. If the new sum of squares is smaller than that computed in step 2, reduce μ by the factor φ, let x = x + Δx, and return to step 2.

6. Iterate steps 2 through 5 until convergence or until the sum of squares has been reduced to the error goal.
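One update of the regularization parameters (steps 3 and 4) can be sketched on toy quantities; J, e and w below are random stand-ins, not values from an actual training run:

```python
import numpy as np

rng = np.random.default_rng(2)

N = 6                             # total number of weights and biases
J = rng.normal(size=(40, N))      # Jacobian of the 40 training-set errors
e = rng.normal(size=40)           # current errors
w = rng.normal(size=N)            # current weights

alpha, beta = 0.01, 1.0
Fe = e @ e                        # sum of squared errors
Fw = w @ w                        # sum of squared weights

# Gauss-Newton approximation to the Hessian of F = beta*Fe + alpha*Fw (step 3).
H = 2.0 * beta * (J.T @ J) + 2.0 * alpha * np.eye(N)

# Effective number of parameters: gamma = N - 2*alpha*tr(H^-1).
gamma = N - 2.0 * alpha * np.trace(np.linalg.inv(H))

# Re-estimate the regularization parameters (eq 4.9); n = number of errors.
alpha_new = gamma / (2.0 * Fw)
beta_new = (len(e) - gamma) / (2.0 * Fe)
```

A γ close to N, as in this toy case, is the signal described in the next paragraph that the network may be too small for the problem.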

When this Gauss-Newton approximation to the Bayesian regularization algorithm is used, good results are obtained if the training data are first mapped into the range [−1,1]; typically both inputs and outputs are scaled (Foresee and Hagan, 1997). With the computed value of the effective number of parameters, γ, the condition of the network can be monitored. If the final γ value is very close to the actual number of parameters, N, then the neural network may not be large enough to properly represent the true function; the network size can then be increased by adding more hidden layer nodes and retraining. If the larger network has the same final γ value, then the smaller network is large enough for the application; otherwise, more hidden layer nodes may be added. The consistency of the network results can also be checked: if the network is sufficiently large, a second, larger network will achieve comparable values for γ, Fe and Fw.

4.5  Network Properties

Besides designing the network to learn to approximate the function properly from the training set, it is important for the network to generalize as well. Apart from using an appropriate learning algorithm, the optimum size and distribution of the training samples, as well as the network topology, influence the network performance and its generalization. According to Haykin (1999), generalization is influenced by three factors: (i) the size of the training set and how well it represents the environment of interest, (ii) the topology of the neural network, and (iii) the physical complexity of the problem at hand. The first two factors can be addressed through the design of the network. Thus, in addressing both factors, the problem can be tackled from two different perspectives (Hush and Horne, 1993):



•  The neural network topology is fixed, and the size of the training set needed for good generalization is investigated.

•  The size of the training set is fixed, and the best network topology that can achieve good generalization is determined.

The basic network architecture consists of an input layer represented by the normalized time-function amplitudes and the scaled cycle period, t, n hidden layers with m nodes, and an output layer with two nodes representing the scaled values of the estimated ωn and ζ. The network architecture is shown in Figure 4.4. The estimated scaled values of ωn and ζ are represented by a1 and a2 respectively. In this section, the properties of the presented network were studied, i.e. the assignment of initial weights and biases, the sizes of n and m, the input representation, and the effect of noise.

Figure 4.4  Neural network architecture

4.5.1  Assignment of initial weights and biases

In the demonstration of Bayesian regularization by Foresee and Hagan (1997), the weights and biases were initialized according to the Nguyen-Widrow rules. The Nguyen-Widrow method initializes the weights and biases so that the active regions of the layer's neurons are distributed approximately evenly over the input space. This can reduce the number of wasted neurons, since the random initial weights and biases are evenly distributed. Furthermore, training is faster than with randomly distributed weights and biases, as demonstrated by Nguyen and Widrow when training the "Truck-Backer-Upper" (Nguyen and Widrow, 1989), where the training time was reduced from two days to four hours.

Two networks with the same structure trained with the same training algorithm may produce different estimation errors after the random assignment of the weights and biases. The error for network A may fall at a different location on the error surface than the error for network B. Consider a case where the error for network A falls far from the minimum of the error surface while the error for network B lies much nearer to it. Network A will then need to be trained for more epochs than network B to reach convergence. This is illustrated in Figure 4.5. The network was trained five times with different initial random values assigned to the weights and biases according to the Nguyen-Widrow rules, each for 2400 epochs. The performance indices of the networks merge and converge as the training reaches approximately 900 epochs, and the final numbers of effective parameters of the networks are approximately the same. One epoch of training is defined as a single presentation of all input vectors to the network. It is good practice to train the network for a fairly large number of epochs and to train it a few times with different initial random values. In this case, 1000 epochs may be considered enough to ensure the network has reached the global minimum of the error surface.
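As a minimal illustration of this restart strategy (a toy sketch, not the thesis's MATLAB implementation), the code below trains a small tanh network from several random initializations and compares the final sums of squared errors; the target function, network size, and learning rate are all illustrative.

```python
import numpy as np

def train_once(seed, epochs=2000, lr=0.05, hidden=6):
    """Train a tiny 1-6-1 tanh network on a toy target from one random init."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-1, 1, 40).reshape(-1, 1)
    y = np.sin(np.pi * x)                       # toy target function
    W1 = rng.normal(0, 0.5, (1, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(x @ W1 + b1)                # hidden layer (tanh)
        out = h @ W2 + b2                       # linear output layer
        err = out - y
        # backpropagation of the squared-error gradient
        gW2 = h.T @ err; gb2 = err.sum(0)
        dh = (err @ W2.T) * (1 - h**2)
        gW1 = x.T @ dh; gb1 = dh.sum(0)
        for p, g in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2)):
            p -= lr * g / len(x)                # batch gradient step
    return float((err**2).sum())                # final SSE for this restart

sse = [train_once(seed) for seed in range(5)]
print(sse)  # restarts that converge properly end near similar SSE values
```

Comparing the final errors across restarts is a cheap check that the chosen number of epochs is enough for the runs to converge rather than stall at different points on the error surface.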

Figure 4.5	Network performance index with five different initial values

4.5.2 Number of input cycles

With the available records of the impulse response over a period of t seconds, the lagged outputs y(t−j), j > 0, were used as the network input. However, the sufficient and optimum number of response cycles to be used has to be determined. The more cycles included as input nodes, the bigger the network becomes; too few delayed outputs, on the other hand, may be insufficient to properly estimate the values of ωn and ζ.

According to Cybenko (1989), a neural network with one hidden layer is sufficient to approximate functions. Therefore, the initial study on the optimum number of response cycles was carried out using a single hidden layer network. Batch training was performed, where the weights and biases of the network were updated only after the entire training set had been applied to the network. The gradients calculated at each training example were added together to determine the change in the weights and biases.

The investigation on the optimum number of response cycles was carried out by taking 1, 2, 3 and 4 response cycles as the network input. Satisfying sampling rules (Sinha, 2000; Roffel et al., 1989), a complete cycle of the impulse response was sampled into 30 data points, resulting in 30 input nodes for 1 response cycle, 60 input nodes for 2 cycles, 90 input nodes for 3 cycles and 120 input nodes for 4 cycles. Through experiments, the minimum number of hidden nodes that gives proper estimation results is 6. Thus, the training was conducted with 6 hidden nodes and repeated with 8 and 10 nodes in the hidden layer, and the results are depicted in Figure 4.6.
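The 30-points-per-cycle sampling described above can be sketched as follows. The impulse response formula is the standard one for a second order system G(s) = ωn²/(s² + 2ζωn s + ωn²); the parameter values and function names are illustrative.

```python
import numpy as np

def impulse_response(wn, zeta, t):
    """Impulse response of G(s) = wn^2 / (s^2 + 2*zeta*wn*s + wn^2)."""
    wd = wn * np.sqrt(1.0 - zeta**2)           # damped natural frequency
    return (wn**2 / wd) * np.exp(-zeta * wn * t) * np.sin(wd * t)

def sample_cycles(wn, zeta, n_cycles=3, pts_per_cycle=30):
    """Sample n_cycles of the response at pts_per_cycle points per cycle."""
    wd = wn * np.sqrt(1.0 - zeta**2)
    period = 2.0 * np.pi / wd                  # one damped-oscillation cycle
    t = np.linspace(0.0, n_cycles * period, n_cycles * pts_per_cycle)
    return t, impulse_response(wn, zeta, t)

t, y = sample_cycles(wn=15.5, zeta=0.003)
print(len(y))   # 90 input values for a 3-cycle, 30-points-per-cycle window
```

Each (ωn, ζ) pair sampled this way yields one input vector for the network, so the input dimension grows linearly with the number of cycles retained.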

Figure 4.6	Network performance (SSE) with increasing number of impulse response cycles used (1 to 4), for 6, 8 and 10 hidden nodes

Although increasing the number of input cycles can reduce the network performance index, the network size becomes large, the training period becomes longer, and the network becomes impractical. Since a smaller network is favoured to avoid the overfitting that a bigger network invites, three cycles of the impulse response are sufficient for the network to properly associate the input with its corresponding output.

4.5.3 Number of input nodes

The number of input nodes specifies the dimension of the network input. Increasing the number of input nodes representing the impulse response increases the sampling rate of the time response. In order that there is no loss of information during the sampling process, the sampling frequency should be at least twice the largest frequency component of the signal being sampled (Sinha, 2000), or the sampling time should be about 1/10 of the dominant time constant (Roffel et al., 1989).

Satisfying both conditions, the lowest number of input nodes to represent the 3-cycle response was taken to be 40. The investigation was carried out using a network with 6 nodes in a single hidden layer, assumed sufficient based on the previous input cycle study. The number of input nodes was increased incrementally, and the effect on the network performance index is depicted in Figure 4.7. Increasing the number of input nodes improves the network performance, but at the expense of network size and training time. Since the performance index of the network with 40 input nodes is already considerably low, 40 sampled data points are sufficient to represent the three impulse response cycles.

Figure 4.7	Performance index (SSE) of the network as the number of input nodes increases from 40 to 100

4.5.4 Training samples population

Generalization depends on the number of training samples used. When the network was trained with insufficient training samples, its generalization was poor, and it improved as training samples were added. The network was first presented with 50 training samples, distributed as shown in Figure 4.8. A network structure of 41-6-2 was used for the investigation. After the training process was completed, the network was simulated with the training set and with a test set to assess generalization. The test set is a separate set of 100 samples that had not been used during training. The mean squared error (MSE) of both estimation results relative to the actual values was calculated. Then the number of training samples was increased, and the training and estimation process was repeated.

The effect of the number of training samples on the generalization property of the network is tabulated in Table 4.1. The neural network estimation on the test set improves as training samples are added, although the accuracy on the training set starts to decrease slightly. This is expected, since more samples were learned by the neural network. The network is said to have a good generalization property if the estimation on the test set is as accurate as the estimation on the training set.
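The train-versus-test comparison described above can be sketched as follows. A linear least-squares fit stands in for the trained MLNN here purely to keep the example short; the sample generation, ranges, and feature construction are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_set(n):
    """Random (wn, zeta) targets with a sampled decaying response as input."""
    wn = rng.uniform(2.5, 26.5, n)
    zeta = rng.uniform(0.001, 0.1, n)
    t = np.linspace(0, 1, 41)[None, :]           # scaled time base, 41 nodes
    X = np.exp(-zeta[:, None] * wn[:, None] * t) * np.cos(wn[:, None] * t)
    return X, np.column_stack([wn, zeta])

Xtr, ytr = make_set(250)
Xte, yte = make_set(100)                         # held-out test set
# linear least-squares stand-in for the trained network
W, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)
mse_train = np.mean((Xtr @ W - ytr)**2, axis=0)  # MSE for (wn, zeta) on train
mse_test = np.mean((Xte @ W - yte)**2, axis=0)   # MSE on unseen samples
print(mse_train, mse_test)  # good generalization: test MSE close to train MSE
```

The same bookkeeping applies regardless of the model: train on one set, evaluate the MSE on both, and call the generalization good only when the two errors are comparable.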

Figure 4.8	Random training samples distribution: damping ratio ζ (0 to 0.1) against natural frequency ωn (0 to 30)

Table 4.1	Estimation error by MLNN for the training set and test set

Number of          Training set                       Test set
training samples   MSE for ωn   MSE for ζ             MSE for ωn   MSE for ζ
50                 0.00001      3.4411×10^-10         0.86870      4.6500×10^-6
100                0.00001      7.9773×10^-10         0.04880      4.1554×10^-8
150                0.00015      3.8335×10^-9          0.01160      1.8051×10^-8
200                0.00019      2.0986×10^-9          0.00560      3.9502×10^-9
250                0.00210      5.4253×10^-9          0.00520      1.2479×10^-8
300                0.00190      7.2328×10^-9          0.00330      1.3569×10^-8

Figure 4.9 depicts the network performance as the number of training samples increases. More network nodes were used to memorize the additional samples, as shown by the number of parameters effectively used by the neural network in Figure 4.9 (a). From 200 samples upwards, the number of effective parameters used by the network is approximately constant. The generalization of the network improves as more training samples are presented, as shown in Figure 4.9 (b) and (c). The generalization for both estimated values levels off once the number of training samples reaches 250 and does not improve significantly with further samples. A minimum of 250 pairs of noise-free inputs and outputs is sufficient to teach the network the relationship between the time series function and the values of ωn and ζ.

Figure 4.9	Results from the neural network after being trained with different numbers of training samples: (a) effective number of parameters, (b) MSE of ωn, (c) MSE of ζ

4.5.5 Effect of number of nodes in the hidden layer

With an adequate number of hidden nodes, the MLNN is said to be able to approximate any function, given that a solution to that function exists. One question that should be addressed is whether the network performance can be improved by increasing the network complexity. The effective number of parameters, γ, available from the use of Bayesian regularization, gives an additional indication of when the number of hidden nodes is sufficient for the given task.

The number of nodes in the hidden layer was initially set to 2, and the network performance and the number of effective parameters were monitored. The study was then repeated, adding two nodes to the hidden layer at each step until there was no significant improvement in network performance. Additional nodes in the hidden layer can further improve the network performance index, as shown in Figure 4.10. However, from the calculated value of γ, a network with more than 10 nodes in the hidden layer uses only approximately 65% of the total available free parameters effectively. Since it is recommended to design a smaller network while maintaining good performance, 6 nodes in the hidden layer can be considered adequate.

Figure 4.10	The improvement in network performance (SSE, from 10^0 down to 10^-8) with increasing number of hidden nodes

4.5.6 Effect of number of hidden layers

Haykin (1999) stated that the universal approximation theorem does not say whether a single hidden layer network is optimal for generalization. The theorem states that a single hidden layer is sufficient for a multilayer perceptron to compute a uniform ε approximation to a given training set represented by the inputs x1, …, xm0 and a desired target output f(x1, …, xm0), where ε is a small positive approximation error. The theorem is supported by Cybenko (1989), who stated that a single hidden layer, feedforward neural network is capable of approximating any continuous multivariate function to the desired degree of accuracy, and that failure to map a function arises from poor choices of free parameter values or an insufficient number of hidden neurons (Hassoun, 1996).

Adding one more hidden layer to the network further increases its complexity. From the investigation of the optimal number of hidden nodes, further increasing the number of nodes can further reduce the network performance index, i.e. the estimation of the network moves closer to the real values. On the other hand, the ratio of the number of effective parameters used to the total available parameters starts to fall, indicating that the network is oversized. In this study, the effect of adding another hidden layer on the network performance index, the network generalization property, and the ratio of effective parameters used was examined. The number of nodes in the first hidden layer was set to 6, the number of nodes in the second hidden layer was varied from zero to eight, and the result is presented in Figure 4.11.

Figure 4.11	Network performance (SSE) with increasing number of hidden nodes in the second hidden layer (0 to 8)

An additional hidden layer in the MLNN can further improve the performance index of the network, at the cost of i × j additional free parameters, where i is the number of nodes in the first hidden layer and j is the number of nodes in the second hidden layer. Table 4.2 shows the generalization of the network as the number of nodes in the second hidden layer is increased.

Table 4.2	Generalization of the network with two hidden layers

Number of nodes in   Training samples                   Test samples
2nd hidden layer     MSE for ωn    MSE for ζ            MSE for ωn   MSE for ζ
0                    6.39×10^-7    4.86×10^-12          7.81×10^-4   2.03×10^-9
2                    1.00×10^-7    1.11×10^-12          1.00×10^-3   1.25×10^-9
4                    1.08×10^-8    2.36×10^-13          9.71×10^-4   1.69×10^-9
6                    1.26×10^-8    3.95×10^-13          1.12×10^-4   2.95×10^-9
8                    3.55×10^-9    1.22×10^-13          9.92×10^-5   4.04×10^-9

Although the network performance index can be further reduced with additional nodes in the second hidden layer, the generalization of the network is not significantly improved. This can be shown by comparing the MSE of the estimated values between the training samples and the test samples. However, the role of additional hidden layers in the network performance when dealing with noisy input data is studied and addressed in the next section. So far, the training of the network has only included noise-free training samples: clean generated signals without any disturbance.

4.5.7 Effect of noise

Unlike the simulated data that were used for training and validation of the network, real-world data are always associated with noise. For the network to be able to handle real data corrupted with noise, noisy simulated data were introduced to the network during training. Random white noise with a normal distribution was added to the signal generated from the general second order system transfer function, as depicted in Figure 4.12. The random noise was generated with the randn function in MATLAB.

Figure 4.12	Noise injected into the generated impulse response, y: random white noise is added to the output of the second order system G(s) = ωn^2 / (s^2 + 2ζωn s + ωn^2) for given ωn and ζ

The level of noise is given by its rms value. The root mean square (rms) is a statistical measure of the magnitude of a varying quantity. For a collection of n values {x1, x2, ..., xn}, the rms value is given by:

x_rms = sqrt( (1/n) Σ_{i=1}^{n} x_i^2 ) = sqrt( (x1^2 + x2^2 + ... + xn^2) / n )

The noise level injected was between 0.02 and 0.2 rms, with corresponding peak signal-to-noise ratios of 42 to 22 dB. The peak signal-to-noise ratio (PSNR) is the ratio between the maximum possible power of a signal and the power of the corrupting noise that distorts it.
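Scaling white noise to a prescribed rms level and reporting the resulting PSNR can be sketched as below. This is an illustrative Python version (the thesis used MATLAB's randn); the toy signal and function names are assumptions.

```python
import numpy as np

def add_noise(signal, noise_rms, seed=0):
    """Add zero-mean white Gaussian noise scaled to an exact target rms value."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, 1.0, signal.shape)
    noise *= noise_rms / np.sqrt(np.mean(noise**2))  # enforce the target rms
    return signal + noise

def peak_snr_db(signal, noise_rms):
    """Peak signal-to-noise ratio (dB) for a given noise rms level."""
    return 20.0 * np.log10(np.max(np.abs(signal)) / noise_rms)

t = np.linspace(0.0, 1.0, 500)
clean = np.exp(-0.5 * t) * np.sin(2.0 * np.pi * 5.0 * t)  # toy decaying oscillation
noisy = add_noise(clean, noise_rms=0.02)
psnr = peak_snr_db(clean, 0.02)
print(round(psnr, 1))  # ≈ 33.8 dB for this toy signal
```

The dB values quoted in the text (42 to 22 dB for 0.02 to 0.2 rms) depend on the peak amplitude of the thesis's own responses, so the toy value here differs.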

Two network properties were studied with respect to the effect of noise: the hidden layer structure and the number of input nodes used to represent the time series amplitude. In studying the optimum hidden layer structure, the different network architectures in Table 4.3 were used, and the accuracy of the network estimation results is shown in Figures 4.13 and 4.14. The three-cycle time series amplitude and its corresponding period were presented with 41 input nodes, since section 4.5.3 showed that a minimum of 40 nodes is sufficient to represent the time series data.

Table 4.3	Neural network architectures used to study the effect of hidden layer structure on neural network performance

Number of      Number of nodes       Number of nodes       Results
input nodes    in 1st hidden layer   in 2nd hidden layer
41             6                     0, 4, 8               Figures 4.13 (a) and 4.14 (a)
41             12                    0, 4, 8               Figures 4.13 (b) and 4.14 (b)
41             20                    0, 4, 8               Figures 4.13 (c) and 4.14 (c)

The results in Figures 4.13 and 4.14 show that, in general, the additional hidden layer does not improve the network performance in estimating the values of ωn and ζ. The performance is measured in terms of the percentage error of the estimated values. Since the additional hidden layer plays no significant role in improving the network performance, it is sufficient to opt for a single hidden layer network without unnecessarily increasing the network complexity and training effort.

The other network property studied under the effect of noise is the number of input nodes. The higher the sampling rate used to sample the time series data, the larger the network input size, and vice versa. The sensitivity to noise at different sampling rates was investigated using single hidden layer networks, repeated with different numbers of hidden nodes as in Table 4.4. The results presented in Figures 4.15 and 4.16 show that the performance of the neural network is essentially random with respect to the number of input nodes. Therefore, the minimum number of input nodes should be selected to minimize the neural network size.

Table 4.4	Neural network structures for the number of input nodes investigation

Number of      Number of nodes       Results
input nodes    in 1st hidden layer
41, 51, 61     6                     Figures 4.15 (a) and 4.16 (a)
41, 51, 61     12                    Figures 4.15 (b) and 4.16 (b)
41, 51, 61     20                    Figures 4.15 (c) and 4.16 (c)

Figure 4.13	Percentage error on the test set for ωn values against noise rms value (0 to 0.2): (a) 6, (b) 12 and (c) 20 hidden nodes in the first hidden layer, each compared for 1 hidden layer and for 4 and 8 nodes in the 2nd hidden layer

Figure 4.14	Percentage error on the test set for ζ values against noise rms value (0 to 0.2): (a) 6, (b) 12 and (c) 20 hidden nodes in the first hidden layer, each compared for 1 hidden layer and for 4 and 8 nodes in the 2nd hidden layer

Figure 4.15	The percentage error for ωn values against noise rms value (0 to 0.2) for networks with increasing number of input nodes (41, 51 and 61): (a) 6, (b) 12 and (c) 20 hidden nodes

Figure 4.16	The percentage error for ζ values against noise rms value (0 to 0.2) for networks with increasing number of input nodes (41, 51 and 61): (a) 6, (b) 12 and (c) 20 hidden nodes

4.6	Application of Neural Network

For the application of the MLNN in modal parameter identification to estimate the ground vehicle's aerodynamic derivatives, the network parameters were chosen based on the properties of the network studied in section 4.5. The estimation was made from three cycles of the impulse response, with the time series amplitude represented by 40 input nodes. The period of the three cycles was represented by the last input node, giving the network a total of 41 input nodes. The neural network was trained with 250 training samples.

The range of the training samples was based on the conventional identification results, where the range of interest for ωn is 2.5 - 26.5 rad/s and the range for ζ is 0.001 - 0.1. The hyperbolic tangent function was used as the hidden layer transfer function, and the output layer transfer function was set to a pure linear activation function. The amplitudes of the time series data were normalized to the range [-1, 1] and centred on their zero-mean axis. As suggested by Foresee and Hagan (1997), the remaining input (the period of three cycles) and the outputs (the target values of ωn and ζ) of the network were scaled into the range [-1, 1]. The scaling of the network input can be carried out using the following formula:

pn = (UB − LB) (pi − pi min) / (pi max − pi min) + LB		(4.10)

where pn is the scaled input, UB and LB are the upper and lower bounds respectively, pi is the original input, and pi min and pi max are the minimum and maximum values of the inputs p. The output of the network was scaled in exactly the same manner.
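Equation (4.10) and its inverse (needed to recover ωn and ζ from the scaled network outputs) can be written as a short helper; the example values use the ωn range quoted above, and the function names are illustrative.

```python
import numpy as np

def scale(p, p_min, p_max, lb=-1.0, ub=1.0):
    """Equation (4.10): map p from [p_min, p_max] into [lb, ub]."""
    return (ub - lb) * (p - p_min) / (p_max - p_min) + lb

def unscale(pn, p_min, p_max, lb=-1.0, ub=1.0):
    """Inverse mapping, used to recover wn and zeta from network outputs."""
    return (pn - lb) * (p_max - p_min) / (ub - lb) + p_min

wn = np.array([2.5, 14.5, 26.5])      # sample values over the wn range
s = scale(wn, 2.5, 26.5)
print(s)                              # [-1.  0.  1.]
print(unscale(s, 2.5, 26.5))          # [ 2.5 14.5 26.5]
```

Applying the same bounds to both training and test data matters: the inverse mapping is only valid if the (p_min, p_max) used at training time are reused at estimation time.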

A single hidden layer network was used, and the hidden layer structure was selected based on the study in section 4.5.7. The results of training single hidden layer networks with 41 input nodes in section 4.5.7 are tabulated in Table 4.5. The training results show that the network performance on the training set improves as the number of hidden nodes is increased; the number of effective parameters in the network also increases so that the network can sufficiently represent the function. In validating the trained network models with the test set, the percentage estimation error of the smallest network, i.e. the neural network with 6 hidden nodes, is in general the lowest at higher noise levels compared with the other network structures, as shown in Figures 4.17 and 4.18. The trend is observed in the networks with 41 input nodes (a), 51 input nodes (b) and 61 input nodes (c). This indicates overfitting of the larger networks: they overfit the training data and do not generalize as well as the smaller network. Therefore, the recommended hidden layer structure for the MLNN design is a single hidden layer with 6 hidden nodes, and the recommended MLNN design is summarized in Table 4.6.

Table 4.5	Results of neural network training with different numbers of hidden nodes

Neural network   SSE      Number of effective parameters /
structure                 total number of parameters
41-6-2           0.7011   254/266
41-12-2          0.4274   499/530
41-20-2          0.2534   822/882

Figure 4.17	Neural network percentage error in estimating ωn values for the test set against noise rms value (0 to 0.2): (a) 41, (b) 51 and (c) 61 input nodes, each compared for 6, 12 and 20 hidden nodes

Figure 4.18	Neural network percentage error in estimating ζ values for the test set against noise rms value (0 to 0.2): (a) 41, (b) 51 and (c) 61 input nodes, each compared for 6, 12 and 20 hidden nodes

Table 4.6	Recommended input representation and MLNN design parameters

Input representation:
- 3 complete cycles and the period of the 3 cycles are used as the input for function approximation
- 40 input nodes represent the time series data and 1 input node represents the period of the 3 cycles
- Input and output pairs are normalized so they fall in the range [-1, +1]

Design parameters:
- 1 hidden layer with 6 hidden nodes
- Training algorithm: Levenberg-Marquardt with Bayesian regularization
- Training samples: 250 time series examples
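The recommended 41-6-2 topology (tanh hidden layer, linear output layer) can be sketched as a forward pass. The weights below are untrained random placeholders; in the thesis they come from Levenberg-Marquardt training with Bayesian regularization, which is not reproduced here.

```python
import numpy as np

class MLNN:
    """Forward pass of the recommended 41-6-2 network:
    tanh hidden layer, linear output layer."""
    def __init__(self, n_in=41, n_hidden=6, n_out=2, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (n_in, n_hidden))   # placeholder weights
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        h = np.tanh(x @ self.W1 + self.b1)   # hidden layer activation
        return h @ self.W2 + self.b2         # scaled estimates of (wn, zeta)

net = MLNN()
x = np.zeros(41)          # 40 normalized amplitudes + 1 scaled period
a1, a2 = net.forward(x)
print(a1, a2)             # 0.0 0.0 for this all-zero input and zero biases
```

The two outputs a1 and a2 are in the scaled [-1, 1] range and must be unscaled with the inverse of equation (4.10) to give ωn and ζ.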

4.6.1 Aerodynamic derivatives estimation procedure

After the network had undergone a proper training process, it was used for aerodynamic derivative identification. The step-by-step estimation procedure is as follows:

1.	The measured response from the wind-off wind tunnel test is first simulated with the MLNN to obtain the mechanical stiffness and damping, given by:

	Kr = ωn^2(wind-off) Izz   and   Cr = 2 ζ(wind-off) ωn(wind-off) Izz

	respectively.

2.	The overall system stiffness and damping are obtained by simulating the MLNN with the measured response from the wind-on test, given by:

	(Kr − Ka) = ωn^2(wind-on) Izz   and   (Cr − Ca) = 2 ζ(wind-on) ωn(wind-on) Izz

3.	Calculate the normalized aerodynamic stiffness and aerodynamic damping using:

	N̂β = (Kr − Ka)/Izz − Kr/Izz = ωn^2(wind-on) − ωn^2(wind-off)

	N̂r = (Cr − Ca)/Izz − Cr/Izz = (2ζωn)(wind-on) − (2ζωn)(wind-off)

4.	Convert the normalized aerodynamic stiffness and damping into aerodynamic coefficient derivatives using:

	Cnβ = 2 N̂β Izz / (ρ V^2 A l)   and   Cnr = 2 N̂r Izz / (ρ V A l^2)
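The four conversion steps can be collected into one small helper. This is an illustrative Python sketch (the thesis implementation is in MATLAB), and every numeric value below is a made-up placeholder, not measured data.

```python
def aero_derivatives(wn_off, z_off, wn_on, z_on, Izz, rho, V, A, l):
    """Steps 1-4: convert wind-off/wind-on modal parameters into Cn_beta, Cn_r.
    All argument names and values are illustrative placeholders."""
    # normalized aerodynamic stiffness and damping (step 3)
    N_beta = wn_on**2 - wn_off**2
    N_r = 2.0 * z_on * wn_on - 2.0 * z_off * wn_off
    # aerodynamic coefficient derivatives (step 4)
    Cn_beta = 2.0 * N_beta * Izz / (rho * V**2 * A * l)
    Cn_r = 2.0 * N_r * Izz / (rho * V * A * l**2)
    return Cn_beta, Cn_r

Cnb, Cnr = aero_derivatives(wn_off=12.9, z_off=0.004, wn_on=12.8, z_on=0.006,
                            Izz=0.1, rho=1.2, V=10.0, A=0.03, l=0.5)
print(Cnb, Cnr)  # ≈ -0.286 and 0.112 for these placeholder numbers
```

Because only differences of ωn² and 2ζωn enter, the mechanical stiffness and damping cancel out exactly as in step 3, which is why the wind-off test is needed as a baseline.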

4.7	Results and Discussion

The results of training the selected network architecture are shown in Figure 4.19. The capability of the network to deal with noisy input data was tested by simulating the network with 3 samples of generated impulse response. The responses were corrupted with noise, and the MLNN's capability to handle noisy signals was compared with the conventional method (the procedure described in section 3.4). The estimation results for both approaches are listed in Table 4.7 and show that both methods can handle noisy input data with small estimation errors.

Figure 4.19	Training results of the recommended MLNN design

Table 4.7	The estimation of noisy time response data by the neural network and the conventional method

Noise    Sample 1 (ωn = 7.5, ζ = 0.012)      Sample 2 (ωn = 15.5, ζ = 0.003)     Sample 3 (ωn = 23.5, ζ = 0.006)
level    Conventional     Neural network     Conventional     Neural network     Conventional     Neural network
(rms)    ωn      ζ        ωn      ζ          ωn      ζ        ωn      ζ          ωn      ζ        ωn      ζ
0.02     7.33    0.0122   7.50    0.0116     15.36   0.0030   15.48   0.0026     23.39   0.0060   23.55   0.0056
0.04     7.34    0.0121   7.50    0.0116     15.36   0.0030   15.48   0.0027     23.39   0.0060   23.55   0.0055
0.06     7.33    0.0121   7.50    0.0118     15.36   0.0030   15.48   0.0027     23.39   0.0060   23.54   0.0055
0.08     7.33    0.0118   7.50    0.0113     15.36   0.0030   15.47   0.0026     23.39   0.0060   23.53   0.0058
0.10     7.33    0.0118   7.49    0.0121     15.36   0.0029   15.48   0.0028     23.39   0.0059   23.52   0.0059
0.12     7.34    0.0118   7.49    0.0120     15.36   0.0029   15.47   0.0025     23.39   0.0059   23.54   0.0060
0.14     7.33    0.0117   7.51    0.0123     15.36   0.0029   15.48   0.0029     23.39   0.0059   23.55   0.0060
0.16     7.33    0.0117   7.48    0.0118     15.36   0.0029   15.48   0.0030     23.39   0.0058   23.55   0.0055
0.18     7.34    0.0112   7.46    0.0115     15.36   0.0029   15.48   0.0034     23.39   0.0057   23.56   0.0064
0.20     7.33    0.0116   7.53    0.0112     15.36   0.0029   15.46   0.0022     23.39   0.0057   23.51   0.0057


4.7.1 Simulation of the Network with Measured Data

To account for the uncertainties inherent in real measured data, the values of ωn and ζ were estimated from several three-cycle responses, and the average values of ωn and ζ were used to calculate the aerodynamic derivatives. Through a pilot test on a response from spring K4 (refer to Table 3.2) at a wind speed of 10 m/s, Figure 4.20 shows that the values of ωn and ζ vary throughout the response. Taking the average values of ωn and ζ is therefore an appropriate step.

Figure 4.20	The variation of ωn and ζ throughout the response: estimates from successive three-cycle windows range from ωn = 12.72 to 12.92 and ζ = 0.0055 to 0.0068

The estimation results for the aerodynamic derivatives from the neural network method were compared with the estimation results from the conventional method of chapter 3 and tabulated in Table 4.8. The side force derivatives were obtained in the same manner as in section 3.3.4. The aerodynamic derivatives estimated by the neural network differ slightly from those estimated by the conventional method. Since the actual derivative values remain unknown, a validation technique is needed to validate the results of both techniques.

Table 4.8	Aerodynamic derivatives for side force and yaw moment obtained from the conventional method and the neural network

Reduced     Neural network                                 Conventional method
frequency   Yaw moment          Side force                 Yaw moment          Side force
            Cnβ      Cnr        Cyβ      Cyr               Cnβ      Cnr        Cyβ      Cyr
0.13        0.5323   -0.0983    2.8313   0.5413            0.5516   -0.0829    2.8663   0.5638
0.19        0.5798   -0.1102    2.8463   0.3188            0.6294   -0.0968    4.1513   0.3363
0.25        0.6230   -0.1100    3.3963   0.1575            0.5733   -0.0971    3.0425   0.3325
0.39        0.5722   -0.1211    3.1375   0.0162            0.5885   -0.1087    3.3813   0.0538
0.44        0.5473   -0.1175    2.9425   0.0263            0.5604   -0.1080    2.9513   0.1988
0.55        0.6336   -0.1250    3.7413   0.1788            0.6299   -0.1144    4.3438   0.5613
0.66        0.5638   -0.1178    2.6038   0.3500            0.6049   -0.1107    3.6300   0.3063
0.70        0.4823   -0.1167    2.9800   0.3038            0.4768   -0.1175    2.2100   0.1238

For validation of the results obtained from both the conventional method and the neural network, the procedure in Figure 4.21 was carried out. The Cnβ, Cnr, Kr and Cr values obtained through the identification process by each method were converted into yaw stiffness and yaw damping derivatives, Ka and Ca respectively. The system transfer function was then simulated with an impulse input using the values from the conventional method and from the neural network. Both simulation results were compared with the real measured time series data, and the differences between the generated impulse responses and the measured impulse response were calculated, as shown in Figure 4.22.

Figure 4.21	Validation test for the conventional method and the neural network: the identified Cnβ, Cnr, Kr and Cr from each method are converted into Ka = (1/2) ρ V^2 A l Cnβ and Ca = (1/2) ρ V A l^2 Cnr, the system transfer function

β(s)/β(0) = [s + (Cr − Ca)/Izz] / [s^2 + ((Cr − Ca)/Izz) s + (Kr − Ka)/Izz]

is simulated with an impulse input, and each generated impulse response is compared with the measured time series data to give mse (neural network) and mse (conventional method)

Figure 4.22	Mean squared error (mse) from both techniques against reduced frequency (0 to 0.8)

Although the neural network estimation error is not always below that of the conventional method, the neural network is shown to be able to estimate the aerodynamic derivatives through modal parameter identification as well. The average error of the neural network estimation is lower than that of the conventional method. With neural network estimation, the derivatives can be obtained directly from ωn and ζ without first having to acquire the time to half amplitude and the damped frequency.

Figure 4.23 compares the data simulated from the conventional method's results and from the neural network's results with the real measured data. The data were generated from the spring K4 results, and both generated responses are close to the actual measured response.

Figure 4.23	The simulated data compared with the real measured data for spring K4, showing the exponential decay of the peak amplitudes

4.8	Conclusion

A single hidden layer MLNN has been successfully designed to approximate the natural frequency and damping ratio of second order systems from the impulse response. The Levenberg-Marquardt training algorithm with Bayesian regularization made the training of the network effective. The MLNN has been shown to approximate ωn and ζ adequately for noise-free as well as noisy input data. The introduction of noisy training samples during training taught the network to recognize both noisy and noise-free inputs.

The generalization of the network has been shown to depend on the size of the training samples and the architecture of the network. All the studies investigating the optimum design of the neural network were conducted in a trial and error manner. For this application, one hidden layer is sufficient to achieve the design objectives; an additional hidden layer does not play any significant role in improving the network performance. To ensure that the network has converged properly, it is advisable to train the network with a large number of epochs and to train it a few times with different initial random values of weights and biases.

From the comparative study between the neural network and the conventional method, there are some differences in the estimated values of the aerodynamic derivatives. The validation test has shown that, on average, the estimation error from the neural network is lower than that of the conventional method. Furthermore, the steps in calculating the aerodynamic derivatives are reduced, and the values of ωn and ζ can be obtained directly from the recorded time response data.

On the other hand, there are some restrictions on the input representation of this designed network. Only a lightly damped impulse response that maintains its transient response for at least three cycles can be presented to the network. In the next chapter, a new input representation scheme is proposed in which the whole response, sampled at an appropriate sampling frequency, is presented to the network. However, sampling the whole response requires a large number of input nodes to represent it, which results in a very large network size. Thus, principal component analysis, PCA, will be introduced into the network design for input node optimization.


CHAPTER 5

5. NEURAL NETWORK: INPUT NODE OPTIMIZATION USING PCA

5.1	Introduction

The application of Principal Component Analysis (PCA) to optimize the size of the input node is introduced in this chapter. PCA is a useful statistical technique that has been widely used in pattern recognition and image compression, and is a common technique for finding patterns in data of high dimension (Smith, 2002). With minimal additional effort, PCA enables the dimension of a complex data set to be reduced, revealing the sometimes hidden, simplified structure that often underlies it (Shlens, 2005).

This chapter offers a different approach to the input representation of the designed neural network. The time response for t seconds was fed directly to the network instead of selecting only 3 cycles as in Chapter 4 (Figure 5.1). This new approach results in a more complex input pattern to be learned by the network, and the size of the input node also increases in order to represent the time response adequately. However, less data preprocessing is involved before feeding the data to the neural network, and the application of the neural network is no longer restricted to low damping systems.

Figure 5.1	Input representation: (a) 8 s input data (b) 3 cycles input data and period of 3 cycles (both plotted as normalized amplitude, against time and ωnt respectively)

Increasing the number of input nodes increases the network size and, with a complex input pattern to be learned, a larger number of hidden nodes is expected to be needed for a proper input-output mapping. This increases the computational effort for training the network, and revising the optimal network design parameters during the iterative training process becomes cumbersome.

The PCA technique offers the opportunity of reducing the input dimension by transforming the data into principal components, of which only the significant ones are selected. The objective is to reduce the dimension of the data while retaining most of its information content. It involves a linear transformation of the original data X, an m × n matrix where m is the number of individuals and n is the dimension of the data. Let Y be another m × n matrix related to X by a linear transformation P. The re-representation of the data set X can be written as:


PX = Y		(5.1)

The new input matrix, Y, was used as the network input instead of the actual time series data. The capability of the network to learn from the input features, Y, was studied and the results were compared with the results estimated conventionally.

This chapter is organized into 6 sections. Section 5.2 explains the underlying principle of PCA and the procedure for extracting the principal components from the data. The design of the MLNN with input optimization using PCA is presented in section 5.3. Section 5.4 describes the application to the measured data from the dynamic test, and the results are presented and discussed in section 5.5. Section 5.6 concludes on the application of PCA in optimizing the input node for function approximation by MLNN.

5.2	Principles of PCA

Following lecture notes by Shalizi (2006), based on O'Hagan (1984), principal component analysis can be reviewed simply as follows. The simplest way to derive the principal components mathematically from a multivariate data set is to find the projection that maximizes the variance. The data are centered first so that every feature has a mean of zero.

The original p-dimensional feature vectors are to be projected onto a one-dimensional line that runs through the origin. The line is specified by a unit vector w along it, and the projection of a data vector xᵢ onto the line is xᵢ · w, which is a scalar. If the n data vectors are put into an n × p matrix X, then the projections are given by Xw, which is an n × 1 matrix. The mean of the projections will be zero since the mean of the xᵢ is zero. The variance is then given by:

σ_w² = (1/n) Σᵢ (xᵢ · w)²
     = (1/n) (Xw)ᵀ(Xw)
     = (1/n) wᵀXᵀXw
     = wᵀ (XᵀX/n) w
     = wᵀVw		(5.2)

The unit vector w is constrained to w · w = 1, or wᵀw = 1, so that σ_w² is maximized. For this constrained maximization problem, let the function to be maximized be f(w), where:

f(w) = wᵀVw		(5.3)

and an equality constraint, g(w) = c where

g ( w) = w T w and c = 1.

The constraint equation is rearranged so that its right-hand side is zero, g(w) − c = 0. An extra variable, the Lagrange multiplier λ, is introduced, giving u(w, λ) = f(w) + λ(g(w) − c). The new objective function u is differentiated with respect to both arguments and the derivatives are set equal to zero:

∂u/∂w = 0 = ∂f/∂w + λ ∂g/∂w
∂u/∂λ = 0 = g(w) − c

Thus, maximizing with respect to λ gives back the constraint equation, g(w) = c. For the projection problem above,

u = wᵀVw − λ(wᵀw − 1)
∂u/∂w = 2Vw − 2λw = 0
Vw = λw		(5.4)

Thus, the desired vector w is an eigenvector of the covariance matrix V, and the maximizing vector will be the one associated with the largest eigenvalue, λ.

The covariance matrix, V, is a p × p matrix, so there will be p different eigenvectors that are orthogonal to each other. Since the vectors are orthogonal, their projections will be uncorrelated. The eigenvalues give the share of the total variance described by each component.
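The result Vw = λw is easy to check numerically: the eigenvector of the covariance matrix with the largest eigenvalue captures at least as much projected variance as any other unit direction. A small numpy sketch on toy data (not the thesis data):

```python
import numpy as np

rng = np.random.default_rng(0)
# toy data set: n samples of p correlated features, centred to zero mean
n, p = 500, 4
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))
X -= X.mean(axis=0)

V = X.T @ X / n                       # covariance matrix, as in Eq. (5.2)
eigval, eigvec = np.linalg.eigh(V)    # eigendecomposition of symmetric V
w = eigvec[:, np.argmax(eigval)]      # eigenvector of the largest eigenvalue

var_w = w @ V @ w                     # variance of the projection onto w
# no random unit direction should beat the leading eigenvector
for _ in range(1000):
    u = rng.standard_normal(p)
    u /= np.linalg.norm(u)
    assert u @ V @ u <= var_w + 1e-9
```

The projected variance wᵀVw equals the largest eigenvalue itself, which is the "share of total variance" statement above.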

5.2.1 Procedures for PCA

The original data with m individuals and n samples were arranged to form an m × n matrix and labeled as X.

X = [ x11  x12  x13  …  x1n
      x21  x22  x23  …  x2n
       ⋮     ⋮     ⋮          ⋮
      xm1  xm2  xm3  …  xmn ]

The covariance matrix, V of matrix X was calculated. The covariance is given by:

Cov(x_j, x_k) = ( Σ_{i=1}^{m} (x_ij − x̄_j)(x_ik − x̄_k) ) / (m − 1)		(5.5)

where x̄_j = (1/m) Σ_{i=1}^{m} x_ij, and j, k = 1, 2, …, n. The covariance matrix, V, has the following
form:

V = [ s11  s12  s13  …  s1n
      s21  s22  s23  …  s2n
       ⋮     ⋮     ⋮          ⋮
      sn1  sn2  sn3  …  snn ]

where s_jk is the covariance of variables x_j and x_k when j ≠ k, and the diagonal element s_jj is the variance of variable x_j when j = k. The next step is the eigenvalue decomposition, where the eigenvalues and eigenvectors of matrix V were calculated and arranged as:

eigval = [ λ1  0   0   …  0
           0   λ2  0   …  0
           ⋮    ⋮    ⋮       ⋮
           0   0   0   …  λn ]

and

eigvec = [ w11  w21  w31  …  wn1
           w12  w22  w32  …  wn2
            ⋮     ⋮     ⋮          ⋮
           w1n  w2n  w3n  …  wnn ]

where eigval contains the eigenvalues and eigvec the corresponding eigenvectors. The eigenvalues are arranged so that λ1 > λ2 > … > λn. The eigenvectors in the first column belong to eigenvalue λ1, the second column of eigenvectors belongs to eigenvalue λ2, and so on. The components in each column of eigvec form the linear transformation, P. The first principal component, PC1, is then a linear combination of the original variables xm1, xm2, …, xmn with P:

PC1 = w11 xm1 + w12 xm2 + w13 xm3 + … + w1n xmn = Σ_{j=1}^{n} w1j xmj

which varies as much as possible across the n samples. Similarly, the second principal component, PC2, is:

PC2 = w21 xm1 + w22 xm2 + w23 xm3 + … + w2n xmn = Σ_{j=1}^{n} w2j xmj

and on the condition that PC2 is uncorrelated with PC1. The third principal component, PC3:

PC3 = w31 xm1 + w32 xm2 + w33 xm3 + … + w3n xmn = Σ_{j=1}^{n} w3j xmj

and also on the condition that PC3, PC2 and PC1 are uncorrelated. Other principal components can be expressed in a similar way. There can be up to n principal components for n variables.

The remaining question is how many PCs should be retained in the "new" data to sufficiently represent the variations of the original data. The components are chosen in sequence as the best descriptors of the data. Generally, the last few components do not account for much of the variance and can therefore be ignored.


5.2.2 Selection of the principal components

There are three threshold methods that are usually used in selecting the principal components. The first method looks at the percentage of the total variance carried by each component's eigenvalue; a total of 80 to 90% is suggested as sufficient. If the first component, the one with the largest eigenvalue, describes an insufficient percentage of the total variance, then the second PC is also considered, which, in combination with the first PC, increases the total variance percentage of the PCs selected. If the sufficient percentage is still not met, the next PC in sequence is considered.

The second method available is known as the Kaiser criterion that chooses only the PCs with eigenvalues ≥ 1. However, it has been argued that a cut-off of 1 retains too few variables and based on simulation studies, it has been suggested that eigenvalues of approximately 0.7 as the correct level for retaining (Gordon et al., 2007).

The third method is using the scree graph or scree plot. In the scree graph, the eigenvalues are plotted against the PC numbers. The principal components that are retained usually are those on the slope of the graph before the decrease of eigenvalues levels off to the right of the plot as shown in the example in Figure 5.2. Although this technique may lead to the inclusion of too many components compared with the Kaiser criterion, the scree graph is often practical for data exploration.
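The first two threshold methods lend themselves to a small helper function; the scree graph, being visual, is read off a plot instead. A sketch (the function name and example eigenvalues are hypothetical, not from the thesis):

```python
import numpy as np

def select_pcs(eigenvalues, var_threshold=0.80, kaiser_cutoff=1.0):
    """Number of PCs retained by (i) the cumulative variance percentage
    and (ii) the Kaiser criterion. Eigenvalues are sorted descending."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cum = np.cumsum(lam) / lam.sum()
    by_variance = int(np.searchsorted(cum, var_threshold) + 1)
    by_kaiser = int((lam >= kaiser_cutoff).sum())
    return by_variance, by_kaiser

# eigenvalues of a hypothetical correlation matrix
print(select_pcs([3.0, 1.2, 0.5, 0.2, 0.1]))             # (2, 2)
print(select_pcs([3.0, 1.2, 0.5, 0.2, 0.1], 0.90, 0.7))  # (3, 2)
```

Lowering the Kaiser cutoff to 0.7, as suggested by Gordon et al. (2007), retains more components than the classical cutoff of 1 when eigenvalues between 0.7 and 1 are present.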


Figure 5.2	Scree graph of the correlation matrix. The arrows show the number of PCs that are probably retained (Gordon et al., 2007).

5.3	MLNN Design

In designing the MLNN, the properties investigated were (i) the number of training samples, (ii) the selection of network input, (iii) the number of hidden layers and hidden nodes to be used, and (iv) the effect of noise. The same approach as in section 4.5 was used, where each property was investigated individually while holding the other parameters constant.

5.3.1 Number of training samples

The required number of training samples was revised first, for it is expected that more training samples will be needed due to the complex input pattern representation. When training the network in Chapter 4, 250 training samples were sufficient for the learning process. In this design, however, more training samples were required for the network to learn properly and exhibit a good generalization property. The investigation started with 500 training samples, increasing until the performance and the generalization of the neural network improved.

5.3.2 Input node selection

Following the procedure in section 5.2.1, the time series examples in Figure 5.3 (a) were transformed into their corresponding principal components in Figure 5.3 (b). As can be noticed in Figure 5.3 (b), at higher principal component numbers the variances between the data are very small and almost negligible. These low variance principal components can be eliminated for the subsequent identification process.

Figure 5.3	Data transformation into principal components: (a) time series examples (b) corresponding principal components

The PCA transform of the impulse response data was used as the input representation. It is important to assess how many PCs are needed for the network to learn an accurate approximation of the values of ωn and ζ and, as stated earlier, the choice of the number of PCs that best describes the data is made in sequence. The selection of the number of PCs was based on the scree graph method, but using the network performance index on the y axis rather than the eigenvalues. The neural network was first trained with 10 PCs as the input nodes and its estimation on the test set was measured. The input node size was then increased by adding 5 more PCs to the input nodes and the network performance on the test set was measured again. The test was repeated until the number of PCs reached 30, where the network performance no longer showed any significant improvement with additional PCs as the network input. The results are shown in Figure 5.5.

5.3.3 Hidden layer and hidden nodes

The effect of the number of hidden layers and hidden nodes used in the network design was investigated using the n selected PCs as the input nodes. The study was first conducted on a single hidden layer network with a varying number of hidden nodes. Later, an additional hidden layer was added to the network, incrementing the number of nodes in the second hidden layer while fixing the first hidden layer. The aim of this study was to improve the network performance and ensure a good generalization property by optimizing the number of nodes and hidden layers used.

5.3.4 Effect of noise

The same procedure as in section 4.5.7 was applied to this MLNN design to include the effect of noise in neural network estimation. The noise levels were given in their rms values and were introduced to the network during the training stage. The trained network was then tested with another set of noisy impulse response and the network performance was studied.
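Injecting noise at a specified rms value amounts to adding zero-mean white Gaussian noise whose standard deviation equals that rms value. A sketch (the damping and frequency of the example response are illustrative choices, not thesis values):

```python
import numpy as np

def add_noise(signal, rms, rng):
    """Inject white noise of a given rms value: for zero-mean Gaussian
    noise, the standard deviation equals the rms value."""
    return signal + rng.normal(0.0, rms, size=signal.shape)

rng = np.random.default_rng(1)
t = np.linspace(0.0, 8.0, 160)                  # 8 s window at 20 Hz
clean = np.exp(-0.3 * t) * np.cos(12.0 * t)     # example impulse response
for level in (0.02, 0.1, 0.2):                  # rms levels used in the study
    noisy = add_noise(clean, level, rng)
    measured_rms = np.sqrt(np.mean((noisy - clean) ** 2))
    # measured_rms comes out close to the requested level
```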


5.3.5 Results

From the studied network properties, the following results were obtained. The first result concerns the optimum number of training samples. Figure 5.4 shows the estimation results from the neural network for (a) ωn and (b) ζ when the network was trained with 500 up to 2000 samples. From both figures, a neural network trained with 1000 training samples is capable of giving proper estimates of ωn and ζ for novel impulse responses. Although increasing the training set further, up to 2000 samples, can improve the network generalization, such a large sample set is richer than needed to represent the whole population and makes the training exhaustive.

Following the study on the selection of principal components, the scree graph in Figure 5.5 shows a significant improvement in the network performance index as the number of PCs used as the input node is increased from 10 to 30. The network performance index is given by the sum squared error, SSE. The gradient of the network performance starts to level off when the number of components used reaches 20.

However, in order to properly represent the original data, the first two criteria for principal component selection have to be satisfied as well. The first criterion states that the first n components contributing 80–90% of the total variance are selected. For the minimum total percentage of 80%, 32 principal components (Figure 5.6 (b)), whose variances sum to 80.2% of the total variance, were selected. These components already satisfy the Kaiser criterion (Figure 5.6 (a)). Thus, the impulse response in time sequence, represented by 160 data points (Figure 5.7), was transformed into principal components in sequence of variance, represented by 32 data points (Figure 5.8).

Figure 5.4	The revised training samples: mean squared error of the train and test sets against the number of training samples (500 to 2000) for (a) estimation error for ωn (b) estimation error for ζ

Figure 5.5	The performance of the network (sum squared error) designed with different numbers of principal components used as input node

Figure 5.6	(a) The variance in each component number (b) The percentage of total variance of the component numbers used (with the 80% threshold marked)


Figure 5.7	Samples of two impulse responses in time sequence (ωn = 26.0, ζ = 0.027 and ωn = 12.0, ζ = 0.019)

Figure 5.8	The transformation of the impulse responses into their principal components (data projection against principal component number)

Although Chapter 4 showed that a single hidden layer is sufficient to represent the underlying function relating the time series data to its natural frequency and damping ratio, in this approach the results show that an additional hidden layer can further improve the network generalization by producing a lower estimation error for the test set. This is due to the complex presentation of the impulse response samples. The network structures used in the study are tabulated in Table 5.1. The study was conducted using different network structures in the first hidden layer, with 0, 5 and 10 nodes in the second hidden layer. The network structures with zero hidden nodes in the second hidden layer, i.e. single hidden layer networks, are grouped in Set 1. The network structures with 5 hidden nodes in the second hidden layer are grouped in Set 2, and those with 10 hidden nodes in the second hidden layer are grouped in Set 3. The results of the study are depicted in Figures 5.9 and 5.10.

In estimating the ωn values, the two hidden layer networks give a lower percentage error, as shown in Figure 5.9, where the percentage errors in Set 2 (Figure 5.9b) and Set 3 (Figure 5.9c) are lower than those in Set 1 (Figure 5.9a); the performance also improves with an increasing number of hidden nodes in the first hidden layer. On the other hand, the same trend is not observed for the estimation of ζ values. The performance of the network changes randomly with varying network structure, but the two hidden layer networks keep a lower percentage error on ζ values at higher noise levels compared with the single hidden layer network. However, the estimation error for the ζ value is quite high compared with the MLNN design in Chapter 4.


Table 5.1	Single and two hidden layers network structure

Set 1: Single hidden layer network

Input layer	Hidden layer	Output layer
32	10	2
32	15	2
32	20	2
32	25	2

Set 2: Two hidden layers network

Input layer	First hidden layer	Second hidden layer	Output layer
32	10	5	2
32	15	5	2
32	20	5	2
32	25	5	2

Set 3: Two hidden layers network

Input layer	First hidden layer	Second hidden layer	Output layer
32	10	10	2
32	15	10	2
32	20	10	2
32	25	10	2

Figure 5.9	Network performance in estimating ωn values: percentage error against noise level (rms value, 0 to 0.2) for network structures with 10, 15, 20 and 25 first-hidden-layer nodes in (a) Set 1 (b) Set 2 (c) Set 3

Figure 5.10	Network performance in estimating ζ values: percentage error against noise level (rms value, 0 to 0.2) for network structures in (a) Set 1 (b) Set 2 (c) Set 3

5.3.6 Discussion

In this approach, the whole impulse response was used as the input to the network. However, this resulted in a very large input node size of 160 nodes. Through the PCA approach, the input node dimension was reduced to only 32 nodes. For this new approach, a two hidden layer network gives a better estimation of the ωn and ζ values than a single hidden layer network. The new network architecture is shown in Figure 5.11. Based on the above study, a two hidden layer network was used and the network structure selected was 32-20-5-2, whose average estimation error is the lowest compared with the other structures.
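The selected 32-20-5-2 structure can be sketched as a forward pass with tanh hidden layers and a linear two-node output. The activation choice and the random placeholder weights are assumptions for illustration; in the thesis the weights come from Levenberg-Marquardt training with Bayesian regularization:

```python
import numpy as np

def mlnn_32_20_5_2(x, params):
    """Forward pass of a 32-20-5-2 network: 32 PC inputs, two tanh
    hidden layers (20 and 5 nodes), and a linear 2-node output giving
    the scaled estimates of omega_n and zeta."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = np.tanh(W1 @ x + b1)     # first hidden layer, 20 nodes
    h2 = np.tanh(W2 @ h1 + b2)    # second hidden layer, 5 nodes
    return W3 @ h2 + b3           # linear output layer, 2 nodes

rng = np.random.default_rng(0)
params = (rng.standard_normal((20, 32)), np.zeros(20),
          rng.standard_normal((5, 20)), np.zeros(5),
          rng.standard_normal((2, 5)), np.zeros(2))
y = mlnn_32_20_5_2(rng.standard_normal(32), params)
# y has two components: the (scaled) omega_n and zeta estimates
```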

Figure 5.11	The application of PCA in reducing the input dimension of the neural network (the time series data are projected through PCA into principal components, which feed the input layer, hidden layers and output layer)


5.4	Application with the Measured Data

For the application of the designed MLNN to the measured data in estimating the aerodynamic derivatives, the following procedure was followed. First, the measured data were resampled according to the Nyquist sampling theorem. Then, the sampled time series data were transformed into their corresponding principal components and fed to the trained neural network. From the estimated values of ωn and ζ of the wind-off and wind-on measured time series data, the aerodynamic derivatives were evaluated.

5.4.1 Resampling of the data

The yaw angle recorded during the dynamic test was highly sampled at a 1 kHz sampling rate. For a recording time of 2 minutes, this results in 120,000 recorded data points. The raw data were resampled according to the Generalized Nyquist Sampling Rate Theorem (Sinha, 2000; Malinowsky and Zurada, 1993; Song et al., 2005). The response was included up to 8 seconds, within which there are at least three cycles.

The Nyquist frequency was calculated based on the highest frequency in the range and selected as 20 Hz. For the 8 s of recorded time series sampled at 20 Hz, the time series is represented by 160 data points. In the next section, the dimension size of 160 is reduced using the PCA technique.
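The resampling step can be sketched as keeping the first 8 s and decimating by the integer factor 1000/20 = 50. This is a naive sketch with a synthetic signal; a careful implementation would low-pass filter before decimating to prevent aliasing:

```python
import numpy as np

fs_raw, fs_new, window = 1000, 20, 8.0          # 1 kHz raw, 20 Hz target, 8 s kept
t_raw = np.arange(0.0, 120.0, 1.0 / fs_raw)     # 2 minutes -> 120,000 points
raw = np.exp(-0.3 * t_raw) * np.cos(12.0 * t_raw)  # synthetic yaw-angle response

factor = fs_raw // fs_new                       # decimation factor of 50
resampled = raw[: int(window * fs_raw)][::factor]
# 8 s at 20 Hz -> 160 data points, as in the text
```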

5.4.2 Preprocessing of the input data

The sampled time series data were transformed into their principal components using the transformation matrix P. The eigenvectors of the covariance matrix of the original data, y, form a 160 × 160 matrix. Since only 32 PCs were used as the input nodes, the transformation matrix P has the dimension 160 × 32, and the principal components of the time series, yPC, are given by:

yPC = Pᵀy		(5.6)
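Equation (5.6) is a single matrix product. A minimal sketch with a placeholder P (in the thesis, P holds the first 32 eigenvectors of the 160 × 160 covariance matrix; here it is random for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.standard_normal((160, 32))   # placeholder 160 x 32 eigenvector matrix
y = rng.standard_normal(160)         # one resampled time series (160 points)

y_pc = P.T @ y                       # Eq. (5.6): y_PC = P^T y
# y_pc has 32 components, the reduced network input
```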

5.4.3 Estimation of Aerodynamic Derivatives

The principal components, yPC, formed the new input to the network. The same estimation procedure as in section 4.6.1 was carried out; the step by step procedure can be reviewed in APPENDIX D.

5.5	Results and Discussion

5.5.1 The recommended network design

The recommended parameters for the network design, based on the findings in section 5.3.5, are summarized in Table 5.2. Two hidden layers were used, with 20 nodes in the first hidden layer and 5 nodes in the second. The matrix P used to transform the original data into principal components and the scaling factors of the targeted output become properties of the designed network: they will be used to transform future inputs before feeding them to the network and to rescale the produced output.


Table 5.2	Recommended MLNN design for input optimization with PCA

Input representation	Design parameters
The number of PCs to be included is set to 32 PCs	2 hidden layers used, with the network structure of [32-20-5-2]
The inputs and outputs were scaled so that the input range is [-1, +1]	Training algorithm: Levenberg-Marquardt with Bayesian regularization; Training samples: 1000 time series examples

The capability of the network in handling noisy input was tested with another set of generated samples, each injected with random white noise of rms value 0.02, 0.1 and 0.2. The test was conducted by fixing the ζ value, varying the values of ωn, and evaluating the ωn values estimated by the MLNN. The same procedure was repeated by fixing ωn while varying the ζ values. The results are presented in Figures 5.12 and 5.13. The designed MLNN is capable of accurately estimating the ωn values, but the estimation of the ζ values is not as good, although it improves at high natural frequencies.

Figure 5.12	MLNN estimation of ωn values: estimated against targeted ωn values (0 to 30) at noise levels of 0.02, 0.1 and 0.2 rms, for ζ fixed at 0.003, 0.009 and 0.025

Figure 5.13	MLNN estimation of ζ values: estimated against targeted ζ values (0 to 0.04) at noise levels of 0.02, 0.1 and 0.2 rms, for ωn fixed at 3.5, 15.5 and 23.5 rad/s

5.5.2 Estimation of the aerodynamic derivatives

Further, the MLNN was simulated with the measured data from the dynamic wind tunnel test to estimate the aerodynamic yaw moment and side force derivatives. The estimation was carried out according to the procedure described in section 5.4. The estimated results were then compared with the results in Chapter 3, as presented in Table 5.3. The spring codes are given in section 3.3.1.

Table 5.3	Results for aerodynamic derivatives from MLNN utilizing PCA for input optimization compared with the results obtained conventionally

	Neural Network with PCA	Conventional method
Reduced	Yaw moment	Side force	Yaw moment	Side force
frequency	Cnβ	Cnr	Cyβ	Cyr	Cnβ	Cnr	Cyβ	Cyr
0.13	-0.086	-0.101	-6.480	-0.115	0.552	-0.083	2.866	0.564
0.19	-0.825	-0.031	-5.519	0.380	0.629	-0.097	4.151	0.336
0.25	0.116	-0.111	1.739	0.074	0.573	-0.097	3.043	0.333
0.39	-0.228	-0.056	0.383	0.293	0.589	-0.109	3.381	0.054
0.44	2.646	-0.112	29.979	0.996	0.560	-0.108	2.951	0.199
0.55	1.759	0.076	11.959	4.354	0.630	-0.114	4.344	0.561
0.66	-0.904	-0.077	-19.484	0.480	0.605	-0.111	3.630	0.306
0.70	0.428	-0.115	10.926	-0.065	0.477	-0.118	2.210	0.124

The estimation from the MLNN differs significantly from the conventional estimation. Since section 5.5.1 showed that the estimation of the ωn values is good, it could be expected that the estimation of Cnβ would be good as well. However, the results show otherwise.

The incapability of this MLNN design to approximate the aerodynamic derivatives may be due to the presentation of the time series data. In Chapter 4 the MLNN was trained on the standard plot of the second order system impulse response, with the response constrained to three cycles. With the whole response included in this new MLNN design, once the signal starts to damp off the noise becomes dominant over the signal, and the neural network thus models the noise instead of the signal. This new approach does not lead to a robust neural network design.
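The noise-domination argument can be made concrete: the envelope of a lightly damped response decays as exp(−ζωn t) while the injected noise keeps a constant rms. A sketch using the ωn = 26.0, ζ = 0.027 values from Figure 5.7 (the 0.1 rms noise level is one of the levels used in the study):

```python
import numpy as np

zeta, omega_n, noise_rms = 0.027, 26.0, 0.1
t = np.linspace(0.0, 8.0, 160)                    # 8 s window at 20 Hz
signal = np.exp(-zeta * omega_n * t) * np.cos(omega_n * t)

first_rms = np.sqrt(np.mean(signal[:20] ** 2))    # first second of data
last_rms = np.sqrt(np.mean(signal[-20:] ** 2))    # last second of data
# first_rms is well above noise_rms while last_rms is far below it,
# so over the tail of the 8 s window the network fits mostly noise
```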

To show that PCA is workable for input optimization in MLNN function approximation, PCA was applied to the network design in Chapter 4, with 10 PCs included as the network input. The resulting yaw moment aerodynamic derivatives are tabulated in Table 5.4 and compared with the results in section 4.7.1.

Table 5.4	Yaw moment derivatives estimated by MLNN with and without application of PCA

Reduced	Spring	Neural Network with PCA	Network without PCA
frequency	code	Cnβ	Cnr	Cnβ	Cnr
0.13	K1	0.5723	-0.0989	0.5713	-0.0906
0.19	K2	0.5748	-0.1080	0.5621	-0.1088
0.25	K3	0.6075	-0.1124	0.6011	-0.1181
0.39	K4	0.5777	-0.1189	0.5918	-0.1221
0.44	K5	0.5519	-0.1172	0.5738	-0.1258
0.55	K6	0.6117	-0.1295	0.6274	-0.1327
0.66	K7	0.5879	-0.1179	0.5703	-0.1160
0.70	K8	0.5004	-0.1185	0.5593	-0.1170


5.6	Conclusion

The application of principal component analysis to reduce the dimension of the network input data has been demonstrated. The neural network shows the ability to learn function approximation based on the input features extracted by PCA. The technique is very useful for the offline network training process since, in this application, the 160-dimensional input was reduced to 1/5 of its original dimension, to only 32 dimensions.

Although it was expected that using input features as the network input would allow the designed network to handle noisy signals, the network appears to be sensitive to noise. Thus, to improve the performance of the network in a noisy environment, the network was trained using noisy input data. Based on the impulse responses generated from the standard second order system transfer function, the network manages to handle noisy signals up to an rms value of 0.2.

It was found that the designed MLNN does not perform well when dealing with the real measured time response data. This may be due to variations in the damping and frequency of the system throughout the recorded signal, arising from uncertainties in the physical system. The network performance might be improved by introducing a higher noise level during the training process and using a larger network size. However, the use of Bayesian regularization is not practical for a network with up to a few thousand free parameters, for it will take a very long training period due to the massive calculation of the Jacobian matrix. Furthermore, the cutoff variance percentage in the selection of PCs can be investigated further, since the 80% used here was suggested by a previous study of a different system.

The application of PCA to the MLNN design in Chapter 4 shows that the network input size can be reduced to ¼ of its original dimension while maintaining a good approximation property. This implies that the input node dimension of a neural network can be optimized using PCA.


CHAPTER 6

CONCLUSION

6.1 Conclusion

To date, very few research works have been reported on the identification of the aerodynamic derivatives of ground vehicles, and not many involve a system identification process. The identification tools in the area of ground vehicle aerodynamics are limited and relatively new compared to the aircraft field. The aim of the current research work, designing an alternative to the conventional method for the identification of a ground vehicle's aerodynamic derivatives, has been successfully achieved.

A multilayer feedforward neural network has been successfully designed to estimate the natural frequency and damping ratio based on recorded impulse response data. The training was done by providing samples of impulse responses generated from a similar system's transfer function. Two methods of input representation were tested: first, training based on the standard plot of a second-order system, where three cycles of the response were included, and second, training based on the whole 8 seconds of recorded impulse response data. It was found that the network learns much better with the clear and simple input representation of the impulse data in the former scheme. Fewer training samples were needed by the network to maintain a good estimation property, as well as good generalization. Furthermore, through this representation, a smaller network is sufficient to carry out the estimation task.
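The former, three-cycle representation can be sketched as follows: for each (ωn, ζ) pair, the impulse response of a second-order system is generated and exactly three damped-oscillation cycles are sampled with a fixed number of points. The grid of parameter values below is illustrative only, not the grid used in the thesis:

```python
import numpy as np

def impulse_response(wn, zeta, t):
    # Impulse response of an underdamped second-order system.
    wd = wn * np.sqrt(1.0 - zeta**2)
    return (wn / np.sqrt(1.0 - zeta**2)) * np.exp(-zeta * wn * t) * np.sin(wd * t)

def three_cycle_sample(wn, zeta, n_points=60):
    """Sample exactly three damped-oscillation cycles with a fixed point count."""
    wd = wn * np.sqrt(1.0 - zeta**2)
    T = 2.0 * np.pi / wd                      # damped period
    t = np.linspace(0.0, 3.0 * T, n_points)   # three cycles, same length for every pair
    y = impulse_response(wn, zeta, t)
    return y / np.max(np.abs(y))              # normalize amplitude to [-1, 1]

# Illustrative training grid of natural frequencies and damping ratios.
samples = [three_cycle_sample(wn, zeta)
           for wn in np.linspace(4.0, 10.0, 7)
           for zeta in np.linspace(0.02, 0.2, 10)]
X = np.vstack(samples)                        # one row per (wn, zeta) pair
print(X.shape)                                # (70, 60)
```

Because every sample spans the same number of cycles regardless of its frequency, the network sees a uniform, simple input pattern.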

Although the universal approximation theorem (Cybenko, 1989) states that a single hidden layer network is sufficient to represent the sought function, an additional hidden layer can help the network to generalize better when the input representation is complex and harder for the network to learn, without having to expand the first hidden layer. This is especially advantageous if the network input dimension is considerably large. If the number of input nodes is much larger than the number of nodes in the first hidden layer, then adding nodes in a second hidden layer results in far fewer additional free parameters than expanding the first hidden layer itself. Keeping the number of free parameters low also keeps the training period short, since the Bayesian regularization algorithm applied to a network with up to a few thousand free parameters takes a very long training period and is not efficient.

The network input node dimension reduction can be carried out through the application of PCA. The network can learn to approximate a function based on the most prominent principal components of the original data. The PCA technique has been widely used in the area of pattern recognition, and this study has shown that its application can be extended to the function approximation area as well. However, although the network estimates the natural frequency and damping ratio based on extracted impulse response features, it still could not eliminate the effect of noise in the input data.

The application of the designed MLNN to the identification of a ground vehicle's aerodynamic derivatives led to good estimation results. Thus, the primary objective of this research work has been fulfilled. The application of the MLNN to the experimentally measured data shows that there are variations of damping inherent in the system, which suggests the existence of other sources of noise apart from the data acquisition equipment noise, which is relatively low. The sources of noise might originate from the wake and buffeting loads, which are negligible at lower wind speeds.

6.2 Contribution of the Research Work

This research work has made a number of contributions, which can be summarized as follows:

(i) This work has found new application areas for two existing well-known methods: the multilayer neural network, for the identification of a ground vehicle's aerodynamic derivatives, and principal component analysis, for data dimension reduction in neural network function approximation.

(ii) A new approach for the estimation of natural frequency and damping ratio from impulse response data has been introduced through the MLNN application. This approach provides direct estimation without having to go through intermediate steps. The current research work does not intend to replace the conventional method of estimation, but rather to serve as an alternative approach and for comparison purposes.

(iii) The use of Bayesian regularization has proved the algorithm's ability to train the network to give accurate estimates while exhibiting a good generalization property. The available variable, termed the number of effective parameters, acted as an indicator of when the selected network architecture started to be oversized.

(iv) To date, this research has contributed two publications as follows:

a. “Application of Neural Network in Estimating the Aerodynamic Derivatives of Simple Ground Vehicle Model”. Proceedings of the International Conference on Control, Instrumentation and Mechatronics Engineering (CIM’07). Johor Bahru, Malaysia. (28-29 May 2007). 459-464.

b. “Identification of Aerodynamic Coefficients of Ground Vehicles Using Neural Network”. Intelligent Vehicles Symposium, 2007 IEEE. Istanbul, Turkey. (13-15 June 2007). 50-55. The paper was fully reviewed by four reviewers before being accepted for publication.

6.3 Recommendations for Future Research

As a novel approach in applying a neural network to identify a ground vehicle's aerodynamic derivatives and applying PCA to neural network function approximation, this research work offers a few research opportunities for future work:

(i) Improve the use of principal component analysis. Other data reduction techniques can also be investigated for neural network robustness in dealing with noisy input data. Other training algorithms can also be investigated and compared with the current research work in terms of feasibility and neural network generalization.

(ii) Since the actual air flow around a simple ground vehicle body is not yet fully understood, and the derived equation of motion of the oscillating test rig is still limited to low wind tunnel speeds, it would be a great contribution if a neural network could be used to model the system rather than a second-order system; useful data could then be generated from the network model and used for training another neural network for identification purposes. Later on, this technique could be extended to more complex vehicle shapes and improve the currently available identification techniques.

(iii) The estimation of a system's modal parameters by a neural network based on its impulse response can be applied to other similar systems. This can be used to measure the consistency of the technique, and it can also be further developed for on-line learning, which would be advantageous.

(iv) Model a neural network based on higher-order responses and investigate its capability as a universal approximator.


REFERENCES

Ahmad, S. M., Shaheed M. H., Chipperfield A. J., and Tokhi, M. O. (2002). Non-linear Modelling of a One-Degree-of-Freedom Twin-Rotor Multi-Input Multi-Output System Using Radial Basis Function Networks. Proceedings of the Institution of Mechanical Engineers, 197-208.

Ahmed, S. R., Ramm, G., and Faltin, G. (1984). Some Salient Features of the Time-Averaged Ground Vehicle Wake. SAE Paper 840300.

Atalla, M. J., and Inman, D. J. (1998). On Model Updating Using Neural Networks. Mechanical Systems and Signal Processing, 12(1), 135-161.

Barnard, R. H. (1996). Road Vehicle Aerodynamic Design: An Introduction. Harlow, Essex : Longman.

Baker, C. J., and Reynolds, S. (1992). Wind-induced Accidents of Road Vehicles. Accident Analysis and Prevention, 24(6), 559-575.

Bearman P. W., and Mullarkey S. P. (1994). Aerodynamic Forces on Road Vehicles Due to Steady Side Winds and Gusts. RAeS Vehicle Aerodynamics Conference, 15-16 July, 1994. Loughborough, UK.

Billings, S. A., Jamaluddin, H. B., and Chen, S. (1992). Properties of Neural Networks with Applications to Modelling Non-linear Dynamical Systems. Int. J. Control, 55(1), 193-224.

Ceylan, R., and Özbay, Y. (2007). Comparison of FCM, PCA and WT Techniques for Classification ECG Arrhythmias Using Artificial Neural Network. Expert Systems with Applications, 33, 286-295.

Cybenko, G. (1989). Approximations by Superpositions of a Sigmoidal Function. Mathematics of Control, Signal and Systems, 2, 303-314.

Davis, J. P. (1983). Wind Tunnel Investigation of Road Vehicle Wakes. Doctor of Philosophy Thesis, Imperial College of Science and Technology, London.

de Silva, C. W. (2000). Vibration: Fundamentals and Practice, CRC Press LLC: Boca Raton.

El-Badawy, A. A. (2000). Structural Identification and Buffet Alleviation of Twin-Tailed Fighter Aircraft. Doctor of Philosophy Thesis, Virginia Polytechnic Institute and State University, Blacksburg, Virginia.

Filippone, A. (2003). Unsteady Gust Response of Road Vehicles. Journal of Fluids Engineering, 125(5), 806-812.

Foresee, F. D., and Hagan M. T. (1997). Gauss-Newton Approximation to Bayesian Regularization. Proceedings of the 1997 International Joint Conference on Neural Networks. Houston, Tx. USA, 1930-1935.

Garcia-Velo, J., and Walker, B. K. (1995). Aerodynamic Parameter Estimation for High-Performance Aircraft Using Extended Kalman Filtering. Journal of Guidance, Control and Dynamics, 20(6), 1257-1259.

Garry, K. P., and Cooper, K. R. (1986). Comparison of Quasi-static and Dynamic Wind Tunnel Measurements on Simplified Tractor-Trailer Models. Journal of Wind Engineering and Industrial Aerodynamics, 22(2-3), 185-194.

Gordon, K., Howell, S., McGoverin, C., Clark, A., Lam, F., Rades, T., Strachan, C., Destari, P., Crawford, M. (2007). Principal Component Analysis. Unpublished note, University of Otago.

Gupta, M. M., Jin, L., and Homma, N. (2003). Static and Dynamic Neural Networks: From Fundamentals to Advance Theory. Hoboken, N. J.: Wiley-IEEE Press.

Hagan, M. T., and Menhaj, M. B. (1994). Training Feedforward Networks with the Marquardt Algorithm. IEEE Transactions on Neural Networks, 5(6), 989-993.

Hassan, A. B. (2002). On-line Recognition of Developing Control Chart Patterns. Doctor of Philosophy Thesis, Universiti Teknologi Malaysia, Skudai.

Hassoun, M. H. (1996). Fundamentals of Artificial Neural Networks. Proceedings of the IEEE, 84(6), 906.

Haykin, S. (1999). Neural Networks: A Comprehensive Foundation (2nd ed.). Upper Saddle River, N. J.: Prentice Hall.

Hemon, P., and Noger, C. (2004). Transient Growth of Energy and Aeroelastic Stability of Ground Vehicle. Journal of Comptes Rendus Mecanique, 332(3), 175-180.

Howell J. P., and Everitt, K. W. (1983). Gust Response of a High Speed Train Model. ASME, Fluids Engineering Division, 7, 81-89.

Howell, J. P. (1993). Shape Features Which Influence Crosswind Sensitivity. IMechE, Paper C466/036, 43-52.

Hucho, W. H. E. (1997). Aerodynamics of Road Vehicles, SAE International.

Hucho, W. H., and Emmelmann, H. J. (1978). Theoretical Prediction of the Aerodynamic Derivatives of a Vehicle in Crosswind Gusts. SAE, 129-136.

Humphreys, N. D., and Baker, C. J. (1992). Forces on Vehicles in Cross Winds from Moving Model Tests. Journal of Wind Engineering and Industrial Aerodynamics, 41-44, 2673-2684.

Hush, D. R., and Horne, B. G. (1993). Progress in Supervised Networks: What's New Since Lippmann?. IEEE Signal Processing Magazine, 10, 8-39.

Jones, B. (1991). Design of Experiments. In Pyzdek, T., and Berger, R. W., Quality Engineering Book (pp. 329-387). New York: Marcel Dekker.

Juang, J.-N. (1994). Applied System Identification, Englewood Cliffs, New Jersey: Prentice Hall.

Kay, S. M., and Marple, S. L. (1981). Spectrum Analysis - A Modern Perspective. Proceedings of the IEEE, 69(11), 1380-1419.

Le Good, G. M., and Garry, K. P. (2004). On the Use of Reference Models in Automotive Aerodynamics. SAE International, 375-384.

Levinski, O. (2001). Prediction of Buffet Loads Using Artificial Neural Networks. Victoria, Australia: DSTO Aeronautical and Maritime Research Laboratory.

Lewis, E., Sheridan, C., O'Farrell, M., King, D., Flanagan, C., Lyons, W. B., and Fitzpatrick, C. (2007). Principal Component Analysis and Artificial Neural Network Based Approach to Analysing Optical Fibre Sensors Signal. Sensors and Actuators A, 136, 28-38.

Liu, G. R., Lam, K. Y., and Han, X. (2002). Determination of Elastic Constants of Anisotropic Laminated Plates Using Elastic Waves and a Progressive Neural Network. Journal of Sound and Vibration, 252(2), 239-259.

Liu G. R., and Han X. (2003). Computational Inverse Techniques in Nondestructive Evaluation. Boca Raton, Florida: CRC Press.

Liu, X. Q., Chen, X. G., Wu, W. F., and Zhang, Y. Q. (2006). Process Control Based on Principal Component Analysis for Maize Drying. Food Control, 17, 894-899.

MacKay, D. J. C. (1992). Bayesian Interpolation. Neural Computation, 4(3), 415-447.

Malinowski, A., and Zurada, J. M. (1993). Sampling Rate for Information Encoding Using Multilayer Neural Networks. International Joint Conference on Neural Networks, Nagoya, 1705-1708.

Manson, R. L., Gunst, R. F., and Hess, J. L. (1989). Statistical Design and Analysis of Experiments. New York: John Wiley & Sons.

Mansor, S. (2006). Estimation of Bluff Body Transient Aerodynamic Loads Using an Oscillating Model Rig. Doctor of Philosophy Thesis, Loughborough University, Loughborough.

Ming-Der, C. (1990). Estimation of Steady and Unsteady Aerodynamic Parameters at High Angles of Attack. Doctor of Philosophy Thesis, Texas A&M University, Texas.

Mirapeix, J., Garcia-Allende, P. B., Cobo, A., Conde, O. M., and Lopez-Higuera, J. M. (2007). Real-time Arc-welding Defect Detection and Classification with Principal Component Analysis and Artificial Neural Networks. NDT&E International, 40, 315-323.

Montgomery, D. C. (1996). Design and Analysis of Experiments (4th ed.). New York: John Wiley & Sons.

Nguyen, D., and Widrow, B. (1989). The Truck Backer-Upper: An Example of Self Learning in Neural Networks. Proceedings of the 1989 International Joint Conference on Neural Network, June 1989. IEEE, II-357-363.

Nguyen, D., and Widrow, B. (1990). Improving the Learning Speed of 2-Layer Neural Networks by Choosing Initial Values of the Adaptive Weights. International Joint Conference on Neural Network, San Diego, CA, 21-26.

O'Farrell, M., Lewis, E., Flanagan, C., Lyons, W. B., and Jackman, N. (2004). Principal Component Analysis and Pattern Recognition Combined with Visible Spectroscopy in the Classification of Food Quality. Sensors, 2004. Proceedings IEEE, 2, 597-600.

O'Hagan, A. (1984). Motivating Principal Components, and a Stronger Optimality Result. The Statistician, 33(3), 313-315.

Passmore, M. A., Richardson, S., and Imam, A. (2001). An Experimental Study of Unsteady Vehicle Aerodynamics. Journal of Automobile Engineering, IMechE, 215(7), 779-788.

Raol, J. R., and Jategaonkar, R. V. (1995). Aircraft Parameter Estimation Using Recurrent Neural Networks - A Critical Appraisal. AIAA Atmospheric Flight Mechanics Conference, 7-9 August. Baltimore, Maryland. (AIAA-95-3504-CP, 119-128).

Richardson, K. A. (2000). Identification of Aerodynamic Coefficients with a Neural Network, Doctor of Philosophy Thesis, Princeton University.

Roffel, B., Vermeer, P. J., and Chin, P. A. (1989). Simulation and Implementation of Self-Tuning Controllers. Englewood Cliffs, New Jersey: Prentice Hall.

Rogers, J. L. (1994). Simulating Structural Analysis with Neural Network. Journal of Computational Civil Engineering, 8(2), 252-265.

Sage, A. P., and Melsa, J. L. (1971). System Identification, Academic Press: New York.

Scanlan, R. H. (1997). Some Observations on the State of Bluff-body Aeroelasticity. Journal of Wind Engineering and Industrial Aerodynamics, 69-71, 77-90.

Shaheed M. H. (2005). Feedforward Neural Network Based Non-linear Dynamic Modelling of a TRMS Using RPROP Algorithm. Aircraft Engineering and Aerospace Technology: An International Journal, 77(1), 13-22.

Shalizi C. (2006). More on Principal Component Analysis. Unpublished note, Carnegie Mellon University.

Shlens, J. (2005). A Tutorial on Principal Component Analysis. Unpublished note, Institute for Nonlinear Science, University of California, San Diego.

Sinha, N. K. (2000). Identification of Continuous-time Systems from Samples of Input-output Data: An Introduction. Sadhana, 25(2), 75-83.

Smith, L.I., (2002) A Tutorial on Principal Components Analysis. Unpublished note.

Song, S., Yu, Z., and Chen, X. (2005). A Novel Radial Basis Function Neural Network for Approximation. International Journal of Information Technology, 11(9), 46-53.

Sri-Jayantha, M., and Stengel, R. F. (1987). Determination of Non-linear Aerodynamic Coefficients Using the Estimation-Before-Modelling Method. J. Aircraft, 25(9), 796-804.

Stasiak, M., Sikora, J., Filipowicz, S. F., and Nita, K. (2007). Principal Component Analysis and Artificial Neural Network Approach to Electrical Impedance Tomography Problems Approximated by Multi-region Boundary Element Method. Engineering Analysis With Boundary Elements, 31, 713-720.

StatSoft, I., (2007) Electronic Statistics Textbook. Tulsa, OK: StatSoft.

Tran, V. T. (1991). A Calculation Method for Estimating the Transient Wind Force and Moment Acting on a Vehicle. SAE Transactions, 100(6), 477-488.

Watkins, S., and Saunders, J. W. (1995). Turbulence Experienced by Road Vehicles Under Normal Driving Conditions. Society of Automotive Engineers International Congress, SAE 950997, Detroit.

Wharington, J. M., Blythe, P. W., and Herszberg, I. (1993). The Development of Neural Network Techniques for the System Identification of Aircraft Dynamics. In The Fifth Australian Aeronautical Conference. September, 1993.

Ya, X. Z. (2007). Artificial Neural Networks Based on Principal Component Analysis Input Selection for Clinical Pattern Recognition Analysis. Talanta, 73, 68-75.

Yaacob, M. S., Jamaluddin, H., and Wong, K. C. (2002). Fuzzy Logic Approach for Estimation of Longitudinal Aircraft Parameters. The Aeronautical Journal of the Royal Aeronautical Society, 585-594.

Yip, C. K., Crolla, D. A., and Horton, D. N. L. (1992). The Influence of Aerodynamic Effects on Car Handling. IMechE, Paper C389/035, 11-23.

Yoshida, Y., Muto, S., and Imaizumi, T. (1977). Transient Aerodynamic Forces and Moments on Models of Vehicles Passing Through Crosswind. SAE Technical Paper series no. 770391. 1-14.

Zhengqi, G., Gang, F., Jun, W., and Seemann, W. (2001). Investigation of the Transient State Steering Stability in Side Winds for a High Speed Vehicle. Int. J. of Vehicle Design, 26(5), 562-572.

Zhu, Q. X., and Li, C. F. (2006). Dimensionality Reduction with Input Training Neural Network and Its Application in Chemical Process Modeling. Chinese Journal of Chemical Engineering, 14(5), 597-603.


APPENDIX A

Neural Network

The artificial neural network consists of a network of interconnected artificial neurons, as depicted in Figure A.1. An artificial neural network is a non-linear statistical data modeling tool that involves a network of simple processing elements (neurons), determined by the connections between the processing elements and the element parameters (input (p), weight (w), bias (b)). The artificial neuron can be seen as a conceptual device that receives input from one or more input lines. The strength of each input entering the neuron is varied, determined by its connector's weight.

Figure A.1: View of a simplified artificial neural network (inputs p1, p2, ..., pm in the input layer, weighted by w11, w12, ..., w1m, feed neuron k with bias bk; the neuron forms the net input nk and produces the output layer value a = f(nk + b)).

There are two processes involved inside the neuron: the linear combiner and the activation function. The linear combiner adds all the input signals weighted by the respective synapses of the neuron. The activation function, f(⋅), limits the amplitude of the neuron's output to some finite value within the permissible amplitude range of the output signal. The bias, b, has the effect of increasing or lowering the net input of the activation function, depending on whether its value is positive or negative, respectively. Neuron k can be described mathematically by the following two equations:

n_k = \sum_{j=1}^{m} w_{kj} p_j    (A.1)

and

a_k = f(n_k + b_k)    (A.2)

During the learning process, the values of the weights and biases are iteratively changed until the output of the network matches the desired output. There are two modes of learning: supervised and unsupervised. Supervised learning is also known as learning with a teacher.


APPENDIX B

Bayesian Regularization

Bayesian regularization addresses the problem of generalization. Typically, the performance function minimized by a neural network during the training process consists of the sum of the network's squared errors. Regularization adds an additional term, the sum of squares of the network weights, Fw. Thus, the objective function for regularization, Freg, becomes:

F_{reg} = \beta F_e + \alpha F_w    (B.1)

where α and β are the regularization parameters, F_e = V is the sum of squared network errors, and F_w is the sum of squares of the network weights.

By including the network weights in the performance function, the size of the weights is constrained, and the network response will be smooth if the weights are kept small. If α ≪ β, training will drive the network errors smaller; if α ≫ β, training will reduce the size of the weights at the expense of the network errors. Thus, setting the optimal values of the regularization parameters is important, and this can be approached by applying the Bayesian framework developed by MacKay (1992) to determine their optimal setting.
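The α/β trade-off can be illustrated on a linear-in-the-parameters model, where minimizing βFe + αFw has the closed form of ridge regression with λ = α/β. This is only a sketch of the weight-decay effect, on synthetic data, not the network training itself:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic noisy data from a cubic polynomial.
x = np.linspace(-1.0, 1.0, 40)
y = 1.0 + 2.0 * x - 0.5 * x**3 + 0.1 * rng.standard_normal(x.size)
X = np.vander(x, 8)                     # deliberately over-parameterized model

def fit(lam):
    """Minimize ||Xw - y||^2 + lam * ||w||^2 (lam plays the role of alpha/beta)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_small_alpha = fit(1e-6)               # alpha << beta: fit the errors closely
w_large_alpha = fit(10.0)               # alpha >> beta: shrink the weights

print(np.linalg.norm(w_small_alpha) > np.linalg.norm(w_large_alpha))  # True
```

Increasing λ always shrinks the weight norm, at the expense of a larger error term, which is exactly the trade-off the regularization parameters control.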

Bayes’ rule is a result in probability theory that relates the conditional and marginal probability distributions of random variables. The application of Bayes’ rule in

135 optimizing the regularization parameters of α and β (Foresee and Hagan, 1997) will give:

P(\alpha, \beta \mid D, M) = \frac{P(D \mid \alpha, \beta, M) \, P(\alpha, \beta \mid M)}{P(D \mid M)}    (B.2)

where D represents the data set and M is the particular neural network model used. P(α, β | M) is the prior density, which represents our knowledge of the regularization parameters before any data is collected; P(D | α, β, M) is the likelihood function, which is the probability of the data occurring given the regularization parameters; and P(D | M) is a normalization factor, which guarantees that the total probability is 1.

Assuming a uniform prior density for the regularization parameters α and β, maximizing the posterior is achieved by maximizing the likelihood function. Solving for the likelihood function gives:

P(D \mid \alpha, \beta, M) = \frac{P(D \mid w, \beta, M) \, P(w \mid \alpha, M)}{P(w \mid D, \alpha, \beta, M)} = \frac{Z_F(\alpha, \beta)}{Z_D(\beta) \, Z_W(\alpha)}    (B.3)

where

Z_D(\beta) = \left(\frac{\pi}{\beta}\right)^{Q/2}    (B.4)

Z_W(\alpha) = \left(\frac{\pi}{\alpha}\right)^{N/2}    (B.5)

Z_F(\alpha, \beta) \approx (2\pi)^{N/2} \left(\det\!\left((H^{MP})^{-1}\right)\right)^{1/2} \exp\!\left(-F(w^{MP})\right)    (B.6)

where the superscript MP denotes the minimum point, where the gradient is zero. H is the Hessian matrix of the objective function, given by:

H = \beta \nabla^2 F_e + \alpha \nabla^2 F_w    (B.7)

Inserting equations (B.4), (B.5) and (B.6) into (B.3), the optimal values for α and β can be obtained. This is done by taking the derivative of the log of equation (B.3) with respect to each parameter and setting them equal to zero, which yields:

\alpha^{MP} = \frac{\gamma}{2 F_w(w^{MP})} \quad \text{and} \quad \beta^{MP} = \frac{n - \gamma}{2 F_e(w^{MP})}    (B.8)

where γ = N − 2α^{MP} tr((H^{MP})^{-1}) is called the number of effective parameters and N is the total number of parameters in the network. The parameter γ is a measure of how many parameters in the neural network are effectively used in reducing the error function; it can range from zero to N.

For the computation of the Hessian matrix of the objective function Freg at the minimum point w^{MP}, Foresee and Hagan (1997) proposed the use of the Gauss-Newton approximation to the Hessian matrix, which is readily available if the Levenberg-Marquardt optimization algorithm is used to locate the minimum point. Thus, the additional computation required for optimizing the regularization parameters is minimal.


APPENDIX C

Levenberg-Marquardt Optimization Technique

The application of the Marquardt algorithm within backpropagation to train an MLNN is given by Hagan and Menhaj (1994). While backpropagation is a steepest descent algorithm, the Levenberg-Marquardt algorithm is an approximation to Newton's method. Consider a function V(x) to be minimized with respect to the parameter vector x; Newton's method would be

\Delta x = -\left[\nabla^2 V(x)\right]^{-1} \nabla V(x)    (C.1)

where ∇²V(x) is the Hessian matrix and ∇V(x) is the gradient. If V(x) is taken to be a sum of squares function

V(x) = \sum_{i=1}^{Q} e_i^2(x)    (C.2)

then it can be shown that

\nabla V(x) = J^T(x) \, e(x)    (C.3)

\nabla^2 V(x) = J^T(x) J(x) + S(x)    (C.4)

where J(x) is the Jacobian matrix given by:

J(x) = \begin{bmatrix} \frac{\partial e_1(x)}{\partial x_1} & \frac{\partial e_1(x)}{\partial x_2} & \cdots & \frac{\partial e_1(x)}{\partial x_n} \\ \frac{\partial e_2(x)}{\partial x_1} & \frac{\partial e_2(x)}{\partial x_2} & \cdots & \frac{\partial e_2(x)}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial e_N(x)}{\partial x_1} & \frac{\partial e_N(x)}{\partial x_2} & \cdots & \frac{\partial e_N(x)}{\partial x_n} \end{bmatrix}    (C.5)

and

S(x) = \sum_{i=1}^{N} e_i(x) \, \nabla^2 e_i(x)    (C.6)

For the Gauss-Newton method, it is assumed that S(x) ≈ 0, and the update (C.1) becomes

\Delta x = \left[J^T(x) J(x)\right]^{-1} J^T(x) \, e(x)    (C.7)

The Levenberg-Marquardt modification to the Gauss-Newton method is

\Delta x = \left[J^T(x) J(x) + \mu I\right]^{-1} J^T(x) \, e(x)    (C.8)

The parameter μ is multiplied by some factor, φ, whenever a step would result in an increased V(x). When a step reduces V(x), μ is divided by φ. Thus, the Levenberg-Marquardt method can be thought of as a combination of steepest descent and Gauss-Newton, with the transition determined by the value of μ (Hagan and Menhaj, 1994). When the current solution is far from the minimum, the algorithm behaves like a steepest descent method: slow, but guaranteed to converge. When the current solution is close to the minimum, it becomes a Gauss-Newton method and exhibits fast convergence.
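The update (C.8) and the μ adjustment described above can be sketched on a toy one-parameter curve fit. The exponential model, the starting point, and the choice φ = 10 are illustrative assumptions, not values from the thesis:

```python
import numpy as np

# Toy problem: fit y = exp(a * x) to data generated with a_true = 1.5.
x = np.linspace(0.0, 1.0, 20)
y = np.exp(1.5 * x)

def residuals(a):
    return np.exp(a * x) - y

def jacobian(a):
    return (x * np.exp(a * x)).reshape(-1, 1)   # d e_i / d a

a, mu, phi = 0.0, 1e-2, 10.0
for _ in range(50):
    e, J = residuals(a), jacobian(a)
    # Levenberg-Marquardt step: dx = -(J^T J + mu I)^(-1) J^T e   (C.8)
    step = -np.linalg.solve(J.T @ J + mu * np.eye(1), J.T @ e).item()
    if np.sum(residuals(a + step) ** 2) < np.sum(e ** 2):
        a, mu = a + step, mu / phi    # step reduced V: accept, move toward Gauss-Newton
    else:
        mu *= phi                     # step increased V: reject, move toward steepest descent

print(round(a, 4))                    # ≈ 1.5
```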

The essence of the Levenberg-Marquardt algorithm is the computation of the Jacobian matrix. For the neural network mapping problem using Bayesian regularization, the terms in the Jacobian matrix can be computed by a simple modification to the backpropagation algorithm.

From the standard backpropagation algorithm, δ_k is given by:

\delta_k(i) \equiv \frac{\partial \hat{V}}{\partial n_k(i)}    (C.9)

and is defined as the sensitivity of the performance index to changes in the net input of unit i in layer k, and:

\frac{\partial \hat{V}}{\partial w_k(i,j)} = \frac{\partial \hat{V}}{\partial n_k(i)} \frac{\partial n_k(i)}{\partial w_k(i,j)} = \delta_k(i) \, a_{k-1}(j)    (C.10)

\frac{\partial \hat{V}}{\partial b_k(i)} = \frac{\partial \hat{V}}{\partial n_k(i)} \frac{\partial n_k(i)}{\partial b_k(i)} = \delta_k(i)    (C.11)

It can also be shown that the sensitivities satisfy the following recurrence relation:

\delta_k = \dot{F}_k(n_k) \left(W_{k+1}\right)^T \delta_{k+1}    (C.12)

where

\dot{F}_k(n_k) = \begin{bmatrix} \dot{f}_k(n_k(1)) & 0 & \cdots & 0 \\ 0 & \dot{f}_k(n_k(2)) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \dot{f}_k(n_k(s_k)) \end{bmatrix}    (C.13)

and

\dot{f}_k(n) = \frac{d f_k(n)}{dn}    (C.14)

This recurrence relation is initialized at the final layer by:

\delta_M = -\dot{F}_M(n_M)    (C.15)

Each column of the matrix in (C.13) is a sensitivity vector which must be backpropagated through the network to produce one row of the Jacobian.


APPENDIX D

NETWORK INPUT OPTIMIZATION WITH PCA

2000 training data samples were used to train the MLNN. The data were sampled at 20 Hz for 8 s. The training data were arranged into an m×n matrix X, with n = 160 dimensions and m = 2000 training samples, arranged as in Table D.1.

Table D.1    Data matrix for principal component analysis

                              Time
Training Samples    t1      t2      ⋯      tn
       1            X11     X12     ⋯      X1n
       2            X21     X22     ⋯      X2n
       ⋮            ⋮       ⋮              ⋮
       m            Xm1     Xm2     ⋯      Xmn

The covariance matrix S of matrix X is calculated, and the eigenvalues and eigenvectors of S are obtained. Based on the selection criteria in Section 5.3.2, 32 principal components (PCs) were retained as the network input. Using MATLAB, the transformation matrix P is given by:

P = m(:, 1:32);    (D.1)

where m is the matrix of eigenvectors of the covariance matrix S. Before training the network, the input matrix was transformed into its principal components:

X_t = P^T X^T    (D.2)

The superscript T denotes the transpose of the matrix and Xt is the transformed matrix.

The ranges of the principal component values and the targeted outputs were restricted to [-1, +1]. The scaling process is done by:

p_n = \frac{(1 - (-1)) \times (p - \min(p))}{\max(p) - \min(p)} + (-1)    (D.3)

p_n is the scaled data and p is the original data. min and max denote the minimum and maximum values of the original data; these are stored as properties of the designed network and used to transform future inputs and outputs. The network then starts training based on the scaled input, p_n, and scaled output, t_n. To use the network for estimating ω_n and ζ, the measured time response is resampled from 1 kHz to 20 Hz and the response up to 8 s is considered. The 160 sampled data points representing the time response are transformed into principal components after their mean is removed and the amplitudes are normalized by the maximum amplitude. The transformation is given by:

y_t = P^T y    (D.4)

where y_t is the transformed data, y is the original normalized amplitude, and P is the transformation matrix determined during the network design process. y_t is then scaled to the network range ([-1, +1]) using equation (D.3), with the stored minimum and maximum of the training input data as the network properties saved during the scaling of the training input. The scaled y_t is then simulated with the network using the following MATLAB command:

a = sim (net, yt);

net is the designed network, and a is the two-dimensional output of the network containing the scaled values of ω_n and ζ. The actual values of ω_n and ζ can be obtained by transforming the network results back to their original range by inverting equation (D.3):

p = \frac{(p_n - (-1)) \times (\max(p) - \min(p))}{1 - (-1)} + \min(p)

The simulation with the network was carried out for both wind-off and wind-on measured data to obtain the mechanical and overall (mechanical + aerodynamic) system stiffness and damping. The normalized aerodynamic stiffness and damping are given by:

\hat{N}_\beta = \frac{K_r - K_a}{I_{zz}} - \frac{K_r}{I_{zz}} = \omega_{n,\text{wind-on}}^2 - \omega_{n,\text{wind-off}}^2

\hat{N}_r = \frac{C_r - C_a}{I_{zz}} - \frac{C_r}{I_{zz}} = (2\zeta\omega_n)_{\text{wind-on}} - (2\zeta\omega_n)_{\text{wind-off}}
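As a numerical illustration of these relations: given the network's wind-off and wind-on estimates of ω_n and ζ, the normalized aerodynamic stiffness and damping follow directly as the differences of ω_n² and 2ζω_n. The modal values below are invented for the sketch, not measured data:

```python
# Hypothetical network estimates (rad/s and dimensionless) for the two runs.
wn_off, zeta_off = 5.8, 0.040    # wind-off: mechanical system only
wn_on,  zeta_on  = 6.4, 0.065    # wind-on: mechanical + aerodynamic

# Normalized aerodynamic stiffness and damping derivatives.
N_beta = wn_on**2 - wn_off**2                     # difference of omega_n^2
N_r    = 2*zeta_on*wn_on - 2*zeta_off*wn_off      # difference of 2*zeta*omega_n

print(round(N_beta, 3), round(N_r, 3))            # 7.32 0.368
```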
