Implementation of Linear Model Predictive Control using a Field Programmable Gate Array

Implementation of Linear Model Predictive Control using a Field Programmable Gate Array∗ Adam Mills, Adrian Wills, Steven Weller, and Brett Ninness† ...
Author: Noah Tyler
1 downloads 2 Views 2MB Size
Implementation of Linear Model Predictive Control using a Field Programmable Gate Array∗ Adam Mills, Adrian Wills, Steven Weller, and Brett Ninness†

Abstract The paper investigates the design of a field programmable gate array based custom computer architecture solution for implementing model predictive control. The solution employs a primal logarithmic-barrier interior-point algorithm in order to handle actuator constraints. The solution also incorporates practical aspects of a control algorithm including state observation and data sampling. The resulting circuit is profiled by application to a disturbance rejection control problem of a 14’th order lightly damped flexible beam structure with actuator constraints. This is achieved at 2 kHz sampling frequency and with 16-sample prediction horizon.

Keywords: Model Predictive Control, Interior-Point Method, Conjugate-Gradient, Field Programmable Gate Array (FPGA),

1

Introduction

The benefits of model predictive control (MPC), particularly its capacity to address multivariable systems subject to constraints, have led to widespread interest in the approach [13, 14, 16, 18]. Unfortunately, balancing the advantages of the MPC approach, is the difficulty of the associated computational requirements. These arise from the need to solve a constrained optimisation problem within the chosen sampling period. Historically, this has resulted in the benefits of MPC being realised only on systems with relatively slow dynamics, for which the sampling period is commensurately long and provides sufficient time for the associated compute platform to solve the necessary optimisation [13]. More recently, there has been significant expansion in the availability of high performance computing platforms at low cost. In particular, field programmable gate array (FPGA) platforms offering very significant custom computing architecture resources, have ∗

This work was supported by the Australian Research Council through their Discovery Project Program. All authors are with the School of Electrical Engineering and Computer Science, University of Newcastle, Callaghan, NSW 2308, Australia (e-mail: {Adam.Mills, Adrian.Wills, Steven.Weller, Brett.Ninness}@newcastle.edu.au) †

1

expanded rapidly in capability at very modest cost [22]. The control community has noticed this, and has begun a vigorous program of examining how these computer architecture advances can be employed to deliver the advantages of MPC on a wider range of control problems [1, 2, 3]. One line of research, known as “explicit MPC” has examined how the control solution can be decomposed offline into piecewise affine control laws valid over precomputed affine regions [9, 21]. This approach takes advantage of recent advantages in the availability of cheap high density computer memory, capable of storing the potentially large number of affine controllers and state regions [9, 24]. Another line of research has examined how advances in processor speed and/or the potential for implementation of custom designed FPGA-based compute platforms may be exploited [4, 8, 11, 12, 23, 24]. The current paper is a contribution to this latter line of research and employs an online method for solving the MPC optimisation problem. The main contribution of this paper is the development of an FPGA-based custom architecture platform for MPC control based on an primal logarithmic-barrier interior-point approach. A further contribution, is the verification of this control design by application to an experimental apparatus. This complements previous work by the authors where the use of an alternative active-set method for solving the optimisation problem was examined [26]. Examining the relative benefits and tradeoffs involved with these two approaches to MPC implementation is a topic of current interest [11], and it is intended that the work here in combination with [26] contributes to this area. By way of motivation, relative to a standard architecture microprocessor solution an FPGA-based custom architecture solution (such as presented here and in [26]) has two advantages. It can simply and directly be “dropped in” as a computing core in a larger embedded systems design. Alternatively, it can be an advanced initialisation of a custom application specific integrated circuit (ASIC) design. The capabilities of this design are illustrated by profiling its performance in solving a disturbance rejection problem for a 14’th order resonant structure involving a lightly damped flexible beam. The computing platform involves a widely available and modest cost Altera Stratix III EPSL150F115C2 FPGA, clocked at 70 MHz. These results are presented as encouraging evidence that there is clear potential for the benefits of MPC to be more widely applied by taking advantage the current and future advances in the the availability of flexible, cheap and high performance computing hardware. The paper is organised as follows. In Section 2 the problem formulation is discussed. This leads to an MPC optimisation problem and Section 3 details an interior-point algorithm for solving this problem. This algorithm is implemented on an FPGA architecture and the details of this implementation are provided in Section 4. The FPGA design is validated by application to a resonant system in Section 5 and some concluding remarks are provided in Section 6.

2

2

Problem Formulation

This paper considers the control of linear, time invariant, discrete time systems, which are modeled in state-space form, according to xt+1 = Axt + But + wt , yt = Cxt + Dut + vt ,

(1) (2)

Here ut ∈ Rm is the system input (control variable), yt ∈ Rp is the system output, xt ∈ Rn is the state variable, while wt ∈ Rn and vt ∈ Rp are unknown disturbances on the state and output, respectively. The model predictive control (MPC) approach considered here delivers the control variable ut+1|t by solving at time t a constrained optimisation problem of the form u?t = arg min V (ut , xt+1 ) s.t. b` ≤ ut ≤ bu u

(3)

t

where, for some user chosen prediction horizon N , ut , [ut+1|t , ut+2|t , · · · , ut+N |t ].

(4)

In the above, the subscript t + k|t is used to denote a future control action at time t + k, which is based on measurements at time t. MPC operates by using the first element u?t+1|t of u∗t as the control action to be applied at the next time interval. It then moves on to the next time instant, t + 1, and solves (3) again to deliver u?t+2|t+1 as the first element of u∗t+1 , and so on. Accordingly, there is delay of one sample between measurement and control action that is intrinsic to the MPC approach (3) - see [13, Section 2.5] for further discussion of this point. In this paper, the control cost V (ut , xt+1 ) is assumed to have the following quadratic form V (ut , xt+1 ) , uTt Hut + uTt f (xt+1 ) (5) where the matrix H ∈ RN m×N m is assumed to be positive definite, and f (xt+1 ) : Rn → RN m is assumed to have the affine form f (xt+1 ) = Φxt+1

(6)

for some user chosen matrix Φ ∈ RN m×n . The constraints in (3) determined by the upper and lower bound vectors bu and b` (respectively) represent simple bounds on allowed control action. For example, to accommodate physical limits on the actuator movement. The formulation (3) addresses a wide range of commonly encountered control problems [25]. At the same time, this formulation does not cater for more sophisticated constraints such as rate limits on control moves and state constraints. The effect of allowing these more general constraints is that the resulting optimisation problem becomes slightly more difficult to solve. This will be discussed in more detail in Section 3. 3

Importantly, this MPC approach requires knowledge of the future system state xt+k , for k = 1, . . . , N . It must therefore be predicted based on information available at the present time t, and these predictions substituted for xt+k . We will denote this prediction as xbt+k|t . When k = 1, the predictor employed here will be of the linear observer form xbt+1|t = Axbt|t−1 + Bu?t|t−1 + Let , et = yt − C xbt|t−1 − Du?t|t−1 .

(7) (8)

The observer gain matrix L is user specified. In Section 5, it will be selected as the steady state Kalman filter gain. This is a common choice due to the predicted state then being of minimum variance. This minimum variance property can be preserved for k > 1 by employing the predictor xbt+k|t = Axbt+k−1|t + But+k|t ,

for k > 1

(9)

which is “initialised” for k = 2 by using the predictor (7),(8) in the right hand side of (9). In what follows in section 5.4, a prediction ybt+k|t of future system responses will also be required. This can be derived from the predicted state in the obvious fashion ybt+k|t = C xbt+k|t + Dut+k|t .

3

(10)

Proposed Interior-Point Solution

The MPC approach just described requires the on-line solution of the constrained optimisation problem (3). Via the cost specification (5), this is in the form of a strictly convex quadratic program that may be effectively solved using standard algorithms [29]. The two dominant standard approaches in this context are so-called “active-set” and “interiorpoint” methods. The authors have considered the active-set method in related work [10, 25, 26], but in this paper we employ an interior-point technique (see e.g. [31]) to solve the quadratic program in (3). The main reason for choosing an interior-point method in this paper is that each step involves the solution of a linear system of equations with constant dimension. This is in contrast to an active-set solution where a sequence of linear systems are solved and whose corresponding size is related to the number of active constraints. The latter feature of an active-set method complicates the FPGA implementation since a range of matrix and vector sizes must be internally tracked [26]. The interior-point method used here employs a primal logarithmic-barrier approach since it is particularly well suited to the MPC situation considered here. Importantly, for the simply bounded QP problem in (3) it is straighforward to compute an initial primal feasible point, which reduces algorithm complexity. A similar approach is presented in [23] where an inverse barrier is employed instead of the logarithmic barrier used here. From a theoretical perspective, the inverse barrier is not known to lead to a polynomial-time algorithm, whereas the logarithmic-barrier is [17]. If more general constraints are allowed, such as state constraints, then the primal barrier method may require a separate stage that computes an initial primal feasible point. This 4

is not a problem if soft constraints are used for state constraints (see e.g. [20]) since it is always possible to find an iniital primal feasible point. In [8] they adopt a primaldual interior-point method, which can alleviate the need of this separate stage and offers improved numerical robustness. A primal barrier method for solving problem (3) transforms the problem into a class of related problems that do not have explicit constraints. Rather, the constraints are represented via a smooth “barrier” function B(·). This resulting class of problems is typically parametrized by the positive scalar µ as V (ut , xt+1 , µ) , uTt Hut + uTt f (xt+1 ) + µB(ut ),

(11)

with the barrier B(·) of the logarithmic form [5] B(ut ) , −

N m X

ln(bu (i) − ut (i)) + ln(ut (i) − b` (i))

i=1

and with ut (i), b` (i) and bu (i) being the i’th elements of the respective vectors. The utility of writing the problem in the form of (11) is that it becomes a smooth problem that is directly amenable to Newton’s method. Importantly, in the limit as µ → 0 the solution to (11) converges to the solution of (5) (see e.g. [5]). Along this line, the application of Newton’s method to (11) results in the following update for the control action ut ← ut − αH−1 g

(12)

where g and H are respectively the gradient vector and Hessian matrix of the cost function V with respect to ut , and the damping factor α is a positive scalar that ensures a reduction in the cost function. More specifically, for the problem in (11), the i’th element of the gradient vector can be represented as µ µ g(i) = H(i, :)ut + f (i) + − (13) bu (i) − ut (i) ut (i) − b` (i) where H(i, :) refers to the i’th row of H. The Hessian matrix can be expressed as H = H + µD

(14)

where D is a diagonal matrix whose i’th diagonal element is given by D(i) =

1 1 + . 2 (bu (i) − ut (i)) (ut (i) − b` (i))2

(15)

Crucially, the above gradient and Hessian expressions are only true when the variables ut are strictly primal feasible (hence the name primal barrier method), i.e. when they satisfy b` < ut < bu . However, provided that bu (i) 6= b` (i) for all i, it is always possible to find an initial ut that satisfies this, for example ut (i) =

bu (i) − b` (i) . 2 5

(16)

Algorithm 1 Interior-Point Solution Given an initial value for µ, a final value µ , and a value ζ ∈ (0, 1), perform the following steps: 1: Initialise the control action via (16). 2: while µ > µ do 3: Compute the search direction ρ = H−1 g via (13), (14) and (15). 4: Find a scalar α such that b` < ut − αρ < bu . 5: Update the control action via ut ← ut − αρ. 6: Reduce the barrier weighting according to µ ← ζµ. 7: end while Moreover, during the update of the control action in (12), the scalar α can be chosen to maintain strict feasibility. These ideas are made concrete in the following algorithm. The most computationally demanding step in Algorithm 1 involves the solution of ρ = H−1 g, which typically requires O(N 3 m3 ) floating point operations on a serial processor. There are many ways to compute this, but the approach considered here is to employ a conjugate gradient method (see e.g. Section 10 in [7]). Other researchers have also considered this approach within the MPC context [19]. Our rationale for using this approach is that it is an iterative method whose numerical operations are relatively simple. In some cases, it is acceptable to stop the method after just a few iterations. This reduces the overall complexity while often delivering good approximate solutions [19]. To be clear, we are interested in using a conjugate gradient method for solving Hρ = g, which only affects line 3 in Algorithm 1. These ideas are made more concrete in the following full algorithm description.

4

Custom Architecture Implementation

One option for experimenting with and evaluating the preceding MPC approach would be to implement Algorithm 2 on a standard architecture floating point processor [25]. This has the advantage of straightforward implementation via coding in a high level language such as C. There are associated disadvantages in that the required processors are relatively power hungry and result in hardware with a non-trivial physical footprint. This makes MPC implementation in small, portable, battery powered and high volume applications problematic. Motivated by this, the work here investigates a custom architecture design based on a hardware description language (HDL) specification for FPGA implementation. The rationale for this choice is that: 1. It allows the implementation of custom refinements specifically tailored to the MPC control design such as parallelism, pipelining and custom numerical representations;

6

Algorithm 2 Final Interior-Point Algorithm Given a value ν ∈ (0, 1), positive integers K and L, a positive value  > 0 and an initial value for µ > 0 perform the following steps: 1: Initialise the control action via ut (i) = (bu (i) − b` (i))/2 for i = 1, . . . , N m. 2: for j = 1 : K do 3: Compute an initial estimate of ρ by setting ρ(i) = g(i)/(H(i, i) + µD(i)) for i = 1, . . . , N m. 4: Compute the residual r = g − Hρ − µDρ. 5: Set σ = rT r. 6: Set k = 0 and β = 0. 7: while k < L and σ >  do 8: p ← r + βp. 9: w = Hp + µDp. 10: γ = σ/pT w. 11: ρ ← ρ + γp. 12: r ← r − γp. 13: η = σ. 14: σ = rT r. 15: β = σ/η. 16: k ← k + 1. 17: end while 18: for i = 1 : N m do 19: if ρ(i) < ut (i) − bu (i) then 20: ρ(i) = ν(ut (i) − bu (i)). 21: else if ρ(i) > b` (i) − ut (i) then 22: ρ(i) = ν(ut (i) − b` (i)). 23: end if 24: end for 25: Update the control action via ut ← ut − αρ. 26: Update the barrier weighting via µ ← ζµ. 27: end for 2. Via this, a circuit of minimum size and power dissipation while still maintaining speed can be achieved; 3. Modern embedded system designs are increasingly FPGA based, and the resulting HDL specification can then be deployed as a “core” in existing hardware with little or no additional cost or hardware footprint; 4. In applications where power consumption, size, or low price in high volume are a priority, this core can form the basis for a more refined application specific integrated circuit (ASIC) design.

7

With this in mind, this section provides details of our custom architecture design, which in broad terms is illustrated diagrammatically in Figure 1. This illustrates a logical flow

Stratix III FPGA

Observer and linear term calc.

yt Control action to amp (0 - 2.5V)

f (� xt+1|t ) ADC Board

Interface logic

Piezo voltage (0 - 5V)

Interior-point QP solver

u�t+1|t

Figure 1: Top level MPC circuit design wherein at the beginning of a sample interval t, the FPGA obtains new measurements of the system output yt from the A/D and converts them into a custom floating point number format. The state observer then uses these measurements to predict the state xbt+1|t and compute the linear term f (xbt+1|t ) = Φxbt+1|t . Next, the QP is solved to produce a new control action u?t+1|t . Finally, this control action is converted into the correct format for the D/A circuit, which is loaded with the new value at the beginning of the next sample interval. Within this broad structure, the implementation of the sub-blocks is detailed in what follows.

4.1

Interior-Point Implementation

The QP solution via an interior point approach defined in Algorithm 2 is the heart of the implementation, and also the most complex component. A graphical illustration of the 8

design profiled here is presented in Figure 2. To explain it, assume that f (xbt|t−1 ) has been computed by the observer and is available. The QP solver begins operation by taking this value and calculating the associated cost Hessian and gradient. This is done using a parallel and fully pipelined architecture that a column of the Hessian and an element of the gradient are issued at every cycle. The gradient vector calculation involves an inner product, two divides and five additions for each element of the vector. The Hessian calculation requires four divides, and five additions, for each column of the matrix. An important point is that the H matrix is stored in a way that allows access to an entire column in one clock cycle. Further explanation of this design involves the details of how various arithmetic operations are implemented, which will now be explained. 4.1.1

Custom Floating-Point Format and Arithmetic

A fundamental feature of the design is the use of a custom numerical format which via experimentation is tailored to the needs of the control implementation. On the one hand, using a custom floating-point format allows the design of highly optimised floating-point arithmetic units, which will minimise circuit size and maximise speed without unduly compromising control performance. On the other hand, this approach involves significant overhead compared with using a standard floating-point library that, in hindsight, is difficult to justify. To explain the custom floating-point format, a given quantity z is represented according to z = (−1)s × m × 2x (17) where s is the sign bit, m is the nm -bit unsigned fixed-point mantissa that has an explicit 1-bit of integer precision and nm − 1 bits of fractional precision. x is a signed, unbiased, 2’s complement, nx -bit exponent. The explicit one in the mantissa is used as a zero flag. This format is then employed, with customised choices for nm and nx , within the following arithmetic circuits Multiplier The floating-point multiply operation is implemented in one clock cycle. The first part of the operation involves multiplying the mantissas of the operands in parallel with the addition of the exponents. The result of the multiply is twice the length of the operand mantissas, with 2 integer bits and 2 × nm − 2 fractional bits. Since the mantissas have the range [1, 2) the result of the multiply has range [1,4) and requires possibly a shift of one to the right, with the new exponent adjusted accordingly. The double length result is then truncated and packed into the result. Adder The floating-point addition operation is slightly more difficult than the multiply due to the alignment and normalization stages. It is implemented in one cycle, the initial part of the operation is determining which operand has the larger exponent. The mantissa of the operand with the smaller exponent is shifted to the right by the difference in the exponents. 9

u

l

mu_init Mux Initialise x Register

8

x_init x*

mu

Mux x

FLP Divider mu*

H f

Calc p, rhs, d

rhs

p

Mux

Calc rho

d

Inner Product rho

r

Update p rhold

Demux p

Check p

p

Update x

Figure 2: Top level interior-point circuit design The aligned mantissas are added together in parallel and at the same time they are fed into a lead-zero anticipator to determine the shift amount required to normalise the result. The result of the addition is shifted so there is a leading one in the 1-bit integer spot of the mantissa and the larger exponent of the operands is adjusted accordingly and both are 10

stored in the result. Divider The floating-point divide operation is implement in two clock cycles. The first cycle computes the inverse mantissa of the divisor by using the mantissa as an index to a lookup-table (LUT). This is feasible in this case because the reduced precision format allows reasonable size LUTs. The second cycle multiplies the inverse divisor mantissa by the mantissa of the dividend, in parallel with the subtraction of the divisor exponent from the dividend exponent. The normalization stage is identical to that of the floating-point multiply. Multiply Accumulate This operation is implemented in two clock cycles and computes a + (b × c). The first cycle is identical to the general floating-point multiply without the normalization and truncation of the result and gives the result of b × c. The results of this multiplier are stored in registers along with the a operand so that the operation is pipelined and new operands can start the multiply stage. The second stage functions the same way as a general floating-point add, only using the double length operands. Fused Multiply Add This operation is implemented in two clock cycles and computes (a × b) + (c × d). It is basically the same as a multiply accumulate only the first stage consists of two multiplies in parallel. Again this operation is pipelined by registers placed between the addition and multiply stages, enabling a result to be obtained every clock cycle once the pipeline has been filled. Inner Product A vital part of the interior-point algorithm involves inner products of various vectors. To ensure that these operations complete in a timely manner, the inner product circuit is implemented in parallel wherever feasible. In particular, Figure 3 shows the inner product circuit used in the design for bi = Ai x, where Ai is a row vector and x is a column vector. The dimension of the inner product block is sixteen, due to both the state dimension and prediction horizon. This means that the circuit consists of 5 pipelined stages. It then takes 5 cycles to fill the pipeline, after which a result will be issued every cycle. This structure naturally allows for matrix-vector multiplications by adding a pipeline around the inner product. Each clock cycle allows a new row of the matrix A to be passed into the inner product circuit and each cycle after the circuit delay, it will provide bi . This feature was heavily exploited in the design. Due to this pipeline an entire matrix vector operation can be produced in 21 cycles. 1. The first stage of the pipeline consists of a bank of parallel multipliers for calculating the product of the mantissas and adders for summing the exponents. The results are then clocked into parallel registers. 11

2. The second stage of the inner product involves finding the maximum of the new exponents and shifting the double length multiply result mantissas so they are aligned. The shifted mantissas are then added in pairs, resulting in 8 new mantissas. Again the results are clocked into parallel registers. 3. The third stage adds the 8 mantissas in two sets of 4 so that there are two left. The results are then clocked into parallel registers. 4. The fourth stage adds the final 2 mantissas together. The results are then clocked into parallel registers. 5. The fifth stage normalises the result, adjusts the maximum exponent which was calculated in the second stage and store them in the result. Ai,1 x1

Ai,2 x2

FLP Multiplier

FLP Multiplier

Register

Register

FLP Adder

Ai,n-1 xn-1 Ai,n xn

...

...

FLP Multiplier

Register

Register

FLP Adder

Register

Register

. . .

FLP Multiplier

. . .

...

Register

Register

FLP Adder

Register

bi

Figure 3: Inner product circuit. In the computation of f (xbt+1|t ) a modified inner product circuit is used that has an extra stage that allows for the accumulation of previous results. This allows inner products of dimension greater than 16 to be calculated using the same hardware with a slight time penalty.

12

4.2

Observer Implementation

Substituting (8) into (7) in the observer equations delivers xbt+1|t = Axbt|t−1 + Bu?t|t−1 + Lyt − LC xbt|t−1 − LDu?t|t−1 = (A − LC)xbt|t−1 + (B − LD)u?t|t−1 + Lyt

(18) (19)

which is the matrix-vector computation 

xbt+1|t =

h



xbt|t−1  ? (A − LC) (B − LD) L  ut|t−1   yt i

(20)

and this consists of many repeated inner products. Through analysis and experimentation, it was determined that the observer calculations required a much larger precision than that of the rest of the circuit. This would mean a longer propagation delay through all floatingpoint operations, and in turn this would mean that the system would have to be clocked slower. However the observer can be decoupled from the f calculation as is shown in Section 4.3. This means that the larger precision can be used, and the hardware requirements can also be reduced by calculating the estimated state in a more serial method, because the timing requirements can be relaxed. Taking these requirements into account, the inner product hardware block was not used, instead the main components of this block are a multiply accumulate unit, a ROM containing the matrix and a controller running the inner and outer loop. At the beginning of each sample the observer is enabled, and it runs in the background while the rest of the algorithm runs to calculate a new estimated state. Within the inner loop an element from a row of the matrix is read and multiplied with the corresponding element of the state vector, the result is accumulated with the running total of previous elements of the row and vector. The outer loop of the controller increments the index of the row of the matrix that is being processed. The observer is clocked at 40MHz, and takes 1500 cycles (37.5 µs) to compute the next state.

4.3

Calculation of f (xb t+1|t )

Substituting (19) into (6) yields 

f (xbt+1|t ) =

h



xbt|t−1  ? Φ(A − LC) Φ(B − LD) ΦL  ut|t−1   yt i

(21)

which again is a matrix-vector operation. However, since the calculation of f (xbt+1|t ) is in the critical path of the circuit it needs to be computed as fast as possible. To this end, a pipelined inner product hardware block is used. The modified inner product block that accumulates a result is used. It takes 6 cycles to fill the pipeline and a result is issued every 2 cycles, so it takes 38 cycles to calculate f . Due to the smaller precision used, this circuit is clocked at 70MHz. 13

5

Experimental Results

The performance of the custom architecture MPC design just summarised is illustrated in this section via the control of a lightly damped resonant structure.

5.1

Apparatus Description and Control Objective

The experimental setup comprises a uniform aluminium beam, clamped at one end, and free at the other, as illustrated in Figure 4. It is a representation of many systems encountered in the field of active vibration control [6]. The beam is 970mm in length, 5mm in thickness, and 25mm in width. Control and disturbance forces may be applied via Physik Instrumente PIC151 piezoelectric ceramic transducers which are 70mm in length, 25mm in width, and 0.25mm in thickness. The transducer centers are mounted 105mm (control) and 195mm (disturbance) from the clamped base. They are activated by 200V PDL200 high voltage amplifiers. These induce lateral control ut and disturbance dt beam displacements that are proportional to the applied voltage. The resulting displacement yt that occurs 105mm from the base is proportional to the mechanical strain at that point, and this is measured by buffering and acquiring the induced open-circuit voltage of a further piezoelectric transducer mounted there, on the other side of the beam to the actuation transducers. The control objective is a disturbance rejection one. Namely, the control action ut is to be used to minimize displacement yt resulting from the disturbance dt . The supply rail limits of the voltage amplifiers imply hard constraints on the control authority ut , which should be respected in the control design.

HV Amp

u

d

HV Amp

y

Buffer

Figure 4: Left: Photograph of experimental apparatus; Right: Plan view schematic of the experimental apparatus.

5.2

Apparatus Model

The MPC strategy considered in this paper is dependent on a model for the system to be controlled. For the flexible beam apparatus just described, this model may be obtained 14

by first principles physical laws [6]. The success of this approach depends on very careful and accurate physical measurement. Hence here we elect to obtain the model via system identification techniques. For this purpose, the frequency response between the actuations ut , dt and displacement yt was measured at 3201 non-equidistant points in the range 1–500Hz. These were used together with the subspace-based identification method developed in [15] to provide an initial n = 14’th order state-space system estimate of the form ξt+1 = Ap ξt + Bp ut + Bd dt , yt = Cp ξt + Dp ut + Dd dt + et "

dt et

#

"

∼N

0 0

# "

,

σd2 0 0 σe2

(22) (23)

#!

(24)

where the subscript “p” denotes “plant” and Ap ∈ R14×14 . This model is then used as an initialisation that is further refined to deliver a final maximum-likelihood estimate using the techniques developed in [30]. This dual stage approach was implemented using the freely available system identification toolbox [27, 28]. The frequency response of the resulting model from ut to yt is shown as the solid line in Figure 5, which can be compared to the measured frequency response shown as a dash dot line. The close agreement suggests accurate modeling. Similar comments apply to the modeling from disturbance dt to yt illustrated in Figure 6. This model is now augmented due to some practical considerations. The first is that the buffer amplifier connected to the piezoelectric sensor can induce a constant offset of the displacement measurement yt . Ignoring this issue will result in a MPC solution with artificially high DC gain. We address this by augmenting the estimated dynamics model (22),(23) so as to induce integral action in the MPC strategy. This is achieved by adding a new state ζt as follows "

ξt+1 ζt+1

#

"

=

Ap 0 0 1 "

+ yt =

h

#"

Bd 0 0 1

Cp 1

i

"

ξt ζt

#

#"

dt µt

ξt ζt

"

+

Bp 0

#

ut

#

,

(25)

#

+ Dp ut + Dd dt + et

(26)

where µt is the noise associated with the unknown DC component from the buffer, which is modeled as µt ∼ N (0, σµ2 ). (27) The second important modeling consideration for this application is that of high frequency modes not captured by the description (25),(26). Any high frequency control action that excites such modes will have a devastating effect on control performance. This can be addressed by penalizing control action at any frequency above that of the highest modeled 15

Estimated system: input 1 to output 1 30

20

10

Magnitude (dB)

0

−10

−20

−30

−40 System response Estimated SS model via EM −50

1

2

10

10 Frequency (Hz)

Figure 5: Magnitude (dB) ut to yt frequency response of identified beam model (solid line) versus measured frequency response (dash dot line) modes shown in Figure 5 and 6. This is achieved via a standard technique involving augmenting the model (25),(26) to deliver a new signal uhf t , which is a high-pass filtered hf version of ut . The purpose of this is that ut may then be included as part of the penalty term V for the MPC action (3). In this paper, a state space model ηt+1 = Af ηt + Bf ut uhf t

= Cf ηt + Df ut

(28) (29)

was computed to correspond to an eighth order Butterworth high-pass filter with 3dB cut-off point at 500Hz. Adding this to the model (22), (23) delivers a final augmented model xt+1 = Axt + But + wt , zt = Cxt + Dut + vt 16

(30) (31)

Estimated system: input 2 to output 1 20

10

0

Magnitude (dB)

−10

−20

−30

−40

−50

−60

−70 System response Estimated SS model via EM −80

1

2

10

10 Frequency (Hz)

Figure 6: Magnitude (dB) dt to yt frequency response of identified beam model (solid line) versus measured frequency response (dash dot line) where  

ξt

xt =

"

ηt





Bd 0 " # dt  wt =  0 1 ,  µt 0 0 

"

(32) #" #

Dd 1 vt = 0 0





Ap 0 0  A=  0 1 0 , 0 0 Af "

#

y zt = hft , ut

   ζt  ,

#

"

(34)

#

Dp D= . Df 17

(33)



Bp  B=  0 , Bf

C 1 0 C= p , 0 0 Cf

dt , et

(35)

The above model has 23-states comprised of 14 for the beam dynamics, 8 for the high-pass filter and 1 for the DC component. Finally, it is important to address the fact that due to the finite 200V amplifier supply rails, this linear model is only valid for input amplitudes satisfying |ut | ≤ 0.5Volts. This is modeled via the constraint for Γut+1 ≤ b(xt+1 ) used in the MPC formulation (3) with the choices " # h i IN Γ= , b(·) = 0.5 12N , b0 = 0, Ψ = 0. (36) −IN In the above, IN denotes the N × N identity matrix and 12N is used to denote a 2N × 1 column vector with all entries equal to 1.

5.3

Observer Design

An essential use for the model just derived is the computation of an estimate xbt+1|t of the system state xt+1 via an observer of the form (7). This involves choosing the observer gain L in (7), and in this paper we use the steady state Kalman gain. This approach depends on the state and measurement noise in the model (30),(31) obeying the Gaussian distribution "

#

"

Q o So wt ∼N 0, SoT Ro vt

#!

.

(37)

In the above, the subscript “o” denotes observer. Considering the disturbance models (24),(27) the covariance matrices in (37) are given by 

T





# Bd 0 Bd 0 "    σd 0  Qo =  0 1  0 1 , 0 σµ 0 0 0 0 

(38)



#" #T Bd 0 "   σd 0 Dd 1 So =  0 1 , 0 0 0 0 0 0 "

#"

Dd 1 Ro = 0 0

σd 0 0 σe

#"

(39)

#T

Dd 1 0 0

.

(40)

The steady state Kalman gain matrix L is then given as 

L = AΣC T + So



CΣC T + Ro

−1

,

(41)

where Σ is computed as the solution to the discrete algebraic Riccati equation (DARE) 



Σ = Qo + AΣAT − L CΣC T + Ro LT .

18

(42)

5.4

Quadratic Program Formulation

Central to the MPC strategy is the formulation of the cost function V employed in (4). In this application, it is chosen as V (ut , xt+1 ) =

N  X

T zt+k Qzt+k + uTt+k|t Rut+k|t



k=1

+ xTt+N +1 P xt+N +1 .

(43)

In this expression, the symmetric and positive definite matrices P, Q and R can be considered as controller tuning parameters that will be further commented on shortly. The penalty term involving zt+k is included in (43) for two reasons. To explain them, recall that zt+k is defined in (32) to be a two element vector composed of the beam displacement yt+k and a high pass filtered version uhf t+k|t of the control action. Therefore, the penalty on zt+k involves a penalty on the yt+k component, which is chosen to reflect that we are seeking to solve a disturbance rejection problem - namely that beam displacement yt be as little affected as possible by an external disturbance dt . Similarly, the penalty on the uhf t+k|t in zt+k is imposed to limit the bandwidth of the control action in the interests of not exciting unmodeled high frequency modes. The penalty term involving ut+k|t in (43) is included to allow tuning of the control input energy. Finally, the penalty term involving xt+N +1 is included here to ensure nominal stability by choosing the matrix P as the positive definite solution to the following DARE [14] 



P = C T QC + AT P A − K B T P B + R + DT QD K T , 

K = − AT P B + C T QD



B T P B + R + DT QD

−1

.

While this type (43) of cost formulation is of a common sort in MPC applications, it involves future responses yt+k and future states xt+k , which are unknown at time t when the control ut+1|t is to be computed. Here we employ a standard approach to this problem by using predicted values ybt+k|t and xbt+k|t in place of yt+k and xt+k . These may be computed using the observer mentioned in the previous section, and defined via (7)-(9). In turn, due to the linear structure of this observer, it may be expressed via the following linear algebra formulation zt+1 = Λxbt+1|t + Πut , (44) where





              zb  t+N |t  xbt+N +1|t

       



zt+1 ,

zbt+1|t zbt+2|t .. .

Λ,

19

C CA .. .



      N −1  CA 

AN

(45)

and



Π,



D CB CAB .. .

         

D CB D ...

CAN −2 B · · · AN −1 B ···

· · · CB D · · · AB B

     .    

(46)

In this case, the cost function V defined in (43) with predicted values substituted as appropriate can be expressed in the required form (5),(6) as V (ut , xbt+1|t ) = uTt Hut + uTt f (xbt+1|t ) + c

(47)

where c is a constant term, H and f are given by ¯ + R, ¯ H , ΠT QΠ

f (xbt+1|t ) , 2Φxbt+1|t ,

¯ Φ , ΠT QΛ

(48)

¯ R ¯ are block diagonal matrices given by and Q,  

¯,  Q  





Q ..

  ,  

. Q



¯,  R 



R R ..



. R

P

5.5

  .  

Achieved Control Performance

We now profile the performance of the custom architecture MPC controller described in Sections 3 and 4 on the apparatus described in section 5.1 using the model and control design just presented in sections 5.2 to 5.4. More specifically, we used the following choices for the state and measurement noise covariances employed in (38)–(40) σd = 10−6 ,

σµ = 1,

σe = 10−12 ,

(49)

and the MPC cost weighting matrices Q and R from (43) were chosen as "

#

1 0 Q= , 0 100

R = 1.

(50)

The disturbance dt was chosen as a periodic linear swept sine-wave signal (chirp) with a period of 20 seconds starting at 1Hz and finishing at 500Hz. The amplitude of dt was set at 2.0 Volts, which was chosen to ensure that the input ut would encounter the constraint limits ±0.5 Volts. The custom architecture design was implemented on an Altera Stratix III EPSL150F115C2 FPGA, interfaced to an A/D, D/A board in order to measure system responses and to output control action. The control sample rate was set at 2 kHz which is approximately five 20

times the highest frequency mode we are seeking to control. The prediction horizon was selected as N = 16 (which corresponds to an 8ms horizon in realtime). This value was chosen because the hardware was easier to design for prediction horizons that are a factor of 2. At the same time, early synthesis results indicated that a prediction horizon of N = 32 resulted in a circuit that would violate the chosen 500µs control interval. Prediction horizons between these two values can be accommodated, but for the purposes of verifying the controller design, this was not deemed to be important. For the results show here, the custom floating-point number format was selected to have nm = 9 mantissa bits and nx = 8 exponent bits. This choice was based on simulation studies using a custom designed bit-accurate software library to emulate Algorithm 2. Using this software, we were able to validate the efficacy of the algorithm at the chosen precision. It is likely that even fewer mantissa bits could be used, be this was not tested on the hardware. At the same time, the objective here is to validate the VHDL design rather than determine the tradeoff between floating-point precision and controller performance. For this choice, the effective machine precision is approximately equal to 2 × 10−3 , which is the value employed for  in Algorithm 2. The remaining parameter values for Algorithm 2 were selected as follows. The initial µ value in (11) was chosen as µ = 128 and the scaling for µ in step 26 of Algorithm 2 was selected as ζ = 1/2. This means that after K = 16 iterations of the interior-point method, µ = 1.53 × 10−5 . The multiplier ν in steps 20 and 22 of Algorithm 2 for retracting from the constraint boundaries was chosen as ν = 0.95 to ensure that the iterates remain strictly primal feasible. Finally the number of allowed iterations L used in the conjugate gradient method was selected at L = N m = 16. Under the assumption of exact arithmetic, this is theoretically sufficient to provide an exact solution. However, the effect of rounding errors introduced by employing finite precision floating-point arithmetic can be dramatic and this is an area that requires further attention. With these choices made, the hardware requirements of the resulting circuit design are shown in Table 1. The final circuit was clocked at 70MHz and the number of clock cycles required per control interval is summarized in Table 2. From this table it can be seen that the control action is computed well inside the 500µs limit required to achieve a 2kHz sample interval. While these numbers are important for the current application, it is difficult to comment on the general computational requirements since they depend so intimately on the compiler efficiency and the control problem being solved. Combinatorial ALUTs 19719 / 113600 (17%)

Memory ALUTs Total Logic Utilization 768 / 56800 (1%) 22%

Table 1: FPGA logic usage for the chosen number format with values nm = 9 mantissa bits and nx = 8 exponent bits. Figures 7 and 8 show different sections of the open (grey outer envelope) and closedloop (blue inner envelope) responses. Figure 7 shows one full scan comprising 20 seconds of data while Figure 8 shows only 2.0 seconds in order to more clearly show the input signal. 21

Operation Number of Clock Cycles A/D conversion 6 f calculation 44 Update state 1495 Solve QP 10340 Pre-load D/A 4 Total 10390

In Critical Path Yes Yes No Yes Yes –

Time 85.7 ns 628.6 ns 21.4 µs 147.7 µs 57.1 ns 148.5 µs

Table 2: Shows the operation being performed (in the order they are computed on the FPGA) and the number of required cycles to complete it. Note that the oberver calculations are performed in parallel and are therefore not part of the critical path.

It can be seen from the top plots in each Figure that the beam vibrations due to the chirp disturbance have been significantly reduced. The bottom plots in each Figure illustrate that the controller is “hitting” constraint limits in order to achieve the reduced vibrations. This demonstrates the efficacy of the custom architecture based MPC solution proposed in this paper.

6

Conclusion

This paper demonstrates the use of custom computing architectures for the purpose of implementing a model predictive control (MPC) algorithm. The approach proposed here is a full solution in that it incorporates A/D and D/A connectivity, a state observer, and an interior-point method for solving the MPC optimisation problem. Importantly, the design is capable of computing the required control action in less than 500µ seconds while still offering good control performance. Acknowledgement The authors would like to sincerely thank Dr. Andrew Fleming for his generous assistance in the building and commissioning of the experimental apparatus used in this paper.

References [1] Implementation of optimization-based control on a chip. Regular Session at the American Control Conference, June 14-16 2006. Minneapolis, Minnesota USA. [2] Model predictive control for fast nonlinear systems: existing approaches, challenges, and applications. 1-day pre-congress workshop at the 45th IEEE Conference on Decision and Control, December 12, 2006. San Diego, CA, USA. [3] Invited session: Hardware implementation of model predictive control. European Control Conference, Budapest, Hungary, August 2009. 22

Measured output (Volts)

2 1 0 −1 −2 0

2

4

6

8

0

2

4

6

8

10

12

14

16

18

10 12 Time (seconds)

14

16

18

1

Control action (Volts)

0.5

0

−0.5

−1

Figure 7: Comparison of open-loop versus closed-loop control. Top: measured output response where the grey line indicates open loop response. Bottom: control action for closedloop response with input limits shown as red dashed lines. [4] G. Constantinides. Tutorial paper: Parallel architectures for model predictive control. In Proceedings of the European Control Conference, Budapest, pages 138–143, 2009. [5] A. V. Fiacco and G. P. McCormick. Nonlinear Programming: Sequential Unconstrained Minimization Techniques. John Wiley & Sons Inc., New York, 1968. [6] C. Fuller, S. Elliott, and P. Nelson. Active Control of Vibration. Academic Press, 1996. [7] G. Golub and C. V. Loan. Matrix Computations: third edition. Johns Hopkins University Press, 1996. [8] J. L. Jerez, G. A. Constantinides, and E. C. Kerrigan. An FPGA Implementation of a Sparse Quadratic Programming Solver for Constrained Predictive Control. In FPGA ’11 Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays, page 2011. 23

Measured output (Volts)

2 1 0 −1 −2 10.6

10.8

11

11.2

11.4

11.6

11.8

12

12.2

12.4

10.6

10.8

11

11.2

11.4 11.6 Time (seconds)

11.8

12

12.2

12.4

1

Control action (Volts)

0.5

0

−0.5

−1

Figure 8: Zoomed in comparison of open-loop versus closed-loop control. Top: measured output response where the grey line indicates open loop response. Bottom: control action for closed-loop response with input limits shown as red dashed lines. [9] T. Johansen, W. Jackson, R. Schreiber, and P. Tondel. Hardware synthesis of explicit model predictive controllers. IEEE Transactions on Control System Technology, 15(1):191–197, January 2007. [10] G. Knagge, A. Wills, A. Mills, and B. Ninness. Asic and fpga implementation strategies for model predictive control. In Proceedings of the European Control Conference (ECC), August 2009. [11] M. Lau, S. Yue, K. Ling, and J. Maciejowski. A comparison of interior point and active set methods for fpga implementation of model predictive control. In Proc. European Control Conference, Budapest, August 2009. European Union Control Association. [12] K. Ling, S. Yue, and J. Maciejowski. A FPGA Implementation of Model Predictive Control. In Proceedings of the 2006 American Control Conference, pages 1930–1935, Minneapolis, Minnesota, USA, June 14-16 2006. 24

[13] J. Maciejowski. Predictive Control with Constraints. Pearson Education Limited, Harlow, Essex, 2002. [14] D. Mayne, J. Rawlings, C. Rao, and P. Scokaert. Constrained model predictive control:Stability and optimality. Automatica, 36(6):789–814, 2000. [15] T. McKelvey, H. Ak¸cay, and L. Ljung. Subspace-based multivariable system identification from frequency response data. IEEE Transactions on Automatic Control, 41:960–979, 1996. [16] M. Morari and J. Lee. Model predictive control:Past, present and future. Computers and Chemical Engineering, 23(4-5):667–682, 1999. [17] Y. Nesterov and A. Nemirovskii. Interior-point Polynomial Algorithms in Convex Programming. SIAM, Philadelphia, 1994. [18] S. J. Qin and T. Badgwell. A survey of industrial model predictive control technology. Control Engineering Practice, 11:733–764, 2003. [19] A. Roldao-Lopes, A. Shahzad, G. Constantinides, and E. Kerrigan. More flops or more precision? accuracy parameterizable linear equation solvers for model predictive control. In Field Programmable Custom Computing Machines, 2009. FCCM ’09. 17th IEEE Symposium on, pages 209 –216, apr. 2009. [20] P. O. M. Scokaert and J. B. Rawlings. Feasibility issues in linear model predictive control. AiChE Journal, 45(8):1649–1659, August 1999. [21] P. Tøndel, T. Johansen, and A. Bemporad. An algorithm for multi-parametric quadratic programming and explicit MPC solutions. Automatica, 39(3):489–497, March 2003. [22] K. Underwood. Fpgas vs. cpus: trends in peak floating-point performance. In FPGA ’04: Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays, pages 171–180, New York, NY, USA, 2004. ACM. [23] P. Vouzis, M. Kothare, L. Bleris, and M. Arnold. A system-on-a-chip implementation for embedded real-time model predictive control. IEEE Transactions on Control Systems Technology, 17(5):1006 –1017, sept. 2009. [24] Y. Wang and S. Boyd. Fast model predictive control using online optimization. IEEE Transactions on Control Systems Technology, 18(2):267–278, March 2010. [25] A. Wills, D. Bates, A. Fleming, B. Ninness, and S. Moheimani. Model predictive control applied to constraint handling in active noise and vibration control. IEEE Transactions on Control Systems Technology, 16(1):3–12, 2008.

25

[26] A. Wills, G. Knagge, and B. Ninness. Fast model predictive control via custom integrated circuit architecture. Submitted to IEEE Transactions on Control System Technology, 2010. [27] A. Wills, A. Mills, and B. Ninness. A matlab software environment for system identification. In Proceedings of the 15th IFAC Symposium on System Identification, St. Malo, France, 2009. [28] A. Wills and B. Ninness. UNIT - University of Newcastle Identification Toolbox. Webpage. http://sigpromu.org/idtoolbox/. [29] A. Wills and B. Ninness. QPC - Quadratic Programming in C. Webpage, 2011. http://sigpromu.org/quadprog/. [30] A. Wills, B. Ninness, and S. Gibson. Maximum likelihood estimation of state space models from frequency domain data. IEEE Transactions on Automatic Control, 54(1):19–33, 2009. [31] A. G. Wills and W. P. Heath. Barrier function based model predictive control. Automatica, 40(8):1415–1422, August 2004.

26

Suggest Documents