Numerical Methods for Optimization-based Model Validation
Ekaterina A. Kostina Institute for Aplplied Mathematics (IAM) Interdisciplinary Center for Scientific Computing (IWR) Heidelberg Collaboratory for Industrial Optimization (HCO)
SAMSI Optimization Workshop August 29 - September 2, 2016, Research Triangle Park, USA
What Models We Are Looking For? Why Do We Need These Models?
What Models We Are Looking For?
These? ... No!
What Models We Are Looking For?
Maybe this? ... Also no!
Photo by Manfred Werner-Tsui
Barbara Meier, studied mathematics, the winner of the second cycle of Germany’s Next Topmodel
What Models We Are Looking For? ... these also no!
Möbius band
Klein bottle
But The Models That Allow Simulation From Merriam-Webster’s Dictionary
Why Do We Need Simulation Models? E.g. the concept of “digital twins” (GE, Siemens, ...) GE Report Oct 4, 2015
every new product or a physical economical process has a digital twin which includes a collection of models and algorithms; which accompanies the process from the “origin” (modeling) through its life time, “growing” together with the process; is used among others to analyze data, predict malfunctioning and perform optimal operation. “The Wall Street Journal”, Jan 15, 2016: GE Digital CEO Says “Digital Twins” Will Optimize Machinery, Human Health
Another example: development and admission of drugs Necessary precondition: validated models with reliable parameter estimates
Life Cycle of the Mathematical Model
Outline
1
Parameter Estimation
2
Sensitivity Analysis of Parameter Estimates
3
Design of Optimal Experiments
4
Applications
First Step in Model Validation: Parameter Estimation
Constrained Parameter Estimation Problem determine parameters p and states y to minimize deviation of the model response M[y; p] from measurements η in a suitable norm (e.g. weighted l2 , l1 , hybrid) min y,p
||η − M[y; p]|| ,
such that model equations F[y; p] = 0 and additional “experimental constraints” are satisfied C[y; p] ≥ 0
Mathematical Model Classes for Dynamical Processes F[y; p] = 0 Differential-Algebraic Equations (DAE) x: differential variables z: algebraic variables, y = (x, z)
B(x, z, p)˙x
=
f (x, z, p, q, u)
p: unknown parameters (to be estimated by PE)
0
=
g(x, z, p, q, u)
q: design parameters (to be chosen in OED) u: controls (to be chosen OED)
Partial-Differential Equations (PDE)
ut − ∇(K∇u) = f (u, p) → Semidiscretization in space or Rothe method initial and boundary conditions!
Observation Model and Data
ηi
=
M(ti , ytrue (ti ), ptrue ) + εi at time points ti , i
= 1, ..., m1
Data from multiple experiments under varying conditions
M: nonlinear function of states (differential and algebraic) εi : measurement errors independent in this talk: εi ∼ N (0, σi2 )
Numerical Methods for Parameter Estimation in ODE/DAE/PDE Bock et al 1981, ..., K. et al 1997, ...
Direct “all-at-once” multiple shooting method Constrained Gauss Newton method Structure exploitation (multiple experiments, parameterization in time, spatial discretization of PDE) Efficient line search strategies Use of inexact Jacobians Robust parameter estimation based on l1 or Huber
After (Multiple Shooting) Discretization: Large-Scale l2 Parameter Estimation Problem 1 min kF1 (x)k22 , x 2
s.t.
F2 (x) = 0
Optimization variables: xT := (pT , sT ) ∈ Rn , with dicretization variables s η1 − M(t1 , y(t1 ; p, s), p) .. F1 (x) := Σ−1 ∈ Rm1 . ηm1 − M(tm1 , y(tm1 ; p, s), p)
Cost function:
wi , i = 1, ..., m1 , wi ∈ {0, 1} Σ−1 = diag σ i
Constraints:
F2 (x) :=
Jacobians:
J1 (x) :=
C(p, s)
Discretized BVP ∂F1 (x) ∂x
∈ Rm2
and J2 (x) :=
∂F2 (x) ∂x
Solution Approach: Generalized Gauss-Newton Solve iteratively, given initial guess x0 xk+1 = xk + [tk ]∆xk ∆xk solves linearized problem min
∆x∈Ωk
s.t.
∆xk
=
1 ||F1 (xk ) + J1 (xk )∆x||22 2 F2 (xk ) + J2 (xk )∆x = 0
−J + (xk )F(xk ) J + is a generalized inverse: J + JJ + = J + J1 F1 J= , F= J2 F2
Parameter Estimation and Sensitivity Analysis Dynamical Process
Measurements with Errors
Mathematical Model
Parameter Estimation Problem
Estimates x∗
Qualitative Assessment of x∗
Quantitative Output
Qualitative Output?
Likelihood Ratio Confidence Regions
Dlr (α) = {x | F2 (x) = 0, kF1 (x)k22 − kF1 (x∗ )k22 ≤ γm2 (α)}
Beale 60, Draper and Smith 81, Pazman 93
γm2 (α) is (1 − α)-quantile of χ2 -distribution with m := n − m2 degrees of freedom
Advantage High approximation quality Disadvantage Very difficult to compute
Linearized Confidence Regions
Dlin (α) = {x | J2 (x∗ )(x − x∗ ) = 0, kJ1 (x∗ )(x − x∗ )k22 ≤ γm2 (α)}
Beale 60, Draper and Smith 81, Pazman 93
Advantage Can be computed very efficiently
Another View: Parameter Sensitivity w.r.t. Measurement Errors Introduce an error weight τ ∈ [0, 1]: min kF1 (x)k22
min kF1 (x, τ )k22
s.t. F2 (x) = 0
s.t. F2 (x) = 0
x
x
where (M(t1 ) + τ 1 ) − M(t1 , y(t1 ; s), p) .. F1 (x, τ ) = Σ−1 . (M(tm1 ) + τ m1 ) − M(tm1 , y(tm1 ; s), p)
Measurements M ⇒ estimates x(M) resp. x(0) Measurements M + τ ⇒ estimates x(M + τ ) resp. x(τ ) Question: what is the distribution of x(τ )?
Lemma Taylor series for the solution of the modified problem −1 + Σ • x(τ ) = x + τ J + O(τ 2 ) 0
Another View: Linearized Confidence Regions Dlin (α) = {x∗ + ∆x | ∆x = −J + (x∗ )
η , kηk22 ≤ γm2 (α)} 0
Dlin (α) = {x | J2 (x∗ )(x − x∗ ) = 0, kJ1 (x∗ )(x − x∗ )k22 ≤ γm2 (α)} Linear approximation of the covariance matrix C = E ∆x∆xT Im1 0 + ∗ = J (x ) J +T (x∗ ) 0 0m2 Dlin (α) is enclosed by a (minimal) box n
GL (α) ⊂
×[x
∗ i
− θi , xi∗ + θi ],
i=1
p where θi = Cii γm2 (α) and Cii are the diagonal elements of the covariance matrix Bock 1987
Next Step in Model Validation: Design of Optimal Experiments
Design of Optimal Experiments
Experimental Setup 1
Experimental Setup 2
B
B bad
B
good
B BN
Confidence Regions
Aim: choose optimal experimental conditions (design variables) to gain information about parameter estimates (to reduce parameter uncertainty) Design variables ξ T := qT , wT ∈ Ξ q ∈ Rnq (discretized) control functions and control parameters w ∈ {0, 1}K sampling
Design of Optimal Experiments: Problem Formulation
min φ(ξ; x) ξ
s. t. ml ≤ ψ(t, x, ξ) ≤ mu , t ∈ T 0 = χ(t, x, ξ), t ∈ T w ∈ {0, 1}M
ξ T = (qT , wT ) are design variables x is an estimate of true parameter and (discretized) states values xtrue satisfying the underlying parameter estimation problem
Design of Optimal Experiments: Cost Functionals Typical cost functionals φ is a functional of the covariance matrix Im1 0 C = J + (x) J +T (x) = Im1 0 0m2
0
J1T J1 J2
J2T 0
−1
Im1 0
A: φA (C) = 1n Trace(C) 1
D: φD (C) = det(K T CK) n mit K ∈ Rn×nk E: φE (C) = max{λ| λ is an eigenvalue of C}
OED problem is a complex non-standard nonlinear mixed-integer optimal control problem Cost function already involves 1st order derivatives of PE problem and is implicitly defined by a generalized inverse Integer decisions: sampling design
Optimal Experimental Design is a Complex Non-Standard Nonlinear Mixed-Integer Optimal Control Problem Theory originally developed for linear models e.g. Kiefer, Wolfowitz 50’s, Box, Draper 1987, Pukelsheim 1993
Algorithmic developments: Lohmann 1990, Körkel, 1998, ..., Bauer, Körkel et al, 1998, ..., K. et al 2004, ..., Software VPLAN
structure exploiting, direct, all-at-once optimization algorithms for DE models, based on multiple shooting efficient evaluation of higher order derivatives by structure exploiting “internal” and “algorithmic” differentiation optimal integer controls by exact lower bounds and intelligent approximation strategies robustification of optimization against uncertainties experimental design for model discrimination
What Improvements Can We Expect From Optimal Experimental Designs?
Application: Transport and Degradation of Xenobiotics in Soil
Transport and Degradation of Xenobiotics in Soil Dieses, Bock, Schlöder, Richter 2002
investigation of fate of pesticides in soil needs expensive lysimeter experiments for registration replacement by computer experiments requires validated models! here: optimal mini-lysimeter experiments optimal irrigation optimal solute application
Typical Column Outflow Experiment Water transport: =⇒ 3 unknown parameters n, α, Ks
C(ψ)
K(ψ) = Ks
∂ψ ∂ ∂ = K(ψ) (ψ − z) ∂t ∂z ∂z
(1 − (α|ψ|)n−1 (1 + (α|ψ|n )1/n−1 )2 (1 + (α|ψ|)n )
1−1/n 2
C(ψ) = α(n − 1)(θs − θr )(α|ψ|)n−1 (1 + (α|ψ|)n )1/n−2 z≥0
Initial condition: ψ(0, z) = 670, Upper boundary condition: q(t, 0) = −K(ψ(t, 0)) Lower boundary condition:
∂ψ(t, 0) −1 ∂z
control: water flux
∂ψ(t,Le ) ∂z
= 0,
t≥0
Typical Column Outflow Experiment Solute transport: =⇒ 3 unknown parameters k, b, Dm ∂ θDh (θ) ∂c ∂(qc) ∂(θc) ∂z = − − kc ∂t ∂z ∂z 0.0046ebθ + Dm θ
Dh (θ) =
Initial condition:
c(0, z) = 0,
z≥0
Upper boundary condition: −Dh (θ(t, 0))
∂c(t, 0) q(t, 0) q(t, 0) + c(t, 0) = c0 (t, 0) control: solute input ∂z θ(t, 0) θ(t, 0)
Lower boundary condition:
∂c(t,Le ) ∂z
= 0,
t≥0
Typical Column Outflow Experiment
=⇒ Coupled water and solute transport: 6 unknown parameters
Optimization of Experimental Conditions 2 boundary controls - piecewise constant water flux q(t, 0) changeble only every 24 h 0 ≤ q(t, 0) ≤ 0.6 solute input c0 (t, 0) changeble only every 24 h 50 ≤ c0 (t, 0) ≤ 200 sampling scheme: fixed measurement times 24 outflow measurements - 12 days, every 12h (σ = 0.01) 6 profile measurements (σ = 0.1) only at end of experiment evaluation of solute concentrations at 6 locations
Parameterized Expert Designs Input water flux q(t, 0)
Substance input concentration c0 (t, 0)
Quality of Expert Designs: Criterion Values and Standard Deviations
n α Ks k b Dm
Expert Design A
Expert Design B
A-Criterion = 3.8859
A-Criterion = 0.75
scaled 1.0 1.0 1.0 1.0 1.0 1.0
± ± ± ± ± ±
standard deviation 0.1604 1.5968 3.3752 0.0124 3.0460 0.2644
n α Ks k b Dm
scaled 1.0 1.0 1.0 1.0 1.0 1.0
± ± ± ± ± ±
standard deviation 0.0800 0.8600 1.6736 0.0083 0.9712 0.1033
Optimal Experimental Conditions Input water flux q(t, 0)
Substance input concentration c0 (t, 0)
Comparison: Optimal Design vs Expert Designs A and B
A-Criterion n α Ks k b Dm
1.0 1.0 1.0 1.0 1.0 1.0
Optimal design
Expert design A
Expert design B
0.0472
3.8859
0.750
± 0.0191 ± 0.2438 ± 0.4348 ± 0.0086 ± 0.1852 ± 0.0108
± 0.1604 ± 1.5968 ± 3.3751 ± 0.0124 ± 3.0460 ± 0.2644
± 0.0800 ± 0.8600 ± 1.6735 ± 0.0083 ± 0.9712 ± 0.1032
All parameters can be simultaneously estimated!
Ongoing Developments Rich and well advanced theory of optimal experimental design for linear models Very high relevance in industrial applications BASF: “We save up to 80% costs with these methods and get much more precise results!”
“Bildzeitung”, 17.06.2008
Actual Developments
Rich and well advanced theory of optimal experimental design for linear models Very high relevance in industrial applications BASF: “We save up to 80% costs with these methods and get much more precise results!” Efficient numerical methods for robustification of OED against uncertainties in parameter estimates parabolic PDE models on-line design of optimal experiments
Robustification of OED Based on Second Order Sensitivity Analysis
Robustification of OED Based on Second Order Sensitivity Analysis Consider again parametric PE problem depending on an error weight τ ∈ [0, 1]: min kF1 (x)k22
min kF1 (x, τ )k22
s.t. F2 (x) = 0
s.t. F2 (x) = 0
x
x
where
(M(t1 ) + τ 1 ) − M(t1 , y(t1 ; s), p) .. F1 (x, τ ) = Σ−1 . (M(tm1 ) + τ m1 ) − M(tm1 , y(tm1 ; s), p)
Lemma Taylor series for the solution of the modified problem −1 + Σ • x(τ ) = x + τ J + O(τ 2 ) 0 −1 Σ−1 Σ • x(τ ) = x + τ J + + τ 2 dJ + (I − JJ + ) − 12 J + (dJ)J + + O(τ 3 ) 0 0
Properties of Taylor Series Consider Taylor expansion at the parameter estimates x = x∗ + ∆x + 12 ∆x with η η ∆x = −J + , ∆x = −2 dJ + (I − JJ + ) − 12 J + (dJ)J + , kηk22 ≤ γm2 (α) 0 0 Then: k∆xk2 ≤ θ,
1 ∗ ∗
∆x + 1 ∆x ≤ θ + κ ˜ (x ) + ω ˜ (x )θ θ
2 2 2
where q
Trace(C(x∗ ))γm2 (α),
T ∗
+ ∗ ∂J(x∗ )
∗
, κ
C ∂J1 (x ) (I ⊗ F1 (x∗ )) ˜ (x ) := ω ˜ (x∗ ) := J (x )
∂x ∂x θ :=
Design of Optimal Experiments: “Q” Cost Functional K., Nattermann 2015, 2016
New cost functional ω ˜ (·) Trace(C(·)) Trace(C(·)), φQ (·) = Trace(C(·)) + κ ˜ (·) + 2
ω ˜ is a weighted Lipschitz constant of J (nonlinearity of the model functions) κ ˜ is a weighted Lipschitz constant of J + (compatibility of the data and the model, asymptotic convergence rate of Gauss-Newton) Numerical experiments show that Q-optimal designs are significantly insensitive to uncertainties in parameter estimates, the condition of the corresponding PE problem is improved
→
OED for PDE Models
Optimal Experimental Design for Parabolic PDE Models
Choose design controls ζ as solution of the optimal control problem tr(Cov[u, p, ξ]) + ckζk2L2 (Ω×[0,T])
min
ξ∈L2 (Ω×[0,T]),u∈W(0,T)
s.t.
∂t u + a(u, p, ξ)
= 0,
u(0)
= u0 ,
Cov
=
(Jred T Jred )−1 ,
+ additional constraints on controls
Optimal Experimental Design for Parabolic PDE Models: Numerical Framework K., Kriwet 2016
Parametrization of the underlying parameter estimation problem with multiple shooting approach Condensing technique in function space and efficient computation of the reduced ovariance matrix for parameters Formulation of the OED problem as a multi-stage optimal control problem Efficient computation of reduced gradients by an adjoint approach which requires np + 1 PDE solves Efficient computation of Hesse operator by Lagrange function and adjoint equations Positive definite approximation of Hesse operator which requires solution of decoupled np × np adjoint equations Software on the basis of DEAL-II
Application: Catalytic Plug Flow Reactor
A fluid flux through a packed bed of a porous catalyst, where an exothermic reaction takes place
Aim: to calibrate the mathematical model for the reactor
Catalytic Plug Flow Reactor
Mathematical Model Mass balance:
∂(uf ci ) ∂ci ∂ 2 ci =− + Deff ,z 2 + Deff ,r ∂t ∂z ∂z
1 ∂ci ∂ 2 ci + ∂r2 r ∂r
+
ρs,bed Ri ρf
Energy balance: ( ρf cp,f + (1 − )ρs,bed cp,s )
∂(uf ρf T) ∂T ∂2T = −cp,f + λeff ,z 2 + λeff ,r ∂t ∂z ∂z +ρs,bed
N X
∆HR,i Ri
i=1
Catalyst activity: ∂acat Ea,des = cA k0,des + A0,des exp − ∂t Rg T
∂2T 1 ∂T + ∂r2 r ∂r
The Optimal Experiment Design Problem
4 parameters to estimate: activation energies and frequency factors of the main and deactivation reaction 16 measurements of the temperature in the middle of the reactor at 5 different time points Bondary control: The temperature of the gas at the inlet of the reactor Tent , and the concentration cent λeff ,z
∂T(t, 0, r) = −uf ρf cp,f (Tent (t, r) − T(t, 0, r)) ∂z c(t, 0, r) = cent (t, r)
z=0 z=0
Intuitive vs Optimal Design Parameter k1 k2 k1,deac k2,deac trace
True Values −1.76416 −3.37131 −5.60809 −5.62849 -
Estimates −1.79401 ± 0.949 −3.83068 ± 0.232 −4.98277 ± 8.117 −5.61164 ± 1.160 7.18573
Uncertainty 52.9% 6.1% 162.9% 20.6% -
Estimates Based on Intuitive Experimental Design
Parameter k1 k2 k1,deac k2,deac trace
True Values −1.76416 −3.37131 −5.60809 −5.62849 -
Estimates −1.77712 ± 0.628 −3.84067 ± 0.228 −5.57985 ± 4.914 −5.61718 ± 0.977 2.692312
Estimates Based on the Optimal Experiment
Uncertainty 35.4% 5.9% 88.1% 17.4% -
Refinements: Online Optimal Experimental Design Idea: new measurements gained in experiments can be used to improve the parameter estimates p∗ re-optimize optimum experimental design over remaining experiment min
ϕ(C(ξ, p∗ ))
ξ∈Ω×[t,tN ]
Challenges: real-time iteration and multi-level iteration for optimum experimental design
Parameter k1 k2 k1,deac k2,deac trace
True Values −1.76416 −3.37131 −5.60809 −5.62849 -
Estimates −1.77712 ± 0.628 −3.84067 ± 0.228 −5.57985 ± 4.914 −5.61718 ± 0.977 2.692312
Uncertainty 35.4% 5.9% 88.1% 17.4% -
Estimates Based on Offline Optimal Experimental Design
Parameter k1 k2 k1,deac k2,deac trace
True Values −1.76416 −3.37131 −5.60809 −5.62849 -
Estimates −1.79392 ± 0.492 −3.8335 ± 0.203 −5.42928 ± 2.653 −5.59868 ± 0.872 0.851882
Uncertainty 27.4% 5.3% 48.9% 15.6% -
Estimates Based on Online Optimal Experimental Design
One More Application: Select Optimal Catalyst Geometry by Heat Transfer Coefficient – High Impact on the Value Chain
Estimation of Catalyst Properties
Cooperation with
silver catalyst on aluminum supports of different geometries ultimate aim: maximize selectivity of catalytic production process, e.g., ethene to ethene oxide
important criterion: high heat transfer coefficient to avoid hot spots or explosion!
Experimental setup
Cooperation with
standard heat conduction test reactor
special features: non-stationary heating, temperature range 20-350◦ C temperature gradient over 200◦ C
Experimental setup
Cooperation with
2D nonstationary PDE τ
∂ ∂T 1 ∂ ∂T ∂T ∂T = λz + rλr − Gcp ∂t ∂z ∂z r ∂r ∂r ∂z
≈ 1000 spatial grid points 2 controls: boundary temperature TW , mass current density G 4 parameters to estimate: λz , λr (heat conductivity) τ , cp (heat capacity)
measurements: outlet temperature
Simulation of Heat Conduction Cooperation with
Optimal Experimental Design
Cooperation with
four experiments intuitive expert design experimental time 16h
optimal design
with constant G = 0, 0.9, 1.8, 4 kg/m2 s non-stationary heating with ramp TW = 20 − 350◦ C
one experiment
experimental time 2.5h
with varying steps of G = 0, 0.49, 4 kg/m2 s
uncertainty is halved
stationary heating with TW = 350◦ C
Optimal Experimental Design
Cooperation with
Results: total heat transfer coefficients for 3 candidate shapes including error bars
high accuracy decision basis for reactor design market idea revised: shape 1 instead of shape 2 (original choice) Impact on value chain: catalyst investment over 1 Me!
Conclusions Numerical methods and software for parameter estimation and optimum experimental design Complex nonlinear problems modelled by ODE/DAE/PDE can be treated Optimal experimental design leads to acceleration of modeling process, allows to reduce e.g. development times for new products, prevents from making wrong decisions based on not calibrated models Challenges: High potential, but practically no theory for nonlinear dynamical processes available Also more efficient numerical methods needed robustification of OED against uncertainties more general problem classes, e.g., hybrid systems, PDE models incorporation of OED in real time state and parameter estimation and feedback control
New application areas
THANK YOU!