Differential Equations & Functional Data Analysis Parameter Estimation for Differential Equations

Whitney Huang Department of Statistics Purdue University

April 21, 2014

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

1 / 27

Outline

1

Motivation

2

The estimation procedure

3

Example: Groundwater Levels

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

2 / 27

Differential Equations A differential equation is an equation that relates some function of one or more variables with its derivatives. Ordinary differential equation: m¨ x + b x˙ + kx = Fext

Partial differential equation: ∂u −α ∂t

Whitney Huang (Purdue University)



∂2u ∂2u ∂2u + + 2 ∂x 2 ∂y 2 ∂z

DE’s & FDA

 =0

April 21, 2014

3 / 27

Differential Equations

I

Widely used to model dynamic systems I I I

I

Maxwell’s equations in electromagnetism Navier–Stokes equations in fluid dynamics in fluid dynamics The Black–Scholes PDE in Economics

Forward problem (i.e. solving differential equations) has been studied extensively by mathematicians

Inverse Problem Suppose a given data set can be reasonably model by a differential equation but with unknown coefficients. Can we make a statistical inference?

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

4 / 27

Differential Equations

I

Widely used to model dynamic systems I I I

I

Maxwell’s equations in electromagnetism Navier–Stokes equations in fluid dynamics in fluid dynamics The Black–Scholes PDE in Economics

Forward problem (i.e. solving differential equations) has been studied extensively by mathematicians

Inverse Problem Suppose a given data set can be reasonably model by a differential equation but with unknown coefficients. Can we make a statistical inference?

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

4 / 27

Differential Equations

I

Widely used to model dynamic systems I I I

I

Maxwell’s equations in electromagnetism Navier–Stokes equations in fluid dynamics in fluid dynamics The Black–Scholes PDE in Economics

Forward problem (i.e. solving differential equations) has been studied extensively by mathematicians

Inverse Problem Suppose a given data set can be reasonably model by a differential equation but with unknown coefficients. Can we make a statistical inference?

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

4 / 27

Differential Equations

I

Widely used to model dynamic systems I I I

I

Maxwell’s equations in electromagnetism Navier–Stokes equations in fluid dynamics in fluid dynamics The Black–Scholes PDE in Economics

Forward problem (i.e. solving differential equations) has been studied extensively by mathematicians

Inverse Problem Suppose a given data set can be reasonably model by a differential equation but with unknown coefficients. Can we make a statistical inference?

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

4 / 27

Differential Equations

I

Widely used to model dynamic systems I I I

I

Maxwell’s equations in electromagnetism Navier–Stokes equations in fluid dynamics in fluid dynamics The Black–Scholes PDE in Economics

Forward problem (i.e. solving differential equations) has been studied extensively by mathematicians

Inverse Problem Suppose a given data set can be reasonably model by a differential equation but with unknown coefficients. Can we make a statistical inference?

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

4 / 27

Differential Equations

I

Widely used to model dynamic systems I I I

I

Maxwell’s equations in electromagnetism Navier–Stokes equations in fluid dynamics in fluid dynamics The Black–Scholes PDE in Economics

Forward problem (i.e. solving differential equations) has been studied extensively by mathematicians

Inverse Problem Suppose a given data set can be reasonably model by a differential equation but with unknown coefficients. Can we make a statistical inference?

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

4 / 27

Differential Equations

I

Widely used to model dynamic systems I I I

I

Maxwell’s equations in electromagnetism Navier–Stokes equations in fluid dynamics in fluid dynamics The Black–Scholes PDE in Economics

Forward problem (i.e. solving differential equations) has been studied extensively by mathematicians

Inverse Problem Suppose a given data set can be reasonably model by a differential equation but with unknown coefficients. Can we make a statistical inference?

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

4 / 27

Differential Equations & Functional Data Analysis

Most dynamic systems defined by the solutions of their differential equations are not fit to data, they intend to capture gross shape features in the specified context. However, · · · I

Solutions of differential equations are functions

I

We can treat the data as an approximated solution of the corresponding differential equation

⇒ FDA framework can help for the inverse problem

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

5 / 27

Differential Equations & Functional Data Analysis

Most dynamic systems defined by the solutions of their differential equations are not fit to data, they intend to capture gross shape features in the specified context. However, · · · I

Solutions of differential equations are functions

I

We can treat the data as an approximated solution of the corresponding differential equation

⇒ FDA framework can help for the inverse problem

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

5 / 27

Differential Equations & Functional Data Analysis

Most dynamic systems defined by the solutions of their differential equations are not fit to data, they intend to capture gross shape features in the specified context. However, · · · I

Solutions of differential equations are functions

I

We can treat the data as an approximated solution of the corresponding differential equation

⇒ FDA framework can help for the inverse problem

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

5 / 27

Differential Equations & Functional Data Analysis

Most dynamic systems defined by the solutions of their differential equations are not fit to data, they intend to capture gross shape features in the specified context. However, · · · I

Solutions of differential equations are functions

I

We can treat the data as an approximated solution of the corresponding differential equation

⇒ FDA framework can help for the inverse problem

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

5 / 27

Differential Equations & Functional Data Analysis

Most dynamic systems defined by the solutions of their differential equations are not fit to data, they intend to capture gross shape features in the specified context. However, · · · I

Solutions of differential equations are functions

I

We can treat the data as an approximated solution of the corresponding differential equation

⇒ FDA framework can help for the inverse problem

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

5 / 27

Outline

1

Motivation

2

The estimation procedure

3

Example: Groundwater Levels

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

6 / 27

Set-up

Differential equation: f (x, x, ˙ x¨, · · · ; θ) = 0 e.g. x˙ = −βx + µ ⇒ f = x˙ + βx − µ = 0 Observed data: y (ti ) = x(ti ) + εi ,

i = 1, · · · , n

i.i.d.

εi ∼ N(0, σ 2 ) Goal: to estimate the unknown θ in the differential equation from the data with measurement error and to quantify the uncertainty of the estimates

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

7 / 27

Statistical Challenge

Most differential equations are not solvable analytically. In order to carry out the estimation I

it requires repeatedly solving differential equation numerically

I

the initial value of the differential equation need to be known

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

8 / 27

Statistical Challenge

Most differential equations are not solvable analytically. In order to carry out the estimation I

it requires repeatedly solving differential equation numerically

I

the initial value of the differential equation need to be known

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

8 / 27

Estimation procedure

1

(Ramsay et al. 2007):

basic idea

Use basis function expansion to approximate x(t), i.e xˆ(t) = cT φ(t)

2

Estimate the coefficients c of the chosen basis functions by incorporating differential equation defined penalty

3

Estimate the parameters θ in the differential equation

4

Choosing the amount of smoothing λ

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

9 / 27

Estimation procedure

1

(Ramsay et al. 2007):

basic idea

Use basis function expansion to approximate x(t), i.e xˆ(t) = cT φ(t)

2

Estimate the coefficients c of the chosen basis functions by incorporating differential equation defined penalty

3

Estimate the parameters θ in the differential equation

4

Choosing the amount of smoothing λ

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

9 / 27

Estimation procedure

1

(Ramsay et al. 2007):

basic idea

Use basis function expansion to approximate x(t), i.e xˆ(t) = cT φ(t)

2

Estimate the coefficients c of the chosen basis functions by incorporating differential equation defined penalty

3

Estimate the parameters θ in the differential equation

4

Choosing the amount of smoothing λ

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

9 / 27

Estimation procedure

1

(Ramsay et al. 2007):

basic idea

Use basis function expansion to approximate x(t), i.e xˆ(t) = cT φ(t)

2

Estimate the coefficients c of the chosen basis functions by incorporating differential equation defined penalty

3

Estimate the parameters θ in the differential equation

4

Choosing the amount of smoothing λ

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

9 / 27

Basic function expansion

xˆ(t) =

K X

ck φk (t) = cT φ(t)

k=1 I

Choice of basis function: splines are usually the logical choice because of the compact support and the capacity to capture any transient localized features

I

Number of basis function: usually large because it requires not only to approximate x(t) but also its derivatives

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

10 / 27

Basic function expansion

xˆ(t) =

K X

ck φk (t) = cT φ(t)

k=1 I

Choice of basis function: splines are usually the logical choice because of the compact support and the capacity to capture any transient localized features

I

Number of basis function: usually large because it requires not only to approximate x(t) but also its derivatives

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

10 / 27

Data fitting criterion

Z J(c|θ) = ` (ˆ x (ti ), y (ti )) + λ | {z } | data fidelity

[f (ˆ x (t); θ)]2 dt {z }

DE defined penalty

I

the first part of the criterion function is the fidelity of basis function approximation to the data

I

the second part is the penalty term with respect to differential equation given θ

I

smoothing parameter λ controls the relative emphasis on these two objectives

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

11 / 27

Data fitting criterion

Z J(c|θ) = ` (ˆ x (ti ), y (ti )) + λ | {z } | data fidelity

[f (ˆ x (t); θ)]2 dt {z }

DE defined penalty

I

the first part of the criterion function is the fidelity of basis function approximation to the data

I

the second part is the penalty term with respect to differential equation given θ

I

smoothing parameter λ controls the relative emphasis on these two objectives

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

11 / 27

Data fitting criterion

Z J(c|θ) = ` (ˆ x (ti ), y (ti )) + λ | {z } | data fidelity

[f (ˆ x (t); θ)]2 dt {z }

DE defined penalty

I

the first part of the criterion function is the fidelity of basis function approximation to the data

I

the second part is the penalty term with respect to differential equation given θ

I

smoothing parameter λ controls the relative emphasis on these two objectives

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

11 / 27

The parameter hierarchy

There are three classes of parameters to estimate: I

The coefficients c in the basis function expansion

I

The parameters θ defining the differential equation

I

The smoothing parameter λ

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

12 / 27

The parameter hierarchy

There are three classes of parameters to estimate: I

The coefficients c in the basis function expansion

I

The parameters θ defining the differential equation

I

The smoothing parameter λ

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

12 / 27

The parameter hierarchy

There are three classes of parameters to estimate: I

The coefficients c in the basis function expansion

I

The parameters θ defining the differential equation

I

The smoothing parameter λ

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

12 / 27

The roles of the three parameter levels

I

c are not of the direct interest ⇒ nuisance parameters

I

we are primary interest in θ, the parameters that define the differential equation Smoothing parameter λ control the overall complexity of the model

I

I I

λ → 0 ⇒ high complexity in xˆ(t) λ → ∞ ⇒ low complexity in xˆ(t)

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

13 / 27

The roles of the three parameter levels

I

c are not of the direct interest ⇒ nuisance parameters

I

we are primary interest in θ, the parameters that define the differential equation Smoothing parameter λ control the overall complexity of the model

I

I I

λ → 0 ⇒ high complexity in xˆ(t) λ → ∞ ⇒ low complexity in xˆ(t)

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

13 / 27

The roles of the three parameter levels

I

c are not of the direct interest ⇒ nuisance parameters

I

we are primary interest in θ, the parameters that define the differential equation Smoothing parameter λ control the overall complexity of the model

I

I I

λ → 0 ⇒ high complexity in xˆ(t) λ → ∞ ⇒ low complexity in xˆ(t)

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

13 / 27

The roles of the three parameter levels

I

c are not of the direct interest ⇒ nuisance parameters

I

we are primary interest in θ, the parameters that define the differential equation Smoothing parameter λ control the overall complexity of the model

I

I I

λ → 0 ⇒ high complexity in xˆ(t) λ → ∞ ⇒ low complexity in xˆ(t)

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

13 / 27

The roles of the three parameter levels

I

c are not of the direct interest ⇒ nuisance parameters

I

we are primary interest in θ, the parameters that define the differential equation Smoothing parameter λ control the overall complexity of the model

I

I I

λ → 0 ⇒ high complexity in xˆ(t) λ → ∞ ⇒ low complexity in xˆ(t)

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

13 / 27

The parameter cascade algorithm

1

c are nuisance parameters are defined as a smooth functions c(θ, λ)

2

Structural parameters θ are defined as functions θ(λ) of the complexity parameters

3

These functional relationships are defined implicitly by specifying a different conditional fitting criterion at each level of the parameter hierarchy

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

14 / 27

The parameter cascade algorithm

1

c are nuisance parameters are defined as a smooth functions c(θ, λ)

2

Structural parameters θ are defined as functions θ(λ) of the complexity parameters

3

These functional relationships are defined implicitly by specifying a different conditional fitting criterion at each level of the parameter hierarchy

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

14 / 27

The parameter cascade algorithm

1

c are nuisance parameters are defined as a smooth functions c(θ, λ)

2

Structural parameters θ are defined as functions θ(λ) of the complexity parameters

3

These functional relationships are defined implicitly by specifying a different conditional fitting criterion at each level of the parameter hierarchy

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

14 / 27

The multi–criterion optimization strategy I

Nuisance parameter functions c(θ, λ) are defined by optimizing the regularized fitting criterion J(c|θ) =

n X

2

{yi − xˆ(ti )} + λ

Z

[f (ˆ x (t); θ)]2 dt

i=1 I

A purely data-fitting criterion H(θ) is then optimized with respect to the structural parameters θ alone H(θ) =

n X

{yi − xˆ(ti , θ)}2

i=1 I

At the top level, a complexity criterion F (λ), is optimized with respect to λ

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

15 / 27

The multi–criterion optimization strategy I

Nuisance parameter functions c(θ, λ) are defined by optimizing the regularized fitting criterion J(c|θ) =

n X

2

{yi − xˆ(ti )} + λ

Z

[f (ˆ x (t); θ)]2 dt

i=1 I

A purely data-fitting criterion H(θ) is then optimized with respect to the structural parameters θ alone H(θ) =

n X

{yi − xˆ(ti , θ)}2

i=1 I

At the top level, a complexity criterion F (λ), is optimized with respect to λ

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

15 / 27

The multi–criterion optimization strategy I

Nuisance parameter functions c(θ, λ) are defined by optimizing the regularized fitting criterion J(c|θ) =

n X

2

{yi − xˆ(ti )} + λ

Z

[f (ˆ x (t); θ)]2 dt

i=1 I

A purely data-fitting criterion H(θ) is then optimized with respect to the structural parameters θ alone H(θ) =

n X

{yi − xˆ(ti , θ)}2

i=1 I

At the top level, a complexity criterion F (λ), is optimized with respect to λ

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

15 / 27

Remarks

I

Gradients and hessians at any level can be computed using the Implicit Function Theorem

I

xˆ(t) = cT φ(t) is only approximately a solution of the differential equation. It is a data-regularized approximation to a solution

I

When λ are small, the optimization criteria J and H are smooth functions of their arguments, easily optimized. By increasing λ, we avoid the local minimum and force xˆ(t) close to x ∗ (t)

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

16 / 27

Remarks

I

Gradients and hessians at any level can be computed using the Implicit Function Theorem

I

xˆ(t) = cT φ(t) is only approximately a solution of the differential equation. It is a data-regularized approximation to a solution

I

When λ are small, the optimization criteria J and H are smooth functions of their arguments, easily optimized. By increasing λ, we avoid the local minimum and force xˆ(t) close to x ∗ (t)

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

16 / 27

Remarks

I

Gradients and hessians at any level can be computed using the Implicit Function Theorem

I

xˆ(t) = cT φ(t) is only approximately a solution of the differential equation. It is a data-regularized approximation to a solution

I

When λ are small, the optimization criteria J and H are smooth functions of their arguments, easily optimized. By increasing λ, we avoid the local minimum and force xˆ(t) close to x ∗ (t)

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

16 / 27

Outline

1

Motivation

2

The estimation procedure

3

Example: Groundwater Levels

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

17 / 27

Rain and Groundwater

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

18 / 27

Rain and Groundwater

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

19 / 27

Rain and Groundwater: a smaller time scale

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

20 / 27

Functional regression model

G (t) = G0 + αR(t) + ε(t) I

it implies that sudden changes in rainfall R(t) are passed on immediately to groundwater level G (t)

I

we observe that G (t) increases smoothly over a series of localized rainfalls, and declines slowly when they cease

⇒ functional linear model won’t work

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

21 / 27

Functional regression model

G (t) = G0 + αR(t) + ε(t) I

it implies that sudden changes in rainfall R(t) are passed on immediately to groundwater level G (t)

I

we observe that G (t) increases smoothly over a series of localized rainfalls, and declines slowly when they cease

⇒ functional linear model won’t work

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

21 / 27

Functional regression model

G (t) = G0 + αR(t) + ε(t) I

it implies that sudden changes in rainfall R(t) are passed on immediately to groundwater level G (t)

I

we observe that G (t) increases smoothly over a series of localized rainfalls, and declines slowly when they cease

⇒ functional linear model won’t work

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

21 / 27

Functional regression model

G (t) = G0 + αR(t) + ε(t) I

it implies that sudden changes in rainfall R(t) are passed on immediately to groundwater level G (t)

I

we observe that G (t) increases smoothly over a series of localized rainfalls, and declines slowly when they cease

⇒ functional linear model won’t work

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

21 / 27

Differential equation for groundwater level

dG (t) = −βG (t) + αR(t − δ) + µ dt where I

β specifies the the rate of change of G (t) with itself

I

α defines the impact of a change in R(t)

I

µ is a baseline level, required here because the origin for level G (t) is not meaningful

I

lag δ is the time for rainfall to reach the groundwater level, and is known to be about 3 hours

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

22 / 27

The constant coefficient fit

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

23 / 27

Allowing ODE parameters time varying

I

As groundwater level G (t) changes, the dynamics change, too, because we move through different types of sub-soil structures

I

We weren’t given sub-soil transmission rates, so we needed to allow β(t), α(t) and µ(t) to vary slowly over time dG (t) = −β(t)G (t) + α(t)R(t − δ) + µ(t) dt

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

24 / 27

The time-varying coefficient fit

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

25 / 27

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

26 / 27

Ramsay, J. O., Hooker, G., Campbell, D., & Cao, J. Parameter estimation for differential equations: a generalized smoothing approach (with discussion) Journal of the Royal Statistical Society. Series B (Methodological), 69(5), 741–796, 2007 Xun, X., Cao, J., Mallick, B., Maity, A., & Carroll, R. J. Parameter Estimation of Partial Differential Equation Models Journal of the American Statistical Association, 108(503), 1009–1020, 2013

Whitney Huang (Purdue University)

DE’s & FDA

April 21, 2014

27 / 27