A Hierarchical Dirichlet Process Model with Multiple Levels of Clustering for Human EEG Seizure Modeling

Supplementary Materials

Drausin Wulsin¹, Shane Jensen², and Brian Litt¹,³
Depts. of Bioengineering¹, Statistics², and Neurology³
University of Pennsylvania

ICML 2012

1 Algorithms for posterior MCMC

Algorithms for sampling the atom indicators, level parameters, and base parameters are given in Algorithms 1, 2, and 3, respectively. We denote $[\cdots]$ as a vector and the operations $[\cdots] \cdot [\cdots]$ and $[\cdots] / [\cdots]$ as element-wise multiplication and division, respectively. A Dirac point mass at index $k$ is written $\delta_k$. We denote sampling a scalar $\theta$ from a multinomial distribution with vector weights $\mathbf{p}$ as $\theta \sim \text{Multi}(\mathbf{p})$, where the weights $\mathbf{p}$ need not be normalized. A $*$ in place of an index indicates that all indices are selected, yielding a vector. The function $\mathbb{1}(\cdot)$ is the indicator function, which returns 1 if its argument is true and 0 if it is false.
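As a concrete illustration of this notation (a minimal sketch of our own, not part of the original algorithms), the following Python snippet realizes a draw $\theta \sim \text{Multi}(\mathbf{p})$ from unnormalized weights together with the element-wise vector operations:

```python
import numpy as np

rng = np.random.default_rng(0)

def multi(p):
    """theta ~ Multi(p): sample an index from possibly unnormalized weights p."""
    p = np.asarray(p, dtype=float)
    return rng.choice(len(p), p=p / p.sum())

a = np.array([0.2, 0.5, 0.3])
b = np.array([4.0, 1.0, 5.0])
print(a * b)          # element-wise multiplication [...] . [...]
print(a / b)          # element-wise division [...] / [...]
print(multi(a * b))   # a draw over indices 0..2 from unnormalized weights
```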

2 Normal model with a conjugate prior

Here we briefly describe the details of our Normal scaled inverse-$\chi^2$ conjugate prior for a univariate Normal and the resulting posterior. For more details, see (Gelman et al., 2004, pgs. 78-80). Our base model is a multivariate Normal with mean $\boldsymbol{\mu}$ and diagonal covariance $\boldsymbol{\sigma}^2$, both in $\mathbb{R}^d$. The diagonal covariance implies that our $d$ dimensions are independent, so the likelihood of $\mathbf{x}$ is simply the product of each dimension $i$'s likelihood of $x_i$, and sampling $\boldsymbol{\mu}$ and $\boldsymbol{\sigma}^2$ from the posterior just involves sampling each dimension individually. For notational ease, we will thus assume that $d = 1$ in what follows, so $\mathbf{x} = x$ and $(\boldsymbol{\mu}, \boldsymbol{\sigma}^2) = (\mu, \sigma^2)$. The Normal model likelihood is

$$p(x \mid \mu, \sigma^2) \propto (\sigma^2)^{-1/2} \exp\left( -\frac{1}{2\sigma^2} (x - \mu)^2 \right) \quad (1)$$

Given $n$ i.i.d. observations $x_1, \ldots, x_n$, their joint likelihood is

$$p(x_1, \ldots, x_n \mid \mu, \sigma^2) \propto (\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2 \right) \quad (2)$$
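To make the independence across dimensions concrete, here is a minimal Python sketch (our illustration, not the authors' code) showing that the log-likelihood of a $d$-dimensional $\mathbf{x}$ under a diagonal-covariance Normal is the sum of the $d$ univariate log-likelihoods:

```python
import numpy as np
from scipy.stats import norm

# d = 3 dimensions with diagonal covariance: the dimensions are independent,
# so the joint log-likelihood is the sum of the univariate log-likelihoods.
mu = np.array([0.0, 2.0, -1.0])      # per-dimension means
sigma2 = np.array([1.0, 0.5, 2.0])   # diagonal of the covariance
x = np.array([0.3, 1.8, -0.5])

per_dim = norm.logpdf(x, loc=mu, scale=np.sqrt(sigma2))
print(per_dim.sum())  # log p(x | mu, sigma^2), summed over dimensions
```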

Algorithm 1 Sampling algorithm for the MLC-HDP base- and level-atom indicators
 1: for t = 1, ..., T do                                        // for each patient t
 2:   for j = 1, ..., J_t do                                    // for each seizure j of patient t
 3:     ℓ ← z^(2)_{tj}
 4:     for i = 1, ..., N_{tj} do                               // for each channel i of seizure j of patient t
 5:       π ← α^(3) · [β^(3)_1, ..., β^(3)_K, β^(3)_{K+1}] + [n^(3)_{ℓ1}, ..., n^(3)_{ℓ(K+1)}]   // calc weights over base atoms
 6:       k′ ∼ Multi(π · [f_1(x_{tji}), ..., f_{K+1}(x_{tji})])  // sample channel i's base-atom indicator
 7:       if k′ ≠ z^(3)_{tji} then
 8:         k ← z^(3)_{tji} and z^(3)_{tji} ← k′
 9:         Φ_k ← Φ_k ⊖ x_{tji} and Φ_{k′} ← Φ_{k′} ⊕ x_{tji}    // update base-atom sufficient statistics
10:         n^(3)_{ℓk} ← n^(3)_{ℓk} − 1 and n^(3)_{ℓk′} ← n^(3)_{ℓk′} + 1   // update level-3 atom counts
11:       end if
12:     end for
13:     for ℓ = 1, ..., L^(2) + 1 do                            // calc likelihood of seizure j under each level-2 atom ℓ
14:       π′_ℓ ← α^(3) · [β^(3)_1, ..., β^(3)_K, β^(3)_{K+1}] + [n^(3)_{ℓ1}, ..., n^(3)_{ℓ(K+1)}]
15:       π_ℓ ← π′_ℓ / Σ_k π′_{ℓk}
16:       g_ℓ ← Π_{i=1}^{N_tj} Σ_k π_{ℓk} f_k(x_{tji})          // product of marginal posteriors of the channels in seizure j
17:     end for
18:     l ← z^(1)_t
19:     τ ← α^(2) · [β^(2)_1, ..., β^(2)_{L^(2)+1}] + [n^(2)_{l1}, ..., n^(2)_{l(L^(2)+1)}]   // calc weights over level-2 atoms
20:     ℓ′ ∼ Multi(τ · [g_1, ..., g_{L^(2)+1}])                 // sample seizure j's level-2 atom indicator
21:     if ℓ′ ≠ z^(2)_{tj} then
22:       z^(2)_{tj} ← ℓ′
23:       n^(3)_{ℓ∗} ← n^(3)_{ℓ∗} − Σ_{i=1}^{N_tj} δ_{k=z^(3)_{tji}} and n^(3)_{ℓ′∗} ← n^(3)_{ℓ′∗} + Σ_{i=1}^{N_tj} δ_{k=z^(3)_{tji}}   // update level-3 atom counts
24:       n^(2)_{lℓ} ← n^(2)_{lℓ} − 1 and n^(2)_{lℓ′} ← n^(2)_{lℓ′} + 1   // update level-2 atom counts
25:     end if
26:   end for
27:   for ℓ = 1, ..., L^(2) + 1 do                              // recompute the normalized level-2 atom weights
28:     π′_ℓ ← α^(3) · [β^(3)_1, ..., β^(3)_K, β^(3)_{K+1}] + [n^(3)_{ℓ1}, ..., n^(3)_{ℓ(K+1)}]
29:     π_ℓ ← π′_ℓ / Σ_k π′_{ℓk}
30:   end for
31:   for l = 1, ..., L^(1) + 1 do                              // calc likelihood of patient t under each level-1 atom l
32:     τ′_l ← α^(2) · [β^(2)_1, ..., β^(2)_{L^(2)+1}] + [n^(2)_{l1}, ..., n^(2)_{l(L^(2)+1)}]
33:     τ_l ← τ′_l / Σ_ℓ τ′_{lℓ}
34:     h_l ← Π_{j=1}^{J_t} Π_{i=1}^{N_tj} Σ_{ℓ=1}^{L^(2)+1} τ_{lℓ} π_{ℓ(k=z^(3)_{tji})}   // product of marginal posteriors of the seizures
35:   end for
36:   ρ ← α^(1) · [β^(1)_1, ..., β^(1)_{L^(1)+1}] + [n^(1)_1, ..., n^(1)_{L^(1)+1}]   // calc weights over level-1 atoms
37:   l′ ∼ Multi(ρ · [h_1, ..., h_{L^(1)+1}])                   // sample patient t's level-1 atom indicator
38:   if l′ ≠ z^(1)_t then
39:     z^(1)_t ← l′
40:     n^(2)_{l∗} ← n^(2)_{l∗} − Σ_{j=1}^{J_t} δ_{ℓ=z^(2)_{tj}} and n^(2)_{l′∗} ← n^(2)_{l′∗} + Σ_{j=1}^{J_t} δ_{ℓ=z^(2)_{tj}}   // update level-2 atom counts
41:     n^(1)_l ← n^(1)_l − 1 and n^(1)_{l′} ← n^(1)_{l′} + 1   // update level-1 atom counts
42:   end if
43: end for
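For readers who prefer code, the following is a minimal Python sketch of the channel-level update in lines 4-12 of Algorithm 1. It is our own illustration under assumed data structures (a count matrix n3, a weight vector beta3, and a likelihood function f), not the authors' implementation, and it leaves the sufficient-statistic updates as a comment:

```python
import numpy as np

rng = np.random.default_rng(1)

def resample_channel_atom(x, l, z, n3, beta3, alpha3, f):
    """One Gibbs step for a channel's base-atom indicator (Alg. 1, lines 4-12).

    x      -- feature vector of channel i of seizure j of patient t
    l      -- the seizure's current level-2 atom indicator
    z      -- the channel's current base-atom indicator z^(3)
    n3     -- (L2+1, K+1) array of level-3 atom counts n^(3)
    beta3  -- (K+1,) parent weights beta^(3)
    alpha3 -- concentration parameter alpha^(3)
    f      -- f(k, x): marginal likelihood of x under base atom k
    """
    K1 = len(beta3)
    pi = alpha3 * beta3 + n3[l]                   # weights over base atoms (line 5)
    lik = np.array([f(k, x) for k in range(K1)])  # [f_1(x), ..., f_{K+1}(x)]
    w = pi * lik
    k_new = rng.choice(K1, p=w / w.sum())         # k' ~ Multi(pi . lik) (line 6)
    if k_new != z:                                # lines 7-11
        n3[l, z] -= 1                             # update level-3 atom counts
        n3[l, k_new] += 1
        # the sufficient statistics Phi_z (remove x) and Phi_k_new (add x)
        # would be updated here as well
    return k_new
```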


Algorithm 2 Sampling algorithm for the MLC-HDP level parameters
 1: for v = 3, ..., 1 do                                        // for each level
 2:   if n^(v)_{·(L^(v)+1)} > 0 then                            // if the extra, empty sub-atom has been filled
 3:     n^(v)_{∗(L^(v)+1)} ← 0 and m^(v)_{∗(L^(v)+1)} ← 0       // make new space in the count variables
 4:     ω ∼ Beta(1, α^(v))                                      // sample a new parent weight via stick-breaking
 5:     [β^(v)_{L^(v)}, β^(v)_{L^(v)+1}] ← β^(v)_{L^(v)} · [ω, 1 − ω]
 6:     L^(v) ← L^(v) + 1                                       // increment L^(v)
 7:   end if
 8:   L^(0) ≡ 1
 9:   for q = 1, ..., L^(v−1) do
10:     for r = 1, ..., L^(v) do                                // sample m variables for each sub-atom r of each level-atom q
11:       for s = 1, ..., n^(v)_{qr} do
12:         θ_s ∼ Ber(α^(v) β^(v)_r / (α^(v) β^(v)_r + s))
13:       end for
14:       m^(v)_{qr} ← Σ_s θ_s
15:     end for
16:   end for
17:   [β^(v)_1, ..., β^(v)_{L^(v)+1}] ∼ Dir(m^(v)_{·1}, ..., m^(v)_{·L^(v)}, γ^(v))   // sample parent weights over sub-atoms
18:   (optional) sample hyperparameters α^(v) and γ^(v) as shown in Algorithm 3
19: end for
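The m-variable and parent-weight updates (lines 9-17 of Algorithm 2) translate directly into a few lines of NumPy. The sketch below is our own illustration under assumed inputs (the count matrix n for one level and its current weights beta), not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_level_params(n, beta, alpha, gamma):
    """Resample m^(v) and the parent weights beta^(v) (Alg. 2, lines 9-17).

    n     -- (Q, L) count matrix n^(v) over parent atoms q and sub-atoms r
    beta  -- (L,) current sub-atom weights beta^(v)_r used in the Bernoulli draws
    alpha, gamma -- concentrations alpha^(v) and gamma^(v)
    """
    Q, L = n.shape
    m = np.zeros((Q, L), dtype=int)
    for q in range(Q):
        for r in range(L):
            for s in range(1, n[q, r] + 1):
                # theta_s ~ Ber(alpha*beta_r / (alpha*beta_r + s))
                m[q, r] += rng.random() < alpha * beta[r] / (alpha * beta[r] + s)
    # [beta_1, ..., beta_{L+1}] ~ Dir(m_.1, ..., m_.L, gamma)
    conc = np.append(m.sum(axis=0), gamma).astype(float)
    conc = np.maximum(conc, 1e-9)  # guard: Dirichlet parameters must be positive
    return m, rng.dirichlet(conc)
```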

Algorithm 3 Sampling algorithm for the MLC-HDP hyperparameters
 1: we assume that this algorithm has the same scope as line 18 of Algorithm 2
 2: let γ^(v) have a Gamma(a_{γ^(v)}, b_{γ^(v)}) prior and α^(v) have a Gamma(a_{α^(v)}, b_{α^(v)}) prior
 3: for r = 1, ..., L^(v) do                                    // sample γ auxiliary variables
 4:   m̂_r ← 1(m^(v)_{·r} > 0)
 5:   c_r ∼ Ber(m^(v)_{·r} / (γ^(v) + m^(v)_{·r}))
 6:   d_r ∼ Beta(γ^(v) + 1, m^(v)_{·r})
 7: end for
 8: γ^(v) ∼ Gamma(a_{γ^(v)} + Σ_r (m̂_r − c_r), b_{γ^(v)} − Σ_r log d_r)   // sample γ from posterior
 9: for q = 1, ..., L^(v−1) do                                  // sample α auxiliary variables
10:   c_q ∼ Ber(n^(v)_{q·} / (α^(v) + n^(v)_{q·}))
11:   d_q ∼ Beta(α^(v) + 1, n^(v)_{q·})
12: end for
13: α^(v) ∼ Gamma(a_{α^(v)} + m^(v)_{··} − Σ_q c_q, b_{α^(v)} − Σ_q log d_q)   // sample α from posterior
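A minimal Python sketch of Algorithm 3, again our own illustration with assumed count matrices m and n rather than the authors' code. Groups with zero counts are skipped because their auxiliary variables contribute nothing to the posterior (c = 0 and log d = 0), and NumPy's Gamma sampler is parameterized by scale, so we pass the reciprocal of the posterior rate:

```python
import numpy as np

rng = np.random.default_rng(3)

def resample_hyperparams(m, n, alpha, gamma, a_g, b_g, a_a, b_a):
    """One sweep of Algorithm 3 for a single level v.

    m, n -- the m^(v) and n^(v) count matrices (parent atoms q by sub-atoms r)
    alpha, gamma -- current values of alpha^(v) and gamma^(v)
    a_g, b_g, a_a, b_a -- Gamma prior (shape, rate) parameters
    """
    # --- gamma^(v): auxiliary variables over sub-atoms r (lines 3-8)
    m_col = m.sum(axis=0).astype(float)            # m^(v)_{.r}
    pos = m_col > 0                                # empty columns contribute nothing
    c = rng.random(pos.sum()) < m_col[pos] / (gamma + m_col[pos])
    d = rng.beta(gamma + 1.0, m_col[pos])
    gamma = rng.gamma(a_g + pos.sum() - c.sum(),   # shape: a + sum_r (m_hat_r - c_r)
                      1.0 / (b_g - np.log(d).sum()))
    # --- alpha^(v): auxiliary variables over parent atoms q (lines 9-13)
    n_row = n.sum(axis=1).astype(float)            # n^(v)_{q.}
    occ = n_row > 0
    c = rng.random(occ.sum()) < n_row[occ] / (alpha + n_row[occ])
    d = rng.beta(alpha + 1.0, n_row[occ])
    alpha = rng.gamma(a_a + m.sum() - c.sum(),     # shape: a + m^(v)_{..} - sum_q c_q
                      1.0 / (b_a - np.log(d).sum()))
    return alpha, gamma
```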

We assume that both the mean and variance of our Normal distribution are unknown, requiring a Normal scaled inverse-$\chi^2$ conjugate prior, which entails

$$\sigma^2 \sim \text{Inv-}\chi^2(\nu_0, \sigma_0^2), \qquad \mu \mid \sigma^2 \sim N(\mu_0, \sigma^2/n_0) \quad (3)$$

and has a joint prior density, denoted $N\text{-Inv-}\chi^2(n_0, \mu_0, \nu_0, \sigma_0^2)$, of

$$p(\mu, \sigma^2) \propto (\sigma^2)^{-((\nu_0+1)/2+1)} \exp\left( -\frac{1}{2\sigma^2} \left( \nu_0 \sigma_0^2 + n_0 (\mu_0 - \mu)^2 \right) \right) \quad (4)$$

where $n_0$ is the prior count, $\mu_0$ is the prior location, $\nu_0$ is the prior degrees of freedom, and $\sigma_0^2$ is the prior scale. The posterior density is proportional to the product of the data likelihood and the joint prior and has the form

$$p(\mu, \sigma^2 \mid x_1, \ldots, x_n) \propto (\sigma^2)^{-((\nu_0+n+1)/2+1)} \exp\left( -\frac{1}{2\sigma^2} \left( \sum_{i=1}^n (x_i - \mu)^2 + \nu_0 \sigma_0^2 + n_0 (\mu_0 - \mu)^2 \right) \right) \quad (5)$$

with posterior parameters $\bar{n}$, $\bar{\mu}$, $\bar{\nu}$, and $\bar{\sigma}^2$, which we can efficiently represent using the sufficient statistics (Sudderth, 2006)

$$\bar{n} = n_0 + n \quad (6)$$

$$\bar{n}\,\bar{\mu} = n_0 \mu_0 + \sum_{i=1}^n x_i \quad (7)$$

$$\bar{\nu} = \nu_0 + n \quad (8)$$

$$\bar{\nu}\,\bar{\sigma}^2 = \nu_0 \sigma_0^2 + n_0 \mu_0^2 + \sum_{i=1}^n x_i^2 - \bar{n}\,\bar{\mu}^2 \quad (9)$$

These values can be cached for each base atom and easily updated when a data point is added or removed. In practice, we store only the first three terms of (9) and subtract off the last term when we actually need the sufficient statistics (e.g., for sampling $\mu$ and $\sigma^2$ values from the posterior). In situations where a full covariance matrix is required, the setup is quite similar, except that the Cholesky decomposition of the analogous form of (9) is stored (see Sudderth, 2006 for more details). We sample from the posterior in two steps, first drawing $\sigma^2$ from its marginal posterior distribution

$$\sigma^2 \sim \text{Inv-}\chi^2(\bar{\nu}, \bar{\sigma}^2) \quad (10)$$

or, in practice,

$$\theta \sim \chi^2(\bar{\nu}), \qquad \sigma^2 = \bar{\nu}\,\bar{\sigma}^2 / \theta \quad (11)$$

and then drawing $\mu$ from its conditional posterior distribution

$$\mu \mid \sigma^2 \sim N(\bar{\mu}, \sigma^2/\bar{n}) \quad (12)$$

Integrating over the $\mu$ and $\sigma^2$ parameters of the $N\text{-Inv-}\chi^2$ distribution yields the posterior predictive distribution for a new data point $x_+$, here a Student-$t$ (Gelman et al., 2004, pg. 88),

$$p(x_+ \mid x_1, \ldots, x_n, n_0, \mu_0, \nu_0, \sigma_0^2) = t_{\bar{\nu}}\!\left( x_+ \,\middle|\, \bar{\mu}, \frac{\bar{n}+1}{\bar{n}}\,\bar{\sigma}^2 \right) \quad (13)$$
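To tie equations (6)-(13) together, the following self-contained Python sketch implements the cached-sufficient-statistics bookkeeping described above. It is our own illustration, not the authors' implementation; the class name NormInvChi2 and its interface are assumptions made for this example.

```python
import numpy as np
from scipy.stats import t as student_t

class NormInvChi2:
    """Univariate Normal with a N-Inv-chi^2(n0, mu0, nu0, s0sq) conjugate prior.

    Caches the sums in (6)-(9) so that adding or removing a data point is O(1).
    """

    def __init__(self, n0, mu0, nu0, s0sq):
        self.n0, self.mu0, self.nu0, self.s0sq = n0, mu0, nu0, s0sq
        self.n, self.sum_x, self.sum_x2 = 0, 0.0, 0.0

    def add(self, x):                 # a data point joins this atom
        self.n += 1; self.sum_x += x; self.sum_x2 += x * x

    def remove(self, x):              # a data point leaves this atom
        self.n -= 1; self.sum_x -= x; self.sum_x2 -= x * x

    def posterior(self):
        """Return (n_bar, mu_bar, nu_bar, s_bar_sq) via (6)-(9)."""
        n_bar = self.n0 + self.n                                    # (6)
        mu_bar = (self.n0 * self.mu0 + self.sum_x) / n_bar          # (7)
        nu_bar = self.nu0 + self.n                                  # (8)
        s_bar_sq = (self.nu0 * self.s0sq + self.n0 * self.mu0**2
                    + self.sum_x2 - n_bar * mu_bar**2) / nu_bar     # (9)
        return n_bar, mu_bar, nu_bar, s_bar_sq

    def sample_params(self, rng):
        """Draw (mu, sigma^2) from the posterior via (11)-(12)."""
        n_bar, mu_bar, nu_bar, s_bar_sq = self.posterior()
        theta = rng.chisquare(nu_bar)                               # (11)
        sigma2 = nu_bar * s_bar_sq / theta
        mu = rng.normal(mu_bar, np.sqrt(sigma2 / n_bar))            # (12)
        return mu, sigma2

    def predictive_pdf(self, x_new):
        """Posterior predictive density of x_new, the Student-t in (13)."""
        n_bar, mu_bar, nu_bar, s_bar_sq = self.posterior()
        # scipy's scale is the square root of the variance parameter in (13)
        scale = np.sqrt((n_bar + 1) / n_bar * s_bar_sq)
        return student_t.pdf(x_new, df=nu_bar, loc=mu_bar, scale=scale)

rng = np.random.default_rng(0)
atom = NormInvChi2(n0=1.0, mu0=0.0, nu0=1.0, s0sq=1.0)
for x in rng.normal(2.0, 0.5, size=50):
    atom.add(x)
print(atom.sample_params(rng), atom.predictive_pdf(2.0))
```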

References

Gelman, Andrew, Carlin, John B., Stern, Hal S., and Rubin, Donald B. Bayesian Data Analysis. Chapman & Hall/CRC, 2nd edition, 2004.

Sudderth, Erik B. Graphical Models for Visual Object Recognition and Tracking. PhD thesis, Massachusetts Institute of Technology, 2006.
