Learning Bayesian Networks from Data

Author: Elijah Henry
Overview

- Introduction
- Parameter Estimation
- Model Selection
- Structure Discovery
- Incomplete Data
- Learning from Structured Data

Learning Bayesian Networks from Data

Nir Friedman, Hebrew U.
Daphne Koller, Stanford

Bayesian Networks

- Compact representation of probability distributions via conditional independence
- Qualitative part: Directed acyclic graph (DAG)
  - Nodes: random variables
  - Edges: direct influence
- Quantitative part: Set of conditional probability distributions
- Together: Define a unique distribution in a factored form

  P(B, E, A, C, R) = P(B) P(E) P(A | B, E) P(R | E) P(C | A)

[Figure: Earthquake and Burglary are the parents of Alarm, Earthquake is the parent of Radio, and Alarm is the parent of Call. The family of Alarm has the CPT below.]

    E    B   | P(a | E,B)   P(¬a | E,B)
    e    b   |   0.9          0.1
    e    ¬b  |   0.2          0.8
    ¬e   b   |   0.9          0.1
    ¬e   ¬b  |   0.01         0.99

Example: "ICU Alarm" network

Domain: Monitoring Intensive-Care Patients
- 37 variables
- 509 parameters …instead of 2^54

[Figure: the 37-node ICU Alarm network; node labels include HYPOVOLEMIA, LVFAILURE, PULMEMBOLUS, INTUBATION, KINKEDTUBE, VENTLUNG, SAO2, HR and BP]

Inference

- Posterior probabilities
  - Probability of any event given any evidence
- Most likely explanation
  - Scenario that explains evidence
- Rational decision making
  - Maximize expected utility
  - Value of Information
- Effect of intervention

[Figure: the Earthquake / Burglary / Radio / Alarm / Call network]

Why learning?

Knowledge acquisition bottleneck
- Knowledge acquisition is an expensive process
- Often we don't have an expert

Data is cheap
- Amount of available information growing rapidly
- Learning allows us to construct models from raw data
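The factored form and the inference tasks above can be made concrete on the Earthquake/Burglary/Radio/Alarm/Call network. The following is a minimal sketch, not the tutorial's own code. The Alarm CPT comes from the slide. The values for P(B), P(E), P(R | E) and P(C | A) are hypothetical numbers chosen only for illustration, and the function names are also hypothetical. The sketch builds the joint from the factorization and answers posterior queries by brute-force enumeration, which is exponential in the number of variables and therefore only viable for tiny networks. It also counts free parameters to show why the factored form is compact.

```python
from itertools import product

# P(A = true | E, B): the Alarm CPT from the "Bayesian Networks" slide, keyed by (e, b).
P_A_GIVEN_EB = {
    (True, True): 0.9,
    (True, False): 0.2,
    (False, True): 0.9,
    (False, False): 0.01,
}

# HYPOTHETICAL values for the remaining CPTs; the slides do not give them.
P_B = 0.01                                # P(B = true)
P_E = 0.02                                # P(E = true)
P_R_GIVEN_E = {True: 0.9, False: 0.0001}  # P(R = true | E)
P_C_GIVEN_A = {True: 0.7, False: 0.05}    # P(C = true | A)

VARS = ("B", "E", "A", "C", "R")


def bern(p_true, value):
    """P(X = value) for a binary X with P(X = true) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, c, r):
    """P(B,E,A,C,R) = P(B) P(E) P(A | B,E) P(R | E) P(C | A)."""
    return (bern(P_B, b) * bern(P_E, e)
            * bern(P_A_GIVEN_EB[(e, b)], a)
            * bern(P_R_GIVEN_E[e], r)
            * bern(P_C_GIVEN_A[a], c))


def posterior(query, evidence):
    """P(query = true | evidence), summing the joint over all 2**5 worlds."""
    num = den = 0.0
    for values in product((True, False), repeat=len(VARS)):
        world = dict(zip(VARS, values))
        # Skip worlds that contradict the evidence.
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(world["B"], world["E"], world["A"], world["C"], world["R"])
        den += p
        if world[query]:
            num += p
    return num / den


# Free parameters for binary variables: one per parent configuration.
factored_params = 1 + 1 + 4 + 2 + 2  # P(B), P(E), P(A|E,B), P(R|E), P(C|A)
full_joint_params = 2 ** len(VARS) - 1

print(round(posterior("B", {"C": True}), 3))             # roughly 0.098
print(round(posterior("B", {"C": True, "R": True}), 3))  # roughly 0.035
```

With these assumed numbers, evidence of the call raises P(B) from 0.01 to roughly 0.098. Adding the radio report lowers it to roughly 0.035, because the earthquake now accounts for the alarm ("explaining away"). The network needs 10 free parameters rather than the 31 a full joint table over five binary variables would require. The same effect, at a much larger scale, underlies the ICU network's 509 parameters versus 2^54.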

Why Learn Bayesian Networks?

- Conditional independencies & graphical language capture structure of many real-world distributions
- Graph structure provides much insight into domain
  - Allows "knowledge discovery"
- Learned model can be used for many tasks
- Supports all the features of probabilistic learning
  - Model selection criteria
  - Dealing with missing data & hidden variables

Learning Bayesian networks

[Figure: Data + Prior Information feed into a Learner, which outputs a network over E, B, R, A, C with the CPT for Alarm below]

    E    B   | P(a | E,B)   P(¬a | E,B)
    e    b   |   0.9          0.1
    e    ¬b  |   0.7          0.3
    ¬e   b   |   0.8          0.2
    ¬e   ¬b  |   0.99         0.01
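The learner above combines data with prior information. The slides do not yet commit to a method, so the following is only an illustrative assumption. One standard way to combine the two for a single CPT entry is to encode the prior as Beta pseudo-counts and report the posterior mean. The function name and the pseudo-count values below are hypothetical.

```python
def estimate_with_prior(n_true, n_false, prior_true, prior_false):
    """Posterior-mean estimate of P(child = true | one parent configuration).

    Assumes a Beta(prior_true, prior_false) prior, so the prior behaves like
    prior_true + prior_false imaginary samples added to the observed counts.
    """
    total = n_true + n_false + prior_true + prior_false
    return (n_true + prior_true) / total


# Hypothetical prior: an expert believes P(a | e, b) = 0.9, worth 10 samples.
print(estimate_with_prior(0, 0, 9, 1))      # no data: the prior alone, 0.9
print(estimate_with_prior(3, 2, 9, 1))      # a little data: 12/15 = 0.8
print(estimate_with_prior(300, 200, 9, 1))  # lots of data: close to 0.6
```

As the counts grow, the data dominate and the estimate approaches the observed frequency. With zero pseudo-counts, the formula reduces to the plain frequency estimate.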

Known Structure, Complete Data

- Network structure is specified
  - Inducer needs to estimate parameters
- Data does not contain missing values

[Figure: data records over E, B, A . . feed into a Learner. The input is the network E → A ← B with every CPT entry unknown ("?"). The output is the same network with the estimated CPT below.]

    E    B   | P(a | E,B)   P(¬a | E,B)
    e    b   |   0.9          0.1
    e    ¬b  |   0.7          0.3
    ¬e   b   |   0.8          0.2
    ¬e   ¬b  |   0.99         0.01

Unknown Structure, Complete Data

- Network structure is not specified
  - Inducer needs to select arcs & estimate parameters
- Data does not contain missing values

[Figure: data records over E, B, A . . feed into a Learner. The input is the variables E, B, A with arcs unspecified and CPT entries unknown ("?"). The output is the network E → A ← B with the same estimated CPT as above.]
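When the structure is fixed and every record is complete, parameter estimation reduces to counting. The maximum-likelihood estimate of P(a | E, B) is the fraction of records with a given parent configuration in which A is true. The minimal sketch below uses a small hypothetical dataset, not data from the tutorial, and `mle_cpt` is a hypothetical helper name.

```python
from collections import Counter

# HYPOTHETICAL complete data: each record is (E, B, A) with no missing values.
DATA = [
    (True, True, True),
    (True, False, False),
    (False, True, True),
    (False, False, False),
    (False, False, False),
    (False, False, True),
    (False, True, True),
    (False, True, False),
]


def mle_cpt(records):
    """Maximum-likelihood estimate of P(A = true | E, B) by counting.

    Returns a dict keyed by (e, b). Parent configurations that never occur
    in the data are omitted, since the MLE is undefined for them.
    """
    n_parent = Counter((e, b) for e, b, _ in records)        # N(e, b)
    n_true = Counter((e, b) for e, b, a in records if a)     # N(e, b, a)
    return {pa: n_true[pa] / n for pa, n in n_parent.items()}


for (e, b), p in sorted(mle_cpt(DATA).items(), reverse=True):
    print(f"P(a | E={e}, B={b}) = {p:.3f}")
```

With only one record each for (e, b) and (e, ¬b), the estimates are 1.0 and 0.0. This illustrates why small samples motivate the prior pseudo-counts discussed under learning from data plus prior information.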

Known Structure, Incomplete Data

[Figure: data records over E, B, A . . . .]