Markov Logic in Natural Language Processing

Hoifung Poon
Dept. of Computer Science & Eng.
University of Washington

Overview

- Motivation
- Foundational areas
- Markov logic
- NLP applications
  - Basics
  - Supervised learning
  - Unsupervised learning

2

Languages Are Structural

[Figure: examples of linguistic structure]
- Morphology: govern-ment-s; l-m$px-t-m ("according to their families")
- Syntax: parse tree (S, VP, NP, V) for "IL-4 induces CD11B"
- Nested bio-events: "Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 ..."
  - involvement: Theme = up-regulation, Cause = activation
  - up-regulation: Theme = IL-10, Cause = gp41, Site = human monocyte
  - activation: Theme = p70(S6)-kinase
- Coreference: "George Walker Bush was the 43rd President of the United States. ... Bush was the eldest son of President G. H. W. Bush and Barbara Bush. ... In November 1977, he met Laura Welch at a barbecue."
3-4


Languages Are Structural

- Objects are not just feature vectors
  - They have parts and subparts
  - Which have relations with each other
  - They can be trees, graphs, etc.
- Objects are seldom i.i.d. (independent and identically distributed)
  - They exhibit local and global dependencies
  - They form class hierarchies (with multiple inheritance)
  - Objects' properties depend on those of related objects
- Deeply interwoven with knowledge
6

First-Order Logic

- Main theoretical foundation of computer science
- General language for describing complex structures and knowledge
- Trees, graphs, dependencies, hierarchies, etc. easily expressed
- Inference algorithms (satisfiability testing, theorem proving, etc.)

7

Languages Are Statistical

- Ambiguity: "I saw the man with the telescope" (the PP can attach to "saw" or to "the man")
- Paraphrase: Microsoft buys Powerset / Microsoft acquires Powerset / Powerset is acquired by Microsoft Corporation / The Redmond software giant buys Powerset / Microsoft's purchase of Powerset, ... / ...
- Named-entity ambiguity: "Here in London, Frances Deek is a retired teacher ..." / "In the Israeli town ..., Karen London says ..." / "Now London says ..."  London → PERSON or LOCATION?
- Coreference ambiguity: G. W. Bush ... Laura Bush ... Mrs. Bush ...  Which one?
8

Languages Are Statistical

- Languages are ambiguous
- Our information is always incomplete
- We need to model correlations
- Our predictions are uncertain
- Statistics provides the tools to handle this

9

Probabilistic Graphical Models

- Mixture models
- Hidden Markov models
- Bayesian networks
- Markov random fields
- Maximum entropy models
- Conditional random fields
- Etc.
10

The Problem

- Logic is deterministic and requires manual coding
- Statistical models assume i.i.d. data, objects = feature vectors
- Historically, statistical and logical NLP have been pursued separately
- We need to unify the two!
- Burgeoning field in machine learning: Statistical relational learning
11

Costs and Benefits of Statistical Relational Learning

- Benefits
  - Better predictive accuracy
  - Better understanding of domains
  - Enable learning with less or no labeled data
- Costs
  - Learning is much harder
  - Inference becomes a crucial issue
  - Greater complexity for the user
12

Progress to Date

- Probabilistic logic [Nilsson, 1986]
- Statistics and beliefs [Halpern, 1990]
- Knowledge-based model construction [Wellman et al., 1992]
- Stochastic logic programs [Muggleton, 1996]
- Probabilistic relational models [Friedman et al., 1999]
- Relational Markov networks [Taskar et al., 2002]
- Etc.
- This talk: Markov logic [Domingos & Lowd, 2009]
13

Markov Logic: A Unifying Framework

- Probabilistic graphical models and first-order logic are special cases
- Unified inference and learning algorithms
- Easy-to-use software: Alchemy
- Broad applicability
- Goal of this tutorial: Quickly learn how to use Markov logic and Alchemy for a broad spectrum of NLP applications
14

Overview

- Motivation
- Foundational areas
  - Probabilistic inference
  - Statistical learning
  - Logical inference
  - Inductive logic programming
- Markov logic
- NLP applications
  - Basics
  - Supervised learning
  - Unsupervised learning
15

Markov Networks

- Undirected graphical models

  [Figure: network over Smoking, Cancer, Asthma, Cough]

- Potential functions defined over cliques:

  P(x) = (1/Z) Π_c Φ_c(x_c)        Z = Σ_x Π_c Φ_c(x_c)

  Example potential over the clique (Smoking, Cancer):

  Smoking   Cancer   Φ(S,C)
  False     False    4.5
  False     True     4.5
  True      False    2.7
  True      True     4.5
16

Markov Networks

- Undirected graphical models

  [Figure: network over Smoking, Cancer, Asthma, Cough]

- Log-linear model:

  P(x) = (1/Z) exp( Σ_i w_i f_i(x) )

  w_i: weight of feature i; f_i: feature i. Example:

  f_1(Smoking, Cancer) = 1 if ¬Smoking ∨ Cancer, 0 otherwise
  w_1 = 1.5
17
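As a concrete illustration of the log-linear model above, here is a minimal sketch (not from the tutorial, and not Alchemy) that enumerates all worlds of a tiny network and normalizes exp(Σ_i w_i f_i(x)) by brute force. The feature f_1 and weight 1.5 come from the slide; the variable set and code structure are illustrative assumptions.

```python
# Brute-force computation of P(x) = (1/Z) exp(sum_i w_i f_i(x)) for a tiny
# Markov network; only feasible when the number of variables is very small.
import itertools
import math

variables = ["Smoking", "Cancer"]

# Feature f1(x) = 1 if (not Smoking) or Cancer, else 0, with weight 1.5
features = [(1.5, lambda x: 1.0 if (not x["Smoking"]) or x["Cancer"] else 0.0)]

def score(x):
    """Unnormalized log-probability: sum_i w_i * f_i(x)."""
    return sum(w * f(x) for w, f in features)

def distribution():
    """Enumerate all worlds and normalize."""
    worlds = [dict(zip(variables, vals))
              for vals in itertools.product([False, True], repeat=len(variables))]
    Z = sum(math.exp(score(x)) for x in worlds)
    return {tuple(x.items()): math.exp(score(x)) / Z for x in worlds}

if __name__ == "__main__":
    for world, p in distribution().items():
        print(world, round(p, 3))
```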

Markov Nets vs. Bayes Nets

Property          Markov Nets         Bayes Nets
Form              Prod. potentials    Prod. potentials
Potentials        Arbitrary           Cond. probabilities
Cycles            Allowed             Forbidden
Partition func.   Z = ?               Z = 1
Indep. check      Graph separation    D-separation
Indep. props.     Some                Some
Inference         MCMC, BP, etc.      Convert to Markov

18

Inference in Markov Networks

- Goal: compute marginals & conditionals of

  P(X) = (1/Z) exp( Σ_i w_i f_i(X) )        Z = Σ_X exp( Σ_i w_i f_i(X) )

- Exact inference is #P-complete
- Conditioning on the Markov blanket is easy:

  P(x | MB(x)) = exp( Σ_i w_i f_i(x) ) / ( exp( Σ_i w_i f_i(x=0) ) + exp( Σ_i w_i f_i(x=1) ) )

- Gibbs sampling exploits this
19

MCMC: Gibbs Sampling

state ← random truth assignment
for i ← 1 to num-samples do
    for each variable x
        sample x according to P(x | neighbors(x))
        state ← state with new value of x
P(F) ← fraction of states in which F is true

20
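A minimal sketch (assumed, not the tutorial's code) of the Gibbs sampler above for binary variables in a log-linear model, using the Markov-blanket conditional from the previous slide. For simplicity it rescores all features on every flip rather than only the Markov blanket; the feature representation matches the earlier brute-force sketch.

```python
import math
import random

def gibbs(variables, features, num_samples, query):
    """variables: list of names; features: list of (weight, fn(state)->0/1);
    query: fn(state)->bool. Returns the estimated P(query)."""
    state = {v: random.random() < 0.5 for v in variables}
    score = lambda s: sum(w * f(s) for w, f in features)
    hits = 0
    for _ in range(num_samples):
        for v in variables:
            s0 = dict(state, **{v: False})
            s1 = dict(state, **{v: True})
            # P(x=1 | rest) = exp(S(x=1)) / (exp(S(x=0)) + exp(S(x=1)))
            p1 = math.exp(score(s1)) / (math.exp(score(s0)) + math.exp(score(s1)))
            state[v] = random.random() < p1
        hits += query(state)
    return hits / num_samples

# Example with the Smoking/Cancer feature from the earlier slides:
feats = [(1.5, lambda s: 1.0 if (not s["Smoking"]) or s["Cancer"] else 0.0)]
print(gibbs(["Smoking", "Cancer"], feats, 5000, lambda s: s["Cancer"]))
```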

Other Inference Methods

- Belief propagation (sum-product)
- Mean field / variational approximations

21

MAP/MPE Inference

- Goal: Find the most likely state of the world given evidence

  max_y P(y | x)      (y: query, x: evidence)

22

MAP Inference Algorithms

- Iterated conditional modes
- Simulated annealing
- Graph cuts
- Belief propagation (max-product)
- LP relaxation

23

Overview

- Motivation
- Foundational areas
  - Probabilistic inference
  - Statistical learning
  - Logical inference
  - Inductive logic programming
- Markov logic
- NLP applications
  - Basics
  - Supervised learning
  - Unsupervised learning
24

Generative Weight Learning

- Maximize likelihood
- Use gradient ascent or L-BFGS
- No local maxima

  ∂/∂w_i log P_w(x) = n_i(x) − E_w[ n_i(x) ]

  n_i(x): no. of times feature i is true in the data
  E_w[ n_i(x) ]: expected no. of times feature i is true according to the model

- Requires inference at each step (slow!)
25

Pseudo-Likelihood

  PL(x) = Π_i P(x_i | neighbors(x_i))

- Likelihood of each variable given its neighbors in the data
- Does not require inference at each step
- Widely used in vision, spatial statistics, etc.
- But PL parameters may not work well for long inference chains
26
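A minimal sketch (assumed) of the pseudo-likelihood PL(x) = Π_i P(x_i | rest) for a binary log-linear model, reusing the feature-list representation from the earlier sketches. Each conditional is computed by flipping x_i and comparing the two scores.

```python
import math

def pseudo_likelihood(x, features):
    """x: dict of Boolean variables; features: list of (weight, fn(x) -> 0/1)."""
    score = lambda s: sum(w * f(s) for w, f in features)
    pl = 1.0
    for v in x:
        s_true, s_false = dict(x, **{v: True}), dict(x, **{v: False})
        p_true = math.exp(score(s_true)) / (math.exp(score(s_true)) + math.exp(score(s_false)))
        pl *= p_true if x[v] else (1.0 - p_true)
    return pl

feats = [(1.5, lambda s: 1.0 if (not s["Smoking"]) or s["Cancer"] else 0.0)]
print(pseudo_likelihood({"Smoking": True, "Cancer": False}, feats))
```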

Discriminative Weight Learning

- Maximize the conditional likelihood of query (y) given evidence (x):

  ∂/∂w_i log P_w(y | x) = n_i(x, y) − E_w[ n_i(x, y) ]

  n_i(x, y): no. of true groundings of clause i in the data
  E_w[ n_i(x, y) ]: expected no. of true groundings according to the model

- Approximate expected counts by counts in the MAP state of y given x
27

Voted Perceptron

- Originally proposed for training HMMs discriminatively
- Assumes the network is a linear chain
- Can be generalized to arbitrary networks

w_i ← 0
for t ← 1 to T do
    y_MAP ← Viterbi(x)
    w_i ← w_i + η [count_i(y_Data) − count_i(y_MAP)]
return Σ_t w_i / T
28
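A minimal sketch (assumed, not the tutorial's code) of the averaged-weight update above. `map_inference` stands in for Viterbi (or, later in the tutorial, MaxWalkSAT) and `counts` returns the feature/clause counts n_i of a labeling; both are placeholders the caller supplies.

```python
def voted_perceptron(x, y_data, counts, map_inference, num_features, T=10, eta=0.1):
    w = [0.0] * num_features
    w_sum = [0.0] * num_features
    for _ in range(T):
        y_map = map_inference(x, w)                 # MAP labeling under current weights
        n_data, n_map = counts(x, y_data), counts(x, y_map)
        for i in range(num_features):
            w[i] += eta * (n_data[i] - n_map[i])    # push weights toward the data counts
        w_sum = [s + wi for s, wi in zip(w_sum, w)]
    return [s / T for s in w_sum]                   # averaged weights
```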

Overview

- Motivation
- Foundational areas
  - Probabilistic inference
  - Statistical learning
  - Logical inference
  - Inductive logic programming
- Markov logic
- NLP applications
  - Basics
  - Supervised learning
  - Unsupervised learning
29

First-Order Logic

- Constants, variables, functions, predicates
  - E.g.: Anna, x, MotherOf(x), Friends(x, y)
- Literal: Predicate or its negation
- Clause: Disjunction of literals
- Grounding: Replace all variables by constants
  - E.g.: Friends(Anna, Bob)
- World (model, interpretation): Assignment of truth values to all ground predicates
30

Inference in First-Order Logic

- Traditionally done by theorem proving (e.g.: Prolog)
- Propositionalization followed by model checking turns out to be faster (often by a lot)
- Propositionalization: Create all ground atoms and clauses
- Model checking: Satisfiability testing
- Two main approaches:
  - Backtracking (e.g.: DPLL)
  - Stochastic local search (e.g.: WalkSAT)
31

Satisfiability

- Input: Set of clauses (convert the KB to conjunctive normal form (CNF))
- Output: Truth assignment that satisfies all clauses, or failure
- The paradigmatic NP-complete problem
- Solution: Search
- Key point: Most SAT problems are actually easy
- Hard region: Narrow range of #Clauses / #Variables
32

Stochastic Local Search

- Uses complete assignments instead of partial ones
- Start with a random state
- Flip variables in unsatisfied clauses
- Hill-climbing: Minimize # unsatisfied clauses
- Avoid local minima: Random flips
- Multiple restarts

33

The WalkSAT Algorithm

for i ← 1 to max-tries do
    solution ← random truth assignment
    for j ← 1 to max-flips do
        if all clauses satisfied then return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip variable in c that maximizes # satisfied clauses
return failure

34
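A minimal sketch (assumed, not Alchemy's implementation) of WalkSAT in Python. A clause is a list of (variable, sign) literals, with sign True meaning the positive literal; the example problem at the end is illustrative.

```python
import random

def satisfied(clause, assign):
    return any(assign[v] == sign for v, sign in clause)

def num_satisfied(clauses, assign):
    return sum(satisfied(c, assign) for c in clauses)

def walksat(clauses, variables, max_tries=10, max_flips=1000, p=0.5):
    for _ in range(max_tries):
        assign = {v: random.random() < 0.5 for v in variables}
        for _ in range(max_flips):
            unsat = [c for c in clauses if not satisfied(c, assign)]
            if not unsat:
                return assign                      # all clauses satisfied
            c = random.choice(unsat)               # random unsatisfied clause
            if random.random() < p:
                v, _ = random.choice(c)            # random-walk step
            else:                                  # greedy step: best flip within c
                def gain(var):
                    assign[var] = not assign[var]
                    s = num_satisfied(clauses, assign)
                    assign[var] = not assign[var]
                    return s
                v = max((var for var, _ in c), key=gain)
            assign[v] = not assign[v]
    return None                                    # failure

# Example: (x1 v ~x2) ^ (x2 v x3) ^ (~x1 v ~x3)
clauses = [[("x1", True), ("x2", False)],
           [("x2", True), ("x3", True)],
           [("x1", False), ("x3", False)]]
print(walksat(clauses, ["x1", "x2", "x3"]))
```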

Overview

- Motivation
- Foundational areas
  - Probabilistic inference
  - Statistical learning
  - Logical inference
  - Inductive logic programming
- Markov logic
- NLP applications
  - Basics
  - Supervised learning
  - Unsupervised learning
35

Rule Induction

- Given: Set of positive and negative examples of some concept
  - Example: (x1, x2, ..., xn, y)
  - y: concept (Boolean)
  - x1, x2, ..., xn: attributes (assume Boolean)
- Goal: Induce a set of rules that cover all positive examples and no negative ones
  - Rule: xa ^ xb ^ ... => y  (xa: literal, i.e., xi or its negation)
  - Same as a Horn clause: Body => Head
  - Rule r covers example x iff x satisfies the body of r
  - Eval(r): accuracy, info gain, coverage, support, etc.
36

Learning a Single Rule

head ← y
body ← Ø
repeat
    for each literal x
        r_x ← r with x added to body
        Eval(r_x)
    body ← body ^ best x
until no x improves Eval(r)
return r
37

Learning a Set of Rules

R ← Ø
S ← examples
repeat
    learn a single rule r
    R ← R ∪ {r}
    S ← S − positive examples covered by r
until S = Ø
return R
38
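A minimal sketch (assumed) of the two procedures above as propositional sequential covering: greedily grow one conjunctive rule, then remove the positives it covers and repeat. Examples are dicts of Boolean attributes, a rule is a list of (attribute, value) literals, and Eval(r) is instantiated here as accuracy; all of these representational choices are illustrative.

```python
def covers(rule, x):
    return all(x[a] == v for a, v in rule)

def accuracy(rule, pos, neg):
    p = sum(covers(rule, x) for x in pos)
    n = sum(covers(rule, x) for x in neg)
    return p / (p + n) if p + n else 0.0

def learn_single_rule(pos, neg, attrs):
    rule = []
    while True:
        candidates = [(a, v) for a in attrs for v in (True, False) if (a, v) not in rule]
        if not candidates:
            return rule
        best = max(candidates, key=lambda lit: accuracy(rule + [lit], pos, neg))
        if accuracy(rule + [best], pos, neg) <= accuracy(rule, pos, neg):
            return rule          # no literal improves Eval(r)
        rule.append(best)

def learn_rule_set(pos, neg, attrs):
    rules, remaining = [], list(pos)
    while remaining:
        r = learn_single_rule(remaining, neg, attrs)
        rules.append(r)
        covered = [x for x in remaining if covers(r, x)]
        if not covered:
            break                # stop if nothing new is covered
        remaining = [x for x in remaining if not covers(r, x)]
    return rules
```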

First-Order Rule Induction

- y and the x_i are now predicates with arguments
  - E.g.: y is Ancestor(x,y), x_i is Parent(x,y)
- Literals to add are predicates or their negations
- The literal to add must include at least one variable already appearing in the rule
- Adding a literal changes the # of groundings of the rule
  - E.g.: Ancestor(x,z) ^ Parent(z,y) => Ancestor(x,y)
- Eval(r) must take this into account
  - E.g.: Multiply by the # of positive groundings of the rule still covered after adding the literal
39

Overview

- Motivation
- Foundational areas
- Markov logic
- NLP applications
  - Basics
  - Supervised learning
  - Unsupervised learning

40

Markov Logic

- Syntax: Weighted first-order formulas
- Semantics: Feature templates for Markov networks
- Intuition: Soften logical constraints
  - Give each formula a weight (higher weight ⇒ stronger constraint)

  P(world) ∝ exp( Σ weights of formulas it satisfies )
41

Example: Coreference Resolution

Mentions of Obama are often headed by "Obama"
Mentions of Obama are often headed by "President"
Appositions usually refer to the same entity

Barack Obama, the 44th President of the United States, is the first African American to hold the office. ……

42

Example: Coreference Resolution

∀x MentionOf(x, Obama) ⇒ Head(x, "Obama")
∀x MentionOf(x, Obama) ⇒ Head(x, "President")
∀x,y,c Apposition(x, y) ∧ MentionOf(x, c) ⇒ MentionOf(y, c)

43

Example: Coreference Resolution

1.5   ∀x MentionOf(x, Obama) ⇒ Head(x, "Obama")
0.8   ∀x MentionOf(x, Obama) ⇒ Head(x, "President")
100   ∀x,y,c Apposition(x, y) ∧ MentionOf(x, c) ⇒ MentionOf(y, c)

44

Example: Coreference Resolution

1.5   ∀x MentionOf(x, Obama) ⇒ Head(x, "Obama")
0.8   ∀x MentionOf(x, Obama) ⇒ Head(x, "President")
100   ∀x,y,c Apposition(x, y) ∧ MentionOf(x, c) ⇒ MentionOf(y, c)

Two mention constants: A and B

[Ground Markov network over the atoms: MentionOf(A,Obama), MentionOf(B,Obama),
 Head(A,"Obama"), Head(B,"Obama"), Head(A,"President"), Head(B,"President"),
 Apposition(A,B), Apposition(B,A)]

45

Markov Logic Networks

- An MLN is a template for ground Markov nets
- Probability of a world x:

  P(x) = (1/Z) exp( Σ_i w_i n_i(x) )

  w_i: weight of formula i
  n_i(x): no. of true groundings of formula i in x

- Typed variables and constants greatly reduce the size of the ground Markov net
- Functions, existential quantifiers, etc.
- Can handle infinite domains [Singla & Domingos, 2007] and continuous domains [Wang & Domingos, 2008]
46
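A minimal sketch (assumed, not Alchemy) of how an MLN scores a world: count the true groundings n_i(x) of a formula over a finite domain and form the unnormalized weight exp(Σ_i w_i n_i(x)). The coreference formula, its weight, and the two-mention domain follow the example on the previous slides; the world shown is illustrative.

```python
import itertools
import math

mentions = ["A", "B"]
entities = ["Obama"]

# A possible world: truth values of ground atoms (unlisted atoms are false).
world = {
    ("Apposition", "A", "B"): True,
    ("MentionOf", "A", "Obama"): True,
    ("MentionOf", "B", "Obama"): True,
    ("Head", "A", "Obama"): True,
    ("Head", "B", "President"): True,
}
def holds(*atom):
    return world.get(atom, False)

# Formula (weight 100): Apposition(x,y) ^ MentionOf(x,c) => MentionOf(y,c)
def apposition_formula(x, y, c):
    return (not (holds("Apposition", x, y) and holds("MentionOf", x, c))) \
           or holds("MentionOf", y, c)

def n_true_groundings():
    return sum(apposition_formula(x, y, c)
               for x, y, c in itertools.product(mentions, mentions, entities))

w = 100.0
print("true groundings:", n_true_groundings())
print("unnormalized weight:", math.exp(w * n_true_groundings()))
```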

Relation to Statistical Models

- Special cases:
  - Markov networks
  - Markov random fields
  - Bayesian networks
  - Log-linear models
  - Exponential models
  - Max. entropy models
  - Gibbs distributions
  - Boltzmann machines
  - Logistic regression
  - Hidden Markov models
  - Conditional random fields
- Obtained by making all predicates zero-arity
- Markov logic allows objects to be interdependent (non-i.i.d.)

47

Relation to First-Order Logic

- Infinite weights ⇒ first-order logic
- Satisfiable KB, positive weights ⇒ satisfying assignments = modes of the distribution
- Markov logic allows contradictions between formulas

48

MLN Algorithms: The First Three Generations

Problem              First generation          Second generation   Third generation
MAP inference        Weighted satisfiability   Lazy inference      Cutting planes
Marginal inference   Gibbs sampling            MC-SAT              Lifted inference
Weight learning      Pseudolikelihood          Voted perceptron    Scaled conj. gradient
Structure learning   Inductive logic progr.    ILP + PL (etc.)     Clustering + pathfinding
49

MAP/MPE Inference

- Problem: Find the most likely state of the world given evidence

  max_y P(y | x)      (y: query, x: evidence)

50

MAP/MPE Inference

- Problem: Find the most likely state of the world given evidence

  max_y (1/Z_x) exp( Σ_i w_i n_i(x, y) )

51

MAP/MPE Inference

- Problem: Find the most likely state of the world given evidence

  max_y Σ_i w_i n_i(x, y)

52

MAP/MPE Inference

- Problem: Find the most likely state of the world given evidence

  max_y Σ_i w_i n_i(x, y)

- This is just the weighted MaxSAT problem
- Use a weighted SAT solver (e.g., MaxWalkSAT [Kautz et al., 1997])

53

The MaxWalkSAT Algorithm

for i ← 1 to max-tries do
    solution ← random truth assignment
    for j ← 1 to max-flips do
        if Σ weights(sat. clauses) > threshold then return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip variable in c that maximizes Σ weights(sat. clauses)
return failure, best solution found
54
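A minimal sketch (assumed) of MaxWalkSAT: the same local search as WalkSAT, but flips are chosen to maximize the total weight of satisfied clauses. Clauses are (weight, [(variable, sign), ...]) pairs; the default threshold and parameters are illustrative.

```python
import random

def sat(literals, a):
    return any(a[v] == s for v, s in literals)

def total_weight(clauses, a):
    return sum(w for w, lits in clauses if sat(lits, a))

def maxwalksat(clauses, variables, max_tries=10, max_flips=1000, p=0.5, threshold=None):
    target = threshold if threshold is not None else sum(w for w, _ in clauses)
    best, best_w = None, float("-inf")
    for _ in range(max_tries):
        a = {v: random.random() < 0.5 for v in variables}
        for _ in range(max_flips):
            w_now = total_weight(clauses, a)
            if w_now > best_w:
                best, best_w = dict(a), w_now      # remember the best solution found
            if w_now >= target:
                return a
            unsat = [lits for w, lits in clauses if not sat(lits, a)]
            if not unsat:
                return a
            c = random.choice(unsat)
            if random.random() < p:
                v = random.choice(c)[0]            # random-walk step
            else:                                  # greedy step: maximize satisfied weight
                def w_after(var):
                    a[var] = not a[var]
                    w = total_weight(clauses, a)
                    a[var] = not a[var]
                    return w
                v = max((var for var, _ in c), key=w_after)
            a[v] = not a[v]
    return best
```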

Computing Probabilities

- P(Formula | MLN, C) = ?
- MCMC: Sample worlds, check whether the formula holds
- P(Formula1 | Formula2, MLN, C) = ?
- If Formula2 = conjunction of ground atoms:
  - First construct the minimal subset of the network necessary to answer the query (generalization of KBMC)
  - Then apply MCMC

55

But ... Insufficient for Logic

- Problem:
  - Deterministic dependencies break MCMC
  - Near-deterministic ones make it very slow
- Solution: Combine MCMC and WalkSAT → MC-SAT algorithm [Poon & Domingos, 2006]

56

Auxiliary-Variable Methods

- Main ideas:
  - Use auxiliary variables to capture dependencies
  - Turn difficult sampling into uniform sampling
- Given a distribution P(x), define

  f(x, u) = 1 if 0 ≤ u ≤ P(x), 0 otherwise

  ∫ f(x, u) du = P(x)

- Sample from f(x, u), then discard u

57

Slice Sampling [Damien et al., 1999]

[Figure: given the current sample x(k), draw u(k) uniformly from [0, P(x(k))],
 then draw x(k+1) uniformly from the slice {x : P(x) ≥ u(k)}]

58

Slice Sampling

- Identifying the slice may be difficult

  P(x) = (1/Z) Π_i Φ_i(x)

- Introduce an auxiliary variable u_i for each Φ_i:

  f(x, u_1, ..., u_n) = 1 if 0 ≤ u_i ≤ Φ_i(x) for all i, 0 otherwise

59

The MC-SAT Algorithm

- Select a random subset M of the satisfied clauses
  - Each clause C_i is selected with probability 1 − exp(−w_i)
  - Larger w_i ⇒ C_i more likely to be selected
  - Hard clause (w_i → ∞): always selected
- Slice = states that satisfy all clauses in M
- Uses a SAT solver to sample x | u
- Orders of magnitude faster than Gibbs sampling, etc.

60

But ... It Is Not Scalable

- 1000 researchers
- Coauthor(x,y): 1 million ground atoms
- Coauthor(x,y) ^ Coauthor(y,z) => Coauthor(x,z): 1 billion ground clauses
- Exponential in arity

61

Sparsity to the Rescue

- 1000 researchers
- Coauthor(x,y): 1 million ground atoms
  - But ... most atoms are false
- Coauthor(x,y) ^ Coauthor(y,z) => Coauthor(x,z): 1 billion ground clauses
  - Most are trivially satisfied if most atoms are false
  - No need to explicitly compute most of them

62

Lazy Inference

- LazySAT [Singla & Domingos, 2006a]
  - Lazy version of WalkSAT [Selman et al., 1996]
  - Grounds atoms/clauses as needed
  - Greatly reduces memory usage
- The idea is much more general [Poon & Domingos, 2008a]

63

General Method for Lazy Inference

- If most variables assume the default value, it is wasteful to instantiate all variables / functions
- Main idea:
  - Allocate memory for a small subset of "active" variables / functions
  - Activate more as needed while inference proceeds
- Applicable to a diverse set of algorithms: satisfiability solvers (systematic, local-search), Markov chain Monte Carlo, MPE/MAP algorithms, maximum expected utility algorithms, belief propagation, MC-SAT, etc.
- Reduces memory and time by orders of magnitude
64

Lifted Inference

- Consider belief propagation (BP)
- Often in large problems, many nodes are interchangeable: they send and receive the same messages throughout BP
- Basic idea: Group them into supernodes, forming a lifted network
- Smaller network → faster inference
- Akin to resolution in first-order logic
65

Belief Propagation

[Factor graph: features (f) and nodes (x)]

  μ_{x→f}(x) = Π_{h ∈ n(x)\{f}} μ_{h→x}(x)

  μ_{f→x}(x) = Σ_{~{x}} ( e^{w_f f(x)} Π_{y ∈ n(f)\{x}} μ_{y→f}(y) )
66

Lifted Belief Propagation

[Lifted factor graph: interchangeable nodes grouped into supernodes]

  μ_{x→f}(x) = Π_{h ∈ n(x)\{f}} μ_{h→x}(x)

  μ_{f→x}(x) = Σ_{~{x}} ( e^{w_f f(x)} Π_{y ∈ n(f)\{x}} μ_{y→f}(y) )
67

Lifted Belief Propagation , : Functions of edge counts

 x f ( x )  



  h x ( x )

h  n ( x ) \{ f }

Features (f)

Nodes (x)

 wf  f  x (x)    e  ~{ x} 

(x)





y f

y  n ( f ) \{ x }

 ( y)  

68

Learning

- Data is a relational database
- Closed world assumption (if not: EM)
- Learning parameters (weights)
- Learning structure (formulas)

69

Parameter Learning

- Parameter tying: groundings of the same clause share a weight

  ∂/∂w_i log P(x) = n_i(x) − E_x[ n_i(x) ]

  n_i(x): no. of times clause i is true in the data
  E_x[ n_i(x) ]: expected no. of times clause i is true according to the MLN

- Generative learning: Pseudo-likelihood
- Discriminative learning: Conditional likelihood, use MC-SAT or MaxWalkSAT for inference
70

Parameter Learning

- Pseudo-likelihood + L-BFGS is fast and robust, but can give poor inference results
- Voted perceptron: Gradient descent + MAP inference
- Scaled conjugate gradient

71

Voted Perceptron for MLNs

- HMMs are a special case of MLNs
- Replace Viterbi by MaxWalkSAT
- Network can now be an arbitrary graph

w_i ← 0
for t ← 1 to T do
    y_MAP ← MaxWalkSAT(x)
    w_i ← w_i + η [count_i(y_Data) − count_i(y_MAP)]
return Σ_t w_i / T
72

Problem: Multiple Modes

- Not alleviated by contrastive divergence
- Alleviated by MC-SAT
- Warm start: Start each MC-SAT run at the previous end state

73

Problem: Extreme Ill-Conditioning

- Solvable by quasi-Newton, conjugate gradient, etc.
- But line searches require exact inference
- Solution: Scaled conjugate gradient [Lowd & Domingos, 2007]
  - Use the Hessian to choose the step size
  - Compute the quadratic form inside MC-SAT
  - Use the inverse diagonal Hessian as preconditioner
74

Structure Learning

- Standard inductive logic programming optimizes the wrong thing
- But it can be used to overgenerate candidates for L1 pruning
- Our approach: ILP + pseudo-likelihood + structure priors
- For each candidate structure change:
  - Start from the current weights and relax convergence
  - Use subsampling to compute sufficient statistics

75

Structure Learning

- Initial state: Unit clauses or prototype KB
- Operators: Add/remove literal, flip sign
- Evaluation function: Pseudo-likelihood + structure prior
- Search: Beam search, shortest-first search

76

Alchemy

Open-source software including:
- Full first-order logic syntax
- Generative & discriminative weight learning
- Structure learning
- Weighted satisfiability, MCMC, lifted BP
- Programming language features

alchemy.cs.washington.edu
77

                  Alchemy                           Prolog             BUGS
Representation    F.O. logic + Markov nets          Horn clauses       Bayes nets
Inference         Model checking, MCMC, lifted BP   Theorem proving    MCMC
Learning          Params. & structure               No                 Params.
Uncertainty       Yes                               No                 Yes
Relational        Yes                               Yes                No

78

Constrained Conditional Model

- Representation: Integer linear programs
  - Local classifiers + global constraints
- Inference: LP solver
- Parameter learning: None for constraints
  - Weights of soft constraints set heuristically
  - Local weights typically learned independently
- Structure learning: None to date
  - But see the latest development in NAACL-10
79

Running Alchemy

- Programs
  - Infer
  - Learnwts
  - Learnstruct
- Options
- MLN file
  - Types (optional)
  - Predicates
  - Formulas
- Database files

80

Overview

- Motivation
- Foundational areas
- Markov logic
- NLP applications
  - Basics
  - Supervised learning
  - Unsupervised learning

81

Uniform Distribn.: Empty MLN

Example: Unbiased coin flips
Type:       flip = { 1, ..., 20 }
Predicate:  Heads(flip)

  P(Heads(f)) = ( (1/Z) e^0 ) / ( (1/Z) e^0 + (1/Z) e^0 ) = 1/2

82

Binomial Distribn.: Unit Clause

Example: Biased coin flips
Type:       flip = { 1, ..., 20 }
Predicate:  Heads(flip)
Formula:    Heads(f)
Weight:     log odds of heads, w = log( p / (1 − p) )

  P(Heads(f)) = ( (1/Z) e^w ) / ( (1/Z) e^w + (1/Z) e^0 ) = 1 / (1 + e^{−w}) = p

By default, an MLN includes unit clauses for all predicates (captures marginal distributions, etc.)
83
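A quick numeric check (illustrative, not from the tutorial) of the unit-clause weight above: with w = log(p / (1 − p)), the MLN probability e^w / (e^w + e^0) recovers the intended bias p.

```python
import math

for p in (0.1, 0.5, 0.9):
    w = math.log(p / (1 - p))                 # log odds of heads
    prob = math.exp(w) / (math.exp(w) + math.exp(0))
    print(p, round(prob, 6))                  # prob equals p up to rounding
```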

Multinomial Distribution

Example: Throwing die
Types:      throw = { 1, ..., 20 }
            face = { 1, ..., 6 }
Predicate:  Outcome(throw,face)
Formulas:   Outcome(t,f) ^ f != f' => !Outcome(t,f').
            Exist f Outcome(t,f).

Too cumbersome!

84

Multinomial Distrib.: ! Notation

Example: Throwing die
Types:      throw = { 1, ..., 20 }
            face = { 1, ..., 6 }
Predicate:  Outcome(throw,face!)
Formulas:

Semantics: Arguments without "!" determine arguments with "!".
Also makes inference more efficient (triggers blocking).

85

Multinomial Distrib.: + Notation

Example: Throwing biased die
Types:      throw = { 1, ..., 20 }
            face = { 1, ..., 6 }
Predicate:  Outcome(throw,face!)
Formulas:   Outcome(t,+f)

Semantics: Learn weight for each grounding of args with “+”.

86

Logistic Regression (MaxEnt)

Logistic regression:   log( P(C=1 | F=f) / P(C=0 | F=f) ) = a + Σ_i b_i f_i

Type:                 obj = { 1, ..., n }
Query predicate:      C(obj)
Evidence predicates:  Fi(obj)
Formulas:             a    C(x)
                      bi   Fi(x) ^ C(x)

Resulting distribution:

  P(C = c, F = f) = (1/Z) exp( a c + Σ_i b_i f_i c )

Therefore:

  log( P(C=1 | F=f) / P(C=0 | F=f) ) = log( exp(a + Σ_i b_i f_i) / exp(0) ) = a + Σ_i b_i f_i

Alternative form:     bi   Fi(x) => C(x)
87
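A small numeric check (illustrative) that the MLN formulas on this slide reproduce logistic regression: enumerate C in {0,1} for fixed evidence f and compare the conditional P(C=1 | F=f) with the sigmoid of a + Σ_i b_i f_i. The weights and evidence values are made up for the example.

```python
import math

a, b = -0.5, [1.2, -0.7]          # illustrative weights for C(x) and Fi(x) ^ C(x)
f = [1, 1]                        # observed evidence values

def unnorm(c):
    # MLN weight of the world: a*c for the unit clause, b_i*f_i*c for each conjunction
    return math.exp(a * c + sum(bi * fi * c for bi, fi in zip(b, f)))

p1 = unnorm(1) / (unnorm(0) + unnorm(1))
sigmoid = 1 / (1 + math.exp(-(a + sum(bi * fi for bi, fi in zip(b, f)))))
print(round(p1, 6), round(sigmoid, 6))   # the two agree
```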

Hidden Markov Models

obs   = { Red, Green, Yellow }
state = { Stop, Drive, Slow }
time  = { 0, ..., 100 }

State(state!,time)
Obs(obs!,time)

State(+s,0)
State(+s,t) ^ State(+s',t+1)
Obs(+o,t) ^ State(+s,t)

Sparse HMM:
State(s,t) => State(s1,t+1) v State(s2,t+1) v ... .
88

Bayesian Networks

- Use all binary predicates with the same first argument (the object x)
- One predicate for each variable A: A(x,v!)
- One clause for each line in the CPT and value of the variable
- Context-specific independence: One clause for each path in the decision tree
- Logistic regression: As before
- Noisy OR: Deterministic OR + pairwise clauses
89

Relational Models

- Knowledge-based model construction
  - Allow only Horn clauses
  - Same as Bayes nets, except arbitrary relations
  - Combin. function: logistic regression, noisy-OR, or external
- Stochastic logic programs
  - Allow only Horn clauses
  - Weight of clause = log(p)
  - Add formulas: Head holds ⇒ exactly one body holds
- Probabilistic relational models
  - Allow only binary relations
  - Same as Bayes nets, except the first argument can vary
90

Relational Models

- Relational Markov networks
  - SQL → Datalog → First-order logic
  - One clause for each state of a clique
  - "+" syntax in Alchemy facilitates this
- Bayesian logic
  - Object = cluster of similar/related observations
  - Observation constants + object constants
  - Predicate InstanceOf(Obs,Obj) and clauses using it
- Unknown relations: second-order Markov logic
  S. Kok & P. Domingos, "Statistical Predicate Invention", in Proc. ICML-2007.
91

Overview

- Motivation
- Foundational areas
- Markov logic
- NLP applications
  - Basics
  - Supervised learning
  - Unsupervised learning

92

Text Classification The 56th quadrennial United States presidential election was held on November 4, 2008. Outgoing Republican President George W. Bush's policies and actions and the American public's desire for change were key issues throughout the campaign. ……

Topic = politics

The Chicago Bulls are an American professional basketball team based in Chicago, Illinois, playing in the Central Division of the Eastern Conference in the National Basketball Association (NBA). ……

Topic = sports

……

93

Text Classification

page  = { 1, ..., max }
word  = { ... }
topic = { ... }

Topic(page,topic)
HasWord(page,word)

Topic(p,t)
HasWord(p,+w) => Topic(p,+t)

If topics are mutually exclusive: Topic(page,topic!)

94

Text Classification

page  = { 1, ..., max }
word  = { ... }
topic = { ... }

Topic(page,topic)
HasWord(page,word)
Links(page,page)

Topic(p,t)
HasWord(p,+w) => Topic(p,+t)
Topic(p,t) ^ Links(p,p') => Topic(p',t)

Cf. S. Chakrabarti, B. Dom & P. Indyk, “Hypertext Classification Using Hyperlinks,” in Proc. SIGMOD-1998. 95

Entity Resolution AUTHOR: H. POON & P. DOMINGOS TITLE: UNSUPERVISED SEMANTIC PARSING VENUE: EMNLP-09 AUTHOR: Hoifung Poon and Pedro Domings TITLE: Unsupervised semantic parsing VENUE: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

SAME?

AUTHOR: Poon, Hoifung and Domings, Pedro TITLE: Unsupervised ontology induction from text VENUE: Proceedings of the Forty-Eighth Annual Meeting of the Association for Computational Linguistics SAME? AUTHOR: H. Poon, P. Domings TITLE: Unsupervised ontology induction VENUE: ACL-10

96

Entity Resolution

Problem: Given a database, find duplicate records

HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)

HasToken(+t,+f,r) ^ HasToken(+t,+f,r') => SameField(f,r,r')
SameField(f,r,r') => SameRecord(r,r')

97

Entity Resolution

Problem: Given a database, find duplicate records

HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)

HasToken(+t,+f,r) ^ HasToken(+t,+f,r') => SameField(f,r,r')
SameField(f,r,r') => SameRecord(r,r')
SameRecord(r,r') ^ SameRecord(r',r") => SameRecord(r,r")

Cf. A. McCallum & B. Wellner, “Conditional Models of Identity Uncertainty with Application to Noun Coreference,” in Adv. NIPS 17, 2005. 98

Entity Resolution

Can also resolve fields:

HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)

HasToken(+t,+f,r) ^ HasToken(+t,+f,r') => SameField(f,r,r')
SameField(f,r,r') <=> SameRecord(r,r')
SameRecord(r,r') ^ SameRecord(r',r") => SameRecord(r,r")
SameField(f,r,r') ^ SameField(f,r',r") => SameField(f,r,r")

More: P. Singla & P. Domingos, "Entity Resolution with Markov Logic", in Proc. ICDM-2006.

99

Information Extraction Unsupervised Semantic Parsing, Hoifung Poon and Pedro Domingos. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. Singapore: ACL.

UNSUPERVISED SEMANTIC PARSING. H. POON & P. DOMINGOS. EMNLP-2009.

100

Information Extraction Author

Title

Venue

Unsupervised Semantic Parsing, Hoifung Poon and Pedro Domingos. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. Singapore: ACL.

SAME? UNSUPERVISED SEMANTIC PARSING. H. POON & P. DOMINGOS. EMNLP-2009.

101

Information Extraction

- Problem: Extract a database from text or semi-structured sources
- Example: Extract a database of publications from citation list(s) (the "CiteSeer problem")
- Two steps:
  - Segmentation: Use an HMM to assign tokens to fields
  - Entity resolution: Use logistic regression and transitivity
102

Information Extraction

Token(token, position, citation)
InField(position, field!, citation)
SameField(field, citation, citation)
SameCit(citation, citation)

Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) ^ InField(i+1,+f,c)

Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i',c') ^ InField(i',+f,c') => SameField(+f,c,c')
SameField(+f,c,c') <=> SameCit(c,c')
SameField(f,c,c') ^ SameField(f,c',c") => SameField(f,c,c")
SameCit(c,c') ^ SameCit(c',c") => SameCit(c,c")

103

Information Extraction

Token(token, position, citation)
InField(position, field!, citation)
SameField(field, citation, citation)
SameCit(citation, citation)

Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) ^ !Token(".",i,c) ^ InField(i+1,+f,c)

Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i',c') ^ InField(i',+f,c') => SameField(+f,c,c')
SameField(+f,c,c') <=> SameCit(c,c')
SameField(f,c,c') ^ SameField(f,c',c") => SameField(f,c,c")
SameCit(c,c') ^ SameCit(c',c") => SameCit(c,c")

More: H. Poon & P. Domingos, “Joint Inference in Information Extraction”, in Proc. AAAI-2007. 104

Biomedical Text Mining

- Traditionally, named entity recognition or information extraction
  - E.g., protein recognition, protein-protein interaction identification
- BioNLP-09 shared task: Nested bio-events
  - Much harder than traditional IE
  - Top F1 around 50%
  - Naturally calls for joint inference

105

Bio-Event Extraction

Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...

[Event graph: involvement (Theme: up-regulation, Cause: activation);
 up-regulation (Theme: IL-10, Cause: gp41, Site: human monocyte);
 activation (Theme: p70(S6)-kinase)]
106

Bio-Event Extraction

Token(position, token)
DepEdge(position, position, dependency)
IsProtein(position)
EvtType(position, evtType)
InArgPath(position, position, argType!)

Logistic regression:

Token(i,+w) => EvtType(i,+t)
Token(j,w) ^ DepEdge(i,j,+d) => EvtType(i,+t)
DepEdge(i,j,+d) => InArgPath(i,j,+a)
Token(i,+w) ^ DepEdge(i,j,+d) => InArgPath(i,j,+a)
...

107

Bio-Event Extraction

Token(position, token)
DepEdge(position, position, dependency)
IsProtein(position)
EvtType(position, evtType)
InArgPath(position, position, argType!)

Token(i,+w) => EvtType(i,+t)
Token(j,w) ^ DepEdge(i,j,+d) => EvtType(i,+t)
DepEdge(i,j,+d) => InArgPath(i,j,+a)
Token(i,+w) ^ DepEdge(i,j,+d) => InArgPath(i,j,+a)
...

Joint inference rules (adding a few of these doubles the F1):

InArgPath(i,j,Theme) => IsProtein(j) v (Exist k k!=i ^ InArgPath(j, k, Theme)).
...

More: H. Poon and L. Vanderwende, “Joint Inference for Knowledge Extraction from Biomedical Literature”, 10:40 am, June 4, Gold Room.

108

Temporal Information Extraction

- Identify event times and temporal relations (BEFORE, AFTER, OVERLAP)
- E.g., who is the President of the U.S.A.?
  - Obama: 1/20/2009 → present
  - G. W. Bush: 1/20/2001 → 1/19/2009
  - Etc.

109

Temporal Information Extraction

DepEdge(position, position, dependency)
Event(position, event)
After(event, event)

DepEdge(i,j,+d) ^ Event(i,p) ^ Event(j,q) => After(p,q)

After(p,q) ^ After(q,r) => After(p,r)

110

Temporal Information Extraction

DepEdge(position, position, dependency)
Event(position, event)
After(event, event)
Role(position, position, role)

DepEdge(i,j,+d) ^ Event(i,p) ^ Event(j,q) => After(p,q)
Role(i,j,ROLE-AFTER) ^ Event(i,p) ^ Event(j,q) => After(p,q)
After(p,q) ^ After(q,r) => After(p,r)

More:

K. Yoshikawa, S. Riedel, M. Asahara and Y. Matsumoto, “Jointly Identifying Temporal Relations with Markov Logic”, in Proc. ACL-2009. X. Ling & D. Weld, “Temporal Information Extraction”, in Proc. AAAI-2010. 111

Semantic Role Labeling

- Problem: Identify arguments for a predicate
- Two steps:
  - Argument identification: Determine whether a phrase is an argument
  - Role classification: Determine the type of an argument (agent, theme, temporal, adjunct, etc.)

112

Semantic Role Labeling

Token(position, token)
DepPath(position, position, path)
IsPredicate(position)
Role(position, position, role!)
HasRole(position, position)

Token(i,+t) => IsPredicate(i)
DepPath(i,j,+p) => Role(i,j,+r)

HasRole(i,j) => IsPredicate(i)
IsPredicate(i) => Exist j HasRole(i,j)
HasRole(i,j) => Exist r Role(i,j,r)
Role(i,j,r) => HasRole(i,j)

Cf. K. Toutanova, A. Haghighi, C. Manning, “A global joint model for semantic role labeling”, in Computational Linguistics 2008.

113

Joint Semantic Role Labeling and Word Sense Disambiguation

Token(position, token)
DepPath(position, position, path)
IsPredicate(position)
Role(position, position, role!)
HasRole(position, position)
Sense(position, sense!)

Token(i,+t) => IsPredicate(i)
DepPath(i,j,+p) => Role(i,j,+r)
Sense(i,s) => IsPredicate(i)
HasRole(i,j) => IsPredicate(i)
IsPredicate(i) => Exist j HasRole(i,j)
HasRole(i,j) => Exist r Role(i,j,r)
Role(i,j,r) => HasRole(i,j)
Token(i,+t) ^ Role(i,j,+r) => Sense(i,+s)

More: I. Meza-Ruiz & S. Riedel, “Jointly Identifying Predicates, Arguments and Senses using Markov Logic”, in Proc. NAACL-2009.

114

Practical Tips: Modeling

- Add all unit clauses (the default)
- How to handle uncertain data:
  R(x,y) ^ R'(x,y)   (the "HMM trick")
- Implications vs. conjunctions
  - For soft correlations, conjunctions are often better
  - Implication: A => B is equivalent to !(A ^ !B)
  - Shares cases with other implications such as A => C
  - Makes learning unnecessarily harder
115

Practical Tips: Efficiency

- Open/closed world assumptions
- Low clause arities
- Low numbers of constants
- Short inference chains

116

Practical Tips: Development

- Start with easy components
- Gradually expand to the full task
- Use the simplest MLN that works
- Cycle: Add/delete formulas, learn and test

117

Overview

- Motivation
- Foundational areas
- Markov logic
- NLP applications
  - Basics
  - Supervised learning
  - Unsupervised learning

118

Unsupervised Learning: Why?

- Virtually unlimited supply of unlabeled text
- Labeling is expensive (cf. Penn Treebank)
- Often difficult to label with consistency and high quality (e.g., semantic parses)
- Emerging field: Machine reading
  - Extract knowledge from unstructured text with high precision/recall and minimal human effort
  - Check out the LBR Workshop (WS9) on Sunday
119

Unsupervised Learning: How?

- I.i.d. learning: Sophisticated model requires more labeled data
- Statistical relational learning: Sophisticated model may require less labeled data
  - Relational dependencies constrain the problem space
  - One formula is worth a thousand labels
- Small amount of domain knowledge ⇒ large-scale joint inference
120

Unsupervised Learning: How?

- Ambiguities vary among objects
- Joint inference ⇒ propagate information from unambiguous objects to ambiguous ones
- E.g.:  G. W. Bush ...  He ...  ... Mrs. Bush ...
  - Are "He" and "Mrs. Bush" coreferent with "G. W. Bush"?
  - "He" should be coreferent with "G. W. Bush", so it must be singular male
  - "Mrs. Bush" must be singular female
  - Verdict: "He" and "Mrs. Bush" are not coreferent
121-125

Parameter Learning

- Marginalize out the hidden variables z:

  ∂/∂w_i log P(x) = E_{z|x}[ n_i(x, z) ] − E_{x,z}[ n_i(x, z) ]

  First expectation: sum over z, conditioned on the observed x
  Second expectation: summed over both x and z

- Use MC-SAT to approximate both expectations
- May also combine with contrastive estimation [Poon & Cherry & Toutanova, NAACL-2009]
126

Unsupervised Coreference Resolution

Head(mention, string)
Type(mention, type)
MentionOf(mention, entity)

Mixture model:
MentionOf(+m,+e)
Type(+m,+t)
Head(+m,+h) ^ MentionOf(+m,+e)

Joint inference formulas: Enforce agreement
MentionOf(a,e) ^ MentionOf(b,e) => (Type(a,t) <=> Type(b,t))
... (similarly for Number, Gender, etc.)

127

Unsupervised Coreference Resolution

Head(mention, string)
Type(mention, type)
MentionOf(mention, entity)
Apposition(mention, mention)

MentionOf(+m,+e)
Type(+m,+t)
Head(+m,+h) ^ MentionOf(+m,+e)
MentionOf(a,e) ^ MentionOf(b,e) => (Type(a,t) <=> Type(b,t))
... (similarly for Number, Gender, etc.)

Joint inference formulas: Leverage apposition
Apposition(a,b) => (MentionOf(a,e) <=> MentionOf(b,e))

More: H. Poon and P. Domingos, “Joint Unsupervised Coreference Resolution with Markov Logic”, in Proc. EMNLP-2008.

128

Relational Clustering: Discover Unknown Predicates

- Cluster relations along with objects
- Use second-order Markov logic [Kok & Domingos, 2007, 2008]
- Key idea: Cluster combination determines the likelihood of relations

  InClust(r,+c) ^ InClust(x,+a) ^ InClust(y,+b) => r(x,y)

- Input: Relational tuples extracted by TextRunner [Banko et al., 2007]
- Output: Semantic network
129

Recursive Relational Clustering

- Unsupervised semantic parsing [Poon & Domingos, EMNLP-2009]
- Text ⇒ Knowledge
  - Start directly from text
  - Identify meaning units + resolve variations
  - Use high-order Markov logic (variables over arbitrary lambda forms and their clusters)
- End-to-end machine reading: Read text, then answer questions
130

Semantic Parsing

"IL-4 protein induces CD11b"
⇒ INDUCE(e1), INDUCER(e1,e2), INDUCED(e1,e3), IL-4(e2), CD11B(e3)

Structured prediction: Partition + Assignment

[Figure: the dependency tree (induces -nsubj-> protein -nn-> IL-4; induces -dobj-> CD11b)
 is partitioned into parts assigned to the clusters INDUCE, INDUCER, INDUCED, IL-4, CD11B]
131

Challenge: Same Meaning, Many Variations

IL-4 up-regulates CD11b
Protein IL-4 enhances the expression of CD11b
CD11b expression is induced by IL-4 protein
The cytokine interleukin-4 induces CD11b expression
IL-4's up-regulation of CD11b, ...
...

132

Unsupervised Semantic Parsing

- USP ⇒ Recursively cluster arbitrary expressions composed with / by similar expressions

  IL-4 induces CD11b
  Protein IL-4 enhances the expression of CD11b
  CD11b expression is enhanced by IL-4 protein
  The cytokine interleukin-4 induces CD11b expression
  IL-4's up-regulation of CD11b, ...
133

Unsupervised Semantic Parsing

- USP ⇒ Recursively cluster arbitrary expressions composed with / by similar expressions

  IL-4 induces CD11b
  Protein IL-4 enhances the expression of CD11b
  CD11b expression is enhanced by IL-4 protein
  The cytokine interleukin-4 induces CD11b expression
  IL-4's up-regulation of CD11b, ...

Cluster same forms at the atom level
134

Unsupervised Semantic Parsing

- USP ⇒ Recursively cluster arbitrary expressions composed with / by similar expressions

  IL-4 induces CD11b
  Protein IL-4 enhances the expression of CD11b
  CD11b expression is enhanced by IL-4 protein
  The cytokine interleukin-4 induces CD11b expression
  IL-4's up-regulation of CD11b, ...

Cluster forms in composition with same forms
135


Unsupervised Semantic Parsing

- Exponential prior on the number of parameters
- Event/object/property cluster mixtures:

  InClust(e,+c) ^ HasValue(e,+v)

[Figure: object/event cluster INDUCE with a core-form mixture (induces 0.1, enhances 0.4, ...);
 property cluster INDUCER with mixtures over argument forms (IL-4 0.2, IL-8 0.1, ...),
 number (None 0.1, One 0.8, ...), and syntax (nsubj 0.5, agent 0.4, ...)]
139

But ... State Space Too Large

- Coreference: #-clusters ~ #-mentions
- USP: #-clusters ~ exp(#-tokens)
- Also, meaning units are often small, with many singleton clusters
- ⇒ Use combinatorial search

140

Inference: Hill-Climb Probability

[Figure: initialize from the dependency parse of "IL-4 protein induces CD11b"
 (induces -nsubj-> protein -nn-> IL-4; induces -dobj-> CD11b); a search operator
 such as lambda reduction composes subexpressions (e.g., "protein" with "IL-4")
 into larger meaning units, hill-climbing on probability]
141

Learning: Hill-Climb Likelihood

[Figure: initialize with one cluster per form (induces 1, enhances 1, IL-4 1, protein 1);
 search operators: MERGE (e.g., merge "induces" and "enhances" into one cluster with
 mixture induces 0.2 / enhances 0.8) and COMPOSE (e.g., form the unit "IL-4 protein")]
142

Unsupervised Ontology Induction

- Limitations of USP:
  - No ISA hierarchy among clusters
  - Little smoothing
  - Limited capability to generalize
- OntoUSP [Poon & Domingos, ACL-2010]
  - Extends USP to also induce an ISA hierarchy
  - Joint approach for ontology induction, population, and knowledge extraction
  - To appear in ACL (see you in Uppsala :-)
143

OntoUSP

- Modify the cluster mixture formula:

  InClust(e,c) ^ ISA(c,+d) ^ HasValue(e,+v)

- Hierarchical smoothing + clustering
- New operator in learning: ABSTRACTION

[Figure: instead of a flat MERGE (e.g., with REGULATE?), ABSTRACTION introduces an ISA
 hierarchy in which INDUCE (induces 0.6, up-regulates 0.2, ...) and INHIBIT
 (inhibits 0.4, suppresses 0.2, ...) become children of a more abstract cluster]
144

End of The Beginning ...

- Not merely a user guide to MLN and Alchemy
- Statistical relational learning: a growth area for machine learning and NLP

145

Future Work: Inference

- Scale up inference
  - Cutting-plane methods (e.g., [Riedel, 2008])
  - Unify lifted inference with sampling
  - Coarse-to-fine inference
- Alternative technology
  - E.g., linear programming, Lagrangian relaxation

146

Future Work: Supervised Learning

- Alternative optimization objectives
  - E.g., max-margin learning [Huynh & Mooney, 2009]
- Learning for efficient inference
  - E.g., learning arithmetic circuits [Lowd & Domingos, 2008]
- Structure learning: Improve accuracy and scalability
  - E.g., [Kok & Domingos, 2009]

147

Future Work: Unsupervised Learning

- Model: Learning objective, formalism, etc.
- Learning: Local optima, intractability, etc.
- Hyperparameter tuning
- Leverage available resources
  - Semi-supervised learning
  - Multi-task learning
  - Transfer learning (e.g., domain adaptation)
- Human in the loop
  - E.g., interactive ML, active learning, crowdsourcing
148

Future Work: NLP Applications

- Existing application areas:
  - More joint inference opportunities
  - Additional domain knowledge
  - Combine multiple pipeline stages
- A "killer app": Machine reading
- Many, many more awaiting YOU to discover

149

Summary

- We need to unify logical and statistical NLP
- Markov logic provides a language for this
  - Syntax: Weighted first-order formulas
  - Semantics: Feature templates of Markov nets
  - Inference: Satisfiability, MCMC, lifted BP, etc.
  - Learning: Pseudo-likelihood, VP, PSCG, ILP, etc.
- Growing set of NLP applications
- Open-source software: Alchemy
  alchemy.cs.washington.edu
- Book: Domingos & Lowd, Markov Logic, Morgan & Claypool, 2009.

150

References

[Banko et al., 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, Oren Etzioni, "Open Information Extraction from the Web", in Proc. IJCAI-2007.
[Chakrabarti et al., 1998] Soumen Chakrabarti, Byron Dom, Piotr Indyk, "Hypertext Classification Using Hyperlinks", in Proc. SIGMOD-1998.
[Damien et al., 1999] Paul Damien, Jon Wakefield, Stephen Walker, "Gibbs sampling for Bayesian non-conjugate and hierarchical models by auxiliary variables", Journal of the Royal Statistical Society B, 61:2.
[Domingos & Lowd, 2009] Pedro Domingos and Daniel Lowd, Markov Logic, Morgan & Claypool.
[Friedman et al., 1999] Nir Friedman, Lise Getoor, Daphne Koller, Avi Pfeffer, "Learning probabilistic relational models", in Proc. IJCAI-1999.

References

[Halpern, 1990] Joe Halpern, "An analysis of first-order logics of probability", Artificial Intelligence 46.
[Huynh & Mooney, 2009] Tuyen Huynh and Raymond Mooney, "Max-Margin Weight Learning for Markov Logic Networks", in Proc. ECML-2009.
[Kautz et al., 1997] Henry Kautz, Bart Selman, Yuejun Jiang, "A general stochastic approach to solving problems with hard and soft constraints", in The Satisfiability Problem: Theory and Applications. AMS.
[Kok & Domingos, 2007] Stanley Kok and Pedro Domingos, "Statistical Predicate Invention", in Proc. ICML-2007.
[Kok & Domingos, 2008] Stanley Kok and Pedro Domingos, "Extracting Semantic Networks from Text via Relational Clustering", in Proc. ECML-2008.

References

[Kok & Domingos, 2009] Stanley Kok and Pedro Domingos, "Learning Markov Logic Network Structure via Hypergraph Lifting", in Proc. ICML-2009.
[Ling & Weld, 2010] Xiao Ling and Daniel S. Weld, "Temporal Information Extraction", in Proc. AAAI-2010.
[Lowd & Domingos, 2007] Daniel Lowd and Pedro Domingos, "Efficient Weight Learning for Markov Logic Networks", in Proc. PKDD-2007.
[Lowd & Domingos, 2008] Daniel Lowd and Pedro Domingos, "Learning Arithmetic Circuits", in Proc. UAI-2008.

[Meza-Ruiz & Riedel, 2009] Ivan Meza-Ruiz and Sebastian Riedel, "Jointly Identifying Predicates, Arguments and Senses using Markov Logic", in Proc. NAACL-2009.

References

[Muggleton, 1996] Stephen Muggleton, "Stochastic logic programs", in Proc. ILP-1996.
[Nilsson, 1986] Nils Nilsson, "Probabilistic logic", Artificial Intelligence 28.
[Page et al., 1998] Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd, "The PageRank Citation Ranking: Bringing Order to the Web", Tech. Rept., Stanford University, 1998.
[Poon & Domingos, 2006] Hoifung Poon and Pedro Domingos, "Sound and Efficient Inference with Probabilistic and Deterministic Dependencies", in Proc. AAAI-2006.
[Poon & Domingos, 2007] Hoifung Poon and Pedro Domingos, "Joint Inference in Information Extraction", in Proc. AAAI-2007.

References

[Poon & Domingos, 2008a] Hoifung Poon, Pedro Domingos, Marc Sumner, "A General Method for Reducing the Complexity of Relational Inference and its Application to MCMC", in Proc. AAAI-2008.
[Poon & Domingos, 2008b] Hoifung Poon and Pedro Domingos, "Joint Unsupervised Coreference Resolution with Markov Logic", in Proc. EMNLP-2008.
[Poon & Domingos, 2009] Hoifung Poon and Pedro Domingos, "Unsupervised Semantic Parsing", in Proc. EMNLP-2009.
[Poon & Cherry & Toutanova, 2009] Hoifung Poon, Colin Cherry, Kristina Toutanova, "Unsupervised Morphological Segmentation with Log-Linear Models", in Proc. NAACL-2009.

References

[Poon & Vanderwende, 2010] Hoifung Poon and Lucy Vanderwende, "Joint Inference for Knowledge Extraction from Biomedical Literature", in Proc. NAACL-2010.
[Poon & Domingos, 2010] Hoifung Poon and Pedro Domingos, "Unsupervised Ontology Induction from Text", in Proc. ACL-2010.
[Riedel, 2008] Sebastian Riedel, "Improving the Accuracy and Efficiency of MAP Inference for Markov Logic", in Proc. UAI-2008.
[Riedel et al., 2009] Sebastian Riedel, Hong-Woo Chun, Toshihisa Takagi and Jun'ichi Tsujii, "A Markov Logic Approach to Bio-Molecular Event Extraction", in Proc. BioNLP 2009 Shared Task.
[Selman et al., 1996] Bart Selman, Henry Kautz, Bram Cohen, "Local search strategies for satisfiability testing", in Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge. AMS.

References

[Singla & Domingos, 2006a] Parag Singla and Pedro Domingos, "Memory-Efficient Inference in Relational Domains", in Proc. AAAI-2006.
[Singla & Domingos, 2006b] Parag Singla and Pedro Domingos, "Entity Resolution with Markov Logic", in Proc. ICDM-2006.
[Singla & Domingos, 2007] Parag Singla and Pedro Domingos, "Markov Logic in Infinite Domains", in Proc. UAI-2007.
[Singla & Domingos, 2008] Parag Singla and Pedro Domingos, "Lifted First-Order Belief Propagation", in Proc. AAAI-2008.

[Taskar et al., 2002] Ben Taskar, Pieter Abbeel, Daphne Koller, "Discriminative probabilistic models for relational data", in Proc. UAI-2002.

References

[Toutanova & Haghighi & Manning, 2008] Kristina Toutanova, Aria Haghighi, Chris Manning, "A global joint model for semantic role labeling", Computational Linguistics.
[Wang & Domingos, 2008] Jue Wang and Pedro Domingos, "Hybrid Markov Logic Networks", in Proc. AAAI-2008.
[Wellman et al., 1992] Michael Wellman, John S. Breese, Robert P. Goldman, "From knowledge bases to decision models", Knowledge Engineering Review 7.
[Yoshikawa et al., 2009] Katsumasa Yoshikawa, Sebastian Riedel, Masayuki Asahara and Yuji Matsumoto, "Jointly Identifying Temporal Relations with Markov Logic", in Proc. ACL-2009.

158