Types for Database Query Languages Polymorphism, Complexity, and Completeness

Author: Barnaby Turner
Stijn Vansummeren, Université Libre de Bruxelles

10 May 2010

Introduction

Extensive study of type systems for:
• General-purpose (Turing-complete) programming languages:
  ◦ ML [Hindley; 1969 - Milner; 1978 - Damas and Milner; 1982]
  ◦ Many others
• Database programming languages (higher-order functions + records + collections + ...):
  ◦ Extensible records [Wand; 1989 - Rémy; 1989, 1990]
  ◦ Generalized relational operators [Buneman and Ohori; 1996]
  ◦ Constrained types: HM(X) [Odersky, Sulzmann, and Wehr; 1999]
  ◦ Many others

How does this specialize to database query languages?
• Limited expressiveness (not Turing-complete) → complete type systems?
• No higher-order functions, no subtyping → complexity of typability?
• Only records, collections → specialized type inference algorithms?

Introduction

Results presented are from the following papers:
• On the Complexity of Deciding Typability in the Relational Algebra, Acta Informatica, 2005
• Polymorphic Type Inference for the Named Nested Relational Calculus, ACM TOCL, 2006
• Well-Definedness and Semantic Type-Checking for the Nested Relational Calculus, Theoretical Computer Science, 2007
• Unpublished notes

This is joint work with
• Dirk Van Gucht, Indiana University, USA
• Jan Van den Bussche, Hasselt University, Belgium

Introduction - Nested Relational Calculus NRC

Canonical Query Language for Complex Objects

Objects       o    ::=  c | (A : o, ..., B : o′) | {o, ..., o′}
Expressions   e    ::=  x | o | (A : e, ..., B : e′) | e.A
                     |  {} | {e} | e1 ∪ e2 | {e | x1 ∈ e1, ..., xn ∈ en}
Binder list   ∆    ::=  x1 ∈ e1, ..., xn ∈ en
Types         s, t ::=  int | string | ··· | (A : s, ..., B : t) | {s}

Example: {(A : y.C, B : z) | y ∈ x1, z ∈ x2}

Input x1:
  C  D
  1  2
  3  4

Input x2:
  3
  8

Output:
  A  B
  1  3
  1  8
  3  3
  3  8

Operational semantics of NRC: records

$$\frac{e \to o \quad \cdots \quad e' \to o'}{(A : e, \ldots, B : e') \to (A : o, \ldots, B : o')} \qquad \frac{e \to (A : o, \ldots, B : o')}{e.A \to o}$$

Operational semantics of NRC: sets

$$\frac{}{\{\} \to \{\}} \qquad \frac{e \to o}{\{e\} \to \{o\}} \qquad \frac{e_1 \to \{o_1, \ldots, o_m\} \quad e_2 \to \{o'_1, \ldots, o'_n\}}{e_1 \cup e_2 \to \{o_1, \ldots, o_m, o'_1, \ldots, o'_n\}}$$

Operational semantics of NRC: comprehensions

$$\frac{e \to o}{\{e \mid\ \} \to \{o\}} \qquad \frac{e_1 \to \{\}}{\{e \mid x_1 \in e_1, \Delta\} \to \{\}}$$

$$\frac{e_1 \to \{o, \ldots, o'\} \qquad \{e[x_1/o] \mid \Delta[x_1/o]\} \cup \cdots \cup \{e[x_1/o'] \mid \Delta[x_1/o']\} \to o''}{\{e \mid x_1 \in e_1, \Delta\} \to o''}$$
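The evaluation rules for NRC can be tried out with a small interpreter. The following is only a sketch under stated assumptions: the tuple encoding of expressions, the names `evaluate`, `record`, and `Stuck`, and the use of an environment instead of the substitution e[x1/o] are choices made for this illustration and do not come from the talk.

```python
class Stuck(Exception):
    """No evaluation rule applies: the expression is not well-defined on this input."""


def record(**fields):
    # Records are stored as sorted (label, value) tuples so that they are hashable.
    return tuple(sorted(fields.items()))


def evaluate(e, env):
    """Evaluate an NRC expression (a tagged tuple) under the variable binding env."""
    tag = e[0]
    if tag == "var":
        return env[e[1]]
    if tag == "const":
        return e[1]
    if tag == "rec":  # (A : e, ..., B : e')
        return tuple(sorted((a, evaluate(sub, env)) for a, sub in e[1].items()))
    if tag == "proj":  # e.A
        r = evaluate(e[1], env)
        if not isinstance(r, tuple) or e[2] not in dict(r):
            raise Stuck("field access on a value without that field")
        return dict(r)[e[2]]
    if tag == "empty":  # {}
        return frozenset()
    if tag == "sing":  # {e}
        return frozenset([evaluate(e[1], env)])
    if tag == "union":  # e1 ∪ e2
        left, right = evaluate(e[1], env), evaluate(e[2], env)
        if not (isinstance(left, frozenset) and isinstance(right, frozenset)):
            raise Stuck("union of non-sets")
        return left | right
    if tag == "comp":  # {e | x1 ∈ e1, ..., xn ∈ en}
        head, binders = e[1], e[2]
        if not binders:
            return frozenset([evaluate(head, env)])
        (x, gen), rest = binders[0], binders[1:]
        source = evaluate(gen, env)
        if not isinstance(source, frozenset):
            raise Stuck("generator is not a set")
        out = frozenset()
        for o in source:  # one sub-comprehension per element, then union
            out |= evaluate(("comp", head, rest), {**env, x: o})
        return out
    raise ValueError(f"unknown expression tag: {tag}")


# The example {(A : y.C, B : z) | y ∈ x1, z ∈ x2} on the inputs shown earlier.
x1 = frozenset({record(C=1, D=2), record(C=3, D=4)})
x2 = frozenset({3, 8})
query = ("comp",
         ("rec", {"A": ("proj", ("var", "y"), "C"), "B": ("var", "z")}),
         [("y", ("var", "x1")), ("z", ("var", "x2"))])
result = evaluate(query, {"x1": x1, "x2": x2})
```

In this encoding a raised `Stuck` corresponds to an expression for which no rule applies, which is the situation the well-definedness problem asks about.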

In Search of a Complete Static Type System

Static type systems for Turing-complete languages:
• Are sound (i.e., can prove the absence of runtime errors)
• But necessarily incomplete (i.e., cannot prove that an error will occur)

Question: NRC is not Turing-complete. Does it have a sound and complete static type system?

In Search of a Complete Static Type System (2)

The question is equivalent to the following decision problem:

Well-Definedness
Input: Expression e(x, ..., y) and types s, ..., t for the free variables.
Problem: Decide whether e is well-defined under s, ..., t, i.e., whether e[x/o, ..., y/o′] evaluates to an object for all o : s, ..., o′ : t.

Theorem
Well-Definedness for NRC is decidable.

Language Extensions

Consider the extension of NRC with
• Atomic comparisons e1 eq e2, which can only compare two atomic data values
• This gives us essentially the conjunctive queries

Operational semantics (c1 and c2 denote distinct atomic values):

$$\frac{e_1 \to c_1 \quad e_2 \to c_1}{e_1 \ \mathrm{eq}\ e_2 \to \{()\}} \qquad \frac{e_1 \to c_1 \quad e_2 \to c_2}{e_1 \ \mathrm{eq}\ e_2 \to \{\}}$$

Example: return all records in R whose A-field is 5
{x | x ∈ R, y ∈ (x.A eq 5)}
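The selection example works because the comparison returns a set that is either {()} or {}, so the binder y ∈ (x.A eq 5) keeps or discards x. A minimal sketch in Python, assuming records are encoded as tuples of (label, value) pairs; the names `eq` and `select_a_is_5` are hypothetical and used only for illustration:

```python
UNIT = ()  # the empty record ()


def eq(c1, c2):
    """e1 eq e2: {()} when the two atomic values coincide, {} otherwise."""
    for c in (c1, c2):
        if not isinstance(c, (int, str)):
            raise TypeError("eq compares atomic values only")
    return frozenset([UNIT]) if c1 == c2 else frozenset()


def select_a_is_5(R):
    # {x | x ∈ R, y ∈ (x.A eq 5)}: x is emitted once per element of the test set,
    # i.e. once when the test yields {()} and never when it yields {}.
    return frozenset(x for x in R for _y in eq(dict(x)["A"], 5))


R = frozenset({(("A", 5), ("B", 1)), (("A", 7), ("B", 2))})
selected = select_a_is_5(R)
```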

Language Extensions (2)

Theorem
• Well-Definedness for NRC(eq) is decidable.
• Well-Definedness for NRC(eq) is hard for Co-Nexptime.

Hardness follows by reduction from:

Satisfiability
Input: Expression e(x, ..., y) and types s, ..., t such that e[x/o, ..., y/o′] evaluates to a set for all o : s, ..., o′ : t.
Problem: Decide whether there exist objects o : s, ..., o′ : t such that e[x/o, ..., y/o′] evaluates to a non-empty set.

• Let e be a closed, well-defined expression that always outputs a set
• Then {{}.A | x ∈ e} is well-defined ⇔ e is not satisfiable, because the body {}.A fails as soon as e produces an element
• [Koch; 2006] Satisfiability of closed expressions is Nexptime-hard, so its complement, and with it Well-Definedness, is Co-Nexptime-hard.

Language Extensions (3)

Consider the extension of NRC with
• General comparisons e1 = e2, which can compare arbitrary objects
• This gives us the full power of the relational algebra

Operational semantics (o1 and o2 denote distinct objects):

$$\frac{e_1 \to o_1 \quad e_2 \to o_1}{e_1 = e_2 \to \{()\}} \qquad \frac{e_1 \to o_1 \quad e_2 \to o_2}{e_1 = e_2 \to \{\}}$$

Theorem
• Satisfiability for Relational Algebra is undecidable.
• Therefore Satisfiability for NRC(=) is undecidable.
• Hence, Well-Definedness for NRC(=) is undecidable.

Language Extensions (4)

Consider the extension of NRC with
• Singleton extraction extract(e), which extracts the value from a singleton set
• Present in OQL

Operational semantics:

$$\frac{e \to \{o\}}{\mathrm{extract}(e) \to o}$$

This allows us to model some features of SQL:
• SQL: select ... where (5 = select distinct A from R)
• NRC(eq, extract): 5 eq (extract {x.A | x ∈ R})

Theorem
Well-Definedness for NRC(eq, extract) is undecidable.

In Search of a Complete Static Type System (4)

Conclusion:
• Complete static type systems exist for restricted query languages (the conjunctive queries)
• But these systems have high complexity

Solution: adopt the standard (incomplete) static type system

Typing rules (variables, objects, records):

$$\frac{}{T \vdash x : T(x)} \qquad \frac{o : s}{T \vdash o : s} \qquad \frac{T \vdash e : (A : s, \ldots, B : t)}{T \vdash e.A : s}$$

$$\frac{T \vdash e : s \quad \cdots \quad T \vdash e' : t}{T \vdash (A : e, \ldots, B : e') : (A : s, \ldots, B : t)}$$

Typing rules (sets and comprehensions):

$$\frac{}{T \vdash \{\} : \{s\}} \qquad \frac{T \vdash e : s}{T \vdash \{e\} : \{s\}} \qquad \frac{T \vdash e_1 : \{s\} \quad T \vdash e_2 : \{s\}}{T \vdash e_1 \cup e_2 : \{s\}}$$

$$\frac{x_1 : s_1, \ldots, x_i : s_i, T \vdash e_{i+1} : \{s_{i+1}\} \ \text{for } 0 \le i < n \qquad x_1 : s_1, \ldots, x_n : s_n, T \vdash e : s}{T \vdash \{e \mid x_1 \in e_1, \ldots, x_n \in e_n\} : \{s\}}$$
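The typing rules translate almost directly into a recursive checker. The sketch below is an illustration under assumptions, not the system from the talk: expressions use a hypothetical tagged-tuple encoding, types are "int", "string", ("rec", fields), or ("set", s), and the empty-set literal {} is left out because its rule requires guessing the element type s.

```python
def typeof(e, T):
    """Return the type of e under the type assignment T, or None if no rule applies."""
    tag = e[0]
    if tag == "var":  # T ⊢ x : T(x)
        return T.get(e[1])
    if tag == "const":
        return "int" if isinstance(e[1], int) else "string"
    if tag == "rec":  # (A : e, ..., B : e')
        fields = []
        for a, sub in sorted(e[1].items()):
            t = typeof(sub, T)
            if t is None:
                return None
            fields.append((a, t))
        return ("rec", tuple(fields))
    if tag == "proj":  # e.A needs a record type that has field A
        t = typeof(e[1], T)
        if isinstance(t, tuple) and t[0] == "rec":
            return dict(t[1]).get(e[2])
        return None
    if tag == "sing":  # {e}
        t = typeof(e[1], T)
        return None if t is None else ("set", t)
    if tag == "union":  # both operands must have the same set type
        left, right = typeof(e[1], T), typeof(e[2], T)
        if isinstance(left, tuple) and left[0] == "set" and left == right:
            return left
        return None
    if tag == "comp":  # each generator may use the variables bound before it
        env = dict(T)
        for x, gen in e[2]:
            t = typeof(gen, env)
            if not (isinstance(t, tuple) and t[0] == "set"):
                return None
            env[x] = t[1]
        t = typeof(e[1], env)
        return None if t is None else ("set", t)
    return None


def rec_type(**fields):
    return ("rec", tuple(sorted(fields.items())))


# {z.A | z ∈ (x ∪ y)} versus {z.A | z ∈ x} ∪ {z.A | z ∈ y}
T = {"x": ("set", rec_type(A="int", B="int")),
     "y": ("set", rec_type(A="int", C="int"))}
project_a = lambda src: ("comp", ("proj", ("var", "z"), "A"), [("z", src)])
mixed = project_a(("union", ("var", "x"), ("var", "y")))
split = ("union", project_a(("var", "x")), project_a(("var", "y")))
```

Under this assignment `typeof(mixed, T)` finds no applicable rule because the union operands have different set types, whereas `typeof(split, T)` succeeds.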

Expressiveness?

Static typing sometimes limits expressiveness:
• Untyped Lambda Calculus: all computable functions
• Simply Typed Lambda Calculus: restricted to extended polynomials

Is the same true for queries?
• Language-integrated queries in statically typed languages (LINQ, Links, ...)
• Language-integrated queries in dynamically typed languages (Python, Ruby, ...)

Question: Is the static type system for NRC expressively complete?
• Can all well-defined queries be equivalently written in a well-typed way?

Expressiveness? (2)

Theorem
The static type system for NRC(=) is expressively complete:
• Every NRC(=) expression e(x, ..., y) that is well-defined under r, ..., s and only produces outputs in a type t has an equivalent expression e′(x, ..., y) such that x : r, ..., y : s ⊢ e′ : t.
• Moreover, e′ is of size linear in e.

Some ways in which expressions can be well-defined, but ill-typed:
• Unreachable code: {{}.A | x ∈ {}} → {}
• Creating heterogeneous objects: {z.A | z ∈ (x ∪ y)}
  ◦ Ill-typed under x ↦ {(A : r, B : s)}, y ↦ {(A : r, C : t)}
  ◦ Can be rewritten as {z.A | z ∈ x} ∪ {z.A | z ∈ y}
• General case more difficult: {z.A | z ∈ e}
  ◦ Ill-typed when e is another comprehension that returns a heterogeneous set

I would appreciate any pointers to the literature on similar results for general-purpose (functional) programming languages!

Polymorphic Expressiveness

Consider the extension of NRC with
• Complement projection drop_A(e), which retains all of e's fields but A

Operational semantics and typing rule:

$$\frac{e \to (A : o, B : o', \ldots, C : o'')}{\mathrm{drop}_A\, e \to (B : o', \ldots, C : o'')} \qquad \frac{T \vdash e : (A : r, B : s, \ldots, C : t)}{T \vdash \mathrm{drop}_A\, e : (B : s, \ldots, C : t)}$$

Note that:
• We can easily simulate drop_A x in NRC(=) if we know the type of x
• But not if x's type is unknown!

Theorem
• A typing of an expression e is a pair (T, s) such that T ⊢ e : s
• Say that two expressions e(x, ..., y) and e′(x, ..., y) are polymorphically equivalent if they have the same set of typings and, for each such typing (T, s), e and e′ evaluate to the same output on each input of type T
• No expression in NRC(=) is polymorphically equivalent to drop_A x
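The contrast between the two "Note that" bullets can be illustrated with records encoded as Python dictionaries. This is a sketch only: `drop` and `drop_a_for_abc` are hypothetical names, and the fixed type (A, B, C) is an assumption chosen for the example.

```python
def drop(label, rec):
    """drop_A: keep every field of rec except `label`; fails if the field is absent."""
    if label not in rec:
        raise KeyError(label)
    return {a: v for a, v in rec.items() if a != label}


def drop_a_for_abc(x):
    # Simulation for the known type (A : r, B : s, C : t): rebuild the record
    # field by field, as the record constructor (B : x.B, C : x.C) would.
    return {"B": x["B"], "C": x["C"]}


abc = {"A": 1, "B": 2, "C": 3}
other = {"A": 1, "D": 4}
```

On `abc` both functions agree, but on `other` only `drop` succeeds: the simulation names the fields B and C explicitly, so it is tied to one input type and does not share the typings of drop_A.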

Polymorphic Expressiveness (2)

Consider the extension of NRC with
• Cartesian product e1 × e2
• Join e1 ⋈ e2

Typing rules (φ1, φ2, ψ are record types, + is record type concatenation):

$$\frac{T \vdash e_1 : \{\varphi_1\} \quad T \vdash e_2 : \{\varphi_2\} \quad \varphi_1 \text{ and } \varphi_2 \text{ have disjoint sets of attributes}}{T \vdash e_1 \times e_2 : \{\varphi_1 + \varphi_2\}}$$

$$\frac{T \vdash e_1 : \{\varphi_1 + \psi\} \quad T \vdash e_2 : \{\varphi_2 + \psi\} \quad \varphi_1 \text{ and } \varphi_2 \text{ have disjoint sets of attributes}}{T \vdash e_1 \bowtie e_2 : \{\varphi_1 + \varphi_2 + \psi\}}$$

Theorem
• No expression in NRC(=, drop, ×) is polymorphically equivalent to e1 ⋈ e2.
• No expression in NRC(=, drop, ⋈) is polymorphically equivalent to e1 × e2.
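The side conditions of the two rules can be checked mechanically on concrete record types. In the sketch below, which is an illustration rather than part of the talk, record types are dictionaries from attribute names to types and the function names are hypothetical. For the join, the disjointness of φ1 and φ2 forces every shared attribute into ψ, so shared attributes must have equal types on both sides.

```python
def product_type(t1, t2):
    """Element type of e1 × e2, or None if the operands share an attribute."""
    if set(t1) & set(t2):
        return None
    return {**t1, **t2}  # φ1 + φ2


def join_type(t1, t2):
    """Element type of e1 ⋈ e2, or None if a shared attribute has two different types."""
    shared = set(t1) & set(t2)  # the attributes of ψ
    if any(t1[a] != t2[a] for a in shared):
        return None
    return {**t1, **t2}  # φ1 + φ2 + ψ


emp = {"name": "string", "dept": "int"}
dept = {"dept": "int", "city": "string"}
```

With these operands the product is rejected because `dept` occurs on both sides, while the join is accepted with the result type carrying `name`, `dept`, and `city`.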

Polymorphic Expressiveness (3)

Open Research Questions
• Is there a reasonable notion of when a query language is "polymorphically complete"?
• What operators are needed to obtain such a language?

Typability and Type Inference

Two classical problems

Typability
Input: Expression e(x, ..., y)
Problem: Do there exist T and t such that T ⊢ e : t?

Type Inference
Input: Expression e(x, ..., y)
Problem: Give an explicit description of the set of all typings (T, s) for which T ⊢ e : s.

Practical Motivation:
• Complexity of Typability tells us something about the complexity of type checking queries in implicitly typed programming languages
• Type inference is essential for query optimization in the absence of schema information (Kleisli, ...).

Typability and Type Inference

What notion of type formulae is "just right" for a given query language?

Theorem [Buneman and Ohori; 1996]
There exists a polynomial time algorithm that, given an expression e(x, ..., y) in NRC(=), returns false if e is untypable, and otherwise returns a kinded type formula describing all of e's typings.

Theorem [Rémy; 1993]
There exists a polynomial time algorithm that, given an expression e(x, ..., y) in NRC(=, drop), returns false if e is untypable, and otherwise returns a type formula with row variables describing all of e's typings.

Typability and Type Inference for NRC(=, drop, ×, ⋈)

Our proposal: a type inference algorithm based on constrained types, in the spirit of HM(X) [Odersky, Sulzmann, and Wehr; 1999]

Signature of principal type formulas
A principal type formula for e(x1, ..., xn) is a conjunctive, many-sorted, first-order logic formula ϕ(x1, ..., xn, z) that, interpreted in the structure T of all possible types, defines all typings of e:

x1 : s1, ..., xn : sn ⊢ e : t  ⇔  T ⊨ ϕ(s1, ..., sn, t)

Example: principal type formula for x ∪ y
(∃u) x = Set(u) ∧ y = Set(u) ∧ z = Set(u)

The signature of the logic and its interpretation:

Symbol        Arity              Interpretation in T
x, y, ...     type
ρ, ρ′, ...    row
=             type × type        equality relation on types
⊆             row × row          containment of functions, e.g. A : s ⊆ B : t, A : s
#             row × row          relates rows with disjoint domains
Set           type → type        maps s to {s}
Record        row → type         maps A : s, ..., B : t to (A : s, ..., B : t)
A             type → row         maps s to A : s
,             row × row → row    maps (ρ1, ρ2) to ρ1 ∪ π̂_dom(ρ1)(ρ2)

Example: principal type formula for {v.A | v ∈ (x × y)}
(∃ρ)(∃ρ′) x = Set(Record(ρ)) ∧ y = Set(Record(ρ′)) ∧ ρ # ρ′ ∧ (∃u) z = Set(u) ∧ A(u) ⊆ ρ, ρ′

Theorem
Every NRC(=, drop, ×, ⋈) expression e has a principal type formula ϕe, of size linear in the size of e, and computable from e in polynomial time.

Main application: Typability of NRC(=, drop, ×, ⋈)-expressions is in NP.

Typability and Type Inference for NRC(=, drop, ×, ⋈)

Theorem: Typability of NRC(=, drop, ×, ⋈)-expressions is in NP.

Proof sketch:

Step 1. Expression e → principal type formula ϕe

e is typable ⇔ ϕe is satisfiable

Example:
{v.A | v ∈ (x × y)} ∪ {v.B | v ∈ x}
⇔
(∃u)(∃ρ1)(∃ρ2) z = Set(u) ∧ x = Set(Record(ρ1)) ∧ y = Set(Record(ρ2)) ∧ ρ1 # ρ2 ∧ A(u) ⊆ ρ1, ρ2 ∧ B(u) ⊆ ρ1

Step 2. Principal type formula ϕe → quantifier-free formula ψe

ϕe is satisfiable ⇔ ψe is satisfiable

Example:
(∃u)(∃ρ1)(∃ρ2) z = Set(u) ∧ x = Set(Record(ρ1)) ∧ y = Set(Record(ρ2)) ∧ ρ1 # ρ2 ∧ A(u) ⊆ ρ1, ρ2 ∧ B(u) ⊆ ρ1
⇔
z = Set(u) ∧ x = Set(Record(ρ1)) ∧ y = Set(Record(ρ2)) ∧ ρ1 # ρ2 ∧ A(u) ⊆ ρ1, ρ2 ∧ B(u) ⊆ ρ1

Step 3. Quantifier-free formula ψe → guess a quantifier-free formula θe without row variables

ψe is satisfiable ⇔ θe is satisfiable

Example:
z = Set(u) ∧ x = Set(Record(ρ1)) ∧ y = Set(Record(ρ2)) ∧ ρ1 # ρ2 ∧ A(u) ⊆ ρ1, ρ2 ∧ B(u) ⊆ ρ1
⇔
z = Set(u) ∧ x = Set(Record(B(u1))) ∧ y = Set(Record(A(u2))) ∧ B(u1) # A(u2) ∧ A(u) ⊆ B(u1), A(u2) ∧ B(u) ⊆ B(u1)

Step 4. Quantifier-free formula θe without row variables → simplified formula σe

θe is satisfiable ⇔ σe is satisfiable

Example (the disjointness constraint is resolved first):
z = Set(u) ∧ x = Set(Record(B(u1))) ∧ y = Set(Record(A(u2))) ∧ B(u1) # A(u2) ∧ A(u) ⊆ B(u1), A(u2) ∧ B(u) ⊆ B(u1)
⇔
z = Set(u) ∧ x = Set(Record(B(u1))) ∧ y = Set(Record(A(u2))) ∧ A(u) ⊆ B(u1), A(u2) ∧ B(u) ⊆ B(u1)

Step 4 (continued). The containment constraints are then replaced by equalities:

z = Set(u) ∧ x = Set(Record(B(u1))) ∧ y = Set(Record(A(u2))) ∧ A(u) ⊆ B(u1), A(u2) ∧ B(u) ⊆ B(u1)
⇔
z = Set(u) ∧ x = Set(Record(B(u1))) ∧ y = Set(Record(A(u2))) ∧ u = u2 ∧ u = u1

The latter formula can efficiently be solved by unification.
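The final formula of the proof sketch is a conjunction of equations between terms, which standard first-order unification can solve. The following is a generic textbook-style sketch, not the algorithm from the papers: variables are Python strings, constructor applications such as Set(u) are tuples ("Set", "u"), and all function names are assumptions made for this illustration.

```python
def walk(t, subst):
    """Follow variable bindings until reaching an unbound variable or a constructor."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    """Occurs check: does variable v appear inside term t?"""
    t = walk(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t[1:])


def unify(s, t, subst):
    """Extend subst so that s and t become equal; return None if that is impossible."""
    s, t = walk(s, subst), walk(t, subst)
    if s == t:
        return subst
    if isinstance(s, str):
        return None if occurs(s, t, subst) else {**subst, s: t}
    if isinstance(t, str):
        return unify(t, s, subst)
    if s[0] != t[0] or len(s) != len(t):
        return None  # constructor clash
    for a, b in zip(s[1:], t[1:]):
        subst = unify(a, b, subst)
        if subst is None:
            return None
    return subst


def resolve(t, subst):
    """Apply the substitution exhaustively to t."""
    t = walk(t, subst)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(a, subst) for a in t[1:])
    return t


# z = Set(u) ∧ x = Set(Record(B(u1))) ∧ y = Set(Record(A(u2))) ∧ u = u2 ∧ u = u1
equations = [
    ("z", ("Set", "u")),
    ("x", ("Set", ("Record", ("B", "u1")))),
    ("y", ("Set", ("Record", ("A", "u2")))),
    ("u", "u2"),
    ("u", "u1"),
]
solution = {}
for lhs, rhs in equations:
    solution = unify(lhs, rhs, solution)
```

After solving, `resolve("x", solution)`, `resolve("y", solution)`, and `resolve("z", solution)` all mention the same remaining variable, reflecting that the A-field of y, the B-field of x, and the elements of the result must share one type.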

Typability and Type Inference for NRC(=, drop, ×, ⋈)

Theorem: Typability of NRC(=, drop, ×, ⋈)-expressions is NP-hard.

Proof sketch: By a reduction from Positive one-in-three 3SAT.
Abbreviate πA(e) := {v.A | v ∈ e}.

(x1 ∨ y1 ∨ z1) ∧ (x2 ∨ y1 ∨ z2) ∧ (x2 ∨ y3 ∨ z1) is satisfiable
⇔
πA(x1 × y1 × z1) ∪ πA(x2 × y1 × z2) ∪ πA(x2 × y3 × z1) is typable
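One way to read the reduction, offered here as an interpretation rather than as the proof from the papers: setting a variable to true corresponds to giving it a record type that contains attribute A. The product rule allows at most one operand per clause to carry A, and the projection πA requires at least one, so each clause needs exactly one true variable. The brute-force check below, with the hypothetical name `one_in_three_satisfiable`, tests that condition for the clause set of the example.

```python
from itertools import product


def one_in_three_satisfiable(clauses):
    """Is there an assignment making exactly one variable true in every clause?"""
    variables = sorted({v for clause in clauses for v in clause})
    for bits in product([False, True], repeat=len(variables)):
        has_a = dict(zip(variables, bits))  # True: this variable's type carries A
        if all(sum(has_a[v] for v in clause) == 1 for clause in clauses):
            return True
    return False


example = [("x1", "y1", "z1"), ("x2", "y1", "z2"), ("x2", "y3", "z1")]
satisfiable = one_in_three_satisfiable(example)
```

The search is exponential in the number of variables, which is consistent with, though of course no proof of, the NP-hardness stated in the theorem.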

Typability and Type Inference for NRC(=, drop, ×, ⋈)

What about typability for expressions without the cartesian product operator?

Theorem
Typability of NNRC-expressions without the cartesian product operator is NP-hard.

Proof sketch:
• Uses a proof idea from Ohori and Buneman (1988), who showed that typability for "generalized join" is NP-hard.
• Reduction from Monotone 3SAT.

Reductions transfer to programming languages with symmetric record concatenation, join, or mixin modules.

The complexity of type checking DBPLs with NRC(=, drop, ×, ⋈) as the embedded QL

When the ambient language is the simply typed λ-calculus:
• Type checking moves from P-complete to NP-hard

When the ambient language is ML:
• Type checking was already Exptime-complete
• However, Exptime-hardness is only due to peculiar programs which rarely occur in practice
• Type checking ML typically takes linear time in practice
• In contrast, NP-hardness for NRC(=, drop, ×, ⋈) is due to cartesian product and join, which do occur in practice