Types for Database Query Languages: Polymorphism, Complexity, and Completeness

Author: Barnaby Turner

Stijn Vansummeren, Université Libre de Bruxelles

10 May 2010

Introduction

Extensive study of type systems for:
• General-purpose (Turing-complete) programming languages:
  ◦ ML [Hindley; 1969 - Milner; 1978 - Damas and Milner; 1982]
  ◦ Many others
• Database programming languages (higher-order functions + records + collections + ...):
  ◦ Extensible records [Wand; 1989 - Rémy; 1989, 1990]
  ◦ Generalized relational operators [Buneman and Ohori; 1996]
  ◦ Constrained types: HM(X) [Odersky, Sulzmann, and Wehr; 1999]
  ◦ Many others

How does this specialize to database query languages?
• Limited expressiveness (not Turing-complete) → complete type systems?
• No higher-order functions, no subtyping → complexity of typability?
• Only records, collections → specialized type inference algorithms?

Introduction

Results presented are from the following papers:
• On the Complexity of Deciding Typability in the Relational Algebra, Acta Informatica, 2005
• Polymorphic Type Inference for the Named Nested Relational Calculus, ACM TOCL, 2006
• Well-Definedness and Semantic Type-Checking for the Nested Relational Calculus, Theoretical Computer Science, 2007
• Unpublished notes

This is joint work with
• Dirk Van Gucht, Indiana University, USA
• Jan Van den Bussche, Hasselt University, Belgium

Introduction - Nested Relational Calculus (NRC)

Canonical query language for complex objects

Objects:      o ::= c | (A : o, ..., B : o′) | {o, ..., o′}
Expressions:  e ::= x | o | (A : e, ..., B : e′) | e.A | {} | {e} | e1 ∪ e2 | {e | x1 ∈ e1, ..., xn ∈ en}
Binder list:  ∆ ::= x1 ∈ e1, ..., xn ∈ en
Types:        s, t ::= int | string | ··· | (A : s, ..., B : t) | {s}

Example: {(A : y.C, B : z) | y ∈ x1, z ∈ x2}

With inputs
x1 = {(C : 1, D : 2), (C : 3, D : 4)}
x2 = {3, 8}
the expression evaluates to
{(A : 1, B : 3), (A : 1, B : 8), (A : 3, B : 3), (A : 3, B : 8)}
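As a rough illustration (not part of the original slides), the example comprehension can be mirrored with a Python set comprehension. Records are modelled as frozensets of (attribute, value) pairs so that they can be stored in sets; the helper names `record` and `field` are assumptions made for this sketch, and the input values are those read off the example above.

```python
def record(**fields):
    """Build a hashable stand-in for a record (A : o, ..., B : o')."""
    return frozenset(fields.items())


def field(rec, name):
    """Projection e.A; raises KeyError if the attribute is missing."""
    return dict(rec)[name]


# Inputs of the example: x1 is a set of records, x2 a set of atomic values.
x1 = {record(C=1, D=2), record(C=3, D=4)}
x2 = {3, 8}

# {(A : y.C, B : z) | y ∈ x1, z ∈ x2}
result = {record(A=field(y, "C"), B=z) for y in x1 for z in x2}
```

The nested `for` clauses play the role of the binder list: every binding of `y` is combined with every binding of `z`.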

Introduction - Nested Relational Calculus (NRC)

Operational semantics (records):

$$\frac{e \to o \quad \cdots \quad e' \to o'}{(A : e, \ldots, B : e') \to (A : o, \ldots, B : o')} \qquad \frac{e \to (A : o, \ldots, B : o')}{e.A \to o}$$

Introduction - Nested Relational Calculus (NRC)

Operational semantics (sets):

$$\frac{}{\{\} \to \{\}} \qquad \frac{e \to o}{\{e\} \to \{o\}} \qquad \frac{e_1 \to \{o_1, \ldots, o_m\} \quad e_2 \to \{o'_1, \ldots, o'_n\}}{e_1 \cup e_2 \to \{o_1, \ldots, o_m, o'_1, \ldots, o'_n\}}$$

Introduction - Nested Relational Calculus (NRC)

Operational semantics (comprehensions):

$$\frac{e \to o}{\{e \mid\ \} \to \{o\}} \qquad \frac{e_1 \to \{\}}{\{e \mid x_1 \in e_1, \Delta\} \to \{\}}$$

$$\frac{e_1 \to \{o, \ldots, o'\} \qquad \{e[x_1/o] \mid \Delta[x_1/o]\} \cup \cdots \cup \{e[x_1/o'] \mid \Delta[x_1/o']\} \to o''}{\{e \mid x_1 \in e_1, \Delta\} \to o''}$$
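The evaluation rules can be sketched as a small interpreter. This is a minimal illustration, not the authors' implementation: the tuple encoding of expressions, the names `evaluate`, `eval_compr`, `mk_record` and `Undefined`, and the use of an environment in place of the substitution e[x1/o] are all assumptions of this sketch.

```python
class Undefined(Exception):
    """No evaluation rule applies, so the expression is not well-defined."""


def mk_record(fields):
    """Hashable record value; attribute names are unique, so sorting is safe."""
    return ("rec", tuple(sorted(fields.items())))


def evaluate(e, env):
    tag = e[0]
    if tag == "var":
        return env[e[1]]
    if tag == "const":
        return e[1]
    if tag == "rec":
        return mk_record({a: evaluate(sub, env) for a, sub in e[1].items()})
    if tag == "proj":
        v = evaluate(e[1], env)
        if not (isinstance(v, tuple) and v[0] == "rec"):
            raise Undefined("projection on a non-record")
        fields = dict(v[1])
        if e[2] not in fields:
            raise Undefined(f"record has no attribute {e[2]}")
        return fields[e[2]]
    if tag == "empty":
        return frozenset()
    if tag == "sing":
        return frozenset([evaluate(e[1], env)])
    if tag == "union":
        left, right = evaluate(e[1], env), evaluate(e[2], env)
        if not (isinstance(left, frozenset) and isinstance(right, frozenset)):
            raise Undefined("union of non-sets")
        return left | right
    if tag == "compr":
        return eval_compr(e[1], e[2], env)
    raise ValueError(f"unknown expression tag {tag!r}")


def eval_compr(body, binders, env):
    if not binders:                       # rule for the empty binder list
        return frozenset([evaluate(body, env)])
    (x, source), rest = binders[0], binders[1:]
    values = evaluate(source, env)
    if not isinstance(values, frozenset):
        raise Undefined("binder ranges over a non-set")
    out = frozenset()
    for o in values:                      # empty source: body is never evaluated
        out |= eval_compr(body, rest, {**env, x: o})
    return out


# {(A : y.C, B : z) | y ∈ x1, z ∈ x2} on sample inputs
env = {
    "x1": frozenset([mk_record({"C": 1, "D": 2}), mk_record({"C": 3, "D": 4})]),
    "x2": frozenset([3, 8]),
}
query = ("compr",
         ("rec", {"A": ("proj", ("var", "y"), "C"), "B": ("var", "z")}),
         [("y", ("var", "x1")), ("z", ("var", "x2"))])
answer = evaluate(query, env)
```

Raising `Undefined` corresponds to an expression for which no derivation e → o exists; a comprehension over an empty source returns {} without ever evaluating its body.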

In Search of a Complete Static Type System

Static type systems for Turing-complete languages:
• Are sound (i.e., can prove the absence of runtime errors)
• But necessarily incomplete (i.e., cannot prove that an error will occur)

Question: NRC is not Turing-complete. Does it have a sound and complete static type system?

In Search of a Complete Static Type System (2)

The question is equivalent to the following decision problem:

Well-Definedness
Input: Expression e(x, ..., y) and types s, ..., t for the free variables.
Problem: Decide whether e is well-defined under s, ..., t, i.e., whether e[x/o, ..., y/o′] evaluates to an object for all o : s, ..., o′ : t.

Theorem
Well-Definedness for NRC is decidable.

Language Extensions

Consider the extension of NRC with
• Atomic comparisons e1 eq e2, which can only compare two atomic data values
• This gives us essentially the conjunctive queries

Operational semantics (c1 and c2 are distinct atomic values):

$$\frac{e_1 \to c_1 \quad e_2 \to c_1}{e_1 \ \mathrm{eq}\ e_2 \to \{()\}} \qquad \frac{e_1 \to c_1 \quad e_2 \to c_2 \quad c_1 \neq c_2}{e_1 \ \mathrm{eq}\ e_2 \to \{\}}$$

Example: return all records in R whose A-field is 5
{x | x ∈ R, y ∈ (x.A eq 5)}
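The filtering idiom in the example relies on eq returning either the singleton {()} or the empty set, so that the binder y ∈ (x.A eq 5) iterates once or not at all. A hedged Python sketch, in which the function name `eq`, the sample relation `R`, and the use of a list in place of a set of records are assumptions:

```python
def eq(c1, c2):
    """Atomic comparison: {()} if equal, {} otherwise; undefined on non-atomic values."""
    atomic = (int, str)
    if not (isinstance(c1, atomic) and isinstance(c2, atomic)):
        raise TypeError("eq compares atomic values only")
    return {()} if c1 == c2 else set()


# A list stands in for the set R because Python dicts are not hashable.
R = [{"A": 5, "B": "x"}, {"A": 7, "B": "y"}, {"A": 5, "B": "z"}]

# {x | x ∈ R, y ∈ (x.A eq 5)}: y is bound once when the test succeeds, never otherwise
selected = [x for x in R for y in eq(x["A"], 5)]
```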

Language Extensions (2)

Theorem
• Well-Definedness for NRC(eq) is decidable.
• Well-Definedness for NRC(eq) is hard for co-NEXPTIME.

Hardness follows by reduction from:

Satisfiability
Input: Expression e(x, ..., y) and types s, ..., t such that e[x/o, ..., y/o′] evaluates to a set for all o : s, ..., o′ : t.
Problem: Decide whether there exist objects o : s, ..., o′ : t such that e[x/o, ..., y/o′] evaluates to a non-empty set.

• Let e be a closed, well-defined expression that always outputs a set
• Then {{}.A | x ∈ e} is well-defined ⇔ e is not satisfiable (the body {}.A has no value, so the comprehension evaluates only when e evaluates to {})
• [Koch; 2006] Satisfiability of closed expressions is co-NEXPTIME-hard.

Language Extensions (3)

Consider the extension of NRC with
• General comparisons e1 = e2, which can compare arbitrary objects
• This gives us the full power of the relational algebra

Operational semantics (o1 and o2 are distinct objects):

$$\frac{e_1 \to o_1 \quad e_2 \to o_1}{e_1 = e_2 \to \{()\}} \qquad \frac{e_1 \to o_1 \quad e_2 \to o_2 \quad o_1 \neq o_2}{e_1 = e_2 \to \{\}}$$

Theorem
• Satisfiability for Relational Algebra is undecidable.
• Therefore Satisfiability for NRC(=) is undecidable.
• Hence, Well-Definedness for NRC(=) is undecidable.

Language Extensions (4)

Consider the extension of NRC with
• Singleton extraction extract(e), which extracts the value from a singleton set
• Present in OQL

Operational semantics:

$$\frac{e \to \{o\}}{\mathrm{extract}(e) \to o}$$

This allows us to model some features of SQL
• SQL: select ... where (5 = select distinct A from R)
• NRC(eq, extract): 5 eq (extract {x.A | x ∈ R})

Theorem
Well-Definedness for NRC(eq, extract) is undecidable.
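A small sketch of how singleton extraction introduces a new source of undefinedness: extract has a value only when its argument holds exactly one element. The function names and the sample relation `R` are assumptions of this illustration.

```python
def extract(s):
    """Singleton extraction: defined only on sets with exactly one element."""
    if len(s) != 1:
        raise ValueError("extract is undefined on non-singleton sets")
    (value,) = s
    return value


def eq(c1, c2):
    """Atomic comparison returning {()} or {}."""
    return {()} if c1 == c2 else set()


R = [{"A": 5}, {"A": 5}]

# 5 eq (extract {x.A | x ∈ R}): defined here because all A-values coincide
test_result = eq(5, extract({x["A"] for x in R}))
```

Whether the extraction succeeds depends on the data in R, not only on its type, which is the kind of behaviour a static check has to reason about.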

In Search of a Complete Static Type System (4)

Conclusion:
• Complete static type systems exist for restricted query languages (the conjunctive queries)
• But these systems have high complexity

Solution: adopt the standard (incomplete) static type system

Typing rules:

$$\frac{}{T \vdash x : T(x)} \qquad \frac{o : s}{T \vdash o : s} \qquad \frac{T \vdash e : (A : s, \ldots, B : t)}{T \vdash e.A : s}$$

$$\frac{T \vdash e : s \quad \cdots \quad T \vdash e' : t}{T \vdash (A : e, \ldots, B : e') : (A : s, \ldots, B : t)}$$

In Search of a Complete Static Type System (4)

Typing rules (sets and comprehensions):

$$\frac{}{T \vdash \{\} : \{s\}} \qquad \frac{T \vdash e : s}{T \vdash \{e\} : \{s\}} \qquad \frac{T \vdash e_1 : \{s\} \quad T \vdash e_2 : \{s\}}{T \vdash e_1 \cup e_2 : \{s\}}$$

$$\frac{x_1 : s_1, \ldots, x_i : s_i, T \vdash e_{i+1} : \{s_{i+1}\} \ \text{for } 0 \le i < n \qquad x_1 : s_1, \ldots, x_n : s_n, T \vdash e : s}{T \vdash \{e \mid x_1 \in e_1, \ldots, x_n \in e_n\} : \{s\}}$$
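The typing rules can be turned into a simple checker. The sketch below is illustrative only: the tuple encoding of expressions and types, the names `infer` and `IllTyped`, and the requirement that the empty set carries an explicit element type (to avoid type variables) are assumptions rather than anything prescribed by the slides.

```python
class IllTyped(Exception):
    """No typing rule applies."""


def is_set(t):
    return isinstance(t, tuple) and t[0] == "set"


def infer(e, T):
    """Return the type of e under the type assignment T (a dict), or raise IllTyped."""
    tag = e[0]
    if tag == "var":
        return T[e[1]]
    if tag == "const":
        return "int" if isinstance(e[1], int) else "string"
    if tag == "rec":
        return ("rec", tuple(sorted((a, infer(sub, T)) for a, sub in e[1].items())))
    if tag == "proj":
        t = infer(e[1], T)
        if not (isinstance(t, tuple) and t[0] == "rec" and e[2] in dict(t[1])):
            raise IllTyped(f"no attribute {e[2]} in {t}")
        return dict(t[1])[e[2]]
    if tag == "empty":                    # ("empty", s): annotated empty set
        return ("set", e[1])
    if tag == "sing":
        return ("set", infer(e[1], T))
    if tag == "union":
        s, t = infer(e[1], T), infer(e[2], T)
        if not (is_set(s) and s == t):
            raise IllTyped("union needs two sets of the same type")
        return s
    if tag == "compr":
        T = dict(T)
        for x, source in e[2]:            # each binder may use the earlier ones
            t = infer(source, T)
            if not is_set(t):
                raise IllTyped("binder ranges over a non-set")
            T[x] = t[1]
        return ("set", infer(e[1], T))
    raise ValueError(f"unknown expression tag {tag!r}")


T = {"x1": ("set", ("rec", (("C", "int"), ("D", "int")))), "x2": ("set", "int")}
query = ("compr",
         ("rec", {"A": ("proj", ("var", "y"), "C"), "B": ("var", "z")}),
         [("y", ("var", "x1")), ("z", ("var", "x2"))])
query_type = infer(query, T)
```

Because union demands identical element types, a checker of this kind rejects some expressions that nevertheless evaluate without error, which is the incompleteness the slide refers to.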

Expressiveness?

Static typing sometimes limits expressiveness:
• Untyped lambda calculus: all computable functions
• Simply typed lambda calculus: restricted to extended polynomials

Is the same true for queries?
• Language-integrated queries in statically typed languages (LINQ, Links, ...)
• Language-integrated queries in dynamically typed languages (Python, Ruby, ...)

Question: Is the static type system for NRC expressively complete?
• Can all well-defined queries be equivalently written in a well-typed way?

Expressiveness? (2)

Theorem
The static type system for NRC(=) is expressively complete:
• Every NRC(=) expression e(x, ..., y) that is well-defined under r, ..., s and only produces outputs in a type t has an equivalent expression e′(x, ..., y) such that x : r, ..., y : s ⊢ e′ : t.
• Moreover, e′ is of size linear in e.

Some ways in which expressions can be well-defined, but ill-typed:
• Unreachable code: {{}.A | x ∈ {}} → {}
• Creating heterogeneous objects: {z.A | z ∈ (x ∪ y)}
  ◦ ill-typed under x ↦ {(A : r, B : s)}, y ↦ {(A : r, C : t)}.
  ◦ Can be rewritten as {z.A | z ∈ x} ∪ {z.A | z ∈ y}
• General case more difficult: {z.A | z ∈ e}
  ◦ ill-typed when e is another comprehension that returns a heterogeneous set

I would appreciate any pointers to the literature on similar results for general-purpose (functional) programming languages!
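The heterogeneous-set rewriting can be checked on sample data. In this hedged sketch the relations `x` and `y` and their field values are made up for illustration, and lists of dicts stand in for sets of records.

```python
# x : {(A : int, B : string)} and y : {(A : int, C : float)} share only attribute A
x = [{"A": 1, "B": "p"}, {"A": 2, "B": "q"}]
y = [{"A": 2, "C": 0.5}, {"A": 3, "C": 1.5}]

# {z.A | z ∈ (x ∪ y)}: ranges over a heterogeneous collection
original = {z["A"] for z in x + y}

# {z.A | z ∈ x} ∪ {z.A | z ∈ y}: each comprehension ranges over a homogeneous set
rewritten = {z["A"] for z in x} | {z["A"] for z in y}
```

Both forms produce the same set of A-values; only the second keeps every intermediate collection homogeneous, which is what the standard typing rule for union requires.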

Polymorphic Expressiveness

Consider the extension of NRC with
• Complement projection drop_A(e), which retains all of e's fields except A

Operational semantics and typing rule:

$$\frac{e \to (A : o, B : o', \ldots, C : o'')}{\mathrm{drop}_A\, e \to (B : o', \ldots, C : o'')} \qquad \frac{T \vdash e : (A : r, B : s, \ldots, C : t)}{T \vdash \mathrm{drop}_A\, e : (B : s, \ldots, C : t)}$$

Note that:
• We can easily simulate drop_A x in NRC(=) if we know the type of x
• But not if x's type is unknown!

Theorem
• A typing of an expression e is a pair (T, s) such that T ⊢ e : s
• Say that two expressions e(x, ..., y) and e′(x, ..., y) are polymorphically equivalent if they have the same set of typings and, for each such typing (T, s), e and e′ evaluate to the same output on each input of type T
• No expression in NRC(=) is polymorphically equivalent to drop_A x
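The gap between the two notes can be made concrete. In the sketch below, `drop` works on any record that has the attribute, while `drop_A_for_known_type` is a simulation written for one fixed record type (A, B, C); both function names and the sample records are assumptions of this illustration.

```python
def drop(rec, attr):
    """drop_A(e): keep every field of the record except attr; undefined if attr is absent."""
    if attr not in rec:
        raise KeyError(attr)
    return {a: v for a, v in rec.items() if a != attr}


def drop_A_for_known_type(rec):
    """Simulation of drop_A x by record construction, valid only for type (A, B, C)."""
    return {"B": rec["B"], "C": rec["C"]}


known = {"A": 1, "B": 2, "C": 3}
other = {"A": 1, "D": 4}
```

On `known` the two functions agree; on `other` only the generic `drop` works, which mirrors why the fixed simulation has fewer typings than drop_A x.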

Polymorphic Expressiveness (2)

Consider the extension of NRC with
• Cartesian product e1 × e2
• Join e1 ⋈ e2

Typing rules (φ1, φ2, ψ are record types, + is record type concatenation):

$$\frac{T \vdash e_1 : \{\varphi_1\} \quad T \vdash e_2 : \{\varphi_2\} \quad \varphi_1 \text{ and } \varphi_2 \text{ have disjoint sets of attributes}}{T \vdash e_1 \times e_2 : \{\varphi_1 + \varphi_2\}}$$

$$\frac{T \vdash e_1 : \{\varphi_1 + \psi\} \quad T \vdash e_2 : \{\varphi_2 + \psi\} \quad \varphi_1 \text{ and } \varphi_2 \text{ have disjoint sets of attributes}}{T \vdash e_1 \bowtie e_2 : \{\varphi_1 + \varphi_2 + \psi\}}$$

Theorem
• No expression in NRC(=, drop, ×) is polymorphically equivalent to e1 ⋈ e2.
• No expression in NRC(=, drop, ⋈) is polymorphically equivalent to e1 × e2.
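Read as functions on record types, the two rules accept different inputs. A minimal sketch, assuming record types are encoded as Python dicts from attribute names to type names; `product_type` and `join_type` are hypothetical helper names.

```python
def product_type(phi1, phi2):
    """Element type of e1 × e2, or None if the rule does not apply."""
    if phi1.keys() & phi2.keys():
        return None                       # attributes must be disjoint
    return {**phi1, **phi2}


def join_type(rec1, rec2):
    """Element type of e1 ⋈ e2, or None if the rule does not apply."""
    shared = rec1.keys() & rec2.keys()    # these attributes must form ψ
    if any(rec1[a] != rec2[a] for a in shared):
        return None                       # ψ needs the same type on both sides
    return {**rec1, **rec2}
```

The product is typable only when no attribute is shared, whereas the join is typable whenever the shared attributes agree on their types, so the two operators have different sets of typings.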

Polymorphic Expressiveness (3)

Open research questions
• Is there a reasonable notion of when a query language is “polymorphically complete”?
• What operators are needed to obtain such a language?

Typability and Type Inference

Two classical problems

Typability
Input: Expression e(x, ..., y)
Problem: Do there exist T and t such that T ⊢ e : t?

Type Inference
Input: Expression e(x, ..., y)
Problem: Give an explicit description of the set of all typings (T, s) for which T ⊢ e : s.

Practical motivation:
• Complexity of Typability tells us something about the complexity of type checking queries in implicitly typed programming languages
• Type inference is essential for query optimization in the absence of schema information (Kleisli, ...).

Typability and Type Inference

What notion of type formulae is “just right” for a given query language?

Theorem [Buneman and Ohori; 1996]
There exists a polynomial time algorithm that, given an expression e(x, ..., y) in NRC(=), returns false if e is untypable, and otherwise returns a kinded type formula describing all of e's typings.

Theorem [Rémy; 1993]
There exists a polynomial time algorithm that, given an expression e(x, ..., y) in NRC(=, drop), returns false if e is untypable, and otherwise returns a type formula with row variables describing all of e's typings.

Typability and Type Inference for NRC(=, drop, ×, ⋈)

Our proposal: a type inference algorithm based on constrained types, in the spirit of HM(X) [Odersky, Sulzmann, and Wehr; 1999]

Signature of principal type formulas
A principal type formula for e(x1, ..., xn) is a conjunctive, many-sorted, first-order logic formula ϕ(x1, ..., xn, z) that, interpreted in the structure T of all possible types, defines all typings of e:

x1 : s1, ..., xn : sn ⊢ e : t  ⇔  T ⊨ ϕ(s1, ..., sn, t)

Example: principal type formula for x ∪ y
(∃u) x = Set(u) ∧ y = Set(u) ∧ z = Set(u)

Typability and Type Inference for NRC(=, drop, ×, ⋈)

The signature of the logic and its interpretation:

| Symbol | Arity | Interpretation in T |
|---|---|---|
| x, y, ... | type | |
| ρ, ρ′, ... | row | |
| = | type × type | equality relation on types |
| ⊆ | row × row | containment of functions, e.g. A : s ⊆ B : t, A : s |
| # | row × row | relates rows with disjoint domains |
| Set | type → type | maps s to {s} |
| Record | row → type | maps A : s, ..., B : t to (A : s, ..., B : t) |
| A | type → row | maps s to A : s |
| , | row × row → row | maps (ρ1, ρ2) to ρ1 ∪ π̂_dom(ρ1)(ρ2) |

Example: principal type formula for {v.A | v ∈ (x × y)}
(∃ρ)(∃ρ′) x = Set(Record(ρ)) ∧ y = Set(Record(ρ′)) ∧ ρ # ρ′ ∧ (∃u) z = Set(u) ∧ A(u) ⊆ ρ, ρ′
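To see how such a formula is interpreted over concrete types, the sketch below checks the example formula for {v.A | v ∈ (x × y)} against given types for x, y and z. The encoding (types as "int", "string", ("Set", t) or ("Record", row) with rows as dicts) and the function names are assumptions of this illustration.

```python
def row_of_set_of_records(t):
    """Return ρ if t = Set(Record(ρ)), else None."""
    if isinstance(t, tuple) and t[0] == "Set":
        inner = t[1]
        if isinstance(inner, tuple) and inner[0] == "Record":
            return inner[1]
    return None


def concat(rho1, rho2):
    """ρ1, ρ2: all of ρ1 plus the fields of ρ2 whose attributes are not in dom(ρ1)."""
    return {**rho2, **rho1}


def satisfies(x, y, z):
    """Check the example formula for concrete types x, y, z."""
    rho1, rho2 = row_of_set_of_records(x), row_of_set_of_records(y)
    if rho1 is None or rho2 is None:
        return False                      # x or y is not a set of records
    if rho1.keys() & rho2.keys():
        return False                      # ρ # ρ′ fails: domains overlap
    if not (isinstance(z, tuple) and z[0] == "Set"):
        return False
    u = z[1]
    return concat(rho1, rho2).get("A") == u   # A(u) ⊆ ρ, ρ′
```

The existential quantifiers disappear here because the witnesses ρ, ρ′ and u are determined by the given types.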

Typability and Type Inference for NRC(=, drop, ×, ⋈)

Theorem
Every NRC(=, drop, ×, ⋈) expression e has a principal type formula ϕe, of size linear in the size of e, and computable from e in polynomial time.

Main application: Typability of NRC(=, drop, ×, ⋈)-expressions is in NP.

Typability and Type Inference for NRC(=, drop, ×, ⋈)

Theorem: Typability of NRC(=, drop, ×, ⋈)-expressions is in NP.

Proof sketch:
Step 1. Expression e → principal type formula ϕe
e is typable ⇔ ϕe is satisfiable

Example:
{v.A | v ∈ (x × y)} ∪ {v.B | v ∈ x} is typable
⇔
(∃u)(∃ρ1)(∃ρ2) z = Set(u) ∧ x = Set(Record(ρ1)) ∧ y = Set(Record(ρ2)) ∧ ρ1 # ρ2 ∧ A(u) ⊆ ρ1, ρ2 ∧ B(u) ⊆ ρ1 is satisfiable

Typability and Type Inference for NRC(=, drop, ×, ⋈)

Theorem: Typability of NRC(=, drop, ×, ⋈)-expressions is in NP.

Proof sketch:
Step 2. Principal type formula ϕe → quantifier-free formula ψe
ϕe is satisfiable ⇔ ψe is satisfiable

Example:
(∃u)(∃ρ1)(∃ρ2) z = Set(u) ∧ x = Set(Record(ρ1)) ∧ y = Set(Record(ρ2)) ∧ ρ1 # ρ2 ∧ A(u) ⊆ ρ1, ρ2 ∧ B(u) ⊆ ρ1
⇔
z = Set(u) ∧ x = Set(Record(ρ1)) ∧ y = Set(Record(ρ2)) ∧ ρ1 # ρ2 ∧ A(u) ⊆ ρ1, ρ2 ∧ B(u) ⊆ ρ1

Typability and Type Inference for NRC(=, drop, ×, ⋈)

Theorem: Typability of NRC(=, drop, ×, ⋈)-expressions is in NP.

Proof sketch:
Step 3. Quantifier-free formula ψe → guess a quantifier-free formula θe without row variables
ψe is satisfiable ⇔ θe is satisfiable

Example:
z = Set(u) ∧ x = Set(Record(ρ1)) ∧ y = Set(Record(ρ2)) ∧ ρ1 # ρ2 ∧ A(u) ⊆ ρ1, ρ2 ∧ B(u) ⊆ ρ1
⇔
z = Set(u) ∧ x = Set(Record(B(u1))) ∧ y = Set(Record(A(u2))) ∧ B(u1) # A(u2) ∧ A(u) ⊆ B(u1), A(u2) ∧ B(u) ⊆ B(u1)

Typability and Type Inference for NRC(=, drop, ×, ⋈)

Theorem: Typability of NRC(=, drop, ×, ⋈)-expressions is in NP.

Proof sketch:
Step 4. Quantifier-free formula θe without row variables → simplified formula σe
θe is satisfiable ⇔ σe is satisfiable

Example:
z = Set(u) ∧ x = Set(Record(B(u1))) ∧ y = Set(Record(A(u2))) ∧ B(u1) # A(u2) ∧ A(u) ⊆ B(u1), A(u2) ∧ B(u) ⊆ B(u1)
⇔
z = Set(u) ∧ x = Set(Record(B(u1))) ∧ y = Set(Record(A(u2))) ∧ A(u) ⊆ B(u1), A(u2) ∧ B(u) ⊆ B(u1)

Typability and Type Inference for NRC(=, drop, ×, ⋈)

Theorem: Typability of NRC(=, drop, ×, ⋈)-expressions is in NP.

Proof sketch:
Step 4 (continued). Quantifier-free formula θe without row variables → simplified formula σe
θe is satisfiable ⇔ σe is satisfiable

Example:
z = Set(u) ∧ x = Set(Record(B(u1))) ∧ y = Set(Record(A(u2))) ∧ A(u) ⊆ B(u1), A(u2) ∧ B(u) ⊆ B(u1)
⇔
z = Set(u) ∧ x = Set(Record(B(u1))) ∧ y = Set(Record(A(u2))) ∧ u = u2 ∧ u = u1

The latter formula can be solved efficiently by unification.
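The final formula contains only equations between type terms, so a textbook first-order unification procedure suffices. The sketch below is a generic unifier rather than the authors' implementation; the term encoding (strings as variables, tuples as function applications) and all function names are assumptions.

```python
def walk(t, subst):
    """Follow variable bindings until reaching an unbound variable or a term."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    """True if variable v occurs in term t under subst."""
    t = walk(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t[1:])


def unify(s, t, subst):
    """Extend subst so that s and t become equal, or return None."""
    s, t = walk(s, subst), walk(t, subst)
    if s == t:
        return subst
    if isinstance(s, str):
        return None if occurs(s, t, subst) else {**subst, s: t}
    if isinstance(t, str):
        return unify(t, s, subst)
    if s[0] != t[0] or len(s) != len(t):
        return None                       # clash of function symbols
    for a, b in zip(s[1:], t[1:]):
        subst = unify(a, b, subst)
        if subst is None:
            return None
    return subst


def solve(equations):
    subst = {}
    for s, t in equations:
        subst = unify(s, t, subst)
        if subst is None:
            return None
    return subst


# z = Set(u) ∧ x = Set(Record(B(u1))) ∧ y = Set(Record(A(u2))) ∧ u = u2 ∧ u = u1
equations = [
    ("z", ("Set", "u")),
    ("x", ("Set", ("Record", ("B", "u1")))),
    ("y", ("Set", ("Record", ("A", "u2")))),
    ("u", "u2"),
    ("u", "u1"),
]
solution = solve(equations)
```

A non-None result means the formula is satisfiable; here u, u1 and u2 end up in the same equivalence class, so any single type can be chosen for all three.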

Typability and Type Inference for NRC(=, drop, ×, ⋈)

Theorem: Typability of NRC(=, drop, ×, ⋈)-expressions is NP-hard.

Proof sketch: By a reduction from Positive one-in-three 3SAT. Abbreviate πA(e) := {v.A | v ∈ e}.

Example:
(x1 ∨ y1 ∨ z1) ∧ (x2 ∨ y1 ∨ z2) ∧ (x2 ∨ y3 ∨ z1) is satisfiable (with exactly one true variable per clause)
⇔
πA(x1 × y1 × z1) ∪ πA(x2 × y1 × z2) ∪ πA(x2 × y3 × z1) is typable
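One way to read the reduction: πA(x × y × z) is typable only if the three record types have pairwise disjoint attributes and A is present, so exactly one of them carries A. The brute-force sketch below enumerates one-in-three assignments for the example formula and turns one into such a typing; the function names and the choice of `int` as the type of A are assumptions of this illustration.

```python
from itertools import product

clauses = [("x1", "y1", "z1"), ("x2", "y1", "z2"), ("x2", "y3", "z1")]
variables = sorted({v for clause in clauses for v in clause})


def one_in_three_solutions(clauses, variables):
    """Yield assignments making exactly one variable true in every clause."""
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if all(sum(assignment[v] for v in clause) == 1 for clause in clauses):
            yield assignment


def typing_from(assignment):
    """Record type per variable: true variables carry attribute A, the others none."""
    return {v: ({"A": "int"} if value else {}) for v, value in assignment.items()}


def projection_typable(record_types):
    """πA over a product: attributes pairwise disjoint and A present."""
    names = [a for rt in record_types for a in rt]
    return len(names) == len(set(names)) and "A" in names


solutions = list(one_in_three_solutions(clauses, variables))
typing = typing_from(solutions[0])
all_typable = all(projection_typable([typing[v] for v in clause]) for clause in clauses)
```

Exhaustive enumeration is exponential in the number of variables, which is consistent with the problem being NP-hard; it is used here only to check the small example.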

Typability and Type Inference for NRC(=, drop, ×, ⋈)

What about typability for expressions without the cartesian product operator?

Theorem
Typability of NNRC-expressions without the cartesian product operator is NP-hard.

Proof sketch:
• Uses a proof idea from Ohori and Buneman (1988), who showed that typability for “generalized join” is NP-hard.
• Reduction from Monotone 3SAT.

The reductions transfer to programming languages with symmetric record concatenation, join, or mixin modules.

The complexity of type checking DBPLs with NRC(=, drop, ×, ⋈) as the embedded QL

When the ambient language is the simply typed λ-calculus:
• Type checking moves from P-complete to NP-hard

When the ambient language is ML:
• Type checking was already EXPTIME-complete
• However, EXPTIME-hardness is only due to peculiar programs which rarely occur in practice
• Type checking ML typically takes linear time in practice
• In contrast, NP-hardness for NRC(=, drop, ×, ⋈) is due to cartesian product and join, which do occur in practice