Types for Database Query Languages: Polymorphism, Complexity, and Completeness
Stijn Vansummeren, Université Libre de Bruxelles
10 May 2010
Introduction
Extensive study of type systems for:
• General-purpose (Turing-complete) programming languages:
  ◦ ML [Hindley; 1969 - Milner; 1978 - Damas and Milner; 1982]
  ◦ Many others
• Database programming languages (higher-order functions + records + collections + ...):
  ◦ Extensible records [Wand; 1989 - Rémy; 1989, 1990]
  ◦ Generalized relational operators [Buneman and Ohori; 1996]
  ◦ Constrained types: HM(X) [Odersky, Sulzmann, and Wehr; 1999]
  ◦ Many others
How does this specialize to database query languages?
• Limited expressiveness (not Turing-complete) → complete type systems?
• No higher-order functions, no subtyping → complexity of typability?
• Only records, collections → specialized type inference algorithms?
Results presented are from the following papers:
• On the Complexity of Deciding Typability in the Relational Algebra, Acta Informatica, 2005
• Polymorphic Type Inference for the Named Nested Relational Calculus, ACM TOCL, 2006
• Well-Definedness and Semantic Type-Checking for the Nested Relational Calculus, Theoretical Computer Science, 2007
• Unpublished notes
This is joint work with
• Dirk Van Gucht, Indiana University, USA
• Jan Van den Bussche, Hasselt University, Belgium
Introduction - Nested Relational Calculus NRC
Canonical Query Language for Complex Objects
Objects: o ::= c | (A : o, ..., B : o′) | {o, ..., o′}
Expressions: e ::= x | o | (A : e, ..., B : e′) | e.A | {} | {e} | e1 ∪ e2 | {e | x1 ∈ e1, ..., xn ∈ en}
Binder List: ∆ ::= x1 ∈ e1, ..., xn ∈ en
Types: s, t ::= int | string | ··· | (A : s, ..., B : t) | {s}
Example: {(A : y.C, B : z) | y ∈ x1, z ∈ x2}
With x1 = {(C : 1, D : 2), (C : 3, D : 4)} and x2 = {3, 8}, this evaluates to {(A : 1, B : 3), (A : 1, B : 8), (A : 3, B : 3), (A : 3, B : 8)}.
Operational semantics (records):
$$\frac{e \to o \quad \cdots \quad e' \to o'}{(A : e, \ldots, B : e') \to (A : o, \ldots, B : o')} \qquad \frac{e \to (A : o, \ldots, B : o')}{e.A \to o}$$
Operational semantics (sets):
$$\frac{}{\{\} \to \{\}} \qquad \frac{e \to o}{\{e\} \to \{o\}} \qquad \frac{e_1 \to \{o_1, \ldots, o_m\} \quad e_2 \to \{o'_1, \ldots, o'_n\}}{e_1 \cup e_2 \to \{o_1, \ldots, o_m, o'_1, \ldots, o'_n\}}$$
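Operational semantics (comprehensions):
$$\frac{e \to o}{\{e \mid\ \} \to \{o\}} \qquad \frac{e_1 \to \{\}}{\{e \mid x_1 \in e_1, \Delta\} \to \{\}}$$
$$\frac{e_1 \to \{o, \ldots, o'\} \quad \{e[x_1/o] \mid \Delta[x_1/o]\} \cup \cdots \cup \{e[x_1/o'] \mid \Delta[x_1/o']\} \to o''}{\{e \mid x_1 \in e_1, \Delta\} \to o''}$$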
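The evaluation rules for NRC can be sketched as a small interpreter. This is only an illustration under assumptions that are not in the slides: expressions are encoded as tagged tuples, records as a hypothetical hashable `Rec` class, sets as `frozenset`, and substitution e[x/o] is modelled by extending an environment. The exception `Stuck` marks the cases where no rule applies, i.e. where the expression is not well-defined.

```python
class Stuck(Exception):
    """No evaluation rule applies: the expression is not well-defined here."""


class Rec(tuple):
    """A record object: a sorted tuple of (attribute, value) pairs (hashable)."""


def rec(**fields):
    """Build a record object, e.g. rec(A=1, B=3) for (A : 1, B : 3)."""
    return Rec(sorted(fields.items()))


def evaluate(e, env):
    """Evaluate an expression (a tagged tuple) under an environment of objects."""
    tag = e[0]
    if tag == "var":                       # x
        return env[e[1]]
    if tag == "const":                     # an object o used as an expression
        return e[1]
    if tag == "rec":                       # (A : e, ..., B : e')
        return Rec(sorted((a, evaluate(sub, env)) for a, sub in e[1].items()))
    if tag == "proj":                      # e.A
        o = evaluate(e[1], env)
        if not isinstance(o, Rec) or e[2] not in dict(o):
            raise Stuck(f"cannot project attribute {e[2]}")
        return dict(o)[e[2]]
    if tag == "empty":                     # {}
        return frozenset()
    if tag == "sing":                      # {e}
        return frozenset([evaluate(e[1], env)])
    if tag == "union":                     # e1 ∪ e2
        left, right = evaluate(e[1], env), evaluate(e[2], env)
        if not (isinstance(left, frozenset) and isinstance(right, frozenset)):
            raise Stuck("union of non-sets")
        return left | right
    if tag == "compr":                     # {e | x1 ∈ e1, ..., xn ∈ en}
        return comprehend(e[1], e[2], env)
    raise ValueError(f"unknown expression tag {tag!r}")


def comprehend(body, binders, env):
    """Evaluate a comprehension binder by binder, as in the three rules above."""
    if not binders:                        # {e | } → {o}
        return frozenset([evaluate(body, env)])
    (x, source), rest = binders[0], binders[1:]
    objs = evaluate(source, env)
    if not isinstance(objs, frozenset):
        raise Stuck("binder ranges over a non-set")
    out = frozenset()                      # stays {} when the source is empty
    for o in objs:                         # union over all bindings of x
        out |= comprehend(body, rest, {**env, x: o})
    return out


# The example {(A : y.C, B : z) | y ∈ x1, z ∈ x2}
query = ("compr",
         ("rec", {"A": ("proj", ("var", "y"), "C"), "B": ("var", "z")}),
         [("y", ("var", "x1")), ("z", ("var", "x2"))])
env = {"x1": frozenset([rec(C=1, D=2), rec(C=3, D=4)]),
       "x2": frozenset([3, 8])}
result = evaluate(query, env)
```

Because a comprehension over an empty set never evaluates its body, an expression such as {{}.A | x ∈ {}} evaluates to {} in this sketch even though {}.A on its own is stuck.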
In Search of a Complete Static Type System
Static type systems for Turing-complete languages:
• Are sound (i.e., can prove the absence of runtime errors)
• But necessarily incomplete (i.e., cannot prove that an error will occur)
Question: NRC is not Turing-complete. Does it have a sound and complete static type system?
In Search of a Complete Static Type System (2)
The question is equivalent to the following decision problem:
Well-Definedness
Input: Expression e(x, ..., y) and types s, ..., t for the free variables.
Problem: Decide whether e is well-defined under s, ..., t, i.e., whether e[x/o, ..., y/o′] evaluates to an object for all o : s, ..., o′ : t.
Theorem: Well-Definedness for NRC is decidable.
Language Extensions
Consider the extension of NRC with
• Atomic comparisons e1 eq e2, which can only compare two atomic data values
• This gives us essentially the conjunctive queries
Operational semantics:
$$\frac{e_1 \to c_1 \quad e_2 \to c_1}{e_1 \;\mathrm{eq}\; e_2 \to \{()\}} \qquad \frac{e_1 \to c_1 \quad e_2 \to c_2 \quad c_1 \neq c_2}{e_1 \;\mathrm{eq}\; e_2 \to \{\}}$$
Example: return all records in R whose A-field is 5
{x | x ∈ R, y ∈ (x.A eq 5)}
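The selection idiom in the example works because eq returns either the singleton {()} or the empty set, so the extra binder y iterates once or not at all. A minimal sketch in Python, assuming (for illustration only) that records are dicts, that the relation R is a list standing in for a set, and that the sample data is made up:

```python
UNIT = ()  # the empty record ()


def eq(a, b):
    """e1 eq e2: {()} if the two atomic values are equal, {} otherwise."""
    for value in (a, b):
        if not isinstance(value, (int, str)):
            raise TypeError("eq compares atomic values only")
    return frozenset([UNIT]) if a == b else frozenset()


# Hypothetical sample relation R
R = [{"A": 5, "B": "x"}, {"A": 7, "B": "y"}, {"A": 5, "B": "z"}]

# {x | x ∈ R, y ∈ (x.A eq 5)}: the binder over eq(...) acts as a filter
selected = [x for x in R for y in eq(x["A"], 5)]
```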
Language Extensions (2)
Theorem
• Well-Definedness for NRC(eq) is decidable.
• Well-Definedness for NRC(eq) is hard for Co-Nexptime.
Hardness follows by reduction from:
Satisfiability
Input: Expression e(x, ..., y) and types s, ..., t such that e[x/o, ..., y/o′] evaluates to a set for all o : s, ..., o′ : t.
Problem: Decide whether there exist objects o : s, ..., o′ : t such that e[x/o, ..., y/o′] evaluates to a non-empty set.
• Let e be a closed, well-defined expression that always outputs a set
• Then {{}.A | x ∈ e} is well-defined ⇔ e is not satisfiable, because {}.A is evaluated (and gets stuck) exactly when e evaluates to a non-empty set
• [Koch; 2006] Satisfiability of closed expressions is Nexptime-hard, so its complement is Co-Nexptime-hard.
Language Extensions (3)
Consider the extension of NRC with
• General comparisons e1 = e2, which can compare arbitrary objects
• This gives us the full power of the relational algebra
Operational semantics:
$$\frac{e_1 \to o_1 \quad e_2 \to o_1}{e_1 = e_2 \to \{()\}} \qquad \frac{e_1 \to o_1 \quad e_2 \to o_2 \quad o_1 \neq o_2}{e_1 = e_2 \to \{\}}$$
Theorem
• Satisfiability for the relational algebra is undecidable.
• Therefore Satisfiability for NRC(=) is undecidable.
• Hence, Well-Definedness for NRC(=) is undecidable.
Language Extensions (4)
Consider the extension of NRC with
• Singleton extraction extract(e), which extracts the value from a singleton set
• Present in OQL
Operational semantics:
$$\frac{e \to \{o\}}{\mathrm{extract}(e) \to o}$$
This allows us to model some features of SQL:
• SQL: select ... where (5 = select distinct A from R)
• NRC(eq, extract): 5 eq (extract {x.A | x ∈ R})
Theorem: Well-Definedness for NRC(eq, extract) is undecidable.
In Search of a Complete Static Type System (4)
Conclusion:
• Complete static type systems exist for restricted query languages (the conjunctive queries)
• But these systems have high complexity
Solution: adopt the standard (incomplete) static type system
Typing rules (variables, objects, records):
$$\frac{}{T \vdash x : T(x)} \qquad \frac{o : s}{T \vdash o : s} \qquad \frac{T \vdash e : (A : s, \ldots, B : t)}{T \vdash e.A : s} \qquad \frac{T \vdash e : s \quad \cdots \quad T \vdash e' : t}{T \vdash (A : e, \ldots, B : e') : (A : s, \ldots, B : t)}$$
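Typing rules (sets and comprehensions):
$$\frac{}{T \vdash \{\} : \{s\}} \qquad \frac{T \vdash e : s}{T \vdash \{e\} : \{s\}} \qquad \frac{T \vdash e_1 : \{s\} \quad T \vdash e_2 : \{s\}}{T \vdash e_1 \cup e_2 : \{s\}}$$
$$\frac{x_1 : s_1, \ldots, x_i : s_i, T \vdash e_{i+1} : \{s_{i+1}\} \ \text{for } 0 \le i < n \qquad x_1 : s_1, \ldots, x_n : s_n, T \vdash e : s}{T \vdash \{e \mid x_1 \in e_1, \ldots, x_n \in e_n\} : \{s\}}$$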
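The typing rules translate almost line by line into a checker. The following is a hedged sketch, not the algorithm from the papers: expressions and types are encoded as tagged tuples (an assumption of this illustration), only atomic constants are supported as objects, and the empty set carries an explicit element-type annotation, whereas the rule T ⊢ {} : {s} leaves s free.

```python
class IllTyped(Exception):
    """No typing rule applies."""


def set_of(s):
    return ("set", s)


def record_of(**fields):
    """Record type (A : s, ..., B : t) with attributes kept in sorted order."""
    return ("rec", tuple(sorted(fields.items())))


def typecheck(e, T):
    """Return the type of expression e under type assignment T, or raise IllTyped."""
    tag = e[0]
    if tag == "var":                        # T ⊢ x : T(x)
        return T[e[1]]
    if tag == "const":                      # atomic constants only in this sketch
        return "int" if isinstance(e[1], int) else "string"
    if tag == "rec":                        # record construction
        fields = ((a, typecheck(sub, T)) for a, sub in e[1].items())
        return ("rec", tuple(sorted(fields)))
    if tag == "proj":                       # e.A
        t = typecheck(e[1], T)
        if isinstance(t, tuple) and t[0] == "rec" and e[2] in dict(t[1]):
            return dict(t[1])[e[2]]
        raise IllTyped(f"cannot project {e[2]} from {t}")
    if tag == "empty":                      # {} annotated with its element type
        return set_of(e[1])
    if tag == "sing":                       # {e}
        return set_of(typecheck(e[1], T))
    if tag == "union":                      # both sides need the same set type
        t1, t2 = typecheck(e[1], T), typecheck(e[2], T)
        if isinstance(t1, tuple) and t1[0] == "set" and t1 == t2:
            return t1
        raise IllTyped(f"cannot union {t1} and {t2}")
    if tag == "compr":                      # binders are typed left to right
        scope = dict(T)
        for x, source in e[2]:
            t = typecheck(source, scope)
            if not (isinstance(t, tuple) and t[0] == "set"):
                raise IllTyped(f"binder {x} ranges over non-set type {t}")
            scope[x] = t[1]
        return set_of(typecheck(e[1], scope))
    raise ValueError(f"unknown expression tag {tag!r}")


# {(A : y.C, B : z) | y ∈ x1, z ∈ x2} under x1 : {(C : int, D : int)}, x2 : {int}
query = ("compr",
         ("rec", {"A": ("proj", ("var", "y"), "C"), "B": ("var", "z")}),
         [("y", ("var", "x1")), ("z", ("var", "x2"))])
T = {"x1": set_of(record_of(C="int", D="int")), "x2": set_of("int")}
inferred = typecheck(query, T)
```

The checker inspects every subexpression regardless of whether it can ever be executed, which is one source of the incompleteness mentioned above.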
Expressiveness?
Static typing sometimes limits expressiveness:
• Untyped lambda calculus: all computable functions
• Simply typed lambda calculus: restricted to extended polynomials
Is the same true for queries?
• Language-integrated queries in statically typed languages (LINQ, Links, ...)
• Language-integrated queries in dynamically typed languages (Python, Ruby, ...)
Question: Is the static type system for NRC expressively complete?
• Can all well-defined queries be equivalently written in a well-typed way?
Expressiveness? (2)
Theorem: The static type system for NRC(=) is expressively complete:
• Every NRC(=) expression e(x, ..., y) that is well-defined under r, ..., s and only produces outputs in a type t has an equivalent expression e′(x, ..., y) such that x : r, ..., y : s ⊢ e′ : t.
• Moreover, e′ is of size linear in e
Some ways in which expressions can be well-defined, but ill-typed:
• Unreachable code: {{}.A | x ∈ {}} → {}
• Creating heterogeneous objects: {z.A | z ∈ (x ∪ y)}
  ◦ ill-typed under x ↦ {(A : r, B : s)}, y ↦ {(A : r, C : t)}
  ◦ Can be rewritten as {z.A | z ∈ x} ∪ {z.A | z ∈ y}
• General case more difficult: {z.A | z ∈ e}
  ◦ ill-typed when e is another comprehension that returns a heterogeneous set
I would appreciate any pointers to the literature on similar results for general-purpose (functional) programming languages!
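The rewriting of the heterogeneous union can be checked on sample data. This sketch only illustrates that the two forms compute the same result; it assumes records are Python dicts, uses lists to stand in for the sets x and y, and the sample values are invented.

```python
def project_A_from_union(x, y):
    """{z.A | z ∈ (x ∪ y)}: ranges over a heterogeneous collection."""
    return {z["A"] for z in x + y}


def project_A_then_union(x, y):
    """{z.A | z ∈ x} ∪ {z.A | z ∈ y}: each comprehension ranges over a homogeneous set."""
    return {z["A"] for z in x} | {z["A"] for z in y}


# x : {(A : int, B : string)} and y : {(A : int, C : float)}
x = [{"A": 1, "B": "b1"}, {"A": 2, "B": "b2"}]
y = [{"A": 2, "C": 3.5}, {"A": 4, "C": 0.5}]

same = project_A_from_union(x, y) == project_A_then_union(x, y)
```

Python runs the first form without complaint because it is dynamically typed; in the static type system above, x ∪ y has no type when the element types of x and y differ, which is why only the second form is well-typed.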
Polymorphic Expressiveness
Consider the extension of NRC with
• Complement projection drop_A(e), which retains all of e's fields but A
Operational semantics and typing rule:
$$\frac{e \to (A : o, B : o', \ldots, C : o'')}{\mathrm{drop}_A\, e \to (B : o', \ldots, C : o'')} \qquad \frac{T \vdash e : (A : r, B : s, \ldots, C : t)}{T \vdash \mathrm{drop}_A\, e : (B : s, \ldots, C : t)}$$
Note that:
• We can easily simulate drop_A x in NRC(=) if we know the type of x
• But not if x's type is unknown!
Theorem
• A typing of an expression e is a pair (T, s) such that T ⊢ e : s
• Say that two expressions e(x, ..., y) and e′(x, ..., y) are polymorphically equivalent if they have the same set of typings and, for each such typing (T, s), e and e′ evaluate to the same output on each input of type T
• No expression in NRC(=) is polymorphically equivalent to drop_A x
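The gap between the generic operator and its type-specific simulation can be made concrete. In this sketch (records as Python dicts, function names hypothetical), `drop` works for any record containing the attribute, while `drop_A_for_ABC` rebuilds the record field by field and therefore only works for the one record type it was written for.

```python
def drop(record, attr):
    """drop_A(e): keep every field of the record except attr; stuck if attr is absent."""
    if attr not in record:
        raise KeyError(attr)
    return {a: v for a, v in record.items() if a != attr}


def drop_A_for_ABC(x):
    """Simulation of drop_A x by the record constructor (B : x.B, C : x.C).

    Only valid when x is known to have type (A : r, B : s, C : t).
    """
    return {"B": x["B"], "C": x["C"]}


r = {"A": 1, "B": 2, "C": 3}
other = {"A": 1, "D": 4}      # a record of a different type
```

On `r` both functions agree; on `other` the generic `drop` still returns the record without A, whereas the simulation fails because it mentions attributes that `other` does not have.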
Polymorphic Expressiveness (2)
Consider the extension of NRC with
• Cartesian product e1 × e2
• Join e1 ⋈ e2
Typing rules (φ1, φ2, ψ are record types; + is record type concatenation):
$$\frac{T \vdash e_1 : \{\varphi_1\} \quad T \vdash e_2 : \{\varphi_2\} \quad \varphi_1 \text{ and } \varphi_2 \text{ have disjoint sets of attributes}}{T \vdash e_1 \times e_2 : \{\varphi_1 + \varphi_2\}}$$
$$\frac{T \vdash e_1 : \{\varphi_1 + \psi\} \quad T \vdash e_2 : \{\varphi_2 + \psi\} \quad \varphi_1 \text{ and } \varphi_2 \text{ have disjoint sets of attributes}}{T \vdash e_1 \bowtie e_2 : \{\varphi_1 + \varphi_2 + \psi\}}$$
Theorem
• No expression in NRC(=, drop, ×) is polymorphically equivalent to e1 ⋈ e2.
• No expression in NRC(=, drop, ⋈) is polymorphically equivalent to e1 × e2.
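The two typing rules differ only in how they treat shared attributes. A minimal sketch, assuming record types are represented as Python dicts from attribute names to types (the function names are hypothetical): the product rejects any shared attribute, while the join accepts shared attributes as long as they carry the same type on both sides, which corresponds to the common part ψ.

```python
class IllTyped(Exception):
    """The operands do not satisfy the side conditions of the typing rule."""


def product_type(phi1, phi2):
    """Element type of e1 × e2 for e1 : {phi1}, e2 : {phi2}."""
    shared = phi1.keys() & phi2.keys()
    if shared:
        raise IllTyped(f"product operands share attributes {sorted(shared)}")
    return {**phi1, **phi2}


def join_type(row1, row2):
    """Element type of e1 ⋈ e2; the shared attributes form ψ and must agree."""
    for a in row1.keys() & row2.keys():
        if row1[a] != row2[a]:
            raise IllTyped(f"attribute {a} has types {row1[a]} and {row2[a]}")
    return {**row1, **row2}


emp = {"Name": "string", "Dept": "int"}
dept = {"Dept": "int", "Floor": "int"}
joined = join_type(emp, dept)
```

Here `joined` contains `Name`, `Dept`, and `Floor`, whereas `product_type(emp, dept)` is rejected because both operands have a `Dept` attribute.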
Polymorphic Expressiveness (3)
Open Research Questions
• Is there a reasonable notion of when a query language is "polymorphically complete"?
• What operators are needed to obtain such a language?
Typability and Type Inference
Two classical problems:
Typability
Input: Expression e(x, ..., y)
Problem: Do there exist T and t such that T ⊢ e : t?
Type Inference
Input: Expression e(x, ..., y)
Problem: Give an explicit description of the set of all typings (T, s) for which T ⊢ e : s.
Practical motivation:
• The complexity of Typability tells us something about the complexity of type checking queries in implicitly typed programming languages
• Type inference is essential for query optimization in the absence of schema information (Kleisli, ...).
What notion of type formulae is "just right" for a given query language?
Theorem [Buneman and Ohori; 1996]: There exists a polynomial-time algorithm that, given an expression e(x, ..., y) in NRC(=), returns false if e is untypable, and otherwise returns a kinded type formula describing all of e's typings.
Theorem [Rémy; 1993]: There exists a polynomial-time algorithm that, given an expression e(x, ..., y) in NRC(=, drop), returns false if e is untypable, and otherwise returns a type formula with row variables describing all of e's typings.
Typability and Type Inference for NRC(=, drop, ×, ⋈)
Our proposal: a type inference algorithm based on constrained types, in the spirit of HM(X) [Odersky, Sulzmann, and Wehr; 1999]
Signature of principal type formulas
A principal type formula for e(x1, ..., xn) is a conjunctive, many-sorted, first-order logic formula ϕ(x1, ..., xn, z) that, interpreted in the structure T of all possible types, defines all typings of e:
x1 : s1, ..., xn : sn ⊢ e : t ⇔ T ⊨ ϕ(s1, ..., sn, t)
Example: principal type formula for x ∪ y
(∃u) x = Set(u) ∧ y = Set(u) ∧ z = Set(u)
The signature of the logic and its interpretation:
Symbol | Arity | Interpretation in T
x, y, ... | type |
ρ, ρ′, ... | row |
= | type × type | equality relation on types
⊆ | row × row | containment of functions, e.g. A : s ⊆ B : t, A : s
# | row × row | relates rows with disjoint domains
Set | type → type | maps s to {s}
Record | row → type | maps A : s, ..., B : t to (A : s, ..., B : t)
A | type → row | maps s to A : s
, | row × row → row | maps (ρ1, ρ2) to ρ1 ∪ π̂_dom(ρ1)(ρ2)
Example: principal type formula for {v.A | v ∈ (x × y)}
(∃ρ)(∃ρ′) x = Set(Record(ρ)) ∧ y = Set(Record(ρ′)) ∧ ρ # ρ′ ∧ (∃u) z = Set(u) ∧ A(u) ⊆ ρ, ρ′
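The row operations of the signature have a direct reading on finite maps. In the sketch below, which is an illustration under assumed encodings rather than the authors' implementation, rows are Python dicts, types are strings or tagged tuples `("Set", t)` and `("Record", row)`, and `satisfies_example` checks the example formula for {v.A | v ∈ (x × y)} against concrete types.

```python
def contained(r1, r2):
    """ρ1 ⊆ ρ2: every attribute of ρ1 occurs in ρ2 with the same type."""
    return all(a in r2 and r2[a] == t for a, t in r1.items())


def disjoint(r1, r2):
    """ρ1 # ρ2: the rows have disjoint domains."""
    return not (r1.keys() & r2.keys())


def concat(r1, r2):
    """ρ1, ρ2: ρ1 together with the part of ρ2 outside dom(ρ1)."""
    merged = {a: t for a, t in r2.items() if a not in r1}
    merged.update(r1)
    return merged


def row_of(t):
    """Return ρ if t = Set(Record(ρ)), and None otherwise."""
    if isinstance(t, tuple) and len(t) == 2 and t[0] == "Set":
        inner = t[1]
        if isinstance(inner, tuple) and len(inner) == 2 and inner[0] == "Record":
            return inner[1]
    return None


def satisfies_example(x, y, z):
    """Does (x, y, z) satisfy the formula for {v.A | v ∈ (x × y)}?"""
    rho, rho2 = row_of(x), row_of(y)
    if rho is None or rho2 is None:
        return False
    if not (isinstance(z, tuple) and len(z) == 2 and z[0] == "Set"):
        return False
    u = z[1]                                   # witness for (∃u) z = Set(u)
    return disjoint(rho, rho2) and contained({"A": u}, concat(rho, rho2))


x = ("Set", ("Record", {"A": "int", "B": "string"}))
y = ("Set", ("Record", {"C": "int"}))
ok = satisfies_example(x, y, ("Set", "int"))
```

For concrete types the existential witnesses ρ, ρ′, and u are determined by destructuring x, y, and z, so no search is needed in this check.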
Theorem: Every NRC(=, drop, ×, ⋈) expression e has a principal type formula ϕe, of size linear in the size of e, and computable from e in polynomial time.
Main application: Typability of NRC(=, drop, ×, ⋈)-expressions is in NP.
Theorem: Typability of NRC(=, drop, ×, ⋈)-expressions is in NP.
Proof sketch:
Step 1. Expression e → principal type formula ϕe
e is typable ⇔ ϕe is satisfiable
Example:
{v.A | v ∈ (x × y)} ∪ {v.B | v ∈ x}
⇔
(∃u)(∃ρ1)(∃ρ2) z = Set(u) ∧ x = Set(Record(ρ1)) ∧ y = Set(Record(ρ2)) ∧ ρ1 # ρ2 ∧ A(u) ⊆ ρ1, ρ2 ∧ B(u) ⊆ ρ1
Step 2. Principal type formula ϕe → quantifier-free formula ψe
ϕe is satisfiable ⇔ ψe is satisfiable
Example:
(∃u)(∃ρ1)(∃ρ2) z = Set(u) ∧ x = Set(Record(ρ1)) ∧ y = Set(Record(ρ2)) ∧ ρ1 # ρ2 ∧ A(u) ⊆ ρ1, ρ2 ∧ B(u) ⊆ ρ1
⇔
z = Set(u) ∧ x = Set(Record(ρ1)) ∧ y = Set(Record(ρ2)) ∧ ρ1 # ρ2 ∧ A(u) ⊆ ρ1, ρ2 ∧ B(u) ⊆ ρ1
Step 3. Quantifier-free formula ψe → guess a quantifier-free formula θe without row variables
ψe is satisfiable ⇔ θe is satisfiable
Example:
z = Set(u) ∧ x = Set(Record(ρ1)) ∧ y = Set(Record(ρ2)) ∧ ρ1 # ρ2 ∧ A(u) ⊆ ρ1, ρ2 ∧ B(u) ⊆ ρ1
⇔
z = Set(u) ∧ x = Set(Record(B(u1))) ∧ y = Set(Record(A(u2))) ∧ B(u1) # A(u2) ∧ A(u) ⊆ B(u1), A(u2) ∧ B(u) ⊆ B(u1)
Step 4. Quantifier-free formula θe without row variables → simplified formula σe
θe is satisfiable ⇔ σe is satisfiable
Example:
z = Set(u) ∧ x = Set(Record(B(u1))) ∧ y = Set(Record(A(u2))) ∧ B(u1) # A(u2) ∧ A(u) ⊆ B(u1), A(u2) ∧ B(u) ⊆ B(u1)
⇔
z = Set(u) ∧ x = Set(Record(B(u1))) ∧ y = Set(Record(A(u2))) ∧ A(u) ⊆ B(u1), A(u2) ∧ B(u) ⊆ B(u1)
⇔
z = Set(u) ∧ x = Set(Record(B(u1))) ∧ y = Set(Record(A(u2))) ∧ u = u2 ∧ u = u1
The latter formula can efficiently be solved by unification.
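The final step relies on ordinary first-order unification. The sketch below is a textbook-style unifier, not the algorithm from the paper; it assumes variables are plain strings and constructor applications are tuples such as `("Set", "u")`, and it solves the simplified formula σe from the running example.

```python
def walk(t, subst):
    """Follow variable bindings until reaching an unbound variable or a constructor."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    """Occurs check: does variable v appear inside term t?"""
    t = walk(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t[1:])


def unify(equations):
    """Return a substitution solving all equations, or None if there is none."""
    subst = {}
    todo = list(equations)
    while todo:
        s, t = todo.pop()
        s, t = walk(s, subst), walk(t, subst)
        if s == t:
            continue
        if isinstance(s, str):                  # s is an unbound variable
            if occurs(s, t, subst):
                return None
            subst[s] = t
        elif isinstance(t, str):                # swap so the variable comes first
            todo.append((t, s))
        elif s[0] == t[0] and len(s) == len(t): # same constructor: unify arguments
            todo.extend(zip(s[1:], t[1:]))
        else:
            return None
    return subst


def resolve(t, subst):
    """Apply the substitution exhaustively to a term."""
    t = walk(t, subst)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(a, subst) for a in t[1:])
    return t


# σe: z = Set(u) ∧ x = Set(Record(B(u1))) ∧ y = Set(Record(A(u2))) ∧ u = u2 ∧ u = u1
sigma = [("z", ("Set", "u")),
         ("x", ("Set", ("Record", ("B", "u1")))),
         ("y", ("Set", ("Record", ("A", "u2")))),
         ("u", "u2"),
         ("u", "u1")]
solution = unify(sigma)
```

In the resulting substitution u, u1, and u2 all resolve to the same variable, so x, y, and z share a single unconstrained field type, as expected for the running example.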
Theorem: Typability of NRC(=, drop, ×, ⋈)-expressions is NP-hard.
Proof sketch: By a reduction from Positive one-in-three 3SAT. Abbreviate πA(e) := {v.A | v ∈ e}.
(x1 ∨ y1 ∨ z1) ∧ (x2 ∨ y1 ∨ z2) ∧ (x2 ∨ y3 ∨ z1) is satisfiable with exactly one true variable per clause
⇔
πA(x1 × y1 × z1) ∪ πA(x2 × y1 × z2) ∪ πA(x2 × y3 × z1) is typable
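One way to read the reduction, offered here as an interpretation rather than a quotation from the proof: a variable is "true" exactly when its element record type contains the attribute A. The product forces the three operand types of a clause to have disjoint attributes (at most one has A), and the projection πA forces A to be present (at least one has A). The brute-force sketch below enumerates the one-in-three assignments of the example formula; the function name is hypothetical.

```python
from itertools import product


def one_in_three_assignments(clauses):
    """Yield every truth assignment that makes exactly one variable per clause true."""
    variables = sorted({v for clause in clauses for v in clause})
    for bits in product([False, True], repeat=len(variables)):
        truth = dict(zip(variables, bits))
        if all(sum(truth[v] for v in clause) == 1 for clause in clauses):
            yield truth


# (x1 ∨ y1 ∨ z1) ∧ (x2 ∨ y1 ∨ z2) ∧ (x2 ∨ y3 ∨ z1)
clauses = [("x1", "y1", "z1"), ("x2", "y1", "z2"), ("x2", "y3", "z1")]
solutions = list(one_in_three_assignments(clauses))

# Under the reading above, each solution yields a typing: give the "true"
# variables the type {(A : int)} and all other variables the type {()}.
```

The enumeration is exponential in the number of variables, which is consistent with the problem being NP-hard; it is meant only to make the correspondence tangible on a small instance.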
What about typability for expressions without the cartesian product operator?
Theorem: Typability of NNRC-expressions without the cartesian product operator is NP-hard.
Proof sketch:
• Uses a proof idea from Ohori and Buneman (1988), who showed that typability for "generalized join" is NP-hard.
• Reduction from Monotone 3SAT.
Reductions transfer to programming languages with symmetric record concatenation, join, or mixin modules.
The complexity of type checking DBPLs with NRC(=, drop, ×, ⋈) as the embedded QL
When the ambient language is the simply typed λ-calculus:
• Type checking moves from P-complete to NP-hard
When the ambient language is ML:
• Type checking was already Exptime-complete
• However, Exptime-hardness is only due to peculiar programs which rarely occur in practice
• Type checking ML typically takes linear time in practice
• In contrast, NP-hardness for NRC(=, drop, ×, ⋈) is due to cartesian product and join, which do occur in practice