Practical Course in Biodatabases Lecture 6 Relational Databases. Ilari Scheinin

Practical Course in Biodatabases Lecture 6 – Relational Databases Ilari Scheinin 27.1.2009 JOIN Syntax  using multiple tables in a query  multiple...
2 downloads 0 Views 276KB Size
Practical Course in Biodatabases Lecture 6 – Relational Databases Ilari Scheinin 27.1.2009

JOIN Syntax  using multiple tables in a query  multiple ways, some maybe more clear than others

id name

persons int(10) unsigned varchar(255)

id

name 1 John 2 Mary

phone_numbers id int(10) unsigned person_id int(10) unsigned type varchar(20) number varchar(20)

id

person_id type 1 1 mobile 2 1 work 3 2 mobile

number 050-123 4567 09-123 4567 040-234 5678

 http://dev.mysql.com/doc/refman/5.0/en/join.html Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases

26.01.2009

2

JOIN  how many numbers does each person have?  SELECT name, COUNT(*) FROM persons, phone_numbers GROUP BY name; name COUNT(*) John 3 Mary 3

 not what we wanted...

Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases

26.01.2009

3

JOIN  SELECT * FROM persons, phone_numbers; id

name 1 2 1 2 1 2

John Mary John Mary John Mary

id 1 1 2 2 3 3

person_id 1 1 1 1 2 2

type mobile mobile work work mobile mobile

number 050-123 4567 050-123 4567 09-123 4567 09-123 4567 040-234 5678 040-234 5678

 cartesian product

Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases

26.01.2009

4

JOIN  we need to specify how to join the two tables together 

even if there is a foreign key defined

 SELECT name, COUNT(*) FROM persons, phone_numbers WHERE persons.id = phone_numbers.person_id GROUP BY name; name COUNT(*) John 2 Mary 1

Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases

26.01.2009

5

JOIN  SELECT name, COUNT(*) FROM persons JOIN phone_numbers ON persons.id = phone_numbers.person_id GROUP BY name;

name COUNT(*) John 2 Mary 1

Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases

26.01.2009

6

JOIN

person_id name

persons int(10) unsigned varchar(255)

phone_numbers phone_id int(10) unsigned person_id int(10) unsigned type varchar(20) number varchar(20)

 SELECT name, COUNT(*) FROM persons JOIN phone_numbers USING (person_id) GROUP BY name; name COUNT(*) John 2 Mary 1

Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases

26.01.2009

7

OUTER JOIN  what if there are no matching rows? id

name 1 John 2 Mary 3 Jeff

id

person_id type 1 1 mobile 2 1 work 3 2 mobile

number 050-123 4567 09-123 4567 040-234 5678

 SELECT name, COUNT(*) FROM persons JOIN phone_numbers ON persons.id = phone_numbers.person_id name COUNT(*) John 2 GROUP BY name; Mary

Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases

1

26.01.2009

8

OUTER JOIN  SELECT name, COUNT(*) FROM persons LEFT [OUTER] JOIN phone_numbers ON persons.id = phone_numbers.person_id name COUNT(*) John 2 GROUP BY name; Mary Jeff

1 1

 SELECT name, COUNT(number) FROM persons LEFT [OUTER] JOIN phone_numbers ON persons.id = phone_numbers.person_id name COUNT(*) John 2 GROUP BY name; Mary 1 Jeff Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases

0 26.01.2009

9

UPDATE With Multiple Tables  UPDATE table_references SET col_name1=expr1 [, col_name2=expr2] … [WHERE where_condition]

 http://dev.mysql.com/doc/refman/5.0/en/update.html  UPDATE persons JOIN phone_numbers ON persons.id = phone_numbers.person_id SET number = '046-123 4567' WHERE name = 'Mary' and type = 'mobile';

Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases

26.01.2009

10

DELETE With Multiple Tables  DELETE tbl_name FROM table_references [WHERE where_condition]

 http://dev.mysql.com/doc/refman/5.0/en/delete.html  DELETE phone_numbers FROM persons JOIN phone_numbers ON persons.id = person_id WHERE persons.name = 'John';

Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases

26.01.2009

11

Ensembl Genome Browser  “The European” genome database  49 species  new versions released ~ every two months 

can introduce changes to the database schema and/or APIs (Perl)

 current version is Release 52 (9.12.2008)

Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases

26.01.2009

12

Ensembl Genome Browser  uses a MySQL relational database 

http://www.ensembl.org/info/docs/api/core/schema/ index.html



ensembldb.ensembl.org - port 3306 up to version 47 - port 5306 from 48 onwards



martdb.ensembl.org - port 3316 up to version 47 - port 5316 from 48 onwards



ftp://ftp.ensembl.org/pub/

Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases

26.01.2009

13

Ensembl Schema

gene_stable_id gene_id int(10) unsigned stable_id varchar(20) version int(10) created_date datetime modified_data datetime

gene gene_id int(10) unsigned biotype varchar(40) analysis_id smallint(5) unsigned seq_region_id int(10) unsigned seq_region_start int(10) unsigned seq_region_end int(10) unsigned seq_region_strand tinyint(2) display_xref_id int(10) unsigned source varchar(20) status enum description text is_current tinyint(1) canonical_transcript_id int(10) unsigned canonical_annotation varchar(255)

Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases

26.01.2009

14

Ensembl IDs  gene_id vs. gene_stable_id.stable_id  gene_ids (etc) only "technical" ids 

used for e.g. joining tables together



can (and do) change

 stable_ids are "the Ensembl IDs" 

the ones you see on the web page



e.g. ENSG00000208579

Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases

26.01.2009

15

Ensembl Schema gene gene_id int(10) unsigned biotype varchar(40) analysis_id smallint(5) unsigned seq_region_id int(10) unsigned seq_region_start int(10) unsigned seq_region_end int(10) unsigned seq_region_strand tinyint(2) display_xref_id int(10) unsigned source varchar(20) status enum description text is_current tinyint(1) canonical_transcript_id int(10) unsigned canonical_annotation varchar(255)

Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases

seq_region seq_region_id int(10) unsigned name varchar(40) coord_system_id int(10) unsigned length int(10)

coord_system coord_system_id int(10) unsigned species_id int(10) unsigned name varchar(40) version varchar(255) rank int(11) attrib set

26.01.2009

16

Ensembl Schema

gene gene_id int(10) unsigned biotype varchar(40) analysis_id smallint(5) unsigned seq_region_id int(10) unsigned seq_region_start int(10) unsigned seq_region_end int(10) unsigned seq_region_strand tinyint(2) display_xref_id int(10) unsigned source varchar(20) status enum description text is_current tinyint(1) canonical_transcript_id int(10) unsigned canonical_annotation varchar(255)

transcript transcript_id int(10) unsigned gene_id int(10) unsigned analysis_id smallint(5) unsigned seq_region_id int(10) unsigned seq_region_start int(10) unsigned seq_region_end int(10) unsigned seq_region_strand tinyint(2) display_xref_id int(10) unsigned biotype varchar(40) status enum description text is_current tinyint(1)

Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases

26.01.2009

17

Ensembl Schema

transcript transcript_id int(10) unsigned gene_id int(10) unsigned analysis_id smallint(5) unsigned seq_region_id int(10) unsigned seq_region_start int(10) unsigned seq_region_end int(10) unsigned seq_region_strand tinyint(2) display_xref_id int(10) unsigned biotype varchar(40) status enum description text is_current tinyint(1)

exon_transcript exon_id int(10) unsigned transcript_id int(10) unsigned rank int(10)

exon exon_id int(10) unsigned seq_region_id int(10) unsigned seq_region_start int(10) unsigned seq_region_end int(10) unsigned seq_region_strand tinyint(2) phase tinyint(2) end_phase tinyint(2) is_current tinyint(1)

Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases

26.01.2009

18

MySQL Client Software  option 1: command line  mysql -h host -P port -u username -p database_name  mysql -h 127.0.0.1 -P 13403 -u anonymous homo_sapiens_core_52_36n

 echo "alias ensembl='mysql -h 127.0.0.1 -P 13403 -u anonymous homo_sapiens_core_52_36n'" >>~/.bash_profile

Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases

26.01.2009

19

MySQL Client Software  option 2: MySQL Query Browser 

http://dev.mysql.com/downloads/gui-tools/5.0.html

Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases

26.01.2009

20

Workshop  up-to-date connection instructions: https://www.cs.helsinki.fi/i/ischeini/biodb/

Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases

26.01.2009

21