Practical Course in Biodatabases Lecture 6 – Relational Databases Ilari Scheinin 27.1.2009
JOIN Syntax using multiple tables in a query multiple ways, some maybe more clear than others
id name
persons int(10) unsigned varchar(255)
id
name 1 John 2 Mary
phone_numbers id int(10) unsigned person_id int(10) unsigned type varchar(20) number varchar(20)
id
person_id type 1 1 mobile 2 1 work 3 2 mobile
number 050-123 4567 09-123 4567 040-234 5678
http://dev.mysql.com/doc/refman/5.0/en/join.html Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases
26.01.2009
2
JOIN how many numbers does each person have? SELECT name, COUNT(*) FROM persons, phone_numbers GROUP BY name; name COUNT(*) John 3 Mary 3
not what we wanted...
Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases
26.01.2009
3
JOIN SELECT * FROM persons, phone_numbers; id
name 1 2 1 2 1 2
John Mary John Mary John Mary
id 1 1 2 2 3 3
person_id 1 1 1 1 2 2
type mobile mobile work work mobile mobile
number 050-123 4567 050-123 4567 09-123 4567 09-123 4567 040-234 5678 040-234 5678
cartesian product
Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases
26.01.2009
4
JOIN we need to specify how to join the two tables together
even if there is a foreign key defined
SELECT name, COUNT(*) FROM persons, phone_numbers WHERE persons.id = phone_numbers.person_id GROUP BY name; name COUNT(*) John 2 Mary 1
Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases
26.01.2009
5
JOIN SELECT name, COUNT(*) FROM persons JOIN phone_numbers ON persons.id = phone_numbers.person_id GROUP BY name;
name COUNT(*) John 2 Mary 1
Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases
26.01.2009
6
JOIN
person_id name
persons int(10) unsigned varchar(255)
phone_numbers phone_id int(10) unsigned person_id int(10) unsigned type varchar(20) number varchar(20)
SELECT name, COUNT(*) FROM persons JOIN phone_numbers USING (person_id) GROUP BY name; name COUNT(*) John 2 Mary 1
Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases
26.01.2009
7
OUTER JOIN what if there are no matching rows? id
name 1 John 2 Mary 3 Jeff
id
person_id type 1 1 mobile 2 1 work 3 2 mobile
number 050-123 4567 09-123 4567 040-234 5678
SELECT name, COUNT(*) FROM persons JOIN phone_numbers ON persons.id = phone_numbers.person_id name COUNT(*) John 2 GROUP BY name; Mary
Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases
1
26.01.2009
8
OUTER JOIN SELECT name, COUNT(*) FROM persons LEFT [OUTER] JOIN phone_numbers ON persons.id = phone_numbers.person_id name COUNT(*) John 2 GROUP BY name; Mary Jeff
1 1
SELECT name, COUNT(number) FROM persons LEFT [OUTER] JOIN phone_numbers ON persons.id = phone_numbers.person_id name COUNT(*) John 2 GROUP BY name; Mary 1 Jeff Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases
0 26.01.2009
9
UPDATE With Multiple Tables UPDATE table_references SET col_name1=expr1 [, col_name2=expr2] … [WHERE where_condition]
http://dev.mysql.com/doc/refman/5.0/en/update.html UPDATE persons JOIN phone_numbers ON persons.id = phone_numbers.person_id SET number = '046-123 4567' WHERE name = 'Mary' and type = 'mobile';
Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases
26.01.2009
10
DELETE With Multiple Tables DELETE tbl_name FROM table_references [WHERE where_condition]
http://dev.mysql.com/doc/refman/5.0/en/delete.html DELETE phone_numbers FROM persons JOIN phone_numbers ON persons.id = person_id WHERE persons.name = 'John';
Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases
26.01.2009
11
Ensembl Genome Browser “The European” genome database 49 species new versions released ~ every two months
can introduce changes to the database schema and/or APIs (Perl)
current version is Release 52 (9.12.2008)
Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases
26.01.2009
12
Ensembl Genome Browser uses a MySQL relational database
http://www.ensembl.org/info/docs/api/core/schema/ index.html
ensembldb.ensembl.org - port 3306 up to version 47 - port 5306 from 48 onwards
martdb.ensembl.org - port 3316 up to version 47 - port 5316 from 48 onwards
ftp://ftp.ensembl.org/pub/
Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases
26.01.2009
13
Ensembl Schema
gene_stable_id gene_id int(10) unsigned stable_id varchar(20) version int(10) created_date datetime modified_data datetime
gene gene_id int(10) unsigned biotype varchar(40) analysis_id smallint(5) unsigned seq_region_id int(10) unsigned seq_region_start int(10) unsigned seq_region_end int(10) unsigned seq_region_strand tinyint(2) display_xref_id int(10) unsigned source varchar(20) status enum description text is_current tinyint(1) canonical_transcript_id int(10) unsigned canonical_annotation varchar(255)
Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases
26.01.2009
14
Ensembl IDs gene_id vs. gene_stable_id.stable_id gene_ids (etc) only "technical" ids
used for e.g. joining tables together
can (and do) change
stable_ids are "the Ensembl IDs"
the ones you see on the web page
e.g. ENSG00000208579
Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases
26.01.2009
15
Ensembl Schema gene gene_id int(10) unsigned biotype varchar(40) analysis_id smallint(5) unsigned seq_region_id int(10) unsigned seq_region_start int(10) unsigned seq_region_end int(10) unsigned seq_region_strand tinyint(2) display_xref_id int(10) unsigned source varchar(20) status enum description text is_current tinyint(1) canonical_transcript_id int(10) unsigned canonical_annotation varchar(255)
Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases
seq_region seq_region_id int(10) unsigned name varchar(40) coord_system_id int(10) unsigned length int(10)
coord_system coord_system_id int(10) unsigned species_id int(10) unsigned name varchar(40) version varchar(255) rank int(11) attrib set
26.01.2009
16
Ensembl Schema
gene gene_id int(10) unsigned biotype varchar(40) analysis_id smallint(5) unsigned seq_region_id int(10) unsigned seq_region_start int(10) unsigned seq_region_end int(10) unsigned seq_region_strand tinyint(2) display_xref_id int(10) unsigned source varchar(20) status enum description text is_current tinyint(1) canonical_transcript_id int(10) unsigned canonical_annotation varchar(255)
transcript transcript_id int(10) unsigned gene_id int(10) unsigned analysis_id smallint(5) unsigned seq_region_id int(10) unsigned seq_region_start int(10) unsigned seq_region_end int(10) unsigned seq_region_strand tinyint(2) display_xref_id int(10) unsigned biotype varchar(40) status enum description text is_current tinyint(1)
Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases
26.01.2009
17
Ensembl Schema
transcript transcript_id int(10) unsigned gene_id int(10) unsigned analysis_id smallint(5) unsigned seq_region_id int(10) unsigned seq_region_start int(10) unsigned seq_region_end int(10) unsigned seq_region_strand tinyint(2) display_xref_id int(10) unsigned biotype varchar(40) status enum description text is_current tinyint(1)
exon_transcript exon_id int(10) unsigned transcript_id int(10) unsigned rank int(10)
exon exon_id int(10) unsigned seq_region_id int(10) unsigned seq_region_start int(10) unsigned seq_region_end int(10) unsigned seq_region_strand tinyint(2) phase tinyint(2) end_phase tinyint(2) is_current tinyint(1)
Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases
26.01.2009
18
MySQL Client Software option 1: command line mysql -h host -P port -u username -p database_name mysql -h 127.0.0.1 -P 13403 -u anonymous homo_sapiens_core_52_36n
echo "alias ensembl='mysql -h 127.0.0.1 -P 13403 -u anonymous homo_sapiens_core_52_36n'" >>~/.bash_profile
Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases
26.01.2009
19
MySQL Client Software option 2: MySQL Query Browser
http://dev.mysql.com/downloads/gui-tools/5.0.html
Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases
26.01.2009
20
Workshop up-to-date connection instructions: https://www.cs.helsinki.fi/i/ischeini/biodb/
Ilari Scheinin, Practical Course in Biodatabases, Lecture 6 – Relational Databases
26.01.2009
21