Additional data file 3

Author: Gilbert Malone
http://jbiol.com/content/5/4/11

Journal of Biology 2006, Volume 5, Article 11

Reguly and Breitkreutz et al.

Additional data file 3

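[Supplementary Figure 1 panels. Category labels and interaction counts are those printed in each panel; counts are listed in the order extracted and are not matched to categories.
(a) Categories: MIPS only; no evidence; shared; LC only; e-text unavailable; paper not in database; over-represented. Counts: 175, 111, 487, 481, 376, 1965, 2116.
(b) Categories: BIND only; no evidence; shared; LC only; e-text unavailable; paper not in database. Counts: 50, 51, 18, 982, 98, 1059.
(c) Categories: shared; DIP only; no evidence; LC only; e-text unavailable. Counts: 886, 518, 3455, 2494, 193.
(d) Categories: high confidence; HTP; unable to re-access full text; typographical error; not an interaction (i.e. only indirect evidence); curator error (i.e. interaction not present, completely wrong gene or misinterpretation). Counts: 22, 61, 23, 79, 433, 1769.]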

Supplementary Figure 1 Curation benchmarks for the LC dataset. (a) Comparison of interactions curated in the LC-PI dataset to those curated in the MIPS dataset for 931 shared publications. Categories of non-overlapping interactions and corresponding number of interactions are indicated. (b) Comparison of interactions curated in the LC-PI dataset to those curated in the BIND dataset for 263 shared publications. (c) Comparison of interactions curated in the LC-PI dataset to those curated in the DIP dataset for 1,267 shared publications. (d) Assessment of interaction confidence and curation error rate by re-examination of 2,387 singly validated interactions in the LC-FYI dataset.
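As a quick consistency check on panel (d), the six counts printed in that panel (22, 61, 23, 79, 433 and 1769) can be summed and compared with the 2,387 re-examined interactions stated in the caption. The sketch below does only that, in Python; the variable names are illustrative, and because the extracted text does not say which count belongs to which category, no category labels are attached.

```python
# Counts printed in panel (d) of Supplementary Figure 1, in extraction order.
# The mapping from counts to categories is not recoverable from the text,
# so the values are deliberately left unlabeled.
panel_d_counts = [22, 61, 23, 79, 433, 1769]

# Total number of re-examined interactions implied by the panel.
total = sum(panel_d_counts)

# Share of each count as a percentage of the total, rounded to one decimal.
shares = [round(100 * count / total, 1) for count in panel_d_counts]

print(total)   # 2387, matching the caption
print(shares)
```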

Journal of Biology 2006, 5:11

[Supplementary Figure 2 panels. Each panel shows two box plots, one for LC-PI and one for LC-GI, with GO terms on the vertical axis and Bias (scale 1 to 5) on the horizontal axis.
(a) Biological Component. Terms: vacuole; peroxisome; chromosome; golgi; cytoskeleton; periphery; nucleolus; bud; er; nucleus; endosome; membrane; cytoplasm; unknown; ribosome; mitochondrion.
(b) Biological Process. Terms: electron transport; membrane organization/biogenesis; vitamin metabolism; meiosis; conjugation; cell homeostasis; sporulation; protein catabolism; lipid metabolism; cytokinesis; morphogenesis; amino acid and derivative metabolism; cell wall organization and biogenesis; signal transduction; cytoskeleton organization/biogenesis; pseudohyphal growth; carbohydrate metabolism; response to stress; ribosome biogenesis and assembly; cell cycle; cell budding; protein modification; organelle organization and biogenesis; nuclear organization and biogenesis; transport; unknown; vesicle-mediated transport; transcription; DNA metabolism; protein biosynthesis; RNA metabolism; precursor metabolites and energy; cellular respiration.
(c) Biological Function. Terms: isomerase; ligase; helicase; peptidase; DNA binding; signal transducer; motor; transporter; enzyme regulator; transferase; protein kinase; phosphoprotein phosphatase; lyase; protein binding; translation regulator; transcription regulator; nucleotidyltransferase; unknown; structural molecule; oxidoreductase; hydrolase; RNA binding.]

Supplementary Figure 2 Distribution of terms in GO categories in the LC-PI and LC-GI datasets. (a) Cellular component (localization). (b) Biological process. (c) Molecular function. GO terms in each category are as listed. Each bar in the box-plot shows the average bias (normalized by the degree) for each term category. Boxes have lines at the lower quartile, median, and upper quartile values. Outliers are indicated with a red cross.
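The box-plot elements named in the caption (lower quartile, median, upper quartile, outliers) can be illustrated with a minimal Python sketch. The caption does not state how outliers were defined, so the 1.5 x IQR whisker rule used here is an assumption, as are the function name `box_stats` and the toy bias values, which are not taken from the figure.

```python
from statistics import quantiles

def box_stats(values, whisker=1.5):
    """Return quartiles and outliers for one box in a box plot.

    The whisker multiplier of 1.5 x IQR is an assumed convention;
    the source caption does not specify the outlier rule.
    """
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

# Hypothetical bias values for a single GO term (illustration only).
bias = [1.0, 1.2, 1.3, 1.5, 1.6, 1.8, 4.9]
stats = box_stats(bias)
print(stats)
```

With these toy values only the largest point falls outside the whiskers, which is the kind of point the figure would mark with a cross.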

[Supplementary Figure 3 panels. Four dot matrices, each with both axes running from 0 to 5000. Panel labels: LC-PI (3289, 11334); HTP-PI (4474, 11571); LC-GI (2689, 8165); HTP-GI (1454, 6103).]

Supplementary Figure 3 Relative coverage and overlap of interaction datasets. Dot matrix representations of all interactions in each of the LC and HTP datasets were created and overlaid on the same ordinates, in order of increasing dataset size (HTP-GI < LC-GI < LC-PI < HTP-PI). Each point corresponds to an interaction pair between two genes/proteins. Blank regions correspond to proteins/genes present in the prior dataset(s) but absent from the visualized dataset. Total nodes and interactions are indicated in parentheses for each dataset.
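One way to read the shared-ordinate layout described in this caption is sketched below in Python. It is an assumption, not the authors' documented procedure: datasets are sorted by number of interactions, each gene/protein receives an axis position the first time it appears, and every dataset is then plotted against that common index. The function name `shared_ordinates` and the toy gene names are hypothetical.

```python
def shared_ordinates(datasets):
    """Map interaction pairs onto one common axis.

    datasets: dict mapping a dataset name to a list of (gene_a, gene_b) pairs.
    Returns the dataset order, the gene -> axis position map, and the
    (x, y) points for each dataset.
    """
    # Smallest dataset first, as in the caption's stated ordering.
    order = sorted(datasets, key=lambda name: len(datasets[name]))

    # Assign each gene the next free position on first appearance.
    index = {}
    for name in order:
        for gene_a, gene_b in datasets[name]:
            for gene in (gene_a, gene_b):
                index.setdefault(gene, len(index))

    points = {
        name: [(index[a], index[b]) for a, b in datasets[name]]
        for name in order
    }
    return order, index, points

# Toy data only; real datasets contain thousands of pairs.
toy = {
    "HTP-PI": [("A", "B"), ("C", "D"), ("E", "F")],
    "LC-GI": [("A", "C"), ("B", "D")],
    "HTP-GI": [("A", "B")],
}
order, index, points = shared_ordinates(toy)
print(order)
print(points["HTP-PI"])
```

Under this scheme, genes introduced by a smaller dataset keep low axis positions, so a larger dataset that lacks them shows blank bands at those positions, consistent with the blank regions the caption describes.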

[Supplementary Figure 4 plot. Vertical axis: Number of Interactions (0 to 5000). Horizontal axis: Abundance (x 10^2), in bins 0-4, 4-8, 8-14, 14-21, 21-30, 30-49, 49-76, 76-170 and 170-Inf. Series: LC-PI, HTP-PI, LC-GI, HTP-GI.]

Supplementary Figure 4 Raw distributions of interactions for each indicated dataset as a function of protein abundance. Protein/gene pairs were separated into bins according to protein abundance determined in a large scale study [67]. Abundance is indicated as the estimated number of molecules per cell.
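The binning step can be sketched in Python as follows. Several details are assumptions rather than facts from the source: the axis unit is read as 10^2 molecules per cell, bins are treated as half-open intervals, and a pair is placed by the mean abundance of its two proteins, since the caption does not say how a pair is assigned to a bin. The names `bin_label` and `count_pairs` and the toy abundances are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Lower bin edges as printed on the figure axis (assumed unit: 10^2 molecules/cell).
EDGES = [0, 4, 8, 14, 21, 30, 49, 76, 170]
LABELS = ["0-4", "4-8", "8-14", "14-21", "21-30",
          "30-49", "49-76", "76-170", "170-Inf"]

def bin_label(molecules_per_cell, unit=100):
    """Return the axis bin for one abundance value (half-open bins assumed)."""
    if molecules_per_cell < 0:
        raise ValueError("abundance must be non-negative")
    scaled = molecules_per_cell / unit
    return LABELS[bisect_right(EDGES, scaled) - 1]

def count_pairs(pairs, abundance):
    """Count pairs per bin, using the mean abundance of the two partners.

    Using the mean is an illustrative assumption; the source does not
    state the rule for assigning a pair to a bin.
    """
    counts = Counter()
    for a, b in pairs:
        counts[bin_label((abundance[a] + abundance[b]) / 2)] += 1
    return counts

# Toy abundances in molecules per cell (not real measurements).
toy_abundance = {"A": 300, "B": 900, "C": 20000}
toy_counts = count_pairs([("A", "B"), ("B", "C")], toy_abundance)
print(toy_counts)
```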

[Supplementary Figure 5 panels. Vertical axis: Probability Density (0 to 0.04). Horizontal axis: Pearson correlation coefficient (-1 to 1). (a) Series: LC-PI, HTP-PI, random pairs. (b) Series: LC-GI, HTP-GI, random pairs.]

Supplementary Figure 5 Expression correlation for interaction pairs in LC versus HTP datasets. (a) LC-PI and HTP-PI datasets. (b) LC-GI and HTP-GI datasets. Probability densities of the average Pearson correlation coefficient (PCC) were calculated from a global expression profiling compendium representing over 300 different conditions [71].
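For reference, the Pearson correlation coefficient between the expression profiles of two genes can be computed as in the minimal Python sketch below. It covers a single pair across a set of conditions; how the authors averaged the coefficients and estimated the densities is not described here, so that part is left out. The function name `pearson` and the example profiles are illustrative.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)

# Hypothetical expression values across four conditions.
co_regulated = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
anti_regulated = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(co_regulated, anti_regulated)
```

Values near 1 indicate co-expressed pairs, values near -1 indicate opposing profiles, and random pairs are expected to cluster around 0.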

[Supplementary Figure 6 clustergram. Axes: 2,320 Baits and 2,934 Hits. Legend: LC-PI; HTP-PI; LC-PI & HTP-PI.]

Supplementary Figure 6 Dense regions in the physical interaction network. The combined LC-PI and HTP-PI datasets were hierarchically clustered in two dimensions (corresponding to bait and prey in each interaction pair). White pixels represent LC interactions, red pixels represent HTP interactions and yellow pixels represent overlap between LC and HTP interactions. In order to emphasize complexes in the largely empty potential interaction space, pixel size and spread in the clustergram were scaled non-linearly as a function of pixel density. The following GO biological process categories were enriched in the overlap (P < 0.001): DNA metabolism, RNA metabolism, cell cycle, cytoskeleton organization and biogenesis, nuclear organization and biogenesis, ribosome biogenesis and assembly, and transcription. The following GO function categories were enriched in the overlap (P < 0.001): DNA binding, RNA binding, enzyme regulator, nucleotidyltransferase, phosphoprotein phosphatase, protein binding, protein kinase and transcription regulator.
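The caption reports enrichment at P < 0.001 but does not name the statistical test. Purely as an assumption, the Python sketch below uses a hypergeometric upper tail, one common choice for GO term enrichment: the probability of seeing at least `k` annotated items in a sample of `n` drawn from a population of `N` that contains `K` annotated items. The function name and the toy numbers are hypothetical and are not taken from the paper.

```python
from math import comb

def hypergeom_tail(N, K, n, k):
    """P(X >= k) for a hypergeometric draw.

    N: population size, K: annotated items in the population,
    n: sample size, k: annotated items observed in the sample.
    This test is an assumed stand-in; the source does not specify one.
    """
    total = comb(N, n)
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Toy example: 10 items, 5 annotated, sample of 5 that are all annotated.
p_value = hypergeom_tail(10, 5, 5, 5)
print(p_value)
print(p_value < 0.001)
```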
