Supplemental Data: Induced Pluripotent Stem Cells and Embryonic Stem Cells Are Distinguished by Gene Expression Signatures

Cell Stem Cell, Volume 5

Supplemental Data Induced Pluripotent Stem Cells and Embryonic Stem Cells Are Distinguished by Gene Expression Signatures Mark H. Chin, Mike J. Mason, Wei Xie, Stefano Volinia, Mike Singer, Cory Peterson, Gayane Ambartsumyan, Otaren Aimiuwu, Laura Richter, Jin Zhang, Ivan Khvorostov, Vanessa Ott, Michael Grunstein, Neta Lavon, Nissim Benvenisty, Carlo M. Croce, Amander T. Clark, Tim Baxter, April D. Pyle, Mike A. Teitell, Matteo Pelegrini, Kathrin Plath, and William E. Lowry

Table S1. Cell lines derived or cultured in our lab and summary of their respective passages for each type of analysis

Figure S1. Defining the state of hiPSCs relative to hESCs at different passages by Pearson correlation. (A) Correlation matrix, based on Pearson's rho, comparing genome-wide mRNA expression data for all cell lines used in our lab and in this study (hESCs, hiPSCs, and fibroblasts (fibr)). hiPSC lines are divided into early-passage (e) and late-passage (l) hiPSCs. Correlation is shown in pseudocolor, with red representing the highest correlation and green the lowest. (B) As in (A), except that expression data from individual hESC, e-hiPSC, l-hiPSC, and fibroblast (fibr) lines were averaged within each group before correlation analysis was performed. The expression profile of l-hiPSCs is more highly correlated with hESC lines than with e-hiPSC lines, as determined by comparing correlations with Fisher's z-transformation (z = 0).
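The correlation matrix and the comparison of correlations can be sketched in Python. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, expression vectors are assumed to be already normalized and aligned by gene, and the Fisher comparison uses the textbook formula for two independent correlations, which is only an approximation here because both correlations share the l-hiPSC profile.

```python
import math


def pearson(x, y):
    """Pearson's rho between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(profiles):
    """Pairwise Pearson correlations for {cell line or group: expression vector}."""
    names = list(profiles)
    return {(a, b): pearson(profiles[a], profiles[b]) for a in names for b in names}


def fisher_z_compare(r1, n1, r2, n2):
    """z statistic for r1 versus r2 after Fisher's z-transformation (atanh).

    Assumes the two correlations are independent; n1 and n2 are the numbers
    of genes used to compute each correlation.
    """
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / se


def two_sided_p(z):
    """Two-sided P-value for z under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))
```

For example, `fisher_z_compare(r_l_hesc, n_genes, r_l_e, n_genes)` (hypothetical variable names) asks whether l-hiPSCs correlate more strongly with hESCs than with e-hiPSCs; a test designed for dependent correlations, such as Steiger's, would be the stricter choice.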

Figure S2. The e-hiPSC expression signature is not found in early-passage hESCs. Averaged gene expression data from early-passage (e-hiPSC) and late-passage (l-hiPSC) hiPSC lines were compared, using Pearson correlation, with gene expression data from independently cultured hESC lines at two different early passages (P5 and P28).

Figure S3. Defining the hiPSC expression signature classes at different passages by Pearson correlation. (A) Correlation matrix of averaged expression data from the indicated cell types, based on Pearson's rho, for the genes defined in Fig. 1B as significantly different between hESCs and e-hiPSCs (3,947 genes). In all matrices, correlation is shown in pseudocolor, with red representing the highest correlation and green the lowest. (B) Correlation matrix as in (A), using the genes defined as significantly different between hESCs and late-passage (l-) hiPSCs (860 genes; Fig. 1C). (C) Correlation matrix as in (B), using the genes shared between the gene groups defined in Figs. 1B and 1C (318 genes; Fig. 1D).
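Each panel applies the same correlation to averaged profiles restricted to one gene list. A minimal Python sketch, assuming each cell line's expression is a list aligned to a shared gene order; `average_profiles`, `subset_correlation`, and `gene_index` are hypothetical names, not part of the original analysis.

```python
import math


def average_profiles(lines_by_type):
    """Average the expression vectors of individual lines within each cell type."""
    return {
        cell_type: [sum(values) / len(vectors) for values in zip(*vectors)]
        for cell_type, vectors in lines_by_type.items()
    }


def pearson(x, y):
    """Pearson's rho between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def subset_correlation(averaged, gene_index, genes):
    """Pearson correlations between averaged profiles using only `genes`.

    gene_index maps a gene name to its position in each expression vector.
    """
    idx = [gene_index[g] for g in genes]
    sub = {t: [v[i] for i in idx] for t, v in averaged.items()}
    return {(a, b): pearson(sub[a], sub[b]) for a in sub for b in sub}
```

Averaging before correlating, as in Fig. S1B, removes line-to-line variation within a group, so each matrix entry compares group-level profiles only.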

Figure S4. hESC-specific marker genes are expressed at significantly lower levels in e-hiPSCs than in l-hiPSCs. (A) 48 hESC-specific genes (Lowry et al., 2008) are expressed at significantly lower levels in early-passage (e-) hiPSCs than in hESCs (p37, p41, and p50) and late-passage (l-) hiPSCs. * indicates P < 10⁻⁵, as determined using a Wilcoxon signed-rank test. The expression difference between hESCs and l-hiPSCs is not significant by the same test. An analysis of early (P5) and middle (P28) passages of hESC lines shows that changes in expression of these hESC-specific genes are not normally associated with extended culture of hESCs (P-value not significant). As expected, the hESC-specific genes are significantly upregulated relative to the fibroblast state in all pluripotent cell lines (all expression differences between hESCs or hiPSCs and fibroblasts have P < 10⁻¹⁵). All expression ratios are relative to the mean expression value of a gene across all cell lines. (B) Expression data for the ORF regions of SOX2, OCT4, and c-MYC. Normalized expression values for the indicated Affymetrix probes located within the ORF regions of SOX2, OCT4, and c-MYC are given as averages for each indicated cell type. Error bars represent the standard deviation between the different cell lines within one category. These data indicate that SOX2 and OCT4 are less abundant in e-hiPSCs than in hESCs but recover to hESC levels in l-hiPSCs.
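The paired test in (A) and the ratio normalization can be sketched in Python. Assumptions: the Wilcoxon signed-rank test is applied to per-gene paired values from two cell types, the P-value uses the large-sample normal approximation without tie or continuity corrections (statistics packages offer exact and corrected versions), and expression values are on a linear scale so that ratios to the gene mean are meaningful. All function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples x and y.

    Zero differences are dropped; returns (W+, P) via the normal approximation.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, math.erfc(abs(z) / math.sqrt(2))


def ratios_to_gene_mean(expr_by_line):
    """Divide each line's expression by the gene's mean across all lines."""
    lines = list(expr_by_line)
    n_genes = len(expr_by_line[lines[0]])
    means = [sum(expr_by_line[line][g] for line in lines) / len(lines)
             for g in range(n_genes)]
    return {line: [expr_by_line[line][g] / means[g] for g in range(n_genes)]
            for line in lines}
```

With 48 genes per comparison, the normal approximation is reasonable, but an exact test would be preferable for much smaller gene sets.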

Figure S5. Genes found to be upregulated in the expression microarray analysis are also differentially expressed between hESCs and l-hiPSCs at the protein level. Note that CAT was shown to be differentially expressed in both early and late hiPSCs.

Figure S6. hESCs and l-hiPSCs have similar cell cycle profiles. hESCs and l-hiPSCs grown at similar densities were isolated from feeders, fixed, and subjected to cell cycle analysis. The percentage of cells in each stage of the cell cycle was quantified for each cell line. These data are representative of three independent experiments. Error bars represent standard deviation.

Figure S7. Genes that are differentially expressed between hiPSCs and hESCs irrespective of passage (common hiPSC signature genes) show the most dramatic expression changes between hESCs and fibroblasts. The expression differences between hESCs and fibroblasts (absolute value of the fold change) are presented as box plots for early (e-) and late (l-) hiPSC signature genes (boxes 1 and 2, respectively) and for common hiPSC signature genes (box 3). On average, the expression difference between fibroblasts and hESCs is larger for e-hiPSC signature genes than for l-hiPSC signature genes (*P = 0.0013, Wilcoxon rank-sum test). However, genes common to the early and late hiPSC signatures display a significantly larger expression difference between hESCs and fibroblasts than either l-hiPSC or e-hiPSC signature genes (**P < 10⁻¹¹ and ***P < 10⁻⁹, respectively). All hiPSC gene groups differ significantly from the fold change (hESC/fibr) observed for all genes on the array (P < 10⁻²⁸).
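The comparison of fold-change distributions between gene groups can be sketched in Python. Assumptions: "absolute value of the fold change" is taken here as |log2(hESC/fibroblast)| computed from linear-scale expression values, and the P-value uses the normal approximation without tie correction. Function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def abs_log2_fold_change(hesc, fibr):
    """|log2(hESC / fibroblast)| per gene, used here as the size of the change."""
    return [abs(math.log2(h / f)) for h, f in zip(hesc, fibr)]


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction).

    Returns (sum of the ranks of sample a within the pooled data, P).
    """
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    r1 = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (r1 - mean) / sd
    return r1, math.erfc(abs(z) / math.sqrt(2))
```

Using the absolute log ratio treats a 4-fold increase and a 4-fold decrease as the same magnitude, which matches the intent of comparing the size of the hESC-fibroblast difference regardless of direction.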

Figure S8. Differential expression patterns between hESCs and hiPSCs are conserved among independent reprogramming experiments. As described in Figure 2 of the accompanying manuscript, iPSC signature genes, defined as genes differentially expressed between hiPSC and hESC lines (Student's t-test P < 0.05 and at least a 1.5-fold change), were obtained from Fig. 1 of this study (Chin et al.) and from additional published reprogramming experiments. Each set of signature genes was divided into two groups: group 1 contains genes more highly expressed in ESCs than in iPSCs, and group 2 contains genes more abundant in iPSCs than in ESCs. The matrix in (A) summarizes the overlap of group 1 hiPSC signature genes between the different experiments, and the matrix in (B) summarizes the overlap of group 2 genes. Values on the diagonal give the total number of genes identified as significantly different in expression between hESCs and hiPSCs. Each row-column intersection gives the number of genes shared by the two respective experiments and the corresponding significance (P-value), as determined using Fisher's exact test. For the Soldner et al. experiment [20], data were analyzed before (2lox) and after (1lox) excision of the reprogramming factors. Similarly, for Yu et al. [21], the iPSCs were generated with episomal vectors and analyzed before (episomal) and after subcloning; genomic integrations were not detected in any of these subclones. The Maherali iPSCs [19] were reprogrammed with integrating lentiviruses.
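The overlap statistics in (A) and (B) can be outlined with a hypergeometric tail, which is the one-sided form of Fisher's exact test on the 2×2 table of shared versus unshared genes. This is a sketch under assumptions: both gene lists are restricted to a common universe of genes measured in both experiments, the test is one-sided for enrichment (the legend does not state the sidedness), and the function names are hypothetical.

```python
from math import comb


def overlap_p_value(n_universe, n_a, n_b, n_overlap):
    """One-sided Fisher's exact test for an overlap at least as large as observed.

    Equivalent to the hypergeometric upper tail: n_b genes are drawn from
    n_universe genes, of which n_a belong to the other signature.
    """
    tail = sum(
        comb(n_a, i) * comb(n_universe - n_a, n_b - i)
        for i in range(n_overlap, min(n_a, n_b) + 1)
    )
    return tail / comb(n_universe, n_b)


def signature_overlap(genes_a, genes_b, universe):
    """Overlap size and P-value for two signature gene lists within one universe."""
    a = set(genes_a) & universe
    b = set(genes_b) & universe
    k = len(a & b)
    return k, overlap_p_value(len(universe), len(a), len(b), k)
```

The choice of universe matters: comparing lists from different array platforms would require restricting both to genes present on both platforms before testing.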

Figure S9. Gene Ontology analysis of human and mouse iPSC signature genes from different labs and experiments. (A) Comparison of all significant Gene Ontology (GO) terms for the hiPSC signature genes derived from the indicated datasets (from Fig. 2). For each functional group shown, at least one dataset had an enrichment score of 3 (P < 0.001), which was the significance threshold used for these analyses. (B) GO analysis as in (A) for mouse iPSC signature genes from the Mikkelsen et al. (2008) (Fig. S9A) and Maherali et al. (2007) (Fig. 3) datasets.
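The display criterion can be written as a simple filter. This sketch assumes the enrichment score is -log10(P), which matches the stated equivalence between a score of 3 and P = 0.001; the function names and the input layout (term to dataset to P-value) are hypothetical.

```python
import math


def enrichment_score(p_value):
    """Enrichment score taken as -log10(P); a score of 3 corresponds to P = 0.001."""
    return -math.log10(p_value)


def terms_to_display(p_values, threshold=3.0):
    """GO terms for which at least one dataset reaches the score threshold.

    p_values maps term -> {dataset name: P-value}.
    """
    return sorted(
        term for term, by_dataset in p_values.items()
        if any(enrichment_score(p) >= threshold for p in by_dataset.values())
    )
```

Because the criterion is "at least one dataset", a term can be displayed even when it is not significant in most of the compared experiments.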

Figure S10. Mouse reprogramming experiments using different cell types of origin yield overlapping iPSC expression signatures. (A) Overlap of miPSC signature genes (genes differentially expressed between mESCs and miPSCs) from mouse fibroblast and B-cell reprogramming experiments (Maherali et al., 2007, and Mikkelsen et al., 2008, respectively). The overlap is significant as determined using Fisher's exact test. (B) As in Figure 3C, but using the comparison of B-cell-derived miPSCs to mESCs from the Mikkelsen et al. (2008) dataset. Genes found to be significantly different between mESCs and B-cell miPSCs were divided into those expressed at higher levels in mESCs than in miPSCs, and vice versa. Each of these two groups was further subdivided into genes more highly expressed in mESCs than in fibroblasts (red) and genes more highly expressed in fibroblasts than in mESCs (blue). (C) As in (A), except comparing e-hiPSC signature genes from Chin et al. with the B-cell miPSC signature genes from Mikkelsen et al. The overlap was assessed using Fisher's exact test.
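The two-step subdivision in (B) amounts to classifying each gene by the sign of two differences. A minimal Python sketch, assuming the input genes have already been filtered to the mESC versus B-cell miPSC signature and that each gene is summarized by one mean expression value per cell type; the function and label names are hypothetical.

```python
from collections import defaultdict


def classify_signature_genes(esc, ipsc, fibr):
    """Split signature genes into four groups by direction of change.

    Each argument maps gene -> mean expression; ties fall into the second
    label of each comparison. "ESC>fibr" corresponds to red in the legend
    and "fibr>ESC" to blue.
    """
    groups = defaultdict(list)
    for gene in sorted(esc):
        esc_vs_ipsc = "ESC>iPSC" if esc[gene] > ipsc[gene] else "iPSC>ESC"
        esc_vs_fibr = "ESC>fibr" if esc[gene] > fibr[gene] else "fibr>ESC"
        groups[(esc_vs_ipsc, esc_vs_fibr)].append(gene)
    return dict(groups)
```

Genes in the ("iPSC>ESC", "fibr>ESC") group, for example, are candidates for incompletely silenced fibroblast-associated genes, while ("ESC>iPSC", "ESC>fibr") genes are candidates for incompletely activated ESC genes.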

Figure S11. E-class genes have highly similar H3K27 trimethylation patterns in hESC and hiPSC lines. Pearson correlation of H3K27 trimethylation for E-class genes between the indicated cell types. Promoter methylation data were divided into sixteen 500 bp windows, and Pearson's rho values are plotted as a function of distance from the transcription start site (TSS). The TSS lies between bins 11 and 12.
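The per-window correlation can be sketched in Python. Assumptions: each gene's promoter has already been reduced to sixteen 500 bp values per cell type (missing windows as `None`), rho is computed across E-class genes separately for each window, and windows with fewer than three complete gene pairs or with no variance are reported as `None`. Function names are hypothetical.

```python
import math

N_BINS = 16  # sixteen 500 bp windows per promoter


def pearson(x, y):
    """Pearson's rho, or None if either vector has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def per_bin_correlation(sample_a, sample_b):
    """Pearson's rho across genes for each promoter window.

    sample_a and sample_b map gene -> list of N_BINS values (None if missing).
    """
    genes = sorted(set(sample_a) & set(sample_b))
    rhos = []
    for b in range(N_BINS):
        pairs = [(sample_a[g][b], sample_b[g][b]) for g in genes
                 if sample_a[g][b] is not None and sample_b[g][b] is not None]
        if len(pairs) < 3:
            rhos.append(None)
            continue
        xs, ys = zip(*pairs)
        rhos.append(pearson(xs, ys))
    return rhos
```

Plotting the sixteen returned values in order gives a correlation profile along the promoter, with the TSS between the eleventh and twelfth values.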

Figure S12. The classification of signature genes based on the H3K27me3 pattern is highly significant. Genes differentially methylated between hESC and fibroblast lines were determined at different stringencies (P = 0.01, P = 0.05, P = 0.1) and classified according to their pattern in late-passage hiPSCs as E (hESC-like), N (neutral), or F (fibroblast-like) class genes (experimental data). To validate the classification of differentially methylated loci, the data were permuted 1000 times and randomly assigned to hESC-fibroblast pairs, and the signature genes were then reclassified at the same stringencies (P = 0.01, P = 0.05, P = 0.1) (bootstrap). Error bars represent standard deviation.
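A permutation control of this kind can be sketched in Python. The classification rule below is hypothetical and is not the authors' criterion: each l-hiPSC value is placed on an axis running from the fibroblast value (0) to the hESC value (1), with arbitrary cut-offs at 1/3 and 2/3, and the stringency levels used to select differentially methylated genes are not modeled. The permutation follows the legend by randomly reassigning l-hiPSC values to hESC-fibroblast pairs, and reports the mean and standard deviation of each class count.

```python
import random
import statistics


def classify(h, f, i, low=1 / 3, high=2 / 3):
    """Hypothetical E/N/F call from hESC (h), fibroblast (f) and l-hiPSC (i) values.

    Assumes h != f, which holds for genes differentially methylated between
    hESCs and fibroblasts.
    """
    t = (i - f) / (h - f)  # 0 = fibroblast-like, 1 = hESC-like
    if t >= high:
        return "E"
    if t <= low:
        return "F"
    return "N"


def class_counts(pairs, ipsc_values):
    """Count E, N and F genes; pairs[k] is the (hESC, fibroblast) pair for gene k."""
    counts = {"E": 0, "N": 0, "F": 0}
    for (h, f), i in zip(pairs, ipsc_values):
        counts[classify(h, f, i)] += 1
    return counts


def permutation_counts(pairs, ipsc_values, n_perm=1000, seed=0):
    """Mean and SD of class counts when l-hiPSC values are randomly reassigned."""
    rng = random.Random(seed)
    shuffled = list(ipsc_values)
    draws = {"E": [], "N": [], "F": []}
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        for cls, n in class_counts(pairs, shuffled).items():
            draws[cls].append(n)
    return {cls: (statistics.mean(v), statistics.stdev(v)) for cls, v in draws.items()}
```

Comparing the observed E-class count from `class_counts` with the permuted mean and SD indicates how unlikely the observed hESC-like resetting would be if l-hiPSC values were unrelated to their gene's hESC-fibroblast pair.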

Figure S13. Histone H3K27me3 patterns of l-iPSC signature genes that are differentially methylated between ESCs and fibroblasts. Hierarchical clustering of H3K27me3 patterns for (A) the 40 late-passage hiPSC signature genes determined to be differentially methylated at K27 between fibroblasts and ESCs, and (B) the 21 genes in (A) that are part of the enduring hiPSC signature. In the methylation data, each row represents the -5.5 kb to +2.5 kb promoter region relative to the transcription start site (TSS) for the indicated cell type. The 8 kb regions are divided into sixteen 500 bp windows displayed in pseudocolor based on the average log ratio of IP to input probe signal intensity; probes within a given 500 bp window are averaged. Dark grey indicates missing enrichment values due to a lack of probes. The leftmost column presents a heatmap of the log2 expression ratio between hESCs and l-iPSCs. Although these genes are differentially expressed between ESCs and l-iPSCs, their methylation pattern appears to reset to an ESC-like state in l-iPSCs. This indicates either that this methylation mark does not explain the expression differences between ESCs and l-iPSCs, or that subtle methylation differences not detected by our analysis could be responsible.
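The windowing used for these heatmaps can be sketched in Python. Assumptions: probe positions are strand-aware offsets from the TSS in bp (negative = upstream), each probe already carries its log ratio of IP to input signal, windows are half-open [start, end), and `bin_index` and `bin_promoter` are hypothetical names. With this layout, 0-based windows 10 and 11 flank the TSS, which corresponds to bins 11 and 12 in 1-based numbering.

```python
WINDOW = 500               # bp per window
START, END = -5500, 2500   # promoter region relative to the TSS
N_BINS = (END - START) // WINDOW  # 16 windows covering 8 kb


def bin_index(offset):
    """0-based window for a probe `offset` bp from the TSS, or None if outside."""
    if offset < START or offset >= END:
        return None
    return (offset - START) // WINDOW


def bin_promoter(probes):
    """Average the log ratios of probes falling in each 500 bp window.

    probes: iterable of (offset_from_tss, log_ratio). Windows without probes
    are None, corresponding to the dark grey cells in the heatmaps.
    """
    sums = [0.0] * N_BINS
    counts = [0] * N_BINS
    for offset, log_ratio in probes:
        b = bin_index(offset)
        if b is not None:
            sums[b] += log_ratio
            counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Returning `None` rather than 0 for empty windows keeps "no probes" distinct from "no enrichment", which matters when rows are clustered or correlated.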

Figure S14. Histone H3K4me3 and H3K27me3 patterns of hESC signature genes. Hierarchical clustering of H3K4me3 and H3K27me3 patterns for the ~50 hESC signature genes used by Lowry et al. (PNAS, 2008). Each row represents the -5.5 kb to +2.5 kb promoter region relative to the transcription start site (TSS). The 8 kb regions are divided into sixteen 500 bp windows displayed in pseudocolor based on the average log ratio of IP to input probe signal intensity; probes within a given 500 bp window are averaged. Dark grey indicates missing enrichment values due to a lack of probes. Note that most of these genes are dramatically differentially expressed between hESCs and fibroblasts.

Figure S15. aCGH analysis of l-hiPSC lines. Examples of aCGH data from chromosomal regions that deviate significantly from the genomic background (fibroblast) signal (Z-score > 18). Hybridizations were conducted for the indicated l-hiPSC line relative to the fibroblast cell line (NHDF1). All abnormalities represent deletions, which appear above the averaged line. Note that regions with extremely high Z-scores (> 45) show “steps” of deviation from the midline, whereas regions with lower Z-scores appear as “hills”.

 
