EXPERIMENTAL VALIDATION OF MICROARRAY DATA

Patrick Tan Dept of Cellular and Molecular Research, National Cancer Centre/Defence Medical and Environmental Research Institute, Singapore 169610, Republic of Singapore E-mail : [email protected]

In this chapter, we describe a few popular experimental methodologies that can be used to independently validate the results of a microarray experiment. In contrast to many of the other chapters in this book, the techniques described here are decidedly non-mathematical and non-statistical in nature. Instead, most of them utilize standard ‘wet-bench’ molecular biology protocols that have proven themselves over the years to be highly robust and reproducible. Due to space limitations, it is not possible to comprehensively describe the underlying theories and in-depth technical issues associated with each method. However, for each methodology, I have provided a brief schematic description that should sufficiently illustrate how the methodology works and, more importantly, what it is meant to test. Readers who are interested in further investigating these techniques are referred to the several excellent molecular biology textbooks available in the literature (1). Better yet, some readers (presumably from a computer science or mathematical background) may pick up the challenge and decide to spend some time in a molecular biology laboratory!

Contents This chapter is divided into four major sections. In the first section, we introduce the concept of experimental validation, and discuss its importance, aims, and technical challenges specifically with regard to validating microarray data. In the second section, we briefly describe a typical microarray experiment, so as to illustrate the typical output generated from such experiments that will serve as the target for subsequent validation. Third, we describe a few popular techniques of performing these validations, ranging from the ‘quick and dirty’ (in silico validation) to more involved techniques (quantitative PCR). Finally, we conclude this chapter with some speculations on key issues that experimental validation techniques will face in the near future.

1. Experimental Validation – Introduction, Aims, and Challenges

Validation:

1. To declare or make legally valid. 2. To mark with an indication of official sanction. 3. To establish the soundness of; corroborate. (from Webster’s Dictionary)

1.1 Introduction

In the fast-paced world of genomic research, the process of experimentally validating microarray data is often performed (and thought of) as a side issue in many microarray laboratories. Indeed, unlike many early papers in the field that typically included a series of validation experiments (2), an increasing number of microarray-related papers in the literature have apparently forgone such ‘wet-bench’ validations altogether. In one respect, this could be viewed as an indication that the microarray platform has successfully evolved to a new state of accepted technological maturity with regard to reliability and consistency. There is some truth to this, particularly with the increasing availability of inexpensive commercially-fabricated microarrays, which are typically of high quality and display minimal chip-to-chip variance. Nevertheless, for the aspiring microarray researcher, it remains highly useful to be familiar with some of the more popular experimental techniques that can be used to establish the ‘robustness’ and scientific validity of a particular microarray finding. Some of these techniques are described in this chapter.

1.2 Aims of Validation

Before we enter into the specifics of these descriptions, however, it is worth addressing some of the common general questions associated with the validation exercise.

When should one perform an experimental validation? The answer can range from the trivial, such as performing a few validations to appease a particularly nit-picky reviewer of a scientific manuscript, to more serious long-term concerns, such as identifying the handful of genes from a large-scale expression data set that merit further functional analysis and characterization. In essence, an experimental validation should be performed whenever there is a need to independently establish the ‘robustness’ or ‘reliability’ of a microarray finding.

Why should one perform an experimental validation? Perhaps the most significant reason why experimental validations are essential is that, depending on the particular technological platform used, microarray data is inherently noisy. The large numbers of gene expression measurements obtained in a typical microarray experiment can, by virtue of their sheer numbers, yield significant numbers of false positive and false negative results. Such artefacts can arise for numerous experimental and technical reasons, including (but not limited to): spurious signals caused by microarray probes cross-hybridizing to related transcripts of similar sequence; mistakes in the assignment of probe identities (a particularly significant concern when dealing with cDNA microarrays, as many cDNA clone libraries invariably contain a certain degree of mistaken assignments); artefacts induced by the sample preparation technique (e.g. preferential enrichment of certain transcripts during RNA amplification procedures, or degradation of RNA during the extraction process); and artefacts caused by the hybridization procedure itself (e.g. biased hybridizations due to the Cy3 or Cy5 fluorophore used). All these examples point to the fact that microarray technology is still very much an evolving field, and as such, it is essential to have the resources to perform at least a measure of independent validation of the microarray result. In addition, it is worth noting that in certain cases the validation exercise may also lead to unexpected novel biological findings. Thus, it would be a fallacy to regard such validation exercises as mere inconvenient confirmations of the microarray data.
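The scale of this false-positive problem can be illustrated with a short back-of-envelope sketch. The gene counts and significance thresholds below are invented for illustration; the chapter itself prescribes no particular numbers.

```python
# Hypothetical sketch (numbers invented) of why sheer gene count
# inflates false positives, and the classical Bonferroni correction.

def expected_false_positives(n_genes: int, alpha: float) -> float:
    """Expected number of unregulated genes called significant purely
    by chance, if each gene is tested independently at threshold alpha."""
    return n_genes * alpha

def bonferroni_threshold(n_genes: int, family_alpha: float) -> float:
    """Per-gene threshold keeping the family-wise error rate at family_alpha."""
    return family_alpha / n_genes

# 10,000 probes tested at p < 0.05: ~500 chance positives are expected
# even if no gene is truly regulated.
print(expected_false_positives(10_000, 0.05))
print(bonferroni_threshold(10_000, 0.05))
```

This is one reason why even a statistically well-analyzed target set benefits from independent wet-bench confirmation of its members.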

1.3 Challenges of Validation

What are some of the challenges encountered during the validation process? Since many of the validation techniques described in this chapter rely upon fairly conventional (and hence reliable) molecular biology techniques, a dilemma often arises: while microarray experiments are almost by definition high-throughput in nature, almost all of the currently available validation techniques are not. Hence, the selection of ‘which gene to validate’ is often performed in an ad-hoc manner, usually because the gene in question may be involved in a particular cellular process of interest to the researcher. Another challenge is that many microarray platforms and validation techniques measure gene expression values across different scales and reference values – for example, while it is possible to use global normalization techniques as a reference to compare gene expression measurements across different microarrays, validation techniques such as quantitative or semi-quantitative PCR typically employ a set of ‘housekeeping’ or ‘control’ genes as a reference. A further challenge arises when the validation attempt relies on quantifying the protein levels of a gene that has been predicted by the microarray data to be differentially regulated: there is a notorious lack of correlation between the proteome and the transcriptome, and so a failure to observe a corresponding regulation at the protein level may lead to an erroneous conclusion that the microarray data is incorrect.

Finally, a more practical concern is the monetary cost involved in such validation exercises – while the reagents required for a basic validation are fairly inexpensive, more sophisticated techniques, such as quantitative PCR, require specialized equipment and reagents, which can become quite expensive if one wants to validate multiple genes. Besides cost, it may also be highly time-intensive and laborious to design and manufacture such reagents (e.g. quantitative PCR probes and protein antibodies) if they are not already available. Thus, the validation of microarray data is an exercise not to be taken lightly, and its success will undoubtedly benefit from careful pre-planning before performing the experiment.

2. A Typical Scenario for Experimental Validation

A schematic scenario for a typical microarray experiment is presented in Figure 1. Briefly, a series of biological samples are initially selected, prepared, and processed, culminating in their hybridization onto the microarray (in this case, we are assuming the microarray will be used for mRNA expression profiling). The hybridized microarray is then scanned, allowing the fluorescence intensities corresponding to each microarray probe to be determined. This process is repeated for each biological sample, until all the expression data for the complete series of biological samples has been acquired. The combined data set is then subjected to various forms of statistical and computational analysis, including unsupervised learning techniques such as hierarchical clustering, or supervised approaches such as support vector machines. The end result of this analysis is usually the identification of a reduced number of genes, selected from the initial global data set, that exhibit a particular behavior of interest, such as being differentially regulated across a particular class distinction (e.g. cancer vs normal tissue), or being transcriptionally induced upon a drug treatment. We will refer to this reduced set of genes as the ‘target set’. For many microarray experiments, the number of genes in the ‘target set’ will usually still be too large for every single member to be experimentally validated. The next crucial step, then, is to select candidate members of the target set for further experimental validation.

Figure 1 : Schematic Workflow of a Typical Microarray Experiment (see Main Text for Details)

The decision of which candidate members to select for validation is far from a trivial exercise and can depend upon multiple factors, such as I) the magnitude of regulation, as genes exhibiting dramatic patterns of differential regulation are more likely to be ‘truly’ regulated than genes exhibiting subtle levels of regulation, II) the absolute expression level of the gene in the cell or tissue, as abundantly expressed genes are typically easier to measure than genes expressed at 1-2 mRNA transcripts per cell, III) the prior availability of detection reagents such as antibodies or PCR probes, and IV) scientific relevance to the hypothesis being tested. Sometimes, some of these requirements may run counter to one another. For example, many important regulatory genes such as transcription factors are not abundantly expressed, although such genes may be highly relevant to the biological question being addressed. Alternatively, while it may be more convenient to validate well-characterized genes, since they are more likely to have detection reagents available (criterion III), the characterization of novel transcripts such as ESTs (expressed sequence tags) may ultimately provide a greater overall contribution to the field (criterion IV). Thus, in selecting the candidate genes from the target set for validation, the researcher will have to carefully weigh all these variables. Once the candidate genes are chosen, the validation process can then be performed. We will now proceed to describe a few popular validation methods, namely in silico validation, PCR-based validation, and antibody-based validation techniques. It should be noted that for the purpose of this chapter, we have assumed that the validations will be performed in the same laboratory that generated the original microarray data, and that these validations are usually performed on the same or similar biological samples (a ‘biological replicate’) that were used to generate the original data. It is also important to appreciate that in the long run, the ultimate test of a result’s validity will lie in its ability to be reproduced by researchers working in other laboratories using different analytical or experimental approaches, on independently generated biological samples. Such ‘cross-centre’ validations are not discussed here.
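As a concrete (and entirely hypothetical) illustration of criteria I and II, ranking target-set genes by magnitude of regulation with abundance as a tiebreaker might be sketched as follows. The gene names and expression values are invented, not taken from any real data set.

```python
# Hypothetical sketch of the candidate-selection step: rank 'target set'
# genes by criterion I (magnitude of regulation), breaking ties with
# criterion II (absolute abundance). All values are invented.
import math

def log2_fold_change(mean_a: float, mean_b: float) -> float:
    """Log2 ratio of mean expression between two sample classes."""
    return math.log2(mean_a / mean_b)

def rank_candidates(genes):
    """genes: list of (name, mean_expr_class_a, mean_expr_class_b).
    Returns names sorted by |log2 fold change| (descending), then by
    higher absolute expression."""
    scored = []
    for name, a, b in genes:
        scored.append((abs(log2_fold_change(a, b)), max(a, b), name))
    scored.sort(reverse=True)
    return [name for _, _, name in scored]

genes = [
    ("GENE_A", 800.0, 100.0),   # 8-fold up, abundant
    ("GENE_B", 30.0, 10.0),     # 3-fold up, rare
    ("GENE_C", 50.0, 400.0),    # 8-fold down
]
print(rank_candidates(genes))   # GENE_A and GENE_C outrank GENE_B
```

In practice criteria III and IV (reagent availability and scientific relevance) are not numeric and are weighed by the researcher rather than by a script; the sketch only mechanizes the quantitative part of the decision.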

3. Validation in silico

The validation of microarray data using in silico techniques is probably the easiest and most cost-effective method currently available. By querying the scientific literature through public databases such as PUBMED, the researcher is able to ask the following question: of the many genes found to be regulated in my microarray experiment (the target set), how many have been previously shown by other researchers to be regulated under similar or identical experimental conditions? This can prove to be a very powerful approach, due to the huge and ever-increasing availability of scientific reports describing how different genes are regulated. In addition to scientific literature databases such as PUBMED, other common databases that can be accessed for this purpose include ENTREZ, UNIGENE, and LOCUSLINK, which are all available at the NCBI (National Center for Biotechnology Information) in the USA. It is now even possible to incorporate knowledge from various externally generated microarray data sets through the Internet, via public expression repositories such as SAGEmap (for SAGE (serial analysis of gene expression) data) and ArrayExpress, or in many cases from the authors’ web-sites (e.g. the Stanford Microarray Database). A major advantage of in silico validation is its cost-effectiveness and speed, as automated searching scripts can (technically) be employed to search every single regulated gene in one’s data set against these other databases.

An example of in silico validation is shown in Figure 2. In this project, we sought to compare the genomic content of the Gram-negative pathogen Burkholderia pseudomallei to a related species, B. thailandensis. The technique used was array-based comparative genomic hybridization (array-CGH), employing whole-genome DNA microarrays containing probes for every single predicted gene in the B. pseudomallei genome (Figure 2a). This technique is highly similar to conventional mRNA expression profiling, except that genomic DNA is hybridized to the microarray. To assess the validity of this approach, we first asked if the microarray could identify genes that had already been shown in the literature to be differentially present in these two species. For example, a paper from Reckseidler et al (2001) (3), using Southern blot hybridizations, had shown that a contiguous cluster of genes related to the production of Type I O-polysaccharides (OPS) was present in B. pseudomallei but not in B. thailandensis. After performing the genomic hybridizations and acquiring the microarray data, we analyzed the hybridization fluorescence ratios of the microarray probes and correlated the microarray data to the published literature. We found that of the eleven genes identified as absent in B. thailandensis in the Reckseidler study, nine were also identified as ‘deleted’ in the microarray data, while the remaining two genes were not represented by probes on the microarray (Figure 2b). Thus, this experiment supported the validity of using the microarray-based approach for this particular comparative genomic study, as genes shown by other techniques to be absent from B. thailandensis (in comparison to B. pseudomallei) were also predicted to be absent by the microarray. Interestingly, in addition to these eleven genes, a further seven genes within this cluster were also found to be deleted by the microarray, consistent with the hypothesis that the entire genomic region involved in Type I OPS production is deleted in B. thailandensis.


Figure 2 : Example of in silico validation. A) Schematic of the array-based comparative genomic hybridization (array-CGH) procedure. B) Comparison of predictions by different methods. The left column lists eleven genes predicted by Reckseidler et al (2001) to be absent in B. thailandensis, while the right column depicts the microarray data. Genes colored in blue are commonly predicted by both methods to be deleted in B. thailandensis. Red arrows depict genes predicted to be deleted by Reckseidler et al (2001) but not by the microarray data; note that the microarray does not contain probes for these genes. A further seven genes were additionally predicted to be deleted by the microarray data.

As can be seen from this simple example, in silico techniques can be used in a very powerful manner to validate microarray data. However, despite its ease, rapidity, and cost-effectiveness, it should not be forgotten that there are also some disadvantages associated with this technique. For example, because the technique relies heavily on the published literature, it is necessarily dependent upon the existence of prior knowledge; as a consequence, the findings validated by this approach are usually not novel. Another concern is that this approach assumes that the prior knowledge in the literature is ‘true’ and can be used as a gold standard. Hence, one must have confidence in the results of other researchers. This is usually not a problem when there are multiple reports in the literature from different centers that commonly agree on how a particular gene is regulated. The situation is trickier, however, when there is only one report, from an obscure journal, that describes a particular finding – especially when the reported regulation is inconsistent with one’s own observations! Because of these factors, validation by in silico techniques is usually performed very early in the microarray experimental workflow, either as a preliminary analysis or to provide an assessment of the initial quality of the expression data set. Furthermore, the genes best used in this type of validation exercise are typically those that have been well studied, and whose regulation is a matter of general consensus in the field.
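A minimal sketch of such an automated literature-search script is shown below. It assumes NCBI's public E-utilities `esearch` interface (database `pubmed`, with the hit count returned in a `<Count>` XML element); the gene and condition terms, and the sample XML reply, are invented for illustration, and a real script would also have to respect NCBI's usage and rate-limit guidelines.

```python
# Hypothetical in silico validation helper: build a PubMed esearch URL
# for each target-set gene and read the hit count from the XML reply.
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_query_url(gene: str, condition: str) -> str:
    """URL asking PubMed how many papers mention both terms."""
    term = f"{gene}[Title/Abstract] AND {condition}[Title/Abstract]"
    return EUTILS + "?" + urlencode({"db": "pubmed", "term": term})

def hit_count(esearch_xml: str) -> int:
    """Extract the <Count> element from an esearch XML reply."""
    return int(ET.fromstring(esearch_xml).findtext("Count"))

# Invented example gene/condition pair:
print(pubmed_query_url("TARC", "dendritic cells"))

# Parsing a truncated, made-up reply (a real reply has more fields):
sample = "<eSearchResult><Count>12</Count></eSearchResult>"
print(hit_count(sample))
```

Genes with zero literature hits are not thereby invalidated, of course; as noted above, absence of prior reports simply means in silico validation has nothing to say about them.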

4. Validation Using PCR

4.1 Overview of Basic PCR

Several of the most widely employed validation methods for microarray data are techniques utilizing some variation of the polymerase chain reaction (PCR). As we will describe below, PCR-based methods are highly versatile in that they can be configured to measure a wide diversity of DNA- or RNA-based targets. Another major characteristic of PCR is its exquisite sensitivity and specificity, which can allow a researcher to reliably detect and measure a specific molecular transcript amidst a highly complex mixture of related macromolecules. The overall robustness and reliability of PCR is also sufficiently high that these methods are rapidly becoming the ‘gold standard’ against which other techniques are benchmarked. There are numerous excellent resources in the literature that cover the intricacies and nuances of the PCR technique in a highly comprehensive manner, and readers who are interested in further understanding and exploring the technique are referred to these references for more extensive discussions (1). Here, we provide a brief schematic description of how the PCR process is able to target and amplify a specific DNA sequence (Figure 3).

Figure 3 : The Basic Process of PCR. Repeated cycles of denaturation, annealing and extension lead to an exponential amplification of a single DNA template/target.

The process is broadly divided into three main steps: denaturation, annealing, and extension, which collectively form a ‘cycle’. Prior to the actual PCR process, oligonucleotide primers are designed and synthesized that are complementary to a specific DNA sequence uniquely found in the target gene of interest. These primers are added to the complex biological pool, which could be either genomic DNA or a cDNA population generated from a source of mRNA. The mixture containing the original target and the exogenous primers is then heated to separate the DNA strands (denaturation), and a subsequent cooling step then allows the primers to hybridize specifically to the DNA target (annealing). Since the initial concentration of the primers is usually in great excess compared to the original target (also known as the ‘template’), most of the denatured target strands will preferentially anneal to a primer rather than to another complementary target strand. A thermostable enzyme (DNA polymerase) then extends the primer sequence to generate a complete new double-stranded DNA template (extension). This process of denaturation, annealing, and extension is repeated over multiple cycles, resulting in an exponential amplification of the target DNA. There are several variants of this basic PCR technique that can be used for the validation of microarray data. Here, we describe three different types: qualitative PCR, semi-quantitative PCR, and quantitative PCR.
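The exponential arithmetic behind this cycling can be sketched numerically. This is a toy model, not a protocol: real reactions eventually plateau as primers and enzyme are exhausted, which the sketch ignores.

```python
# Toy model of PCR amplification: each denaturation/annealing/extension
# cycle multiplies the template count by (1 + E), where E is the
# per-cycle efficiency (E = 1.0 means perfect doubling).

def amplify(n_start: float, cycles: int, efficiency: float = 1.0) -> float:
    """Copies of the target after `cycles` rounds of PCR."""
    n = n_start
    for _ in range(cycles):
        n *= 1.0 + efficiency
    return n

# One molecule, 30 perfect cycles: 2**30 (~1e9) copies.
print(amplify(1, 30))
# A more realistic per-cycle efficiency of ~0.9 still yields ~1e8 copies.
print(amplify(1, 30, 0.9))
```

This enormous gain is what gives PCR its sensitivity: a handful of starting molecules becomes a readily detectable quantity of product within a few dozen cycles.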

4.2 Qualitative PCR

Qualitative PCR assays are best utilized in microarray validation experiments where one wishes to test whether a particular gene (at either the mRNA or DNA level) is absent or present. That is, the nature of the difference is a binary one – either the gene is there, or it is not. Qualitative PCR should not be used in scenarios where one wants to assess relative fold changes, as in such cases the gene of interest is likely to be expressed in both test populations, but at different levels of abundance. In scenarios where qualitative PCR is appropriate, the technique offers several advantages. For example, since the difference being assayed can be assessed in black-and-white terms, it is often not necessary to normalize the test data against an external source, or if needed the normalization can be a fairly general one (e.g. the total amount of DNA or RNA used in the assay). Also, qualitative PCR can be performed using standard PCR reagents and does not require specialized equipment beyond a generic laboratory thermocycler (the instrument in which the PCR process is executed).

An example of qualitative PCR validation is now described. Here, we return to the array-CGH experiment described in Figure 2, where we compared the genomic content of two different species, B. pseudomallei and B. thailandensis. After analyzing the microarray data, we identified a number of novel candidate genes that were predicted to be present in B. pseudomallei but not in B. thailandensis. In contrast to Figure 2, these genes had not previously been identified in the literature as being differentially present between the two species, and thus it was not possible to validate these predictions by in silico methods. In Figure 4, we have used qualitative PCR and oligonucleotide primers designed against these candidate genes to test whether they are present in multiple isolates of B. pseudomallei (P), B. thailandensis (T), and a related species, B. mallei (M). As can be seen from the Figure, it is clear that these genes are present in B. pseudomallei and B. mallei, but not in B. thailandensis. Thus, the qualitative PCR results have validated the microarray data.

Figure 4 : Validation of ORFs deleted in B. thailandensis by Qualitative PCR. Oligonucleotide primers were designed against two ORFs (3534302 and 3534002) and used in PCR reactions to amplify these ORFs in multiple isolates of B. pseudomallei (P), B. thailandensis (T), and B. mallei (M). Successful amplification of the ORF is represented by a white band, just under 1.5 kb.

4.3 Semi-Quantitative PCR

The semi-quantitative PCR technique is a simple yet effective methodology for validating microarray results based upon quantitative expression data. Unlike qualitative PCR, this technique can be used to compare the relative expression level of a gene across different biological samples. Because the gene to be measured is already expressed at some level in all the samples, it is often necessary to incorporate an internal normalization control in the assay, so that the samples can be effectively compared. These normalization controls are usually ‘housekeeping’ genes, such as GAPDH, β-actin, and 18S RNA, that share the common characteristics of high absolute expression and minimal biological regulation (i.e. the gene is expressed at comparable levels in all biological samples). The semi-quantitative PCR technique is conceptually similar to the qualitative PCR technique, except that in the latter a fixed number of denaturing/annealing/extension cycles is usually performed before the final result is assayed. In semi-quantitative PCR, a variation of this strategy is used to compare the relative accumulation of a particular DNA target during the PCR procedure. A simple way to achieve this is to remove a fixed aliquot of the PCR reaction mixture at various cycle points (e.g. 10 cycles, 15 cycles, 20 cycles, etc.) and analyze the products on an agarose gel as usual. If a particular DNA or cDNA target is already present at a higher concentration in a particular biological sample, then that sample will exhibit an earlier accumulation of the DNA target relative to the other samples. Because of its similarity to qualitative PCR, semi-quantitative PCR is also fairly inexpensive and can be performed using standard laboratory reagents.

An example of semi-quantitative PCR is provided in Figure 5. Here, we have generated gene expression profiles of four related cellular populations in the immune system: monocytes, macrophages, immature dendritic cells, and mature dendritic cells. The purpose of this project was to identify genes that were differentially regulated across these four separate populations, as the populations share a common developmental lineage (Figure 5a). After analyzing the microarray data, we identified several candidate genes exhibiting such differential transcriptional regulation. One such gene was TARC, which was predicted to be expressed only in the dendritic cell populations (both immature and mature dendritic cells), while another gene, RGS1, was predicted to be expressed only in monocytes and mature dendritic cells, but not in the other two cellular populations. To assess the reliability of these predictions, we tested the expression of TARC and RGS1 using semi-quantitative PCR and oligonucleotide primer pairs specific for either the TARC or the RGS1 gene. As a normalization control, the β-actin gene was employed.



Figure 5 : Example of Semi-quantitative PCR. (a) Developmental lineage of monocytes, macrophages, immature dendritic and mature dendritic cells. Under exposure to certain extracellular signals (cytokines), monocytes can be induced to develop along alternative developmental pathways to form macrophages or dendritic cells. (b) Validation of microarray data. Semi-quantitative PCR was performed for 20 amplification cycles. Each lane represents a different cellular population. Expression of TARC is observed only in dendritic cells (lanes 4 and 5), while expression of RGS1 is only observed in monocytes and mature dendritic cells (lanes 1 and 5). The equivalent amplification of the control β-actin gene confirms that a comparable amount of cDNA was loaded into each lane.

As can be seen from Figure 5B, after 20 cycles of PCR, TARC was clearly seen to be present only in the immature and mature dendritic cells, while RGS1 was present only in monocytes and mature dendritic cells. Importantly, the equivalent amplification of the β-actin gene in all the lanes confirms that similar amounts of starting cDNA were used for each reaction.
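The cycle-sampling logic underlying semi-quantitative PCR can be sketched as a toy calculation. The detection limit, checkpoint cycles, and copy numbers below are invented; real gels are also affected by amplification efficiency and plateau effects that the sketch ignores.

```python
# Toy model of semi-quantitative PCR: aliquots are examined at fixed
# cycle checkpoints, and the sample with more starting cDNA shows a
# visible band at an earlier checkpoint. All numbers are invented.

DETECTION_LIMIT = 1e9   # copies needed for a visible gel band (arbitrary)

def first_visible_cycle(start_copies: float, checkpoints=(10, 15, 20, 25, 30)):
    """Earliest sampled cycle at which the band would be visible,
    assuming perfect doubling each cycle; None if never visible."""
    for c in checkpoints:
        if start_copies * 2 ** c >= DETECTION_LIMIT:
            return c
    return None

# Sample A holds 100x more target cDNA than sample B:
print(first_visible_cycle(1e5))   # visible by cycle 15
print(first_visible_cycle(1e3))   # only visible by cycle 20
```

This is why the checkpoint cycles matter: sampled too late, both bands saturate and the difference disappears; sampled at informative cycles, the more abundant template declares itself first.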

4.4 Real-time Quantitative PCR

Real-time quantitative PCR (qPCR) is an extremely powerful technique that is fast becoming the accepted ‘gold standard’ in the field, and the benchmark against which other measurement techniques are compared. Similar to the other PCR-based techniques described above, qPCR requires very small amounts of starting biological material (usually a few nanograms of DNA or RNA), in contrast to typical microarray experiments that require 5-50 µg of total RNA. Other specific advantages of qPCR are its speed (a typical reaction can be performed within 30-45 minutes) and the ability to multiplex multiple gene expression measurements in a single reaction, allowing many genes to be simultaneously validated in a single reaction tube. In qPCR, the quantity of the amplification product is measured during every PCR cycle. Because of this, another common term for qPCR is ‘real-time PCR’, and oftentimes the two labels (qPCR and real-time PCR) are used interchangeably. Most qPCR systems on the market rely upon the detection and quantification of a fluorescent reporter whose fluorescence has been designed to vary in proportion to the amount of amplified DNA target. At present, there are two main methods for the quantitative detection of the amplified product:

I. Double-stranded DNA-binding agents (e.g. SYBR Green I)
II. Fluorescent probes based upon FRET (fluorescence resonance energy transfer) technology
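Either detection chemistry ultimately yields one fluorescence curve per reaction. A common way (not prescribed by this chapter) to turn such curves into relative expression is the threshold-cycle (Ct) reading combined with the widely used 2^-ΔΔCt calculation against a housekeeping control, sketched below with synthetic curves, genes, and threshold.

```python
# Hypothetical sketch of qPCR quantification: find the cycle at which
# each fluorescence curve crosses a threshold (Ct), then compute a
# fold change by the common 2^-ddCt method. All data are synthetic.

def ct(fluorescence_by_cycle, threshold: float) -> int:
    """First cycle (1-based) whose fluorescence reaches the threshold."""
    for cycle, f in enumerate(fluorescence_by_cycle, start=1):
        if f >= threshold:
            return cycle
    raise ValueError("threshold never reached")

def fold_change(ct_gene_test, ct_ctrl_test, ct_gene_ref, ct_ctrl_ref):
    """Relative expression of the gene in the test sample vs the
    reference sample, normalized to a control gene (2^-ddCt)."""
    ddct = (ct_gene_test - ct_ctrl_test) - (ct_gene_ref - ct_ctrl_ref)
    return 2.0 ** -ddct

def doubling_curve(start: float):
    """Synthetic fluorescence readings for 40 perfect-doubling cycles."""
    return [start * 2 ** c for c in range(1, 41)]

THRESHOLD = 1e6
ct_gene_test = ct(doubling_curve(8.0), THRESHOLD)  # abundant in test sample
ct_gene_ref = ct(doubling_curve(1.0), THRESHOLD)   # rarer in reference sample
ct_ctrl = ct(doubling_curve(4.0), THRESHOLD)       # control gene, same in both

print(ct_gene_test, ct_gene_ref)
print(fold_change(ct_gene_test, ct_ctrl, ct_gene_ref, ct_ctrl))  # 8.0
```

Note that the recovered fold change (8.0) matches the 8:1 ratio of starting template built into the synthetic curves: an earlier threshold crossing means more starting material, with each cycle of difference corresponding to a two-fold change.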

4.4.1 Double-stranded DNA-binding Agents

This group of small molecules possesses the common ability to bind double-stranded DNA and can be divided into two classes: intercalators and minor groove binders. Ethidium bromide is an example of an intercalating agent, while SYBR Green I is a minor groove binding dye. Regardless of their binding mechanism, the major requirements for a DNA-binding dye to be used in qPCR are (i) increased fluorescence when bound to double-stranded DNA and (ii) no inhibition of the PCR reaction. In the case of SYBR Green I, fluorescence of the molecule is only observed when it is bound to double-stranded DNA. This special property of SYBR Green I allows the amplified product to be quantified during the extension phase of the PCR procedure, as it is during this part of the PCR cycle that double-stranded product is present (Figure 6). As a side note, in this system the initial starting concentration of the DNA target, and indeed of the biological sample as a whole, is assumed to be negligible compared to the final concentration of the amplified product.

Figure 6 : Activity of SYBR Green I. During the denaturation and annealing process, the SYBR Green I molecule does not bind the single-stranded DNA and consequently does not fluoresce (left panel). Upon primer extension, the DNA target is now present as a double-stranded template, which binds SYBR Green I. This causes the bound SYBR Green I to fluoresce (middle panel, green symbols). After a new round of denaturation and annealing (right panel), the extension reaction occurs once again and SYBR Green I molecules bind and fluoresce, with the total amount of fluorescence being proportional to the concentration of the amplified product (lower right). This figure was obtained from http://www.sigmaaldrich.com/img/assets/6600/sg_ls_mb_pcrxdiagram.gif

Using a DNA-binding dye such as SYBR Green I for qPCR applications carries both advantages and disadvantages. A positive aspect of this approach is that these generic DNA-binding dyes essentially allow a researcher to detect any double-stranded DNA template generated during PCR. As such, the same dye can be used to validate a large number of different genes, so long as the oligonucleotide primers used to amplify the template are sufficiently specific. Despite this versatility, however, the lack of sequence specificity may make it difficult to discriminate between specific and non-specific products generated during the PCR process, raising the potential of obtaining a false-positive result. Another disadvantage is that, as only one gene can be quantified in each reaction, the quantification of the gene of interest and of the control normalization gene often needs to be performed in separate reactions. For multiple genes to be quantified in a single reaction, more sophisticated techniques such as FRET probes need to be used (described next).

4.4.2 Fluorescence Resonance Energy Transfer (FRET) The phrase 'fluorescence resonance energy transfer' (FRET) refers to the distance-dependent transfer of energy between two adjacent fluorophores without the emission of a photon. Over the years, a number of qPCR applications have been developed that exploit the FRET phenomenon in assays where the fluorescence intensity of one of the fluorophores depends upon the amount of amplified product in the reaction. Although many variations of this technique are offered by different commercial vendors, the primary conditions for a FRET assay are:

I. The two fluorophores, referred to as the 'Donor' and 'Acceptor', must be in close physical proximity, such that the fluorescence of the 'Acceptor' is quenched by the 'Donor'.
II. For quenching to occur, the excitation spectrum of the Acceptor must overlap the fluorescence emission spectrum of the Donor.
III. The Donor and Acceptor transition dipole orientations must be approximately parallel.
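Condition I reflects the steep distance dependence of FRET: transfer efficiency falls off with the sixth power of the donor-acceptor separation (the Förster relation), which is why the two fluorophores must be held within a few nanometres of each other on the probe. A small numerical sketch follows; the R0 value is illustrative, not from the chapter:

```python
def fret_efficiency(r_nm, r0_nm=5.0):
    """Forster relation: E = 1 / (1 + (r/R0)^6).

    r0_nm is the Forster radius, the separation at which energy
    transfer is 50% efficient (typically a few nanometres).
    """
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)

# Fluorophores held close together on the probe transfer energy efficiently...
print(fret_efficiency(2.0))   # ~0.996
# ...but once the acceptor is cleaved free, transfer collapses.
print(fret_efficiency(20.0))  # ~0.00024
```

The r^-6 dependence is what makes probe cleavage (described next) behave almost like a binary on/off switch for the acceptor's fluorescence.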

A typical FRET qPCR procedure is illustrated in Figure 7. Before the assay begins, two oligonucleotides are created to specifically target the gene of interest. The first oligonucleotide is a conventional primer, similar to that used in the qualitative and semi-quantitative PCR assays. The second oligonucleotide, located 3' of (downstream from) the first primer, is also sequence-specific to the gene of interest but carries two fluorophores, the 'Donor' and the 'Acceptor'. Because both fluorophores are forced to lie in close proximity, being constrained by the length of the primer, the 'Donor' is able to exert FRET on the 'Acceptor', causing the fluorescence of the 'Acceptor' to be quenched.

Figure 7 : An example of FRET during the qPCR process. Two primers specific to the target gene of interest are created, one of which carries two fluorophores held sufficiently close for FRET to occur, so that fluorescence of the Acceptor (green fluorophore) is quenched by the Donor (red fluorophore) (top). After denaturation and annealing (middle), the DNA polymerase (Taq) extends the DNA template beginning from the upstream primer. Upon encountering the downstream primer, the enzyme cleaves the 5' nucleotide, releasing the Acceptor fluorophore (bottom). The Donor fluorophore is now unable to quench the Acceptor, allowing the latter to fluoresce and be measured. Picture obtained from www.probes.com/handbook/figures/0710.html

During the PCR annealing step, the two oligonucleotides hybridize to adjacent regions of the target DNA, followed by extension of the upstream (non-fluorescent) primer by the DNA polymerase. However, because DNA polymerases also exhibit an intrinsic 5' -> 3' exonuclease activity, the extending polymerase, upon encountering the fluorescent primer, cleaves it. This cleavage event frees the 'Acceptor' molecule from the oligonucleotide. Once freed, the Acceptor is able to fluoresce when appropriately stimulated, and this new fluorescence is measured by a detector. Since the number of 'Acceptor' molecules, or 'Reporters', released in each cycle is directly dependent upon the amount of available DNA template, this provides a very efficient method of quantifying the amplification process as it occurs.

The advantage of using sequence-specific fluorogenic FRET probes over generic DNA binding dyes is that a specific hybridization between the probe and the target sequence is required to generate the fluorescent signal. Thus, with fluorogenic probes, non-specific amplification due to mis-priming or primer-dimer artifacts does not generate spurious signals. Another advantage of sequence-specific fluorogenic probes is that they can be labeled with different, distinguishable reporter dyes. By using probes labeled with different reporters, amplification of two distinct sequences can be detected in a single PCR reaction. This would allow, for example, the gene of interest and the housekeeping normalization control gene to be quantified in the same reaction. The disadvantage of fluorogenic probes is that different probes must be synthesized to detect different sequences, which can prove a costly endeavor, especially if a large number of genes need to be validated.

An example of qPCR is now provided, using the generic DNA-binding dye approach (SYBR Green I). Here, the expression level of one particular gene, ESR1, also known as the Estrogen Receptor, was quantified in six primary human breast tumor specimens.
After performing a reverse transcription reaction to generate cDNA mixtures for each tumor specimen, we performed qPCR reactions on each sample using primers to the ESR1 gene. (For normalization, qPCR reactions were also performed for the 18S rRNA gene as a 'housekeeping' control.) A common term used in such studies is the 'crossing point', which refers to the point in the PCR process (usually expressed in cycles) at which a sample's fluorescence exceeds the background noise level. The lower the crossing point, the more copies of the target were present at the start of the amplification; the higher the crossing point, the fewer the copy numbers of the gene of interest. As can be seen in Figure 8, certain tumors were found to possess a lower crossing point for the ESR1 gene than the other tumors. These results indicate that the ESR1 gene is more highly expressed in these tumors compared to the others. At this point, we also note that it is often useful to include a 'calibrator' in the qPCR process, which is essential if data for one transcript are to be compared with another. However, describing how the calibrator is used would require a lengthy overview and is beyond the scope of this chapter (interested readers are encouraged to peruse the many excellent qPCR resources available in the literature).
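The crossing-point idea can be expressed as a small computation: scan the per-cycle fluorescence readings for the first cycle that exceeds the background threshold. The curves and threshold below are hypothetical, not the Figure 8 data:

```python
def crossing_point(fluorescence, threshold):
    """Return the first cycle (1-based) at which fluorescence exceeds
    the background threshold, or None if it never does."""
    for cycle, signal in enumerate(fluorescence, start=1):
        if signal > threshold:
            return cycle
    return None

# Idealized doubling curves for two tumors: more starting template means
# an earlier crossing point.
high_expr = [2 ** c for c in range(1, 41)]        # abundant target
low_expr  = [2 ** (c - 6) for c in range(1, 41)]  # 64-fold fewer copies
cp_high = crossing_point(high_expr, threshold=1e4)
cp_low = crossing_point(low_expr, threshold=1e4)
print(cp_low - cp_high)  # 6 cycles apart ~ a 2**6 = 64-fold difference
```

This is why a fixed cycle-offset between samples translates directly into a fold-difference in starting copy number.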

Figure 8 : Quantitative PCR Using a Generic DNA-binding Dye (SYBR Green I). Six tumor samples were assayed for the expression level of the ESR1 gene, which encodes the estrogen receptor. Each line represents a single tumor sample. Three tumor samples exhibited an earlier 'crossing point' (see Main Text), represented by a smaller cycle number; these samples expressed higher levels of the ESR1 gene compared to the other three samples. A 'calibrator' reaction (black arrow) is also often performed as an internal control for subtle differences in PCR efficiency. Use of the calibrator is not addressed in this chapter.

5. Validation Using Protein Antibodies

We close this chapter by briefly describing how protein antibodies can be used for validation of microarray data. By binding and specifically detecting a particular protein of interest, protein antibodies can be used to measure the protein levels of a gene identified by the microarray to be transcriptionally regulated. Although protein antibodies can function as a very powerful method of validation, this approach is usually chosen only when an antibody reagent to the gene of interest is already available, as, unlike oligonucleotides, the creation of a specific protein antibody from scratch can be a very time-consuming and laborious affair. Nevertheless, it should be noted that the repertoire of antibodies currently available, either academically or through commercial vendors, is quite sizeable, making this technique a viable option for the validation of microarray data. Like PCR, protein antibodies can be used for validation in multiple ways, such as in western blotting, fluorescence-activated cell sorting (FACS), and immunohistochemical applications. To comprehensively describe each of these applications is beyond the scope of this chapter; as such, we have limited ourselves to a single case study.

An example of how protein antibodies can be used for validation in FACS analysis is now described, based upon the gene expression study of Figure 5, in which four related cellular populations in the immune system (monocytes, macrophages, immature dendritic cells, and mature dendritic cells) were transcriptionally profiled. In this analysis, two genes, CD1a and CD16, were identified as being differentially regulated at the transcriptional level. The CD1a gene was predicted to be upregulated in immature dendritic cells, with a subsequent decline in expression in the mature dendritic cell population.
Conversely, CD16 expression, based upon the microarray data, was present in monocytes, upregulated in macrophages, and downregulated in immature and mature dendritic cells. Since both CD1a and CD16 are known cell-surface marker genes, whose protein products are readily accessible on the cell surface, we used antibodies to CD1a and CD16 in a series of FACS experiments to validate these microarray observations. In these experiments, a fluorescent antibody to either CD1a or CD16 was incubated with each of the distinct cellular populations, and the ability of the antibody to bind the protein of interest was quantified in a series of FACS distribution charts (Figure 9). As can be seen from Figure 9, incubation of the different cell populations with the CD1a antibody confirmed the microarray predictions, as did the CD16 antibody. Thus, the protein antibody FACS analysis confirmed the microarray data.
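The right-shifted peaks of Figure 9 can be summarized numerically, for example as the fraction of stained cells brighter than the top of the background distribution. The sketch below uses synthetic log-fluorescence values, not the chapter's actual FACS data:

```python
import random

random.seed(0)

def percent_positive(stained, background, quantile=0.99):
    """Fraction of stained cells brighter than the given quantile of the
    background (unstained / isotype control) distribution."""
    cutoff = sorted(background)[int(quantile * (len(background) - 1))]
    return sum(1 for x in stained if x > cutoff) / len(stained)

# Hypothetical log-fluorescence values for 10,000 cells per population.
background = [random.gauss(1.0, 0.3) for _ in range(10_000)]
cd1a_on_dc = [random.gauss(3.0, 0.4) for _ in range(10_000)]    # right-shifted peak
cd1a_on_mono = [random.gauss(1.0, 0.3) for _ in range(10_000)]  # overlaps background

print(percent_positive(cd1a_on_dc, background))    # ~1.0: nearly all cells positive
print(percent_positive(cd1a_on_mono, background))  # ~0.01: at the background rate
```

A clearly separated peak, as for CD1a on dendritic cells, yields a percent-positive near 1, whereas a population overlapping the background stays near the quantile's false-positive rate.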

Figure 9 : FACS distribution plots of CD1a and CD16 cell-surface protein expression in different cellular populations. The grey peak represents the background fluorescence level, and expression of the protein of interest is depicted as a right-shifted peak. CD1a exhibits strong expression in the dendritic cell population, while CD16 exhibits strong expression in macrophages and marginal expression in monocytes.

6. Conclusions

As can be seen from this chapter, there are numerous experimental methodologies for validating microarray data. Although each technique is associated with its own unique advantages and disadvantages, a common theme of all these techniques is that they are best performed on single genes or proteins, or a few at a time. Because of this limitation, it is often a challenge to decide, from the large number of candidate genes in a microarray target set, which genes are suitable selections for validation and which are not. In the future, a major area of research will be to develop more high-throughput methods of performing such validation exercises, so that microarray validation can be carried out as a systematic exercise, rather than in the ad hoc manner in which it is currently practised. Far from being a cursory exercise, the field of microarray validation promises to be an active breeding ground for exciting research findings and novel applications for the foreseeable future.

7. References

1. Sambrook, J. and Russell, D.W., eds. (2001). Molecular Cloning: A Laboratory Manual, vols 1-3, 3rd edition. Cold Spring Harbor Press.

2. Iyer, V.R., Eisen, M.B., Ross, D.T., Schuler, G., Moore, T., Lee, J.C.F., Trent, J.M., Staudt, L.M., Hudson, J., Boguski, M.S., Lashkari, D., Shalon, D., Botstein, D., and Brown, P.O. (1999). The transcriptional program in the response of human fibroblasts to serum. Science 283, 83-87.

3. Reckseidler, S.L., DeShazer, D., Sokol, P.A., and Woods, D.E. (2001). Detection of bacterial virulence genes by subtractive hybridization: identification of capsular polysaccharide of Burkholderia pseudomallei as a major virulence determinant. Infect. Immun. 69, 34-44.