DNA Microarrays: An Alternative Approach to Identifying
Gene Expression Characteristics
The topics treated here continue my earlier discussion of two Saccharomyces cerevisiae genes, COX5b and YIL103W (readers unfamiliar with these genes are referred to the earlier web page before proceeding any further here). In that inquiry, I reviewed the knowledge gathered to date on COX5b and its molecular roles in S. cerevisiae, or yeast. Since YIL103W is an unannotated gene (that is, an ORF hypothesized to have a function), I also proposed a role for YIL103W based upon data collected from online computational resources. Thus, a principal aim of the present discussion is to evaluate my proposed role for YIL103W using DNA microarray data. The second aim is to compare the microarray expression profiles for COX5b with its accepted roles.
Why DNA microarrays?
Any investigator who rightly calls himself a scientist must always defend his rationale against objections. I do not consider myself an investigator, but the present discussion is bound by the demands of justification just as an investigator's conjectures are. In this spirit, then, let us consider briefly why one would use microarray technology to observe gene expression behaviour.
Biologists would be irrevocably hindered in their study of genes without a technique to detect expression or gene products. Techniques that generate data from which conclusions about gene expression can be inferred include Northern blots, in situ hybridization, indirect immunofluorescence, and RNase protection, to name a few. DNA microarrays are a recent addition to the repository of experimental techniques available to biologists. Microarrays, also called GeneChips, rely on the same property of nucleic acids as the techniques mentioned above: hybridization. Oligonucleotide probes for each gene of interest are covalently bound to a glass slide. Each bound DNA probe is referred to as a spot or feature. A single slide can carry anywhere from a few spots to tens of thousands, enough for a probe for every gene in a genome. The slide is then incubated in a solution containing cDNA prepared from the transcriptome of a given organism. The transcriptome is the full set of mRNAs made from a single genome. The cDNA strands are allowed to hybridize to their complementary probes on the glass slide. Since the cDNAs are synthesized with fluorescent precursors, DNA bound to spots is visualized through epifluorescence and automated photoimaging. Microarrays, then, facilitate the study of large sets of genes, even whole genomes, in a single assay. The true power of microarrays, however, lies in their ability to detect alterations in expression patterns over time and in response to environmental conditions.
As a consequence of microarrays, many biologists are now testing a new conception of the genome as a dynamic entity. It follows from this dynamic view that the level of expression of a gene is likely to vary over time. More importantly, it is reasonable to hypothesize that knowing when and how strongly a gene is expressed would also enhance an understanding of its function. In the case of COX5b in particular, questions of interest might be as follows. Is the gene always on? If so, to what degree? It is important to note here that, as an anoxic gene, COX5b is already known to respond differentially to oxygen saturation. That said, which environmental shifts other than anoxia would trigger alterations in the expression of COX5b? DNA microarrays enable us to address these questions.
With regard to the unannotated gene, YIL103W, the power of DNA microarrays lies in their ability to generate hypotheses about unknown genes. As mentioned earlier, I shall use microarray data to evaluate my proposed role for YIL103W. Raw microarray data are reported in the form of expression ratios, that is, the quantity of cDNA bound from experimental cells divided by the quantity of cDNA bound from control cells. A ratio greater than 1.0 indicates induction of expression; conversely, a ratio between 0 and 1.0 signifies repression of gene activity. Investigators observe expression activity over a given period of time and among several treatments. Since microarray slides have thousands of spots, investigators are able to detect the expression activity of thousands of genes in response to environmental manipulations. Thus it is possible to group genes based upon similar expression patterns. An important distinction must be noted here. Genes grouped or clustered on the basis of microarray data share only expression characteristics. Although tempting, it is unreasonable to conclude that clustered genes also share sequence identity. Conversely, isoforms of a gene, for example, can exhibit high sequence homology but differ dramatically in their expression patterns.
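The ratio arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the spot names and intensity values are hypothetical, and the log2 transform is a common convention for making induction and repression symmetric, not something taken from the data set discussed here.

```python
import math


def expression_ratio(experimental, control):
    """Return experimental signal divided by control signal for one spot."""
    if control <= 0:
        raise ValueError("control signal must be positive")
    if experimental < 0:
        raise ValueError("experimental signal cannot be negative")
    return experimental / control


def classify(ratio):
    """Label a ratio as induced (> 1.0), repressed (< 1.0) or unchanged."""
    if ratio > 1.0:
        return "induced"
    if ratio < 1.0:
        return "repressed"
    return "unchanged"


# Hypothetical (experimental, control) intensities for three spots.
spots = {
    "spot_1": (800.0, 200.0),
    "spot_2": (50.0, 200.0),
    "spot_3": (200.0, 200.0),
}

for name, (experimental, control) in spots.items():
    ratio = expression_ratio(experimental, control)
    # On a log2 scale a 4-fold induction (+2) mirrors a 4-fold repression (-2).
    log2_ratio = math.log2(ratio)
    print(name, ratio, log2_ratio, classify(ratio))
```

On the raw scale, repression is squeezed into the interval between 0 and 1.0 while induction is unbounded, which is one reason a logarithmic scale is often preferred when ratios are displayed.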
Picking up the thread of discussion, the principle of guilt by association holds that genes that share expression patterns also have similar functions. The logic undergirding guilt by association can be construed in the following manner. Genes required for the uncoiling of the DNA double helix are likely to be induced and repressed in concert, before and after S phase of the cell cycle. Similarly, if YIL103W were to cluster or show high correlation with genes of known function, then presumably YIL103W has a similar function. In what follows, I shall report the expression pattern for YIL103W found in yeast cells under varying environmental manipulations.
My earlier proposition was this: "YIL103W is, at least, a diphthamide related protein, or may share some of its functional characteristics but have a distinct cellular localization. Upon a consideration of the CD (conserved domain) data alone, this implication would certainly appear to be true. Although the BLAST2Seq findings reinforce potential functional similarity between 103p and diphthamide, the low percent identity values, 23% and 21%, with diphthamide proteins preclude the possibility that 103p and Dph2 are the same protein or even isoforms of the same family." The reader is directed back to this page for CD and BLAST2Seq data. In addition, 103p was the name ascribed to the hypothetical YIL103W protein for the purposes of the discussion.
Now let us look at what insights microarray data may afford us.
Figure 1. Expression patterns for YIL103W and clustered genes in response to environmental changes. Only genes whose Pearson correlation coefficient with respect to YIL103W was greater than 0.8 are included. Green signifies fold repression, and red signifies fold induction. Gray regions indicate lost or undetectable data. Each column represents a time point at which RNA was harvested from yeast cells. The color scale is provided below. Data obtained from Gash et al. 2001 (SGD database, 2003; <http://db.yeastgenome.org/cgi-bin/SGD/expression/expressionConnection.pl>).
As shown in Figure 1, the expression patterns of 20 yeast genes were found to correlate with that of YIL103W in several experiments. From the first bar in Figure 1, we see that the expression of YIL103W undergoes its most dramatic changes during heat shock (21°C to 37°C) and amino acid starvation, with -2.5 and -3.0 fold repressions, respectively. All of the similar genes below YIL103W are also strongly repressed in response to heat shock and amino acid starvation.
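The selection criterion behind Figure 1, a Pearson correlation coefficient greater than 0.8 with respect to the query gene, can be illustrated with a minimal sketch. The gene labels and profile values below are hypothetical placeholders, not data from the cited experiments, and the function names are my own; the actual SGD tool may compute and filter correlations differently.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return covariance / (spread_x * spread_y)


def correlated_genes(profiles, query, threshold=0.8):
    """Return {gene: r} for every gene whose r with the query exceeds threshold."""
    target = profiles[query]
    hits = {}
    for gene, profile in profiles.items():
        if gene == query:
            continue
        r = pearson(target, profile)
        if r > threshold:
            hits[gene] = r
    return hits


# Hypothetical fold-change profiles across four time points.
profiles = {
    "QUERY": [-3.0, -1.0, 0.0, 0.0],
    "GENE_A": [-1.5, -0.5, 0.0, 0.0],  # same shape as QUERY, smaller amplitude
    "GENE_B": [3.0, 1.0, 0.0, 0.0],    # mirror image of QUERY
}

print(correlated_genes(profiles, "QUERY"))
```

Note that the coefficient rewards similarity in the shape of a profile, not in its magnitude, so GENE_A passes the filter even though its changes are half as large. The sketch also assumes complete profiles; the lost or undetectable measurements shown in gray in Figure 1 would have to be handled before such a calculation.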
In examining each environmental manipulation more closely, it is of interest to note that within the amino acid starvation trials, expression of the genes in Figure 1 is repressed only at the first time point. At subsequent time points, no change in transcriptional activity was detected between control and treatment cells. Gash and colleagues (2001) report that in the amino acid starvation trial, experimental yeast cells were grown on complete minimal media, collected, and then allowed to grow on minimal media that lacked free amino acids but supplied glucose and uracil. A consideration of the amino acid starvation data suggests two possible interpretations. The first may be formulated in the following manner:
(1). When starved of free amino acids, yeast cells repress their translational machinery. Protein synthesis is a costly process, requiring an input of energy in the form of GTP hydrolysis.
(2). Consequently, genes repressed immediately upon transfer to glucose/uracil minimal media are likely to be involved in translation, inasmuch as translation was suppressed in starved cells for lack of amino acids and translational genes were therefore not needed.
(3). At later time points, expression of these proposed translational genes returned to constitutive levels (mirrored in control cells), accounting for the 1:1 ratio at later time points.
The second, alternative interpretation simply contends that the genes shown to be repressed following amino acid starvation are unrelated to protein translation.
To falsify one of the two interpretations above, it is necessary to turn to the genes that have a known function. In Figure 1, the name of any annotated gene is listed adjacent to the ORF name. Anyone other than a yeast biologist is unlikely to be able to state the functions of the 20 genes presented in Figure 1. We are interested in the functions of these known genes so that we can determine whether any of them are related or participate in the same pathway. As discussed earlier, the logic of guilt by association leads us to make this proposition: if YIL103W clusters with a group of annotated genes known to share a similar function, then YIL103W presumably also has that shared function.
A close examination of the known genes in Figure 1 reveals that a number of them do in fact share a similar molecular role. Specifically, I will draw attention to known genes that fall under shared Gene Ontology (GO) groupings. Below are 5 genes from Figure 1 whose expression patterns correlate with that of YIL103W, along with their GO characterizations (molecular function; biological process; cellular component).
DIP2 (snoRNA binding; processing of 20S pre-rRNA; small nucleolar ribonucleoprotein complex)
IMP4 (rRNA binding; processing of 20S pre-rRNA; small nucleolar ribonucleoprotein complex)
RLP7 (rRNA binding; processing of 27S pre-rRNA; small nucleolar ribonucleoprotein complex)
RPF1 (rRNA binding; processing of 27S pre-rRNA; nucleolus)
UTP6 (snoRNA binding; processing of 20S pre-rRNA; small nucleolar ribonucleoprotein complex)
When considered in sum, all five genes listed above share roles in the assembly and processing of ribosomal RNA (rRNA) in yeast. In eukaryotes, ribosomal proteins and rRNA molecules together form the ribosome, which carries out translation. The 20S and 27S species mentioned above are not ribosomal subunits themselves but pre-rRNA intermediates: 20S pre-rRNA is the precursor of the 18S rRNA of the small (40S) subunit, and 27S pre-rRNA is the precursor of the 5.8S and 25S rRNAs of the large (60S) subunit. Expression profile data (Gash et al. 2001) demonstrated that these 5 genes (DIP2, IMP4, RLP7, RPF1, and UTP6) and our hypothetical gene, YIL103W, exhibited similar transcriptional activity in response to amino acid starvation.
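The observation that these five genes share annotations can be made explicit by tallying their GO terms. The sketch below simply counts the terms listed above for each GO aspect; the data structure and function name are my own illustrative choices, and no GO database is queried.

```python
from collections import Counter

# Order of the three GO aspects as given in the list above.
ASPECTS = ("molecular function", "biological process", "cellular component")

# GO terms transcribed from the five genes listed above.
annotations = {
    "DIP2": ("snoRNA binding", "processing of 20S pre-rRNA",
             "small nucleolar ribonucleoprotein complex"),
    "IMP4": ("rRNA binding", "processing of 20S pre-rRNA",
             "small nucleolar ribonucleoprotein complex"),
    "RLP7": ("rRNA binding", "processing of 27S pre-rRNA",
             "small nucleolar ribonucleoprotein complex"),
    "RPF1": ("rRNA binding", "processing of 27S pre-rRNA", "nucleolus"),
    "UTP6": ("snoRNA binding", "processing of 20S pre-rRNA",
             "small nucleolar ribonucleoprotein complex"),
}


def tally(annotations, aspect):
    """Count how many genes carry each term within one GO aspect."""
    index = ASPECTS.index(aspect)
    return Counter(terms[index] for terms in annotations.values())


for aspect in ASPECTS:
    print(aspect, tally(annotations, aspect).most_common())
```

Every term counted here concerns rRNA or snoRNA binding, pre-rRNA processing, or a nucleolar location, which is the pattern that the guilt-by-association argument relies upon.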
Given our two earlier interpretations of the expression data and the known functions of the 5 rRNA-processing genes, it is reasonable to propose that YIL103W has a role in rRNA transcript binding and assembly in yeast cells. Although it is difficult to make definitive statements, the probability of the second interpretation being true becomes small when one considers that YIL103W clustered with five known rRNA-processing genes over the course of multiple experimental manipulations: two heat shock trials, osmotic stress, amino acid starvation, and nitrogen depletion. The inference drawn here stands in contrast to the previous proposal made on the basis of computational algorithms. At that time, I proposed that YIL103W was a diphthamide related protein. Basic research techniques are now required to discriminate between the two hypotheses offered in my two discussions.