This web page was produced as an assignment for an undergraduate course at Davidson College.
I found this article particularly interesting given my interest in RNAi, asRNA, and the like. The paper successfully elucidated several mysteries regarding the small but complex genome of Bacillus subtilis. Since I was unfamiliar with this article's methods, I am unsure of how well controlled the experiments were. However, I was frustrated by some of the images and their labeling. For example, I think Figure 2B was unnecessarily cluttered and could easily have been separated into a B and a C. This was particularly noticeable for this image since the scales were unlabeled and tiny. I looked up what a bipartite graph typically is, and I did not see the connection between the example images I found and the images in the paper. I believe the figure was trying to show sigma binding sites and their conservation across several different promoters, with the x-axis being the length of the promoter, but I can only speculate without labels. Overall, however, I found this paper very effective at painting the B. subtilis genome in broad strokes, and I would love to see more research in this area that zooms in on one asRNA site and how it works in several conditions. The scope of this paper is also particularly impressive, with over a hundred conditions that the bacteria were exposed to and unsupervised algorithms that combed through a large amount of data.
The paper starts by introducing us to the star organism, Bacillus subtilis, a Gram-positive bacterium found in the soil. The researchers inform us that they seek to classify and characterize this bacterium's transcriptome, which contains small RNAs as well as antisense RNAs (asRNAs). They expose a prototrophic (same metabolic capabilities as wt, as opposed to auxotrophic) strain of the bacteria to as many conditions as they possibly can (104) to see how the conditions affect the transcriptome. These conditions include different nutrients, aerobic conditions, anaerobic conditions, fermentation, spore formation, biofilm formation, motility, and several stressors. From bacteria in these conditions, the researchers acquired 269 RNA samples, which they hybridized to a microarray with a 22-base resolution. They found that only 4% of the previously annotated CDSs (coding sequences) were undetected in any condition. They also found that 85% of the CDSs were highly expressed (among the top 30% most expressed in a given condition) in one or more conditions, suggesting that many genes have a condition in which they are highly needed. Finally, roughly 3% of all the genes were highly expressed in every condition. The researchers concluded that these were genes involved in essential processes.
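The "highly expressed" bookkeeping described above can be sketched in a few lines of Python. This is only a minimal illustration, not the paper's pipeline: it assumes expression values are stored as a dictionary mapping each condition to per-gene values, and the gene names, condition names, numbers, and function names are all made up for the example.

```python
import math

def highly_expressed(expr_by_condition, top_fraction=0.30):
    """For each condition, return the set of genes in the top fraction by expression."""
    result = {}
    for condition, gene_values in expr_by_condition.items():
        # Rank genes from most to least expressed within this condition.
        ranked = sorted(gene_values, key=gene_values.get, reverse=True)
        n_top = math.ceil(top_fraction * len(ranked))
        result[condition] = set(ranked[:n_top])
    return result

def summarize(expr_by_condition, top_fraction=0.30):
    """Return genes highly expressed in at least one condition and in every condition."""
    top_sets = list(highly_expressed(expr_by_condition, top_fraction).values())
    in_any = set.union(*top_sets)
    in_all = set.intersection(*top_sets)
    return in_any, in_all

# Hypothetical toy data: three conditions, four genes.
toy = {
    "glucose": {"geneA": 9.0, "geneB": 2.0, "geneC": 1.0, "geneD": 0.5},
    "sporulation": {"geneA": 8.0, "geneB": 1.0, "geneC": 7.5, "geneD": 0.2},
    "biofilm": {"geneA": 7.0, "geneB": 6.5, "geneC": 1.0, "geneD": 0.1},
}
# A larger cutoff (0.5) is used here only because the toy set is so small.
in_any, in_all = summarize(toy, top_fraction=0.5)
print(sorted(in_any), sorted(in_all))
```

In this toy set, geneA plays the role of a gene that is highly expressed in every condition, while geneD is never highly expressed.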
The researchers then marked places in the genome that had large changes in mRNA abundance, which they labeled as either upshifts or downshifts. They claim to have used high-confidence mRNA 5' and 3' ends to find these, but I was unsure of some of the methodology. To say that the supplemental material explaining some of their methods is dense would be a gross understatement. By looking at sequence information, the researchers determined that most of the upshifts were "genuine promoters." Interestingly, almost half (46%) of all CDSs can be transcribed from more than one promoter. This lab added a lot to the knowledge of B. subtilis, since their method increased the number of known CDSs by 11%.
The researchers also performed some experiments involving a Rho knockout B. subtilis. These data are shown in Figure 1. They concluded that Rho inhibits asRNA transcription, since expression levels in the Rho knockouts did not diminish as the wt signals did under the same conditions. They also claim that many asRNAs are created via readthrough from another promoter that either has no corresponding transcription terminator or only a partially effective one. Almost 13% of the CDSs overlapped potential asRNA sites, which suggests that many genes can be post-transcriptionally/pre-translationally regulated, although I doubt this is happening as often as it could.
Lastly, the researchers characterized many of the upshifts/promoters in the B. subtilis genome. These data can be seen in Figure 2. They created a hierarchical tree for the promoters and grouped them based on sigma factor binding sites. They did all this independent of DNA sequence information using unsupervised algorithms. They found that roughly two thirds of the variance in promoter activity can be attributed to sigma factor motifs. SigA clusters contributed the least variance and so must rely on other transcription factors (besides sigma). Interestingly, non-SigA promoters initiate almost half of all detected asRNAs. However, asRNAs, as expected, are not nearly as abundant as protein-coding RNAs. Roughly 80% of all asRNAs are the result of initiation by alternative sigma factors (not SigA) or imperfect termination. The researchers conclude that many asRNAs could be the result of "spurious" transcription events caused by non-SigA promoters.
A) This figure shows portions of the B. subtilis genome as annotated by GenBank. The next image down shows transcription profiles of each strand as determined by the tiled microarray data. The image below that shows transcriptional units such as promoters and terminators on both strands and directionality. Lastly, they show a new annotation based on transcription profiles, including antisense RNA that silences the complementary gene’s transcript.
B) This image shows three distinct genomic regions to demonstrate how initiation and termination affect transcription levels on both strands. It is helpful when looking at these images to remember that the expression level graphs are oriented in the same direction for both strands (up is more transcription) even while the RNA features are flipped for each strand.
C) This image shows that Rho-null mutants cannot terminate transcription properly. The mutants maintain high transcriptional activity even when activity would fade in wt.
D) This is a principal component analysis on three axes, which together account for roughly 60% of the variance in the population of bacteria. The paper will use this image later to determine the "shortest tour" of conditions. Interestingly, this lab decides to label the components these bacteria are being graphed on. Ordinarily, the components themselves are not important to principal component analysis, since you are just looking for the components that explain the most variation in order to separate the dots as much as possible.
This table groups and classifies RNA features. The left and right columns separate the data into two arbitrary length categories (50-150 and >150). "n" is the number of newly discovered features in that category, "Pred. CDS" is the number of coding sequences predicted by algorithm in each category, and "Genes" is a sum of the number of asRNAs and CDSs for that feature type. The characterized features include antisense RNAs, which are classified as having 100 or more bp of overlap with a CDS, or 50% overlap if the CDS is smaller than 100 bp. The features are divided by element type: part of the 5’ end of a previously annotated mRNA, part of a 3’ UTR, after the 3’ end with no terminator, after the 3’ end with a partial terminator, independent of other annotated mRNAs with a terminator, independent with no terminator, in between two other genes with their own promoters, and part of a polycistronic mRNA (one mRNA that can encode multiple polypeptides) from one promoter.
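The antisense overlap rule summarized above can be written as a small Python check. This is a sketch of the rule as I have paraphrased it, not code from the paper. It assumes 1-based inclusive coordinates, reads "50% overlap" as "at least 50% of the CDS length", and uses made-up dictionary keys and coordinates.

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Number of positions shared by two inclusive intervals (0 if they do not overlap)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)

def is_antisense(feature, cds, min_overlap_bp=100, min_fraction=0.5):
    """Apply the overlap rule: >= 100 bp, or >= 50% of the CDS if the CDS is under 100 bp."""
    # An antisense RNA must lie on the strand opposite to the CDS.
    if feature["strand"] == cds["strand"]:
        return False
    overlap = overlap_length(feature["start"], feature["end"], cds["start"], cds["end"])
    cds_length = cds["end"] - cds["start"] + 1
    if cds_length < min_overlap_bp:
        # Short CDS: require at least half of it to be covered (assumed reading).
        return overlap >= min_fraction * cds_length
    return overlap >= min_overlap_bp

# Hypothetical coordinates for illustration only.
long_cds = {"start": 1000, "end": 1899, "strand": "+"}
short_cds = {"start": 10, "end": 89, "strand": "+"}

big_overlap = is_antisense({"start": 1700, "end": 2100, "strand": "-"}, long_cds)
small_overlap = is_antisense({"start": 1850, "end": 2100, "strand": "-"}, long_cds)
half_of_short = is_antisense({"start": 50, "end": 200, "strand": "-"}, short_cds)
same_strand = is_antisense({"start": 1700, "end": 2100, "strand": "+"}, long_cds)
print(big_overlap, small_overlap, half_of_short, same_strand)
```

The first feature overlaps the long CDS by 200 bp and qualifies, the second overlaps by only 50 bp and does not, the third covers exactly half of the 80 bp CDS and qualifies, and the fourth is on the same strand and so cannot be antisense.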
A) This is an image of hierarchical clustering of promoters with the correlation coefficient on the x-axis. Hierarchical clustering first joins the two most similar items into a cluster. It then repeatedly joins the next most similar pair, whether that pair is two single items, an item and an existing cluster, or two clusters, placing each join on the tree according to its correlation. This continues until everything has been joined into a single tree.
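To make the idea concrete, here is a minimal sketch of agglomerative hierarchical clustering on correlation, written in plain Python. It is not the paper's algorithm: average linkage (the mean pairwise Pearson correlation between two clusters) is an assumption on my part, and the promoter names and activity profiles are invented.

```python
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def cluster(profiles):
    """Repeatedly merge the two most correlated clusters (average linkage).

    Returns the merges in order as (members_a, members_b, similarity)."""
    def similarity(pair):
        a, b = pair
        # Average correlation over every cross-cluster pair of members.
        return mean(pearson(profiles[i], profiles[j]) for i in a for j in b)

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = max(combinations(clusters, 2), key=similarity)
        a, b = best
        merges.append((a, b, similarity(best)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical promoter activity across four conditions.
profiles = {
    "P1": [1, 2, 3, 4],
    "P2": [2, 4, 6, 8.5],
    "P3": [4, 3, 2, 1],
    "P4": [8, 6, 4.5, 2],
}
merges = cluster(profiles)
for a, b, r in merges:
    print(a, b, round(r, 3))
```

In this toy example P1 and P2 rise together and are merged first, P3 and P4 fall together and are merged second, and the final merge joins the two groups at a negative correlation, which is the kind of distance a tree like the one in the figure would show between unrelated promoter groups.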
B) This image shows the promoters segregated into color-coded clusters or groups based on the sigma binding site consensus motifs shown to the right. The sigma factor is a major component of transcription initiation, so it makes sense to cluster promoters based on which sigma factors bind and where. The sigma motifs are very small and hard to distinguish with no labels.
C) This image shows the activity of each promoter cluster in each supercluster with the x-axis being the shortest tour through conditions. Notice the similarity of most of the patterns with maybe the exception of D/W and parts of H/L. This tells us that promoters with the same sigma factor sites behave similarly.
D) This is a graphical representation of how much each promoter cluster contributes to the variance in promoters. The x-axis is a statistical measurement of variance explained versus variance unexplained for each cluster. Notice that the third supercluster down in C accounts for the largest chunk of variance.
A) This is a graph showing the number of highly expressed asRNAs in each condition along the shortest tour of conditions. Notice that most of them must result from failure of transcription termination, since not many are from the independent and independent-with-no-terminator categories.
B) This heat map shows that many asRNAs have constant expression levels across all conditions. However, some small sections of color change in some rows show that asRNA quantity may be important to survival in that condition.
Davidson College Biology Department