This web page was produced as an assignment for an undergraduate course at Davidson College.

My Favorite Yeast Genes:

Topoisomerase 1 and YOL008W

This web page follows two Saccharomyces cerevisiae genes, TOP1 and YOL008W. The two genes are located close to each other on chromosome XV. TOP1 is an annotated gene with a known gene product and function, so I will use information from web resources to elucidate the role of TOP1. YOL008W is a hypothetical open reading frame (ORF), or a non-annotated gene. Very little is known about YOL008W other than its sequence, so I will use web tools in an attempt to characterize this unknown gene and its cellular role in yeast.

Annotated Yeast Gene: TOP1/YOL006C

The gene TOP1, or YOL006C, in Saccharomyces cerevisiae encodes topoisomerase 1. Topoisomerases are enzymes that change the topology of DNA by transiently cleaving the DNA backbone and then rejoining it. The activity of topoisomerases plays a role in replication, transcription, recombination, and chromosomal condensation. In S. cerevisiae, Top1 is a type IB topoisomerase, meaning that it cleaves only one strand of the DNA duplex rather than both strands (the type I part) and that it relaxes both positively and negatively supercoiled DNA (the type B part). Type II topoisomerases cleave both strands of the duplex, while type IA topoisomerases relax only negatively supercoiled DNA. In order to relax supercoiled DNA, the Top1 protein forms a DNA-enzyme complex and cleaves one strand so that the enzyme is covalently linked to the 3’ end of the cleaved strand (SGD, 2003; http://db.yeastgenome.org/cgi-bin/SGD/locus.pl?locus=Top1). See Figure 1 for an example of topoisomerase activity.

Figure 1. Image of E. coli Topoisomerase I. Top1 functions to break a single strand of DNA and re-attach it in order to relieve supercoiling. Image from <http://www.bio.miami.edu/dana/250/25002_4.html>.

In Gene Ontology terminology, the molecular function of TOP1 is “DNA topoisomerase type I activity,” which is the activity described above. The biological processes for TOP1 include “regulation of mitotic recombination, DNA topological change, DNA strand elongation, chromatin assembly, disassembly, regulation of transcription from Pol II promoter, RNA elongation from Pol II promoter, mitotic chromosome condensation, and nuclear migration.” It seems clear that TOP1 plays many important roles in the life of a yeast cell. The cellular component for TOP1 is listed as the nucleus; the Top1 protein must be located within the nucleus in order to perform all of the roles mentioned previously (SGD, http://db.yeastgenome.org/cgi-bin/SGD/GO/goAnnotation.pl?locus=TOP1).

TOP1 is located on chromosome XV of Saccharomyces cerevisiae from coordinates 315387 to 313078 on the Crick strand. See Figure 2 for the chromosomal location (Note: Figure 2 also shows the location of YOL008W; see below for information on this ORF). TOP1 encodes a 769 amino acid protein that has a molecular weight of 89,995 Da. The protein is a monomer that does not have any transmembrane domains (SGD, 2003; http://db.yeastgenome.org/cgi-bin/SGD/protein/protein?sgdid=S0005366). See Figure 3 for the structure of the N-terminal domain of Top1.
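As a quick consistency check on the numbers above, the ORF length can be computed from the chromosomal coordinates and compared with the reported protein length. The following is a minimal Python sketch, not part of the SGD tools; the function names are hypothetical, and it assumes that the coordinates are inclusive, that the ORF contains no introns, and that the final codon is a stop codon.

```python
def orf_length(start: int, end: int) -> int:
    """Length in nucleotides of an ORF given inclusive coordinates.

    Crick-strand genes are listed from the higher coordinate to the
    lower one, so the absolute difference is used.
    """
    return abs(end - start) + 1


def predicted_protein_length(start: int, end: int) -> int:
    """Number of amino acids, assuming no introns and one terminal stop codon."""
    n = orf_length(start, end)
    if n % 3 != 0:
        raise ValueError(f"ORF length {n} is not a multiple of 3")
    return n // 3 - 1  # subtract the stop codon


# TOP1 coordinates as given in the text (Crick strand)
print(orf_length(315387, 313078))                # 2310 nucleotides
print(predicted_protein_length(315387, 313078))  # 769 amino acids
```

Under these assumptions the result agrees with the 769 amino acid protein reported by SGD.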

 

Figure 2. Chromosomal location of TOP1 and YOL008W. The blue band labeled TOP1 depicts the ORF for TOP1. It is located on the Crick strand. YOL008W (colored in red, above and to the left of TOP1) is on the Watson strand. See below for more information on YOL008W. (SGD, http://db.yeastgenome.org/cgi-bin/SGD/ORFMAP/ORFmap?sgdid=S0005366).

 

Figure 3. Structure of the N-terminal fragment of Yeast Topoisomerase 1. Notice the alpha helices in this region of the protein (PDB, Entry 1OIS, http://www.rcsb.org/pdb/cgi/explore.cgi?pid=198301065553458&page=0&pdbId=1OIS).

To view a chime image from PDB for Top1, click here.

To view the protein sequence for Top1, click here.

To view the nucleotide sequence for TOP1, click here.

Even though TOP1 plays a role in many important biological functions, yeast are viable without a functional copy of TOP1, so TOP1 is not critical for yeast survival. Systematic deletions and null mutations in TOP1 have resulted in viable yeast cells. One deletion in TOP1 has been linked to a mild phenotype change; yeast carrying the deletion show sensitivity when grown on medium at pH 8.0 after five generations (SGD, http://db.yeastgenome.org/cgi-bin/SGD/phenotype/phenotype.pl?feat=TOP1&type=locus). Also, mutations at 4710 different loci have all resulted in viable yeast (SGD, http://db.yeastgenome.org/cgi-bin/SGD/phenotype/phenotype.pl?phenotype=829). Since yeast are still viable with mutations in TOP1, it is most likely that TOP2 can substitute for TOP1 in yeast during replication and transcription (OMIM, http://www3.ncbi.nlm.nih.gov/htbin-post/Omim/dispmim?126420).

Topoisomerases are highly conserved; the yeast Top1 protein has been found to have 57% sequence similarity to human Top1 (SGD, http://db.yeastgenome.org/cgi-bin/SGD/locus.pl?locus=Top1). The map locus for human Top1 is 20q12-q13.1. Several mutations in human Top1 have been linked to camptothecin (CPT) resistance. CPT is a plant alkaloid that has been shown to have antitumor activity (OMIM, 2003; http://www3.ncbi.nlm.nih.gov/htbin-post/Omim/dispmim?126420#ALLELIC%20VARIANTS). I performed a Blastp search to view the sequence similarity of Top1 with other proteins. I found many hits with very low E-values (over 40 hits with E-values ranging from e-172 to 1e-09) in many different organisms. Two of the top hits were for Top1 in Candida albicans and Emericella nidulans. Figure 4 shows the homology for the Top1 protein.

Figure 4. Blastp results for yeast Top1 protein. There is a high degree of similarity between yeast Top1 and other Top1 proteins; Top1 proteins are highly conserved. Enter accession number NP_014637 into Blastp to view sequence similarities.

For more information about the structure of homologs to Top1, click here. This site contains the identities, similarities, sequence alignment, and PDB structure files for the homologs.

For links to scientific publications on TOP1, click here. Links to publications about TOP1 structure, function, and activity are included.

For a link to OMIM on human Top1, click here.

______________________________________________________________________________

Non-Annotated Yeast Gene: YOL008W

YOL008W is a hypothetical ORF that is located in close proximity to TOP1. See Figure 2 for the location of YOL008W. It is important to note that while it is very close to TOP1, it is located on the opposite strand. YOL008W is on S. cerevisiae chromosome XV and has coordinates of 310312 to 310935. The molecular function, biological process, and cellular component are all unknown for this gene (SGD, 2003; http://db.yeastgenome.org/cgi-bin/SGD/locus.pl?locus=YOL008W). A large-scale deletion study by Giaever et al. (2002) that included YOL008W resulted in viable yeast, so no phenotypic effects of the gene were observed. Another large-scale deletion study conducted by Steinmetz et al. (2003) resulted in a growth defect for the yeast on a carbon source (SGD, 2003; http://db.yeastgenome.org/cgi-bin/SGD/phenotype/phenotype.pl?feat=YOL008W&type=feature). Very little is known about this gene, so I will attempt to characterize YOL008W from its sequence using genomic databases.

The nucleotide and amino acid sequences (respectively) for the unknown ORF are shown below (NCBI, 2003; http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/getff?gi=43&form=3&db=g&from=310312&to=310935&pid=6324566). The protein encoded by this gene is predicted to contain 207 amino acids and have a molecular weight of 23,765 Da (SGD, 2003; http://db.yeastgenome.org/cgi-bin/SGD/protein/protein?sgdid=S0005368). To view more information on the protein, click here.

YOL008W Nucleotide Sequence:
> gi|27808715:310312-310935
ATGGTTTTGATAATAAGGCCCTCACAGACATTGATATTATTCAGGAAAGCGATGCTCAAGCCAATTGGGA
GATATCCTCTTAAAAGAAATTTTTTTGGTTTGAGCGGTACCAATCACACTATTAGGGAACAGCGATATGT
TTTGCGCAAGGCCATAAACGCCCCTCCAAGCACAGTCTACGCTGCAGTGTCAGAAGTTGCCCAATATAAG
GAATTTATTCCTTATTGTGTTGATTCGTTTGTAGATAAACGAAATCCTGTGGATAACAAGCCTCTCATTG
CGGGGCTTCGAGTTGGTTTCAAACAATACGATGAGGAATTTATATGCAATGTTACCTGTAAAGATACTGA
TCATACGTATACCGTTGTTGCAGAAACAATATCTCATAATTTGTTTCACCTTTTGATTTCGAAATGGACC
ATAATGCCTCACCCAAATAGACCAAATGCGGCCATGGTAGAACTTCTATTAAGATTTAAATTCAAATCTC
GGATATATAACAGTGTCTCTCTAATATTTGCGAAAACTGTGACTGAATTGGTGATGAACGCATTTGCCAA
AAGAGCATACCATTTAGTAAGATTAGCAATGCTAAAACCTTCTTCTAAAGAAGGCTCTCCGTGA

YOL008W Amino Acid Sequence:
> gi|6324566|ref|NP_014635.1| Hypothetical ORF; Yol008wp [Saccharomyces cerevisiae]
MVLIIRPSQTLILFRKAMLKPIGRYPLKRNFFGLSGTNHTIREQRYVLRKAINAPPSTVYAAVSEVAQYK
EFIPYCVDSFVDKRNPVDNKPLIAGLRVGFKQYDEEFICNVTCKDTDHTYTVVAETISHNLFHLLISKWT
IMPHPNRPNAAMVELLLRFKFKSRIYNSVSLIFAKTVTELVMNAFAKRAYHLVRLAMLKPSSKEGSP
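
The amino acid sequence can be reproduced from the nucleotide sequence with the standard genetic code. The sketch below is only an illustration: the function name is hypothetical, the codon table is built inline on the assumption that the standard genetic code applies, and only the first 30 nucleotides of the ORF are translated as a demonstration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


# First 30 nucleotides of the YOL008W sequence shown above
print(translate("ATGGTTTTGATAATAAGGCCCTCACAGACA"))  # MVLIIRPSQT
```

The output matches the first ten residues of the amino acid sequence listed above; passing the full 624-nucleotide sequence to the same function should allow the whole protein to be compared.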

I used Blastn to search for similarity with other nucleotide sequences. I did find some hits, but none of the alignments seemed significant; all matches were between 40 and 50 nucleotides. The E-values for the hits were relatively high (0.29 to 4.6), so I was not convinced that any of the hits showed significant similarity. Enter accession number Z74750 into Blastn to view the similarities for the nucleotide sequence.

A Blastp search with the amino acid sequence (accession number NP_014635.1) for the ORF indicates that the protein shares a lot of similarity with many other proteins (there are many hits with low E-values), but the problem is that most of the hits are for unknown sequences as well. Hits appear for many different organisms including Arabidopsis, Homo sapiens, Drosophila, and Mus musculus. See Figure 5 for the Blastp results. The fourth hit is for an oligoketide cyclase/lipid transport protein in Magnetospirillum magnetotacticum (a bacterium).

Figure 5. Blastp search for YOL008W (accession number NP_014635.1). Many hits appear with low E-values, but many of the results are hypothetical proteins.

From the Blastp search, I was able to find a link to conserved domains for the protein. YOL008W shows sequence similarity to oligoketide cyclase/lipid transport proteins (COG2867) and to the Aromatic-Rich Protein Family (ARPF). The conserved domain with COG2867 is 146 residues long with 100% alignment. This seems to be a significant alignment, so it is possible that YOL008W is involved in cyclase activity or lipid transport. The conserved domain for ARPF is 143 residues long with 88.1% alignment. This also seems to be a significant similarity, so perhaps the protein has a role in polyketide synthesis. See Figure 6 for an image of the conserved domains. To learn more about sequence similarity for YOL008W click here.

Figure 6. Conserved domains with similarity to YOL008W. Oligoketide cyclase/lipid transport proteins (COG2867) and Aromatic-Rich Protein Family (ARPF) have significant similarity with YOL008W. Perhaps YOL008W has functions similar to members of these families (NCBI Conserved Domain Search, 2003; http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?RID=1065574421-6221-180796.BLASTQ3).

The Kyte-Doolittle Hydropathy plot can be used to predict the transmembrane domains in a protein. No portions of the amino acid sequence had a hydropathy score of over 1.8, so it does not seem likely that the protein is an integral membrane protein. It seems that the protein functions in the cytoplasm or the nucleus, rather than being a transport or receptor protein in the membrane. See Figure 7 for the results from the Kyte-Doolittle Hydropathy Plot.

Figure 7. YOL008W does not have any domains with a hydropathy greater than 1.8. The window size for the Kyte-Doolittle Plot was set to 9. The protein does not appear to contain a transmembrane domain (Kyte-Doolittle Hydropathy Plot; http://occawlonline.pearsoned.com/bookbind/pubbooks/bc_mcampbell_genomics_1/medialib/activities/kd/kyte-doolittle.htm).
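
The sliding-window calculation behind a Kyte-Doolittle plot can be sketched in a few lines of Python. This is not the code used by the web tool cited above; the function name is hypothetical, the values are the published Kyte-Doolittle hydropathy scale, and the window size (9) and cutoff (1.8) are simply the settings described in the text. Only the first 20 residues of YOL008W are used here as a demonstration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

THRESHOLD = 1.8  # cutoff used in the text


def hydropathy_profile(seq: str, window: int = 9) -> list:
    """Mean Kyte-Doolittle hydropathy for every window along the sequence."""
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window")
    values = [KD_SCALE[aa] for aa in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


fragment = "MVLIIRPSQTLILFRKAMLK"  # first 20 residues of YOL008W
profile = hydropathy_profile(fragment, window=9)
print(len(profile))  # 12 windows for a 20-residue fragment
print(any(score > THRESHOLD for score in profile))  # True if any window exceeds the cutoff
```

Running the same function on the full 207-residue sequence gives one score per window, which can then be compared against the cutoff in the same way.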

By using PREDATOR, I was able to predict the secondary structure for YOL008W. It seems that there are very few beta sheets in the structure of the protein; most of its structure is composed of alpha helices and random coils. See Figure 8 for the structure prediction from PREDATOR. I also searched for a chime file in PDB, but found no results.

Figure 8. YOL008W is predicted to contain 41.55% alpha helices and 49.76% random coils (PREDATOR, http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_preda.html).
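
The percentages in a secondary structure summary are just per-state residue counts divided by the protein length. The Python sketch below shows that arithmetic. It rests on several assumptions: the function name is hypothetical, the one-letter states (H for helix, E for extended strand, C for coil) are a common convention rather than confirmed PREDATOR output, and the example string is synthetic. Its counts (86 helix, 103 coil) are back-calculated from the reported percentages and the 207-residue length, with the remaining 18 residues assumed to be strand.

```python
from collections import Counter


def structure_composition(prediction: str) -> dict:
    """Percentage of residues in each predicted state, rounded to 2 decimals."""
    if not prediction:
        raise ValueError("prediction string is empty")
    counts = Counter(prediction.upper())
    total = len(prediction)
    return {state: round(100 * n / total, 2) for state, n in counts.items()}


# Synthetic 207-residue prediction (assumed counts, not actual PREDATOR output)
example = "H" * 86 + "E" * 18 + "C" * 103
print(structure_composition(example))  # {'H': 41.55, 'E': 8.7, 'C': 49.76}
```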

By using web tools to extract data from the sequence information for YOL008W, I was able to learn a little more about the gene, although I still cannot determine with strong conviction what function the protein has. The protein for YOL008W is composed mainly of alpha helices and random coils and is most likely a protein that functions in the cytoplasm or nucleus, rather than in the membrane. Because deletions in the gene resulted in viable yeast, YOL008W does not appear to be critical for yeast survival. The protein has high sequence similarity with the conserved domains for oligoketide cyclase/lipid transport proteins and the Aromatic-Rich Protein Family. From the conserved domain database, I looked for more information on each of these families. The ARPF family is mainly composed of proteins with unknown functions, although one hypothesis is that they function in polyketide synthesis. Click here for a brief description of the ARPF family. I searched Gene Ontology to find out more about polyketide synthase. I was unable to find much information, but I think that it may function in the cytosol. This is consistent with YOL008W not being an integral membrane protein. Oligoketide cyclase/lipid transport proteins are involved in lipid metabolism. From this information, the best prediction that I can make regarding the role of YOL008W is that it functions in the cytoplasm and has a role in lipid metabolism and/or polyketide synthesis. Although I was unable to determine the exact function of YOL008W, the online genomic tools proved very useful for finding out more about an unknown protein when the only known information is the sequence.

Works Cited

In-Born Errors of Metabolism (Image of topoisomerase). <http://www.bio.miami.edu/dana/250/25002_4.html>. Accessed 2003 October 8.

NCBI. 2003. Blast. <http://www.ncbi.nlm.nih.gov/BLAST/>. Accessed 2003 October 7.

[NPSA] Network Protein Sequence Analysis. 2003. Predator Secondary Structure Prediction Method. <http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_preda.html>. Accessed 2003 October 7.

[SGD] Saccharomyces Genome Database. 2003. Gene Ontology: Annotations. <http://db.yeastgenome.org/cgi-bin/SGD/GO/goAnnotation.pl?locus=TOP1>. Accessed 2003 October 6.

[SGD] Saccharomyces Genome Database. 2003. TOP1/YOL006C. <http://db.yeastgenome.org/cgi-bin/SGD/locus.pl?locus=Top1>. Accessed 2003 October 6.

 

 


Please send questions and comments to Sarah Baxter at sabaxter@davidson.edu. Thanks!