My investigation of isocitrate dehydrogenase (IDH) and caspase 3 used several publicly available software packages (BLAST, ScanPROSITE, PHI-BLAST, PSI-BLAST, CLUSTALW, HMMER and Chime) and one program that I wrote myself, Divide-and-BLAST. Most of these tools take the amino acid sequences of the proteins under investigation as input. I obtained these sequences from the protein database at the National Center for Biotechnology Information (NCBI) using the Entrez search tool. To help future researchers apply these tools to other proteins, I have compiled a set of directions for using them to examine protein evolution.
The input sequences for IDH1 (cytosolic, NADP+-dependent IDH) came from the following organisms: Homo sapiens (human, gi|6647551), Mus musculus (mouse, gi|6647554), Saccharomyces cerevisiae (yeast, gi|1708403), Arabidopsis thaliana (gi|4585978) and Escherichia coli (gi|124171). For the caspase 3 study, I used the protein sequences from Xenopus laevis (African clawed frog, gi|2493528), Gallus gallus (chicken, gi|3450875), Rattus norvegicus (Norway rat, gi|1004371), Homo sapiens (gi|4757912) and Mus musculus (gi|4757912).
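Sequences like these can also be retrieved programmatically through NCBI's E-utilities efetch service. The sketch below only builds the request URL and parses the FASTA text that comes back. It assumes that efetch still resolves these GI numbers (NCBI stopped displaying GIs by default in 2016), and the names `efetch_url`, `parse_fasta` and `IDH1_GIS` are illustrative, not part of any NCBI library. NCBI also asks clients to identify themselves with `tool` and `email` parameters, which are omitted here.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint; whether each GI still resolves is not checked.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# GI numbers from the text; the dictionary keys are labels chosen for this sketch.
IDH1_GIS = {
    "human": "6647551",
    "mouse": "6647554",
    "yeast": "1708403",
    "arabidopsis": "4585978",
    "ecoli": "124171",
}


def efetch_url(ids):
    """Build an efetch URL requesting the given protein records as FASTA text."""
    params = {"db": "protein", "id": ",".join(ids), "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


print(efetch_url(IDH1_GIS.values()))
```

The printed URL could be opened with `urllib.request.urlopen`, and the decoded response passed to `parse_fasta`; the network call is left out so the sketch runs offline.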
I first ran BLAST with the IDH1 and caspase 3 sequences as queries to get an initial picture of homologous proteins in other organisms and of related protein families.
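When triaging a list of BLAST hits, tabular output is easier to work with than the formatted report. The sketch below assumes results saved in BLAST+ tabular format (`-outfmt 6`, whose twelve default columns are the query and subject identifiers, percent identity, alignment length, mismatches, gap openings, query start and end, subject start and end, E-value and bit score). That format postdates the web BLAST output used in this study, and the function names and the 1e-5 E-value cutoff are illustrative choices, not recommendations from the original work.

```python
# Labels for the twelve default columns of BLAST+ "-outfmt 6" output.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_tabular(text):
    """Parse tab-separated BLAST hits into dictionaries keyed by FIELDS."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):  # skip blanks and comment lines
            continue
        hit = dict(zip(FIELDS, line.split("\t")))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits


def best_hits(hits, max_evalue=1e-5):
    """Keep each subject's best-scoring hit under the E-value cutoff, best first."""
    best = {}
    for hit in hits:
        if hit["evalue"] > max_evalue:
            continue
        current = best.get(hit["sseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["sseqid"]] = hit
    return sorted(best.values(), key=lambda h: h["bitscore"], reverse=True)
```

Keeping only the best hit per subject avoids counting one protein several times when BLAST reports multiple local alignments against it.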
I ran ScanPROSITE with the IDH1 and caspase 3 sequences as input to find conserved patterns. ScanPROSITE reports each pattern it finds as a PROSITE "signature", and I used these signatures to initiate PHI-BLAST searches, as explained below.
PHI-BLAST takes two inputs: a query sequence and a pattern, written in PROSITE format, that occurs in that query. Because ScanPROSITE reports patterns that match the query in exactly this format, I used it to obtain the patterns that initiated the PHI-BLAST searches for the proteins I was investigating.
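PROSITE patterns use a compact syntax. Elements are separated by hyphens, `x` stands for any residue, `[..]` lists allowed residues, `{..}` lists forbidden ones, `(n)` or `(n,m)` gives a repeat count, and `<` or `>` anchors the pattern to the N- or C-terminus. The hypothetical helpers below translate that syntax into a Python regular expression so a pattern can be checked against a query before it is used for PHI-BLAST. This is a simplified sketch that does not handle every PROSITE construct (for example, `>` inside brackets), and the test patterns and sequence are illustrative.

```python
import re

# One PROSITE element: x, a single residue, [allowed] or {forbidden},
# optionally followed by a repeat count (n) or range (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into an equivalent Python regular expression."""
    pattern = pattern.strip().rstrip(".")  # PROSITE entries end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            core = "."
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"
        else:
            core = residue  # single residue or [..] class, already valid regex
        if low is not None:
            core += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def scan(sequence, pattern):
    """Return non-overlapping matches as (start, end, text), 1-based and inclusive."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end(), m.group()) for m in regex.finditer(sequence)]


print(prosite_to_regex("[AC]-x-V-x(4)-{ED}."))  # [AC].V.{4}[^ED]
print(scan("GGACAVLLLLAGG", "[AC]-x-V-x(4)-{ED}"))  # [(4, 11, 'CAVLLLLA')]
```

An empty result from `scan` on the query would mean the pattern cannot seed a PHI-BLAST search with that query.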
I used PSI-BLAST to find regions of my protein sequences that are shared with a particular protein family. After each iteration, PSI-BLAST builds a position-specific profile from the matches selected for inclusion and uses that profile for the next search. For example, with IDH1 I selected the matches to other dehydrogenases and ran a second PSI-BLAST iteration with them. This steers the similarity search in a chosen direction, but it requires care: if a spurious match (one that scores well by chance rather than through true homology) is over-represented in an iteration, the next iteration will return more sequences that are not truly related to the original query. A good rule of thumb is to select several known, closely homologous matches, so that any single false hit carries less weight in the next iteration.
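The reason for selecting several true homologs can be seen with simple arithmetic on a profile. The sketch below computes per-column residue frequencies for equal-length aligned segments; the function and all sequences are hypothetical. PSI-BLAST's real position-specific scoring matrix adds sequence weighting, pseudocounts and log-odds scores against background frequencies, so this only illustrates how one spurious sequence is diluted as more genuine homologs are included.

```python
from collections import Counter


def column_frequencies(segments):
    """Per-column residue frequencies for equal-length, ungapped aligned segments."""
    length = len(segments[0])
    if any(len(s) != length for s in segments):
        raise ValueError("all segments must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in segments)
        total = len(segments)
        profile.append({aa: n / total for aa, n in counts.items()})
    return profile


# Hypothetical query, genuine homologs and one chance (spurious) match.
QUERY = "GDGIG"
HOMOLOGS = ["GDGVG", "GDGIG", "GEGIG", "GDGIA"]
SPURIOUS = "KWPLE"

few = column_frequencies([QUERY, SPURIOUS])
many = column_frequencies([QUERY] + HOMOLOGS + [SPURIOUS])
print(few[0]["G"], many[0]["G"])  # 0.5 0.8333333333333334
```

With only the query and the spurious hit, the conserved G in the first column has frequency 0.5; adding four genuine homologs raises it to 5/6, while the spurious K falls to 1/6.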
CLUSTALW is a freely available multiple sequence alignment program (Thompson et al., 1994); the alignment file it produces can be used as input to HMMER. HMMER is a suite of programs based on profile hidden Markov models (HMMs) (Durbin et al., 1998). Its hmmbuild program builds a profile HMM from an alignment, and its hmmemit program can then generate a consensus sequence from that model.
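Before passing a CLUSTALW alignment to HMMER, it can be useful to check that every sequence made it into the alignment with the expected aligned length. The minimal reader below assumes the standard CLUSTAL text layout: a header line beginning with `CLUSTAL`, then blocks of name/sequence lines separated by blank lines, each block followed by an indented conservation line of `*`, `:` and `.` symbols. `parse_clustal` is a hypothetical helper, and the sample alignment is invented for illustration.

```python
def parse_clustal(text):
    """Return {name: gapped sequence} from CLUSTAL-format alignment text."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("missing CLUSTAL header line")
    sequences = {}
    for line in lines[1:]:
        # Skip blank separators and indented conservation lines (*, :, .).
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, chunk = fields[0], fields[1]  # an optional residue count is ignored
        sequences[name] = sequences.get(name, "") + chunk
    return sequences


SAMPLE = "\n".join([
    "CLUSTAL W (1.83) multiple sequence alignment",
    "",
    "human      MSKKIS-GG",
    "mouse      MSKKIQ-GG",
    "yeast      MSK-ISAGG",
    "           *** *  **",
    "",
    "human      VEMQ",
    "mouse      VEMQ",
    "yeast      IEMQ",
    "           :***",
])

alignment = parse_clustal(SAMPLE)
print({name: len(seq) for name, seq in alignment.items()})
# {'human': 13, 'mouse': 13, 'yeast': 13}
```

Unequal lengths in the printed dictionary would point to a truncated or mangled alignment file.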
First, I used CLUSTALW to align the input sequences (five for IDH1, five for caspase 3). Then I used the alignment to generate a consensus sequence with hmmemit; because hmmemit reads a profile HMM rather than an alignment, the alignment is first converted to an HMM with hmmbuild. I then analyzed the consensus sequence with Divide-and-BLAST (see below).
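hmmemit derives its consensus from a profile HMM, not directly from the alignment. For a quick, model-free point of comparison, a plain majority-rule consensus can be read straight off the alignment columns. The sketch below uses that simpler technique, not HMMER's algorithm: it drops columns in which more than half the sequences have a gap (loosely mirroring how HMM builders treat gap-rich columns as insertions) and takes the most common residue in each remaining column. The function name and the 0.5 threshold are assumptions.

```python
from collections import Counter


def majority_consensus(alignment, max_gap_fraction=0.5):
    """Column-by-column majority-rule consensus of a gapped alignment.

    Ties go to the residue seen first in the column.
    """
    seqs = list(alignment)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    consensus = []
    for i in range(length):
        column = [s[i] for s in seqs]
        gaps = column.count("-")
        # Skip all-gap columns and columns that are mostly gaps.
        if gaps == len(column) or gaps / len(column) > max_gap_fraction:
            continue
        residues = Counter(c for c in column if c != "-")
        consensus.append(residues.most_common(1)[0][0])
    return "".join(consensus)


print(majority_consensus(["MSKKIS-GGVEMQ", "MSKKIQ-GGVEMQ", "MSK-ISAGGIEMQ"]))
# MSKKISGGVEMQ
```

Comparing such a consensus with the hmmemit output can help spot columns where the model and the raw alignment disagree.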
For this analysis I ran Divide-and-BLAST (DAB) with its default parameters: sub-sequences 20 amino acids long, an overlap of 10 amino acids between consecutive sub-sequences, and both expect values at the default of 10. I chose the length and overlap values after testing a range of alternatives; the defaults of 20 and 10 amino acids, respectively, performed best.
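The splitting step described above can be sketched as a sliding window whose step is the sub-sequence length minus the overlap (20 minus 10, so 10 residues, with the defaults). This is a hypothetical reconstruction, not the author's Divide-and-BLAST code: the function name `divide`, the 1-based start positions and the extra window anchored at the C-terminus (so that the last residues are not skipped) are all assumptions. Each window would then be submitted as a separate BLAST query.

```python
def divide(sequence, length=20, overlap=10):
    """Split a sequence into overlapping windows, returned as (start, subsequence).

    Start positions are 1-based. If the regular windows stop short of the end,
    one extra window is anchored at the C-terminus (an assumption of this sketch).
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must be >= 0 and smaller than length")
    if len(sequence) <= length:
        return [(1, sequence)]
    step = length - overlap
    windows = [(i + 1, sequence[i:i + length])
               for i in range(0, len(sequence) - length + 1, step)]
    last_start = len(sequence) - length  # 0-based start of a C-terminal window
    if windows[-1][0] - 1 != last_start:
        windows.append((last_start + 1, sequence[last_start:]))
    return windows


seq = "ACDEFGHIKLMNPQRSTVWY" * 2 + "ACDEF"  # hypothetical 45-residue sequence
print([start for start, _ in divide(seq)])  # [1, 11, 21, 26]
```

Anchoring the final window at the C-terminus keeps every query the same length, at the cost of a larger overlap between the last two windows.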
I employed Chime in two ways for my investigations. First, I used it to look at the tertiary structures of IDH1 and caspase 3, and compare them with other proteins that showed up as matches in the output from the different similarity search programs. Second, with Chime scripting, I was able to present the results of my study of IDH1 and caspase 3 in an interactive manner on the World Wide Web.
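Chime's scripting language is derived from RasMol, so a highlighting script can be generated from a list of residue ranges, for example regions flagged by the similarity searches. The sketch below assumes that the RasMol commands `select`, `wireframe`, `ribbons` and `color` behave the same way in Chime; the function name, colours and residue ranges are hypothetical, and chain identifiers are ignored.

```python
def highlight_script(regions, base_color="grey", highlight_color="red"):
    """Build RasMol-style commands that show ribbons and colour chosen residue ranges."""
    commands = ["select all", "wireframe off", "ribbons", f"color {base_color}"]
    for start, end in regions:
        if start > end:
            raise ValueError(f"bad residue range: {start}-{end}")
        commands.append(f"select {start}-{end}")
        commands.append(f"color {highlight_color}")
    commands.append("select all")  # leave the whole molecule selected afterwards
    return "\n".join(commands)


# Hypothetical regions, e.g. windows that produced strong matches.
print(highlight_script([(10, 25), (90, 104)]))
```

The returned text could be saved as a script file or placed in a web page; how it is attached to a Chime display is not shown here.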