Divide-and-BLAST (DAB)

Figure 1. A diagrammatic representation of the Divide-and-BLAST process.

Introduction

Divide-and-BLAST is a Perl program written to help discover proteins that are weakly similar to a protein sequence of interest. The BLAST program at NCBI works very well for finding highly similar sequences. However, weak similarities are often listed below hundreds of high-similarity hits, and may not be shown at all if the number of hits reported is small or the Expect-value cutoff is too low (NCBI, 1999).

Divide-and-BLAST attempts to address this problem by filtering the high-similarity hits out of a sequence's hit list, leaving possibly significant weak-similarity hits for further investigation. The program divides its input sequence into a number of sub-sequences, whose length and overlap can be specified as parameters. It then submits both the full sequence and each sub-sequence to the BLAST server using the BLAST network client. After receiving the results, Divide-and-BLAST removes the hits for the full sequence from the list of hits for each sub-sequence (Fig. 1). The output is a file listing the unique hits for each sub-sequence. If a protein appears among the unique hits of more than one sub-sequence (and therefore not among the hits for the full sequence), it is very likely that there is a significant similarity between that protein and the input sequence. Even when no protein is shared between sub-sequences in this way, some of the higher-scoring unique hits may warrant further investigation by other methods, computational or experimental.
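The filtering step described above amounts to set operations on hit lists. The Python sketch below is an illustration, not DAB's actual Perl code: it assumes the BLAST reports have already been parsed into lists of hit identifiers, and the function names (unique_hits, shared_unique_hits), labels and identifiers are hypothetical. Submitting queries with the network client and parsing its output are not shown.

```python
from collections import defaultdict


def unique_hits(full_hits, sub_hits):
    """Drop every full-sequence hit from each sub-sequence's hit list.

    full_hits: iterable of hit identifiers for the full query sequence.
    sub_hits:  dict mapping a sub-sequence label to its iterable of hit identifiers.
    Returns a dict mapping each label to the set of hits unique to that sub-sequence.
    """
    full = set(full_hits)
    return {label: set(hits) - full for label, hits in sub_hits.items()}


def shared_unique_hits(unique):
    """Return the unique hits found by more than one sub-sequence,
    each mapped to the sorted labels of the sub-sequences that found it."""
    found_by = defaultdict(list)
    for label, hits in unique.items():
        for hit in hits:
            found_by[hit].append(label)
    return {hit: sorted(labels) for hit, labels in found_by.items() if len(labels) > 1}


# Hypothetical hit identifiers; real input would come from parsed BLAST reports.
full = ["P1", "P2", "P3"]
subs = {"1-20": ["P1", "Q9"], "11-30": ["P2", "Q9", "R7"], "21-40": ["P3"]}
unique = unique_hits(full, subs)
print(shared_unique_hits(unique))  # {'Q9': ['1-20', '11-30']}
```

Here Q9 is the kind of candidate the text singles out: absent from the full-sequence hits but found by two different sub-sequences.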

Installing and Running Divide-and-BLAST

Windows PC:

Download dabgui_w32.exe
The BLAST network client is also required; it lets you search the NCBI sequence databases remotely, without a local copy of the databases. Download netblastz.exe from the NCBI FTP server and run it to extract the client program, blastcl3.exe.
Place dabgui_w32.exe and blastcl3.exe in the same directory, then double-click dabgui_w32.exe to run it.

Unix:

Download source code: dab-gui.pl.txt
Remove the .txt extension, i.e. rename the file to dab-gui.pl.
Download the appropriate Unix version of the BLAST network client (netblast.*.tar.gz, where * is your Unix flavor) from the NCBI FTP server.
Run the program by typing 'perl dab-gui.pl' at the command line.

Mac:

An OS X executable is under development. The Unix instructions do not apply directly, because the Tk library is not included with Perl on Mac OS X. Tk can be installed by following the instructions at this page; DAB can then be run using the Unix instructions above.

Sample files:

A simple input file for testing DAB: idh1_human_short.txt

A sample output for Divide-and-BLAST can be seen here. It shows the results obtained when Divide-and-BLAST was used to analyze the human isocitrate dehydrogenase protein sequence, using sub-sequences 20 amino acids long with an overlap of 10 amino acids. Note the hits for isopropylmalate dehydrogenase: Divide-and-BLAST found an evolutionary relationship and localized it to a particular region of the sequence.
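To illustrate what a length of 20 and an overlap of 10 mean, the sketch below cuts a sequence into windows whose starts advance by length minus overlap (here 10 residues). It is a hypothetical Python illustration, not DAB's own code: the function name divide is made up, and letting the final window be shorter when the sequence does not divide evenly is an assumption, since how DAB treats the tail is not described here. The 45-residue input is a toy sequence, not the isocitrate dehydrogenase sequence.

```python
def divide(seq, length, overlap):
    """Split seq into windows of `length` residues, with consecutive windows
    sharing `overlap` residues. Returns (start, end, sub) tuples using 1-based,
    inclusive coordinates. Assumption: the last window is truncated at the end
    of the sequence rather than padded or shifted."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must be >= 0 and smaller than length")
    step = length - overlap
    windows = []
    start = 0
    while True:
        end = min(start + length, len(seq))
        windows.append((start + 1, end, seq[start:end]))
        if end >= len(seq):
            break
        start += step
    return windows


seq = "ACDEFGHIKLMNPQRSTVWY" * 2 + "ACDEF"  # toy 45-residue sequence
for start, end, sub in divide(seq, 20, 10):
    print(start, end, sub)  # windows 1-20, 11-30, 21-40, 31-45
```

With these settings every residue except those in the first and last 10 positions is covered by two windows, so a weak similarity has two chances to be picked up.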

What are Expect values?
In general, a higher Expect value means a weaker, less significant match, and vice versa. The Expect value parameter is a cutoff: hits with Expect values above the one specified are not shown. Because the Expect value depends on sequence length, increasing the Expect cutoff for the sub-sequence BLASTs can sometimes turn up more unique hits than the default value of 10.0. For a detailed explanation of Expect values, see the BLAST FAQ at NCBI.
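The length dependence can be made concrete with the Karlin-Altschul formula that underlies BLAST statistics. Here m is the query length, n is the total length of the database, S is the raw alignment score, and K and λ are constants set by the scoring matrix and gap penalties. BLAST itself uses effective lengths corrected for edge effects, so this is an approximation rather than the exact calculation. Requiring a hit's Expect value to be at most the cutoff E_cut and solving for S gives the minimum score a hit must reach:

```latex
E = K\,m\,n\,e^{-\lambda S}
\qquad\Longrightarrow\qquad
E \le E_{\mathrm{cut}} \iff S \ge \frac{1}{\lambda}\,\ln\frac{K\,m\,n}{E_{\mathrm{cut}}}
```

Raising E_cut lowers this score threshold. Because a 20-residue sub-sequence can accumulate only a limited raw score, a real but weak match may still miss the default cutoff; a higher cutoff lets it through, at the cost of admitting more chance hits, since E is by definition the number of hits with at least that score expected by chance.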

References

National Center for Biotechnology Information. 1999. BLAST Frequently Asked Questions. <http://www.ncbi.nlm.nih.gov/BLAST/blast_FAQs.html> Accessed 1999 Dec 16.
