Sequencing by Hybridization
One of the great benefits of the Human Genome Project and similar projects is the opportunity to study disease and other genetically determined characteristics via sequence comparison. For instance, sickle-cell disease was linked to a base-pair substitution in the disease-causing allele that alters the protein by a single amino acid, from glutamic acid to valine.1 This discovery required sequencing both the normal and mutant alleles, a rather time-consuming process when accomplished by standard techniques based on the manipulation of target DNA. The publication of entire genomes opens the door for sequence comparison on a much larger scale, limited only by the speed at which DNA of interest can be analyzed. Sequencing by hybridization (SBH) technology provides researchers with a tool powerful enough to enlarge the scope of comparison to the genome level.
The Basic Idea
Hybridization becomes more specific as probe size decreases. For sequencing applications, a large probe is uninformative at the single-nucleotide level, yet a small probe covers only a tiny portion of an average-size gene. SBH technology tackles this problem by controlling the synthesis and location of a large number of small probes of uniform length on a fixed array. Different sections of target DNA bind exclusively to those probes that exactly complement their sequence. Target DNA does not have to be extensively manipulated and is examined simultaneously by all probes on an array, because hybridization is a fundamentally parallel process.2 After exposure to a fluorescently labeled target sequence, the array is scanned by a laser and hybridization at the discrete, square-shaped locations is detected.2 By creating an array of probes that overlap in sequence, researchers can deduce the order along the target molecule of the hybridization events that occurred on the array. In principle, sequencing of unknown genetic material might be accomplished with the entire set of 4^N probes of length N. However, de novo SBH of large molecules has proved too complex to be successfully accomplished.3 SBH has instead been used for resequencing, sequence verification and analysis of single nucleotide polymorphisms.
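The idea of deducing order from overlapping probes can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not a published SBH algorithm: probes are written as the target subsequence they would bind (complementarity is ignored), hybridization is error-free, and the helper names all_probes, spectrum and reconstruct are hypothetical. Reconstruction simply chains probes that overlap by N-1 bases and gives up as soon as the chain is not unique, which is one way to see why de novo SBH of long, repetitive molecules is hard.

```python
from itertools import product


def all_probes(n):
    """Every possible probe of length n (4**n sequences)."""
    return ["".join(p) for p in product("ACGT", repeat=n)]


def spectrum(target, n):
    """Set of length-n probes that find an exact match in the target.

    Assumption: a probe is represented by the target subsequence it
    binds, so complementarity is not modelled here.
    """
    return {target[i:i + n] for i in range(len(target) - n + 1)}


def reconstruct(hits, n):
    """Chain probes that overlap by n-1 bases (assumes n >= 2).

    Raises ValueError when the start or any extension is not unique,
    for example when the target contains a repeat.
    """
    suffixes = {p[1:] for p in hits}
    # The first probe is the only one whose prefix is nobody's suffix.
    starts = [p for p in hits if p[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting probe")
    sequence = starts[0]
    remaining = set(hits) - {starts[0]}
    while remaining:
        tail = sequence[-(n - 1):]
        candidates = [p for p in remaining if p[:-1] == tail]
        if len(candidates) != 1:
            raise ValueError("ambiguous or broken overlap")
        sequence += candidates[0][-1]
        remaining.remove(candidates[0])
    return sequence


hits = spectrum("ATGCGTAC", 3)
print(len(all_probes(3)), "possible probes,", len(hits), "hybridize")
print(reconstruct(hits, 3))
```

A target with a repeated 3-mer, such as "ATGCATGT", makes reconstruct raise ValueError in this sketch, because more than one probe can extend the chain.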
Probe Array Synthesis
Since arrays contain a very large number of different probes, synthesis must be accomplished in a parallel fashion or else manufacturing would be far too time-consuming to be commercially feasible. The company Affymetrix has developed a method to synthesize different probes on an array in parallel using light-directed combinatorial chemistry.2 First, a synthetic linker is attached uniformly to the surface of a glass tile and subsequently blocked with a photoreactive blocker. Next, a photolithographic mask is used to expose only the desired locations on the tile to light. These locations are deprotected as the blocking compound dissociates from the linker upon exposure. A particular deoxynucleotide is then applied to the tile and binds only to locations that have been photodeprotected. Only one of the four nucleotides can be applied at a time, so for a diverse array four cycles must be used for every position along the length of the probes.2 Production time, as a consequence, depends only on the length of the probes and not on the size of the array, so the method is well adapted to the production of larger tiles. The end product of this process is a small tile with a pattern of tiny squares, delineated by exposure to light, each of which represents a unique probe sequence.
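The cycle count described above can be checked with a small simulation. The following Python sketch is only an illustration under simplifying assumptions: every site is a dictionary key, a mask is modelled as the set of sites exposed to light, coupling is perfect, and the chemistry and strand orientation of real synthesis are ignored. The function name synthesize and the example layout are hypothetical.

```python
def synthesize(layout):
    """Simulate light-directed synthesis of equal-length probes.

    layout maps an array site to the probe sequence wanted there.
    Returns the sequences grown at each site and the number of
    mask/coupling cycles used.
    """
    length = len(next(iter(layout.values())))
    grown = {site: "" for site in layout}
    cycles = 0
    for pos in range(length):
        for base in "ACGT":
            # Mask: expose only the sites that need this base next.
            mask = {site for site, seq in layout.items() if seq[pos] == base}
            # Exposed sites are deprotected and couple the applied base.
            for site in mask:
                grown[site] += base
            cycles += 1
    return grown, cycles


layout = {(0, 0): "ACGT", (0, 1): "AAGT", (1, 0): "TCGA", (1, 1): "GGGG"}
grown, cycles = synthesize(layout)
print(grown == layout, cycles)
```

In this model the cycle count is 4 times the probe length no matter how many sites the layout contains, which is the point made in the paragraph above.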
Figure 1. The figure depicts the stepwise synthesis of an Affymetrix(R) probe array using the light-directed combinatorial chemistry method. www.affymetrix.com/technology/synthesis.html Accessed 2000 21 Feb.
Application: Single Nucleotide Polymorphism Analysis
Using a 4L tiled array
One scheme, called a 4L tiled array, can be used to compare a target with a known reference sequence and identify single-nucleotide differences. A set of four probes is synthesized for every position along the length L of the reference sequence, giving a total of 4 x L positions on the tile. The four probes in each set match the reference sequence except at a single fixed position along the probe, where each of the four nucleotides is substituted in turn, so exactly one probe in every set is a perfect match to the reference. If an array consists of 10-nucleotide probes and the varied base is at position 4 relative to the 3' end, then the array is termed a P10,4. When the reference sequence, which matches one square in every probe set, is hybridized with the array and read by laser, every set should produce one hybridization signal.4 This exclusiveness is critical to the functioning of the array and is produced by the requirement of an exact match for these small probes. A suspected reference sample, therefore, can be verified directly by examining its hybridization to each probe set. Each set identifies a single position along the reference sequence: the base there is the complement of the variable base of the one probe that hybridizes.4 Since each probe set is shifted by only a single position relative to its neighbors, the array ensures that repeated sequences are differentiated. For any repetition, probe hybridizations spanning the repetition's ends will distinguish its different locations along the molecule.
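The construction of a 4L array and the way a reference sample is read back from it can be sketched in Python. This is a simplified model, not the Affymetrix software: each probe set is assumed to hybridize only to the stretch of sample aligned with it, a probe binds only on a perfect match, and positions near the ends of the reference that cannot be covered by a full-length probe are skipped, so the array holds slightly fewer than 4 x L probes. The names build_4l_array and call_bases are hypothetical, and var_pos is taken here as the 0-based offset of the varied base from the probe's 3' end.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Sequence of the strand that pairs with seq, read 5' to 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def build_4l_array(reference, probe_len, var_pos):
    """Map each interrogated reference index to its set of four probes."""
    array = {}
    for start in range(len(reference) - probe_len + 1):
        window = reference[start:start + probe_len]
        variants = [window[:var_pos] + b + window[var_pos + 1:] for b in "ACGT"]
        # Probes are complementary to the sequence they detect.
        array[start + var_pos] = [reverse_complement(v) for v in variants]
    return array


def call_bases(array, sample, probe_len, var_pos):
    """For every set, list the base(s) implied by the hybridizing probe(s)."""
    calls = {}
    for site, probes in array.items():
        start = site - var_pos
        window = sample[start:start + probe_len]
        hits = [p for p in probes if reverse_complement(p) == window]
        # The called base is the complement of the probe's variable base,
        # which sits var_pos bases from the probe's 3' end.
        calls[site] = [COMPLEMENT[p[probe_len - 1 - var_pos]] for p in hits]
    return calls


reference = "GATTACAGATTACA"
array = build_4l_array(reference, probe_len=5, var_pos=2)
calls = call_bases(array, reference, probe_len=5, var_pos=2)
print("".join(bases[0] for bases in calls.values()))
```

When the sample is the reference itself, every set yields exactly one call, and the calls spell out the covered part of the reference.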
The overlap among neighboring probe sets in the 4L array also permits the identification of single-nucleotide differences between reference and target molecules. Since hybridization requires an exact match and only one nucleotide position varies within each probe set, neighboring probes that overlap a substitution in the target sequence will fail to hybridize, producing a characteristic loss of signal.4 To positively identify these characteristic "footprints", the reference and target sequences must be hybridized with identical arrays under identical conditions. The footprint can then be detected as a reduction in the fluorescence intensity of the target relative to the reference sequence among the probes neighboring the substitution.4 Stephen P. A. Fodor and his lab at Affymetrix improved on this method by labeling reference and target molecules with different fluorescent markers to increase the fidelity of the signals.4
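A footprint of this kind can be reproduced in a toy Python model. The sketch assumes all-or-nothing signals instead of graded fluorescence, a target of the same length as the reference, and probes written as the sample subsequence they detect (complementarity is left out for brevity). The function names probe_set, read_array and footprint are hypothetical.

```python
BASES = "ACGT"


def probe_set(reference, start, probe_len, var_pos):
    """Four probes for one set, keyed by the base placed at var_pos."""
    window = reference[start:start + probe_len]
    return {b: window[:var_pos] + b + window[var_pos + 1:] for b in BASES}


def read_array(reference, sample, probe_len, var_pos):
    """Return {site: called base, or None if no probe in the set binds}."""
    calls = {}
    for start in range(len(reference) - probe_len + 1):
        window = sample[start:start + probe_len]
        probes = probe_set(reference, start, probe_len, var_pos)
        hits = [b for b, p in probes.items() if p == window]
        calls[start + var_pos] = hits[0] if hits else None
    return calls


def footprint(reference, target, probe_len, var_pos):
    """Compare target against reference read on identical arrays."""
    ref_calls = read_array(reference, reference, probe_len, var_pos)
    tgt_calls = read_array(reference, target, probe_len, var_pos)
    # Sets that light up for the reference but not for the target.
    lost = [s for s in ref_calls if tgt_calls[s] is None]
    # Sets where a different probe of the set lights up.
    changed = {s: (ref_calls[s], tgt_calls[s])
               for s in ref_calls
               if tgt_calls[s] not in (None, ref_calls[s])}
    return lost, changed


reference = "GATTACAGATTACA"
target = "GATTACCGATTACA"  # A -> C substitution at index 6
print(footprint(reference, target, probe_len=5, var_pos=2))
```

In this model the sets on either side of the substitution lose their signal, while the set that interrogates the substituted position switches to a different probe, which identifies the new base.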
When two or more single nucleotide substitutions occur along the target sequence within a span shorter than the length of the probes, an enlarged footprint appears. Multiple mismatches, as well as other aberrant signals, produce ambiguity that detracts from the accuracy of the method. Therefore, Fodor's group developed computer algorithms for the 4L scheme that flag these regions for conventional sequence analysis.4
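The enlarged footprint, and one possible way of flagging it, can be sketched in Python. This is not the algorithm developed by Fodor's group, whose details are not given here; it is a hypothetical heuristic under a toy model with all-or-nothing signals and a target of the same length as the reference. In that model a single substitution silences at most max(var_pos, probe_len - 1 - var_pos) consecutive probe sets, so any longer run of silent sets is flagged. The names lost_sites and flag_regions are assumptions.

```python
def lost_sites(reference, target, probe_len, var_pos):
    """Interrogated sites whose probe set gives no signal with the target."""
    lost = []
    for start in range(len(reference) - probe_len + 1):
        ref_win = reference[start:start + probe_len]
        tgt_win = target[start:start + probe_len]
        # One of the four probes binds only if the target agrees with the
        # reference everywhere in the window except the varied position.
        flanks_match = (ref_win[:var_pos] == tgt_win[:var_pos]
                        and ref_win[var_pos + 1:] == tgt_win[var_pos + 1:])
        if not flanks_match:
            lost.append(start + var_pos)
    return lost


def flag_regions(lost, probe_len, var_pos):
    """Return (first, last) site of each run too long for one substitution."""
    longest_single = max(var_pos, probe_len - 1 - var_pos)
    flagged, run = [], []
    for site in sorted(lost):
        if run and site == run[-1] + 1:
            run.append(site)
        else:
            if len(run) > longest_single:
                flagged.append((run[0], run[-1]))
            run = [site]
    if len(run) > longest_single:
        flagged.append((run[0], run[-1]))
    return flagged


reference = "GATTACAGATTACA"
single = "GATTACCGATTACA"   # substitution at index 6
double = "GATTACCGGTTACA"   # substitutions at indices 6 and 8
for sample in (single, double):
    lost = lost_sites(reference, sample, probe_len=5, var_pos=2)
    print(lost, flag_regions(lost, probe_len=5, var_pos=2))
```

With two substitutions only two bases apart, the silent sets merge into one long run that the heuristic marks for conventional sequencing, whereas the single substitution leaves two short runs around a clean base call.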
References
2000 Feb 21. GeneChip(R) Probe Array Synthesis. <www.affymetrix.com/technology/synthesis.html> Accessed 2000 21 Feb.
1Campbell, Neil A. Biology, fourth edition. The Benjamin/Cummings Publishing Company, Inc. 1996. pp. 317f.
2Fodor, Stephen P. A. 1997. Massively Parallel Genomics. Science 277: 393-395.
3Wallraff, G., J. Labadie, P. Brock, R. DiPietro, T. Nguyen, T. Huynh, W. Hinsberg, G. McGall. DNA Sequencing on a Chip. Chemtech 27: 22-32. Feb.
4Fodor, Stephen P. A., M. S. Morris, D. J. Lockhart, J. Winkler, D. Stern, X. C. Huang, A. Berno, E. Hubbell, R. Yang, M. Chee. 1996. Accessing Genetic Information with High-Density DNA Arrays. Science 274: 610-614.
Please send questions or comments to: fisturgill@davidson.edu