当前位置: 首页 > 期刊 > 《核酸研究》 > 2004年第4期 > 正文
编号:11372374
SNPWaveTM: a flexible multiplexed SNP genotyping technology
http://www.100md.com 《核酸研究医学期刊》
     Keygene NV, Agro Business Park 90, PO Box 216, 6700 AE Wageningen, The Netherlands

    *To whom correspondence should be addressed. Tel: +31 317 466866; Fax: +31 317 424939; Email: michiel.van-eijk@keygene.com

    ABSTRACT

    Scalable multiplexed amplification technologies are needed for cost-effective large-scale genotyping of genetic markers such as single nucleotide polymorphisms (SNPs). We present SNPWaveTM, a novel SNP genotyping technology to detect various subsets of sequences in a flexible fashion in a fixed detection format. SNPWave is based on highly multiplexed ligation, followed by amplification of up to 20 ligated probes in a single PCR. Depending on the multiplexing level of the ligation reaction, the latter employs selective amplification using the amplified fragment length polymorphism (AFLP?) technology. Detection of SNPWave reaction products is based on size separation on a sequencing instrument with multiple fluorescence labels and short run times. The SNPWave technique is illustrated by a 100-plex genotyping assay for Arabidopsis, a 40-plex assay for tomato and a 10-plex assay for Caenorhabditis elegans, detected on the MegaBACE 1000 capillary sequencer.

    INTRODUCTION

    Recently, large-scale sequencing of complete genomes has fueled the discovery of single nucleotide polymorphisms (SNPs) in humans (1), mouse (2), Arabidopsis (3) and a number of other organisms (4). Since SNPs represent the most common type of genetic variation in the genome, powerful SNP genotyping technologies are needed to fully exploit the opportunity offered by SNPs to detect allelic variation in genes involved in (complex) traits in humans, (farm) animals, microorganisms and plants. Hence, over the past years, a large number of different SNP detection techniques have been developed, based on various methods of allele discrimination, target amplification and detection platforms; reviewed by Syv?nen (5), Kwok (6) and Twyman and Primrose (7). However, few of these SNP genotyping methods are multiplexed at all steps, which is needed for cost-effective genotyping of many SNPs per sample.

    Recently, a number of papers (8–11) have been published that describe the use of a multiplexed oligonucleotide ligation assay , followed by amplification with a single primer pair as a way to overcome this limitation. For OLA, these methods employ either linear ligation probes (8), circularizing ‘padlock’ probes (10) as first described (13) and applied (14–16) by the Landegren laboratory, or the recently described molecular inversion probes (11). However, although these methods are highly multiplexed, they are not very flexible with respect to the multiplex composition and include either a laborious probe preparation method (8) or relatively expensive, overnight hybridization-based detection (9–11).

    Here we describe the SNPWaveTM technology, which employs amplification of allele-specific products from a highly multiplexed ligation mixture using PCR in combination with length-based detection on a (capillary) sequencing instrument. Depending on the number of polymorphisms to be scored in the sample, SNPWave incorporates the principle of selective amplification known from amplified fragment length polymorphism (AFLP?) technology, a complexity reduction technique introduced by our laboratory in the early 1990s for multiplex amplification and detection of DNA markers without prior sequence information (17,18), to amplify probes corresponding to 10 loci simultaneously. By combining multiplexed ligation-dependent amplification with the known robustness of amplification using a single primer pair, low-cost probe synthesis, and high throughput detection using flanking sizing standards, SNPWave allows detection of SNPs under uniform reaction conditions in a highly flexible way. The power, robustness and flexibility of the SNPWave technology is illustrated by a 100-plex assay for SNP genotyping in Arabidopsis, a 40-plex assay for tomato and a 10-plex assay for Caenorhabditis elegans. In addition to its application for SNP genotyping, the flexibility of the SNPWave technology allows detection of non-polymorphic sequences or low-abundant sequences in a complex background, and/or combination with hybridization- and mass-based detection platforms.

    MATERIALS AND METHODS

    Probe design

    Circularizing padlock ligation probes (13–16) were designed with ProbeDesigner software (Keygene NV), which uses the two alleles and flanking sequences of 10 SNP loci as input information. AFLP primer sequences M00k, 5'GATGAGTCCTGAGTAA-3' and reverse complemented E00k, 5' GAATTGGTACGCAGTC-3' were selected from a collection of binding regions for PCR amplification. Next, the flanking sequences of each locus were selected to be as close as possible to a melting temperature (Tm) of 68°C. Length stuffers were incorporated in all probe sequences. Twenty probes for two alleles of 10 loci were assigned to the length combinations 82 and 84 bp, 87 and 89 bp, 92 and 94 bp, 97 and 99 bp, 102 and 104 bp, 107 and 109 bp, 112 and 114 bp, 117 and 119 bp, 122 and 124 bp, or 127 and 129 bp. Spacing of 2 bp between alleles and 3 bp between loci was chosen to avoid co-migration of SNPWave products derived from different loci. ProbeDesigner performs assignment of SNP loci to these probe sizes such that as many as possible locus-specific flanking sequences meet the selected Tm threshold. This favors assignment of loci with AT-rich flanking sequences to the larger probe sizes. Next, a sequence similarity search was performed to identify homologies within and between ligation probes or regions that might cause secondary structures (e.g. hairpins). When this involved stuffer sequences, the stuffers of both alleles were automatically replaced by selecting others with the same length from a fixed collection. This process was iterated until no more sequence similarities or suitable stuffer sequences were found. Two selective bases for AFLP-derived selective amplification were introduced adjacent to primer-binding regions (Table 1) by replacing stuffer or flanking sequences at the appropriate positions while maintaining total lengths of the probes as defined above. Theoretically, having two selective nucleotides for each amplification primer incorporated in the ligation probes allows the selection of 256 (16 x 16) different subsets of ligated probes for co-amplification. A total of 32 (16 + 16) different AFLP primer combinations can be selected for amplification with a +2 AFLP primer in one direction and a +0 primer (no selective bases) in the other direction. Twenty (10 + 10) of these AFLP primer combinations allow selective amplification of 20 different subsets of 10 polymorphic loci from a 100-plex ligation.

    Table 1. Flexible SNPWave amplification in Arabidopsis using a two-dimensional 10 x 10 selective amplification design

    For 100 polymorphic loci between the Arabidopsis accessions Columbia and Landsberg erecta obtained from The Arabidopsis Information Resource (TAIR; www.arabidopsis. org), probe design was repeated 10 times to design 200 allele-specific ligation probes. For tomato, this was done four times to design 80 ligation probes with selective bases representing 40 SNP loci. A difference with the padlock probes for Arabidopsis is that these padlock probes contain one selective base adjacent to the primer-binding regions instead of two. For C.elegans, probe design was done once to design 20 ligation probes without selective bases for 10 SNP loci. All padlock ligation probes were purchased high-pressure liquid chromatography (HPLC)-purified from Metabion (Planegg-Martinsried, Germany). Their sequences are listed in the Supplementary Material available at NAR Online.

    DNA samples

    Seeds from the Arabidopsis ecotypes Columbia and Landsberg erecta were obtained from the Nottingham Arabidopsis Stock Centre (NASC; Nottingham, UK). Arabidopsis leaf samples of 92 different accessions were provided by Dr Maarten Koornneef, Wageningen University, The Netherlands, and originated from NASC, the Arabidopsis Biological Resource Center (ABRC; Columbus, OH) and the Sendai Arabidopsis Seed Stock Center (SASSC; Sendai, Japan). Homozygous tomato line Lycopersicon esculentum cv. Moneyberg, Lycopersicon hirsutum line cv. G1560 and 44 F2 offspring from Moneyberg and G1560 as parental lines were obtained from De Ruiter Seeds CV (Bergschenhoek, The Netherlands). Forty-eight hybrid L.esculentum tomato lines were obtained from Rijk Zwaan (De Lier, The Netherlands), De Ruiter Seeds CV, Enza Zaden (Enkhuizen, The Netherlands) and Vilmorin Clause and Companies (Chappes, France). DNA was isolated from leaf material of individual seedlings using a modified CTAB procedure described by Stuart and Via (19). Five C.elegans DNA samples (HW1, Loopy 1A, 8.7, 14.2 and 47.6) were provided by Dr P. Feldmann, Devgen NV (Ghent-Zwijnaarde, Belgium). All DNA samples were diluted to a concentration of 100 ng/μl in TE (10 mM Tris–HCl pH 8.0, 1 mM EDTA) and stored at –20°C.

    SNPWave reaction conditions

    Ligation reactions for Arabidopsis polymorphisms were performed in a 25 μl volume containing 625 ng of Arabidopsis DNA, 1x Taq DNA ligase buffer , 0.2 U/μl Taq DNA ligase (NEB) and 0.05 fmol/μl of each of 200 ligation probes. Next, 10 cycles of repeated denaturation, probe hybridization and ligation were performed in a Perkin Elmer 9700 thermal cycler (Applied Biosystems, Foster City, CA) using the following profile: initial denaturation for 2 min at 94°C, followed by 10 cycles of 15 s at 94°C and 60 min at 60°C, followed by storage at 4°C. Following ligation, the mixture was diluted with 85 μl of 1x Taq DNA ligase buffer to 110 μl. Forty-plex ligation reaction conditions for tomato samples and for 10-plex tomato and C.elegans samples were similar, except that 100 ng of DNA was used, ligation reactions were performed in a 10 μl volume and diluted by adding 30 μl of 1x Taq DNA ligase buffer.

    Selective primers were used to amplify subsets of 10 polymorphic Arabidopsis loci from a 100-plex ligation mixture or 10 polymorphic tomato loci from a 40-plex ligation mixture: for Arabidopsis, 10 μl of diluted ligation reaction was amplified in a 20 μl mixture containing 1x GeneAmp? PCR buffer (Applied Biosystems), 200 μM of each dNTP (Amersham Biosciences, Buckinghamshire, UK), 0.02 U/μl AmpliTaq Gold DNA polymerase (Applied Biosystems), and either 1.5 ng/μl FAM-, JOE- or NED-labeled selective AFLP primer E00k+2 and 1.5 ng/μl unlabeled primer M00k, or 1.5 ng/μl FAM-, JOE- or NED-labeled primer E00k and 1.5 ng/μl unlabeled AFLP primer M00k+2 (MWG, Ebersberg, Germany). AFLP primer sequences E00k, 5'GACTGCGTACCAATTC-3 and M00k, 5'GAT GAGTCCTGAGTAA-3' were as described by Vos and co-workers (18). Selective primers E00k+2 or M00k+2 contained two additional bases at the 3' end. Specifically, E00k+2 with selective bases AC, AG, CA, CT, TC, TG, GA, GT, CG or GC was used in combination with M00k, or M00k+2 containing one of these selective bases was used in combination with a E00k (Table 1). Amplification conditions for 40-plex ligation reactions in tomato were identical, except that JOE-labeled selective primer E00k+C was used in combination with unlabeled primer M00k+C. Amplification of 10-plex ligated tomato and C.elegans probes was performed using NED-labeled E00k and M00k without selective bases. High purity salt-free (HPSF)-purified FAM- and JOE-labeled primers were purchased from MWG, and HPLC-purified NED-labeled primers from Applied BioSystems. Selective amplification using a touch-down profile was as described by Vos and co-workers (17,18), modified by addition of a hot start to activate the AmpliTaq Gold DNA polymerase: 12 min at 94°C, followed by 13 cycles of 30 s at 94°C, 30 s at 65°C with a reduction of 0.7°C per cycle to 56°C in cycle 13, followed by 1 min at 72°C. This was followed by 23 cycles of 30 s at 94°C, 30 s at 56°C and 1 min at 72°C, and storage at 4°C.

    Purification of SNPWave reactions

    SNPWave PCR products were desalted over Sephadex G-50 superfine columns in a 96-well plate format prior to detection by capillary electrophoresis on the MegaBACE 1000 (Amersham Biosciences). Briefly, dry Sephadex G-50 superfine resin (Amersham Biosciences) was loaded into the wells of a 96-well plate (MultiScreen?-HV, Millipore Corporation, Bedford, MA) using the 45 μl column loader (Millipore), and excess resin removed. The resin was rinsed twice with 200 μl of Milli-Q water per well and packed by centrifugation for 5 min at 900 g. Next, 200 μl of Milli-Q water was added to each well and incubated for 2–3 h to swell the resin. Multiscreen-HV plates with swollen resin were tightly sealed with parafilm and stored at 4°C or centrifuged for 5 min at 900 g for immediate use. For purification, 8 μl of SNPWave products of each fluorescent label (FAM, JOE and/or NED) were mixed and diluted with 120 μl of Milli-Q water. A 20 μl aliquot of mixed and diluted SNPWave product was carefully applied to the center of each well and the Multiscreen-HV plate was placed on top of a standard U-bottom microtiter plate. Centrifugation was carried out for 5 min at 900 g and eluates of 20 μl of purified SNPWave products per well were collected and diluted 20-fold with Milli-Q water.

    Detection of SNPWave reactions on the MegaBACE 1000

    Prior to injection, 5 μl of 250-fold diluted ET-900 ROX sizing standard (Amersham BioSciences) was added to 5 μl of purified and diluted SNPWave product. Samples containing ET-900 ROX sizing standard were heat-denatured by incubation for 1 min at 94°C and subsequently put on ice. MegaBACE capillaries were filled with 1x LPA matrix (Amersham Biosciences) according to the manufacturer’s instructions. Electrokinetic injection of the samples was for 45 s at 3 kV. For runs with an ET-900 ROX sizing standard, run parameters were 110 min at 10 kV; for runs with a flanking sizing standard, run parameters were 35 min at 10 kV. Electropherograms were generated using Genetic Profiler software, version 2.0 (Amersham BioSciences).

    Data processing and scoring

    SNPs were scored using SNPXtractor software version 1.0 (Keygene NV). Raw data files (.rsd) generated by the MegBACE Instrument Control Manager (ICM) software (Amersham BioSciences) were imported into SNPXtractor, and cross-talk correction, peak smoothing and recognition of the ET-900 ROX sizer fragments was performed. Next, automatic peak finding, sizing and calculation of peak intensities normalized to the total ET-900 ROX sizer band intensities was carried out. Further scoring was done in a semi-automated fashion: a pseudo-gel image was generated in SNPXtractor to indicate and verify correct placement of SNPWave fragments in all the capillaries. Once the presence of an SNPWave reaction product was indicated by the user, peak intensities at the corresponding mobilities in the remaining capillaries were calculated. SNP genotypes were derived by processing fragment intensities of peaks representing two alleles of a locus. First, total intensities of a locus were recalculated as a ratio according to the following formula:

    where a1 and a2 are the intensities for the corresponding alleles. Next, the ratios were binned and fitted to a set of Gaussian distributions by using the EM algorithm (20). Graphical representations of such fits are shown in Figure 4. Finally, genotypes were assigned based on the distribution of the ratios in the fit classes, with A = homozygous parent 1, B = homozygous parent 2 and H = heterozygous. U (unknown) scores were assigned for missing data points.

    Figure 4. (A) Two-fit histogram of the Arabidopsis SNP locus SGCSNP93 generated by SNPXtractor scoring software. SNPWave reactions of 93 Arabidopsis accessions were prepared as described in Materials and Methods, using JOE-labeled primer E00k+0 and unlabeled primer M00k+CG. Purified and diluted SNPWave products were supplemented with ET-900 ROX sizing standard and detected on the MegaBACE 1000. SNPWave data were scored using SNPXtractor as described in Materials and Methods. Two-fit histograms were obtained for every locus, as would be expected for a germline screening involving homozygous lines. (B) Three-fit histogram of tomato SNP locus 34 generated by SNPXtractor. Ten-plex SNPWave reactions of 96 tomato samples (consisting of parental lines L.esculentum cv. Moneyberg and L.hirsutum G1560, 44 of their F2 offspring and a collection of 48 L.esculentum hybrid tomato lines) were prepared as described in Materials and Methods. JOE-labeled primer E00k+0 and unlabeled primer M00k were used in the amplification reaction. Purified and diluted SNPWave products were supplemented with the ET-900 ROX sizing standard and detected on the MegaBACE 1000. Data were scored using SNPXtractor software, and three-fit histograms were obtained for most loci, as would be expected in light of the origin of the samples used.

    Fragment sizing using a flanking sizing standard

    Sizing using the ET-900 ROX sizing standard required a run time of 110 min in order to detect all sized fragments (60– 900 nt) in all capillaries, whereas the length of the largest SNPWave product is only 129 bp. To increase throughput by using shorter run times of 35 min, a flanking sizing standard and a flanking sizing standard algorithm were developed. All flanking sizing standard fragments were 5' FAM-labeled oligonucleotides (Metabion): S65 (65 nt), S68 (68 nt), S71 (71 nt), S74 (74 nt), S132 (132 nt), S135 (135 nt) and S138 (138 nt). Their sequences are listed in the Supplementary Material. A flanking sizing standard was made by combining the oligonucleotides to a concentration of 0.2 nM S65, 0.2 nM S68, 0.225 nM S71, 0.3 nM S74, 0.625 nM S132, 0.875 nM S135 and 1.625 nM S138. A 5 μl aliquot of flanking sizing standard was added to 5 μl of purified and diluted SNPWave product prior to detection on the MegaBACE. A sizing algorithm recognizing the flanking sizing standard based on the known lengths and characteristic peak intensity patterns was incorporated in SNPXtractor. Initially, the flanking sizing standard algorithm was calibrated using ET-900 ROX as a reference. SNPWave products were sized by interpolation relative to the flanking sizing standard detected in the FAM channel.

    PCR amplification and direct sequencing

    (Nested) PCR amplification and direct sequencing of PCR products from 15 tomato SNP loci was performed to validate the accuracy of the SNPWave genotyping. First round PCRs were with 100 ng of genomic DNA in a 25 μl volume with 1x GeneAmp PCR buffer, 200 μM of each dNTP, 0.03 U/μl AmpliTaq DNA polymerase (Applied Biosystems) and 0.2 μM forward and reverse locus-specific primers. Cycling was carried out in a Perkin Elmer 9700 thermal cycler as follows: initial denaturation for 2 min at 94°C, followed by 16 cycles of 30 s at 94°C, 30 s at 50°C and 30 s at 72°C, followed by 7 min at 72°C and storage at 4°C.

    Second round PCRs were in a 25 μl volume containing 1 μl of first round PCR product, 1x GeneAmp PCR buffer, 200 μM of each dNTP, 0.03 U/μl AmpliTaq DNA polymerase and 0.4 μM nested forward and reverse primers. Thermal cycling using a touch-down profile was as follows: 16 cycles of 30 s at 94°C and 30 s at 61°C with a reduction of 0.7°C per cycle to 50°C in cycle 16, followed by 1 min at 72°C. This was followed by 24 cycles of 30 s at 94°C, 30 s at 50°C and 1 min at 72°C, and storage at 4°C. Sequences of either the forward or reverse locus-specific primer for each SNP locus contained an M13 tail to facilitate direct sequencing. All (nested) PCR primers used are listed in the Supplementary Material.

    Templates for sequence reactions were 200 ng of second round PCR product for those <600 bp and 400 ng for those >600 bp. Templates were treated with 0.033 U/μl shrimp alkaline phosphatase (SAP; USB, Cleveland, OH) and 0.033 U/μl exonuclease I (USB) in a total volume of 10 μl containing 1x GeneAmp PCR buffer and 0.06x SAP dilution buffer (USB). Sequence reactions were in 20 μl, containing 0.82 μl of SAP/exonuclease I-treated template, 1x sequencing buffer (26 mM Tris–HCl, 6.5 mM MgCl2, 5.0% glycerol; pH 9.0), 4 μl of ET-terminator pre-mix (Amersham Biosciences) and 4.5 ng/μl 24mer M13 sequencing primer (5'-CGCCAGGGTTTTCCCAGTCACGAC-3'). Thermal cycling conditions were 50 cycles with 20 s at 94°C and 2 min at 60°C. Sequence reactions were precipitated using 0.7 M NH4Ac and 2.5 vols of 100% ethanol. Precipitates were washed once with 70% ethanol, dried and dissolved in 50 μl of Milli-Q water. A 10 μl aliquot was detected on a MegaBACE 1000 using standard sequence filters, following injection for 10 s at 3 kV and running for 120 min at 9 kV, as recommended (Amersham Biosciences). Processing of sequence traces and base calling were performed using Sequence Analyzer version 3.0. (Amersham Biosciences). Traces were manually inspected at the SNP site to call the genotypes.

    RESULTS

    SNPWave is a flexible multiplexed technique for detection of (single nucleotide) polymorphisms, based on (selective) amplification of ligated probes from a complex ligation mixture. A general outline of the SNPWave procedure is presented in Figure 1. The SNPWave technique consists of three steps common to most SNP genotyping techniques: allele discrimination, amplification and detection. Allele discrimination is based on hybridization and ligation of allele-specific oligonucleotide probes to target DNA using the oligonucleotide ligation assay (12). This first step is carried out in a multiplex of at least 10 loci using allele-specific ligation probes containing stuffer sequences and primer-binding sequences with or without selective nucleotides. Amplification and length-based detection are as described before (18). In this study, we present the SNPWave technology using circularizing padlock ligation probes (13–16); in a separate manuscript (M.van Eijk et al., in preparation), we will report how a novel probe type, the Keylock probe, can be used in SNPWave assays.

    Figure 1. Principle of the SNPWave method. Allele-specific ligation probes are hybridized to denatured genomic DNA. SNP allele discrimination is based on the specificity of the Taq (Thermus aquaticus) DNA ligase. (A and B) Closed circular probes are formed only in cases where the 3'-hydroxylated SNP allele-specific end of the ligation probe hybridizes immediately adjacent to the 5'-phosphorylated common probe sequence of the opposite end of the probe. (C) Next, closed circular probes are amplified, commonly with AFLP primers containing two selective nucleotides. This ensures efficient amplification only of those closed ligation probes containing perfectly base-paired nucleotides (such as GC or TG) adjacent to common primer sequences (denoted by red and green arrows). Blue boxes indicate length stuffers incorporated in the ligation probes, which allows detection of the amplification products by size. Total probe lengths differed by two bases between alleles of a locus, and by three bases between loci to avoid co-migration of amplification products.

    Development of a 100-plex Arabidopsis SNPWave assay

    In order to demonstrate selective amplification of ligation probes from a complex mixture, we developed a 100-plex SNPWave assay using known SNPs between the Arabidopsis ecotypes Columbia and Landsberg erecta. Length stuffers and selective nucleotides were incorporated in 200 ligation probes for 100 loci according to the design described in Table 1. This two-dimensional (10 x 10) design allowed flexible amplification of 20 subsets of 10 polymorphic loci from a 100-plex ligation mixture for whole-genome screening or fine mapping, respectively. For amplification of each subset of 10 loci, an AFLP primer with two selective bases was used in combination with a non-selective primer. Electropherograms and corresponding pseudo-gel images obtained with three of these primer pairs are shown in Figure 2. All 20 possible primer pairs were tested and proved to be fully selective (data not shown). Signal intensities and allele discrimination of 90 loci yielded reliable genotyping results (data not shown); failure of the remaining 10 loci was in four cases due to insufficient allele discrimination of one or both ligation probes and in six cases due to the absence of a detectable signal for one or both probes.

    Figure 2. Selective amplification of 10 polymorphic loci from a 100-plex ligation reaction of Columbia and Landsberg erecta. Left: 100-plex ligation reactions were performed using 625 ng samples of genomic DNA from the Columbia and Landsberg erecta Arabidopsis ecotypes as described in Materials and Methods. Three 10-plex AFLP +0/+2 amplification reactions were carried out using primer combinations with +0/+AG (1), +0/+TC (2) and +0/+CG (3) selective bases, of which the +0 primers were labeled with JOE, FAM and NED, respectively, as described in Materials and Methods. These AFLP primer combinations amplify loci 11–20, 41–50 and 81–90, respectively, as described in Table 1. Pooled and purified amplification products were separated by capillary electrophoresis using a MegaBACE 1000, including the ET-900 sizing standard (Amersham Biosciences). Electropherograms of SNPWave products showing their size in base pairs on the x-axis and fluorescence intensities on the y-axis were generated with Genetic Profiler (version 2.0) software (Amersham Biosciences). Right: Pseudo-gel images of the electropherograms shown on the left were generated using SNPXtractor software (Keygene NV). ET-900 sizer fragments are not shown. Size references in base pairs are included on the left, and locus numbers are shown on the right.

    Subsequently, 93 Arabidopsis accessions were genotyped using all 10 (E00k+0/M00k+2) primer combinations, with +2 selective nucleotides as described in Materials and Methods. A pseudo-gel image of the results of 69 Arabidopsis accessions with primer combination +0/+GC, detecting SNP loci 81–90, is shown in Figure 3. This figure illustrates the existence of considerable polymorphism in the Arabidopsis germplasm and the ability to genotype Arabidopsis samples using this SNPWave assay. Data scoring and processing were performed in a semi-automated fashion using SNPXtractor software, based on ratios of band intensities of the respective alleles of a locus. Assignment of genotypes according to the A, B, H format (21) was as described earlier for AFLP (22). An example of a two-fit resulting from SNPWave data of locus SGCSNP93 genotyping in 93 Arabidopsis accessions is shown in Figure 4A. This figure demonstrates that all band intensity ratios fall in two distinct classes, representing the homozygous (A and B) genotypes expected for homozygous lines. For comparison, a three-fit corresponding to tomato SNP locus 34 scored in 94 samples (46 germplasm lines and 48 samples of an F2 mapping population; see ‘Validating data accuracy’ below) is shown in Figure 4B. These data demonstrate the applicability of conventional genotyping algorithms to facilitate SNPWave data scoring in a semi-automated fashion using SNPXtractor.

    Figure 3. Genotyping of 69 Arabidopsis accessions using the SNPWave technology. One hundred-plex ligation reactions were performed using 625 ng samples of genomic DNA from 69 different Arabidopsis accessions as described in Materials and Methods. The AFLP primer combination included +0/+CG as selective bases, which amplifies loci 81–90. FAM-labeled SNPWave amplification products were separated by capillary electrophoresis using a MegaBACE 1000, including the ET-900 ROX sizing standard (Amersham Biosciences). A pseudo-gel image of the SNPWave amplification products was generated using SNPXtractor software (Keygene NV). ET-900 ROX sizer fragments are not shown. Size references in base pairs are included on the left, and locus numbers are shown on the right.

    SNPWave genotyping of other organisms

    To demonstrate wider applicability of the SNPWave technique, we developed a 40-plex assay for tomato and a 10-plex assay for C.elegans SNPs (genome sizes 950 and 97 Mb, respectively).

    Electropherograms and matching pseudo-gel images of SNPWave data from selective amplification of 10 SNP loci from a 40-plex ligation in five tomato samples (two parental lines and three of their F2 offspring) are shown in Figure 5A. Fifteen of 20 SNP alleles derived from 10 SNP loci were detected in these samples. Four alleles were not detected because the parental lines (samples 1 and 2) were homozygous for the same allele, hence these alleles did not segregate in this cross. A fifth allele was not detectable in any of the samples. As expected, segregation was observed among the F2 offspring (samples 3–5) for the six polymorphic SNP loci, including detection of heterozygous genotypes (Fig. 5A). As observed previously in Arabidopsis, amplification of the remaining three subsets of 10 SNP loci with the appropriate selective primers was also fully selective (data not shown).

    Figure 5. (A) SNP genotyping of five tomato samples using the SNPWave technology. Forty-plex ligation reactions were performed using 100 ng samples of genomic DNA from the parental lines (samples 1 and 2) and three F2 offspring (samples 3–5), as described in Materials and Methods. PCR amplification was performed using JOE-labeled selective amplification primer E00k+C and unlabeled M00k+C to amplify 10 SNP loci simultaneously. The parents were heterozygous for six of these 10 SNP loci. Left: JOE-labeled products were separated by capillary electrophoresis using a MegaBACE 1000, including the ET-900 ROX sizing standard. Right: pseudo-gel images of the electropherograms were generated using SNPXtractor software. ET-900 ROX sizer fragments are not shown. Sample numbers 1–5 are shown on the top and size references in base pairs are included on the left. (B) SNP genotyping of five C.elegans samples using the SNPWave technology. Ten-plex ligation reactions were performed using 100 ng of genomic DNA from five different C.elegans samples, numbered 1–5, as described in Materials and Methods. Samples 1 and 2 are the parents of the F2 offspring numbered 3, 4 and 5. The parents were homozygous for the alternative alleles of 10 SNP loci and the ligation probes for these SNPs were designed such that the sizes of the amplification products obtained from the alleles carried by parent 1 were always 2 bp longer than those of parent 2. PCR amplification was performed as described in Materials and Methods using NED-labeled amplification primer E00k and M00k without selective bases to amplify all 10 SNP loci simultaneously. Left: NED-labeled products were separated by capillary electrophoresis using a MegaBACE 1000, including the ET-900 ROX sizing standard (Amersham Biosciences). Right: pseudo-gel image of the electropherograms of SNPWave amplification products shown on the left generated using SNPXtractor software (Keygene NV). ET-900 sizer fragments are not shown. Sample numbers 1–5 are shown at the top, and size references in base pairs are included on the left.

    Electropherograms and corresponding pseudo-gel images of the results of genotyping two parental C.elegans samples (samples 1 and 2) and three F2 offspring (samples 3–5) are shown in Figure 5B. In this case, all 10 SNPs were polymorphic between the parental lines. Eighteen of 20 SNP alleles were detected. Consequently, eight of 10 loci could be scored reliably, while only one allele was detectable for the remaining two SNP loci. Segregation was observed in the F2 offspring.

    Consistent with the results from Arabidopsis, these data indicate a wider applicability of the SNPWave technology in various organisms, with an initial success rate of 80–90% for ligation probes. No attempts were made to recuperate missing data by replacing failing probes with probes designed on the opposite strand.

    Validating data accuracy

    To validate the accuracy of genotyping data obtained using SNPWave detection and scoring procedures, 48 tomato samples (two parental lines and 46 F2 offspring from an L.esculentum x L.hirsutum cross) were subjected to genotyping using three 10-plex SNPWave assays (data not shown). This population was chosen because the F2 offspring will yield heterozygous genotypes important for validation. In parallel, the same 48 samples were used to generate PCR products with locus-specific primers flanking 15 of these SNPs for direct sequencing (data not shown). All sequence traces were manually inspected to confirm base calling at the SNP site with particular attention for heterozygotes. Processing and scoring of the SNPWave electropherograms and direct sequencing resulted in a total of 638 data points for both methods, obtained from 15 SNP loci (range 31–48 genotypes per locus; Table 2). Comparison of the genotypes obtained by both methods yielded 632 identical data points, equaling an average of 99.1% across loci (range 93.5–100% per locus), including 140 heterozygous genotypes. For 12 of 15 loci, the concordance rate was 100% (Table 2). All six obvious explanations for failure of allele consistencies were due to lack of allele discrimination of one ligation probe of a locus. No discrimination by these ligation probes could be derived from careful analysis of their sequences and/or the SNP alleles they were designed for. Overall, these results indicate that the accuracy of SNPWave genotypes is high for most SNP loci, provided that allele discrimination of both ligation probes is sufficient.

    Table 2. Accuracy of SNPWave data

    SNPWave detection with a flanking sizing standard

    SNPWave reaction products comprise only a minor size range (82–129 bp) detectable on an (capillary) electrophoresis platform, in our case the MegaBACE 1000. A FAM-labeled flanking sizing standard was developed to increase throughput by allowing shorter run times and making full use of all four detection (dye) channels. This flanking sizing standard replaced the standard ET-900 ROX genotyping sizing standard, which requires detection of all sizer fragments to size reaction products and therefore run times longer than needed for SNPWave detection. The flanking sizing standard was prepared by combining seven FAM-labeled oligonucleotides with increasing lengths flanking the SNPWave products. Four of these fragments (65, 68, 71 and 74 bases) were shorter than the smallest SNPWave product and three (132, 135 and 138 bases) were larger than the longest SNPWave product. A mixture of these oligonucleotides was added to purified and diluted SNPWave products prior to detection. A flanking sizing standard algorithm was developed in SNPXtractor to size SNPWave products by interpolation, after recognition of the sizer fragments in each capillary. Three identical short (35 min) runs with Arabidopsis SNPWave products from seven accessions were performed to determine the reproducibility of sizing with a flanking sizing standard. Electropherograms of the Kashmir-2 sample of these three runs are shown in Figure 6. Reproducibility of the flanking sizing standard procedure was determined by sizing 16 SNPWave peaks (alleles) observed in these seven samples, and calculating average mobilities and standard deviations for all peaks in these three runs (Table 3). Results indicated that the average sizes differed by no more than 0.2 bases across the alleles, with a maximum SD of 0.26 but often less than 0.1 (Table 3). We concluded that the reproducibility of sizing using the flanking sizing standard algorithm is high and that short run times can be achieved by using this method.

    Figure 6. Detection of SNPWave products with a flanking sizing standard in three consecutive 35 min runs on the MegaBACE 1000. One hundred-plex ligation using 625 ng of genomic DNA of Arabidopsis sample Kashmir-2 (N1264), and SNPWave amplification using JOE-labeled primer E00k and AFLP primer M00k+GC were as described in Materials and Methods. Prior to detection on the MegaBACE 1000, FAM-labeled flanking sizing standards consisting of fragments S65, S68, S71, S74, S132, S135 and S138 were added to purified and diluted SNPWave products, as described in Materials and Methods. Three identical aliquots of an SNPWave product supplemented with flanking sizing standard were detected by capillary electrophoresis on a MegaBACE 1000 in three consecutive 35 min runs. Electropherograms were generated using Genetic Profiler version 2.0 (Amersham Biosciences).

    Table 3. Reproducibility of SNPWave fragment sizing using a flanking sizing standard

    DISCUSSION

    Ligation-dependent selective amplification

    Ligation-dependent multiplexed SNP genotyping techniques have been described in a number of recent publications (8–11). These technologies are based in part on the attractive feature that allele discrimination by the OLA technique (12) can be followed by robust amplification of ligated probes using a single primer pair. The latter is also one of the cornerstones of the AFLP technology (17,18), which employs selective amplification of restriction fragments to which adaptors have been ligated. Specifically, robustness conferred by amplification with a pair of specific primers under stringent conditions, easily scalable multiplexing levels due to primers with selective nucleotides, and the fact that no prior sequence information is required have contributed to widespread use of AFLP since its development in the early 1990s. However, a limitation of AFLP resulting from its sequence information independence is that the composition of AFLP fingerprints is biologically determined by the location of recognition sequences for restriction enzymes in the genome. These ‘random fingerprints’ are often of limited value for applications aimed at routine (diagnostic) detection of selected sets of informative genetic markers. The SNPWave technology addresses this limitation of AFLP, while maintaining its robust and flexible amplification characteristics.

    Length-based detection

    A number of remarks can be made regarding the SNPWave technology in comparison with the multiplexed SNP genotyping techniques cited above: both SNPWave and the multiplex ligation-dependent probe amplification (MLPA) method developed by Schouten and co-workers (8) are based on detection of amplified ligation products by size. MLPA probes are prepared using M13 phage to overcome the length limitations imposed by current chemical oligonucleotide synthesis techniques. As a result, MLPA products can span the entire detection window of sequencing platforms, allowing simultaneous detection of around 40, but possibly more target sequences. MLPA is therefore well suited for diagnostic screening of known mutations, including SNPs and copy number changes (8). However, MLPA probe preparation is time consuming, which increases the development costs of MLPA assays. MLPA is therefore not ideally positioned for applications aimed at detecting large numbers of different target sequences. One of the objectives of the SNPWave technology was to counter this limitation by using only chemically synthesized ligation probes that can be custom-ordered from commercial vendors. As noted earlier by Banér and colleagues (10), the quality of ligation probes is very important for ligation-dependent assays. This is particularly true in combination with length-based detection. In this study, we demonstrate that HPLC-purified padlock probes required for SNPWave detection (up to 129 bases in the format we chose) are within reach of current oligonucleotide synthesis technologies. Although the cost of HPLC-purified ligation probes is still significant (currently US$80–100 per probe, depending on their length), it compares favorably with the development costs of MLPA probes and yields a quantity sufficient for at least 1 million ligation reactions. In addition, the costs for oligonucleotide synthesis is still going down. Compared with MLPA, SNPWave has a lower multiplexing capability at detection, which we set arbitrarily at 20 amplification products corresponding to 10 bi-allelic SNP loci. However, to compensate for this lower information content, SNPWave employs short runs with a flanking sizing standard for detection of SNPWave amplification products in all four dye channels. An advantage of both SNPWave and MLPA is that sequencing platforms are widely available in most research laboratories.

    Hybridization-based detection

    The methods described by Oliphant et al. (9), Banér et al. (10) and Hardenbol et al. (11) employ multiplexed ligation-dependent probe amplification combined with bead arrays (9), standard tag oligonucleotide microarrays (10) and in situ synthesized oligonucleotide DNA chips (11), respectively. An advantage of these detection platforms is that highly multiplexed ligation assays can be performed and detected on a single chip containing generic tag probes. These platforms are therefore attractive for applications requiring detection of hundreds to thousands of SNPs per sample. However, the flexibility of these methods is limited to modulation of the composition of ligation probe mixtures and corresponding tags included on the (solid) support surface. In addition, for applications involving fewer markers per sample, these technologies are less suited, due to the relatively high costs associated with (commercial) DNA chips. Medium-throughput detection platforms, such as the Luminex Lab Map system based on bead hybridization in combination with detection by flow cytometry (23–25), provides an alternative in these cases. Contrary to this, the SNPWave technology is also flexible at the amplification step, based on selective amplification. This additional flexibility allows the use of a standardized (highly multiplexed) ligation mixture for various applications, as illustrated in this study for whole-genome screening and fine mapping in Arabidopsis. The SNPWave technology can be adapted for hybridization-based detection by using hybridization tags instead of length stuffer sequences. Hence, SNPWave compares favorably with these technologies for applications requiring low or medium numbers of SNP data points per sample (up to several hundreds) and/or when flexible amplification of subsets of target sequences is important, but is less suited for genotyping thousands of SNPs per sample.

    Mass spectrometry-based detection

    The SNPWave technology can also be adapted for mass spectrometric detection methods such as matrix-assisted laser desorption/ionization time-of-flight . This requires two modifications in the design of the ligation probes: first, the stuffer sequences incorporated in the ligation probes must be selected such that each stuffer has a unique mass rather than a unique length. A collection of such mass stuffer sequences can be assembled by calculating the masses of all possible stuffer sequences of a given length based on their sequence, and selecting a subset with non-overlapping masses at a chosen mass resolution. Secondly, a cleavable moiety must be introduced in the amplification product to bring the detected fragments within the optimal mass range for MALDI-TOF detection of up to 10 000 Da. This at least includes, but is ideally limited to, the mass stuffer sequences. This ‘MassWave’ approach (30) addresses a major limitation of currently used methods for MALDI-TOF-based SNP detection employing primer extension-directed allele discrimination (27–29), namely that higher multiplexing levels are difficult to accomplish because primer extension must be preceded by PCR amplification of the individual target loci. This introduced the long-known problem of multiplex PCR and associated robustness issues. In contrast and as mentioned earlier, multiplexed ligation-based allele discrimination can be followed by amplification with a single (selective) primer pair, paving the way to high multiplexing levels. Combining multiplexed ligation-based PCR with selective amplification of SNPWave thus enables mass spectrometric detection of target (SNP) sequences in a fully ‘designed’ fashion with respect to both multiplexing levels and mass resolution, under uniform reaction conditions. We expect that this will reduce the cost per data point for this platform and will provide those who have a preference for this platform with an attractive alternative to SNPWave for high-volume screening (i.e. many samples, moderate number of SNPs).

    Conclusions

    We have presented SNPWave, a flexible SNP genotyping technology based on multiplexed ligation followed by amplification with a single generically applicable (selective) primer pair. SNPWave incorporates the known robustness of allele discrimination by OLA (12), multiplexing at every step after DNA isolation, low probe development costs and highly efficient detection on a widely used detection platform (31). Advantages of highly multiplexed ligation followed by selective amplification are savings on genomic DNA and on (labor) costs of ligation reactions, which are difficult to achieve otherwise in the case of length-based detection.We anticipate that the main applications of the SNPWave technology will be in the market segment defined by moderate numbers of SNPs (up to several hundreds) and medium to high number of samples. This includes both (human) diagnostic analyses and agricultural applications such as genetic mapping, genetic diversity analysis and marker-assisted breeding, in a wide variety of species including plants, mammals and microorganisms. With respect to target sequences, applications of the SNPWave technology are not limited to SNPs per se, but may also include detection of non-polymorphic sequences (introgression segments, transgenes, pathogens) and/or selected combinations of transcripts for diagnostic prediction of complex traits. At present, 138 240 SNPs can be scored within 24 h on a single MegaBACE 1000 with 96 capillaries, using 10-plex amplifications, four fluorescent dyes and 36 short runs with a flanking sizing standard.

    SUPPLEMENTARY MATERIAL

    ACKNOWLEDGEMENTS

    The authors thank Professor M. Koornneef and Dr L. Bentsink for helpful discussions and kindly providing Arabidopsis samples, Dr P. Feldmann (Devgen, Ghent-Zwijnaarde, Belgium) and the BioSeeds companies for kindly providing C.elegans and tomato DNA samples, respectively, Nathalie van Orsouw and Marc ten Holte for skilful MegaBACE analyses, Jerina Pot for graphical assistance, and Robbert-Jan de Lang for critical reading of the manuscript. The AFLP? and SNPWaveTM technologies are covered by patents and patent applications owned by Keygene NV. AFLP is a registered trademark of Keygene NV. An application for trademark registration for SNPWave has been filed by Keygene NV. MegaBACE is a trademark of Amersham BioSciences.

    REFERENCES

    Sachidanandam,R., Weismann,D., Schmidt,S.C., Kakol,J.M., Stein,L.D., Marth,G., Sherry,S., Mullikin,J.C., Mortimore,B.J., Willey,D.L. et al. (2001) A map of human genome sequence variation containing 1,42 million single nucleotide polymorphisms. Nature, 409, 928–933.

    Lindblad-Toh,K., Winchester,E., Daly,M.J., Wang,D.G., Hirschhorn,J.N., Laviolette,J.P., Ardlie,K., Reich,D.E., Robinson,E., Sklar,P. et al. (2000) Large-scale discovery of single-nucleotide polymorphisms in the mouse. Nature Genet., 24, 381–386.

    Cho,R., Mindrinos,M., Richards,D.R., Sapolsky,R.J. anderson,M., Drenkard,E., Dewdney,J., Reuber,T.L., Stammers,M., Federspiel,N. et al. (1999) Genome-wide mapping with bi-allelic markers in Arabidopsis thaliana. Nature Genet., 23, 203–207.

    Smigielski,E.M., Sirotkin,K., Ward,M. and Sherry,S.T. (2000) dbSNP; a database of single nucleotide polymorphisms. Nucleic Acids Res., 28, 352–355.

    Syv?nen,A.C. (2001) Accessing genetics variation: genotyping single nucleotide polymorphisms. Nature Rev. Genet., 2, 930–942.

    Kwok,P.Y. (2001) Methods for genotyping single nucleotide polymorphisms. Annu. Rev. Genomics Hum. Genet., 2, 235–258.

    Twyman,R.M. and Primrose,S.B. (2003) Techniques patent for SNP genotyping. Pharmacogenomics, 4, 67–79.

    Schouten,J.P., McElgunn,C.J., Waaijer,R., Zwijnenburg,D., Diepvens,F. and Pals,G. (2002) Relative quantification of 40 nucleic acid sequences by multiplex ligation-dependent probe amplification. Nucleic Acids Res., 30, e57.

    Oliphant,A., Barker,D.L., Stuelpnagel,J.R. and Chee,M.S. (2002) BeadArrayTM technology: enabling an accurate, cost-effective approach to high-throughput genotyping. Biotechniques, 32, S56–S61.

    Banér,J., Isaksson,A., Waldenstr?m,E., Jarvius,J., Landegren,U. and Nilsson,M. (2003) Parallel gene analysis with allele-specific padlock probes and tag microarrays. Nucleic Acids Res., 31, e103.

    Hardenbol,P., Banér,J., Maneesh,J., Nilsson,M., Namsaraev,E.A., Karlin-Neumann,G.A., Fakhrai-Rad,H., Ronaghi,M., Willis,T.D., Landegren,U. et al. (2003) Multiplexed genotyping with sequence-tagged molecular inversion probes. Nat. Biotechnol., 21, 673–678.

    Landegren,U., Kaiser,R., Sanders,J. and Hood L. (1988) A ligase-mediated gene detection technique. Science, 241, 1077–1080.

    Nilsson,M., Malmgren,H., Samiotaki,M., Kwiatkowski,M., Chowdhary,B.P. and Landegren,U. (1994) Padlock probes: circularizing oligonucleotides for localized DNA detection. Science, 265, 2085–2088.

    Nilsson,M., Krejci,K., Koch,J., Kwiatkowski,M., Gustavsson,P. and Landegren,U. (1997) Padlock probes reveal single-nucleotide differences, parent of origin and in situ distribution of centromeric sequences in human chromosomes 13 and 21. Nature Genet., 16, 252–255.

    Banér,J., Nilsson,M., Isaksson,A., Mendel-Hartvig,M., Antson,D.-O. and Landegren,U. (2001) More keys to padlock probes: mechanism for high-throughput nucleic acid detection. Curr. Opin. Biotechnol., 12, 11–15.

    Nilsson,M., Banér,J., Mendel-Hartvig,M., Dahl,F., Antson,D.-O., Gullberg,M. and Landegren,U. (2002) Making ends meet in genetic analysis using padlock probes. Hum. Mutat., 19, 410–415.

    Zabeau,M. and Vos,P. (1993) Selective restriction fragment amplification; a general method for DNA fingerprinting. EP 0534858-A1, B1; US patent 6045994.

    Vos,P., Hogers,R., Bleeker,M., Reijans,M., van de Lee,T., Hornes,M., Frijters,A., Pot,J., Peleman,J., Kuiper,M. et al. (1995) AFLP: a new technique for DNA fingerprinting. Nucleic Acids Res., 21, 4407–4414.

    Stuart,C.N., Jr and Via,L.E. (1993) A rapid CTAB DNA isolation technique useful for RAPD fingeprinting and other PCR applications. Biotechniques, 14, 748–750.

    Dempster,A.P., Laird,N.M. and Rubin,D.B. (1977) Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B, 39, 1–38.

    Stam,P. and van Ooijen,J.W. (1995) Joinmap? Version 2.0: Software for the Calculation of Genetic Linkage Maps. CPRO-DLO, Wageningen.

    Jansen,R.C., Geerlings,H., van Oeveren,A.J. and van Schaik,R.C. (2000) A comment on codominant scoring of AFLP markers. Genetics, 155, 1459–68.

    Chen,J., Ianonne,M.A., Li,M.-S., Taylor,J.D., Rivers,P., Nelsen,A.J., Slentz-Kesler,K.A., Roses,A. and Weiner,P. (2000) A microsphere-based assay for multiplexed single nucleotide polymorphism analysis using single base chain extension. Genome Res., 10, 549–557.

    Ianonne,M.A., Taylor,J.D., Chen,J., Li,M.-S., Rivers,P., Slentz-Kesler,K.A. and Wigler,M.P. (2000) Multiplexed single nucleotide polymorphism genotyping by olignucleotide ligation and flow cytometry. Cytometry, 39, 131–140.

    Ye,F., Li,M.-S., Taylor,J.D.,Nguyen,Q., Colton,H.M., Casey,W.M., Wagner,M., Weiner,M.P. and Chen,J. (2001) Fluorescent microsphere-based readout technology for multiplexed human single nucleotide polymorphism analysis and bacterial identification. Hum. Mutat., 17, 305–316.

    Karas,M. and Hillenkamp,F. (1988) Laser desorption ionization of proteins with molecular weights exceeding 10000 Daltons. Anal. Chem., 60, 2299–2301.

    Laken,S.J., Jackson,P.E., Kinzler,K.W., Vogelstein,B., Strickland,P.T., Groopman,J.D. and Friesen,M.D. (1998) Genotyping by mass spectrometric analysis of short DNA fragments. Nat. Biotechnol., 16, 1352–1356.

    Buetow,K.H., Edmonson,M., MacDonald,R., Clifford,R., Yip,P., Kelley,J., Little,D.P., Strausberg,R., Koester,H., Cantor,C.R. et al. (2001) High-throughput development and characterization of a genome-wide collection of gene-based single nucleotide polymorphism markers by chip-based matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Proc. Natl Acad. Sci. USA, 98, 581–584.

    Bray,S.M., Boerwinkle,E. and Doris,P.A. (2001) High-throughput multiplex SNP genotyping with MALDI-TOF mass spectrometry: practice, problems and promise. Hum. Mutat., 17, 296–307.

    van Eijk,M.J.T. and van Schaik,C. (2003) Discrimination and detection of target nucleotide sequences using mass spectrometry. PCT WO 03/060163.

    van Eijk,M.J.T. and Hogers,R.C.J. (2003) High throughput analysis and detection of multiple target sequences. PCT WO 03/052140, WO 03/052141, WO 03/052142.(Michiel J. T. van Eijk*, José L. N. Broe)