Sequencing of Hypertrophic and Dilated Cardiomyopathy Genes
Sequencing of Hypertrophic and Dilated Cardiomyopathy Genes
This study comprises an initial pilot study followed by additional experiments in which a total of 63 patients were included. All patients have a confirmed diagnosis of HCM or DCM according to international criteria and were identified at or referred to the Department of Clinical Genetics, Academic Medical Center (AMC), Amsterdam, for screening of cardiomyopathy related genes offered at the Department of DNA diagnostics, AMC, the Netherlands.
For an initial pilot we included five HCM patients with a confirmed pathogenic mutation in either the MYBPC3 or MYH7 gene. In an additional experiment we included nine probands who were diagnosed with HCM but had no pathogenic mutation in eight HCM genes (MYBPC3, MYH7, MYL2, MYL3, TNNI3, TNNT2, TPM1, and GLA) screened in our laboratory by Sanger sequencing. Furthermore, we also included 19 probands diagnosed with DCM. However, DCM patients were not routinely screened for all these genes, in contrast to HCM patients. Finally, we included 30 cardiomyopathy index patients who were registered for routine DNA diagnostics for all HCM genes. These patients were used to validate the entire procedure with the intention to implement the procedure for diagnostics. Informed consent was obtained from all the patients.
We designed four custom 385K oligonucleotide microarrays according the Nimblegen Rebal algorithm to select unique probes (Roche NimbleGen). In the first design we targeted all exons, including 100 bp of the flanking upstream and downstream intronic sequence, of 18 genes with known involvement in HCM and/or DCM. The design comprises 236 exons, targeting 92.5 kb (Table 1). In the second design we rebalanced the capture probes by the addition of replicate probes at regions with low coverage in order to obtain a more evenly distributed exon coverage. For regions with a coverage between 75% and 61% of the mean coverage we added one additional probe, between 60% and 51% we added two additional probes, between 50% and 41% we added three additional probes, between 40% and 31% we added four additional probes, between 30% and 21% we added five additional probes, and for ≤20% we added six additional probes. In the third design we fine-tuned the balancing and added five more genes, making the total number of genes 23, containing 292 exons and targeting 117.5 kb. These 23 genes were selected because they were already used in diagnostics supplemented with candidate genes that were selected based on published evidence, with a focus on mutation detection rate. In the fourth design we rebalanced the last five genes that were added (Table 1).
A schematic overview of the different versions of the arrays used, the hybridisation protocol used, the array design used for a patient group and its improved performance as represented in coverage statistics is provided in online supplementary figure S1.
DNA was isolated from peripheral blood leucocytes using an automated DNA isolator (Gentra). Then 0.5–5 μg DNA was fragmented according to the manufacturer's instructions (Covaris). DNA quality was assessed by running the samples on a DNA7500 chip on the Bioanalyzer (Agilent). Each sample was bar coded by ligation of GSMID-adaptors or RL-multiplex identifiers (MID) adapters using standard Roche protocols. Libraries were amplified by linker mediated PCR to obtain sufficient amounts of starting material for the sequence capture; 3 μg of amplified library was loaded onto the array according to manufacturer's instructions (http://www.nimblegen.com/products/lit/SeqCap_UserGuide_Titanium_Delivery_v1p1.pdf). For on-array multiplexed samples, 5–10 amplified libraries were mixed equimolarly and then 3 μg of the mixture was loaded onto the array. For multiplex experiments an equimolar pool of bar code specific blocker oligo's was added in the same concentration as used for non-multiplexed hybridisations.
Hybridisation, post-hybridisation washes and elution of the enriched sample was performed according to the manufacturer's instructions (http://www.nimblegen.com/products/lit/SeqCap_UserGuide_Titanium_Delivery_v1p1.pdf). To increase target enrichment, the enriched samples were hybridised a second time on the same array used the first time. This second hybridisation reduces background sequences (off-target sequences) that are still present after the first hybridisation. It is the reduction of background that results in higher on-target percentages. To verify successful hybridisation capture, we performed quantitative PCR (qPCR) on four control loci before and after array enrichment. The relative fold enrichment was calculated using the values of delta crossing point (CP) (ie, the difference between average CP of non-captured and average CP of captured samples) according E where E is the efficiency of the qPCR assay for a particular amplicon.
The enriched library was diluted, annealed to capture beads, and clonally amplified by emulsion PCR. After emulsion PCR, beads with clonal amplicons were enriched and deposited on a picotitre plate and sequenced on the GS-FLX Titanium.
The obtained sequence reads were mapped against the human reference genome (hg19) with the Roche GS Reference Mapper (V.2.6) using the default software settings. Output was restricted to the targeted regions as defined by the sequence capture design. Coverage statistics were extracted from the mapping output files using custom scripts. Variants were automatically detected during mapping and annotated with known gene (refSeq genes from http://genome.ucsc.edu/) and single nucleotide polymorphism (SNP) information (dbSNP130 from http://genome.ucsc.edu/). They were denoted as high quality differences (HCDiffs) when the variation was present in at least three non-duplicate reads that included at least one forward and one reverse read, or when it was seen in at least five reads with quality scores over 20. Variants that did not meet the above criteria were collected in the AllDiff files. Occasionally, known variants end up in the AllDiff table but not in the HCDiff table. A custom script was written for the comparison of the HCDiff and AllDiff variants and selection of variants with a variant percentage ≥20%.
The script also identifies, at the single base resolution, regions with a coverage lower than 16×. These regions are additionally analysed by Sanger sequencing. This threshold has been reported by Hoischen et al to be sufficient for diagnostic testing. We also calculated the minimal number of reads needed statistically. For the statistics we have used the following criteria:
All individual chances that a variant is missed at a given coverage is calculated in R using:
x—seq(300) and c—pbinom((x*0.2),x,0.5).
With these criteria we calculated a 99% sensitivity (comparable to a Phred quality score of 20) at 16×. The frequency of each individual coverage given a mean coverage±SD is calculated in R using y <- pnorm(x, mean coverage, SD) with x <- seq(300).
Then the chance that a variant is missed in an experiment with mean coverage±SD is calculated as: sum of (chance variant missed at given coverage)×(frequency at given coverage). Since we use a 16× threshold, the chance that a variant is missed is calculated for the 16× to 300× coverage interval. At a coverage of 100±35× this chance is 4.03E-5 (0.004%). An example of the calculations is given in online supplementary figure 2.
The following criteria were used to classify variations/mutations. We use a list of mutation specific features based on in silico analysis using the mutation interpretation software AlaMut (V.1.5). A score is given depending on the outcome of a prediction test for each feature (ie, Grantham distance). Then, depending on the total score and the availability of the variant in at least 300 ethnically matched control alleles (data obtained from the literature and/or the internet, eg, 1000 Genomes: http://browser.1000genomes.org/index.html; Exome Variant Server: http://evs.gs.washington.edu/EVS, or from own control alleles), we classified them as: not pathogenic; as a variant of unknown clinical significance; VUS1, unlikely to be pathogenic; VUS2, uncertain; or VUS3, likely to be pathogenic. Family information (co-segregation), phenotypic features and/or functional analysis are needed to classify a variant as (putatively) pathogenic. The protocol used for the classification of variants has recently been published by van Spaendonck-Zwarts et al All variants of interest identified in our HCM and DCM group were confirmed by Sanger sequencing.
Methods
Subjects and Clinical Evaluation
This study comprises an initial pilot study followed by additional experiments in which a total of 63 patients were included. All patients have a confirmed diagnosis of HCM or DCM according to international criteria and were identified at or referred to the Department of Clinical Genetics, Academic Medical Center (AMC), Amsterdam, for screening of cardiomyopathy related genes offered at the Department of DNA diagnostics, AMC, the Netherlands.
For an initial pilot we included five HCM patients with a confirmed pathogenic mutation in either the MYBPC3 or MYH7 gene. In an additional experiment we included nine probands who were diagnosed with HCM but had no pathogenic mutation in eight HCM genes (MYBPC3, MYH7, MYL2, MYL3, TNNI3, TNNT2, TPM1, and GLA) screened in our laboratory by Sanger sequencing. Furthermore, we also included 19 probands diagnosed with DCM. However, DCM patients were not routinely screened for all these genes, in contrast to HCM patients. Finally, we included 30 cardiomyopathy index patients who were registered for routine DNA diagnostics for all HCM genes. These patients were used to validate the entire procedure with the intention to implement the procedure for diagnostics. Informed consent was obtained from all the patients.
Array Design
We designed four custom 385K oligonucleotide microarrays according the Nimblegen Rebal algorithm to select unique probes (Roche NimbleGen). In the first design we targeted all exons, including 100 bp of the flanking upstream and downstream intronic sequence, of 18 genes with known involvement in HCM and/or DCM. The design comprises 236 exons, targeting 92.5 kb (Table 1). In the second design we rebalanced the capture probes by the addition of replicate probes at regions with low coverage in order to obtain a more evenly distributed exon coverage. For regions with a coverage between 75% and 61% of the mean coverage we added one additional probe, between 60% and 51% we added two additional probes, between 50% and 41% we added three additional probes, between 40% and 31% we added four additional probes, between 30% and 21% we added five additional probes, and for ≤20% we added six additional probes. In the third design we fine-tuned the balancing and added five more genes, making the total number of genes 23, containing 292 exons and targeting 117.5 kb. These 23 genes were selected because they were already used in diagnostics supplemented with candidate genes that were selected based on published evidence, with a focus on mutation detection rate. In the fourth design we rebalanced the last five genes that were added (Table 1).
A schematic overview of the different versions of the arrays used, the hybridisation protocol used, the array design used for a patient group and its improved performance as represented in coverage statistics is provided in online supplementary figure S1.
Sample Preparation
DNA was isolated from peripheral blood leucocytes using an automated DNA isolator (Gentra). Then 0.5–5 μg DNA was fragmented according to the manufacturer's instructions (Covaris). DNA quality was assessed by running the samples on a DNA7500 chip on the Bioanalyzer (Agilent). Each sample was bar coded by ligation of GSMID-adaptors or RL-multiplex identifiers (MID) adapters using standard Roche protocols. Libraries were amplified by linker mediated PCR to obtain sufficient amounts of starting material for the sequence capture; 3 μg of amplified library was loaded onto the array according to manufacturer's instructions (http://www.nimblegen.com/products/lit/SeqCap_UserGuide_Titanium_Delivery_v1p1.pdf). For on-array multiplexed samples, 5–10 amplified libraries were mixed equimolarly and then 3 μg of the mixture was loaded onto the array. For multiplex experiments an equimolar pool of bar code specific blocker oligo's was added in the same concentration as used for non-multiplexed hybridisations.
Target Enrichment and Sequencing
Hybridisation, post-hybridisation washes and elution of the enriched sample was performed according to the manufacturer's instructions (http://www.nimblegen.com/products/lit/SeqCap_UserGuide_Titanium_Delivery_v1p1.pdf). To increase target enrichment, the enriched samples were hybridised a second time on the same array used the first time. This second hybridisation reduces background sequences (off-target sequences) that are still present after the first hybridisation. It is the reduction of background that results in higher on-target percentages. To verify successful hybridisation capture, we performed quantitative PCR (qPCR) on four control loci before and after array enrichment. The relative fold enrichment was calculated using the values of delta crossing point (CP) (ie, the difference between average CP of non-captured and average CP of captured samples) according E where E is the efficiency of the qPCR assay for a particular amplicon.
The enriched library was diluted, annealed to capture beads, and clonally amplified by emulsion PCR. After emulsion PCR, beads with clonal amplicons were enriched and deposited on a picotitre plate and sequenced on the GS-FLX Titanium.
Mapping, Variant Detection and Classification
The obtained sequence reads were mapped against the human reference genome (hg19) with the Roche GS Reference Mapper (V.2.6) using the default software settings. Output was restricted to the targeted regions as defined by the sequence capture design. Coverage statistics were extracted from the mapping output files using custom scripts. Variants were automatically detected during mapping and annotated with known gene (refSeq genes from http://genome.ucsc.edu/) and single nucleotide polymorphism (SNP) information (dbSNP130 from http://genome.ucsc.edu/). They were denoted as high quality differences (HCDiffs) when the variation was present in at least three non-duplicate reads that included at least one forward and one reverse read, or when it was seen in at least five reads with quality scores over 20. Variants that did not meet the above criteria were collected in the AllDiff files. Occasionally, known variants end up in the AllDiff table but not in the HCDiff table. A custom script was written for the comparison of the HCDiff and AllDiff variants and selection of variants with a variant percentage ≥20%.
The script also identifies, at the single base resolution, regions with a coverage lower than 16×. These regions are additionally analysed by Sanger sequencing. This threshold has been reported by Hoischen et al to be sufficient for diagnostic testing. We also calculated the minimal number of reads needed statistically. For the statistics we have used the following criteria:
for a heterozygous variant the allele frequency is 50%
a variant is reported when the variant percentage is ≥20%.
All individual chances that a variant is missed at a given coverage is calculated in R using:
x—seq(300) and c—pbinom((x*0.2),x,0.5).
With these criteria we calculated a 99% sensitivity (comparable to a Phred quality score of 20) at 16×. The frequency of each individual coverage given a mean coverage±SD is calculated in R using y <- pnorm(x, mean coverage, SD) with x <- seq(300).
Then the chance that a variant is missed in an experiment with mean coverage±SD is calculated as: sum of (chance variant missed at given coverage)×(frequency at given coverage). Since we use a 16× threshold, the chance that a variant is missed is calculated for the 16× to 300× coverage interval. At a coverage of 100±35× this chance is 4.03E-5 (0.004%). An example of the calculations is given in online supplementary figure 2.
The following criteria were used to classify variations/mutations. We use a list of mutation specific features based on in silico analysis using the mutation interpretation software AlaMut (V.1.5). A score is given depending on the outcome of a prediction test for each feature (ie, Grantham distance). Then, depending on the total score and the availability of the variant in at least 300 ethnically matched control alleles (data obtained from the literature and/or the internet, eg, 1000 Genomes: http://browser.1000genomes.org/index.html; Exome Variant Server: http://evs.gs.washington.edu/EVS, or from own control alleles), we classified them as: not pathogenic; as a variant of unknown clinical significance; VUS1, unlikely to be pathogenic; VUS2, uncertain; or VUS3, likely to be pathogenic. Family information (co-segregation), phenotypic features and/or functional analysis are needed to classify a variant as (putatively) pathogenic. The protocol used for the classification of variants has recently been published by van Spaendonck-Zwarts et al All variants of interest identified in our HCM and DCM group were confirmed by Sanger sequencing.
Source...