Advantages of a next generation sequencing targeted approach for the molecular diagnosis of retinoblastoma

Background: Retinoblastoma (RB) is the most common malignant childhood tumor of the eye and results from inactivation of both alleles of the RB1 gene. Genetic diagnosis of RB currently requires classical chromosome investigations, Multiplex Ligation-dependent Probe Amplification (MLPA) analysis and Sanger sequencing, but these techniques have limitations. We report our experience with a cohort of RB patients studied using a combined approach of Next-Generation Sequencing (NGS) and RB1 custom array-Comparative Genomic Hybridization (aCGH).
Methods: A total of 65 patients with retinoblastoma were studied: 29 cases of bilateral RB and 36 cases of unilateral RB. All patients had previously been tested with conventional cytogenetics and MLPA. Fifty-three samples were then analysed using NGS, eleven cases were analysed by RB1 custom aCGH, and one last case was studied only by classic cytogenetics. Finally, in a laboratory sensitivity assay, we tested the capability of NGS to detect artificial mosaicism in previously characterized samples prepared at three mosaicism frequencies: 10, 5 and 1 %.
Results: Of the 29 cases of bilateral RB, 28 (96.5 %) were positive on genetic investigation: 22 point mutations and 6 genomic rearrangements (four intragenic and two macrodeletions). A novel germline intragenic duplication, from exon 18 to exon 23, was identified in a proband with bilateral RB. Of the 36 available cases of unilateral RB, 8 (22 %) were positive on genetic investigation: 3 patients showed point mutations and 5 carried large deletions. In the sensitivity assay, NGS accurately measured artificial mosaicism down to a level of 1 %.
Conclusions: NGS and RB1 custom aCGH proved to be an effective combined approach for optimizing the overall diagnostic procedure for RB. Custom aCGH accurately detects genomic rearrangements and allows their extent to be characterized. NGS is extremely accurate in detecting single nucleotide variants, relatively simple to perform, cost-saving and efficient, and showed high sensitivity and accuracy in identifying low levels of artificial mosaicism.


Background
Retinoblastoma (RB, OMIM: 180200) is the most common malignant childhood tumor of the eye, with an estimated incidence between 1 in 16,000 and 1 in 18,000 live births [1,2]. RB was the first disease for which a genetic etiology of cancer was described [3]; it is caused by mutations in the first tumor suppressor gene to be identified (RB1, GenBank accession # L11910). Mutations in both alleles of the RB1 gene are required for the development of this neoplasm [4] and, depending on the germline or somatic origin of the defect, a heritable or sporadic form can be distinguished. RB is unilateral in 60 % of cases, and only 15 % of these are heritable [5]; in contrast, 40 % of retinoblastomas are bilateral, with a risk of transmission to the offspring. Heritable retinoblastoma constitutes a cancer predisposition syndrome [6]. RB1 is located on chromosome 13 at band q14 and can be affected by a heterogeneous spectrum of genetic abnormalities, including chromosome translocations and deletions, genomic rearrangements ranging from whole-gene microdeletion to intragenic exon loss or duplication, and more than 900 different point mutations [7]. Mutational analysis is performed to search for the predisposing RB1 mutation in the peripheral blood of patients with RB, but the molecular diagnosis requires several technical approaches to cover the entire spectrum of oncogenic RB1 defects, and often results in numerous, expensive and time-consuming procedures. In particular, cytogenetic tools, such as classical chromosome investigations and Fluorescent In Situ Hybridization (FISH), together with Multiplex Ligation-dependent Probe Amplification (MLPA), account for the detection of about 16 % of RB1 abnormalities [8], while the remaining large proportion of point mutations must be investigated by sequencing analysis.
Since the 1970s, Sanger sequencing has been recognized as the gold standard for mutation analysis in molecular diagnostics; however, its low throughput, long turnaround time and overall cost [9] have called for new paradigms. Next Generation Sequencing (NGS) can sequence millions of DNA segments in parallel, promising low costs, increased workflow speed and enhanced sensitivity in mutation detection [9][10][11]. Similarly, conventional and molecular cytogenetic analyses have been replaced by modern high-throughput investigations, such as array Comparative Genomic Hybridization (aCGH), that can reveal and measure cryptic genomic imbalances. In addition, aCGH can be focused on specific DNA segments or genes, maximizing the resolution through a customized design. Based on these observations, we recruited a cohort of retinoblastoma patients whom we had previously investigated with conventional cytogenetics and MLPA. Patients diagnosed with RB but negative on this standard screening were tested with NGS to assess its ability to identify RB causative mutations. Patients positive on standard screening were further investigated with RB1 custom aCGH to characterize the genomic rearrangements at a higher resolution than the conventional techniques allow.

Patient recruitment
In this study we enrolled 65 patients affected by RB from the Department of Pediatric Hematology-Oncology and Stem Cell Transplantation of the Bambino Gesù Children's Hospital in Rome. The study was approved by the Ethical Committee scientific board of Bambino Gesù Children's Hospital and was conducted in accordance with the Helsinki Declaration. Blood samples were drawn from 64 patients after written informed consent had been obtained from the parents or guardians of the affected children. Genomic DNA was extracted from peripheral blood with Qiagen columns (QIAamp DNA minikit; Qiagen, Hilden, Germany) according to the manufacturer's instructions. The concentration and purity of the DNA samples were measured with an ND-1000 spectrophotometer (NanoDrop; Thermo Scientific, Waltham, MA, USA). DNA samples were used for either NGS or aCGH. All 65 patients had previously been tested with conventional cytogenetics and MLPA. Fifty-three patients who were negative on this first screening underwent molecular investigation. Eleven patients, in whom defects ranging from macroscopic deletions to intragenic rearrangements had been identified during the first screening, were further characterized by RB1 custom aCGH. Among these, one patient who was positive on MLPA analysis was negative on aCGH; this patient was then further investigated by conventional Sanger sequencing of the single exon involved. Finally, one more patient, positive on cytogenetic analysis, could not be further studied by aCGH because no DNA was available at the time of the test (Table 1).

Targeted re-sequencing
Targeted resequencing was performed with a custom design, TruSeq® Custom Amplicon (Illumina, San Diego, CA), on the MiSeq® sequencing platform (Illumina). TruSeq Custom Amplicon (TSCA) is a fully integrated end-to-end amplicon sequencing solution that includes online probe design and ordering through the Illumina website, the assay, sequencing, automated data analysis and offline software for reviewing results. Online probe design was performed by entering the target genomic regions into the DesignStudio (DS) software (Illumina) [12]. DesignStudio is a web-based sequencing assay design tool that takes the user from project initiation through design, review and ordering, and provides dynamic feedback to optimize target region coverage, reducing the time required to design custom projects. Once the design is completed, a list of amplicons (short regions of DNA covering the full target region) is displayed, and their quality is assessed on the basis of the predicted amplicon score provided by DS. The amplicon score is an estimate of the relative performance of a particular amplicon compared with all others in the pool. DesignStudio returns only candidate amplicons that are predicted to work well in the multiplex TSCA assay. The TSCA kit produces the required targeted amplicons with the adapters and indices needed for sequencing on the MiSeq® system without any additional processing. Library preparation and sequencing runs were performed according to the manufacturer's procedure.
Two different TSCA panel designs were generated to investigate the same regions of interest of the RB1 gene: promoter, all coding regions, exon-intron boundaries, 5′UTR and 3′UTR. A first panel of 43 amplicons, each of 250 bp, was designed, with a total length of 5045 bp (Panel A). The total coverage obtained by DS across the entire region of interest was 97 %, with amplicon scores in the range of 60-98 %. Amplicons with a score lower than 60 % were excluded from the TSCA panel (3 % of the entire region of interest). A second panel was designed with amplicons of 425 bp in length, for a total of 36 amplicons (Panel B). In this case, the predicted coverage of the full region of interest was 100 %, with amplicon scores in the range of 60-98 %. Of the 53 patients studied with NGS, 48 were analyzed using Panel A and 5 using Panel B.

Mosaicism detection rate assessment
To test the detection rate for mosaic mutations on the MiSeq, three different types of previously identified mutations from RB patients (a substitution, an insertion and a double deletion) were diluted to different concentrations. DNA from normal individuals was mixed with the mutated DNA to obtain final dilutions of 10, 5 and 1 %. For this test all libraries were prepared using TSCA Panel B. To identify the most appropriate protocol in terms of the coverage required to discriminate a given mosaicism frequency, these samples were sequenced at two different coverage levels: low coverage (600x) and high coverage (9000x).
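As a rough guide to what the two coverage levels imply, the following minimal Python sketch estimates how many variant-supporting reads would be expected at each dilution. It is not part of the study's pipeline. It assumes that each read samples the variant independently (a binomial model) and that the stated dilution equals the expected fraction of variant reads; if the diluted DNA came from heterozygous carriers, the variant read fraction could be about half the dilution. The function name is hypothetical.

```python
import math

# Mosaicism fractions and nominal coverage levels taken from the text.
MOSAIC_FRACTIONS = (0.10, 0.05, 0.01)
COVERAGE_LEVELS = (600, 9000)


def expected_variant_reads(coverage, fraction):
    """Mean and standard deviation of the number of variant-supporting reads,
    assuming each read carries the variant independently with probability
    `fraction` (binomial sampling; an assumption, not the authors' model)."""
    mean = coverage * fraction
    sd = math.sqrt(coverage * fraction * (1.0 - fraction))
    return mean, sd


if __name__ == "__main__":
    for coverage in COVERAGE_LEVELS:
        for fraction in MOSAIC_FRACTIONS:
            mean, sd = expected_variant_reads(coverage, fraction)
            print(f"{coverage:>5}x  {fraction:>4.0%}  "
                  f"expected variant reads: {mean:7.1f} +/- {sd:.1f}")
```

Under these assumptions, a 1 % variant is expected in only a handful of reads at 600x and in roughly fifteen times as many at 9000x, which is why the relative sampling noise shrinks at the higher coverage.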

Data analysis
The MiSeq® system provides fully integrated on-instrument data-analysis software. The MiSeq Reporter software performs secondary analysis on the base calls and quality scores generated by Real Time Analysis (RTA) during the sequencing run. The type of analysis performed depends on the analysis workflow selected. The TruSeq Amplicon workflow evaluates short regions of amplified DNA, or amplicons, for variants: it demultiplexes indexed reads, generates FASTQ files, aligns reads to a reference, identifies variants, and writes output files to the Alignment folder. SNPs and short indels are identified using the Genome Analysis Toolkit (GATK). GATK calls raw variants for each sample, analyzes them against known variants, and then calculates a false discovery rate for each variant. Each variant was evaluated for coverage and Qscore, and visualized with the Amplicon Viewer (AV) and Integrative Genomics Viewer (IGV) software [13,14]. The Qscore is a prediction of the probability of an erroneous base call: a value of Q30 corresponds to a probability of 1 in 1000 that a base is called incorrectly, i.e., a base call accuracy of 99.9 %. All detected variants were filtered on their Qscore: only variants with Qscore > 30 were considered in this study. Coverage for a given amplicon is the average number of sequencing reads representing a given nucleotide in that amplicon. All mutations identified by MiSeq Reporter were validated by Sanger sequencing using standard protocols.
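The relationship between a Phred-scaled Qscore and the error probability is Q = -10 log10(p). The short Python sketch below makes this conversion explicit and mirrors the Qscore > 30 filter described above. The function names are hypothetical and the code is an illustration only; it is not part of MiSeq Reporter or GATK.

```python
import math


def phred_to_error_probability(q):
    """Probability that a call with Phred-scaled quality q is wrong."""
    return 10.0 ** (-q / 10.0)


def error_probability_to_phred(p):
    """Phred-scaled quality for an error probability p (0 < p <= 1)."""
    return -10.0 * math.log10(p)


def passes_quality_filter(qscore, threshold=30.0):
    """Mirror the filter described in the text: keep variants with Qscore > 30."""
    return qscore > threshold


if __name__ == "__main__":
    p = phred_to_error_probability(30)
    print(f"Q30: error probability {p:.4f}, accuracy {100 * (1 - p):.1f} %")
```

Because the scale is logarithmic, every increase of 10 in Qscore corresponds to a tenfold decrease in the error probability.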

RB1 custom array CGH
Array-CGH was carried out using a 60-mer oligonucleotide-based microarray platform that allows molecular profiling of genomic aberrations with an overall median probe spatial resolution of 41 kb (60 K) (Agilent Technologies Array-CGH Kits, Santa Clara, CA) with an increased resolution of 1000 times in the customized region (88 bp median overall probe spacing) containing RB1. The design of the custom array slide was made using the Agilent website dedicated to this purpose [15]. In order to customize RB1, i.e., to get the maximum probe coverage of all the exonic and intronic

Results
Of the 65 patients, 64 were investigated with either NGS or aCGH. Fifty-three patients were analyzed with NGS: 22 had been diagnosed with bilateral RB (BRB) and 31 with the unilateral form (URB). The remaining 11 patients were studied with custom aCGH: 6 had been diagnosed with BRB and 5 with URB.
One last BRB patient, for whom no DNA was available for further investigation by aCGH, was analyzed by classic cytogenetics and showed a large deletion of more than 10 Mb.

NGS
Fifty-three patients were analyzed with NGS in two different sequencing runs. The sequencing data generated were evaluated on the basis of Qscore and coverage. In the mosaicism experiments, variant frequency was also evaluated. As predicted by the DS coverage estimate, Panel A covered 97 % of the full target region for all 48 patients studied in this first sequencing run. Exon 2 was only partially sequenced, while exons 14 and 20 were not sequenced at all. To achieve full coverage of the target region, these exons had to be investigated by conventional Sanger sequencing. In the second sequencing run, where Panel B was used, the full target region (coding regions, promoter and splicing junctions) was completely sequenced, as predicted by DS (100 %). In this second case, Sanger sequencing was carried out only to confirm the mutations detected. The mean coverage achieved for each sample was 1196 for Panel A and 1309 for Panel B. All detected variants showed a mean coverage of 592 and a mean Qscore of 39 (99.87 % accuracy). An example of the performance of Panel B is reported in Table 2. A mutation was found by NGS in all but one of the 22 BRB patients. The patient in whom no mutation was detected was further analyzed with conventional Sanger sequencing, which confirmed the absence of any mutation. Of the 21 pathogenic mutations identified, one was associated with a rare case of trilateral retinoblastoma. In the group of 31 URB patients, variants were detected in 3 patients. The features and distribution of all the mutations found are summarized in Table 3.

CGH array
Eleven patients showing genomic abnormalities were characterized in terms of the length and position of the rearrangement by RB1 custom aCGH analysis (Fig. 1). All five patients with URB showed only large deletions, whereas in five of the six BRB patients we found three small intragenic deletions, one extended intragenic duplication (in a patient who unexpectedly presented syndromic features), and one large deletion.
The sample found negative by aCGH was further analysed by conventional Sanger sequencing, focusing on the same exon that MLPA had indicated as deleted. Sanger sequencing revealed the presence of a point mutation. Genomic rearrangements and their characteristics are reported in Table 4. Overall, the number of RB patients with point mutations or genomic rearrangements identified by either NGS or aCGH was 28 out of a total of 29 BRB patients (96.5 %) and 8 out of 36 URB patients (22 %).
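The overall detection rates follow directly from the counts given in the Results. The following Python sketch tallies them per diagnostic route as a consistency check; the grouping into routes and the names used are our own illustrative choices, while the counts themselves are those reported in the text.

```python
from collections import defaultdict

# (group, route, patients analysed, patients with a causative RB1 defect)
ROUTES = [
    ("BRB", "NGS", 22, 21),
    # 5 rearrangements found by aCGH plus 1 point mutation found by Sanger
    # sequencing in the patient who was negative on aCGH.
    ("BRB", "aCGH (+ Sanger)", 6, 6),
    ("BRB", "cytogenetics only", 1, 1),
    ("URB", "NGS", 31, 3),
    ("URB", "aCGH", 5, 5),
]


def detection_rates(routes):
    """Return {group: (positive, analysed, percent positive)}."""
    totals = defaultdict(lambda: [0, 0])
    for group, _route, analysed, positive in routes:
        totals[group][0] += positive
        totals[group][1] += analysed
    return {g: (pos, n, 100.0 * pos / n) for g, (pos, n) in totals.items()}


if __name__ == "__main__":
    for group, (pos, n, pct) in detection_rates(ROUTES).items():
        print(f"{group}: {pos}/{n} positive ({pct:.1f} %)")
```

The tally reproduces the cohort size of 65 patients and the reported proportions of 28/29 for BRB and 8/36 for URB.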

Mosaicism detection rate assessment
Dedicated experiments were carried out to determine the lowest mosaicism rate at which the NGS method can detect targeted mutations. Results are summarized in Table 5. All variants were correctly identified at each mosaicism frequency in both sequencing runs (600x and 9000x). Only small differences from the expected frequency were observed, and these are probably related to the variability associated with handling, pipetting and preparation of the dilutions. For the 1 % mosaicism frequency, the frequency of false positive calls was evaluated in terms of erroneously called bases at the target site. For all three types of variants studied, the frequency of false positive events was always between 0 and 0.02 % in both sequencing runs. In particular, for the high coverage sequencing run, false positive events never exceeded 0.02 %.
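To illustrate why a background of at most 0.02 % leaves a 1 % variant clearly distinguishable, the Python sketch below computes the probability that sequencing errors alone would produce at least as many variant reads as a 1 % mosaic is expected to yield. This is a back-of-the-envelope model and not the analysis performed in the study. It assumes independent errors across reads, treats 0.02 % as a per-read error rate at the target site, and uses hypothetical function names.

```python
import math

BACKGROUND_ERROR = 0.0002  # 0.02 %, upper bound on false positive calls in the text


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def noise_probability(coverage, mosaic_fraction, error_rate=BACKGROUND_ERROR):
    """Chance that background errors alone yield at least the number of
    variant reads expected from a true mosaic at `mosaic_fraction`."""
    expected_reads = max(1, round(coverage * mosaic_fraction))
    return binomial_tail(coverage, expected_reads, error_rate)


if __name__ == "__main__":
    for coverage in (600, 9000):
        prob = noise_probability(coverage, 0.01)
        print(f"{coverage}x: P(noise mimics a 1 % variant) = {prob:.3g}")
```

Under these assumptions the probability is already very small at 600x and becomes negligible at 9000x, consistent with the observation that all variants were called at both coverage levels.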

Availability of supporting data
The microarray and sequencing raw data are available in the ArrayExpress database (www.ebi.ac.uk/arrayexpress) under accession numbers E-MTAB-3492 and E-MTAB-3515, respectively.

Discussion
The molecular diagnosis of RB is a complex, multi-step process that remains challenging, and many resources and skills are needed to obtain satisfactory results. High-throughput technologies offer new opportunities in terms of the number of genes that can be analyzed, the number of samples examined and the quality of the results. NGS is an innovative technology that can sequence millions of DNA segments in parallel at high resolution. It is widely used in many fields of biomedical research, but its diagnostic applications for genetic diseases are still being developed. We report our experience with a cohort of RB patients studied using an NGS approach on the Illumina MiSeq platform. The individual steps required different amounts of time. The design of the RB1 target regions, carried out using DS, took a few hours. The preparation of the genomic library using the Illumina TSCA kit was completed in two working days. One or two days were needed to run the samples on the MiSeq (48 samples were run together in a first sequencing run using Panel A, and the remaining 5 were run in a second experiment using Panel B). A few more days were required to interpret the results for the 53 RB patients using the MiSeq Reporter, AV and IGV 2.3 software. Furthermore, all mutations identified by MiSeq Reporter were validated by Sanger sequencing using standard protocols. Of the two panels designed, Panel B achieved full coverage of the target region, so that standard Sanger sequencing served only to confirm the detected variants. We also calculated that the cost of NGS analysis of the entire RB1 gene, assuming comparable instrument costs, reagent expenses and operator working time, would be seven times lower than the cost of a protocol based entirely on Sanger sequencing, allowing a marked decrease in costs and a large increase in the number of samples processed in each experiment [9,17,18].
NGS identified all variants found in patients with BRB, with the exception of one sample in which no variant was identified by either Sanger sequencing or NGS. In this case we can speculate that the variant may be located outside the region under investigation. Indeed, literature data show that 5 % of cases with bilateral involvement may have translocations, deep intronic splice site mutations, or low-level mosaic mutations, which may or may not be germline [8]. Twenty-four mutations were identified in the patients with RB: twelve nonsense, five frameshift and seven splice site mutations. As expected, eleven of the twenty-four mutations were novel and had never been reported before. Among these, a new frameshift mutation in exon 2 was identified in a rare case of trilateral RB, in contrast to current data reporting macroscopic deletions as the most frequent defects in this unusual disease [19][20][21][22]. The nonsense mutation p.Arg787X, a known sequence variation, was found in the URB group. The carrier was a girl who presented at the age of 17 months with RB of the left eye and loco-regional metastasis also involving lymph nodes and bone marrow. She underwent enucleation and was treated with conventional and high-dose chemotherapy, followed by autologous bone marrow transplantation and radiotherapy. To date, she is alive and in good clinical condition. p.Arg787X is a recurrent mutation commonly found in BRB as a germline sequence variation, whereas in URB it is more frequent as a somatic mutation. Only four cases of URB carrying this germline mutation have been reported [23], including a patient with metastatic presentation [24]. These findings suggest that the phenotypic expression of p.Arg787X may reflect the variable penetrance of this defect, leading to different clinical pictures of the disease.
Among the genomic abnormalities identified with the RB1 custom aCGH method, four intragenic rearrangements and six large deletions involving genes adjacent to RB1 were found. Interestingly, the patients in the first group had BRB, while the patients in the second group had mainly URB. These data support the hypothesis that deletion of genes essential for cell survival, adjacent to RB1, may cause less invasive tumors and therefore result in a higher frequency of unilateral disease [25,26].
Mutational mosaicism is a challenge for molecular diagnostics and is also important in the genetic counseling setting. Low levels of mutational mosaicism have been identified in probands with bilateral disease and in individuals with unilateral disease who have affected children inheriting the mutation [8,27]. Conventional investigations cannot routinely detect low proportions of mutated cells: currently, Sanger sequencing can reveal mosaicism only at rates above 20 %. Targeted mutation analysis is useful for studying recurrent mosaic mutations in blood and can detect DNA variations below the limit of standard Sanger sequence analysis. This type of analysis, based on Allele Specific PCR (AS-PCR), however, investigates only a limited number of recurrent point mutations [26]. A more recent study demonstrated that, using a deep semiconductor sequencing approach (Ion Torrent, Life Technology), targeted mutational mosaics can be detected at frequencies down to 5 % [28]. In our study, the capability of NGS to detect low mosaicism frequencies was tested. Because no patients with RB1 mosaicism were available, three previously characterized samples, carrying a single-base substitution, a single-base insertion and a complex rearrangement involving a five-base and a one-base double deletion, respectively, were diluted with normal DNA to different concentrations (10, 5 and 1 %) and tested by NGS on the MiSeq platform.
As reported, all three mutations were correctly detected at each frequency and at both coverage levels, independently of the variant type. In studies aimed at identifying low mosaicism frequencies, the major difficulty lies in accurately discriminating between a somatic variant and a false positive call. For this reason, for all three mutations studied, we evaluated the frequency of false positive calls, measured as the percentage of erroneously called bases at the target site. For all three types of variants, the frequency of false positive events was always between 0 and 0.02 % in both sequencing runs. In particular, for the high coverage sequencing run, false positive events never exceeded 0.02 %, far below the 1 % mosaicism variant frequency detected. Such a low background, together with good coverage of the region of interest, should allow low mosaicism frequencies to be detected accurately in biological samples, providing a reliable and sensitive screening method. Validation experiments on mosaic biological samples are currently in progress.

Conclusions
NGS and RB1 custom aCGH proved to be an effective combination for optimizing the overall diagnostic procedure for RB. The major advantages of NGS are its high throughput and the high accuracy of the data generated. Results whose quality and quantity would require months of traditional work are obtained in a single experiment, which contributes to a substantial reduction in overall cost.
NGS also allowed the identification of artificial mosaicism at frequencies down to 1 %, providing consistent data, high accuracy and an extremely low frequency of false positive events (0.02 %). The possibility of analyzing hundreds of samples per experiment and of sequencing different genes simultaneously makes NGS a powerful and innovative tool for a modern approach to the study of rare diseases.