Structural variant calling using third-generation sequencing data
2021 · Open Access · DOI: https://doi.org/10.17169/refubium-28995 · OA: W3123916363
Structural variants (SVs), commonly defined as genomic differences larger than 50 bp, are an important research target because of their size and their substantial impact on human phenotypes and disease. Their unique properties and the weaknesses of traditional short-read sequencing technologies, however, complicate their detection and comprehensive characterization. Third-generation sequencing technologies, such as PacBio SMRT sequencing and ONT Nanopore sequencing, can help resolve some of these problems by generating considerably longer reads. Despite higher error rates and sequencing costs, they offer many advantages for the detection of SVs and the complete reconstruction of personal genome sequences. Yet available software tools for detecting SVs from long reads and genome assemblies do not yet fully exploit these advantages. Here we present two new computational methods, SVIM and SVIM-asm, for the detection and genotype estimation of SVs using third-generation sequencing data. The methods can be applied to long, error-prone reads or to genome assemblies and distinguish six canonical classes of structural variation. We apply both tools to simulated and real sequencing datasets and demonstrate that they outperform existing methods in the detection of genotyped SVs. In the context of a larger research project, we apply SVIM to detect both canonical SVs and long-range novel adjacencies in a set of highly rearranged genomes. After a stringent filtering process, the final callset of long-range novel adjacencies is validated with orthogonal Hi-C data. We demonstrate the completeness and precision of this callset, establishing its suitability for downstream analyses such as chromosome reconstruction.
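To make the size criterion concrete, here is a minimal sketch that applies the "larger than 50 bp" definition to a sequence-resolved indel given as a VCF-style REF/ALT pair. The function name `classify_indel` and the strict `> 50` cutoff are illustrative assumptions, not taken from SVIM or SVIM-asm. Because it looks only at length changes, it cannot recognize balanced events such as inversions or translocations.

```python
from typing import Optional

# Assumed threshold following the common definition: SVs are differences larger than 50 bp.
SV_MIN_SIZE = 50


def classify_indel(ref: str, alt: str, min_size: int = SV_MIN_SIZE) -> Optional[str]:
    """Hypothetical helper: label a sequence-resolved indel as an SV-sized
    insertion or deletion based only on the REF/ALT length difference.

    Balanced rearrangements (inversions, translocations) leave the length
    unchanged and therefore cannot be detected by this check.
    """
    diff = len(alt) - len(ref)
    if diff > min_size:
        return "insertion"  # ALT is longer by more than min_size bases
    if -diff > min_size:
        return "deletion"  # REF is longer by more than min_size bases
    return None  # below the SV size threshold (or a length-neutral change)


if __name__ == "__main__":
    print(classify_indel("A", "A" + "T" * 60))   # insertion
    print(classify_indel("A" + "G" * 60, "A"))   # deletion
    print(classify_indel("A", "AT"))             # None
```

Real SV callers also handle breakend and inversion signatures from split and discordant alignments; the sketch only illustrates where the 50 bp boundary separates SVs from small indels.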