Giovanni Manzini
Scalable Compression of Massive Data Collections on HPC Systems
Compressing Suffix Trees by Path Decompositions
In this paper, we solve the long-standing problem of designing I/O-efficient compressed indexes. Our solution broadly consists of generalizing suffix sorting and revisiting suffix tree path compression. In classic suffix trees, path compre…
Prefix-free parsing for merging big BWTs
When building Burrows-Wheeler Transforms (BWTs) of truly huge datasets, prefix-free parsing (PFP) can use an unreasonable amount of memory. In this paper we show how, if a dataset can be broken down into small datasets that are not very sim…
Generalization of Repetitiveness Measures for Two-Dimensional Strings
The problem of detecting and measuring the repetitiveness of one-dimensional strings has been extensively studied in data compression and text indexing. Our understanding of these issues has been significantly improved by the introduction …
On the compressibility of large-scale source code datasets
Toward Greener Matrix Operations by Lossless Compressed Formats
Sparse matrix-vector multiplication (SpMV) is a fundamental operation in machine learning, scientific computing, and graph algorithms. In this paper, we investigate the space, time, and energy efficiency of SpMV using various compressed fo…
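Since this entry compares compressed formats for SpMV, a minimal sketch of the operation itself may help. It uses plain CSR (compressed sparse row), a common uncompressed baseline; the lossless compressed formats studied in the paper are not reproduced here, and the function name `spmv_csr` is hypothetical.

```python
def spmv_csr(indptr, indices, data, x):
    """Return y = A x, where A is stored in CSR form (indptr, indices, data)."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # The nonzeros of row i are data[indptr[i]:indptr[i + 1]],
        # and indices[k] is the column of data[k].
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i] = acc
    return y


# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 2, 0, 1]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(spmv_csr(indptr, indices, data, [1.0, 2.0, 3.0]))  # [7.0, 9.0, 14.0]
```

Each nonzero is read exactly once per product, so the cost of SpMV is dominated by how compactly, and how cheaply to decode, the `indices` and `data` arrays are stored.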
Faster run-length compressed suffix arrays
We first review how we can store a run-length compressed suffix array (RLCSA) for a text $T$ of length $n$ over an alphabet of size $σ$ whose Burrows-Wheeler Transform (BWT) consists of $r$ runs in $O \left( \rule{0ex}{2ex} r \log (n / r) …
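The parameter $r$ above is the number of maximal runs of equal characters in the BWT. A minimal sketch of how it can be computed for small inputs, assuming the text ends with a unique sentinel `$` that sorts before every other character; the quadratic sorted-rotations construction is for illustration only and is not how an RLCSA is built in practice.

```python
def bwt(text):
    """BWT by sorting all rotations (quadratic; for illustration only)."""
    assert text.endswith("$") and text.count("$") == 1, "need a unique trailing sentinel"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s):
    """Number of maximal runs of equal characters in s (the r of the text above)."""
    if not s:
        return 0
    return 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)


b = bwt("banana$")
print(b, count_runs(b))  # annb$aa 5
```

For highly repetitive texts $r$ is typically much smaller than $n$, which is what run-length compressed indexes exploit.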
Computing the LCP Array of a Labeled Graph
The LCP array is an important tool in stringology, allowing one to speed up pattern matching algorithms and enabling compact representations of the suffix tree. Recently, Conte et al. [DCC 2023] and Cotumaccio et al. [SPIRE 2023] extended the …
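For readers who know the LCP array only from strings, here is a sketch of the classic string case: the linear-time construction of Kasai et al. from the suffix array. The graph generalization discussed in the paper is not shown, and the suffix array is built naively by sorting, which is enough for small examples.

```python
def suffix_array(s):
    """Starting positions of the suffixes of s in lexicographic order (naive sort)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    """Kasai et al.: lcp[r] = length of the longest common prefix of the suffixes
    starting at sa[r - 1] and sa[r]; lcp[0] = 0."""
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):  # visit suffixes in text order
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]  # lexicographic predecessor of suffix i
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        # Dropping the first character loses at most one matched character.
        if h > 0:
            h -= 1
    return lcp


sa = suffix_array("banana")
print(sa, lcp_array("banana", sa))  # [5, 3, 1, 0, 4, 2] [0, 1, 3, 0, 0, 2]
```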
Taxonomic classification with maximal exact matches in KATKA kernels and minimizer digests
For taxonomic classification, we are asked to index the genomes in a phylogenetic tree such that later, given a DNA read, we can quickly choose a small subtree likely to contain the genome from which that read was drawn. Although popular c…
Taxonomic Classification with Maximal Exact Matches in KATKA Kernels and Minimizer Digests
For taxonomic classification, we are asked to index the genomes in a phylogenetic tree such that later, given a DNA read, we can quickly choose a small subtree likely to contain the genome from which that read was drawn. Although popular c…
The Landscape of Compressibility Measures for Two-Dimensional Data
In this paper we extend to two-dimensional data two recently introduced one-dimensional compressibility measures: the $γ$ measure defined in terms of the smallest string attractor, and the $δ$ measure defined in terms of the number of distinct s…
A new class of string transformations for compressed text indexing
The landscape of compressibility measures for two-dimensional data
In this paper we extend to two-dimensional data two recently introduced one-dimensional compressibility measures: the $γ$ measure defined in terms of the smallest string attractor, and the $δ$ measure defined in terms of the number of dist…
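As a one-dimensional baseline for the measures mentioned above, the $δ$ measure of a string $S$ is commonly defined as $δ(S) = \max_{k \ge 1} d_k(S)/k$, where $d_k(S)$ is the number of distinct length-$k$ substrings of $S$. The brute-force sketch below computes it exactly (with fractions) for short strings; the two-dimensional extension studied in the paper is not implemented.

```python
from fractions import Fraction


def delta(s):
    """delta(S) = max over k of d_k(S) / k, with d_k(S) the number of distinct
    length-k substrings of S (brute force; short strings only)."""
    best = Fraction(0)
    for k in range(1, len(s) + 1):
        d_k = len({s[i:i + k] for i in range(len(s) - k + 1)})
        best = max(best, Fraction(d_k, k))
    return best


for s in ["aaaaaaaa", "abababab", "abcdefgh"]:
    print(s, delta(s))
```

A periodic string such as `abababab` gets $δ = 2$, while `abcdefgh`, whose characters are all distinct, gets $δ = 8$.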
Computing matching statistics on Wheeler DFAs
Matching statistics were introduced to solve the approximate string matching problem, which is a recurrent subroutine in bioinformatics applications. In 2010, Ohlebusch et al. [SPIRE 2010] proposed a time- and space-efficient algorithm for …
Practical Random Access to SLP-Compressed Texts
Grammar-based compression is a popular and powerful approach to compressing repetitive texts, but until recently its relatively poor time-space trade-offs during real-life construction made it impractical for truly massive datasets such as …
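A straight-line program (SLP) is a grammar in which every nonterminal has exactly one rule and derives exactly one string; here each rule has two symbols on its right-hand side. The textbook way to read one character without decompressing is to store each nonterminal's expansion length and descend from the start symbol. The sketch below shows that descent on a hypothetical four-rule grammar; it takes time proportional to the grammar height and is not the practical method engineered in the paper.

```python
# Hypothetical SLP: each nonterminal expands to a pair of symbols;
# any symbol that is not a key of `rules` is a terminal character.
rules = {
    "X1": ("a", "b"),    # ab
    "X2": ("X1", "a"),   # aba
    "X3": ("X2", "X1"),  # abaab
    "X4": ("X3", "X2"),  # abaababa
}


def expansion_lengths(rules):
    """Length of the string derived by each nonterminal (memoized recursion)."""
    lengths = {}

    def length(sym):
        if sym not in rules:
            return 1  # a terminal derives a single character
        if sym not in lengths:
            left, right = rules[sym]
            lengths[sym] = length(left) + length(right)
        return lengths[sym]

    for sym in rules:
        length(sym)
    return lengths


def access(rules, lengths, start, i):
    """Return character i (0-based) of the string derived by `start`."""
    sym = start
    while sym in rules:
        left, right = rules[sym]
        left_len = lengths.get(left, 1)  # terminals have length 1
        if i < left_len:
            sym = left
        else:
            i -= left_len
            sym = right
    return sym


lengths = expansion_lengths(rules)
print("".join(access(rules, lengths, "X4", i) for i in range(lengths["X4"])))  # abaababa
```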
Teaching the Burrows-Wheeler Transform via the Positional Burrows-Wheeler Transform
The Burrows-Wheeler Transform (BWT) is often taught in undergraduate courses on algorithmic bioinformatics, because it underlies the FM-index and thus important tools such as Bowtie and BWA. Its admirers consider the BWT a thing of beauty …
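For context, Durbin's positional BWT (PBWT) processes a panel of equal-length haplotypes column by column, keeping the haplotypes sorted by their reversed prefixes; each step is a stable partition on the current column's symbol. The sketch below builds those orders and the PBWT columns for a toy binary panel. It illustrates the standard construction only and is not the teaching approach described in the paper.

```python
def pbwt(panel):
    """Return (orders, columns) for a panel of equal-length binary strings.

    orders[k] lists haplotype indices sorted by their first k symbols read
    right to left (ties broken by index); columns[k] is PBWT column k,
    i.e. symbol k of each haplotype taken in the order orders[k].
    """
    order = list(range(len(panel)))
    orders, columns = [order], []
    for k in range(len(panel[0])):
        columns.append("".join(panel[h][k] for h in order))
        # Stable partition: haplotypes with 0 at column k keep their relative
        # order and come before those with 1.
        order = [h for h in order if panel[h][k] == "0"] + [h for h in order if panel[h][k] == "1"]
        orders.append(order)
    return orders, columns


orders, columns = pbwt(["010", "110", "011", "000"])
print(orders[-1], columns)  # [3, 0, 1, 2] ['0100', '1101', '0010']
```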
Improving matrix-vector multiplication via lossless grammar-compressed matrices
As Machine Learning (ML) techniques now generate huge data collections, the problem of how to efficiently engineer their storage and operations is becoming of paramount importance. In this article we propose a new lossless compr…
A New Class of String Transformations for Compressed Text Indexing
Introduced about thirty years ago in the field of Data Compression, the Burrows-Wheeler Transform (BWT) is a string transformation that, besides being a booster of the performance of memoryless compressors, plays a fundamental role in the …
Improving Matrix-vector Multiplication via Lossless Grammar-Compressed Matrices
As Machine Learning (ML) techniques now generate huge data collections, the problem of how to efficiently engineer their storage and operations is becoming of paramount importance. In this article we propose a new lossless compr…
Space Efficient Merging of de Bruijn Graphs and Wheeler Graphs
The merging of succinct data structures is a well-established technique for the space efficient construction of large succinct indexes. In the first part of the paper we propose a new algorithm for merging succinct representations of de Br…
Compressing and Querying Integer Dictionaries Under Linearities and Repetitions
We revisit the fundamental problem of compressing an integer dictionary that supports efficient rank and select operations by exploiting simultaneously two kinds of regularities arising in real data: repetitiveness and approximate linearit…
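To make "approximate linearity" concrete, here is a toy single-segment sketch, assuming a sorted, non-empty list of distinct integers. Each element is stored as the value predicted by a straight line through the first and last elements plus a small integer correction; `select` adds the correction back, and `rank` (here: the number of stored values strictly smaller than `x`) binary-searches over `select`. The class name and the one-segment design are hypothetical simplifications; the paper's structures also exploit repetitiveness, which this sketch ignores.

```python
class LinearDictSketch:
    """Toy rank/select dictionary: one linear model plus per-element corrections."""

    def __init__(self, values):
        # Assumes `values` is a sorted, non-empty list of distinct integers.
        self.n = len(values)
        self.first = values[0]
        self.slope = (values[-1] - values[0]) / (self.n - 1) if self.n > 1 else 0.0
        # Near-linear data gives small corrections, which could be packed into few bits.
        self.corrections = [v - self._predict(i) for i, v in enumerate(values)]

    def _predict(self, i):
        return self.first + round(self.slope * i)

    def select(self, i):
        """The i-th smallest stored value (0-based)."""
        return self._predict(i) + self.corrections[i]

    def rank(self, x):
        """Number of stored values strictly smaller than x."""
        lo, hi = 0, self.n
        while lo < hi:
            mid = (lo + hi) // 2
            if self.select(mid) < x:
                lo = mid + 1
            else:
                hi = mid
        return lo


d = LinearDictSketch([3, 10, 21, 30, 41, 50])
print(d.corrections, d.select(2), d.rank(22))  # [0, -2, -1, -1, 0, 0] 21 3
```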
PHONI: Streamed Matching Statistics with Multi-Genome References
Computing the matching statistics of patterns with respect to a text is a fundamental task in bioinformatics, but a formidable one when the text is a highly compressed genomic database. Bannai et al. gave an efficient solution for this cas…
Efficiently Merging r-indexes
Large sequencing projects, such as GenomeTrakr and MetaSub, are updated frequently (sometimes daily, in the case of GenomeTrakr) with new data. Therefore, it is imperative that any data structure indexing such data supports efficient updat…
Repetition- and Linearity-Aware Rank/Select Dictionaries
We revisit the fundamental problem of compressing an integer dictionary that supports efficient rank and select operations by exploiting two kinds of regularities arising in real data: repetitiveness and approximate linearity. Our first co…
PFP Compressed Suffix Trees
Prefix-free parsing (PFP) was introduced by Boucher et al. (2019) as a preprocessing step to ease the computation of Burrows-Wheeler Transforms (BWTs) of genomic databases. Given a string $S$, it produces a dictionary $D$ and a p…
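A minimal sketch of the parsing step, under stated assumptions: the input is padded at both ends with `w` copies of a hypothetical padding character `$` (assumed absent from $S$); a length-`w` window is a trigger string when its hash is 0 modulo `p` or when it consists only of padding; and each phrase runs from one trigger string to the end of the next, so consecutive phrases overlap by `w` characters. The dictionary $D$ is the sorted set of distinct phrases and the parse $P$ lists their ranks. Practical PFP tools typically use a rolling (Karp-Rabin-style) hash and their own padding conventions; the from-scratch hash and the helper names here are hypothetical and chosen only for clarity.

```python
def window_hash(window, base=256, mod=(1 << 61) - 1):
    """Deterministic polynomial hash of a window, recomputed from scratch."""
    h = 0
    for ch in window:
        h = (h * base + ord(ch)) % mod
    return h


def prefix_free_parse(s, w, p, pad="$"):
    """Toy PFP returning (dictionary, parse). Assumes w >= 1 and pad not in s."""
    t = pad * w + s + pad * w
    # Trigger positions: windows hashing to 0 mod p, plus the all-padding windows at both ends.
    triggers = [i for i in range(len(t) - w + 1)
                if t[i:i + w] == pad * w or window_hash(t[i:i + w]) % p == 0]
    # Each phrase spans from one trigger to the end of the next one.
    phrases = [t[a:b + w] for a, b in zip(triggers, triggers[1:])]
    dictionary = sorted(set(phrases))
    rank = {phrase: r for r, phrase in enumerate(dictionary)}
    parse = [rank[phrase] for phrase in phrases]
    return dictionary, parse


def unparse(dictionary, parse, w):
    """Rebuild the padded string: consecutive phrases share w characters."""
    phrases = [dictionary[r] for r in parse]
    return phrases[0] + "".join(phrase[w:] for phrase in phrases[1:])


D, P = prefix_free_parse("GATTACAGATTACA", w=2, p=3)
print(D, P, unparse(D, P, 2))
```

The round trip through `unparse` recovers the padded string exactly, which is a quick sanity check that no characters are lost at the phrase overlaps.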
Compressing and indexing aligned readsets
Compressed full-text indexes are one of the main success stories of bioinformatics data structures but even they struggle to handle some DNA readsets. This may seem surprising since, at least when dealing with short reads from the same ind…
PFP Data Structures
Prefix-free parsing (PFP) was introduced by Boucher et al. (2019) as a preprocessing step to ease the computation of Burrows-Wheeler Transforms (BWTs) of genomic databases. Given a string $S$, it produces a dictionary $D$ and a parse $P$ o…