Grigorios Loukides
YOU?
Author Swipe
Fast Assessment of Eulerian Trails in Graphs with Applications Open
Enumerating or counting combinatorial objects in graphs is a fundamental data mining task. We consider the problem of assessing the number of Eulerian trails in directed graphs, which is formalized as follows: Given a directed graph \(G=(V…
Text indexing for long patterns using locally consistent anchors Open
In many real-world database systems, a large fraction of the data is represented by strings: sequences of letters over some alphabet. This is because strings can easily encode data arising from different sources. It is often crucial to rep…
Resilient Pattern Mining Open
Frequent pattern mining is a flagship problem in data mining. In its most basic form, it asks for the set of substrings of a given string $S$ of length $n$ that occur at least $τ$ times in $S$, for some integer $τ\in[1,n]$. We introduce a …
Indexing Strings with Utilities Open
Applications in domains ranging from bioinformatics to advertising feature strings (sequences of letters over some alphabet) that come with numerical scores (utilities). The utilities quantify the importance, interest, profit, or risk of t…
Indexing Strings with Utilities Open
Applications in domains ranging from bioinformatics to advertising feature strings that come with numerical scores (utilities). The utilities quantify the importance, interest, profit, or risk of the letters occurring at every position of …
View article: U-index: A Universal Indexing Framework for Matching Long Patterns
U-index: A Universal Indexing Framework for Matching Long Patterns Open
Text indexing is a fundamental and well-studied problem. Classic solutions either replace the original text with a compressed representation, e.g., the FM-index and its variants, or keep it uncompressed but attach some redundancy - an inde…
Heavy Nodes in a Small Neighborhood: Exact and Peeling Algorithms With Applications Open
We introduce a weighted and unconstrained variant of the well-known minimum κ union problem: Given a bipartite graph G(U,V,E) with weights for all nodes in V, find a set S⊆ V such that the ratio between the total weight of the nodes in S a…
Scalable Order-Preserving Pattern Mining Open
Time series are ubiquitous in domains ranging from medicine to marketing and finance. Frequent Pattern Mining (FPM) from a time series has thus received much attention. This general problem has been studied under different matching relatio…
Scalable Order-Preserving Pattern Mining Open
Time series are ubiquitous in domains ranging from medicine to marketing and finance. Frequent Pattern Mining (FPM) from a time series has thus received much attention. Recently, it has been studied under the order-preserving (OP) matching…
Text Indexing for Long Patterns using Locally Consistent Anchors Open
In many real-world database systems, a large fraction of the data is represented by strings: sequences of letters over some alphabet. This is because strings can easily encode data arising from different sources. It is often crucial to rep…
Space-Efficient Indexes for Uncertain Strings Open
Strings in the real world are often encoded with some level of uncertainty, for example, due to: unreliable data measurements; flexible sequence modeling; or noise introduced for privacy protection. In the character-level uncertainty model…
Minimizing the Minimizers via Alphabet Reordering Open
Minimizers sampling is one of the most widely-used mechanisms for sampling strings [Roberts et al., Bioinformatics 2004]. Let $S=S[1]\ldots S[n]$ be a string over a totally ordered alphabet $Σ$. Further let $w\geq 2$ and $k\geq 1$ be two i…
Space-Efficient Indexes for Uncertain Strings Open
Strings in the real world are often encoded with some level of uncertainty. In the character-level uncertainty model, an uncertain string $X$ of length $n$ on an alphabet $Σ$ is a sequence of $n$ probability distributions over $Σ$. Given a…
Pattern Masking for Dictionary Matching: Theory and Practice Open
Data masking is a common technique for sanitizing sensitive data maintained in database systems which is becoming increasingly important in various application areas, such as in record linkage of personal data. This work formalizes the Pat…
Ego-Network Segmentation via (Weighted) Jaccard Median Open
An ego-network is a graph representing the interactions of a node (ego) with its neighbors and the interactions among those neighbors. A sequence of ego-networks having the same ego can thus model the evolution of these interactions over t…
On Breaking Truss-based and Core-based Communities Open
We introduce the general problem of identifying a smallest edge subset of a given graph whose deletion makes the graph community-free. We consider this problem under two community notions that have attracted significant attention: k -truss…
Connecting de Bruijn Graphs Open
We study the problem of making a de Bruijn graph (dBG), constructed from a collection of strings, weakly connected while minimizing the total cost of edge additions. The input graph is a dBG that can be made weakly connected by adding edge…
Sparse Suffix and LCP Array: Simple, Direct, Small, and Fast Open
Sparse suffix sorting is the problem of sorting $b=o(n)$ suffixes of a string of length $n$. Efficient sparse suffix sorting algorithms have existed for more than a decade. Despite the multitude of works and their justified claims for appl…
Text Indexing for Long Patterns: Anchors are All you Need Open
In many real-world database systems, a large fraction of the data is represented by strings: sequences of letters over some alphabet. This is because strings can easily encode data arising from different sources. It is often crucial to rep…
Bidirectional String Anchors for Improved Text Indexing and Top-$K$ Similarity Search Open
The minimizers sampling mechanism is a popular mechanism for string sampling. However, minimizers sampling mechanisms lack good guarantees on the expected size of their samples for different combinations of their input parameters. Furtherm…
Suffix-Prefix Queries on a Dictionary Open
In the all-pairs suffix-prefix (APSP) problem, we are given a dictionary R of k strings, S_1,…,S_k, of total length n, and we are asked to find the length SPL_{i,j} of the longest string that is both a suffix of S_i and a prefix of S_j, fo…
Jaccard Median for Ego-network Segmentation Open
An ego-network is a graph representing the interactions of a node (ego) with its neighbors and the interactions among those neighbors. A sequence of ego-networks having the same ego can thus model the evolution of these interactions over t…
Differentially Private Top-k Selection via Canonical Lipschitz Mechanism Open
Selecting the top-$k$ highest scoring items under differential privacy (DP) is a fundamental task with many applications. This work presents three new results. First, the exponential mechanism, permute-and-flip and report-noisy-max, as wel…
Hide and Mine in Strings: Hardness, Algorithms, and Experiments Open
Data sanitization and frequent pattern mining are two well-studied topics in data mining. Our work initiates a study on the fundamental relation between data sanitization and frequent pattern mining in the context of sequential (string) da…
On Strings Having the Same Length- k Substrings Open
Let Substr_k(X) denote the set of length-k substrings of a given string X for a given integer k > 0. We study the following basic string problem, called z-Shortest _k-Equivalent Strings: Given a set _k of n length-k strings and an integer …