Tandy Warnow
Using stochastic block models for community detection
A recent study reported by Park et al. (Improved community detection using stochastic block models, Springer, Heidelberg, 2025) in Complex Networks and their Applications 2024 showed that clusterings from three Stochastic Block Models (SBM…
FastEnsemble: Scalable ensemble clustering on large networks
Many community detection algorithms are inherently stochastic, leading to variations in their output depending on input parameters and random seeds. This variability makes the results of a single run of these algorithms less reliable. More…
The Earth BioGenome Project Phase II: illuminating the eukaryotic tree of life
The Earth BioGenome Project (EBP) aims to “sequence life for the future of life” by generating high-quality reference genome sequences for all recognized eukaryotic species, thereby building a rich knowledge base to inform conservation, in…
TIPP-SD: A New Method for Species Detection in Microbiomes
In this study, we present TIPP-SD (i.e., TIPP for Species Detection), a new technique for species detection in a microbiome sample. TIPP-SD uses a modified version of TIPP3, which is a recently developed abundance profiling tool based on m…
Dense Subgraph Clustering and a New Cluster Ensemble Method
We propose DSC-Flow-Iter, a new community detection algorithm that is based on iterative extraction of dense subgraphs. Although DSC-Flow-Iter leaves many nodes unclustered, it is competitive with leading methods and has high-precision and…
EC-SBM synthetic network generator
Generating high-quality synthetic networks with realistic community structure is vital to effectively evaluate community detection algorithms. In this study, we propose a new synthetic network generator called the Edge-Connected Stochastic…
BSCAMPP: Batch-Scaled Phylogenetic Placement on Large Trees
Phylogenetic placement is the problem of placing sequences into a given phylogenetic tree, called a "backbone tree". EPA-ng and pplacer are the two most accurate phylogenetic placement methods, but both can fail to complete when the backbo…
TIPP3 and TIPP3-fast: Improved abundance profiling in metagenomics
We present TIPP3 and TIPP3-fast, new tools for abundance profiling in metagenomic datasets. Like its predecessor, TIPP2, the TIPP3 pipeline uses a maximum likelihood approach to place reads into labeled taxonomies using marker genes, but i…
EC-SBM Synthetic Network Generator
Generating high-quality synthetic networks with realistic community structure is vital to effectively evaluate community detection algorithms. In this study, we propose a new synthetic network generator called the Edge-Connected Stochastic…
RECCS: Realistic Cluster Connectivity Simulator for Synthetic Network Generation
The limited availability of useful ground-truth communities in real-world networks presents a challenge to evaluating and selecting a "best" community detection method for a given network or family of networks. The use of synthetic network…
Improved Community Detection using Stochastic Block Models
Identifying edge-dense communities that are also well-connected is an important aspect of understanding community structure. Prior work has shown that community detection methods can produce poorly connected communities, and some can even …
Biological databases in the age of generative artificial intelligence
Summary: Modern biological research critically depends on public databases. The introduction and propagation of errors within and across databases can lead to wasted resources as scientists are led astray by bad data or have to conduct expe…
Well-connectedness and community detection
Community detection methods help reveal the meso-scale structure of complex networks. Integral to detecting communities is the expectation that communities in a network are edge-dense and “well-connected”. Surprisingly, we find that five d…
TIPP3 and TIPP3-fast: Improved Abundance Profiling in Metagenomics
We present TIPP3 and TIPP3-fast, new tools for abundance profiling in metagenomic datasets. Like its predecessor, TIPP2, the TIPP3 pipeline uses a maximum likelihood approach to place reads into labeled taxonomies using marker genes, but i…
Axioms for clustering simple unweighted graphs: No impossibility result
In 2002, Kleinberg proposed three axioms for distance-based clustering, and proved that it was impossible for a clustering method to satisfy all three. While there has been much subsequent work examining and modifying these axioms for dist…
FastEnsemble: scalable ensemble clustering on large networks
Many community detection algorithms are inherently stochastic, leading to variations in their output depending on input parameters and random seeds. This variability makes the results of a single run of these algorithms less reliable. More…
Synthetic Networks That Preserve Edge Connectivity
Since true communities within real-world networks are rarely known, synthetic networks with planted ground truths are valuable for evaluating the performance of community detection methods. Of the synthetic network generation tools availab…
Improved Community Detection using Stochastic Block Models
Community detection approaches resolve complex networks into smaller groups (communities) that are expected to be relatively edge-dense and well-connected. The stochastic block model (SBM) is one of several approaches used to uncover commu…
Addressing Polymorphism in Linguistic Phylogenetics
Understanding how languages change is important not only for the reconstruction of protolanguages and for estimating diversification dates (i.e. the dates when languages split), but also for the inference of evolutionary trees (or phylogen…
Complexity of avian evolution revealed by family-level genomes
CM++ - A Meta-method for Well-Connected Community Detection
EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment
Axioms for Distanceless Graph Partitioning
In 2002, Kleinberg proposed three axioms for distance-based clustering, and proved that it was impossible for a clustering method to satisfy all three. While there has been much subsequent work examining and modifying these axioms for dist…
Weighted ASTRID: fast and accurate species trees from weighted internode distances
Background: Species tree estimation is a basic step in many biological research projects, but is complicated by the fact that gene trees can differ from the species tree due to processes such as incomplete lineage sorting (ILS), gene duplic…
EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment
Background: Adding sequences into an existing (possibly user-provided) alignment has multiple applications, including updating a large alignment with new data, adding sequences into a constraint alignment constructed using biological knowle…
Progress on Constructing Phylogenetic Networks for Languages
In 2006, Warnow, Evans, Ringe, and Nakhleh proposed a stochastic model (hereafter, the WERN 2006 model) of multi-state linguistic character evolution that allowed for homoplasy and borrowing. They proved that if there is no borrowing betwe…
Phylogenomic branch length estimation using quartets
Motivation: Branch lengths and topology of a species tree are essential in most downstream analyses, including estimation of diversification dates, characterization of selection, understanding adaptation, and comparative genomics. Modern ph…
Well-Connected Communities in Real-World and Synthetic Networks
Integral to the problem of detecting communities through graph clustering is the expectation that they are "well connected". In this respect, we examine five different community detection approaches optimizing different criteria: the Leide…
DISCO+QR: Rooting Species Trees in the Presence of GDL and ILS
Abstract: Genes evolve under processes such as gene duplication and loss (GDL), so that gene family trees are multi-copy, as well as incomplete lineage sorting (ILS); both processes produce gene trees that differ from the species tree. The…
DISCO+QR: rooting species trees in the presence of GDL and ILS
Motivation: Genes evolve under processes such as gene duplication and loss (GDL), so that gene family trees are multi-copy, as well as incomplete lineage sorting (ILS); both processes produce gene trees that differ from the species tree. Th…