Erik S. Wright
Accurate detection of tandem repeats exposes ubiquitous reuse of biological sequences
Tandem repetition is one of the major processes underlying genome evolution and phenotypic diversification. While newly formed tandem repeats are often easy to identify, it is more challenging to detect repeat copies as they diverge over e…
EvoWeaver: large-scale prediction of gene functional associations from coevolutionary signals
The known universe of uncharacterized proteins is expanding far faster than our ability to annotate their functions through laboratory study. Computational annotation approaches rely on similarity to previously studied proteins, thereby ig…
Tandem Repeats Provide Evidence for Convergent Evolution to Similar Protein Structures
Homology is a key concept underpinning the comparison of sequences across organisms. Sequence-level homology is based on a statistical framework optimized over decades of work. Recently, computational protein structure prediction has enabl…
EvoWeaver Supplemental Datafiles
This dataset contains additional files related to EvoWeaver. The following are included: ProteinComplexTrees.RData: All phylogenetic trees for Complexes benchmark. These are stored in a list object with one tree per gene group. ModulesEvoW…
Vancomycin-resistant Staphylococcus aureus (VRSA) can overcome the cost of antibiotic resistance and may threaten vancomycin’s clinical durability
Vancomycin has proven remarkably durable to resistance evolution by Staphylococcus aureus despite widespread treatment with vancomycin in the clinic. Only 16 cases of vancomycin-resistant S. aureus (VRSA) have been documented in the Unite…
Applications of Machine Learning on Electronic Health Record Data to Combat Antibiotic Resistance
There is growing excitement about the clinical use of artificial intelligence and machine learning (ML) technologies. Advancements in computing and the accessibility of ML frameworks enable researchers to easily train predictive models usi…
Many purported pseudogenes in bacterial genomes are bona fide genes
Background: Microbial genomes are largely comprised of protein coding sequences, yet some genomes contain many pseudogenes caused by frameshifts or internal stop codons. These pseudogenes are believed to result from gene degradation during …
Accurately clustering biological sequences in linear time by relatedness sorting
Clustering biological sequences into similar groups is an increasingly important task as the number of available sequences continues to grow exponentially. Search-based approaches to clustering scale super-linearly with the number of input…
EvoWeaver Supplemental Datafiles
This dataset contains additional files related to EvoWeaver. The following are included: ComplexSpeciesTree.RData: Species tree for Complexes benchmark ModulesSpeciesTree.RData: Species tree for Modules benchmark ProteinComplexTrees.RData:…
RefSeq bacterial protein (amino acid) sequences
Bacteria_Protein.fas.gz 151,835,459 protein (amino acid) sequences extracted from 44,831 randomly selected bacterial genomes from NCBI's RefSeq (release 220). Sequences are named by their accession number, followed by "|" and their PGAP pr…
RefSeq bacterial protein coding (nucleotide) sequences
Bacteria_Nucleotide.fas.gz 151,835,459 protein coding (nucleotide) sequences extracted from 44,831 randomly selected bacterial genomes from NCBI's RefSeq (release 220). Sequences are named by their accession number, followed by "|" and the…
Protein sequences matching TIGRFAM models
A set of 411 large and 3,001 small TIGRFAM protein families originally used for benchmarking sequence clustering programs. Large families contain at least 20,000 sequences with a genus label, whereas small contain fewer than 20,000. Sequen…
Many purported pseudogenes in bacterial genomes are bonafide genes - Multirun assemblies part 2
GFFs and parsed data associated with re-assemblies of single SRA runs where a Biosample had multiple associated sequencing runs. Also includes assemblies, annotations, and parsed data for simulated reads.
Many purported pseudogenes in bacterial genomes are bonafide genes - Multirun assemblies part 1
FASTA files of assemblies generated from separate runs for the same Biosample.
Many purported pseudogenes in bacterial genomes are bonafide genes - Factorial reassembly part 2
Annotations and parsed data for a factorial reassembly of RefSeq-associated SRA reads.
Many purported pseudogenes in bacterial genomes are bonafide genes - Factorial data part 1
Reassemblies of short reads associated with RefSeq biosamples under a variety of conditions.
Many purported pseudogenes in bacterial genomes are bonafide genes
These files are summary files from metadata searches that are too large to be homed on a GitHub repository. Some are required to knit the GitHub README file and recreate the figures generated there, while others are included for completene…
Evaluating the long-term portrayal of antibiotic resistance in major U.S. newspapers
Background: Popular media play a critical role in informing the public about antibiotic resistance, which has remained a health concern for over seven decades. Media attention increases the notoriety of antibiotic resistance and shapes the …
Annotated sequences extracted from bacterial genomes
Three files containing sequences extracted from 1,049,210 bacterial genomes available from GenBank (release 252). Protein coding sequences were annotated with IDTAXA (PMID: 34541527) using taxon-specific KEGG groups (Bacteria_Protein_subse…
Multi-factorial examination of amplicon sequencing workflows from sample preparation to bioinformatic analysis
Background: The development of sequencing technologies to evaluate bacterial microbiota composition has allowed new insights into the importance of microbial ecology. However, the variety of methodologies used among amplicon sequencing work…
Additional file 1 of Multi-factorial examination of amplicon sequencing workflows from sample preparation to bioinformatic analysis
Additional file 1.
TIGRFAM protein sequences named by taxonomy
A set of 411 TIGRFAM protein families originally used for benchmarking sequence clustering programs. Sequences were downloaded from NCBI and renamed by their original name (accession number) followed by their semi-colon separated NCBI taxo…