EvoWeaver Supplemental Datafiles
· 2024
· Open Access
· DOI: https://doi.org/10.5281/zenodo.8423024
· OA: W4394051926
This dataset contains additional files related to EvoWeaver. The following are included:

- ProteinComplexTrees.RData: All phylogenetic trees for the Complexes benchmark, stored in a list object with one tree per gene group.
- ModulesEvoWeaver.RData: EvoWeaver object for the Modules benchmark, containing phylogenetic trees for the Modules and Multiclass benchmarks. `ModulePredAllPairs.RData` was made using this object.
- CORUM_Blast_Results.RData: All results from pairwise BLAST of proteomes against human reference genes.
- CORUM_proteomes.zip: All proteomes for all organisms used. Some of these have length 0, which occurs when an assembly could not be found programmatically or retrieval failed.
- CORUMOrthogroupsWithIndices.RData: Orthogroups for the CORUM benchmark, with gene index data included.
- KOsWithPositions.RData: Gene index data for KO groups used in this study.
- ModsWithPositions.RData: Gene index data for modules used in this study.
- AllKEGGModules.RData: KEGG module taxonomy, names, and pathways for all modules used, at time of download.
- KEGGModuleComplexes.RData: All complexes in a KEGG module at time of download.
- KEGGModuleDefinitions.RData: All KEGG module definitions at time of download.
- ModulesPositionData.RData: Gene index information for the Modules benchmark.
- COG.links.detailed.v12.0.txt.tar.gz: STRING evidence streams between COGs (compressed; 1.44 GB uncompressed).
- COG.mappings.v12.0.txt.tar.gz: STRING COG definitions (compressed; 5.80 GB uncompressed).
- AllHumanGenes.fa: Human gene sequences used as the BLAST reference in the CORUM benchmark.
- AllKEGGCDSs.RData: All sets of available genes from all genomes used in KEGG.
- corum_bitscores.tsv: CORUM bitscores used as inputs to CladeOScope.
- corum_npp.tsv: CORUM normalized phylogenetic profiles used as inputs to CladeOScope.
- EukaryoteEWData.RData: Same as ModulesEvoWeaver.RData, but restricted to eukaryotic sequences.

Note that internal algorithm names may not exactly match those in published material due to computational requirements (e.g., difficulty naming functions/variables with special characters). See the GitHub page for a description of which internal names correspond to algorithms in the text.
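
Because some entries in CORUM_proteomes.zip have length 0, it can help to separate empty proteomes from usable ones before any downstream processing. The following is a minimal sketch, not part of the dataset's own tooling. It assumes that "length 0" means a zero-byte archive member and that each proteome is stored as its own file inside the zip; the internal layout of the archive is not described here, and the function name `split_members_by_size` is hypothetical.

```python
import os
import zipfile


def split_members_by_size(zip_path):
    """Return (non_empty, empty) lists of file names found in a zip archive.

    Assumption: an unavailable proteome appears as a zero-byte member.
    Directory entries are skipped.
    """
    non_empty, empty = [], []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # file_size is the uncompressed size of the member in bytes
            if info.file_size == 0:
                empty.append(info.filename)
            else:
                non_empty.append(info.filename)
    return non_empty, empty


if __name__ == "__main__":
    # Assumed local path to the downloaded archive; adjust as needed.
    path = "CORUM_proteomes.zip"
    if os.path.exists(path):
        usable, missing = split_members_by_size(path)
        print(f"{len(usable)} non-empty proteomes, {len(missing)} empty")
```

Reading sizes from the archive index avoids extracting every proteome just to find the empty ones.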