BERTax: taxonomic classification of DNA sequences with Deep Neural Networks
2021 · Open Access
· DOI: https://doi.org/10.1101/2021.07.09.451778
Taxonomic classification, i.e., the identification and assignment to groups of biological organisms with the same origin and characteristics, is a common task in genetics. Nowadays, taxonomic classification is mainly based on genome similarity search against large genome databases. In this process, the classification quality depends heavily on the database, since representative relatives have to be known already. Many genomic sequences cannot be classified at all, or only with a high misclassification rate. Here we present BERTax, a program that uses a deep neural network to precisely classify the superkingdom, phylum, and genus of DNA sequences taxonomically without the need for a known representative relative from a database. For this, BERTax uses the natural language processing model BERT trained to represent DNA. We show BERTax to be at least on par with the state-of-the-art approaches when taxonomically similar species are part of the training data. In the case of an entirely novel organism, however, BERTax clearly outperforms any existing approach. Finally, we show that BERTax can also be combined with database approaches to further increase the prediction quality. Since BERTax is not based on homologous entries in databases, it allows precise taxonomic classification of a broader range of genomic sequences. This leads to a higher number of correctly classified sequences and thus increases the overall information gain.
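A language model such as BERT operates on sequences of tokens, so a DNA sequence has to be split into "words" before it can be fed to the network. The sketch below shows one common way to do this, splitting a sequence into fixed-length k-mers. The function name `kmer_tokens`, the choice of `k=3`, and the non-overlapping windows are assumptions made for illustration; the abstract does not state which tokenization BERTax uses.

```python
def kmer_tokens(sequence, k=3):
    """Split a DNA sequence into non-overlapping k-mers.

    Hypothetical helper for illustration only: k and the windowing
    scheme are assumptions, not taken from the BERTax abstract.
    Trailing bases that do not fill a complete k-mer are dropped.
    """
    seq = sequence.upper()
    usable = len(seq) - len(seq) % k  # length covered by whole k-mers
    return [seq[i:i + k] for i in range(0, usable, k)]


tokens = kmer_tokens("ATGCGTACGTTAG")
print(tokens)  # ['ATG', 'CGT', 'ACG', 'TTA']
```

Each resulting token could then be mapped to an integer ID from a fixed vocabulary, in the same way words are mapped in natural language models.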
Related Topics
- Type: preprint
- Language: en
- Landing Page: https://doi.org/10.1101/2021.07.09.451778, https://www.biorxiv.org/content/biorxiv/early/2021/07/10/2021.07.09.451778.full.pdf
- OA Status: green
- Cited By: 21
- References: 35
- Related Works: 10
- OpenAlex ID: https://openalex.org/W3179640609
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W3179640609 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.1101/2021.07.09.451778 (Digital Object Identifier)
- Title: BERTax: taxonomic classification of DNA sequences with Deep Neural Networks (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2021 (year of publication)
- Publication date: 2021-07-10 (full publication date if available)
- Authors: Florian Mock, Fleming Kretschmer, Anton Kriese, Sebastian Böcker, Manja Marz (list of authors in order)
- Landing page: https://doi.org/10.1101/2021.07.09.451778 (publisher landing page)
- PDF URL: https://www.biorxiv.org/content/biorxiv/early/2021/07/10/2021.07.09.451778.full.pdf (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://www.biorxiv.org/content/biorxiv/early/2021/07/10/2021.07.09.451778.full.pdf (direct OA link when available)
- Concepts: Taxonomic rank, Identification (biology), Artificial intelligence, Genome, Range (aeronautics), Artificial neural network, Similarity (geometry), DNA sequencing, Computer science, Biological classification, Biology, Evolutionary biology, DNA, Gene, Genetics, Ecology, Taxon, Composite material, Image (mathematics), Materials science (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 21 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 3, 2024: 9, 2023: 5, 2022: 3, 2021: 1 (per-year citation counts, last 5 years)
- References (count): 35 (number of works referenced by this work)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W3179640609 |
| doi | https://doi.org/10.1101/2021.07.09.451778 |
| ids.doi | https://doi.org/10.1101/2021.07.09.451778 |
| ids.mag | 3179640609 |
| ids.openalex | https://openalex.org/W3179640609 |
| fwci | 1.75558369 |
| type | preprint |
| title | BERTax: taxonomic classification of DNA sequences with Deep Neural Networks |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10015 |
| topics[0].field.id | https://openalex.org/fields/13 |
| topics[0].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[0].score | 0.9998999834060669 |
| topics[0].domain.id | https://openalex.org/domains/1 |
| topics[0].domain.display_name | Life Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1312 |
| topics[0].subfield.display_name | Molecular Biology |
| topics[0].display_name | Genomics and Phylogenetic Studies |
| topics[1].id | https://openalex.org/T10012 |
| topics[1].field.id | https://openalex.org/fields/13 |
| topics[1].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[1].score | 0.9983999729156494 |
| topics[1].domain.id | https://openalex.org/domains/1 |
| topics[1].domain.display_name | Life Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1311 |
| topics[1].subfield.display_name | Genetics |
| topics[1].display_name | Genetic diversity and population structure |
| topics[2].id | https://openalex.org/T12254 |
| topics[2].field.id | https://openalex.org/fields/13 |
| topics[2].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[2].score | 0.9969000220298767 |
| topics[2].domain.id | https://openalex.org/domains/1 |
| topics[2].domain.display_name | Life Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1312 |
| topics[2].subfield.display_name | Molecular Biology |
| topics[2].display_name | Machine Learning in Bioinformatics |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C189592816 |
| concepts[0].level | 3 |
| concepts[0].score | 0.6071515083312988 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q427626 |
| concepts[0].display_name | Taxonomic rank |
| concepts[1].id | https://openalex.org/C116834253 |
| concepts[1].level | 2 |
| concepts[1].score | 0.5663026571273804 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q2039217 |
| concepts[1].display_name | Identification (biology) |
| concepts[2].id | https://openalex.org/C154945302 |
| concepts[2].level | 1 |
| concepts[2].score | 0.5141063332557678 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[2].display_name | Artificial intelligence |
| concepts[3].id | https://openalex.org/C141231307 |
| concepts[3].level | 3 |
| concepts[3].score | 0.49791526794433594 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q7020 |
| concepts[3].display_name | Genome |
| concepts[4].id | https://openalex.org/C204323151 |
| concepts[4].level | 2 |
| concepts[4].score | 0.49775150418281555 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q905424 |
| concepts[4].display_name | Range (aeronautics) |
| concepts[5].id | https://openalex.org/C50644808 |
| concepts[5].level | 2 |
| concepts[5].score | 0.4867672920227051 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q192776 |
| concepts[5].display_name | Artificial neural network |
| concepts[6].id | https://openalex.org/C103278499 |
| concepts[6].level | 3 |
| concepts[6].score | 0.4711659848690033 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q254465 |
| concepts[6].display_name | Similarity (geometry) |
| concepts[7].id | https://openalex.org/C51679486 |
| concepts[7].level | 3 |
| concepts[7].score | 0.44907575845718384 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q380546 |
| concepts[7].display_name | DNA sequencing |
| concepts[8].id | https://openalex.org/C41008148 |
| concepts[8].level | 0 |
| concepts[8].score | 0.4478500783443451 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[8].display_name | Computer science |
| concepts[9].id | https://openalex.org/C48702757 |
| concepts[9].level | 2 |
| concepts[9].score | 0.4469425678253174 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q8269924 |
| concepts[9].display_name | Biological classification |
| concepts[10].id | https://openalex.org/C86803240 |
| concepts[10].level | 0 |
| concepts[10].score | 0.3684895634651184 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[10].display_name | Biology |
| concepts[11].id | https://openalex.org/C78458016 |
| concepts[11].level | 1 |
| concepts[11].score | 0.24680450558662415 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q840400 |
| concepts[11].display_name | Evolutionary biology |
| concepts[12].id | https://openalex.org/C552990157 |
| concepts[12].level | 2 |
| concepts[12].score | 0.18455567955970764 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q7430 |
| concepts[12].display_name | DNA |
| concepts[13].id | https://openalex.org/C104317684 |
| concepts[13].level | 2 |
| concepts[13].score | 0.16328668594360352 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q7187 |
| concepts[13].display_name | Gene |
| concepts[14].id | https://openalex.org/C54355233 |
| concepts[14].level | 1 |
| concepts[14].score | 0.14736396074295044 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q7162 |
| concepts[14].display_name | Genetics |
| concepts[15].id | https://openalex.org/C18903297 |
| concepts[15].level | 1 |
| concepts[15].score | 0.08151617646217346 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q7150 |
| concepts[15].display_name | Ecology |
| concepts[16].id | https://openalex.org/C71640776 |
| concepts[16].level | 2 |
| concepts[16].score | 0.0791935920715332 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q16521 |
| concepts[16].display_name | Taxon |
| concepts[17].id | https://openalex.org/C159985019 |
| concepts[17].level | 1 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q181790 |
| concepts[17].display_name | Composite material |
| concepts[18].id | https://openalex.org/C115961682 |
| concepts[18].level | 2 |
| concepts[18].score | 0.0 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q860623 |
| concepts[18].display_name | Image (mathematics) |
| concepts[19].id | https://openalex.org/C192562407 |
| concepts[19].level | 0 |
| concepts[19].score | 0.0 |
| concepts[19].wikidata | https://www.wikidata.org/wiki/Q228736 |
| concepts[19].display_name | Materials science |
| keywords[0].id | https://openalex.org/keywords/taxonomic-rank |
| keywords[0].score | 0.6071515083312988 |
| keywords[0].display_name | Taxonomic rank |
| keywords[1].id | https://openalex.org/keywords/identification |
| keywords[1].score | 0.5663026571273804 |
| keywords[1].display_name | Identification (biology) |
| keywords[2].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[2].score | 0.5141063332557678 |
| keywords[2].display_name | Artificial intelligence |
| keywords[3].id | https://openalex.org/keywords/genome |
| keywords[3].score | 0.49791526794433594 |
| keywords[3].display_name | Genome |
| keywords[4].id | https://openalex.org/keywords/range |
| keywords[4].score | 0.49775150418281555 |
| keywords[4].display_name | Range (aeronautics) |
| keywords[5].id | https://openalex.org/keywords/artificial-neural-network |
| keywords[5].score | 0.4867672920227051 |
| keywords[5].display_name | Artificial neural network |
| keywords[6].id | https://openalex.org/keywords/similarity |
| keywords[6].score | 0.4711659848690033 |
| keywords[6].display_name | Similarity (geometry) |
| keywords[7].id | https://openalex.org/keywords/dna-sequencing |
| keywords[7].score | 0.44907575845718384 |
| keywords[7].display_name | DNA sequencing |
| keywords[8].id | https://openalex.org/keywords/computer-science |
| keywords[8].score | 0.4478500783443451 |
| keywords[8].display_name | Computer science |
| keywords[9].id | https://openalex.org/keywords/biological-classification |
| keywords[9].score | 0.4469425678253174 |
| keywords[9].display_name | Biological classification |
| keywords[10].id | https://openalex.org/keywords/biology |
| keywords[10].score | 0.3684895634651184 |
| keywords[10].display_name | Biology |
| keywords[11].id | https://openalex.org/keywords/evolutionary-biology |
| keywords[11].score | 0.24680450558662415 |
| keywords[11].display_name | Evolutionary biology |
| keywords[12].id | https://openalex.org/keywords/dna |
| keywords[12].score | 0.18455567955970764 |
| keywords[12].display_name | DNA |
| keywords[13].id | https://openalex.org/keywords/gene |
| keywords[13].score | 0.16328668594360352 |
| keywords[13].display_name | Gene |
| keywords[14].id | https://openalex.org/keywords/genetics |
| keywords[14].score | 0.14736396074295044 |
| keywords[14].display_name | Genetics |
| keywords[15].id | https://openalex.org/keywords/ecology |
| keywords[15].score | 0.08151617646217346 |
| keywords[15].display_name | Ecology |
| keywords[16].id | https://openalex.org/keywords/taxon |
| keywords[16].score | 0.0791935920715332 |
| keywords[16].display_name | Taxon |
| language | en |
| locations[0].id | doi:10.1101/2021.07.09.451778 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306402567 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | False |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | bioRxiv (Cold Spring Harbor Laboratory) |
| locations[0].source.host_organization | https://openalex.org/I2750212522 |
| locations[0].source.host_organization_name | Cold Spring Harbor Laboratory |
| locations[0].source.host_organization_lineage | https://openalex.org/I2750212522 |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://www.biorxiv.org/content/biorxiv/early/2021/07/10/2021.07.09.451778.full.pdf |
| locations[0].version | acceptedVersion |
| locations[0].raw_type | posted-content |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | True |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | https://doi.org/10.1101/2021.07.09.451778 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5022326153 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-1791-4437 |
| authorships[0].author.display_name | Florian Mock |
| authorships[0].countries | DE |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I76198965 |
| authorships[0].affiliations[0].raw_affiliation_string | RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, Germany |
| authorships[0].institutions[0].id | https://openalex.org/I76198965 |
| authorships[0].institutions[0].ror | https://ror.org/05qpz1x62 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I76198965 |
| authorships[0].institutions[0].country_code | DE |
| authorships[0].institutions[0].display_name | Friedrich Schiller University Jena |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Florian Mock |
| authorships[0].is_corresponding | True |
| authorships[0].raw_affiliation_strings | RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, Germany |
| authorships[1].author.id | https://openalex.org/A5070388747 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-8523-6546 |
| authorships[1].author.display_name | Fleming Kretschmer |
| authorships[1].countries | DE |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I76198965 |
| authorships[1].affiliations[0].raw_affiliation_string | Chair for Bioinformatics, Friedrich Schiller University Jena, Germany |
| authorships[1].institutions[0].id | https://openalex.org/I76198965 |
| authorships[1].institutions[0].ror | https://ror.org/05qpz1x62 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I76198965 |
| authorships[1].institutions[0].country_code | DE |
| authorships[1].institutions[0].display_name | Friedrich Schiller University Jena |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Fleming Kretschmer |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | Chair for Bioinformatics, Friedrich Schiller University Jena, Germany |
| authorships[2].author.id | https://openalex.org/A5031101805 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-3917-3385 |
| authorships[2].author.display_name | Anton Kriese |
| authorships[2].countries | DE |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I75951250 |
| authorships[2].affiliations[0].raw_affiliation_string | Freie Universität Berlin, Berlin, Germany |
| authorships[2].institutions[0].id | https://openalex.org/I75951250 |
| authorships[2].institutions[0].ror | https://ror.org/046ak2485 |
| authorships[2].institutions[0].type | education |
| authorships[2].institutions[0].lineage | https://openalex.org/I75951250 |
| authorships[2].institutions[0].country_code | DE |
| authorships[2].institutions[0].display_name | Freie Universität Berlin |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Anton Kriese |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | Freie Universität Berlin, Berlin, Germany |
| authorships[3].author.id | https://openalex.org/A5033025664 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-9304-8091 |
| authorships[3].author.display_name | Sebastian Böcker |
| authorships[3].countries | DE |
| authorships[3].affiliations[0].institution_ids | https://openalex.org/I76198965 |
| authorships[3].affiliations[0].raw_affiliation_string | Chair for Bioinformatics, Friedrich Schiller University Jena, Germany |
| authorships[3].institutions[0].id | https://openalex.org/I76198965 |
| authorships[3].institutions[0].ror | https://ror.org/05qpz1x62 |
| authorships[3].institutions[0].type | education |
| authorships[3].institutions[0].lineage | https://openalex.org/I76198965 |
| authorships[3].institutions[0].country_code | DE |
| authorships[3].institutions[0].display_name | Friedrich Schiller University Jena |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Sebastian Böcker |
| authorships[3].is_corresponding | False |
| authorships[3].raw_affiliation_strings | Chair for Bioinformatics, Friedrich Schiller University Jena, Germany |
| authorships[4].author.id | https://openalex.org/A5041475009 |
| authorships[4].author.orcid | https://orcid.org/0000-0003-4783-8823 |
| authorships[4].author.display_name | Manja Marz |
| authorships[4].countries | DE |
| authorships[4].affiliations[0].institution_ids | https://openalex.org/I76198965 |
| authorships[4].affiliations[0].raw_affiliation_string | Bioinformatics Core Facility, Friedrich Schiller University Jena, Jena, Germany |
| authorships[4].affiliations[1].institution_ids | https://openalex.org/I315704651 |
| authorships[4].affiliations[1].raw_affiliation_string | FLI Leibniz Institute for Age Research, Jena, Germany |
| authorships[4].affiliations[2].institution_ids | https://openalex.org/I76198965 |
| authorships[4].affiliations[2].raw_affiliation_string | RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, Germany |
| authorships[4].affiliations[3].raw_affiliation_string | European Virus Bioinformatics Center, Jena, Germany |
| authorships[4].institutions[0].id | https://openalex.org/I76198965 |
| authorships[4].institutions[0].ror | https://ror.org/05qpz1x62 |
| authorships[4].institutions[0].type | education |
| authorships[4].institutions[0].lineage | https://openalex.org/I76198965 |
| authorships[4].institutions[0].country_code | DE |
| authorships[4].institutions[0].display_name | Friedrich Schiller University Jena |
| authorships[4].institutions[1].id | https://openalex.org/I315704651 |
| authorships[4].institutions[1].ror | https://ror.org/01n6r0e97 |
| authorships[4].institutions[1].type | government |
| authorships[4].institutions[1].lineage | https://openalex.org/I315704651 |
| authorships[4].institutions[1].country_code | DE |
| authorships[4].institutions[1].display_name | Leibniz Association |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Manja Marz |
| authorships[4].is_corresponding | False |
| authorships[4].raw_affiliation_strings | Bioinformatics Core Facility, Friedrich Schiller University Jena, Jena, Germany, European Virus Bioinformatics Center, Jena, Germany, FLI Leibniz Institute for Age Research, Jena, Germany, RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, Germany |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://www.biorxiv.org/content/biorxiv/early/2021/07/10/2021.07.09.451778.full.pdf |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | BERTax: taxonomic classification of DNA sequences with Deep Neural Networks |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T10015 |
| primary_topic.field.id | https://openalex.org/fields/13 |
| primary_topic.field.display_name | Biochemistry, Genetics and Molecular Biology |
| primary_topic.score | 0.9998999834060669 |
| primary_topic.domain.id | https://openalex.org/domains/1 |
| primary_topic.domain.display_name | Life Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1312 |
| primary_topic.subfield.display_name | Molecular Biology |
| primary_topic.display_name | Genomics and Phylogenetic Studies |
| related_works | https://openalex.org/W4308364444, https://openalex.org/W2058576683, https://openalex.org/W1987638767, https://openalex.org/W2055400017, https://openalex.org/W4254024861, https://openalex.org/W3159161400, https://openalex.org/W3166190405, https://openalex.org/W2993609935, https://openalex.org/W2466494120, https://openalex.org/W1949578167 |
| cited_by_count | 21 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 3 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 9 |
| counts_by_year[2].year | 2023 |
| counts_by_year[2].cited_by_count | 5 |
| counts_by_year[3].year | 2022 |
| counts_by_year[3].cited_by_count | 3 |
| counts_by_year[4].year | 2021 |
| counts_by_year[4].cited_by_count | 1 |
| locations_count | 1 |
| best_oa_location.id | doi:10.1101/2021.07.09.451778 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306402567 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | False |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | bioRxiv (Cold Spring Harbor Laboratory) |
| best_oa_location.source.host_organization | https://openalex.org/I2750212522 |
| best_oa_location.source.host_organization_name | Cold Spring Harbor Laboratory |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I2750212522 |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://www.biorxiv.org/content/biorxiv/early/2021/07/10/2021.07.09.451778.full.pdf |
| best_oa_location.version | acceptedVersion |
| best_oa_location.raw_type | posted-content |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | https://doi.org/10.1101/2021.07.09.451778 |
| primary_location.id | doi:10.1101/2021.07.09.451778 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306402567 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | False |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | bioRxiv (Cold Spring Harbor Laboratory) |
| primary_location.source.host_organization | https://openalex.org/I2750212522 |
| primary_location.source.host_organization_name | Cold Spring Harbor Laboratory |
| primary_location.source.host_organization_lineage | https://openalex.org/I2750212522 |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://www.biorxiv.org/content/biorxiv/early/2021/07/10/2021.07.09.451778.full.pdf |
| primary_location.version | acceptedVersion |
| primary_location.raw_type | posted-content |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | True |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | https://doi.org/10.1101/2021.07.09.451778 |
| publication_date | 2021-07-10 |
| publication_year | 2021 |
| referenced_works | https://openalex.org/W1996816368, https://openalex.org/W2128964206, https://openalex.org/W2045204781, https://openalex.org/W1916919897, https://openalex.org/W3026538346, https://openalex.org/W2951160681, https://openalex.org/W2337747100, https://openalex.org/W2562388580, https://openalex.org/W2950954328, https://openalex.org/W2990618091, https://openalex.org/W2519890620, https://openalex.org/W2789843538, https://openalex.org/W2968450569, https://openalex.org/W3008594856, https://openalex.org/W2946417913, https://openalex.org/W2995514860, https://openalex.org/W3004304303, https://openalex.org/W3156869386, https://openalex.org/W2953008890, https://openalex.org/W2912817485, https://openalex.org/W3003257820, https://openalex.org/W3035965352, https://openalex.org/W3138272361, https://openalex.org/W2809485291, https://openalex.org/W2795104011, https://openalex.org/W2900694973, https://openalex.org/W3166142427, https://openalex.org/W2855293443, https://openalex.org/W3103145119, https://openalex.org/W3099878876, https://openalex.org/W2748591825, https://openalex.org/W3127238141, https://openalex.org/W3093759954, https://openalex.org/W2003347102, https://openalex.org/W1260748775 |
| referenced_works_count | 35 |
| abstract_inverted_index., | 77 |
| abstract_inverted_index.a | 20, 69, 78, 82, 102, 107, 194, 203 |
| abstract_inverted_index.In | 39, 146 |
| abstract_inverted_index.We | 123 |
| abstract_inverted_index.an | 149 |
| abstract_inverted_index.at | 64, 128 |
| abstract_inverted_index.be | 55, 62, 127, 167 |
| abstract_inverted_index.in | 23, 186 |
| abstract_inverted_index.is | 19, 28, 180 |
| abstract_inverted_index.it | 188 |
| abstract_inverted_index.of | 10, 94, 142, 148, 193, 197, 206 |
| abstract_inverted_index.on | 31, 47, 130, 183 |
| abstract_inverted_index.or | 66 |
| abstract_inverted_index.to | 8, 35, 54, 86, 120, 126, 172, 202 |
| abstract_inverted_index.we | 74, 161 |
| abstract_inverted_index.DNA | 95 |
| abstract_inverted_index.For | 109 |
| abstract_inverted_index.all | 65 |
| abstract_inverted_index.and | 6, 17, 92, 210 |
| abstract_inverted_index.any | 157 |
| abstract_inverted_index.are | 140 |
| abstract_inverted_index.can | 165 |
| abstract_inverted_index.for | 101 |
| abstract_inverted_index.not | 181 |
| abstract_inverted_index.par | 131 |
| abstract_inverted_index.the | 4, 14, 42, 48, 89, 99, 113, 133, 143, 175, 213 |
| abstract_inverted_index.BERT | 118 |
| abstract_inverted_index.DNA. | 122 |
| abstract_inverted_index.Here | 73 |
| abstract_inverted_index.Many | 58 |
| abstract_inverted_index.This | 200 |
| abstract_inverted_index.also | 166 |
| abstract_inverted_index.case | 147 |
| abstract_inverted_index.deep | 83 |
| abstract_inverted_index.from | 106 |
| abstract_inverted_index.have | 53 |
| abstract_inverted_index.high | 70 |
| abstract_inverted_index.need | 100 |
| abstract_inverted_index.only | 67 |
| abstract_inverted_index.part | 141 |
| abstract_inverted_index.same | 15 |
| abstract_inverted_index.show | 124, 162 |
| abstract_inverted_index.task | 22 |
| abstract_inverted_index.that | 80, 163 |
| abstract_inverted_index.this | 40 |
| abstract_inverted_index.thus | 211 |
| abstract_inverted_index.uses | 81, 112 |
| abstract_inverted_index.when | 136 |
| abstract_inverted_index.with | 13, 68, 132, 169 |
| abstract_inverted_index.Since | 178 |
| abstract_inverted_index.based | 30, 182 |
| abstract_inverted_index.data. | 145 |
| abstract_inverted_index.gain. | 216 |
| abstract_inverted_index.genus | 93 |
| abstract_inverted_index.i.e., | 3 |
| abstract_inverted_index.known | 56, 103 |
| abstract_inverted_index.large | 36 |
| abstract_inverted_index.leads | 201 |
| abstract_inverted_index.least | 129 |
| abstract_inverted_index.model | 117 |
| abstract_inverted_index.novel | 151 |
| abstract_inverted_index.range | 196 |
| abstract_inverted_index.rate. | 72 |
| abstract_inverted_index.since | 50 |
| abstract_inverted_index.this, | 110 |
| abstract_inverted_index.BERTax | 76, 111, 125, 154, 164, 179 |
| abstract_inverted_index.allows | 189 |
| abstract_inverted_index.cannot | 61 |
| abstract_inverted_index.common | 21 |
| abstract_inverted_index.genome | 32, 37 |
| abstract_inverted_index.groups | 9 |
| abstract_inverted_index.higher | 204 |
| abstract_inverted_index.mainly | 29 |
| abstract_inverted_index.neural | 84 |
| abstract_inverted_index.number | 205 |
| abstract_inverted_index.origin | 16 |
| abstract_inverted_index.search | 34 |
| abstract_inverted_index.broader | 195 |
| abstract_inverted_index.clearly | 155 |
| abstract_inverted_index.depends | 45 |
| abstract_inverted_index.entries | 185 |
| abstract_inverted_index.further | 173 |
| abstract_inverted_index.genomic | 59, 198 |
| abstract_inverted_index.heavily | 46 |
| abstract_inverted_index.natural | 114 |
| abstract_inverted_index.network | 85 |
| abstract_inverted_index.overall | 214 |
| abstract_inverted_index.phylum, | 91 |
| abstract_inverted_index.precise | 190 |
| abstract_inverted_index.present | 75 |
| abstract_inverted_index.program | 79 |
| abstract_inverted_index.quality | 44 |
| abstract_inverted_index.similar | 138 |
| abstract_inverted_index.species | 139 |
| abstract_inverted_index.trained | 119 |
| abstract_inverted_index.without | 98 |
| abstract_inverted_index.Abstract | 0 |
| abstract_inverted_index.Finally, | 160 |
| abstract_inverted_index.already. | 57 |
| abstract_inverted_index.classify | 88 |
| abstract_inverted_index.combined | 168 |
| abstract_inverted_index.database | 49, 170 |
| abstract_inverted_index.entirely | 150 |
| abstract_inverted_index.existing | 158 |
| abstract_inverted_index.however, | 153 |
| abstract_inverted_index.increase | 174 |
| abstract_inverted_index.language | 115 |
| abstract_inverted_index.process, | 41 |
| abstract_inverted_index.quality. | 177 |
| abstract_inverted_index.relative | 105 |
| abstract_inverted_index.training | 144 |
| abstract_inverted_index.Nowadays, | 25 |
| abstract_inverted_index.Taxonomic | 1 |
| abstract_inverted_index.approach. | 159 |
| abstract_inverted_index.correctly | 207 |
| abstract_inverted_index.database. | 108 |
| abstract_inverted_index.genetics. | 24 |
| abstract_inverted_index.increases | 212 |
| abstract_inverted_index.organism, | 152 |
| abstract_inverted_index.organisms | 12 |
| abstract_inverted_index.relatives | 52 |
| abstract_inverted_index.represent | 121 |
| abstract_inverted_index.sequences | 60, 96, 209 |
| abstract_inverted_index.taxonomic | 26, 191 |
| abstract_inverted_index.approaches | 135, 171 |
| abstract_inverted_index.assignment | 7 |
| abstract_inverted_index.biological | 11 |
| abstract_inverted_index.classified | 63, 208 |
| abstract_inverted_index.databases, | 187 |
| abstract_inverted_index.databases. | 38 |
| abstract_inverted_index.homologous | 184 |
| abstract_inverted_index.pre-cisely | 87 |
| abstract_inverted_index.prediction | 176 |
| abstract_inverted_index.processing | 116 |
| abstract_inverted_index.sequences. | 199 |
| abstract_inverted_index.similarity | 33 |
| abstract_inverted_index.information | 215 |
| abstract_inverted_index.outperforms | 156 |
| abstract_inverted_index.superkingdom, | 90 |
| abstract_inverted_index.taxonomically | 97, 137 |
| abstract_inverted_index.classification | 27, 43, 192 |
| abstract_inverted_index.identification | 5 |
| abstract_inverted_index.representative | 51, 104 |
| abstract_inverted_index.classification, | 2 |
| abstract_inverted_index.characteristics, | 18 |
| abstract_inverted_index.state-of-the-art | 134 |
| abstract_inverted_index.misclassification | 71 |
| cited_by_percentile_year.max | 99 |
| cited_by_percentile_year.min | 89 |
| corresponding_author_ids | https://openalex.org/A5022326153 |
| countries_distinct_count | 1 |
| institutions_distinct_count | 5 |
| corresponding_institution_ids | https://openalex.org/I76198965 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.550000011920929 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile.value | 0.83278257 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
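The `abstract_inverted_index.*` rows in the table store the abstract as a mapping from each token to the positions at which it occurs, rather than as running text. A minimal sketch of how the text can be rebuilt from such a mapping follows. The function name `abstract_from_inverted_index` is hypothetical, and `sample` is a truncated excerpt covering only the first ten positions from the table, not the full index.

```python
def abstract_from_inverted_index(inverted_index):
    """Rebuild running text from a token -> list-of-positions mapping."""
    by_position = {}
    for token, positions in inverted_index.items():
        for position in positions:
            by_position[position] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(by_position[p] for p in sorted(by_position))


# Truncated excerpt: only positions 0-9 of the index shown in the table.
sample = {
    "Abstract": [0],
    "Taxonomic": [1],
    "classification,": [2],
    "i.e.,": [3],
    "the": [4],
    "identification": [5],
    "and": [6],
    "assignment": [7],
    "to": [8],
    "groups": [9],
}

text = abstract_from_inverted_index(sample)
print(text)
# Abstract Taxonomic classification, i.e., the identification and assignment to groups
```

Because tokens keep their punctuation and original spelling, the rebuilt text reproduces the indexed abstract exactly as it was stored, including any hyphenation carried over from the source.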