Frequent subgraph mining for biologically meaningful structural motifs
2020 · Open Access · DOI: https://doi.org/10.1101/2020.05.14.095695
Identification of biologically relevant motifs in proteins is a long-standing problem in bioinformatics, especially for distantly related proteins, where sequence analysis alone becomes increasingly difficult. Here we present a novel approach that identifies such motifs in protein three-dimensional structures without relying on sequence alignment: structures are represented as graphs in the form of residue interaction networks, and a modified frequent subgraph mining algorithm is applied to them. In these networks, residues are vertices, and contacts between residues are edges labeled with Euclidean distances. We use frequent subgraph mining to determine all subgraphs that are subgraph isomorphic to (i.e., contained in) at least a given number of such networks generated from structures in the same protein family. To this end, we introduce two extensions of classical frequent subgraph mining: (i) approximate matching of distance-based labels, to account for small variations between protein structures, and (ii) scoring and score-based filtering of subgraphs, to identify structurally conserved motifs and to counteract the growing size of the search space. We validated this approach by demonstrating that it rediscovers previously characterized, functionally important structural motifs in selected protein families. For further validation, we show that it also identifies motifs that correspond to patterns in the PROSITE database. We then applied our approach to all superfamilies in the SCOP database and found that the discovered motifs are enriched in ligand-binding-site residues, supporting their functional importance. Finally, we used the approach to discover a novel structural motif in jelly-roll capsid proteins of members of the picornavirus-like superfamily. We also provide an efficient open-source implementation of the algorithm, called RINminer.
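The graph representation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the RINminer implementation: each residue is reduced to a single representative point (for example its C-alpha atom), two residues are treated as being in contact when their distance is at most a hypothetical cutoff `CONTACT_CUTOFF` of 8 Å, and each vertex is labeled with its residue type. The names `build_rin` and `CONTACT_CUTOFF` are illustrative only.

```python
import math
from itertools import combinations

# Hypothetical contact cutoff in ångströms (an assumption; the paper's
# actual contact definition may differ).
CONTACT_CUTOFF = 8.0


def build_rin(residues, cutoff=CONTACT_CUTOFF):
    """Build a residue interaction network (RIN) from residue coordinates.

    residues: list of (residue_type, (x, y, z)) tuples, one representative
    point per residue (assumed, e.g. the C-alpha atom).

    Returns (labels, edges): labels[i] is the residue type of vertex i, and
    edges maps each vertex pair (i, j) with i < j that is in contact to the
    Euclidean distance between the two residues (the edge label).
    """
    labels = [name for name, _ in residues]
    edges = {}
    # combinations() preserves input order, so i < j for every pair.
    for (i, (_, p)), (j, (_, q)) in combinations(enumerate(residues), 2):
        d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
        if d <= cutoff:
            edges[(i, j)] = d
    return labels, edges
```

For example, with residues at (0, 0, 0), (3, 4, 0) and (20, 0, 0), only the first pair lies within the cutoff, so the network has a single edge `(0, 1)` labeled 5.0.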
Author summary

As the evolutionary distance between proteins increases, their sequence identity drops rapidly, whereas functionally important sequence motifs and the three-dimensional (3D) structural scaffold in which they are embedded are more conserved. We developed an approach that automatically identifies such motifs by converting protein 3D structures into a set of graphs and then applying the frequent subgraph mining framework. In these graphs, residues are represented as vertices, and if two residues interact in the corresponding protein 3D structure, they are connected by an edge labeled with the Euclidean distance between them. In the classical setting of frequent subgraph mining, all subgraphs from a database of graphs are enumerated, and those found exactly (i.e., that are subgraph isomorphic) in more than a certain number of graphs are reported as frequent. Our approach introduces two new concepts: approximately isomorphic subgraphs, and an efficient scoring scheme that retains only biologically relevant subgraphs during the enumeration step. Approximate isomorphism allows edge labels to match only approximately, thus accounting for natural deviations between the 3D structures of related proteins. With our approach, we automatically rediscovered known motifs from PROSITE, as well as known motifs in three well-studied, extremely diverse protein families. We predicted functionally important residues in SCOP superfamilies and demonstrated that they tend to lie in structurally meaningful regions: ligand-binding sites and the protein core. Additionally, we present a previously unreported structural motif in jelly-roll viral capsids.
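Support counting with approximate isomorphism can be illustrated with a deliberately naive sketch. It represents each graph as a hypothetical `(labels, edges)` pair, where `labels[i]` is the residue type of vertex `i` and `edges` maps a vertex pair `(i, j)` with `i < j` to a distance. Vertex labels must match exactly, while an edge matches when its distance label differs by at most a hypothetical tolerance `TOLERANCE` (1 Å here). The check tries every injective vertex mapping, so it is only practical for tiny patterns; it is not the enumeration or scoring strategy used by RINminer, and `approx_contains` and `support` are illustrative names.

```python
from itertools import permutations

# Hypothetical tolerance in ångströms for matching distance edge labels.
TOLERANCE = 1.0


def approx_contains(pattern, graph, tol=TOLERANCE):
    """Return True if `pattern` is approximately subgraph isomorphic to `graph`.

    Vertex labels must be equal; every pattern edge must map to an edge of
    `graph` whose distance label differs by at most `tol`. Extra edges in
    `graph` are allowed (non-induced subgraph isomorphism).
    Brute force over all injective vertex mappings.
    """
    p_labels, p_edges = pattern
    g_labels, g_edges = graph
    # Each mapping assigns pattern vertex i to graph vertex mapping[i].
    for mapping in permutations(range(len(g_labels)), len(p_labels)):
        if any(p_labels[i] != g_labels[v] for i, v in enumerate(mapping)):
            continue  # residue types do not match
        ok = True
        for (i, j), d in p_edges.items():
            u, v = mapping[i], mapping[j]
            key = (min(u, v), max(u, v))  # graph edges are stored with i < j
            if key not in g_edges or abs(g_edges[key] - d) > tol:
                ok = False
                break
        if ok:
            return True
    return False


def support(pattern, database, tol=TOLERANCE):
    """Count the graphs in `database` that approximately contain `pattern`."""
    return sum(approx_contains(pattern, g, tol) for g in database)
```

With `pattern = (["HIS", "ASP"], {(0, 1): 6.0})`, a graph containing a HIS-ASP edge labeled 6.4 supports the pattern, while one whose HIS-ASP edge is labeled 9.0 does not at the default tolerance. A pattern would then be kept as frequent when its support reaches a chosen minimum threshold.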
- Type: preprint
- Language: en
- Landing page: https://doi.org/10.1101/2020.05.14.095695
- Full-text PDF: https://www.biorxiv.org/content/biorxiv/early/2020/05/14/2020.05.14.095695.full.pdf
- OA status: green
- Cited by: 2
- References: 58
- Related works: 10
- OpenAlex ID: https://openalex.org/W3025868365
- Publication date: 2020-05-14
- Authors: Sebastian Keller, Pauli Miettinen, Olga V. Kalinina
- Concepts: Computational biology, Structural motif, Network motif, Euclidean geometry, Computer science, Biological network, Sequence (biology), Euclidean space, Motif (music), Identification (biology), Euclidean distance, Matching (statistics), Combinatorics, Biology, Data mining, Mathematics, Artificial intelligence, Genetics, Biochemistry, Botany, Physics, Acoustics, Geometry, Statistics
- Citations by year (recent): 2023: 1, 2022: 1
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W3025868365 |
| doi | https://doi.org/10.1101/2020.05.14.095695 |
| ids.doi | https://doi.org/10.1101/2020.05.14.095695 |
| ids.mag | 3025868365 |
| ids.openalex | https://openalex.org/W3025868365 |
| fwci | 0.19505395 |
| type | preprint |
| title | Frequent subgraph mining for biologically meaningful structural motifs |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10015 |
| topics[0].field.id | https://openalex.org/fields/13 |
| topics[0].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[0].score | 0.9965000152587891 |
| topics[0].domain.id | https://openalex.org/domains/1 |
| topics[0].domain.display_name | Life Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1312 |
| topics[0].subfield.display_name | Molecular Biology |
| topics[0].display_name | Genomics and Phylogenetic Studies |
| topics[1].id | https://openalex.org/T10887 |
| topics[1].field.id | https://openalex.org/fields/13 |
| topics[1].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[1].score | 0.9934999942779541 |
| topics[1].domain.id | https://openalex.org/domains/1 |
| topics[1].domain.display_name | Life Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1312 |
| topics[1].subfield.display_name | Molecular Biology |
| topics[1].display_name | Bioinformatics and Genomic Networks |
| topics[2].id | https://openalex.org/T12254 |
| topics[2].field.id | https://openalex.org/fields/13 |
| topics[2].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[2].score | 0.9922000169754028 |
| topics[2].domain.id | https://openalex.org/domains/1 |
| topics[2].domain.display_name | Life Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1312 |
| topics[2].subfield.display_name | Molecular Biology |
| topics[2].display_name | Machine Learning in Bioinformatics |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C70721500 |
| concepts[0].level | 1 |
| concepts[0].score | 0.5243992209434509 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q177005 |
| concepts[0].display_name | Computational biology |
| concepts[1].id | https://openalex.org/C132677234 |
| concepts[1].level | 2 |
| concepts[1].score | 0.5240566730499268 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q3273544 |
| concepts[1].display_name | Structural motif |
| concepts[2].id | https://openalex.org/C60723933 |
| concepts[2].level | 3 |
| concepts[2].score | 0.5181359052658081 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q7001080 |
| concepts[2].display_name | Network motif |
| concepts[3].id | https://openalex.org/C129782007 |
| concepts[3].level | 2 |
| concepts[3].score | 0.48993033170700073 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q162886 |
| concepts[3].display_name | Euclidean geometry |
| concepts[4].id | https://openalex.org/C41008148 |
| concepts[4].level | 0 |
| concepts[4].score | 0.4862779378890991 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[4].display_name | Computer science |
| concepts[5].id | https://openalex.org/C28225019 |
| concepts[5].level | 2 |
| concepts[5].score | 0.4460144340991974 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q4915005 |
| concepts[5].display_name | Biological network |
| concepts[6].id | https://openalex.org/C2778112365 |
| concepts[6].level | 2 |
| concepts[6].score | 0.43742144107818604 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q3511065 |
| concepts[6].display_name | Sequence (biology) |
| concepts[7].id | https://openalex.org/C186450821 |
| concepts[7].level | 2 |
| concepts[7].score | 0.42431414127349854 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q17295 |
| concepts[7].display_name | Euclidean space |
| concepts[8].id | https://openalex.org/C32276052 |
| concepts[8].level | 2 |
| concepts[8].score | 0.4242319166660309 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q908349 |
| concepts[8].display_name | Motif (music) |
| concepts[9].id | https://openalex.org/C116834253 |
| concepts[9].level | 2 |
| concepts[9].score | 0.42362987995147705 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q2039217 |
| concepts[9].display_name | Identification (biology) |
| concepts[10].id | https://openalex.org/C120174047 |
| concepts[10].level | 2 |
| concepts[10].score | 0.4200848937034607 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q847073 |
| concepts[10].display_name | Euclidean distance |
| concepts[11].id | https://openalex.org/C165064840 |
| concepts[11].level | 2 |
| concepts[11].score | 0.4185880422592163 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q1321061 |
| concepts[11].display_name | Matching (statistics) |
| concepts[12].id | https://openalex.org/C114614502 |
| concepts[12].level | 1 |
| concepts[12].score | 0.3570038378238678 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q76592 |
| concepts[12].display_name | Combinatorics |
| concepts[13].id | https://openalex.org/C86803240 |
| concepts[13].level | 0 |
| concepts[13].score | 0.35099318623542786 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[13].display_name | Biology |
| concepts[14].id | https://openalex.org/C124101348 |
| concepts[14].level | 1 |
| concepts[14].score | 0.3419811427593231 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q172491 |
| concepts[14].display_name | Data mining |
| concepts[15].id | https://openalex.org/C33923547 |
| concepts[15].level | 0 |
| concepts[15].score | 0.28450435400009155 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[15].display_name | Mathematics |
| concepts[16].id | https://openalex.org/C154945302 |
| concepts[16].level | 1 |
| concepts[16].score | 0.2509664297103882 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[16].display_name | Artificial intelligence |
| concepts[17].id | https://openalex.org/C54355233 |
| concepts[17].level | 1 |
| concepts[17].score | 0.19379916787147522 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q7162 |
| concepts[17].display_name | Genetics |
| concepts[18].id | https://openalex.org/C55493867 |
| concepts[18].level | 1 |
| concepts[18].score | 0.0 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q7094 |
| concepts[18].display_name | Biochemistry |
| concepts[19].id | https://openalex.org/C59822182 |
| concepts[19].level | 1 |
| concepts[19].score | 0.0 |
| concepts[19].wikidata | https://www.wikidata.org/wiki/Q441 |
| concepts[19].display_name | Botany |
| concepts[20].id | https://openalex.org/C121332964 |
| concepts[20].level | 0 |
| concepts[20].score | 0.0 |
| concepts[20].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[20].display_name | Physics |
| concepts[21].id | https://openalex.org/C24890656 |
| concepts[21].level | 1 |
| concepts[21].score | 0.0 |
| concepts[21].wikidata | https://www.wikidata.org/wiki/Q82811 |
| concepts[21].display_name | Acoustics |
| concepts[22].id | https://openalex.org/C2524010 |
| concepts[22].level | 1 |
| concepts[22].score | 0.0 |
| concepts[22].wikidata | https://www.wikidata.org/wiki/Q8087 |
| concepts[22].display_name | Geometry |
| concepts[23].id | https://openalex.org/C105795698 |
| concepts[23].level | 1 |
| concepts[23].score | 0.0 |
| concepts[23].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[23].display_name | Statistics |
| keywords[0].id | https://openalex.org/keywords/computational-biology |
| keywords[0].score | 0.5243992209434509 |
| keywords[0].display_name | Computational biology |
| keywords[1].id | https://openalex.org/keywords/structural-motif |
| keywords[1].score | 0.5240566730499268 |
| keywords[1].display_name | Structural motif |
| keywords[2].id | https://openalex.org/keywords/network-motif |
| keywords[2].score | 0.5181359052658081 |
| keywords[2].display_name | Network motif |
| keywords[3].id | https://openalex.org/keywords/euclidean-geometry |
| keywords[3].score | 0.48993033170700073 |
| keywords[3].display_name | Euclidean geometry |
| keywords[4].id | https://openalex.org/keywords/computer-science |
| keywords[4].score | 0.4862779378890991 |
| keywords[4].display_name | Computer science |
| keywords[5].id | https://openalex.org/keywords/biological-network |
| keywords[5].score | 0.4460144340991974 |
| keywords[5].display_name | Biological network |
| keywords[6].id | https://openalex.org/keywords/sequence |
| keywords[6].score | 0.43742144107818604 |
| keywords[6].display_name | Sequence (biology) |
| keywords[7].id | https://openalex.org/keywords/euclidean-space |
| keywords[7].score | 0.42431414127349854 |
| keywords[7].display_name | Euclidean space |
| keywords[8].id | https://openalex.org/keywords/motif |
| keywords[8].score | 0.4242319166660309 |
| keywords[8].display_name | Motif (music) |
| keywords[9].id | https://openalex.org/keywords/identification |
| keywords[9].score | 0.42362987995147705 |
| keywords[9].display_name | Identification (biology) |
| keywords[10].id | https://openalex.org/keywords/euclidean-distance |
| keywords[10].score | 0.4200848937034607 |
| keywords[10].display_name | Euclidean distance |
| keywords[11].id | https://openalex.org/keywords/matching |
| keywords[11].score | 0.4185880422592163 |
| keywords[11].display_name | Matching (statistics) |
| keywords[12].id | https://openalex.org/keywords/combinatorics |
| keywords[12].score | 0.3570038378238678 |
| keywords[12].display_name | Combinatorics |
| keywords[13].id | https://openalex.org/keywords/biology |
| keywords[13].score | 0.35099318623542786 |
| keywords[13].display_name | Biology |
| keywords[14].id | https://openalex.org/keywords/data-mining |
| keywords[14].score | 0.3419811427593231 |
| keywords[14].display_name | Data mining |
| keywords[15].id | https://openalex.org/keywords/mathematics |
| keywords[15].score | 0.28450435400009155 |
| keywords[15].display_name | Mathematics |
| keywords[16].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[16].score | 0.2509664297103882 |
| keywords[16].display_name | Artificial intelligence |
| keywords[17].id | https://openalex.org/keywords/genetics |
| keywords[17].score | 0.19379916787147522 |
| keywords[17].display_name | Genetics |
| language | en |
| locations[0].id | doi:10.1101/2020.05.14.095695 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306402567 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | False |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | bioRxiv (Cold Spring Harbor Laboratory) |
| locations[0].source.host_organization | https://openalex.org/I2750212522 |
| locations[0].source.host_organization_name | Cold Spring Harbor Laboratory |
| locations[0].source.host_organization_lineage | https://openalex.org/I2750212522 |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://www.biorxiv.org/content/biorxiv/early/2020/05/14/2020.05.14.095695.full.pdf |
| locations[0].version | acceptedVersion |
| locations[0].raw_type | posted-content |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | True |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | https://doi.org/10.1101/2020.05.14.095695 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5054366368 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-4182-5474 |
| authorships[0].author.display_name | Sebastian Keller |
| authorships[0].countries | DE |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I4210109712 |
| authorships[0].affiliations[0].raw_affiliation_string | International Max Planck Research School for Computer Science, Max Planck Institute for Informatics, Saarbrücken, Germany |
| authorships[0].affiliations[1].institution_ids | https://openalex.org/I4210142777 |
| authorships[0].affiliations[1].raw_affiliation_string | Research Group Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland, Saarbrücken, Germany |
| authorships[0].affiliations[2].institution_ids | https://openalex.org/I4210109712 |
| authorships[0].affiliations[2].raw_affiliation_string | Research Group Computational Biology, Max Planck Institute for Informatics, Saarbrücken, Germany |
| authorships[0].affiliations[3].institution_ids | https://openalex.org/I91712215 |
| authorships[0].affiliations[3].raw_affiliation_string | Graduate School of Computer Science, University of Saarland, Saarbrücken, Germany |
| authorships[0].institutions[0].id | https://openalex.org/I4210142777 |
| authorships[0].institutions[0].ror | https://ror.org/042dsac10 |
| authorships[0].institutions[0].type | government |
| authorships[0].institutions[0].lineage | https://openalex.org/I1305996414, https://openalex.org/I4210124929, https://openalex.org/I4210142777 |
| authorships[0].institutions[0].country_code | DE |
| authorships[0].institutions[0].display_name | Helmholtz Institute for Pharmaceutical Research Saarland |
| authorships[0].institutions[1].id | https://openalex.org/I4210109712 |
| authorships[0].institutions[1].ror | https://ror.org/01w19ak89 |
| authorships[0].institutions[1].type | facility |
| authorships[0].institutions[1].lineage | https://openalex.org/I149899117, https://openalex.org/I4210109712 |
| authorships[0].institutions[1].country_code | DE |
| authorships[0].institutions[1].display_name | Max Planck Institute for Informatics |
| authorships[0].institutions[2].id | https://openalex.org/I91712215 |
| authorships[0].institutions[2].ror | https://ror.org/01jdpyv68 |
| authorships[0].institutions[2].type | education |
| authorships[0].institutions[2].lineage | https://openalex.org/I91712215 |
| authorships[0].institutions[2].country_code | DE |
| authorships[0].institutions[2].display_name | Saarland University |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Sebastian Keller |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | Graduate School of Computer Science, University of Saarland, Saarbrücken, Germany, International Max Planck Research School for Computer Science, Max Planck Institute for Informatics, Saarbrücken, Germany, Research Group Computational Biology, Max Planck Institute for Informatics, Saarbrücken, Germany, Research Group Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland, Saarbrücken, Germany |
| authorships[1].author.id | https://openalex.org/A5011206838 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-2271-316X |
| authorships[1].author.display_name | Pauli Miettinen |
| authorships[1].countries | FI |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I175532246 |
| authorships[1].affiliations[0].raw_affiliation_string | School of Computing, University of Eastern Finland, Kuopio, Finland |
| authorships[1].institutions[0].id | https://openalex.org/I175532246 |
| authorships[1].institutions[0].ror | https://ror.org/00cyydd11 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I175532246 |
| authorships[1].institutions[0].country_code | FI |
| authorships[1].institutions[0].display_name | University of Eastern Finland |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Pauli Miettinen |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | School of Computing, University of Eastern Finland, Kuopio, Finland |
| authorships[2].author.id | https://openalex.org/A5101632496 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-9445-477X |
| authorships[2].author.display_name | Olga V. Kalinina |
| authorships[2].countries | DE |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I4210142777 |
| authorships[2].affiliations[0].raw_affiliation_string | Research Group Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland, Saarbrücken, Germany |
| authorships[2].affiliations[1].institution_ids | https://openalex.org/I91712215 |
| authorships[2].affiliations[1].raw_affiliation_string | Faculty of Medicine, Saarland University, Saarbrücken, Germany |
| authorships[2].institutions[0].id | https://openalex.org/I4210142777 |
| authorships[2].institutions[0].ror | https://ror.org/042dsac10 |
| authorships[2].institutions[0].type | government |
| authorships[2].institutions[0].lineage | https://openalex.org/I1305996414, https://openalex.org/I4210124929, https://openalex.org/I4210142777 |
| authorships[2].institutions[0].country_code | DE |
| authorships[2].institutions[0].display_name | Helmholtz Institute for Pharmaceutical Research Saarland |
| authorships[2].institutions[1].id | https://openalex.org/I91712215 |
| authorships[2].institutions[1].ror | https://ror.org/01jdpyv68 |
| authorships[2].institutions[1].type | education |
| authorships[2].institutions[1].lineage | https://openalex.org/I91712215 |
| authorships[2].institutions[1].country_code | DE |
| authorships[2].institutions[1].display_name | Saarland University |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Olga V. Kalinina |
| authorships[2].is_corresponding | True |
| authorships[2].raw_affiliation_strings | Faculty of Medicine, Saarland University, Saarbrücken, Germany, Research Group Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland, Saarbrücken, Germany |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://www.biorxiv.org/content/biorxiv/early/2020/05/14/2020.05.14.095695.full.pdf |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Frequent subgraph mining for biologically meaningful structural motifs |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T10015 |
| primary_topic.field.id | https://openalex.org/fields/13 |
| primary_topic.field.display_name | Biochemistry, Genetics and Molecular Biology |
| primary_topic.score | 0.9965000152587891 |
| primary_topic.domain.id | https://openalex.org/domains/1 |
| primary_topic.domain.display_name | Life Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1312 |
| primary_topic.subfield.display_name | Molecular Biology |
| primary_topic.display_name | Genomics and Phylogenetic Studies |
| related_works | https://openalex.org/W3079957389, https://openalex.org/W2796517744, https://openalex.org/W15348386, https://openalex.org/W2807926763, https://openalex.org/W2951915565, https://openalex.org/W4205962317, https://openalex.org/W2090574217, https://openalex.org/W2142272337, https://openalex.org/W2123854512, https://openalex.org/W782572168 |
| cited_by_count | 2 |
| counts_by_year[0].year | 2023 |
| counts_by_year[0].cited_by_count | 1 |
| counts_by_year[1].year | 2022 |
| counts_by_year[1].cited_by_count | 1 |
| locations_count | 1 |
| best_oa_location.id | doi:10.1101/2020.05.14.095695 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306402567 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | False |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | bioRxiv (Cold Spring Harbor Laboratory) |
| best_oa_location.source.host_organization | https://openalex.org/I2750212522 |
| best_oa_location.source.host_organization_name | Cold Spring Harbor Laboratory |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I2750212522 |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://www.biorxiv.org/content/biorxiv/early/2020/05/14/2020.05.14.095695.full.pdf |
| best_oa_location.version | acceptedVersion |
| best_oa_location.raw_type | posted-content |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | https://doi.org/10.1101/2020.05.14.095695 |
| primary_location.id | doi:10.1101/2020.05.14.095695 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306402567 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | False |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | bioRxiv (Cold Spring Harbor Laboratory) |
| primary_location.source.host_organization | https://openalex.org/I2750212522 |
| primary_location.source.host_organization_name | Cold Spring Harbor Laboratory |
| primary_location.source.host_organization_lineage | https://openalex.org/I2750212522 |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://www.biorxiv.org/content/biorxiv/early/2020/05/14/2020.05.14.095695.full.pdf |
| primary_location.version | acceptedVersion |
| primary_location.raw_type | posted-content |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | True |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | https://doi.org/10.1101/2020.05.14.095695 |
| publication_date | 2020-05-14 |
| publication_year | 2020 |
| referenced_works | https://openalex.org/W2097270746, https://openalex.org/W2061408603, https://openalex.org/W1582561043, https://openalex.org/W2107158607, https://openalex.org/W4256395558, https://openalex.org/W2133452849, https://openalex.org/W2032991899, https://openalex.org/W1966339102, https://openalex.org/W1986154009, https://openalex.org/W2140468273, https://openalex.org/W2025298168, https://openalex.org/W2157236640, https://openalex.org/W2109521274, https://openalex.org/W2109508628, https://openalex.org/W1997168556, https://openalex.org/W1970492290, https://openalex.org/W2066757437, https://openalex.org/W2171172158, https://openalex.org/W2112574017, https://openalex.org/W2123070828, https://openalex.org/W2542218810, https://openalex.org/W2131374321, https://openalex.org/W1984695931, https://openalex.org/W2149026172, https://openalex.org/W2115339329, https://openalex.org/W2089848513, https://openalex.org/W1979871945, https://openalex.org/W2072486135, https://openalex.org/W1976287380, https://openalex.org/W2164281374, https://openalex.org/W2145388307, https://openalex.org/W1970259423, https://openalex.org/W4230616144, https://openalex.org/W2110226069, https://openalex.org/W2022058405, https://openalex.org/W2153153865, https://openalex.org/W2112654127, https://openalex.org/W2346167176, https://openalex.org/W2760622726, https://openalex.org/W2010745002, https://openalex.org/W2052862209, https://openalex.org/W2014316343, https://openalex.org/W2038663942, https://openalex.org/W124504601, https://openalex.org/W2091795018, https://openalex.org/W2040837844, https://openalex.org/W2149322829, https://openalex.org/W2137523774, https://openalex.org/W2800625055, https://openalex.org/W2065832641, https://openalex.org/W2114850508, https://openalex.org/W3003257820, https://openalex.org/W2045119694, https://openalex.org/W2107462316, https://openalex.org/W2085277871, https://openalex.org/W3103145119, https://openalex.org/W2118349699, https://openalex.org/W2170726034 |
| referenced_works_count | 58 |
| abstract_inverted_index.where | 20 |
| abstract_inverted_index.which | 304 |
| abstract_inverted_index.while | 72 |
| abstract_inverted_index.Author | 279 |
| abstract_inverted_index.allows | 426, 439 |
| abstract_inverted_index.called | 277 |
| abstract_inverted_index.capsid | 255 |
| abstract_inverted_index.found, | 394 |
| abstract_inverted_index.graphs | 50, 329, 385, 406 |
| abstract_inverted_index.labels | 134, 441 |
| abstract_inverted_index.ligand | 231 |
| abstract_inverted_index.listed | 408 |
| abstract_inverted_index.mining | 64, 88, 336 |
| abstract_inverted_index.motifs | 5, 36, 158, 185, 202, 237, 297, 319, 468 |
| abstract_inverted_index.number | 106, 404 |
| abstract_inverted_index.retain | 428 |
| abstract_inverted_index.scheme | 424 |
| abstract_inverted_index.search | 167 |
| abstract_inverted_index.source | 272 |
| abstract_inverted_index.space. | 168 |
| abstract_inverted_index.Finally | 242 |
| abstract_inverted_index.PROSITE | 209 |
| abstract_inverted_index.account | 136, 448 |
| abstract_inverted_index.applied | 213 |
| abstract_inverted_index.becomes | 24 |
| abstract_inverted_index.between | 74, 140, 285, 368, 452 |
| abstract_inverted_index.binding | 232 |
| abstract_inverted_index.certain | 403 |
| abstract_inverted_index.denoted | 77 |
| abstract_inverted_index.diverse | 478 |
| abstract_inverted_index.exactly | 393 |
| abstract_inverted_index.family. | 117 |
| abstract_inverted_index.further | 191 |
| abstract_inverted_index.graphs, | 340 |
| abstract_inverted_index.labeled | 80, 363 |
| abstract_inverted_index.members | 259 |
| abstract_inverted_index.mining, | 378 |
| abstract_inverted_index.mining: | 129 |
| abstract_inverted_index.natural | 450 |
| abstract_inverted_index.present | 29, 507 |
| abstract_inverted_index.problem | 11 |
| abstract_inverted_index.protein | 38, 116, 141, 188, 322, 354, 479, 503 |
| abstract_inverted_index.related | 18, 456 |
| abstract_inverted_index.residue | 55 |
| abstract_inverted_index.scoring | 144, 423 |
| abstract_inverted_index.setting | 374 |
| abstract_inverted_index.summary | 280 |
| abstract_inverted_index.whereas | 293 |
| abstract_inverted_index.without | 41 |
| abstract_inverted_index.Abstract | 0 |
| abstract_inverted_index.PROSITE, | 470 |
| abstract_inverted_index.analysis | 22 |
| abstract_inverted_index.approach | 32, 170, 215, 246, 314, 412 |
| abstract_inverted_index.capsids. | 516 |
| abstract_inverted_index.contacts | 73 |
| abstract_inverted_index.database | 222, 383 |
| abstract_inverted_index.discover | 248 |
| abstract_inverted_index.distance | 284, 367 |
| abstract_inverted_index.exactly, | 445 |
| abstract_inverted_index.frequent | 62, 86, 127, 334, 376 |
| abstract_inverted_index.identify | 34, 155, 201 |
| abstract_inverted_index.identity | 290 |
| abstract_inverted_index.interact | 350 |
| abstract_inverted_index.matching | 131 |
| abstract_inverted_index.modified | 61 |
| abstract_inverted_index.networks | 57, 67, 109 |
| abstract_inverted_index.patterns | 206 |
| abstract_inverted_index.proteins | 7, 19, 256, 286 |
| abstract_inverted_index.rapidly, | 292 |
| abstract_inverted_index.regions: | 499 |
| abstract_inverted_index.relevant | 4, 431 |
| abstract_inverted_index.residues | 69, 75, 228, 341, 349, 485 |
| abstract_inverted_index.selected | 187 |
| abstract_inverted_index.sequence | 21, 44, 289, 296 |
| abstract_inverted_index.subgraph | 63, 87, 95, 128, 335, 377, 397, 432 |
| abstract_inverted_index.together | 267 |
| abstract_inverted_index.vertices | 71 |
| abstract_inverted_index.Euclidean | 82, 366 |
| abstract_inverted_index.RINminer. | 278 |
| abstract_inverted_index.algorithm | 276 |
| abstract_inverted_index.alignment | 45 |
| abstract_inverted_index.approach, | 460 |
| abstract_inverted_index.classical | 126, 373 |
| abstract_inverted_index.concepts: | 416 |
| abstract_inverted_index.connected | 359 |
| abstract_inverted_index.conserved | 157 |
| abstract_inverted_index.contained | 100 |
| abstract_inverted_index.database. | 210 |
| abstract_inverted_index.depending | 42 |
| abstract_inverted_index.determine | 90 |
| abstract_inverted_index.developed | 312 |
| abstract_inverted_index.distantly | 17 |
| abstract_inverted_index.efficient | 270, 422 |
| abstract_inverted_index.embedded, | 307 |
| abstract_inverted_index.employing | 59, 332 |
| abstract_inverted_index.expanding | 163 |
| abstract_inverted_index.extremely | 477 |
| abstract_inverted_index.families. | 189, 480 |
| abstract_inverted_index.filtering | 149 |
| abstract_inverted_index.generated | 110 |
| abstract_inverted_index.important | 183, 295, 484 |
| abstract_inverted_index.introduce | 121 |
| abstract_inverted_index.predicted | 482 |
| abstract_inverted_index.presented | 266 |
| abstract_inverted_index.proteins. | 457 |
| abstract_inverted_index.represent | 68 |
| abstract_inverted_index.residues. | 370 |
| abstract_inverted_index.scaffold, | 302 |
| abstract_inverted_index.subgraphs | 92, 151, 380, 419 |
| abstract_inverted_index.validated | 173 |
| abstract_inverted_index.vertices, | 345 |
| abstract_inverted_index.algorithm. | 65 |
| abstract_inverted_index.conserved. | 310 |
| abstract_inverted_index.converting | 321 |
| abstract_inverted_index.correspond | 204 |
| abstract_inverted_index.counteract | 161 |
| abstract_inverted_index.deviations | 451 |
| abstract_inverted_index.difficult. | 26 |
| abstract_inverted_index.discovered | 236 |
| abstract_inverted_index.distances. | 83 |
| abstract_inverted_index.enrichment | 226 |
| abstract_inverted_index.enumerated | 387 |
| abstract_inverted_index.especially | 14 |
| abstract_inverted_index.evidencing | 238 |
| abstract_inverted_index.extensions | 123 |
| abstract_inverted_index.framework. | 337 |
| abstract_inverted_index.functional | 240 |
| abstract_inverted_index.identifies | 317 |
| abstract_inverted_index.increases, | 287 |
| abstract_inverted_index.introduces | 413 |
| abstract_inverted_index.isomorphic | 96, 418 |
| abstract_inverted_index.jelly-roll | 254, 514 |
| abstract_inverted_index.meaningful | 498 |
| abstract_inverted_index.previously | 180, 509 |
| abstract_inverted_index.rediscover | 179, 466 |
| abstract_inverted_index.structural | 184, 251, 301, 511 |
| abstract_inverted_index.structure, | 356 |
| abstract_inverted_index.structures | 40, 48, 112, 142, 324, 454 |
| abstract_inverted_index.supported. | 410 |
| abstract_inverted_index.unreported | 510 |
| abstract_inverted_index.validation | 192 |
| abstract_inverted_index.variations | 139 |
| abstract_inverted_index.Approximate | 437 |
| abstract_inverted_index.approximate | 130 |
| abstract_inverted_index.considering | 16 |
| abstract_inverted_index.enumeration | 435 |
| abstract_inverted_index.importance. | 241 |
| abstract_inverted_index.interaction | 56 |
| abstract_inverted_index.isomorphic, | 398 |
| abstract_inverted_index.isomorphism | 438 |
| abstract_inverted_index.represented | 343 |
| abstract_inverted_index.score-based | 148 |
| abstract_inverted_index.biologically | 3, 430 |
| abstract_inverted_index.demonstrated | 490 |
| abstract_inverted_index.evolutionary | 283 |
| abstract_inverted_index.functionally | 182, 294, 483 |
| abstract_inverted_index.increasingly | 25 |
| abstract_inverted_index.representing | 47 |
| abstract_inverted_index.structurally | 156, 497 |
| abstract_inverted_index.superfamily. | 263 |
| abstract_inverted_index.well-studied | 476 |
| abstract_inverted_index.Additionally, | 505 |
| abstract_inverted_index.approximately | 417 |
| abstract_inverted_index.automatically | 316, 465 |
| abstract_inverted_index.characterized | 181 |
| abstract_inverted_index.corresponding | 353 |
| abstract_inverted_index.demonstrating | 175 |
| abstract_inverted_index.long-standing | 10 |
| abstract_inverted_index.superfamilies | 218, 488 |
| abstract_inverted_index.Identification | 1 |
| abstract_inverted_index.distance-based | 133 |
| abstract_inverted_index.implementation | 273 |
| abstract_inverted_index.ligand-binding | 500 |
| abstract_inverted_index.bioinformatics, | 13 |
| abstract_inverted_index.picornavirus-like | 262 |
| abstract_inverted_index.three-dimensional | 39, 299 |
| cited_by_percentile_year.max | 94 |
| cited_by_percentile_year.min | 89 |
| corresponding_author_ids | https://openalex.org/A5101632496 |
| countries_distinct_count | 2 |
| institutions_distinct_count | 3 |
| corresponding_institution_ids | https://openalex.org/I4210142777, https://openalex.org/I91712215 |
| citation_normalized_percentile.value | 0.50692972 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |