Transformers models for interpretable and multilevel prediction of protein functions from sequences
Automatic annotation of protein sequences is increasingly used to cope with the growing number of sequences that lack experimental annotation. First, we investigated the application of the Transformer to enzymatic function prediction. The EnzBert model improves macro-F1 from 41% to 54% compared to the previous state of the art. Furthermore, a comparison of interpretability methods shows that an attention-based approach achieves an F-Gain score of 96.05%, surpassing classical methods (91.44%). Second, we explored the integration of the Gene Ontology into function prediction models. Two approaches were tested: integration in the labeling process and the use of hyperbolic embeddings. The results confirm both the effectiveness of the True Path Rule and the superiority of hyperbolic embeddings (mean WFmax: 0.36) over the Euclidean model (0.34) in low dimensions (32). Hyperbolic embeddings also maintain greater consistency with the Gene Ontology (correctly ordered relations: 99.25%-99.28% vs. 78.48%-91.41% for the Euclidean model).
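The True Path Rule mentioned in the abstract states that a protein annotated with an ontology term is implicitly annotated with every ancestor of that term. The abstract does not describe how the thesis applies it during labeling, so the following is only a minimal sketch of the general rule. The function name `propagate_true_path` and the toy hierarchy are hypothetical illustrations and are not taken from the thesis or from the real Gene Ontology files.

```python
from typing import Dict, Set


def propagate_true_path(labels: Set[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the labels closed under the ancestor relation (True Path Rule)."""
    closed: Set[str] = set()
    stack = list(labels)
    while stack:
        term = stack.pop()
        if term in closed:
            continue  # already visited through another path in the DAG
        closed.add(term)
        # Walk upward: every parent of an annotated term is also annotated.
        stack.extend(parents.get(term, ()))
    return closed


# Hypothetical toy hierarchy (child -> set of direct parents).
toy_parents = {
    "hydrolase_activity": {"catalytic_activity"},
    "catalytic_activity": {"molecular_function"},
    "binding": {"molecular_function"},
}

expanded = propagate_true_path({"hydrolase_activity"}, toy_parents)
print(sorted(expanded))
# ['catalytic_activity', 'hydrolase_activity', 'molecular_function']
```

Applying such a closure to the training labels is one way to make targets consistent with the ontology before a model is fitted; whether the thesis does exactly this is an assumption.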
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://www.theses.fr/2023URENS040/document
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4393364811
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4393364811 (canonical identifier for this work in OpenAlex)
- Title: Transformers models for interpretable and multilevel prediction of protein functions from sequences (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023 (year of publication)
- Publication date: 2023-10-18 (full publication date if available)
- Authors: Nicolas Buton (list of authors in order)
- Landing page: https://www.theses.fr/2023URENS040/document (publisher landing page)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://www.theses.fr/2023URENS040/document (direct OA link when available)
- Concepts: Humanities, Physics, Philosophy (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4393364811 |
|---|---|
| doi | |
| ids.openalex | https://openalex.org/W4393364811 |
| fwci | |
| type | preprint |
| title | Transformers models for interpretable and multilevel prediction of protein functions from sequences |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11710 |
| topics[0].field.id | https://openalex.org/fields/13 |
| topics[0].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[0].score | 0.8073999881744385 |
| topics[0].domain.id | https://openalex.org/domains/1 |
| topics[0].domain.display_name | Life Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1312 |
| topics[0].subfield.display_name | Molecular Biology |
| topics[0].display_name | Biomedical Text Mining and Ontologies |
| topics[1].id | https://openalex.org/T12254 |
| topics[1].field.id | https://openalex.org/fields/13 |
| topics[1].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[1].score | 0.7681999802589417 |
| topics[1].domain.id | https://openalex.org/domains/1 |
| topics[1].domain.display_name | Life Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1312 |
| topics[1].subfield.display_name | Molecular Biology |
| topics[1].display_name | Machine Learning in Bioinformatics |
| topics[2].id | https://openalex.org/T11063 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.6809999942779541 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1703 |
| topics[2].subfield.display_name | Computational Theory and Mathematics |
| topics[2].display_name | Rough Sets and Fuzzy Logic |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C15708023 |
| concepts[0].level | 1 |
| concepts[0].score | 0.45355090498924255 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q80083 |
| concepts[0].display_name | Humanities |
| concepts[1].id | https://openalex.org/C121332964 |
| concepts[1].level | 0 |
| concepts[1].score | 0.3941269516944885 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[1].display_name | Physics |
| concepts[2].id | https://openalex.org/C138885662 |
| concepts[2].level | 0 |
| concepts[2].score | 0.34784016013145447 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[2].display_name | Philosophy |
| keywords[0].id | https://openalex.org/keywords/humanities |
| keywords[0].score | 0.45355090498924255 |
| keywords[0].display_name | Humanities |
| keywords[1].id | https://openalex.org/keywords/physics |
| keywords[1].score | 0.3941269516944885 |
| keywords[1].display_name | Physics |
| keywords[2].id | https://openalex.org/keywords/philosophy |
| keywords[2].score | 0.34784016013145447 |
| keywords[2].display_name | Philosophy |
| language | en |
| locations[0].id | pmh:2023URENS040 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400553 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | False |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | Munich Personal RePEc Archive (Ludwig Maximilian University of Munich) |
| locations[0].source.host_organization | https://openalex.org/I8204097 |
| locations[0].source.host_organization_name | Ludwig-Maximilians-Universität München |
| locations[0].source.host_organization_lineage | https://openalex.org/I8204097 |
| locations[0].license | other-oa |
| locations[0].pdf_url | |
| locations[0].version | submittedVersion |
| locations[0].raw_type | Electronic Thesis or Dissertation |
| locations[0].license_id | https://openalex.org/licenses/other-oa |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://www.theses.fr/2023URENS040/document |
| locations[1].id | pmh:oai:HAL:tel-04347632v1 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306402512 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | False |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | HAL (Le Centre pour la Communication Scientifique Directe) |
| locations[1].source.host_organization | https://openalex.org/I1294671590 |
| locations[1].source.host_organization_name | Centre National de la Recherche Scientifique |
| locations[1].source.host_organization_lineage | https://openalex.org/I1294671590 |
| locations[1].license | other-oa |
| locations[1].pdf_url | |
| locations[1].version | submittedVersion |
| locations[1].raw_type | Theses |
| locations[1].license_id | https://openalex.org/licenses/other-oa |
| locations[1].is_accepted | False |
| locations[1].is_published | False |
| locations[1].raw_source_name | Bioinformatics [q-bio.QM]. Université de Rennes, 2023. English. ⟨NNT : 2023URENS040⟩ |
| locations[1].landing_page_url | https://theses.hal.science/tel-04347632 |
| locations[2].id | pmh:oai:ori-oai-repository.univ-rennes1.fr:rennes1-ori-wf-1-18435 |
| locations[2].is_oa | False |
| locations[2].source.id | https://openalex.org/S4377196828 |
| locations[2].source.issn | |
| locations[2].source.type | repository |
| locations[2].source.is_oa | False |
| locations[2].source.issn_l | |
| locations[2].source.is_core | False |
| locations[2].source.is_in_doaj | False |
| locations[2].source.display_name | ORI-OAI warehouse (University of Rennes) |
| locations[2].source.host_organization | https://openalex.org/I56067802 |
| locations[2].source.host_organization_name | Université de Rennes |
| locations[2].source.host_organization_lineage | https://openalex.org/I56067802 |
| locations[2].license | |
| locations[2].pdf_url | |
| locations[2].version | submittedVersion |
| locations[2].raw_type | Electronic Thesis or Dissertation |
| locations[2].license_id | |
| locations[2].is_accepted | False |
| locations[2].is_published | False |
| locations[2].raw_source_name | |
| locations[2].landing_page_url | https://ged.univ-rennes1.fr/nuxeo/site/esupversions/2c1f732e-0e20-4b8f-821e-b0ac0f0a3276 |
| authorships[0].author.id | https://openalex.org/A5030004467 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-4079-5501 |
| authorships[0].author.display_name | Nicolas Buton |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Buton, Nicolas |
| authorships[0].is_corresponding | True |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | http://www.theses.fr/2023URENS040/document |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-04-01T00:00:00 |
| display_name | Transformers models for interpretable and multilevel prediction of protein functions from sequences |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T04:12:42.849631 |
| primary_topic.id | https://openalex.org/T11710 |
| primary_topic.field.id | https://openalex.org/fields/13 |
| primary_topic.field.display_name | Biochemistry, Genetics and Molecular Biology |
| primary_topic.score | 0.8073999881744385 |
| primary_topic.domain.id | https://openalex.org/domains/1 |
| primary_topic.domain.display_name | Life Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1312 |
| primary_topic.subfield.display_name | Molecular Biology |
| primary_topic.display_name | Biomedical Text Mining and Ontologies |
| related_works | https://openalex.org/W2748952813, https://openalex.org/W2935759653, https://openalex.org/W3105167352, https://openalex.org/W54078636, https://openalex.org/W2954470139, https://openalex.org/W1501425562, https://openalex.org/W2902782467, https://openalex.org/W3084825885, https://openalex.org/W2298861036, https://openalex.org/W3148032049 |
| cited_by_count | 0 |
| locations_count | 3 |
| best_oa_location.id | pmh:2023URENS040 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400553 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | False |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | Munich Personal RePEc Archive (Ludwig Maximilian University of Munich) |
| best_oa_location.source.host_organization | https://openalex.org/I8204097 |
| best_oa_location.source.host_organization_name | Ludwig-Maximilians-Universität München |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I8204097 |
| best_oa_location.license | other-oa |
| best_oa_location.pdf_url | |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | Electronic Thesis or Dissertation |
| best_oa_location.license_id | https://openalex.org/licenses/other-oa |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://www.theses.fr/2023URENS040/document |
| primary_location.id | pmh:2023URENS040 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400553 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | False |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | Munich Personal RePEc Archive (Ludwig Maximilian University of Munich) |
| primary_location.source.host_organization | https://openalex.org/I8204097 |
| primary_location.source.host_organization_name | Ludwig-Maximilians-Universität München |
| primary_location.source.host_organization_lineage | https://openalex.org/I8204097 |
| primary_location.license | other-oa |
| primary_location.pdf_url | |
| primary_location.version | submittedVersion |
| primary_location.raw_type | Electronic Thesis or Dissertation |
| primary_location.license_id | https://openalex.org/licenses/other-oa |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://www.theses.fr/2023URENS040/document |
| publication_date | 2023-10-18 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 45 |
| abstract_inverted_index.an | 52, 56 |
| abstract_inverted_index.in | 82, 118 |
| abstract_inverted_index.is | 5 |
| abstract_inverted_index.of | 2, 14, 23, 47, 59, 68, 89, 98, 106 |
| abstract_inverted_index.on | 6 |
| abstract_inverted_index.to | 9, 37, 40, 113 |
| abstract_inverted_index.we | 19 |
| abstract_inverted_index.41% | 36 |
| abstract_inverted_index.54% | 38 |
| abstract_inverted_index.The | 30, 92 |
| abstract_inverted_index.Two | 77 |
| abstract_inverted_index.and | 86, 103 |
| abstract_inverted_index.for | 26, 136 |
| abstract_inverted_index.low | 119 |
| abstract_inverted_index.the | 7, 11, 21, 24, 41, 66, 83, 87, 96, 99, 104, 114, 127, 137 |
| abstract_inverted_index.use | 88 |
| abstract_inverted_index.vs. | 134 |
| abstract_inverted_index.was | 75 |
| abstract_inverted_index.Gene | 69, 128 |
| abstract_inverted_index.Path | 101 |
| abstract_inverted_index.Rule | 102 |
| abstract_inverted_index.They | 122 |
| abstract_inverted_index.True | 100 |
| abstract_inverted_index.both | 95 |
| abstract_inverted_index.from | 35 |
| abstract_inverted_index.into | 71 |
| abstract_inverted_index.rise | 8 |
| abstract_inverted_index.that | 51 |
| abstract_inverted_index.were | 79 |
| abstract_inverted_index.with | 126 |
| abstract_inverted_index.(32). | 121 |
| abstract_inverted_index.(mean | 109 |
| abstract_inverted_index.0.36) | 111 |
| abstract_inverted_index.model | 32, 116 |
| abstract_inverted_index.score | 58 |
| abstract_inverted_index.shows | 50 |
| abstract_inverted_index.(0.34) | 117 |
| abstract_inverted_index.F-Gain | 57 |
| abstract_inverted_index.First, | 18 |
| abstract_inverted_index.WFmax: | 110 |
| abstract_inverted_index.manage | 10 |
| abstract_inverted_index.models | 74 |
| abstract_inverted_index.number | 13 |
| abstract_inverted_index.96.05%, | 60 |
| abstract_inverted_index.EnzBert | 31 |
| abstract_inverted_index.Second, | 65 |
| abstract_inverted_index.confirm | 94 |
| abstract_inverted_index.greater | 124 |
| abstract_inverted_index.methods | 49, 63 |
| abstract_inverted_index.model). | 139 |
| abstract_inverted_index.ordered | 131 |
| abstract_inverted_index.process | 85 |
| abstract_inverted_index.protein | 3 |
| abstract_inverted_index.results | 93 |
| abstract_inverted_index.tested: | 80 |
| abstract_inverted_index.Ontology | 70, 129 |
| abstract_inverted_index.achieves | 55 |
| abstract_inverted_index.approach | 54 |
| abstract_inverted_index.compared | 39, 112 |
| abstract_inverted_index.function | 28, 72 |
| abstract_inverted_index.improves | 33 |
| abstract_inverted_index.labeling | 84 |
| abstract_inverted_index.macro-F1 | 34 |
| abstract_inverted_index.maintain | 123 |
| abstract_inverted_index.previous | 42 |
| abstract_inverted_index.(91.44%). | 64 |
| abstract_inverted_index.Automatic | 0 |
| abstract_inverted_index.Euclidean | 115, 138 |
| abstract_inverted_index.classical | 62 |
| abstract_inverted_index.enzymatic | 27 |
| abstract_inverted_index.explored. | 76 |
| abstract_inverted_index.sequences | 4 |
| abstract_inverted_index.(correctly | 130 |
| abstract_inverted_index.annotation | 1 |
| abstract_inverted_index.approaches | 78 |
| abstract_inverted_index.comparison | 46 |
| abstract_inverted_index.dimensions | 120 |
| abstract_inverted_index.embeddings | 108 |
| abstract_inverted_index.hyperbolic | 90, 107 |
| abstract_inverted_index.increasing | 12 |
| abstract_inverted_index.prediction | 73 |
| abstract_inverted_index.relations: | 132 |
| abstract_inverted_index.sequences. | 17 |
| abstract_inverted_index.surpassing | 61 |
| abstract_inverted_index.Transformer | 25 |
| abstract_inverted_index.application | 22 |
| abstract_inverted_index.consistency | 125 |
| abstract_inverted_index.embeddings. | 91 |
| abstract_inverted_index.integration | 67, 81 |
| abstract_inverted_index.prediction. | 29 |
| abstract_inverted_index.superiority | 105 |
| abstract_inverted_index.unannotated | 16 |
| abstract_inverted_index.Furthermore, | 44 |
| abstract_inverted_index.investigated | 20 |
| abstract_inverted_index.78.48%-91.41% | 135 |
| abstract_inverted_index.99.25%-99.28% | 133 |
| abstract_inverted_index.effectiveness | 97 |
| abstract_inverted_index.experimentally | 15 |
| abstract_inverted_index.attention-based | 53 |
| abstract_inverted_index.interpretability | 48 |
| abstract_inverted_index.state-of-the-art. | 43 |
| cited_by_percentile_year | |
| corresponding_author_ids | https://openalex.org/A5030004467 |
| countries_distinct_count | 0 |
| institutions_distinct_count | 1 |
| citation_normalized_percentile | |
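The `abstract_inverted_index.*` rows in the payload above store the abstract as a map from each token to the word positions where it occurs. A minimal sketch of turning such a map back into running text follows. The helper name `rebuild_abstract` is hypothetical, and the sample dictionary is a truncated excerpt covering only positions 0 to 8 of the index listed above.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild text from a token -> positions map by sorting on position."""
    by_position: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Join tokens in ascending position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Truncated excerpt of the index above (positions 0..8 only).
sample_index = {
    "Automatic": [0],
    "annotation": [1],
    "of": [2],
    "protein": [3],
    "sequences": [4],
    "is": [5],
    "on": [6],
    "the": [7],
    "rise": [8],
}

print(rebuild_abstract(sample_index))
# Automatic annotation of protein sequences is on the rise
```

Punctuation stays attached to tokens in this format (for example `prediction.` and `(0.34)`), so joining on single spaces should recover the abstract without further processing.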