Benchmarking Protein Language Models for Protein Crystallization
2024 · Open Access
· DOI: https://doi.org/10.1101/2024.09.02.610758
The problem of protein structure determination is usually solved by X-ray crystallography. Several in silico deep learning methods have been developed to predict the crystallization propensities of proteins from their sequences, in order to overcome the high attrition rate, the cost of experiments, and the extensive trial-and-error involved. In this work, we benchmark the power of open protein language models (PLMs) for this task through the TRILL platform, a bespoke framework that democratizes the usage of PLMs. We compare LightGBM and XGBoost classifiers built on the embedding representations learned by different PLMs, such as ESM2, Ankh, ProtT5-XL, and ProstT5, against state-of-the-art sequence-based methods such as DeepCrystal, ATTCrys, and CLPred, to identify the most effective methods for predicting crystallization outcomes. The LightGBM classifiers using embeddings from the ESM2 models with 30 and 36 transformer layers (150 million and 3,000 million parameters, respectively) achieve performance gains of 3-5% over all compared models on various evaluation metrics, including AUPR (Area Under the Precision-Recall Curve), AUC (Area Under the Receiver Operating Characteristic Curve), and F1 score, on independent test sets. Furthermore, we fine-tune the ProtGPT2 model available via TRILL to generate crystallizable proteins. Starting with 3,000 generated proteins and applying a series of filtration steps, including consensus of all open PLM-based classifiers, sequence identity through CD-HIT, secondary structure compatibility, aggregation screening, homology search, and foldability evaluation, we identified a set of 5 novel proteins as potentially crystallizable.
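As a rough illustration of the three evaluation metrics named in the abstract, the following sketch computes F1, AUC, and AUPR in plain Python. It is not the authors' evaluation code: the labels and scores are made-up toy values, the 0.5 decision threshold is an assumption, the function names are hypothetical, and AUPR is approximated by average precision under the assumption that no two scores are tied.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = crystallizable)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    return total / hits


# Toy data (hypothetical): true labels and classifier scores for 8 proteins.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("F1:  ", round(f1_score(y_true, y_pred), 4))
print("AUC: ", round(roc_auc(y_true, scores), 4))
print("AUPR:", round(average_precision(y_true, scores), 4))
```

F1 depends on the chosen threshold, whereas AUC and AUPR summarize the ranking of scores across all thresholds, which is one reason benchmarks of this kind tend to report all three.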
Related Topics
- Type: preprint
- Language: en
- Landing Page: https://doi.org/10.1101/2024.09.02.610758, https://www.biorxiv.org/content/biorxiv/early/2024/09/03/2024.09.02.610758.full.pdf
- OA Status: green
- References: 57
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4402179645
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4402179645
- DOI: https://doi.org/10.1101/2024.09.02.610758
- Title: Benchmarking Protein Language Models for Protein Crystallization
- Type: preprint
- Language: en
- Publication year: 2024
- Publication date: 2024-09-03
- Authors: Raghvendra Mall, Rahul Kaushik, Zachary A. Martinez, Matt Thomson, Filippo Castiglione
- Landing page: https://doi.org/10.1101/2024.09.02.610758
- PDF URL: https://www.biorxiv.org/content/biorxiv/early/2024/09/03/2024.09.02.610758.full.pdf
- Open access: Yes
- OA status: green
- OA URL: https://www.biorxiv.org/content/biorxiv/early/2024/09/03/2024.09.02.610758.full.pdf
- Concepts: Benchmarking, Crystallization, Computer science, Natural language processing, Chemistry, Business, Organic chemistry, Marketing
- Cited by: 0
- References (count): 57
- Related works (count): 10
Full payload
| id | https://openalex.org/W4402179645 |
|---|---|
| doi | https://doi.org/10.1101/2024.09.02.610758 |
| ids.doi | https://doi.org/10.1101/2024.09.02.610758 |
| ids.openalex | https://openalex.org/W4402179645 |
| fwci | 0.0 |
| type | preprint |
| title | Benchmarking Protein Language Models for Protein Crystallization |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11710 |
| topics[0].field.id | https://openalex.org/fields/13 |
| topics[0].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[0].score | 0.9222999811172485 |
| topics[0].domain.id | https://openalex.org/domains/1 |
| topics[0].domain.display_name | Life Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1312 |
| topics[0].subfield.display_name | Molecular Biology |
| topics[0].display_name | Biomedical Text Mining and Ontologies |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C86251818 |
| concepts[0].level | 2 |
| concepts[0].score | 0.798560619354248 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q816754 |
| concepts[0].display_name | Benchmarking |
| concepts[1].id | https://openalex.org/C203036418 |
| concepts[1].level | 2 |
| concepts[1].score | 0.5800126791000366 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q284256 |
| concepts[1].display_name | Crystallization |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.42859119176864624 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C204321447 |
| concepts[3].level | 1 |
| concepts[3].score | 0.3492257595062256 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[3].display_name | Natural language processing |
| concepts[4].id | https://openalex.org/C185592680 |
| concepts[4].level | 0 |
| concepts[4].score | 0.28289544582366943 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q2329 |
| concepts[4].display_name | Chemistry |
| concepts[5].id | https://openalex.org/C144133560 |
| concepts[5].level | 0 |
| concepts[5].score | 0.20183336734771729 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q4830453 |
| concepts[5].display_name | Business |
| concepts[6].id | https://openalex.org/C178790620 |
| concepts[6].level | 1 |
| concepts[6].score | 0.06323468685150146 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11351 |
| concepts[6].display_name | Organic chemistry |
| concepts[7].id | https://openalex.org/C162853370 |
| concepts[7].level | 1 |
| concepts[7].score | 0.0 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q39809 |
| concepts[7].display_name | Marketing |
| keywords[0].id | https://openalex.org/keywords/benchmarking |
| keywords[0].score | 0.798560619354248 |
| keywords[0].display_name | Benchmarking |
| keywords[1].id | https://openalex.org/keywords/crystallization |
| keywords[1].score | 0.5800126791000366 |
| keywords[1].display_name | Crystallization |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.42859119176864624 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/natural-language-processing |
| keywords[3].score | 0.3492257595062256 |
| keywords[3].display_name | Natural language processing |
| keywords[4].id | https://openalex.org/keywords/chemistry |
| keywords[4].score | 0.28289544582366943 |
| keywords[4].display_name | Chemistry |
| keywords[5].id | https://openalex.org/keywords/business |
| keywords[5].score | 0.20183336734771729 |
| keywords[5].display_name | Business |
| keywords[6].id | https://openalex.org/keywords/organic-chemistry |
| keywords[6].score | 0.06323468685150146 |
| keywords[6].display_name | Organic chemistry |
| language | en |
| locations[0].id | doi:10.1101/2024.09.02.610758 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306402567 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | False |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | bioRxiv (Cold Spring Harbor Laboratory) |
| locations[0].source.host_organization | https://openalex.org/I2750212522 |
| locations[0].source.host_organization_name | Cold Spring Harbor Laboratory |
| locations[0].source.host_organization_lineage | https://openalex.org/I2750212522 |
| locations[0].license | cc-by-nc-nd |
| locations[0].pdf_url | https://www.biorxiv.org/content/biorxiv/early/2024/09/03/2024.09.02.610758.full.pdf |
| locations[0].version | acceptedVersion |
| locations[0].raw_type | posted-content |
| locations[0].license_id | https://openalex.org/licenses/cc-by-nc-nd |
| locations[0].is_accepted | True |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | https://doi.org/10.1101/2024.09.02.610758 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5055826525 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-1779-3150 |
| authorships[0].author.display_name | Raghvendra Mall |
| authorships[0].countries | AE |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I4210087059 |
| authorships[0].affiliations[0].raw_affiliation_string | Technology Innovation Institute |
| authorships[0].institutions[0].id | https://openalex.org/I4210087059 |
| authorships[0].institutions[0].ror | https://ror.org/001kv2y39 |
| authorships[0].institutions[0].type | facility |
| authorships[0].institutions[0].lineage | https://openalex.org/I4210087059 |
| authorships[0].institutions[0].country_code | AE |
| authorships[0].institutions[0].display_name | Technology Innovation Institute |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Raghvendra Mall |
| authorships[0].is_corresponding | True |
| authorships[0].raw_affiliation_strings | Technology Innovation Institute |
| authorships[1].author.id | https://openalex.org/A5025172190 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-6489-6913 |
| authorships[1].author.display_name | Rahul Kaushik |
| authorships[1].countries | AE |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I4210087059 |
| authorships[1].affiliations[0].raw_affiliation_string | Technology Innovation Institute Abu Dhabi UAE |
| authorships[1].institutions[0].id | https://openalex.org/I4210087059 |
| authorships[1].institutions[0].ror | https://ror.org/001kv2y39 |
| authorships[1].institutions[0].type | facility |
| authorships[1].institutions[0].lineage | https://openalex.org/I4210087059 |
| authorships[1].institutions[0].country_code | AE |
| authorships[1].institutions[0].display_name | Technology Innovation Institute |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Rahul Kaushik |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | Technology Innovation Institute Abu Dhabi UAE |
| authorships[2].author.id | https://openalex.org/A5078042210 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-7830-3162 |
| authorships[2].author.display_name | Zachary A. Martinez |
| authorships[2].countries | US |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I122411786 |
| authorships[2].affiliations[0].raw_affiliation_string | California Institute of Technology, U.S.A. |
| authorships[2].institutions[0].id | https://openalex.org/I122411786 |
| authorships[2].institutions[0].ror | https://ror.org/05dxps055 |
| authorships[2].institutions[0].type | education |
| authorships[2].institutions[0].lineage | https://openalex.org/I122411786 |
| authorships[2].institutions[0].country_code | US |
| authorships[2].institutions[0].display_name | California Institute of Technology |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Zachary A. Martinez |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | California Institute of Technology, U.S.A. |
| authorships[3].author.id | https://openalex.org/A5014506392 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-1021-1234 |
| authorships[3].author.display_name | Matt Thomson |
| authorships[3].countries | US |
| authorships[3].affiliations[0].institution_ids | https://openalex.org/I122411786 |
| authorships[3].affiliations[0].raw_affiliation_string | California Institute of Technology, U.S.A. |
| authorships[3].institutions[0].id | https://openalex.org/I122411786 |
| authorships[3].institutions[0].ror | https://ror.org/05dxps055 |
| authorships[3].institutions[0].type | education |
| authorships[3].institutions[0].lineage | https://openalex.org/I122411786 |
| authorships[3].institutions[0].country_code | US |
| authorships[3].institutions[0].display_name | California Institute of Technology |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Matt Thomson |
| authorships[3].is_corresponding | False |
| authorships[3].raw_affiliation_strings | California Institute of Technology, U.S.A. |
| authorships[4].author.id | https://openalex.org/A5010454141 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-1442-3552 |
| authorships[4].author.display_name | Filippo Castiglione |
| authorships[4].countries | AE |
| authorships[4].affiliations[0].institution_ids | https://openalex.org/I4210087059 |
| authorships[4].affiliations[0].raw_affiliation_string | Technology Innovation Institute, U.A.E. |
| authorships[4].institutions[0].id | https://openalex.org/I4210087059 |
| authorships[4].institutions[0].ror | https://ror.org/001kv2y39 |
| authorships[4].institutions[0].type | facility |
| authorships[4].institutions[0].lineage | https://openalex.org/I4210087059 |
| authorships[4].institutions[0].country_code | AE |
| authorships[4].institutions[0].display_name | Technology Innovation Institute |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Filippo Castiglione |
| authorships[4].is_corresponding | False |
| authorships[4].raw_affiliation_strings | Technology Innovation Institute, U.A.E. |
| has_content.pdf | True |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://www.biorxiv.org/content/biorxiv/early/2024/09/03/2024.09.02.610758.full.pdf |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Benchmarking Protein Language Models for Protein Crystallization |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T11710 |
| primary_topic.field.id | https://openalex.org/fields/13 |
| primary_topic.field.display_name | Biochemistry, Genetics and Molecular Biology |
| primary_topic.score | 0.9222999811172485 |
| primary_topic.domain.id | https://openalex.org/domains/1 |
| primary_topic.domain.display_name | Life Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1312 |
| primary_topic.subfield.display_name | Molecular Biology |
| primary_topic.display_name | Biomedical Text Mining and Ontologies |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2748952813, https://openalex.org/W4238897586, https://openalex.org/W435179959, https://openalex.org/W2619091065, https://openalex.org/W2059640416, https://openalex.org/W1490753184, https://openalex.org/W2284465472, https://openalex.org/W2291782699, https://openalex.org/W1993948687 |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | doi:10.1101/2024.09.02.610758 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306402567 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | False |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | bioRxiv (Cold Spring Harbor Laboratory) |
| best_oa_location.source.host_organization | https://openalex.org/I2750212522 |
| best_oa_location.source.host_organization_name | Cold Spring Harbor Laboratory |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I2750212522 |
| best_oa_location.license | cc-by-nc-nd |
| best_oa_location.pdf_url | https://www.biorxiv.org/content/biorxiv/early/2024/09/03/2024.09.02.610758.full.pdf |
| best_oa_location.version | acceptedVersion |
| best_oa_location.raw_type | posted-content |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by-nc-nd |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | https://doi.org/10.1101/2024.09.02.610758 |
| primary_location.id | doi:10.1101/2024.09.02.610758 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306402567 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | False |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | bioRxiv (Cold Spring Harbor Laboratory) |
| primary_location.source.host_organization | https://openalex.org/I2750212522 |
| primary_location.source.host_organization_name | Cold Spring Harbor Laboratory |
| primary_location.source.host_organization_lineage | https://openalex.org/I2750212522 |
| primary_location.license | cc-by-nc-nd |
| primary_location.pdf_url | https://www.biorxiv.org/content/biorxiv/early/2024/09/03/2024.09.02.610758.full.pdf |
| primary_location.version | acceptedVersion |
| primary_location.raw_type | posted-content |
| primary_location.license_id | https://openalex.org/licenses/cc-by-nc-nd |
| primary_location.is_accepted | True |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | https://doi.org/10.1101/2024.09.02.610758 |
| publication_date | 2024-09-03 |
| publication_year | 2024 |
| referenced_works | https://openalex.org/W4390660695, https://openalex.org/W1965791574, https://openalex.org/W2509245132, https://openalex.org/W3177828909, https://openalex.org/W1678356000, https://openalex.org/W3177500196, https://openalex.org/W4301501800, https://openalex.org/W2158714788, https://openalex.org/W2137226992, https://openalex.org/W2112796928, https://openalex.org/W4391652655, https://openalex.org/W4392651426, https://openalex.org/W4309633647, https://openalex.org/W2786672974, https://openalex.org/W2082698293, https://openalex.org/W4387499716, https://openalex.org/W4290546426, https://openalex.org/W3195801729, https://openalex.org/W3114115943, https://openalex.org/W2794004073, https://openalex.org/W2900903885, https://openalex.org/W4388189340, https://openalex.org/W2169041564, https://openalex.org/W2440774392, https://openalex.org/W4392725875, https://openalex.org/W4287724045, https://openalex.org/W3200775447, https://openalex.org/W2768348081, https://openalex.org/W1554093359, https://openalex.org/W4399555188, https://openalex.org/W2776730476, https://openalex.org/W1998917415, https://openalex.org/W2328398973, https://openalex.org/W2937569020, https://openalex.org/W4256301573, https://openalex.org/W2766578745, https://openalex.org/W2170240176, https://openalex.org/W3183475563, https://openalex.org/W4317374308, https://openalex.org/W2098101628, https://openalex.org/W2980279728, https://openalex.org/W2156125289, https://openalex.org/W2102993309, https://openalex.org/W3133057805, https://openalex.org/W2136299394, https://openalex.org/W2963374347, https://openalex.org/W2743980493, https://openalex.org/W4385245566, https://openalex.org/W2295598076, https://openalex.org/W2102461176, https://openalex.org/W2946620820, https://openalex.org/W4311677627, https://openalex.org/W4327550249, https://openalex.org/W2941112903, https://openalex.org/W2166744107, https://openalex.org/W1970247646, https://openalex.org/W4288066876 |
| referenced_works_count | 57 |
| abstract_inverted_index.- | 150 |
| abstract_inverted_index./ | 83 |
| abstract_inverted_index.3 | 149 |
| abstract_inverted_index.5 | 234 |
| abstract_inverted_index.a | 63, 202, 231 |
| abstract_inverted_index.3, | 196 |
| abstract_inverted_index.30 | 133 |
| abstract_inverted_index.36 | 135 |
| abstract_inverted_index.5% | 151 |
| abstract_inverted_index.By | 80 |
| abstract_inverted_index.F1 | 175 |
| abstract_inverted_index.In | 46 |
| abstract_inverted_index.as | 96, 237 |
| abstract_inverted_index.by | 9, 92, 148 |
| abstract_inverted_index.in | 13 |
| abstract_inverted_index.is | 6 |
| abstract_inverted_index.of | 2, 28, 40, 53, 69, 74, 78, 105, 204, 209, 233 |
| abstract_inverted_index.on | 43, 87, 177 |
| abstract_inverted_index.to | 21, 190 |
| abstract_inverted_index.we | 49, 114, 182, 229 |
| abstract_inverted_index.000 | 197 |
| abstract_inverted_index.150 | 139 |
| abstract_inverted_index.AUC | 166 |
| abstract_inverted_index.The | 0, 124 |
| abstract_inverted_index.XL, | 100 |
| abstract_inverted_index.all | 153, 210 |
| abstract_inverted_index.and | 30, 112, 134, 138, 140, 174, 200, 226 |
| abstract_inverted_index.for | 34, 71, 120, 156 |
| abstract_inverted_index.set | 232 |
| abstract_inverted_index.the | 23, 35, 37, 51, 60, 67, 72, 88, 103, 116, 169, 184 |
| abstract_inverted_index.via | 188 |
| abstract_inverted_index.AUPR | 161 |
| abstract_inverted_index.ESM2 | 130 |
| abstract_inverted_index.PLM- | 212 |
| abstract_inverted_index.PLMs | 70 |
| abstract_inverted_index.been | 19 |
| abstract_inverted_index.cost | 27 |
| abstract_inverted_index.deep | 15 |
| abstract_inverted_index.from | 129 |
| abstract_inverted_index.have | 18, 145 |
| abstract_inverted_index.high | 24 |
| abstract_inverted_index.like | 109 |
| abstract_inverted_index.most | 117 |
| abstract_inverted_index.open | 54, 211 |
| abstract_inverted_index.step | 203 |
| abstract_inverted_index.such | 95 |
| abstract_inverted_index.task | 73 |
| abstract_inverted_index.test | 179 |
| abstract_inverted_index.then | 152 |
| abstract_inverted_index.this | 47 |
| abstract_inverted_index.with | 102, 132, 195 |
| abstract_inverted_index.(Area | 162, 167 |
| abstract_inverted_index.3,000 | 141 |
| abstract_inverted_index.Ankh, | 98 |
| abstract_inverted_index.ESM2, | 97 |
| abstract_inverted_index.PLMs, | 94 |
| abstract_inverted_index.TRILL | 61, 189 |
| abstract_inverted_index.Under | 163, 168 |
| abstract_inverted_index.X-ray | 10 |
| abstract_inverted_index.based | 42, 213 |
| abstract_inverted_index.built | 86 |
| abstract_inverted_index.gains | 147 |
| abstract_inverted_index.model | 131, 186 |
| abstract_inverted_index.novel | 235 |
| abstract_inverted_index.power | 52 |
| abstract_inverted_index.rate, | 26 |
| abstract_inverted_index.score | 176 |
| abstract_inverted_index.sets. | 180 |
| abstract_inverted_index.their | 44 |
| abstract_inverted_index.usage | 68 |
| abstract_inverted_index.work, | 48 |
| abstract_inverted_index.(PLMs) | 58 |
| abstract_inverted_index.layers | 137 |
| abstract_inverted_index.models | 57, 155 |
| abstract_inverted_index.search | 225 |
| abstract_inverted_index.silico | 14 |
| abstract_inverted_index.solved | 8 |
| abstract_inverted_index.ATTCrys | 111 |
| abstract_inverted_index.CD-HIT, | 218 |
| abstract_inverted_index.CLPred, | 113 |
| abstract_inverted_index.Curve), | 165, 173 |
| abstract_inverted_index.ProtT5- | 99 |
| abstract_inverted_index.Several | 12 |
| abstract_inverted_index.XGBoost | 84 |
| abstract_inverted_index.bespoke | 64 |
| abstract_inverted_index.learned | 91 |
| abstract_inverted_index.methods | 17, 108, 119 |
| abstract_inverted_index.million | 142 |
| abstract_inverted_index.problem | 1 |
| abstract_inverted_index.protein | 3, 55 |
| abstract_inverted_index.through | 59, 201, 217 |
| abstract_inverted_index.usually | 7 |
| abstract_inverted_index.various | 157 |
| abstract_inverted_index.LightGBM | 82, 125 |
| abstract_inverted_index.ProstT5, | 101 |
| abstract_inverted_index.ProtGPT2 | 185 |
| abstract_inverted_index.Receiver | 170 |
| abstract_inverted_index.Starting | 194 |
| abstract_inverted_index.compared | 154 |
| abstract_inverted_index.generate | 191 |
| abstract_inverted_index.homology | 224 |
| abstract_inverted_index.identify | 115 |
| abstract_inverted_index.identity | 216 |
| abstract_inverted_index.language | 56 |
| abstract_inverted_index.learning | 16 |
| abstract_inverted_index.metrics, | 159 |
| abstract_inverted_index.overcome | 22 |
| abstract_inverted_index.proteins | 41, 199, 236 |
| abstract_inverted_index.sequence | 215 |
| abstract_inverted_index.Operating | 171 |
| abstract_inverted_index.attrition | 25 |
| abstract_inverted_index.available | 187 |
| abstract_inverted_index.benchmark | 50 |
| abstract_inverted_index.comparing | 81 |
| abstract_inverted_index.consensus | 208 |
| abstract_inverted_index.developed | 20 |
| abstract_inverted_index.different | 93 |
| abstract_inverted_index.effective | 118 |
| abstract_inverted_index.embedding | 89 |
| abstract_inverted_index.extensive | 31 |
| abstract_inverted_index.fine-tune | 183 |
| abstract_inverted_index.framework | 65 |
| abstract_inverted_index.generated | 198 |
| abstract_inverted_index.including | 160, 207 |
| abstract_inverted_index.outcomes. | 123 |
| abstract_inverted_index.platform, | 62 |
| abstract_inverted_index.processes | 206 |
| abstract_inverted_index.proteins. | 79, 193 |
| abstract_inverted_index.secondary | 219 |
| abstract_inverted_index.settings, | 33 |
| abstract_inverted_index.structure | 4, 220 |
| abstract_inverted_index.utilizing | 127 |
| abstract_inverted_index.embeddings | 128 |
| abstract_inverted_index.evaluation | 158 |
| abstract_inverted_index.filtration | 205 |
| abstract_inverted_index.identified | 230 |
| abstract_inverted_index.parameters | 143 |
| abstract_inverted_index.predicting | 36, 75, 121 |
| abstract_inverted_index.screening, | 223 |
| abstract_inverted_index.sequences. | 45 |
| abstract_inverted_index.aggregation | 222 |
| abstract_inverted_index.classifiers | 85, 126 |
| abstract_inverted_index.evaluation, | 228 |
| abstract_inverted_index.experiments | 29 |
| abstract_inverted_index.foldability | 227 |
| abstract_inverted_index.independent | 178 |
| abstract_inverted_index.performance | 104, 146 |
| abstract_inverted_index.potentially | 238 |
| abstract_inverted_index.transformer | 136 |
| abstract_inverted_index.DeepCrystal, | 110 |
| abstract_inverted_index.Furthermore, | 181 |
| abstract_inverted_index.classifiers, | 214 |
| abstract_inverted_index.propensities | 39, 77 |
| abstract_inverted_index.respectively | 144 |
| abstract_inverted_index.democratizing | 66 |
| abstract_inverted_index.determination | 5 |
| abstract_inverted_index.Characteristic | 172 |
| abstract_inverted_index.compatibility, | 221 |
| abstract_inverted_index.crystallizable | 192 |
| abstract_inverted_index.sequence-based | 107 |
| abstract_inverted_index.crystallizable. | 239 |
| abstract_inverted_index.crystallization | 38, 76, 122 |
| abstract_inverted_index.representations | 90 |
| abstract_inverted_index.trial-and-error | 32 |
| abstract_inverted_index.Precision-Recall | 164 |
| abstract_inverted_index.crystallography. | 11 |
| abstract_inverted_index.state-of-the-art | 106 |
| cited_by_percentile_year | |
| corresponding_author_ids | https://openalex.org/A5055826525 |
| countries_distinct_count | 2 |
| institutions_distinct_count | 5 |
| corresponding_institution_ids | https://openalex.org/I4210087059 |
| citation_normalized_percentile.value | 0.16045497 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
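
The `abstract_inverted_index.*` rows in the payload above map each token of the abstract to the word positions at which it occurs. A minimal sketch of how the plain-text abstract could be rebuilt from such a mapping is shown below. The function name `rebuild_abstract` is hypothetical, and the `sample` dictionary is an excerpt restricted to the first twelve positions of the rows above, not the full index.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {token: [positions]} mapping by sorting on position."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Excerpt of the inverted index (positions 0-11 only), copied from the payload rows.
sample = {
    "The": [0],
    "problem": [1],
    "of": [2],
    "protein": [3],
    "structure": [4],
    "determination": [5],
    "is": [6],
    "usually": [7],
    "solved": [8],
    "by": [9],
    "X-ray": [10],
    "crystallography.": [11],
}

print(rebuild_abstract(sample))
```

Because tokens keep their attached punctuation, joining them with single spaces reproduces the original wording, including quirks such as `3,` and `000` being stored as separate tokens.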