Proteina: Scaling Flow-based Protein Structure Generative Models
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2503.00710
Recently, diffusion- and flow-based generative models of protein structures have emerged as a powerful tool for de novo protein design. Here, we develop Proteina, a new large-scale flow-based protein backbone generator that utilizes hierarchical fold class labels for conditioning and relies on a tailored scalable transformer architecture with up to 5x as many parameters as previous models. To meaningfully quantify performance, we introduce a new set of metrics that directly measure the distributional similarity of generated proteins with reference sets, complementing existing metrics. We further explore scaling training data to millions of synthetic protein structures and explore improved training and sampling recipes adapted to protein backbone generation. This includes fine-tuning strategies like LoRA for protein backbones, new guidance methods like classifier-free guidance and autoguidance for protein backbones, and new adjusted training objectives. Proteina achieves state-of-the-art performance on de novo protein backbone design and produces diverse and designable proteins at unprecedented length, up to 800 residues. The hierarchical conditioning offers novel control, enabling high-level secondary-structure guidance as well as low-level fold-specific generation.
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2503.00710
- PDF: https://arxiv.org/pdf/2503.00710
- OA Status: green
- Cited By: 1
- OpenAlex ID: https://openalex.org/W4415082968
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4415082968 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2503.00710 (Digital Object Identifier)
- Title: Proteina: Scaling Flow-based Protein Structure Generative Models (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-03-02 (full publication date if available)
- Authors: Tomas Geffner, Kieran Didi, Zuobai Zhang, Danny Reidenbach, Zhonglin Cao, Jason Yim, Mario Geiger, Christian Dallago, Emine Küçükbenli, Arash Vahdat, Karsten Kreis (list of authors in order)
- Landing page: https://arxiv.org/abs/2503.00710 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2503.00710 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2503.00710 (direct OA link when available)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 1 (per-year citation counts, last 5 years)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4415082968 |
| doi | https://doi.org/10.48550/arxiv.2503.00710 |
| ids.doi | https://doi.org/10.48550/arxiv.2503.00710 |
| ids.openalex | https://openalex.org/W4415082968 |
| fwci | |
| type | preprint |
| title | Proteina: Scaling Flow-based Protein Structure Generative Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T13937 |
| topics[0].field.id | https://openalex.org/fields/13 |
| topics[0].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[0].score | 0.7918000221252441 |
| topics[0].domain.id | https://openalex.org/domains/1 |
| topics[0].domain.display_name | Life Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1312 |
| topics[0].subfield.display_name | Molecular Biology |
| topics[0].display_name | Genetics, Bioinformatics, and Biomedical Research |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2503.00710 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2503.00710 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2503.00710 |
| locations[1].id | doi:10.48550/arxiv.2503.00710 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2503.00710 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5085903755 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Tomas Geffner |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Geffner, Tomas |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5031230520 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-6839-3320 |
| authorships[1].author.display_name | Kieran Didi |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Didi, Kieran |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5120391855 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-7853-9079 |
| authorships[2].author.display_name | Zuobai Zhang |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Zhang, Zuobai |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5032963060 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Danny Reidenbach |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Reidenbach, Danny |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5101697895 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-5302-7344 |
| authorships[4].author.display_name | Zhonglin Cao |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Cao, Zhonglin |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5074628247 |
| authorships[5].author.orcid | https://orcid.org/0000-0003-0575-7400 |
| authorships[5].author.display_name | Jason Yim |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Yim, Jason |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5002843500 |
| authorships[6].author.orcid | https://orcid.org/0000-0001-5433-0900 |
| authorships[6].author.display_name | Mario Geiger |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Geiger, Mario |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5088531553 |
| authorships[7].author.orcid | https://orcid.org/0000-0003-4650-6181 |
| authorships[7].author.display_name | Christian Dallago |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Dallago, Christian |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5009574969 |
| authorships[8].author.orcid | https://orcid.org/0000-0002-0588-7750 |
| authorships[8].author.display_name | Emine Küçükbenli |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Kucukbenli, Emine |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5038984764 |
| authorships[9].author.orcid | https://orcid.org/0009-0005-9476-1306 |
| authorships[9].author.display_name | Arash Vahdat |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Vahdat, Arash |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5032255237 |
| authorships[10].author.orcid | |
| authorships[10].author.display_name | Karsten Kreis |
| authorships[10].author_position | last |
| authorships[10].raw_author_name | Kreis, Karsten |
| authorships[10].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2503.00710 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-12T00:00:00 |
| display_name | Proteina: Scaling Flow-based Protein Structure Generative Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T13937 |
| primary_topic.field.id | https://openalex.org/fields/13 |
| primary_topic.field.display_name | Biochemistry, Genetics and Molecular Biology |
| primary_topic.score | 0.7918000221252441 |
| primary_topic.domain.id | https://openalex.org/domains/1 |
| primary_topic.domain.display_name | Life Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1312 |
| primary_topic.subfield.display_name | Molecular Biology |
| primary_topic.display_name | Genetics, Bioinformatics, and Biomedical Research |
| cited_by_count | 1 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2503.00710 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2503.00710 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2503.00710 |
| primary_location.id | pmh:oai:arXiv.org:2503.00710 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2503.00710 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2503.00710 |
| publication_date | 2025-03-02 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 12, 24, 42, 63 |
| abstract_inverted_index.5x | 50 |
| abstract_inverted_index.To | 57 |
| abstract_inverted_index.We | 83 |
| abstract_inverted_index.as | 11, 51, 54, 165, 167 |
| abstract_inverted_index.at | 148 |
| abstract_inverted_index.de | 16, 137 |
| abstract_inverted_index.of | 6, 66, 74, 91 |
| abstract_inverted_index.on | 41, 136 |
| abstract_inverted_index.to | 49, 89, 103, 152 |
| abstract_inverted_index.up | 48, 151 |
| abstract_inverted_index.we | 21, 61 |
| abstract_inverted_index.800 | 153 |
| abstract_inverted_index.The | 155 |
| abstract_inverted_index.and | 2, 39, 95, 99, 122, 127, 142, 145 |
| abstract_inverted_index.for | 15, 37, 113, 124 |
| abstract_inverted_index.new | 25, 64, 116, 128 |
| abstract_inverted_index.set | 65 |
| abstract_inverted_index.the | 71 |
| abstract_inverted_index.LoRA | 112 |
| abstract_inverted_index.This | 107 |
| abstract_inverted_index.data | 88 |
| abstract_inverted_index.fold | 34 |
| abstract_inverted_index.have | 9 |
| abstract_inverted_index.like | 111, 119 |
| abstract_inverted_index.many | 52 |
| abstract_inverted_index.novo | 17, 138 |
| abstract_inverted_index.that | 31, 68 |
| abstract_inverted_index.tool | 14 |
| abstract_inverted_index.well | 166 |
| abstract_inverted_index.with | 47, 77 |
| abstract_inverted_index.Here, | 20 |
| abstract_inverted_index.class | 35 |
| abstract_inverted_index.novel | 159 |
| abstract_inverted_index.sets, | 79 |
| abstract_inverted_index.design | 141 |
| abstract_inverted_index.labels | 36 |
| abstract_inverted_index.models | 5 |
| abstract_inverted_index.offers | 158 |
| abstract_inverted_index.relies | 40 |
| abstract_inverted_index.adapted | 102 |
| abstract_inverted_index.design. | 19 |
| abstract_inverted_index.develop | 22 |
| abstract_inverted_index.diverse | 144 |
| abstract_inverted_index.emerged | 10 |
| abstract_inverted_index.explore | 85, 96 |
| abstract_inverted_index.further | 84 |
| abstract_inverted_index.length, | 150 |
| abstract_inverted_index.measure | 70 |
| abstract_inverted_index.methods | 118 |
| abstract_inverted_index.metrics | 67 |
| abstract_inverted_index.models. | 56 |
| abstract_inverted_index.protein | 7, 18, 28, 93, 104, 114, 125, 139 |
| abstract_inverted_index.recipes | 101 |
| abstract_inverted_index.scaling | 86 |
| abstract_inverted_index.Proteina | 132 |
| abstract_inverted_index.achieves | 133 |
| abstract_inverted_index.adjusted | 129 |
| abstract_inverted_index.backbone | 29, 105, 140 |
| abstract_inverted_index.control, | 160 |
| abstract_inverted_index.directly | 69 |
| abstract_inverted_index.enabling | 161 |
| abstract_inverted_index.existing | 81 |
| abstract_inverted_index.guidance | 117, 121, 164 |
| abstract_inverted_index.improved | 97 |
| abstract_inverted_index.includes | 108 |
| abstract_inverted_index.metrics. | 82 |
| abstract_inverted_index.millions | 90 |
| abstract_inverted_index.powerful | 13 |
| abstract_inverted_index.previous | 55 |
| abstract_inverted_index.produces | 143 |
| abstract_inverted_index.proteins | 76, 147 |
| abstract_inverted_index.quantify | 59 |
| abstract_inverted_index.sampling | 100 |
| abstract_inverted_index.scalable | 44 |
| abstract_inverted_index.tailored | 43 |
| abstract_inverted_index.training | 87, 98, 130 |
| abstract_inverted_index.utilizes | 32 |
| abstract_inverted_index.Proteina, | 23 |
| abstract_inverted_index.Recently, | 0 |
| abstract_inverted_index.generated | 75 |
| abstract_inverted_index.generator | 30 |
| abstract_inverted_index.introduce | 62 |
| abstract_inverted_index.low-level | 168 |
| abstract_inverted_index.reference | 78 |
| abstract_inverted_index.residues. | 154 |
| abstract_inverted_index.synthetic | 92 |
| abstract_inverted_index.backbones, | 115, 126 |
| abstract_inverted_index.designable | 146 |
| abstract_inverted_index.diffusion- | 1 |
| abstract_inverted_index.flow-based | 3, 27 |
| abstract_inverted_index.generative | 4 |
| abstract_inverted_index.high-level | 162 |
| abstract_inverted_index.parameters | 53 |
| abstract_inverted_index.similarity | 73 |
| abstract_inverted_index.strategies | 110 |
| abstract_inverted_index.structures | 8, 94 |
| abstract_inverted_index.fine-tuning | 109 |
| abstract_inverted_index.generation. | 106, 170 |
| abstract_inverted_index.large-scale | 26 |
| abstract_inverted_index.objectives. | 131 |
| abstract_inverted_index.performance | 135 |
| abstract_inverted_index.transformer | 45 |
| abstract_inverted_index.architecture | 46 |
| abstract_inverted_index.autoguidance | 123 |
| abstract_inverted_index.conditioning | 38, 157 |
| abstract_inverted_index.hierarchical | 33, 156 |
| abstract_inverted_index.meaningfully | 58 |
| abstract_inverted_index.performance, | 60 |
| abstract_inverted_index.complementing | 80 |
| abstract_inverted_index.fold-specific | 169 |
| abstract_inverted_index.unprecedented | 149 |
| abstract_inverted_index.distributional | 72 |
| abstract_inverted_index.classifier-free | 120 |
| abstract_inverted_index.state-of-the-art | 134 |
| abstract_inverted_index.secondary-structure | 163 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 11 |
| citation_normalized_percentile |
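
The `abstract_inverted_index.*` rows in the table above store the abstract as a mapping from each word to the positions where it occurs, rather than as running text. As a minimal sketch, assuming the payload has been loaded into a plain dictionary of word to position list, the text can be rebuilt by placing each word at its positions and joining in order. The function name `reconstruct_abstract` and the `sample` dictionary are illustrative and not part of any OpenAlex client library.

```python
from typing import Dict, List


def reconstruct_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild plain text from a word -> positions mapping."""
    positions: Dict[int, str] = {}
    for word, indices in inverted_index.items():
        for i in indices:
            positions[i] = word
    # Join the words in ascending position order.
    return " ".join(positions[i] for i in sorted(positions))


# Entries copied from the table above, restricted to positions 0-8
# (e.g. "and" also occurs at later positions that are omitted here).
sample = {
    "Recently,": [0],
    "diffusion-": [1],
    "and": [2],
    "flow-based": [3],
    "generative": [4],
    "models": [5],
    "of": [6],
    "protein": [7],
    "structures": [8],
}

print(reconstruct_abstract(sample))
# prints: Recently, diffusion- and flow-based generative models of protein structures
```

Applied to the full set of `abstract_inverted_index` rows, the same function should reproduce the abstract shown at the top of this record, since every position from 0 to 170 is covered by exactly one word.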