RNA secondary structures: from ab initio prediction to better compression, and back
2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2302.11669
In this paper, we use the biological domain knowledge incorporated into stochastic models for ab initio RNA secondary-structure prediction to improve the state of the art in joint compression of RNA sequence and structure data (Liu et al., BMC Bioinformatics, 2008). Moreover, we show that, conversely, compression ratio can serve as a cheap and robust proxy for comparing the prediction quality of different stochastic models, which may help guide the search for better RNA structure prediction models. Our results build on expert stochastic context-free grammar models of RNA secondary structures (Dowell & Eddy, BMC Bioinformatics, 2004; Nebel & Scheid, Theory in Biosciences, 2011) combined with different (static and adaptive) models for rule probabilities and arithmetic coding. We provide a prototype implementation and an extensive empirical evaluation, where we illustrate how grammar features and probability models affect compression ratios.
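
The link between rule probabilities and compressed size can be made concrete with a small sketch. An arithmetic coder driven by a probability model spends close to -log2(p) bits on each grammar rule it encodes, so the total code length of a derivation measures how well the model fits the data. The toy grammar below (S -> ( S ) S | . S | epsilon over dot-bracket strings), the rule names, and the probability values are illustrative assumptions and not the grammars or parameters evaluated in the paper; the sketch also encodes only the structure, not the sequence.

```python
import math

# Toy grammar over dot-bracket strings (an illustrative assumption, not one of
# the paper's grammars):  S -> ( S ) S   |   . S   |   epsilon
RULES = ("pair", "unpaired", "end")

def derivation(structure):
    """Return the leftmost derivation of a balanced dot-bracket string."""
    rules, depth = [], 0
    for ch in structure:
        if ch == "(":
            rules.append("pair")
            depth += 1
        elif ch == ".":
            rules.append("unpaired")
        elif ch == ")":
            if depth == 0:
                raise ValueError("unbalanced structure")
            rules.append("end")  # the S inside the pair derives epsilon
            depth -= 1
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
    if depth != 0:
        raise ValueError("unbalanced structure")
    rules.append("end")  # the last open S derives epsilon
    return rules

def static_bits(rules, probs):
    """Ideal code length in bits under fixed rule probabilities."""
    return sum(-math.log2(probs[r]) for r in rules)

def adaptive_bits(rules):
    """Ideal code length when probabilities are learned from counts on the fly."""
    counts = {r: 1 for r in RULES}  # start every rule at count 1
    bits = 0.0
    for r in rules:
        total = sum(counts.values())
        bits += -math.log2(counts[r] / total)
        counts[r] += 1  # a decoder can make the same update, so it stays in sync
    return bits

if __name__ == "__main__":
    rules = derivation("((((...))))..")
    model_a = {"pair": 0.3, "unpaired": 0.4, "end": 0.3}  # hypothetical values
    model_b = {r: 1 / 3 for r in RULES}                   # uniform baseline
    print("static, model A:", static_bits(rules, model_a))
    print("static, model B:", static_bits(rules, model_b))
    print("adaptive       :", adaptive_bits(rules))
```

Under these assumptions, the model that yields fewer bits on held-out structures is the one that assigns them higher probability, which is the sense in which compression ratio can act as a proxy for model quality.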
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2302.11669, https://arxiv.org/pdf/2302.11669
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4321853911
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4321853911 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2302.11669 (Digital Object Identifier)
- Title: RNA secondary structures: from ab initio prediction to better compression, and back (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023 (year of publication)
- Publication date: 2023-02-22 (full publication date if available)
- Authors: Evarista Onokpasa, Sebastian Wild, Prudence W. H. Wong (list of authors in order)
- Landing page: https://arxiv.org/abs/2302.11669 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2302.11669 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2302.11669 (direct OA link when available)
- Concepts: Computer science, RNA, Ab initio, Artificial intelligence, Algorithm, Theoretical computer science, Biology, Physics, Genetics, Quantum mechanics, Gene (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4321853911 |
| doi | https://doi.org/10.48550/arxiv.2302.11669 |
| ids.doi | https://doi.org/10.48550/arxiv.2302.11669 |
| ids.openalex | https://openalex.org/W4321853911 |
| fwci | |
| type | preprint |
| title | RNA secondary structures: from ab initio prediction to better compression, and back |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10521 |
| topics[0].field.id | https://openalex.org/fields/13 |
| topics[0].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[0].score | 0.9997000098228455 |
| topics[0].domain.id | https://openalex.org/domains/1 |
| topics[0].domain.display_name | Life Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1312 |
| topics[0].subfield.display_name | Molecular Biology |
| topics[0].display_name | RNA and protein synthesis mechanisms |
| topics[1].id | https://openalex.org/T10015 |
| topics[1].field.id | https://openalex.org/fields/13 |
| topics[1].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[1].score | 0.9929999709129333 |
| topics[1].domain.id | https://openalex.org/domains/1 |
| topics[1].domain.display_name | Life Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1312 |
| topics[1].subfield.display_name | Molecular Biology |
| topics[1].display_name | Genomics and Phylogenetic Studies |
| topics[2].id | https://openalex.org/T10181 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9922000169754028 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Natural Language Processing Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.6026427149772644 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C67705224 |
| concepts[1].level | 3 |
| concepts[1].score | 0.5553346872329712 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q11053 |
| concepts[1].display_name | RNA |
| concepts[2].id | https://openalex.org/C2781442258 |
| concepts[2].level | 2 |
| concepts[2].score | 0.45137444138526917 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q46310 |
| concepts[2].display_name | Ab initio |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.44781070947647095 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C11413529 |
| concepts[4].level | 1 |
| concepts[4].score | 0.3986057937145233 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q8366 |
| concepts[4].display_name | Algorithm |
| concepts[5].id | https://openalex.org/C80444323 |
| concepts[5].level | 1 |
| concepts[5].score | 0.33681467175483704 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q2878974 |
| concepts[5].display_name | Theoretical computer science |
| concepts[6].id | https://openalex.org/C86803240 |
| concepts[6].level | 0 |
| concepts[6].score | 0.15969416499137878 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[6].display_name | Biology |
| concepts[7].id | https://openalex.org/C121332964 |
| concepts[7].level | 0 |
| concepts[7].score | 0.14139091968536377 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[7].display_name | Physics |
| concepts[8].id | https://openalex.org/C54355233 |
| concepts[8].level | 1 |
| concepts[8].score | 0.12456262111663818 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q7162 |
| concepts[8].display_name | Genetics |
| concepts[9].id | https://openalex.org/C62520636 |
| concepts[9].level | 1 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q944 |
| concepts[9].display_name | Quantum mechanics |
| concepts[10].id | https://openalex.org/C104317684 |
| concepts[10].level | 2 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q7187 |
| concepts[10].display_name | Gene |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.6026427149772644 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/rna |
| keywords[1].score | 0.5553346872329712 |
| keywords[1].display_name | RNA |
| keywords[2].id | https://openalex.org/keywords/ab-initio |
| keywords[2].score | 0.45137444138526917 |
| keywords[2].display_name | Ab initio |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.44781070947647095 |
| keywords[3].display_name | Artificial intelligence |
| keywords[4].id | https://openalex.org/keywords/algorithm |
| keywords[4].score | 0.3986057937145233 |
| keywords[4].display_name | Algorithm |
| keywords[5].id | https://openalex.org/keywords/theoretical-computer-science |
| keywords[5].score | 0.33681467175483704 |
| keywords[5].display_name | Theoretical computer science |
| keywords[6].id | https://openalex.org/keywords/biology |
| keywords[6].score | 0.15969416499137878 |
| keywords[6].display_name | Biology |
| keywords[7].id | https://openalex.org/keywords/physics |
| keywords[7].score | 0.14139091968536377 |
| keywords[7].display_name | Physics |
| keywords[8].id | https://openalex.org/keywords/genetics |
| keywords[8].score | 0.12456262111663818 |
| keywords[8].display_name | Genetics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2302.11669 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2302.11669 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2302.11669 |
| locations[1].id | doi:10.48550/arxiv.2302.11669 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2302.11669 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5023241798 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Evarista Onokpasa |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Onokpasa, Evarista |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5071263179 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-6061-9177 |
| authorships[1].author.display_name | Sebastian Wild |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Wild, Sebastian |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5063692696 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-7935-7245 |
| authorships[2].author.display_name | Prudence W. H. Wong |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Wong, Prudence W. H. |
| authorships[2].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2302.11669 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2023-02-25T00:00:00 |
| display_name | RNA secondary structures: from ab initio prediction to better compression, and back |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10521 |
| primary_topic.field.id | https://openalex.org/fields/13 |
| primary_topic.field.display_name | Biochemistry, Genetics and Molecular Biology |
| primary_topic.score | 0.9997000098228455 |
| primary_topic.domain.id | https://openalex.org/domains/1 |
| primary_topic.domain.display_name | Life Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1312 |
| primary_topic.subfield.display_name | Molecular Biology |
| primary_topic.display_name | RNA and protein synthesis mechanisms |
| related_works | https://openalex.org/W2051487156, https://openalex.org/W2073681303, https://openalex.org/W2053286651, https://openalex.org/W2181743346, https://openalex.org/W2187401768, https://openalex.org/W2181413294, https://openalex.org/W2989452537, https://openalex.org/W2052122378, https://openalex.org/W2544423928, https://openalex.org/W2147993839 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2302.11669 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2302.11669 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2302.11669 |
| primary_location.id | pmh:oai:arXiv.org:2302.11669 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2302.11669 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2302.11669 |
| publication_date | 2023-02-22 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 51, 118 |
| abstract_inverted_index.In | 0 |
| abstract_inverted_index.We | 116 |
| abstract_inverted_index.ab | 14 |
| abstract_inverted_index.an | 122 |
| abstract_inverted_index.as | 50 |
| abstract_inverted_index.et | 36 |
| abstract_inverted_index.in | 26, 100 |
| abstract_inverted_index.of | 23, 29, 61, 86 |
| abstract_inverted_index.on | 80 |
| abstract_inverted_index.to | 19 |
| abstract_inverted_index.we | 3, 42, 127 |
| abstract_inverted_index.BMC | 38, 93 |
| abstract_inverted_index.Our | 77 |
| abstract_inverted_index.RNA | 16, 30, 73, 87 |
| abstract_inverted_index.and | 32, 53, 107, 113, 121, 132 |
| abstract_inverted_index.art | 25 |
| abstract_inverted_index.can | 48 |
| abstract_inverted_index.for | 13, 56, 71, 110 |
| abstract_inverted_index.how | 129 |
| abstract_inverted_index.may | 66 |
| abstract_inverted_index.the | 5, 21, 24, 58, 69 |
| abstract_inverted_index.use | 4 |
| abstract_inverted_index.(Liu | 35 |
| abstract_inverted_index.al., | 37 |
| abstract_inverted_index.data | 34 |
| abstract_inverted_index.help | 67 |
| abstract_inverted_index.into | 10 |
| abstract_inverted_index.rule | 111 |
| abstract_inverted_index.show | 43 |
| abstract_inverted_index.this | 1 |
| abstract_inverted_index.with | 104 |
| abstract_inverted_index.& | 91, 97 |
| abstract_inverted_index.2004; | 95 |
| abstract_inverted_index.2011) | 102 |
| abstract_inverted_index.Eddy, | 92 |
| abstract_inverted_index.Nebel | 96 |
| abstract_inverted_index.build | 79 |
| abstract_inverted_index.cheap | 52 |
| abstract_inverted_index.guide | 68 |
| abstract_inverted_index.joint | 27 |
| abstract_inverted_index.proxy | 55 |
| abstract_inverted_index.ratio | 47 |
| abstract_inverted_index.serve | 49 |
| abstract_inverted_index.state | 22 |
| abstract_inverted_index.that, | 44 |
| abstract_inverted_index.where | 126 |
| abstract_inverted_index.which | 65 |
| abstract_inverted_index.2008). | 40 |
| abstract_inverted_index.Theory | 99 |
| abstract_inverted_index.affect | 135 |
| abstract_inverted_index.better | 72 |
| abstract_inverted_index.domain | 7 |
| abstract_inverted_index.expert | 81 |
| abstract_inverted_index.initio | 15 |
| abstract_inverted_index.models | 12, 85, 109, 134 |
| abstract_inverted_index.paper, | 2 |
| abstract_inverted_index.robust | 54 |
| abstract_inverted_index.search | 70 |
| abstract_inverted_index.(Dowell | 90 |
| abstract_inverted_index.(static | 106 |
| abstract_inverted_index.Scheid, | 98 |
| abstract_inverted_index.coding. | 115 |
| abstract_inverted_index.grammar | 84, 130 |
| abstract_inverted_index.improve | 20 |
| abstract_inverted_index.models, | 64 |
| abstract_inverted_index.models. | 76 |
| abstract_inverted_index.provide | 117 |
| abstract_inverted_index.quality | 60 |
| abstract_inverted_index.ratios. | 137 |
| abstract_inverted_index.results | 78 |
| abstract_inverted_index.combined | 103 |
| abstract_inverted_index.features | 131 |
| abstract_inverted_index.sequence | 31 |
| abstract_inverted_index.Moreover, | 41 |
| abstract_inverted_index.adaptive) | 108 |
| abstract_inverted_index.comparing | 57 |
| abstract_inverted_index.different | 62, 105 |
| abstract_inverted_index.empirical | 124 |
| abstract_inverted_index.extensive | 123 |
| abstract_inverted_index.knowledge | 8 |
| abstract_inverted_index.prototype | 119 |
| abstract_inverted_index.secondary | 88 |
| abstract_inverted_index.structure | 33, 74 |
| abstract_inverted_index.arithmetic | 114 |
| abstract_inverted_index.biological | 6 |
| abstract_inverted_index.illustrate | 128 |
| abstract_inverted_index.prediction | 18, 59, 75 |
| abstract_inverted_index.stochastic | 11, 63, 82 |
| abstract_inverted_index.structures | 89 |
| abstract_inverted_index.compression | 28, 46, 136 |
| abstract_inverted_index.conversely, | 45 |
| abstract_inverted_index.evaluation, | 125 |
| abstract_inverted_index.probability | 133 |
| abstract_inverted_index.Biosciences, | 101 |
| abstract_inverted_index.context-free | 83 |
| abstract_inverted_index.incorporated | 9 |
| abstract_inverted_index.probabilities | 112 |
| abstract_inverted_index.implementation | 120 |
| abstract_inverted_index.Bioinformatics, | 39, 94 |
| abstract_inverted_index.secondary-structure | 17 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.7099999785423279 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile | |
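
The `abstract_inverted_index.*` rows in the table store the abstract as a map from each token to the positions where it occurs. A minimal sketch of how such a map can be turned back into running text is given below; the function name `rebuild_abstract` is hypothetical, and the sample dictionary is a truncated excerpt that keeps only positions 0 to 4 from the rows above.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a token -> list-of-positions mapping."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))

# Truncated excerpt: only positions below 5 are kept, so later
# occurrences of "we" (42, 127) are deliberately left out.
sample = {
    "In": [0],
    "this": [1],
    "paper,": [2],
    "we": [3],
    "use": [4],
}

if __name__ == "__main__":
    print(rebuild_abstract(sample))
```

Applied to the full set of rows, the same loop would reproduce the abstract text given near the top of this record.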