Deciphering genomic codes using advanced natural language processing techniques: a scoping review
2025 · Open Access
· DOI: https://doi.org/10.1093/jamia/ocaf029
Objectives: The vast and complex nature of human genomic sequencing data presents challenges for effective analysis. This review aims to investigate the application of natural language processing (NLP) techniques, particularly large language models (LLMs) and transformer architectures, in deciphering genomic codes, focusing on tokenization, transformer models, and regulatory annotation prediction. The goal of this review is to assess data and model accessibility in the most recent literature, gaining a better understanding of the existing capabilities and constraints of these tools in processing genomic sequencing data.

Materials and Methods: Following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, our scoping review was conducted across PubMed, Medline, Scopus, Web of Science, Embase, and ACM Digital Library. Studies were included if they focused on NLP methodologies applied to genomic sequencing data analysis, without restrictions on publication date or article type.

Results: A total of 26 studies published between 2021 and April 2024 were selected for review. The review highlights that tokenization and transformer models enhance the processing and understanding of genomic data, with applications in predicting regulatory annotations like transcription-factor binding sites and chromatin accessibility.

Discussion: The application of NLP and LLMs to genomic sequencing data interpretation is a promising field that can help streamline the processing of large-scale genomic data while also providing a better understanding of its complex structures. It has the potential to drive advancements in personalized medicine by offering more efficient and scalable solutions for genomic analysis. Further research is also needed to discuss and overcome current limitations, enhancing model transparency and applicability.

Conclusion: This review highlights the growing role of NLP, particularly LLMs, in genomic sequencing data analysis. While these models improve data processing and regulatory annotation prediction, challenges remain in accessibility and interpretability. Further research is needed to refine their application in genomics.
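The abstract names tokenization as a core step but does not say which scheme the reviewed studies use. The following is a minimal sketch that assumes overlapping k-mer tokenization, one common way to turn a DNA string into word-like tokens for a transformer. The function names, the default `k=6`, and the `[UNK]` placeholder token are illustrative assumptions and are not taken from the review.

```python
from typing import Dict, List


def kmer_tokenize(sequence: str, k: int = 6, stride: int = 1) -> List[str]:
    """Split a DNA sequence into k-mers (hypothetical helper).

    stride=1 gives overlapping k-mers; stride=k gives non-overlapping ones.
    Sequences shorter than k produce an empty list.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(tokens: List[str]) -> Dict[str, int]:
    """Assign integer ids in order of first appearance; id 0 is reserved."""
    vocab = {"[UNK]": 0}  # assumed placeholder for unseen k-mers
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids, falling back to the [UNK] id for unknown k-mers."""
    return [vocab.get(token, vocab["[UNK]"]) for token in tokens]


if __name__ == "__main__":
    tokens = kmer_tokenize("ACGTAC", k=3)
    vocab = build_vocab(tokens)
    print(tokens)
    print(encode(tokens, vocab))
```

The resulting integer ids are what a transformer's embedding layer would consume. Other schemes, such as byte-pair encoding or single-nucleotide tokens, trade vocabulary size against sequence length differently.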
- Type: review
- Language: en
- Landing Page: https://doi.org/10.1093/jamia/ocaf029
- PDF URL: https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocaf029/62167460/ocaf029.pdf
- OA Status: bronze
- Cited By: 3
- References: 36
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4407941982
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4407941982 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.1093/jamia/ocaf029 (Digital Object Identifier)
- Title: Deciphering genomic codes using advanced natural language processing techniques: a scoping review (work title)
- Type: review (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-02-25 (full publication date if available)
- Authors: Shuyan Cheng, Yishu Wei, Yiliang Zhou, Zihan Xu, Drew Wright, Jinze Liu, Yifan Peng (list of authors in order)
- Landing page: https://doi.org/10.1093/jamia/ocaf029 (publisher landing page)
- PDF URL: https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocaf029/62167460/ocaf029.pdf (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: bronze (open access status per OpenAlex)
- OA URL: https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocaf029/62167460/ocaf029.pdf (direct OA link when available)
- Concepts: Computer science, Data science, Lexical analysis, Scalability, Annotation, Artificial intelligence, Computational biology, Data mining, Bioinformatics, Biology, Database (top concepts, that is fields/topics, attached by OpenAlex)
- Cited by: 3 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 3 (per-year citation counts, last 5 years)
- References (count): 36 (number of works referenced by this work)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4407941982 |
|---|---|
| doi | https://doi.org/10.1093/jamia/ocaf029 |
| ids.doi | https://doi.org/10.1093/jamia/ocaf029 |
| ids.pmid | https://pubmed.ncbi.nlm.nih.gov/39998912 |
| ids.openalex | https://openalex.org/W4407941982 |
| fwci | 8.04883388 |
| mesh[0].qualifier_ui | |
| mesh[0].descriptor_ui | D009323 |
| mesh[0].is_major_topic | True |
| mesh[0].qualifier_name | |
| mesh[0].descriptor_name | Natural Language Processing |
| mesh[1].qualifier_ui | |
| mesh[1].descriptor_ui | D006801 |
| mesh[1].is_major_topic | False |
| mesh[1].qualifier_name | |
| mesh[1].descriptor_name | Humans |
| mesh[2].qualifier_ui | |
| mesh[2].descriptor_ui | D023281 |
| mesh[2].is_major_topic | True |
| mesh[2].qualifier_name | |
| mesh[2].descriptor_name | Genomics |
| mesh[3].qualifier_ui | |
| mesh[3].descriptor_ui | D005815 |
| mesh[3].is_major_topic | True |
| mesh[3].qualifier_name | |
| mesh[3].descriptor_name | Genetic Code |
| type | review |
| title | Deciphering genomic codes using advanced natural language processing techniques: a scoping review |
| awards[0].id | https://openalex.org/G550110772 |
| awards[0].funder_id | https://openalex.org/F4320337372 |
| awards[0].display_name | |
| awards[0].funder_award_id | R01LM014306 |
| awards[0].funder_display_name | U.S. National Library of Medicine |
| biblio.issue | 4 |
| biblio.volume | 32 |
| biblio.last_page | 772 |
| biblio.first_page | 761 |
| topics[0].id | https://openalex.org/T10521 |
| topics[0].field.id | https://openalex.org/fields/13 |
| topics[0].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[0].score | 0.9968000054359436 |
| topics[0].domain.id | https://openalex.org/domains/1 |
| topics[0].domain.display_name | Life Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1312 |
| topics[0].subfield.display_name | Molecular Biology |
| topics[0].display_name | RNA and protein synthesis mechanisms |
| topics[1].id | https://openalex.org/T10015 |
| topics[1].field.id | https://openalex.org/fields/13 |
| topics[1].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[1].score | 0.9962000250816345 |
| topics[1].domain.id | https://openalex.org/domains/1 |
| topics[1].domain.display_name | Life Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1312 |
| topics[1].subfield.display_name | Molecular Biology |
| topics[1].display_name | Genomics and Phylogenetic Studies |
| topics[2].id | https://openalex.org/T11482 |
| topics[2].field.id | https://openalex.org/fields/13 |
| topics[2].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[2].score | 0.9896000027656555 |
| topics[2].domain.id | https://openalex.org/domains/1 |
| topics[2].domain.display_name | Life Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1312 |
| topics[2].subfield.display_name | Molecular Biology |
| topics[2].display_name | RNA modifications and cancer |
| funders[0].id | https://openalex.org/F4320337372 |
| funders[0].ror | https://ror.org/0060t0j89 |
| funders[0].display_name | U.S. National Library of Medicine |
| is_xpac | False |
| apc_list.value | 3967 |
| apc_list.currency | USD |
| apc_list.value_usd | 3967 |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.7361987829208374 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C2522767166 |
| concepts[1].level | 1 |
| concepts[1].score | 0.6445482969284058 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q2374463 |
| concepts[1].display_name | Data science |
| concepts[2].id | https://openalex.org/C176982825 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6085509061813354 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q835922 |
| concepts[2].display_name | Lexical analysis |
| concepts[3].id | https://openalex.org/C48044578 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5093554258346558 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q727490 |
| concepts[3].display_name | Scalability |
| concepts[4].id | https://openalex.org/C2776321320 |
| concepts[4].level | 2 |
| concepts[4].score | 0.4881065785884857 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q857525 |
| concepts[4].display_name | Annotation |
| concepts[5].id | https://openalex.org/C154945302 |
| concepts[5].level | 1 |
| concepts[5].score | 0.3711152672767639 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[5].display_name | Artificial intelligence |
| concepts[6].id | https://openalex.org/C70721500 |
| concepts[6].level | 1 |
| concepts[6].score | 0.33350831270217896 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q177005 |
| concepts[6].display_name | Computational biology |
| concepts[7].id | https://openalex.org/C124101348 |
| concepts[7].level | 1 |
| concepts[7].score | 0.33000820875167847 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q172491 |
| concepts[7].display_name | Data mining |
| concepts[8].id | https://openalex.org/C60644358 |
| concepts[8].level | 1 |
| concepts[8].score | 0.32102787494659424 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q128570 |
| concepts[8].display_name | Bioinformatics |
| concepts[9].id | https://openalex.org/C86803240 |
| concepts[9].level | 0 |
| concepts[9].score | 0.1454649567604065 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[9].display_name | Biology |
| concepts[10].id | https://openalex.org/C77088390 |
| concepts[10].level | 1 |
| concepts[10].score | 0.12995943427085876 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q8513 |
| concepts[10].display_name | Database |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.7361987829208374 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/data-science |
| keywords[1].score | 0.6445482969284058 |
| keywords[1].display_name | Data science |
| keywords[2].id | https://openalex.org/keywords/lexical-analysis |
| keywords[2].score | 0.6085509061813354 |
| keywords[2].display_name | Lexical analysis |
| keywords[3].id | https://openalex.org/keywords/scalability |
| keywords[3].score | 0.5093554258346558 |
| keywords[3].display_name | Scalability |
| keywords[4].id | https://openalex.org/keywords/annotation |
| keywords[4].score | 0.4881065785884857 |
| keywords[4].display_name | Annotation |
| keywords[5].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[5].score | 0.3711152672767639 |
| keywords[5].display_name | Artificial intelligence |
| keywords[6].id | https://openalex.org/keywords/computational-biology |
| keywords[6].score | 0.33350831270217896 |
| keywords[6].display_name | Computational biology |
| keywords[7].id | https://openalex.org/keywords/data-mining |
| keywords[7].score | 0.33000820875167847 |
| keywords[7].display_name | Data mining |
| keywords[8].id | https://openalex.org/keywords/bioinformatics |
| keywords[8].score | 0.32102787494659424 |
| keywords[8].display_name | Bioinformatics |
| keywords[9].id | https://openalex.org/keywords/biology |
| keywords[9].score | 0.1454649567604065 |
| keywords[9].display_name | Biology |
| keywords[10].id | https://openalex.org/keywords/database |
| keywords[10].score | 0.12995943427085876 |
| keywords[10].display_name | Database |
| language | en |
| locations[0].id | doi:10.1093/jamia/ocaf029 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S129839026 |
| locations[0].source.issn | 1067-5027, 1527-974X |
| locations[0].source.type | journal |
| locations[0].source.is_oa | False |
| locations[0].source.issn_l | 1067-5027 |
| locations[0].source.is_core | True |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | Journal of the American Medical Informatics Association |
| locations[0].source.host_organization | https://openalex.org/P4310311648 |
| locations[0].source.host_organization_name | Oxford University Press |
| locations[0].source.host_organization_lineage | https://openalex.org/P4310311648, https://openalex.org/P4310311647 |
| locations[0].source.host_organization_lineage_names | Oxford University Press, University of Oxford |
| locations[0].license | |
| locations[0].pdf_url | https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocaf029/62167460/ocaf029.pdf |
| locations[0].version | publishedVersion |
| locations[0].raw_type | journal-article |
| locations[0].license_id | |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | Journal of the American Medical Informatics Association |
| locations[0].landing_page_url | https://doi.org/10.1093/jamia/ocaf029 |
| locations[1].id | pmid:39998912 |
| locations[1].is_oa | False |
| locations[1].source.id | https://openalex.org/S4306525036 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | False |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | PubMed |
| locations[1].source.host_organization | https://openalex.org/I1299303238 |
| locations[1].source.host_organization_name | National Institutes of Health |
| locations[1].source.host_organization_lineage | https://openalex.org/I1299303238 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | publishedVersion |
| locations[1].raw_type | |
| locations[1].license_id | |
| locations[1].is_accepted | True |
| locations[1].is_published | True |
| locations[1].raw_source_name | Journal of the American Medical Informatics Association : JAMIA |
| locations[1].landing_page_url | https://pubmed.ncbi.nlm.nih.gov/39998912 |
| indexed_in | crossref, pubmed |
| authorships[0].author.id | https://openalex.org/A5032793336 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-4533-5942 |
| authorships[0].author.display_name | Shuyan Cheng |
| authorships[0].countries | US |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I205783295, https://openalex.org/I4387153466 |
| authorships[0].affiliations[0].raw_affiliation_string | Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States |
| authorships[0].institutions[0].id | https://openalex.org/I205783295 |
| authorships[0].institutions[0].ror | https://ror.org/05bnh6r87 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I205783295 |
| authorships[0].institutions[0].country_code | US |
| authorships[0].institutions[0].display_name | Cornell University |
| authorships[0].institutions[1].id | https://openalex.org/I4387153466 |
| authorships[0].institutions[1].ror | https://ror.org/02r109517 |
| authorships[0].institutions[1].type | education |
| authorships[0].institutions[1].lineage | https://openalex.org/I205783295, https://openalex.org/I4387153466 |
| authorships[0].institutions[1].country_code | US |
| authorships[0].institutions[1].display_name | Weill Cornell Medicine |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Shuyan Cheng |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States |
| authorships[1].author.id | https://openalex.org/A5002546620 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-6446-0514 |
| authorships[1].author.display_name | Yishu Wei |
| authorships[1].countries | US |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I205783295, https://openalex.org/I4387153466 |
| authorships[1].affiliations[0].raw_affiliation_string | Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States |
| authorships[1].institutions[0].id | https://openalex.org/I205783295 |
| authorships[1].institutions[0].ror | https://ror.org/05bnh6r87 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I205783295 |
| authorships[1].institutions[0].country_code | US |
| authorships[1].institutions[0].display_name | Cornell University |
| authorships[1].institutions[1].id | https://openalex.org/I4387153466 |
| authorships[1].institutions[1].ror | https://ror.org/02r109517 |
| authorships[1].institutions[1].type | education |
| authorships[1].institutions[1].lineage | https://openalex.org/I205783295, https://openalex.org/I4387153466 |
| authorships[1].institutions[1].country_code | US |
| authorships[1].institutions[1].display_name | Weill Cornell Medicine |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Yishu Wei |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States |
| authorships[2].author.id | https://openalex.org/A5030407665 |
| authorships[2].author.orcid | https://orcid.org/0009-0002-7457-7075 |
| authorships[2].author.display_name | Yiliang Zhou |
| authorships[2].countries | US |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I205783295, https://openalex.org/I4387153466 |
| authorships[2].affiliations[0].raw_affiliation_string | Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States |
| authorships[2].institutions[0].id | https://openalex.org/I205783295 |
| authorships[2].institutions[0].ror | https://ror.org/05bnh6r87 |
| authorships[2].institutions[0].type | education |
| authorships[2].institutions[0].lineage | https://openalex.org/I205783295 |
| authorships[2].institutions[0].country_code | US |
| authorships[2].institutions[0].display_name | Cornell University |
| authorships[2].institutions[1].id | https://openalex.org/I4387153466 |
| authorships[2].institutions[1].ror | https://ror.org/02r109517 |
| authorships[2].institutions[1].type | education |
| authorships[2].institutions[1].lineage | https://openalex.org/I205783295, https://openalex.org/I4387153466 |
| authorships[2].institutions[1].country_code | US |
| authorships[2].institutions[1].display_name | Weill Cornell Medicine |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Yiliang Zhou |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States |
| authorships[3].author.id | https://openalex.org/A5021399686 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-5632-0439 |
| authorships[3].author.display_name | Zihan Xu |
| authorships[3].countries | US |
| authorships[3].affiliations[0].institution_ids | https://openalex.org/I205783295, https://openalex.org/I4387153466 |
| authorships[3].affiliations[0].raw_affiliation_string | Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States |
| authorships[3].institutions[0].id | https://openalex.org/I205783295 |
| authorships[3].institutions[0].ror | https://ror.org/05bnh6r87 |
| authorships[3].institutions[0].type | education |
| authorships[3].institutions[0].lineage | https://openalex.org/I205783295 |
| authorships[3].institutions[0].country_code | US |
| authorships[3].institutions[0].display_name | Cornell University |
| authorships[3].institutions[1].id | https://openalex.org/I4387153466 |
| authorships[3].institutions[1].ror | https://ror.org/02r109517 |
| authorships[3].institutions[1].type | education |
| authorships[3].institutions[1].lineage | https://openalex.org/I205783295, https://openalex.org/I4387153466 |
| authorships[3].institutions[1].country_code | US |
| authorships[3].institutions[1].display_name | Weill Cornell Medicine |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Zihan Xu |
| authorships[3].is_corresponding | False |
| authorships[3].raw_affiliation_strings | Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States |
| authorships[4].author.id | https://openalex.org/A5067831441 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-1776-5427 |
| authorships[4].author.display_name | Drew Wright |
| authorships[4].countries | US |
| authorships[4].affiliations[0].institution_ids | https://openalex.org/I205783295 |
| authorships[4].affiliations[0].raw_affiliation_string | Samuel J. Wood Library & C.V. Starr Biomedical Information Center, Weill Cornell Medicine, New York, NY 10065, United States |
| authorships[4].institutions[0].id | https://openalex.org/I205783295 |
| authorships[4].institutions[0].ror | https://ror.org/05bnh6r87 |
| authorships[4].institutions[0].type | education |
| authorships[4].institutions[0].lineage | https://openalex.org/I205783295 |
| authorships[4].institutions[0].country_code | US |
| authorships[4].institutions[0].display_name | Cornell University |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Drew N Wright |
| authorships[4].is_corresponding | False |
| authorships[4].raw_affiliation_strings | Samuel J. Wood Library & C.V. Starr Biomedical Information Center, Weill Cornell Medicine, New York, NY 10065, United States |
| authorships[5].author.id | https://openalex.org/A5021864206 |
| authorships[5].author.orcid | https://orcid.org/0000-0003-0692-8793 |
| authorships[5].author.display_name | Jinze Liu |
| authorships[5].countries | US |
| authorships[5].affiliations[0].institution_ids | https://openalex.org/I184840846 |
| authorships[5].affiliations[0].raw_affiliation_string | School of Public Health, Virginia Commonwealth University, Richmond, VA 23219, United States |
| authorships[5].institutions[0].id | https://openalex.org/I184840846 |
| authorships[5].institutions[0].ror | https://ror.org/02nkdxk79 |
| authorships[5].institutions[0].type | education |
| authorships[5].institutions[0].lineage | https://openalex.org/I184840846 |
| authorships[5].institutions[0].country_code | US |
| authorships[5].institutions[0].display_name | Virginia Commonwealth University |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Jinze Liu |
| authorships[5].is_corresponding | False |
| authorships[5].raw_affiliation_strings | School of Public Health, Virginia Commonwealth University, Richmond, VA 23219, United States |
| authorships[6].author.id | https://openalex.org/A5085113833 |
| authorships[6].author.orcid | https://orcid.org/0000-0001-9309-8331 |
| authorships[6].author.display_name | Yifan Peng |
| authorships[6].countries | US |
| authorships[6].affiliations[0].institution_ids | https://openalex.org/I205783295, https://openalex.org/I4387153466 |
| authorships[6].affiliations[0].raw_affiliation_string | Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States |
| authorships[6].institutions[0].id | https://openalex.org/I205783295 |
| authorships[6].institutions[0].ror | https://ror.org/05bnh6r87 |
| authorships[6].institutions[0].type | education |
| authorships[6].institutions[0].lineage | https://openalex.org/I205783295 |
| authorships[6].institutions[0].country_code | US |
| authorships[6].institutions[0].display_name | Cornell University |
| authorships[6].institutions[1].id | https://openalex.org/I4387153466 |
| authorships[6].institutions[1].ror | https://ror.org/02r109517 |
| authorships[6].institutions[1].type | education |
| authorships[6].institutions[1].lineage | https://openalex.org/I205783295, https://openalex.org/I4387153466 |
| authorships[6].institutions[1].country_code | US |
| authorships[6].institutions[1].display_name | Weill Cornell Medicine |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Yifan Peng |
| authorships[6].is_corresponding | False |
| authorships[6].raw_affiliation_strings | Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States |
| has_content.pdf | True |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocaf029/62167460/ocaf029.pdf |
| open_access.oa_status | bronze |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Deciphering genomic codes using advanced natural language processing techniques: a scoping review |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T10521 |
| primary_topic.field.id | https://openalex.org/fields/13 |
| primary_topic.field.display_name | Biochemistry, Genetics and Molecular Biology |
| primary_topic.score | 0.9968000054359436 |
| primary_topic.domain.id | https://openalex.org/domains/1 |
| primary_topic.domain.display_name | Life Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1312 |
| primary_topic.subfield.display_name | Molecular Biology |
| primary_topic.display_name | RNA and protein synthesis mechanisms |
| related_works | https://openalex.org/W2361861616, https://openalex.org/W2263699433, https://openalex.org/W2377979023, https://openalex.org/W2218034408, https://openalex.org/W4405003489, https://openalex.org/W2392921965, https://openalex.org/W2358755282, https://openalex.org/W4386014872, https://openalex.org/W1847536016, https://openalex.org/W4361193986 |
| cited_by_count | 3 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 3 |
| locations_count | 2 |
| best_oa_location.id | doi:10.1093/jamia/ocaf029 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S129839026 |
| best_oa_location.source.issn | 1067-5027, 1527-974X |
| best_oa_location.source.type | journal |
| best_oa_location.source.is_oa | False |
| best_oa_location.source.issn_l | 1067-5027 |
| best_oa_location.source.is_core | True |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | Journal of the American Medical Informatics Association |
| best_oa_location.source.host_organization | https://openalex.org/P4310311648 |
| best_oa_location.source.host_organization_name | Oxford University Press |
| best_oa_location.source.host_organization_lineage | https://openalex.org/P4310311648, https://openalex.org/P4310311647 |
| best_oa_location.source.host_organization_lineage_names | Oxford University Press, University of Oxford |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocaf029/62167460/ocaf029.pdf |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | journal-article |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | Journal of the American Medical Informatics Association |
| best_oa_location.landing_page_url | https://doi.org/10.1093/jamia/ocaf029 |
| primary_location.id | doi:10.1093/jamia/ocaf029 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S129839026 |
| primary_location.source.issn | 1067-5027, 1527-974X |
| primary_location.source.type | journal |
| primary_location.source.is_oa | False |
| primary_location.source.issn_l | 1067-5027 |
| primary_location.source.is_core | True |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | Journal of the American Medical Informatics Association |
| primary_location.source.host_organization | https://openalex.org/P4310311648 |
| primary_location.source.host_organization_name | Oxford University Press |
| primary_location.source.host_organization_lineage | https://openalex.org/P4310311648, https://openalex.org/P4310311647 |
| primary_location.source.host_organization_lineage_names | Oxford University Press, University of Oxford |
| primary_location.license | |
| primary_location.pdf_url | https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocaf029/62167460/ocaf029.pdf |
| primary_location.version | publishedVersion |
| primary_location.raw_type | journal-article |
| primary_location.license_id | |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | Journal of the American Medical Informatics Association |
| primary_location.landing_page_url | https://doi.org/10.1093/jamia/ocaf029 |
| publication_date | 2025-02-25 |
| publication_year | 2025 |
| referenced_works | https://openalex.org/W3164264961, https://openalex.org/W4389391294, https://openalex.org/W2396849069, https://openalex.org/W4394769874, https://openalex.org/W4401345106, https://openalex.org/W4388717695, https://openalex.org/W4385201870, https://openalex.org/W3096508121, https://openalex.org/W4320801614, https://openalex.org/W3127238141, https://openalex.org/W3129125493, https://openalex.org/W4285405240, https://openalex.org/W4296613150, https://openalex.org/W4288421329, https://openalex.org/W4387171262, https://openalex.org/W4389482632, https://openalex.org/W4311556529, https://openalex.org/W4291302282, https://openalex.org/W4220834094, https://openalex.org/W4288421316, https://openalex.org/W4392188680, https://openalex.org/W4291301137, https://openalex.org/W4385230279, https://openalex.org/W4390942844, https://openalex.org/W4389632273, https://openalex.org/W6855572148, https://openalex.org/W4386248120, https://openalex.org/W6861013869, https://openalex.org/W4388817935, https://openalex.org/W4382311896, https://openalex.org/W4214825821, https://openalex.org/W2978420588, https://openalex.org/W4392168151, https://openalex.org/W4401103147, https://openalex.org/W4388722291, https://openalex.org/W2171115585 |
| referenced_works_count | 36 |
| abstract_inverted_index.A | 141 |
| abstract_inverted_index.a | 69, 198, 214 |
| abstract_inverted_index.26 | 144 |
| abstract_inverted_index.It | 221 |
| abstract_inverted_index.by | 231 |
| abstract_inverted_index.if | 120 |
| abstract_inverted_index.in | 38, 63, 81, 174, 228, 268, 285, 297 |
| abstract_inverted_index.is | 56, 197, 243, 291 |
| abstract_inverted_index.of | 7, 24, 53, 72, 78, 110, 143, 169, 188, 207, 217, 264 |
| abstract_inverted_index.on | 43, 123, 134 |
| abstract_inverted_index.or | 137 |
| abstract_inverted_index.to | 20, 57, 127, 192, 225, 246, 293 |
| abstract_inverted_index.ACM | 114 |
| abstract_inverted_index.NLP | 124, 189 |
| abstract_inverted_index.The | 2, 51, 156, 186 |
| abstract_inverted_index.Web | 109 |
| abstract_inverted_index.and | 4, 35, 47, 60, 76, 87, 96, 113, 149, 161, 167, 182, 190, 235, 248, 255, 279, 287 |
| abstract_inverted_index.can | 202 |
| abstract_inverted_index.for | 14, 93, 154, 238 |
| abstract_inverted_index.has | 222 |
| abstract_inverted_index.its | 218 |
| abstract_inverted_index.our | 100 |
| abstract_inverted_index.the | 22, 64, 73, 165, 205, 223, 261 |
| abstract_inverted_index.was | 103 |
| abstract_inverted_index.2021 | 148 |
| abstract_inverted_index.2024 | 151 |
| abstract_inverted_index.LLMs | 191 |
| abstract_inverted_index.NLP, | 265 |
| abstract_inverted_index.This | 17, 258 |
| abstract_inverted_index.aims | 19 |
| abstract_inverted_index.also | 212, 244 |
| abstract_inverted_index.data | 11, 59, 130, 195, 210, 271, 277 |
| abstract_inverted_index.date | 136 |
| abstract_inverted_index.goal | 52 |
| abstract_inverted_index.help | 203 |
| abstract_inverted_index.like | 178 |
| abstract_inverted_index.more | 233 |
| abstract_inverted_index.most | 65 |
| abstract_inverted_index.role | 263 |
| abstract_inverted_index.that | 159, 201 |
| abstract_inverted_index.they | 121 |
| abstract_inverted_index.this | 54 |
| abstract_inverted_index.vast | 3 |
| abstract_inverted_index.were | 118, 152 |
| abstract_inverted_index.with | 172 |
| abstract_inverted_index.(NLP) | 28 |
| abstract_inverted_index.April | 150 |
| abstract_inverted_index.Items | 92 |
| abstract_inverted_index.LLMs, | 267 |
| abstract_inverted_index.While | 273 |
| abstract_inverted_index.data, | 171 |
| abstract_inverted_index.data. | 85 |
| abstract_inverted_index.drive | 226 |
| abstract_inverted_index.field | 200 |
| abstract_inverted_index.human | 8 |
| abstract_inverted_index.large | 31 |
| abstract_inverted_index.model | 61, 253 |
| abstract_inverted_index.sites | 181 |
| abstract_inverted_index.their | 295 |
| abstract_inverted_index.these | 79, 274 |
| abstract_inverted_index.tools | 80 |
| abstract_inverted_index.total | 142 |
| abstract_inverted_index.type. | 139 |
| abstract_inverted_index.while | 211 |
| abstract_inverted_index.(LLMs) | 34 |
| abstract_inverted_index.across | 105 |
| abstract_inverted_index.assess | 58 |
| abstract_inverted_index.better | 70, 215 |
| abstract_inverted_index.codes, | 41 |
| abstract_inverted_index.models | 33, 163, 275 |
| abstract_inverted_index.nature | 6 |
| abstract_inverted_index.needed | 245, 292 |
| abstract_inverted_index.recent | 66 |
| abstract_inverted_index.refine | 294 |
| abstract_inverted_index.remain | 284 |
| abstract_inverted_index.review | 18, 55, 102, 157, 259 |
| abstract_inverted_index.Digital | 115 |
| abstract_inverted_index.Embase, | 112 |
| abstract_inverted_index.Further | 241, 289 |
| abstract_inverted_index.Methods | 88 |
| abstract_inverted_index.PubMed, | 106 |
| abstract_inverted_index.Results | 140 |
| abstract_inverted_index.Reviews | 95 |
| abstract_inverted_index.Scopus, | 108 |
| abstract_inverted_index.Studies | 117 |
| abstract_inverted_index.applied | 126 |
| abstract_inverted_index.article | 138 |
| abstract_inverted_index.between | 147 |
| abstract_inverted_index.binding | 180 |
| abstract_inverted_index.complex | 5, 219 |
| abstract_inverted_index.current | 250 |
| abstract_inverted_index.discuss | 247 |
| abstract_inverted_index.enhance | 164 |
| abstract_inverted_index.focused | 122 |
| abstract_inverted_index.gaining | 68 |
| abstract_inverted_index.genomic | 9, 40, 83, 128, 170, 193, 209, 239, 269 |
| abstract_inverted_index.growing | 262 |
| abstract_inverted_index.improve | 276 |
| abstract_inverted_index.models, | 46 |
| abstract_inverted_index.natural | 25 |
| abstract_inverted_index.review. | 155 |
| abstract_inverted_index.scoping | 101 |
| abstract_inverted_index.studies | 145 |
| abstract_inverted_index.without | 132 |
| abstract_inverted_index.(PRISMA) | 98 |
| abstract_inverted_index.Abstract | 0 |
| abstract_inverted_index.Library. | 116 |
| abstract_inverted_index.Medline, | 107 |
| abstract_inverted_index.Science, | 111 |
| abstract_inverted_index.existing | 74 |
| abstract_inverted_index.focusing | 42 |
| abstract_inverted_index.included | 119 |
| abstract_inverted_index.language | 26, 32 |
| abstract_inverted_index.medicine | 230 |
| abstract_inverted_index.offering | 232 |
| abstract_inverted_index.overcome | 249 |
| abstract_inverted_index.presents | 12 |
| abstract_inverted_index.research | 242, 290 |
| abstract_inverted_index.scalable | 236 |
| abstract_inverted_index.selected | 153 |
| abstract_inverted_index.Following | 89 |
| abstract_inverted_index.Materials | 86 |
| abstract_inverted_index.Preferred | 90 |
| abstract_inverted_index.Reporting | 91 |
| abstract_inverted_index.analysis, | 131 |
| abstract_inverted_index.analysis. | 16, 240, 272 |
| abstract_inverted_index.chromatin | 183 |
| abstract_inverted_index.conducted | 104 |
| abstract_inverted_index.effective | 15 |
| abstract_inverted_index.efficient | 234 |
| abstract_inverted_index.enhancing | 252 |
| abstract_inverted_index.genomics. | 298 |
| abstract_inverted_index.potential | 224 |
| abstract_inverted_index.promising | 199 |
| abstract_inverted_index.providing | 213 |
| abstract_inverted_index.published | 146 |
| abstract_inverted_index.solutions | 237 |
| abstract_inverted_index.Conclusion | 257 |
| abstract_inverted_index.Discussion | 185 |
| abstract_inverted_index.Objectives | 1 |
| abstract_inverted_index.Systematic | 94 |
| abstract_inverted_index.annotation | 49, 281 |
| abstract_inverted_index.challenges | 13, 283 |
| abstract_inverted_index.highlights | 158, 260 |
| abstract_inverted_index.predicting | 175 |
| abstract_inverted_index.processing | 27, 82, 166, 206, 278 |
| abstract_inverted_index.regulatory | 48, 176, 280 |
| abstract_inverted_index.sequencing | 10, 84, 129, 194, 270 |
| abstract_inverted_index.streamline | 204 |
| abstract_inverted_index.annotations | 177 |
| abstract_inverted_index.application | 23, 187, 296 |
| abstract_inverted_index.constraints | 77 |
| abstract_inverted_index.deciphering | 39 |
| abstract_inverted_index.guidelines, | 99 |
| abstract_inverted_index.investigate | 21 |
| abstract_inverted_index.large-scale | 208 |
| abstract_inverted_index.literature, | 67 |
| abstract_inverted_index.prediction, | 282 |
| abstract_inverted_index.prediction. | 50 |
| abstract_inverted_index.publication | 135 |
| abstract_inverted_index.structures. | 220 |
| abstract_inverted_index.techniques, | 29 |
| abstract_inverted_index.transformer | 36, 45, 162 |
| abstract_inverted_index.advancements | 227 |
| abstract_inverted_index.applications | 173 |
| abstract_inverted_index.capabilities | 75 |
| abstract_inverted_index.limitations, | 251 |
| abstract_inverted_index.particularly | 30, 266 |
| abstract_inverted_index.personalized | 229 |
| abstract_inverted_index.restrictions | 133 |
| abstract_inverted_index.tokenization | 160 |
| abstract_inverted_index.transparency | 254 |
| abstract_inverted_index.Meta-Analyses | 97 |
| abstract_inverted_index.accessibility | 62, 286 |
| abstract_inverted_index.methodologies | 125 |
| abstract_inverted_index.tokenization, | 44 |
| abstract_inverted_index.understanding | 71, 168, 216 |
| abstract_inverted_index.accessibility. | 184 |
| abstract_inverted_index.applicability. | 256 |
| abstract_inverted_index.architectures, | 37 |
| abstract_inverted_index.interpretation | 196 |
| abstract_inverted_index.interpretability. | 288 |
| abstract_inverted_index.transcription-factor | 179 |
| cited_by_percentile_year.max | 97 |
| cited_by_percentile_year.min | 96 |
| countries_distinct_count | 1 |
| institutions_distinct_count | 7 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.44 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile.value | 0.92355576 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | True |
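
The `abstract_inverted_index.*` rows in the table above store the abstract as a map from each word to the zero-based positions where it occurs. As a minimal sketch, assuming the rows keep the `| key | value |` layout shown here, the text can be rebuilt by placing every word at its listed positions and reading the positions in order. The helper names `parse_rows` and `rebuild_text` are hypothetical and chosen for illustration only; they are not part of any library.

```python
PREFIX = "abstract_inverted_index."


def parse_rows(rows):
    """Turn '| abstract_inverted_index.WORD | p1, p2 |' rows into {word: [positions]}."""
    index = {}
    for row in rows:
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        # Skip rows that are not two-cell inverted-index entries.
        if len(cells) != 2 or not cells[0].startswith(PREFIX):
            continue
        word = cells[0][len(PREFIX):]
        index[word] = [int(p) for p in cells[1].split(",")]
    return index


def rebuild_text(index, start=0, stop=None):
    """Place each word at its positions and join the window [start, stop)."""
    by_position = {}
    for word, positions in index.items():
        for pos in positions:
            by_position[pos] = word
    if not by_position:
        return ""
    if stop is None:
        stop = max(by_position) + 1
    # Positions missing from the index are skipped rather than guessed.
    return " ".join(by_position[p] for p in range(start, stop) if p in by_position)


# Six rows copied from the table above, covering positions 140 to 145.
rows = [
    "| abstract_inverted_index.A | 141 |",
    "| abstract_inverted_index.of | 7, 24, 53, 72, 78, 110, 143, 169, 188, 207, 217, 264 |",
    "| abstract_inverted_index.26 | 144 |",
    "| abstract_inverted_index.total | 142 |",
    "| abstract_inverted_index.Results | 140 |",
    "| abstract_inverted_index.studies | 145 |",
]

index = parse_rows(rows)
snippet = rebuild_text(index, start=140, stop=146)
print(snippet)  # Results A total of 26 studies
```

Passing all of the `abstract_inverted_index.*` rows and leaving `stop` unset would rebuild the full abstract, through position 298.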