Deciphering genomic codes using advanced NLP techniques: a scoping review.
Shuyan Cheng, Yishu Wei, Yiliang Zhou, Zihan Xu, Drew Wright, Jinze Liu, Yifan Peng
2024 · Open Access · DOI: https://doi.org/10.1093/jamia/ocaf029
The application of NLP and LLMs to genomic sequencing data interpretation is a promising field that can help streamline the processing of large-scale genomic data while providing a better understanding of its complex structures. It can potentially drive advancements in personalized medicine by offering more efficient and scalable solutions for genomic analysis. Further research is needed to discuss and overcome limitations, enhancing model transparency and applicability.
Metadata
- Type: preprint
- Language: en
- Landing Page: https://pubmed.ncbi.nlm.nih.gov/39650606
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4404987130
All OpenAlex metadata
- OpenAlex ID: https://openalex.org/W4404987130 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.1093/jamia/ocaf029 (Digital Object Identifier)
- Title: Deciphering genomic codes using advanced NLP techniques: a scoping review. (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-11-25 (full publication date if available)
- Authors: Shuyan Cheng, Yishu Wei, Yiliang Zhou, Zihan Xu, Drew Wright, Jinze Liu, Yifan Peng (list of authors in order)
- Landing page: https://pubmed.ncbi.nlm.nih.gov/39650606 (publisher landing page)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2411.16084 (direct OA link when available)
- Concepts: Computer science, Data science, Lexical analysis, Annotation, Scalability, Artificial intelligence, Data mining, Computational biology, Machine learning, Natural language processing, Biology, Database (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4404987130 |
|---|---|
| doi | https://doi.org/10.1093/jamia/ocaf029 |
| ids.pmid | https://pubmed.ncbi.nlm.nih.gov/39650606 |
| ids.openalex | https://openalex.org/W4404987130 |
| fwci | 0.63877855 |
| type | preprint |
| title | Deciphering genomic codes using advanced NLP techniques: a scoping review. |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10028 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9071000218391418 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Topic Modeling |
| topics[1].id | https://openalex.org/T11710 |
| topics[1].field.id | https://openalex.org/fields/13 |
| topics[1].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[1].score | 0.9021999835968018 |
| topics[1].domain.id | https://openalex.org/domains/1 |
| topics[1].domain.display_name | Life Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1312 |
| topics[1].subfield.display_name | Molecular Biology |
| topics[1].display_name | Biomedical Text Mining and Ontologies |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.7023876905441284 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C2522767166 |
| concepts[1].level | 1 |
| concepts[1].score | 0.6360440850257874 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q2374463 |
| concepts[1].display_name | Data science |
| concepts[2].id | https://openalex.org/C176982825 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6313741207122803 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q835922 |
| concepts[2].display_name | Lexical analysis |
| concepts[3].id | https://openalex.org/C2776321320 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5201271176338196 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q857525 |
| concepts[3].display_name | Annotation |
| concepts[4].id | https://openalex.org/C48044578 |
| concepts[4].level | 2 |
| concepts[4].score | 0.48087289929389954 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q727490 |
| concepts[4].display_name | Scalability |
| concepts[5].id | https://openalex.org/C154945302 |
| concepts[5].level | 1 |
| concepts[5].score | 0.42402997612953186 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[5].display_name | Artificial intelligence |
| concepts[6].id | https://openalex.org/C124101348 |
| concepts[6].level | 1 |
| concepts[6].score | 0.34676700830459595 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q172491 |
| concepts[6].display_name | Data mining |
| concepts[7].id | https://openalex.org/C70721500 |
| concepts[7].level | 1 |
| concepts[7].score | 0.3353736102581024 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q177005 |
| concepts[7].display_name | Computational biology |
| concepts[8].id | https://openalex.org/C119857082 |
| concepts[8].level | 1 |
| concepts[8].score | 0.3251814544200897 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[8].display_name | Machine learning |
| concepts[9].id | https://openalex.org/C204321447 |
| concepts[9].level | 1 |
| concepts[9].score | 0.32182562351226807 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[9].display_name | Natural language processing |
| concepts[10].id | https://openalex.org/C86803240 |
| concepts[10].level | 0 |
| concepts[10].score | 0.15403631329536438 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[10].display_name | Biology |
| concepts[11].id | https://openalex.org/C77088390 |
| concepts[11].level | 1 |
| concepts[11].score | 0.13875946402549744 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q8513 |
| concepts[11].display_name | Database |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.7023876905441284 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/data-science |
| keywords[1].score | 0.6360440850257874 |
| keywords[1].display_name | Data science |
| keywords[2].id | https://openalex.org/keywords/lexical-analysis |
| keywords[2].score | 0.6313741207122803 |
| keywords[2].display_name | Lexical analysis |
| keywords[3].id | https://openalex.org/keywords/annotation |
| keywords[3].score | 0.5201271176338196 |
| keywords[3].display_name | Annotation |
| keywords[4].id | https://openalex.org/keywords/scalability |
| keywords[4].score | 0.48087289929389954 |
| keywords[4].display_name | Scalability |
| keywords[5].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[5].score | 0.42402997612953186 |
| keywords[5].display_name | Artificial intelligence |
| keywords[6].id | https://openalex.org/keywords/data-mining |
| keywords[6].score | 0.34676700830459595 |
| keywords[6].display_name | Data mining |
| keywords[7].id | https://openalex.org/keywords/computational-biology |
| keywords[7].score | 0.3353736102581024 |
| keywords[7].display_name | Computational biology |
| keywords[8].id | https://openalex.org/keywords/machine-learning |
| keywords[8].score | 0.3251814544200897 |
| keywords[8].display_name | Machine learning |
| keywords[9].id | https://openalex.org/keywords/natural-language-processing |
| keywords[9].score | 0.32182562351226807 |
| keywords[9].display_name | Natural language processing |
| keywords[10].id | https://openalex.org/keywords/biology |
| keywords[10].score | 0.15403631329536438 |
| keywords[10].display_name | Biology |
| keywords[11].id | https://openalex.org/keywords/database |
| keywords[11].score | 0.13875946402549744 |
| keywords[11].display_name | Database |
| language | en |
| locations[0].id | pmid:39650606 |
| locations[0].is_oa | False |
| locations[0].source.id | https://openalex.org/S4306525036 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | False |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | PubMed |
| locations[0].source.host_organization | https://openalex.org/I1299303238 |
| locations[0].source.host_organization_name | National Institutes of Health |
| locations[0].source.host_organization_lineage | https://openalex.org/I1299303238 |
| locations[0].license | |
| locations[0].pdf_url | |
| locations[0].version | publishedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | ArXiv |
| locations[0].landing_page_url | https://pubmed.ncbi.nlm.nih.gov/39650606 |
| locations[1].id | pmh:oai:arXiv.org:2411.16084 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | https://arxiv.org/pdf/2411.16084 |
| locations[1].version | submittedVersion |
| locations[1].raw_type | text |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | False |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | http://arxiv.org/abs/2411.16084 |
| locations[2].id | pmh:oai:pubmedcentral.nih.gov:11623714 |
| locations[2].is_oa | True |
| locations[2].source.id | https://openalex.org/S2764455111 |
| locations[2].source.issn | |
| locations[2].source.type | repository |
| locations[2].source.is_oa | False |
| locations[2].source.issn_l | |
| locations[2].source.is_core | False |
| locations[2].source.is_in_doaj | False |
| locations[2].source.display_name | PubMed Central |
| locations[2].source.host_organization | https://openalex.org/I1299303238 |
| locations[2].source.host_organization_name | National Institutes of Health |
| locations[2].source.host_organization_lineage | https://openalex.org/I1299303238 |
| locations[2].license | cc-by |
| locations[2].pdf_url | |
| locations[2].version | submittedVersion |
| locations[2].raw_type | Text |
| locations[2].license_id | https://openalex.org/licenses/cc-by |
| locations[2].is_accepted | False |
| locations[2].is_published | False |
| locations[2].raw_source_name | ArXiv |
| locations[2].landing_page_url | https://www.ncbi.nlm.nih.gov/pmc/articles/11623714 |
| indexed_in | arxiv, pubmed |
| authorships[0].author.id | https://openalex.org/A5032793336 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-4533-5942 |
| authorships[0].author.display_name | Shuyan Cheng |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Shuyan Cheng |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5002546620 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-6446-0514 |
| authorships[1].author.display_name | Yishu Wei |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Yishu Wei |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5030407665 |
| authorships[2].author.orcid | https://orcid.org/0009-0002-7457-7075 |
| authorships[2].author.display_name | Yiliang Zhou |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Yiliang Zhou |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5021399686 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-5632-0439 |
| authorships[3].author.display_name | Zihan Xu |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Zihan Xu |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5067831441 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-1776-5427 |
| authorships[4].author.display_name | Drew Wright |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Drew N Wright |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5021864206 |
| authorships[5].author.orcid | https://orcid.org/0000-0003-0692-8793 |
| authorships[5].author.display_name | Jinze Liu |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Jinze Liu |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5033862822 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-8581-8674 |
| authorships[6].author.display_name | Yifan Peng |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Yifan Peng |
| authorships[6].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2411.16084 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Deciphering genomic codes using advanced NLP techniques: a scoping review. |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T04:12:42.849631 |
| primary_topic.id | https://openalex.org/T10028 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9071000218391418 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Topic Modeling |
| related_works | https://openalex.org/W2361861616, https://openalex.org/W2263699433, https://openalex.org/W2377979023, https://openalex.org/W2218034408, https://openalex.org/W4300598845, https://openalex.org/W2601638452, https://openalex.org/W2285263069, https://openalex.org/W4376107815, https://openalex.org/W4319309671, https://openalex.org/W4319309603 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 3 |
| best_oa_location.id | pmh:oai:arXiv.org:2411.16084 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2411.16084 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2411.16084 |
| primary_location.id | pmid:39650606 |
| primary_location.is_oa | False |
| primary_location.source.id | https://openalex.org/S4306525036 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | False |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | PubMed |
| primary_location.source.host_organization | https://openalex.org/I1299303238 |
| primary_location.source.host_organization_name | National Institutes of Health |
| primary_location.source.host_organization_lineage | https://openalex.org/I1299303238 |
| primary_location.license | |
| primary_location.pdf_url | |
| primary_location.version | publishedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | ArXiv |
| primary_location.landing_page_url | https://pubmed.ncbi.nlm.nih.gov/39650606 |
| publication_date | 2024-11-25 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 12, 27 |
| abstract_inverted_index.It | 34 |
| abstract_inverted_index.by | 42 |
| abstract_inverted_index.in | 39 |
| abstract_inverted_index.is | 11, 54 |
| abstract_inverted_index.of | 2, 21, 30 |
| abstract_inverted_index.to | 6, 56 |
| abstract_inverted_index.NLP | 3 |
| abstract_inverted_index.The | 0 |
| abstract_inverted_index.and | 4, 46, 58, 64 |
| abstract_inverted_index.can | 16, 35 |
| abstract_inverted_index.for | 49 |
| abstract_inverted_index.its | 31 |
| abstract_inverted_index.the | 19 |
| abstract_inverted_index.LLMs | 5 |
| abstract_inverted_index.data | 9, 24 |
| abstract_inverted_index.help | 17 |
| abstract_inverted_index.more | 44 |
| abstract_inverted_index.that | 15 |
| abstract_inverted_index.drive | 37 |
| abstract_inverted_index.field | 14 |
| abstract_inverted_index.model | 62 |
| abstract_inverted_index.while | 25 |
| abstract_inverted_index.better | 28 |
| abstract_inverted_index.needed | 55 |
| abstract_inverted_index.Further | 52 |
| abstract_inverted_index.complex | 32 |
| abstract_inverted_index.discuss | 57 |
| abstract_inverted_index.genomic | 7, 23, 50 |
| abstract_inverted_index.medicine | 41 |
| abstract_inverted_index.offering | 43 |
| abstract_inverted_index.overcome | 59 |
| abstract_inverted_index.research | 53 |
| abstract_inverted_index.scalable | 47 |
| abstract_inverted_index.analysis. | 51 |
| abstract_inverted_index.efficient | 45 |
| abstract_inverted_index.enhancing | 61 |
| abstract_inverted_index.promising | 13 |
| abstract_inverted_index.providing | 26 |
| abstract_inverted_index.solutions | 48 |
| abstract_inverted_index.processing | 20 |
| abstract_inverted_index.sequencing | 8 |
| abstract_inverted_index.streamline | 18 |
| abstract_inverted_index.application | 1 |
| abstract_inverted_index.large-scale | 22 |
| abstract_inverted_index.potentially | 36 |
| abstract_inverted_index.structures. | 33 |
| abstract_inverted_index.advancements | 38 |
| abstract_inverted_index.limitations, | 60 |
| abstract_inverted_index.personalized | 40 |
| abstract_inverted_index.transparency | 63 |
| abstract_inverted_index.understanding | 29 |
| abstract_inverted_index.applicability. | 65 |
| abstract_inverted_index.interpretation | 10 |
| cited_by_percentile_year.max | 95 |
| cited_by_percentile_year.min | 91 |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.5 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile.value | 0.73727151 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
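
The `abstract_inverted_index.*` rows in the table above store the abstract as a mapping from each word to the positions where it occurs, rather than as running text. A minimal Python sketch of how such a mapping could be turned back into a sentence follows. The function name `rebuild_abstract` and the `sample` dictionary are illustrative assumptions, not part of OpenAlex. The sample covers only positions 0 to 5 of the rows above, so the position lists for repeated words such as `of` and `and` are deliberately truncated. It also assumes the positions arrive as lists of integers, whereas the table renders them as comma-separated text.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild running text from a word -> positions mapping."""
    # Invert the mapping: position -> word.
    by_position: dict[int, str] = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    # Emit the words in ascending position order, separated by spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Hypothetical excerpt: positions 0-5 only, taken from the rows above.
sample = {
    "The": [0],
    "application": [1],
    "of": [2],      # truncated; the full row lists 2, 21, 30
    "NLP": [3],
    "and": [4],     # truncated; the full row lists 4, 46, 58, 64
    "LLMs": [5],
}

print(rebuild_abstract(sample))  # prints: The application of NLP and LLMs
```

Applied to the full set of rows, the same procedure should reproduce the abstract shown near the top of this record, since punctuation is stored as part of each token (for example `analysis.` and `limitations,`).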