A rule-free workflow for the automated generation of databases from scientific literature
2023 · Open Access
· DOI: https://doi.org/10.1038/s41524-023-01171-9
In recent times, transformer networks have achieved state-of-the-art performance in a wide range of natural language processing tasks. Here we present a workflow based on the fine-tuning of BERT models for different downstream tasks, which results in the automated extraction of structured information from unstructured natural language in scientific literature. Contrary to existing methods for the automated extraction of structured compound-property relations from similar sources, our workflow does not rely on the definition of intricate grammar rules. Hence, it can be adapted to a new task without requiring extensive implementation efforts and knowledge. We test our data-extraction workflow by automatically generating a database for Curie temperatures and one for band gaps. These are then compared with manually curated datasets and with those obtained with a state-of-the-art rule-based method. Furthermore, in order to showcase the practical utility of the automatically extracted data in a material-design workflow, we employ them to construct machine-learning models to predict Curie temperatures and band gaps. In general, we find that, although more noisy, automatically extracted datasets can grow fast in volume and that such volume partially compensates for the inaccuracy in downstream tasks.
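OpenAlex stores this abstract as an inverted index: each token maps to the word positions where it occurs, which is what the `abstract_inverted_index.*` rows of the full payload list. The sketch below shows one way to turn such an index back into running text. The function name `rebuild_text` is hypothetical, and the sample holds only four tokens copied from the payload rows, so the rebuilt string skips every position that is missing from the sample.

```python
from typing import Dict, List


def rebuild_text(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild text from a token -> positions mapping (illustrative helper)."""
    # Expand every (token, positions) pair into (position, token) tuples.
    positioned = [
        (position, token)
        for token, positions in inverted_index.items()
        for position in positions
    ]
    # Sorting by position restores the original word order.
    return " ".join(token for _, token in sorted(positioned))


# Four rows taken from the payload; all other tokens are deliberately left out.
sample = {
    "In": [1, 160],
    "recent": [2],
    "times,": [3],
    "have": [6],
}

print(rebuild_text(sample))  # prints "In recent times, have In"
```

With the complete index, the same function would yield the full abstract text. In this record the first abstract word sits at position 1, so position 0 is taken by a token that is not shown here.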
Related Topics
- Type: article
- Language: en
- Landing Page: https://doi.org/10.1038/s41524-023-01171-9, https://www.nature.com/articles/s41524-023-01171-9.pdf
- OA Status: gold
- Cited By: 26
- References: 58
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4389686182
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4389686182 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.1038/s41524-023-01171-9 (Digital Object Identifier)
- Title: A rule-free workflow for the automated generation of databases from scientific literature (Work title)
- Type: article (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2023 (Year of publication)
- Publication date: 2023-12-13 (Full publication date if available)
- Authors: Luke P. J. Gilligan, Matteo Cobelli, Valentin Taufour, Stefano Sanvito (List of authors in order)
- Landing page: https://doi.org/10.1038/s41524-023-01171-9 (Publisher landing page)
- PDF URL: https://www.nature.com/articles/s41524-023-01171-9.pdf (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: gold (Open access status per OpenAlex)
- OA URL: https://www.nature.com/articles/s41524-023-01171-9.pdf (Direct OA link when available)
- Concepts: Workflow, Computer science, Information extraction, Natural language processing, Artificial intelligence, Data mining, Transformer, Database, Volume (thermodynamics), Information retrieval, Machine learning, Physics, Voltage, Quantum mechanics (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 26 (Total citation count in OpenAlex)
- Citations by year (recent): 2025: 17, 2024: 9 (Per-year citation counts (last 5 years))
- References (count): 58 (Number of works referenced by this work)
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
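The table in this section lists the record as flattened key paths: nested objects are joined with dots (`biblio.volume`), lists of objects get a bracketed index (`counts_by_year[0].year`), and lists of plain values are joined with commas (`indexed_in`). The sketch below reproduces that naming convention on a small excerpt of the record. The `flatten` helper is an assumption made for illustration, not the code that generated this page.

```python
from typing import Any, Dict


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into dotted, index-bracketed key paths."""
    rows: Dict[str, Any] = {}
    if isinstance(value, dict):
        # Nested objects extend the path with ".key".
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            rows.update(flatten(child, path))
    elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
        # Lists of objects extend the path with "[index]".
        for index, child in enumerate(value):
            rows.update(flatten(child, f"{prefix}[{index}]"))
    elif isinstance(value, list):
        # Lists of plain values collapse into one comma-separated cell.
        rows[prefix] = ", ".join(str(v) for v in value)
    else:
        rows[prefix] = value
    return rows


# Excerpt of the record, with values taken from the table rows.
record = {
    "id": "https://openalex.org/W4389686182",
    "biblio": {"volume": "9", "issue": "1"},
    "counts_by_year": [
        {"year": 2025, "cited_by_count": 17},
        {"year": 2024, "cited_by_count": 9},
    ],
    "indexed_in": ["crossref", "doaj", "pubmed"],
}

for path, cell in flatten(record).items():
    print(f"| {path} | {cell} |")
```

Reading a row such as `counts_by_year[1].cited_by_count | 9` in reverse gives the nested location of the value in the original JSON.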
| field | value |
|---|---|
| id | https://openalex.org/W4389686182 |
| doi | https://doi.org/10.1038/s41524-023-01171-9 |
| ids.doi | https://doi.org/10.1038/s41524-023-01171-9 |
| ids.pmid | https://pubmed.ncbi.nlm.nih.gov/38666056 |
| ids.openalex | https://openalex.org/W4389686182 |
| fwci | 3.48439247 |
| type | article |
| title | A rule-free workflow for the automated generation of databases from scientific literature |
| awards[0].id | https://openalex.org/G7677830966 |
| awards[0].funder_id | https://openalex.org/F4320320847 |
| awards[0].display_name | |
| awards[0].funder_award_id | 12/RC/2278−P2 |
| awards[0].funder_display_name | Science Foundation Ireland |
| awards[1].id | https://openalex.org/G7046851851 |
| awards[1].funder_id | https://openalex.org/F4320332276 |
| awards[1].display_name | |
| awards[1].funder_award_id | Critical Materials Institute |
| awards[1].funder_display_name | Advanced Research Projects Agency - Energy |
| biblio.issue | 1 |
| biblio.volume | 9 |
| biblio.last_page | 222 |
| biblio.first_page | 222 |
| topics[0].id | https://openalex.org/T11948 |
| topics[0].field.id | https://openalex.org/fields/25 |
| topics[0].field.display_name | Materials Science |
| topics[0].score | 1.0 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/2505 |
| topics[0].subfield.display_name | Materials Chemistry |
| topics[0].display_name | Machine Learning in Materials Science |
| topics[1].id | https://openalex.org/T10260 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9850999712944031 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1710 |
| topics[1].subfield.display_name | Information Systems |
| topics[1].display_name | Software Engineering Research |
| topics[2].id | https://openalex.org/T10028 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9836000204086304 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Topic Modeling |
| funders[0].id | https://openalex.org/F4320320847 |
| funders[0].ror | https://ror.org/0271asj38 |
| funders[0].display_name | Science Foundation Ireland |
| funders[1].id | https://openalex.org/F4320321056 |
| funders[1].ror | https://ror.org/051xex213 |
| funders[1].display_name | Irish Research Council |
| funders[2].id | https://openalex.org/F4320332276 |
| funders[2].ror | https://ror.org/03q1rgc19 |
| funders[2].display_name | Advanced Research Projects Agency - Energy |
| is_xpac | False |
| apc_list.value | 2890 |
| apc_list.currency | USD |
| apc_list.value_usd | 2890 |
| apc_paid.value | 2890 |
| apc_paid.currency | USD |
| apc_paid.value_usd | 2890 |
| concepts[0].id | https://openalex.org/C177212765 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8331782221794128 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q627335 |
| concepts[0].display_name | Workflow |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.782994270324707 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C195807954 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5357235074043274 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1662562 |
| concepts[2].display_name | Information extraction |
| concepts[3].id | https://openalex.org/C204321447 |
| concepts[3].level | 1 |
| concepts[3].score | 0.4905978739261627 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[3].display_name | Natural language processing |
| concepts[4].id | https://openalex.org/C154945302 |
| concepts[4].level | 1 |
| concepts[4].score | 0.47902289032936096 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[4].display_name | Artificial intelligence |
| concepts[5].id | https://openalex.org/C124101348 |
| concepts[5].level | 1 |
| concepts[5].score | 0.4601709246635437 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q172491 |
| concepts[5].display_name | Data mining |
| concepts[6].id | https://openalex.org/C66322947 |
| concepts[6].level | 3 |
| concepts[6].score | 0.45824122428894043 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11658 |
| concepts[6].display_name | Transformer |
| concepts[7].id | https://openalex.org/C77088390 |
| concepts[7].level | 1 |
| concepts[7].score | 0.4506323039531708 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q8513 |
| concepts[7].display_name | Database |
| concepts[8].id | https://openalex.org/C20556612 |
| concepts[8].level | 2 |
| concepts[8].score | 0.4122558534145355 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q4469374 |
| concepts[8].display_name | Volume (thermodynamics) |
| concepts[9].id | https://openalex.org/C23123220 |
| concepts[9].level | 1 |
| concepts[9].score | 0.34358346462249756 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q816826 |
| concepts[9].display_name | Information retrieval |
| concepts[10].id | https://openalex.org/C119857082 |
| concepts[10].level | 1 |
| concepts[10].score | 0.32674431800842285 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[10].display_name | Machine learning |
| concepts[11].id | https://openalex.org/C121332964 |
| concepts[11].level | 0 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[11].display_name | Physics |
| concepts[12].id | https://openalex.org/C165801399 |
| concepts[12].level | 2 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q25428 |
| concepts[12].display_name | Voltage |
| concepts[13].id | https://openalex.org/C62520636 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q944 |
| concepts[13].display_name | Quantum mechanics |
| keywords[0].id | https://openalex.org/keywords/workflow |
| keywords[0].score | 0.8331782221794128 |
| keywords[0].display_name | Workflow |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.782994270324707 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/information-extraction |
| keywords[2].score | 0.5357235074043274 |
| keywords[2].display_name | Information extraction |
| keywords[3].id | https://openalex.org/keywords/natural-language-processing |
| keywords[3].score | 0.4905978739261627 |
| keywords[3].display_name | Natural language processing |
| keywords[4].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[4].score | 0.47902289032936096 |
| keywords[4].display_name | Artificial intelligence |
| keywords[5].id | https://openalex.org/keywords/data-mining |
| keywords[5].score | 0.4601709246635437 |
| keywords[5].display_name | Data mining |
| keywords[6].id | https://openalex.org/keywords/transformer |
| keywords[6].score | 0.45824122428894043 |
| keywords[6].display_name | Transformer |
| keywords[7].id | https://openalex.org/keywords/database |
| keywords[7].score | 0.4506323039531708 |
| keywords[7].display_name | Database |
| keywords[8].id | https://openalex.org/keywords/volume |
| keywords[8].score | 0.4122558534145355 |
| keywords[8].display_name | Volume (thermodynamics) |
| keywords[9].id | https://openalex.org/keywords/information-retrieval |
| keywords[9].score | 0.34358346462249756 |
| keywords[9].display_name | Information retrieval |
| keywords[10].id | https://openalex.org/keywords/machine-learning |
| keywords[10].score | 0.32674431800842285 |
| keywords[10].display_name | Machine learning |
| language | en |
| locations[0].id | doi:10.1038/s41524-023-01171-9 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4210232664 |
| locations[0].source.issn | 2057-3960 |
| locations[0].source.type | journal |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | 2057-3960 |
| locations[0].source.is_core | True |
| locations[0].source.is_in_doaj | True |
| locations[0].source.display_name | npj Computational Materials |
| locations[0].source.host_organization | https://openalex.org/P4310319908 |
| locations[0].source.host_organization_name | Nature Portfolio |
| locations[0].source.host_organization_lineage | https://openalex.org/P4310319908, https://openalex.org/P4310319965 |
| locations[0].source.host_organization_lineage_names | Nature Portfolio, Springer Nature |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://www.nature.com/articles/s41524-023-01171-9.pdf |
| locations[0].version | publishedVersion |
| locations[0].raw_type | journal-article |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | npj Computational Materials |
| locations[0].landing_page_url | https://doi.org/10.1038/s41524-023-01171-9 |
| locations[1].id | pmid:38666056 |
| locations[1].is_oa | False |
| locations[1].source.id | https://openalex.org/S4306525036 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | False |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | PubMed |
| locations[1].source.host_organization | https://openalex.org/I1299303238 |
| locations[1].source.host_organization_name | National Institutes of Health |
| locations[1].source.host_organization_lineage | https://openalex.org/I1299303238 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | publishedVersion |
| locations[1].raw_type | |
| locations[1].license_id | |
| locations[1].is_accepted | True |
| locations[1].is_published | True |
| locations[1].raw_source_name | npj computational materials |
| locations[1].landing_page_url | https://pubmed.ncbi.nlm.nih.gov/38666056 |
| locations[2].id | pmh:oai:doaj.org/article:9afe7a8212f3457fb9064bf7ea8c8b9a |
| locations[2].is_oa | False |
| locations[2].source.id | https://openalex.org/S4306401280 |
| locations[2].source.issn | |
| locations[2].source.type | repository |
| locations[2].source.is_oa | False |
| locations[2].source.issn_l | |
| locations[2].source.is_core | False |
| locations[2].source.is_in_doaj | False |
| locations[2].source.display_name | DOAJ (DOAJ: Directory of Open Access Journals) |
| locations[2].source.host_organization | |
| locations[2].source.host_organization_name | |
| locations[2].license | |
| locations[2].pdf_url | |
| locations[2].version | submittedVersion |
| locations[2].raw_type | article |
| locations[2].license_id | |
| locations[2].is_accepted | False |
| locations[2].is_published | False |
| locations[2].raw_source_name | npj Computational Materials, Vol 9, Iss 1, Pp 1-14 (2023) |
| locations[2].landing_page_url | https://doaj.org/article/9afe7a8212f3457fb9064bf7ea8c8b9a |
| locations[3].id | pmh:oai:escholarship.org:ark:/13030/qt5mb1p1pw |
| locations[3].is_oa | True |
| locations[3].source | |
| locations[3].license | cc-by |
| locations[3].pdf_url | |
| locations[3].version | submittedVersion |
| locations[3].raw_type | article |
| locations[3].license_id | https://openalex.org/licenses/cc-by |
| locations[3].is_accepted | False |
| locations[3].is_published | False |
| locations[3].raw_source_name | npj Computational Materials, vol 9, iss 1 |
| locations[3].landing_page_url | https://escholarship.org/uc/item/5mb1p1pw |
| locations[4].id | pmh:oai:osti.gov:2229779 |
| locations[4].is_oa | True |
| locations[4].source.id | https://openalex.org/S4306402487 |
| locations[4].source.issn | |
| locations[4].source.type | repository |
| locations[4].source.is_oa | False |
| locations[4].source.issn_l | |
| locations[4].source.is_core | False |
| locations[4].source.is_in_doaj | False |
| locations[4].source.display_name | OSTI OAI (U.S. Department of Energy Office of Scientific and Technical Information) |
| locations[4].source.host_organization | https://openalex.org/I139351228 |
| locations[4].source.host_organization_name | Office of Scientific and Technical Information |
| locations[4].source.host_organization_lineage | https://openalex.org/I139351228 |
| locations[4].license | |
| locations[4].pdf_url | |
| locations[4].version | submittedVersion |
| locations[4].raw_type | |
| locations[4].license_id | |
| locations[4].is_accepted | False |
| locations[4].is_published | False |
| locations[4].raw_source_name | |
| locations[4].landing_page_url | https://www.osti.gov/biblio/2229779 |
| locations[5].id | pmh:oai:osti.gov:2469936 |
| locations[5].is_oa | True |
| locations[5].source.id | https://openalex.org/S4306402487 |
| locations[5].source.issn | |
| locations[5].source.type | repository |
| locations[5].source.is_oa | False |
| locations[5].source.issn_l | |
| locations[5].source.is_core | False |
| locations[5].source.is_in_doaj | False |
| locations[5].source.display_name | OSTI OAI (U.S. Department of Energy Office of Scientific and Technical Information) |
| locations[5].source.host_organization | https://openalex.org/I139351228 |
| locations[5].source.host_organization_name | Office of Scientific and Technical Information |
| locations[5].source.host_organization_lineage | https://openalex.org/I139351228 |
| locations[5].license | |
| locations[5].pdf_url | |
| locations[5].version | submittedVersion |
| locations[5].raw_type | |
| locations[5].license_id | |
| locations[5].is_accepted | False |
| locations[5].is_published | False |
| locations[5].raw_source_name | |
| locations[5].landing_page_url | https://www.osti.gov/biblio/2469936 |
| locations[6].id | pmh:oai:pubmedcentral.nih.gov:11041762 |
| locations[6].is_oa | True |
| locations[6].source.id | https://openalex.org/S2764455111 |
| locations[6].source.issn | |
| locations[6].source.type | repository |
| locations[6].source.is_oa | False |
| locations[6].source.issn_l | |
| locations[6].source.is_core | False |
| locations[6].source.is_in_doaj | False |
| locations[6].source.display_name | PubMed Central |
| locations[6].source.host_organization | https://openalex.org/I1299303238 |
| locations[6].source.host_organization_name | National Institutes of Health |
| locations[6].source.host_organization_lineage | https://openalex.org/I1299303238 |
| locations[6].license | other-oa |
| locations[6].pdf_url | |
| locations[6].version | submittedVersion |
| locations[6].raw_type | Text |
| locations[6].license_id | https://openalex.org/licenses/other-oa |
| locations[6].is_accepted | False |
| locations[6].is_published | False |
| locations[6].raw_source_name | NPJ Comput Mater |
| locations[6].landing_page_url | https://www.ncbi.nlm.nih.gov/pmc/articles/11041762 |
| indexed_in | crossref, doaj, pubmed |
| authorships[0].author.id | https://openalex.org/A5074473093 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-4139-7801 |
| authorships[0].author.display_name | Luke P. J. Gilligan |
| authorships[0].countries | IE |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I205274468, https://openalex.org/I4210112033 |
| authorships[0].affiliations[0].raw_affiliation_string | School of Physics, AMBER and CRANN Institute, Trinity College, Dublin 2, Dublin, Ireland |
| authorships[0].institutions[0].id | https://openalex.org/I4210112033 |
| authorships[0].institutions[0].ror | https://ror.org/01c4rxk68 |
| authorships[0].institutions[0].type | facility |
| authorships[0].institutions[0].lineage | https://openalex.org/I142762351, https://openalex.org/I188760350, https://openalex.org/I205274468, https://openalex.org/I230495080, https://openalex.org/I27577105, https://openalex.org/I4210112033, https://openalex.org/I42934936 |
| authorships[0].institutions[0].country_code | IE |
| authorships[0].institutions[0].display_name | Advanced Materials and BioEngineering Research |
| authorships[0].institutions[1].id | https://openalex.org/I205274468 |
| authorships[0].institutions[1].ror | https://ror.org/02tyrky19 |
| authorships[0].institutions[1].type | education |
| authorships[0].institutions[1].lineage | https://openalex.org/I205274468 |
| authorships[0].institutions[1].country_code | IE |
| authorships[0].institutions[1].display_name | Trinity College Dublin |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Luke P. J. Gilligan |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | School of Physics, AMBER and CRANN Institute, Trinity College, Dublin 2, Dublin, Ireland |
| authorships[1].author.id | https://openalex.org/A5050098484 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-6829-2754 |
| authorships[1].author.display_name | Matteo Cobelli |
| authorships[1].countries | IE |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I205274468, https://openalex.org/I4210112033 |
| authorships[1].affiliations[0].raw_affiliation_string | School of Physics, AMBER and CRANN Institute, Trinity College, Dublin 2, Dublin, Ireland |
| authorships[1].institutions[0].id | https://openalex.org/I4210112033 |
| authorships[1].institutions[0].ror | https://ror.org/01c4rxk68 |
| authorships[1].institutions[0].type | facility |
| authorships[1].institutions[0].lineage | https://openalex.org/I142762351, https://openalex.org/I188760350, https://openalex.org/I205274468, https://openalex.org/I230495080, https://openalex.org/I27577105, https://openalex.org/I4210112033, https://openalex.org/I42934936 |
| authorships[1].institutions[0].country_code | IE |
| authorships[1].institutions[0].display_name | Advanced Materials and BioEngineering Research |
| authorships[1].institutions[1].id | https://openalex.org/I205274468 |
| authorships[1].institutions[1].ror | https://ror.org/02tyrky19 |
| authorships[1].institutions[1].type | education |
| authorships[1].institutions[1].lineage | https://openalex.org/I205274468 |
| authorships[1].institutions[1].country_code | IE |
| authorships[1].institutions[1].display_name | Trinity College Dublin |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Matteo Cobelli |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | School of Physics, AMBER and CRANN Institute, Trinity College, Dublin 2, Dublin, Ireland |
| authorships[2].author.id | https://openalex.org/A5083481154 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-0024-9960 |
| authorships[2].author.display_name | Valentin Taufour |
| authorships[2].countries | US |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I84218800 |
| authorships[2].affiliations[0].raw_affiliation_string | Department of Physics and Astronomy, University of California, Davis, CA, 95616, USA |
| authorships[2].institutions[0].id | https://openalex.org/I84218800 |
| authorships[2].institutions[0].ror | https://ror.org/05rrcem69 |
| authorships[2].institutions[0].type | education |
| authorships[2].institutions[0].lineage | https://openalex.org/I84218800 |
| authorships[2].institutions[0].country_code | US |
| authorships[2].institutions[0].display_name | University of California, Davis |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Valentin Taufour |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | Department of Physics and Astronomy, University of California, Davis, CA, 95616, USA |
| authorships[3].author.id | https://openalex.org/A5049903688 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-0291-715X |
| authorships[3].author.display_name | Stefano Sanvito |
| authorships[3].countries | IE |
| authorships[3].affiliations[0].institution_ids | https://openalex.org/I205274468, https://openalex.org/I4210112033 |
| authorships[3].affiliations[0].raw_affiliation_string | School of Physics, AMBER and CRANN Institute, Trinity College, Dublin 2, Dublin, Ireland |
| authorships[3].institutions[0].id | https://openalex.org/I4210112033 |
| authorships[3].institutions[0].ror | https://ror.org/01c4rxk68 |
| authorships[3].institutions[0].type | facility |
| authorships[3].institutions[0].lineage | https://openalex.org/I142762351, https://openalex.org/I188760350, https://openalex.org/I205274468, https://openalex.org/I230495080, https://openalex.org/I27577105, https://openalex.org/I4210112033, https://openalex.org/I42934936 |
| authorships[3].institutions[0].country_code | IE |
| authorships[3].institutions[0].display_name | Advanced Materials and BioEngineering Research |
| authorships[3].institutions[1].id | https://openalex.org/I205274468 |
| authorships[3].institutions[1].ror | https://ror.org/02tyrky19 |
| authorships[3].institutions[1].type | education |
| authorships[3].institutions[1].lineage | https://openalex.org/I205274468 |
| authorships[3].institutions[1].country_code | IE |
| authorships[3].institutions[1].display_name | Trinity College Dublin |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Stefano Sanvito |
| authorships[3].is_corresponding | False |
| authorships[3].raw_affiliation_strings | School of Physics, AMBER and CRANN Institute, Trinity College, Dublin 2, Dublin, Ireland |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://www.nature.com/articles/s41524-023-01171-9.pdf |
| open_access.oa_status | gold |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | A rule-free workflow for the automated generation of databases from scientific literature |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T11948 |
| primary_topic.field.id | https://openalex.org/fields/25 |
| primary_topic.field.display_name | Materials Science |
| primary_topic.score | 1.0 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/2505 |
| primary_topic.subfield.display_name | Materials Chemistry |
| primary_topic.display_name | Machine Learning in Materials Science |
| related_works | https://openalex.org/W1981780420, https://openalex.org/W2182707996, https://openalex.org/W45233828, https://openalex.org/W2964988449, https://openalex.org/W2397952901, https://openalex.org/W2029380707, https://openalex.org/W188202134, https://openalex.org/W4255934811, https://openalex.org/W2465382974, https://openalex.org/W2010229520 |
| cited_by_count | 26 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 17 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 9 |
| locations_count | 7 |
| best_oa_location.id | doi:10.1038/s41524-023-01171-9 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4210232664 |
| best_oa_location.source.issn | 2057-3960 |
| best_oa_location.source.type | journal |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | 2057-3960 |
| best_oa_location.source.is_core | True |
| best_oa_location.source.is_in_doaj | True |
| best_oa_location.source.display_name | npj Computational Materials |
| best_oa_location.source.host_organization | https://openalex.org/P4310319908 |
| best_oa_location.source.host_organization_name | Nature Portfolio |
| best_oa_location.source.host_organization_lineage | https://openalex.org/P4310319908, https://openalex.org/P4310319965 |
| best_oa_location.source.host_organization_lineage_names | Nature Portfolio, Springer Nature |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://www.nature.com/articles/s41524-023-01171-9.pdf |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | journal-article |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | npj Computational Materials |
| best_oa_location.landing_page_url | https://doi.org/10.1038/s41524-023-01171-9 |
| primary_location.id | doi:10.1038/s41524-023-01171-9 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4210232664 |
| primary_location.source.issn | 2057-3960 |
| primary_location.source.type | journal |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | 2057-3960 |
| primary_location.source.is_core | True |
| primary_location.source.is_in_doaj | True |
| primary_location.source.display_name | npj Computational Materials |
| primary_location.source.host_organization | https://openalex.org/P4310319908 |
| primary_location.source.host_organization_name | Nature Portfolio |
| primary_location.source.host_organization_lineage | https://openalex.org/P4310319908, https://openalex.org/P4310319965 |
| primary_location.source.host_organization_lineage_names | Nature Portfolio, Springer Nature |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://www.nature.com/articles/s41524-023-01171-9.pdf |
| primary_location.version | publishedVersion |
| primary_location.raw_type | journal-article |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | npj Computational Materials |
| primary_location.landing_page_url | https://doi.org/10.1038/s41524-023-01171-9 |
| publication_date | 2023-12-13 |
| publication_year | 2023 |
| referenced_works | https://openalex.org/W3203912530, https://openalex.org/W2117363206, https://openalex.org/W3013990447, https://openalex.org/W1992985800, https://openalex.org/W2278970271, https://openalex.org/W2606478829, https://openalex.org/W1812560530, https://openalex.org/W2963606102, https://openalex.org/W2885789139, https://openalex.org/W2164524421, https://openalex.org/W2509907061, https://openalex.org/W3012190599, https://openalex.org/W3129696436, https://openalex.org/W2975270375, https://openalex.org/W2319902168, https://openalex.org/W2527045065, https://openalex.org/W2949096148, https://openalex.org/W4280512145, https://openalex.org/W2790960441, https://openalex.org/W3118334075, https://openalex.org/W2523785361, https://openalex.org/W2144211451, https://openalex.org/W2250539671, https://openalex.org/W2953641512, https://openalex.org/W2971258845, https://openalex.org/W2975059944, https://openalex.org/W2970771982, https://openalex.org/W2911489562, https://openalex.org/W3201869313, https://openalex.org/W2766362701, https://openalex.org/W2755202310, https://openalex.org/W3207468551, https://openalex.org/W4229443452, https://openalex.org/W4362640271, https://openalex.org/W4283024074, https://openalex.org/W2808304511, https://openalex.org/W4225409008, https://openalex.org/W1967256595, https://openalex.org/W6613168069, https://openalex.org/W2191064099, https://openalex.org/W2897598572, https://openalex.org/W2094087345, https://openalex.org/W2416811952, https://openalex.org/W2043756755, https://openalex.org/W2021994802, https://openalex.org/W2329945485, https://openalex.org/W2058122340, https://openalex.org/W1995608178, https://openalex.org/W1960802504, https://openalex.org/W2464725281, https://openalex.org/W3023937119, https://openalex.org/W6629967362, https://openalex.org/W3100710928, https://openalex.org/W4391836235, https://openalex.org/W3100220443, https://openalex.org/W3083787461, https://openalex.org/W3099714901, https://openalex.org/W3121984752 |
| referenced_works_count | 58 |
| abstract_inverted_index.a | 11, 22, 84, 102, 125, 143 |
| abstract_inverted_index.In | 1, 160 |
| abstract_inverted_index.We | 94 |
| abstract_inverted_index.be | 81 |
| abstract_inverted_index.by | 99 |
| abstract_inverted_index.in | 10, 37, 48, 130, 142, 174, 185 |
| abstract_inverted_index.it | 79 |
| abstract_inverted_index.of | 14, 28, 41, 59, 74, 137 |
| abstract_inverted_index.on | 25, 71 |
| abstract_inverted_index.to | 52, 83, 132, 149, 153 |
| abstract_inverted_index.we | 20, 146, 162 |
| abstract_inverted_index.and | 92, 107, 120, 157, 176 |
| abstract_inverted_index.are | 113 |
| abstract_inverted_index.can | 80, 171 |
| abstract_inverted_index.for | 31, 55, 104, 109, 182 |
| abstract_inverted_index.new | 85 |
| abstract_inverted_index.not | 69 |
| abstract_inverted_index.one | 108 |
| abstract_inverted_index.our | 66, 96 |
| abstract_inverted_index.the | 26, 38, 56, 72, 134, 138, 183 |
| abstract_inverted_index.BERT | 29 |
| abstract_inverted_index.Here | 19 |
| abstract_inverted_index.band | 110, 158 |
| abstract_inverted_index.data | 141 |
| abstract_inverted_index.does | 68 |
| abstract_inverted_index.fast | 173 |
| abstract_inverted_index.find | 163 |
| abstract_inverted_index.from | 44, 63 |
| abstract_inverted_index.grow | 172 |
| abstract_inverted_index.have | 6 |
| abstract_inverted_index.more | 166 |
| abstract_inverted_index.rely | 70 |
| abstract_inverted_index.such | 178 |
| abstract_inverted_index.task | 86 |
| abstract_inverted_index.test | 95 |
| abstract_inverted_index.that | 177 |
| abstract_inverted_index.them | 148 |
| abstract_inverted_index.then | 114 |
| abstract_inverted_index.wide | 12 |
| abstract_inverted_index.with | 116, 121, 124 |
| abstract_inverted_index.Curie | 105, 155 |
| abstract_inverted_index.These | 112 |
| abstract_inverted_index.based | 24 |
| abstract_inverted_index.gaps. | 111, 159 |
| abstract_inverted_index.order | 131 |
| abstract_inverted_index.range | 13 |
| abstract_inverted_index.that, | 164 |
| abstract_inverted_index.those | 122 |
| abstract_inverted_index.which | 35 |
| abstract_inverted_index.Hence, | 78 |
| abstract_inverted_index.employ | 147 |
| abstract_inverted_index.models | 30, 152 |
| abstract_inverted_index.noisy, | 167 |
| abstract_inverted_index.recent | 2 |
| abstract_inverted_index.rules. | 77 |
| abstract_inverted_index.tasks, | 34 |
| abstract_inverted_index.tasks. | 18, 187 |
| abstract_inverted_index.times, | 3 |
| abstract_inverted_index.volume | 175, 179 |
| abstract_inverted_index.adapted | 82 |
| abstract_inverted_index.curated | 118 |
| abstract_inverted_index.efforts | 91 |
| abstract_inverted_index.grammar | 76 |
| abstract_inverted_index.method. | 128 |
| abstract_inverted_index.methods | 54 |
| abstract_inverted_index.natural | 15, 46 |
| abstract_inverted_index.predict | 154 |
| abstract_inverted_index.present | 21 |
| abstract_inverted_index.results | 36 |
| abstract_inverted_index.similar | 64 |
| abstract_inverted_index.utility | 136 |
| abstract_inverted_index.without | 87 |
| abstract_inverted_index.Abstract | 0 |
| abstract_inverted_index.Contrary | 51 |
| abstract_inverted_index.achieved | 7 |
| abstract_inverted_index.although | 165 |
| abstract_inverted_index.compared | 115 |
| abstract_inverted_index.database | 103 |
| abstract_inverted_index.datasets | 119, 170 |
| abstract_inverted_index.existing | 53 |
| abstract_inverted_index.general, | 161 |
| abstract_inverted_index.language | 16, 47 |
| abstract_inverted_index.manually | 117 |
| abstract_inverted_index.networks | 5 |
| abstract_inverted_index.obtained | 123 |
| abstract_inverted_index.showcase | 133 |
| abstract_inverted_index.sources, | 65 |
| abstract_inverted_index.workflow | 23, 67, 98 |
| abstract_inverted_index.automated | 39, 57 |
| abstract_inverted_index.construct | 150 |
| abstract_inverted_index.different | 32 |
| abstract_inverted_index.extensive | 89 |
| abstract_inverted_index.extracted | 140, 169 |
| abstract_inverted_index.intricate | 75 |
| abstract_inverted_index.partially | 180 |
| abstract_inverted_index.practical | 135 |
| abstract_inverted_index.relations | 62 |
| abstract_inverted_index.requiring | 88 |
| abstract_inverted_index.workflow, | 145 |
| abstract_inverted_index.definition | 73 |
| abstract_inverted_index.downstream | 33, 186 |
| abstract_inverted_index.extraction | 40, 58 |
| abstract_inverted_index.generating | 101 |
| abstract_inverted_index.inaccuracy | 184 |
| abstract_inverted_index.knowledge. | 93 |
| abstract_inverted_index.processing | 17 |
| abstract_inverted_index.rule-based | 127 |
| abstract_inverted_index.scientific | 49 |
| abstract_inverted_index.structured | 42, 60 |
| abstract_inverted_index.compensates | 181 |
| abstract_inverted_index.fine-tuning | 27 |
| abstract_inverted_index.information | 43 |
| abstract_inverted_index.literature. | 50 |
| abstract_inverted_index.performance | 9 |
| abstract_inverted_index.transformer | 4 |
| abstract_inverted_index.Furthermore, | 129 |
| abstract_inverted_index.temperatures | 106, 156 |
| abstract_inverted_index.unstructured | 45 |
| abstract_inverted_index.automatically | 100, 139, 168 |
| abstract_inverted_index.implementation | 90 |
| abstract_inverted_index.data-extraction | 97 |
| abstract_inverted_index.material-design | 144 |
| abstract_inverted_index.machine-learning | 151 |
| abstract_inverted_index.state-of-the-art | 8, 126 |
| abstract_inverted_index.compound-property | 61 |
| cited_by_percentile_year.max | 100 |
| cited_by_percentile_year.min | 99 |
| countries_distinct_count | 2 |
| institutions_distinct_count | 4 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.8399999737739563 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile.value | 0.92302125 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | True |
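
The `abstract_inverted_index` rows above store the abstract as a map from each word to the token positions where it occurs. The following is a minimal sketch of how such a map could be turned back into running text. The function name `rebuild_abstract` is hypothetical, and the sample dictionary is only a subset of the rows above, truncated to positions 0 to 9.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild running text from a word -> positions mapping."""
    # Invert the mapping: position -> word.
    by_position: Dict[int, str] = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    # Join the words in ascending position order.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Subset of the table rows above; only positions below 10 are kept,
# so later occurrences (e.g. "In" at 160) are deliberately omitted.
sample = {
    "Abstract": [0],
    "In": [1],
    "recent": [2],
    "times,": [3],
    "transformer": [4],
    "networks": [5],
    "have": [6],
    "achieved": [7],
    "state-of-the-art": [8],
    "performance": [9],
}

print(rebuild_abstract(sample))
```

Applied to the full set of rows, the same loop would presumably reproduce the whole abstract, with the leading token "Abstract" at position 0.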