Jacqueline M. Cole
Automated Determination of the Molecular Substructure from Nuclear Magnetic Resonance Spectra Using Neural Networks Open
Nuclear magnetic resonance (NMR) spectroscopy is an indispensable tool for determining the structural characteristics of a molecule by analyzing its chemical shifts. A wealth of NMR spectra therefore exists and continues to amass on a dail…
Autogenerating a Domain-Specific Question-Answering Data Set from a Thermoelectric Materials Database to Enable High-Performing BERT Models Open
We present a method for autogenerating a large domain-specific question-answering (QA) dataset from a thermoelectric materials database. We show that a small language model, BERT, once fine-tuned on this automatically generated dataset of …
Cost-Efficient Domain-Adaptive Pretraining of Language Models for Optoelectronics Applications Open
Pretrained language models have demonstrated strong capability and versatility in natural language processing (NLP) tasks, and they have important applications in optoelectronics research, such as data mining and topic modeling. Many langu…
Ternary molecular switching in a single-crystal optical actuator with correlated crystal strain Open
MechBERT: Language Models for Extracting Chemical and Property Relationships about Mechanical Stress and Strain Open
Language models are transforming materials-aware natural-language processing by enabling the extraction of dynamic, context-rich information from unstructured text, thus moving beyond the limitations of traditional information-extraction …
Auto-generating question-answering datasets with domain-specific knowledge for language models in scientific tasks Open
Algorithms use existing high-quality materials databases to produce a large question-answering dataset whose domain knowledge is sufficient to fine-tune a small language model with high performance.
Automatic Prediction of Molecular Properties Using Substructure Vector Embeddings within a Feature Selection Workflow Open
Machine learning (ML) methods provide a pathway to accurately predict molecular properties, leveraging patterns derived from structure-property relationships within materials databases. This approach holds significant importance in drug di…
A Database of Stress-Strain Properties Auto-generated from the Scientific Literature using ChemDataExtractor Open
There has been an ongoing need for information-rich databases in the mechanical-engineering domain to aid in data-driven materials science. To address the lack of suitable property databases, this study employs the latest version of the ch…
Predictive Modeling of High-Entropy Alloys and Amorphous Metallic Alloys Using Machine Learning Open
High entropy alloys and amorphous metallic alloys represent two distinct classes of advanced alloy materials, each with unique structural characteristics. Their emergence has garnered considerable interest across the materials science and …
Machine-Learning Predictions of Critical Temperatures from Chemical Compositions of Superconductors Open
In the quest for advanced superconducting materials, the accurate prediction of critical temperatures (Tc) poses a formidable challenge, largely due to the complex interdependencies between superconducting properties and the chemical and s…
Machine-Learning Prediction of Curie Temperature from Chemical Compositions of Ferromagnetic Materials Open
Room-temperature ferromagnets are high-value targets for discovery given the ease by which they could be embedded within magnetic devices. However, the multitude of potential interactions among magnetic ions and their surrounding environme…
How Beneficial Is Pretraining on a Narrow Domain-Specific Corpus for Information Extraction about Photocatalytic Water Splitting? Open
Language models trained on domain-specific corpora have been employed to increase the performance in specialized tasks. However, little previous work has been reported on how specific a "domain-specific" corpus should be. Here, we test a n…
Automatic Prediction of Peak Optical Absorption Wavelengths in Molecules Using Convolutional Neural Networks Open
Molecular design depends heavily on optical properties for applications such as solar cells and polymer-based batteries. Accurate prediction of these properties is essential, and multiple predictive methods exist, from ab initio to …
Automatic Prediction of Band Gaps of Inorganic Materials Using a Gradient Boosted and Statistical Feature Selection Workflow Open
Machine learning (ML) methods can train a model to predict material properties by exploiting patterns in materials databases that arise from structure-property relationships. However, the importance of ML-based feature analysis and selecti…
Digitizing images of electrical-circuit schematics Open
Electrical-circuit schematics are a foundational tool in electrical engineering. A method that can automatically digitalize them is desirable since a knowledge base of such schematics could preserve their functional information as well as …
A database of thermally activated delayed fluorescent molecules auto-generated from scientific literature with ChemDataExtractor Open
A database of thermally activated delayed fluorescent (TADF) molecules was automatically generated from the scientific literature. It consists of 25,482 data records with an overall precision of 82%. Among these, 5,349 records have chemica…
Multi-task scattering-model classification and parameter regression of nanostructures from small-angle scattering data Open
Machine learning (ML) can be employed at the data-analysis stage of small-angle scattering (SAS) experiments.
Gradient boosted and statistical feature selection workflow for materials property predictions Open
With the emergence of big data initiatives and the wealth of available chemical data, data-driven approaches are becoming a vital component of materials discovery pipelines or workflows. The screening of materials using machine-learning mo…
Snowball 2.0: Generic Material Data Parser for ChemDataExtractor Open
The ever-growing amount of chemical data found in the scientific literature has led to the emergence of data-driven materials discovery. The first step in the pipeline, to automatically extract chemical information from plain text, has bee…
Automated Construction of a Photocatalysis Dataset for Water-Splitting Applications Open
We present an automatically generated dataset of 15,755 records that were extracted from 47,357 papers. These records contain water-splitting activity in the presence of certain photocatalysts, along with additional information about the c…
ReactionDataExtractor 2.0: A Deep Learning Approach for Data Extraction from Chemical Reaction Schemes Open
Knowledge in the chemical domain is often disseminated graphically via chemical reaction schemes. The task of describing chemical transformations is greatly simplified by introducing reaction schemes that are composed of chemical diagrams …
OpticalBERT and OpticalTable-SQA: Text- and Table-Based Language Models for the Optical-Materials Domain Open
Text mining in the optical-materials domain is becoming increasingly important as the number of scientific publications in this area grows rapidly. Language models such as Bidirectional Encoder Representations from Transformers (BERT) have…
An Auto-generated Photocatalysis Database for Water-Splitting Applications by Exploiting Inter- and Intra-Sentence Relations Open
(1) Photocatalysis databases. (2) Photocatalysis extraction models. (3) Photocatalysis extraction scripts. (4) Code for ChemDataExtractor v2.2 that was developed to auto-generate this database (static version). (5) A supporting CDEDatabase sof…
Automatic materials characterization from infrared spectra using convolutional neural networks Open
Infrared spectroscopy is a technique used to characterize unknown materials by identifying the constituent functional groups of molecules through the analysis of obtained spectra. This analysis has now been automated using artificial intel…
ChemDataWriter: a transformer-based toolkit for auto-generating books that summarise research Open
ChemDataWriter automatically generates literature reviews via artificial intelligence that suggests potential book content, by retrieving and re-ranking relevant papers that the user has provided as input, and summarising and paraphrasing …
A Database of Thermally Activated Delayed Fluorescent Molecules Auto-generated from Scientific Literature with ChemDataExtractor Open
Master and subsidiary databases that contain data records of four TADF-relevant properties (figshare_std_value_exp_flag.zip); BERT-based language model for the associated paper (exp_theory_stsplit_classifer.pth); bespoke TADF-specific version …
In‐Silico Device Performance Prediction of Cosensitizer Dye Pairs for Dye‐Sensitized Solar Cells Open
Endeavors in the field of dye‐sensitized solar cells (DSCs) have shown great promise when adopting a data‐driven approach to materials discovery, such as successful molecular‐scale predictions of light‐harvesting chromophores. However, pre…
Analyzing Structure–Activity Variations for Mn–Carbonyl Complexes in the Reduction of CO₂ to CO Open
Contemporary electrocatalysts for the reduction of CO2 often suffer from low stability, activity, and selectivity, or a combination thereof. Mn-carbonyl complexes represent a promising class of molecular electrocatalysts for the reduction …
A thermoelectric materials database auto-generated from the scientific literature using ChemDataExtractor Open
CCDC 2086989: Experimental Crystal Structure Determination Open
An entry from the Cambridge Structural Database, the world’s repository for small molecule crystal structures. The entry contains experimental data from a crystal diffraction study. The deposited dataset for this entry is freely available …