Dorian Bagni
SynLlama: Generating Synthesizable Molecules and Their Analogs with Large Language Models
Generative machine learning models for exploring chemical space have shown immense promise, but many molecules they generate are too difficult to synthesize, making them impractical for further investigation or development. In this work, w…
A workflow to create a high-quality protein–ligand binding dataset for training, validation, and prediction tasks
HiQBind-WF is an open-source, semi-automated workflow that corrects common structural artifacts found in PDB. We use it to create HiQBind, a high-quality non-covalent protein–ligand dataset with reliable binding data from existing database…
A Workflow to Create a High-Quality Protein-Ligand Binding Dataset for Training, Validation, and Prediction Tasks
Development of scoring functions (SFs) used to predict protein-ligand binding energies requires high-quality 3D structures and binding assay data for training and testing their parameters. In this work, we show that one of the widely-used …
SmileyLlama: Modifying Large Language Models for Directed Chemical Space Exploration
Here we show that a general-purpose large language model (LLM) chatbot, Llama-3.1-8B-Instruct, can be transformed via supervised fine-tuning of engineered prompts into a chemical language model (CLM), SmileyLlama, for molecule generation. …
Mining for Potent Inhibitors through Artificial Intelligence and Physics: A Unified Methodology for Ligand Based and Structure Based Drug Design
Determining the viability of a new drug molecule is a time- and resource-intensive task that makes computer-aided assessments a vital approach to rapid drug discovery. Here we develop a machine learning algorithm, iMiner, that generates no…
Leak Proof PDBBind: A Reorganized Dataset of Protein-Ligand Complexes for More Generalizable Binding Affinity Prediction
Many physics-based and machine-learned scoring functions (SFs) used to predict protein-ligand binding free energies have been trained on the PDBBind dataset. However, it is controversial as to whether new SFs are actually improving since t…