Daniel Probst
Generative Modeling of Full-Atom Protein Conformations using Latent Diffusion on Graph Embeddings Open
Generating diverse, all-atom conformational ensembles of dynamic proteins such as G-protein-coupled receptors (GPCRs) is critical for understanding their function, yet most generative models simplify atomic detail or ignore conformational …
Implicit Neural Representations of Molecular Vector-Valued Functions Open
Molecules have various computational representations, including numerical descriptors, strings, graphs, point clouds, and surfaces. Each representation method enables the application of various machine learning methodologies from linear re…
Boosting Protein Graph Representations through Static-Dynamic Fusion Open
Machine learning for protein modeling faces significant challenges due to proteins’ inherently dynamic nature, yet most graph-based machine learning methods rely solely on static structural information. Recently, the growing availability o…
Commit: Reaction classification and yield prediction using the differential reaction fingerprint DRFP Open
In “Reaction classification and yield prediction using the differential reaction fingerprint DRFP”, we introduce a reaction fingerprint, which sets the state-of-the-art in predicting the yield of reactions sourced from electronic lab noteb…
Learning on compressed molecular representations Open
It was proposed that a k-nearest neighbour classifier is able to outperform large-language models using compressed text as input and normalised compression distance as a metric. We successfully applied this method to cheminformatics tasks.
Molecular set representation learning Open
Computational representation of molecules can take many forms, including graphs, string encodings of graphs, binary vectors or learned embeddings in the form of real-valued vectors. These representations are then used in downstream classif…
Language models can identify enzymatic binding sites in protein sequences Open
Recent advances in language modeling have had a tremendous impact on how we handle sequential data in science. Language architectures have emerged as a hotbed of innovation and creativity in natural language processing over the last decade…
Molecular set representation learning Open
Computational representation of molecules can take many forms, including graphs, string-encodings of graphs, binary vectors, or learned embeddings in the form of real-valued vectors. These representations are then used in downstream classi…
Learning on Compressed Molecular Representations Open
Last year, a preprint gained notoriety, proposing that a k-nearest neighbour classifier is able to outperform large-language models using compressed text as input and normalised compression distance (NCD) as a metric. In chemistry and bioc…
Molecular Set Representation Learning Open
Computational representation of molecules can take many forms, including graphs, string-encodings of graphs, binary vectors, or learned embeddings in the form of real-valued vectors. These representations are then used in downstream classi…
Parameter-Free Molecular Classification and Regression with Gzip Open
In recent years, natural language processing approaches to machine learning, most prominently deep neural network-based transformers, have been extensively applied to molecular classification and regression tasks, including the prediction …
Data for "An explainability framework for deep learning on chemical reactions exemplified by enzyme-catalysed reaction classification" Open
The data used to train EC class prediction models in "An explainability framework for deep learning on chemical reactions exemplified by enzyme-catalysed reaction classification". The original sources of the data are: - https://github.com/…
EnzymeMap: Curation, validation and data-driven prediction of enzymatic reactions Open
Enzymatic reactions are an ecofriendly, selective and versatile addition, and sometimes even an alternative, to organic reactions for the synthesis of chemical compounds such as pharmaceuticals or fine chemicals. To identify suitable reactions, co…
Fuelling the Digital Chemistry Revolution with Language Models Open
The RXN for Chemistry project, initiated by IBM Research Europe – Zurich in 2017, aimed to develop a series of digital assets using machine learning techniques to promote the use of data-driven methodologies in synthetic organic chemistry.…
Parameter-Free Molecular Classification and Regression with Gzip Open
In recent years, NLP approaches to machine learning, most prominently deep neural network-based transformers, have been applied to molecular classification and regression tasks for molecular categorisation and property prediction. However,…
EnzymeMap Open
EnzymeMap (enzymemap_v2_brenda2023.csv) is a large dataset of atom mapped, balanced enzymatic reactions sorted by EC (Enzyme Commission) number. It is intended to be used for machine learning models for predicting enzymatic reactions or bi…
EnzymeMap Open
EnzymeMap (enzymemap_brenda2023.csv) is a large dataset of atom mapped, balanced enzymatic reactions sorted by EC (Enzyme Commission) number. It is intended to be used for machine learning models for predicting enzymatic reactions or biore…
Alchemical Analysis of FDA Approved Drugs Open
Chemical space maps help visualize similarities within molecular sets. However, there are many different molecular similarity measures resulting in a confusing number of possible comparisons. To overcome this limitation, we exploit the fac…
Explainable prediction of catalysing enzymes from reactions using multilayer perceptrons Open
Assigning or proposing a catalysing enzyme given a chemical or biochemical reaction is of great interest to life sciences and chemistry alike. The exploration and design of metabolic pathways or the challenge of finding more sustainable en…
The Societal and Scientific Importance of Inclusivity, Diversity, and Equity in Machine Learning for Chemistry Open
While the introduction of practical deep learning has driven progress across scientific fields, recent research highlighted that the requirement of deep learning for ever-increasing computational resources and data has potential negative i…
Supporting Information for the Journal Article "Quantum Chemical Data Generation as Fill-In for Reliability Enhancement of Machine-Learning Reaction and Retrosynthesis Planning" Open
This data set contains all data produced when exploring the Williamson ether synthesis starting from iodoethane and phenol. The set is structured as follows: analysis: Contains the script used to analyze the exploration and the output of …
Language models can identify enzymatic active sites in protein sequences Open
Recent advances in language modeling have tremendously impacted how we handle sequential data in science. Language architectures have emerged as a hotbed of innovation and creativity in natural language processing over the last decade, and…
Dataset of biocatalyzed reactions Open
Data and results related to the paper Language models can identify enzymatic active sites in protein sequences.