Miles Cranmer
Physics Steering: Causal Control of Cross-Domain Concepts in a Physics Foundation Model
Recent advances in mechanistic interpretability have revealed that large language models (LLMs) develop internal representations corresponding not only to concrete entities but also distinct, human-understandable abstract concepts and beha…
The Denario project: Deep knowledge AI agents for scientific discovery
We present Denario, an AI multi-agent system designed to serve as a scientific research assistant. Denario can perform many different tasks, such as generating ideas, checking the literature, developing research plans, writing and executin…
Universal Spectral Tokenization via Self-Supervised Panchromatic Representation Learning
Sequential scientific data span many resolutions and domains, and unifying them into a common representation is a key step toward developing foundation models for the sciences. Astronomical spectra exemplify this challenge: massive surveys…
AION-1: Omnimodal Foundation Model for Astronomical Sciences
While foundation models have shown promise across a variety of fields, astronomy still lacks a unified framework for joint modeling across its highly diverse data modalities. In this paper, we present AION-1, a family of large-scale multim…
Expressions found by PySR software from dataset of equations of state for neutron star matter based on relativistic mean field model with a non-linear mesonic interaction
This dataset was generated in the context of a Masters Dissertation (https://doi.org/10.5281/zenodo.17158902). Finding an Equation of State (EOS) that can provide an accurate description of nuclear matter both at the atomic scale and at th…
Call for Action: towards the next generation of symbolic regression benchmark
Symbolic Regression (SR) is a powerful technique for discovering interpretable mathematical expressions. However, benchmarking SR methods remains challenging due to the diversity of algorithms, datasets, and evaluation criteria. In this wo…
SymbolFit: Automatic Parametric Modeling with Symbolic Regression
We introduce SymbolFit (API: https://github.com/hftsoi/symbolfit), a framework that automates parametric modeling by using symbolic regression to perform a machine-search for functions that fit the data while simultaneously providing unce…
Comparative Biosignatures
The discovery of inhabited exoplanets hinges on identifying biosignature gases. JWST is revealing potential biosignatures in exoplanet atmospheres, though their presence is yet to provide strong evidence for life. The central challenge is …
The Multimodal Universe: 100 TB of Machine Learning Ready Astronomical Data
We present the Multimodal Universe, a new framework collating over 100 TB of multimodal astronomical data for its first release, spanning images, spectra, time series, tabular and hyper-spectral data. This unified collection enables a wid…
The ones that got away: chemical tagging of globular cluster-origin stars with Gaia BP/RP spectra
Globular clusters (GCs) are sites of extremely efficient star formation, and recent studies suggest they significantly contributed to the early Milky Way’s stellar mass build-up. Although their role has since diminished, GCs’ impact on the…
The Multimodal Universe: Enabling Large-Scale Machine Learning with 100TB of Astronomical Scientific Data
We present the MULTIMODAL UNIVERSE, a large-scale multimodal dataset of scientific astronomical data, compiled specifically to facilitate machine learning research. Overall, the MULTIMODAL UNIVERSE contains hundreds of millions of astronom…
The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning
Machine learning based surrogate models offer researchers powerful tools for accelerating simulation-based workflows. However, as standard datasets in this space often cover small classes of physical behavior, it can be difficult to evalua…
Multi-Agent System for Cosmological Parameter Analysis
Multi-agent systems (MAS) utilizing multiple Large Language Model agents with Retrieval Augmented Generation and that can execute code locally may become beneficial in cosmological data analysis. Here, we illustrate a first small step towa…
SymbolFit: Automatic Parametric Modeling with Symbolic Regression
We introduce SymbolFit, a framework that automates parametric modeling by using symbolic regression to perform a machine-search for functions that fit the data while simultaneously providing uncertainty estimates in a single run. Tradition…
Accelerating Giant-impact Simulations with Machine Learning
Constraining planet-formation models based on the observed exoplanet population requires generating large samples of synthetic planetary systems, which can be computationally prohibitive. A significant bottleneck is simulating the giant-im…
Symbolic Regression with a Learned Concept Library
We present a novel method for symbolic regression (SR), the task of searching for compact programmatic hypotheses that best explain a dataset. The problem is commonly solved using genetic algorithms; we show that we can enhance such method…
The ones that got away: chemical tagging of globular cluster-origin stars with Gaia BP/RP spectra
Globular clusters (GCs) are sites of extremely efficient star formation, and recent studies suggest they significantly contributed to the early Milky Way's stellar mass build-up. Although their role has since diminished, GCs' impact on the…
Machine Learning with Physics Knowledge for Prediction: A Survey
This survey examines the broad suite of methods and models for combining machine learning with physics knowledge for prediction and forecast, with a focus on partial differential equations. These methods have attracted significant interest…
Accelerating Giant Impact Simulations with Machine Learning
Constraining planet formation models based on the observed exoplanet population requires generating large samples of synthetic planetary systems, which can be computationally prohibitive. A significant bottleneck is simulating the giant im…
SRBench++: Principled Benchmarking of Symbolic Regression With Domain-Expert Interpretation
Symbolic regression searches for analytic expressions that accurately describe studied phenomena. The main promise of this approach is that it may return an interpretable model that can be insightful to users, while maintaining high accura…
AstroCLIP: a cross-modal foundation model for galaxies
We present AstroCLIP, a single, versatile model that can embed both galaxy images and spectra into a shared, physically meaningful latent space. These embeddings can then be used – without any model fine-tuning – for a variety of downstrea…
Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task
Transformers have revolutionized machine learning across diverse domains, yet understanding their behavior remains crucial, particularly in high-stakes applications. This paper introduces the contextual counting task, a novel toy problem a…
Symbolic Regression on FPGAs for Fast Machine Learning Inference
The high-energy physics community is investigating the potential of deploying machine-learning-based solutions on Field-Programmable Gate Arrays (FPGAs) to enhance physics sensitivity while still meeting data processing time constraints. I…
Bifrost: A Python/C++ Framework for High-Throughput Stream Processing in Astronomy
Radio astronomy observatories with high throughput back end instruments require real-time data processing. While computing hardware continues to advance rapidly, development of real-time processing pipelines remains difficult and time-cons…
xVal: A Continuous Numerical Tokenization for Scientific Language Models
Due in part to their discontinuous and discrete default encodings for numbers, Large Language Models (LLMs) have not yet been commonly used to process numerically-dense scientific datasets. Rendering datasets as text, however, could help a…
Multiple Physics Pretraining for Physical Surrogate Models
We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling of spatiotemporal systems with transformers. In MPP, rather than training one model on a specific physica…
AstroCLIP: A Cross-Modal Foundation Model for Galaxies
We present AstroCLIP, a single, versatile model that can embed both galaxy images and spectra into a shared, physically meaningful latent space. These embeddings can then be used - without any model fine-tuning - for a variety of downstrea…
Workshop Summary: Exoplanet Orbits and Dynamics
Exoplanetary systems show a wide variety of architectures, which can be explained by different formation and dynamical evolution processes. Precise orbital monitoring is mandatory to accurately constrain their orbital and dynamical paramet…
Reusability report: Prostate cancer stratification with diverse biologically-informed neural architectures
In Elmarakeby et al., "Biologically informed deep neural network for prostate cancer discovery", a feedforward neural network with biologically informed, sparse connections (P-NET) was presented to model the state of prostate cancer. We ve…
Rediscovering orbital mechanics with machine learning
We present an approach for using machine learning to automatically discover the governing equations and unknown properties (in this case, masses) of real physical systems from observations. We train a ‘graph neural network’ to simulate the…