Aaron Schein
Broad Spectrum Structure Discovery in Large-Scale Higher-Order Networks Open
Complex systems are often driven by higher-order interactions among multiple units, naturally represented as hypergraphs. Understanding dependency structures within these hypergraphs is crucial for understanding and predicting the behavior…
Linear Representations of Political Perspective Emerge in Large Language Models Open
Large language models (LLMs) have demonstrated the ability to generate text that realistically reflects a range of different subjective human perspectives. This paper studies how LLMs are seemingly able to reflect more liberal versus more …
Addressing discretization-induced bias in demographic prediction Open
Racial and other demographic imputation is necessary for many applications, especially in auditing disparities and outreach targeting in political campaigns. The canonical approach is to construct continuous predictions—e.g. based on name …
Doubly Non-Central Beta Matrix Factorization for Stable Dimensionality Reduction of Bounded Support Matrix Data Open
We consider the problem of developing interpretable and computationally efficient matrix decomposition methods for matrices whose entries have bounded support. Such matrices are found in large-scale DNA methylation studies and many other s…
Activation Scaling for Steering and Interpreting Language Models Open
Given the prompt "Rome is in", can we steer a language model to flip its prediction of an incorrect token "France" to a correct token "Italy" by only multiplying a few relevant activation vectors with scalars? We argue that successfully in…
Addressing Discretization-Induced Bias in Demographic Prediction Open
Racial and other demographic imputation is necessary for many applications, especially in auditing disparities and outreach targeting in political campaigns. The canonical approach is to construct continuous predictions -- e.g., based on n…
Context versus Prior Knowledge in Language Models Open
To answer a question, language models often need to integrate prior knowledge learned during pretraining and new information presented in context. We hypothesize that models perform this integration in a predictable way across different qu…
The AL$\ell_0$CORE Tensor Decomposition for Sparse Count Data Open
This paper introduces AL$\ell_0$CORE, a new form of probabilistic non-negative tensor decomposition. AL$\ell_0$CORE is a Tucker decomposition where the number of non-zero elements (i.e., the $\ell_0$-norm) of the core tensor is constrained…
Measurement in the Age of LLMs: An Application to Ideological Scaling Open
Much of social science is centered around terms like "ideology" or "power", which generally elude precise definition, and whose contextual meanings are trapped in surrounding language. This paper explores the use of large language mode…
Estimating conflict losses and reporting biases Open
Determining the number of casualties and fatalities suffered in militarized conflicts is important for conflict measurement, forecasting, and accountability. However, given the nature of conflict, reliable statistics on casualties are rare…
Sentiment as an Ordinal Latent Variable Open
Sentiment analysis has become a central tool in various disciplines outside of natural language processing. In particular in applied and domain-specific settings with strong requirements for interpretable methods, dictionary-based approach…
An Ordinal Latent Variable Model of Conflict Intensity Open
Measuring the intensity of events is crucial for monitoring and tracking armed conflict. Advances in automated event extraction have yielded massive data sets of "who did what to whom" micro-records that enable data-driven approaches to…
The Ordered Matrix Dirichlet for State-Space Models Open
Many dynamical systems in the real world are naturally described by latent states with intrinsic orderings, such as "ally", "neutral", and "enemy" relationships in international relations. These latent states manifest through countries' co…
An Ordinal Latent Variable Model of Conflict Intensity Open
Measuring the intensity of events is crucial for monitoring and tracking armed conflict. Advances in automated event extraction have yielded massive data sets of "who did what to whom" micro-records that enable data-driven approaches to mo…
Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data Open
We present a new non-negative matrix factorization model for $(0,1)$ bounded-support data based on the doubly non-central beta (DNCB) distribution, a generalization of the beta distribution. The expressiveness of the DNCB distributio…
A Bayesian nonparametric model for inferring subclonal populations from structured DNA sequencing data Open
There are distinguishing features or "hallmarks" of cancer that are found across tumors, individuals, and types of cancer, and these hallmarks can be driven by specific genetic mutations. Yet, within a single tumor there is often extensive…
Assessing the Effects of Friend-to-Friend Texting on Turnout in the 2018 US Midterm Elections Open
Recent mobile app technology lets people systematize the process of messaging their friends to urge them to vote. Prior to the most recent US midterm elections in 2018, the mobile app Outvote randomized an aspect of their system, hoping to…
Allocative Poisson Factorization for Computational Social Science Open
Social science data often comes in the form of high-dimensional discrete data such as categorical survey responses, social interaction records, or text. These data sets exhibit high degrees of sparsity, missingness, overdispersion, and bur…
A Bayesian Nonparametric Model for Inferring Subclonal Populations from Structured DNA Sequencing Data Open
There are distinguishing features or “hallmarks” of cancer that are found across tumors, individuals, and types of cancer, and these hallmarks can be driven by specific genetic mutations. Yet, within a single tumor there is often extensive…
Poisson-Randomized Gamma Dynamical Systems Open
This paper presents the Poisson-randomized gamma dynamical system (PRGDS), a model for sequentially observed count tensors that encodes a strong inductive bias toward sparsity and burstiness. The PRGDS is based on a new motif in Bayesian l…
The Hyperedge Event Model Open
We introduce the hyperedge event model (HEM)---a generative model for events that can be represented as directed edges with one sender and one or more receivers or one receiver and one or more senders. We integrate a dynamic version of the…
Locally Private Bayesian Inference for Count Models Open
We present a general method for privacy-preserving Bayesian inference in Poisson factorization, a broad class of models that includes some of the most widely used models in the social sciences. Our method satisfies limited precision local …
Poisson--Gamma Dynamical Systems Open
We introduce a new dynamical system for sequentially observed multivariate count data. This model is based on the gamma--Poisson construction---a natural choice for count data---and relies on a novel Bayesian nonparametric prior that ties …
Bayesian Poisson Tucker Decomposition for Learning the Structure of International Relations Open
We introduce Bayesian Poisson Tucker decomposition (BPTD) for modeling country--country interaction event data. These data consist of interaction events of the form "country $i$ took action $a$ toward country $j$ at time $t$." BPTD discove…
Bayesian Poisson Tensor Factorization for Inferring Multilateral Relations from Sparse Dyadic Event Counts Open
We present a Bayesian tensor factorization model for inferring latent group structures from dynamic pairwise interaction patterns. For decades, political scientists have collected and analyzed records of the form "country $i$ took action $…