Danny Halawi
ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities Open
Forecasts of future events are essential inputs into informed decision-making. Machine learning (ML) systems have the potential to deliver forecasts at scale, but there is no framework for evaluating the accuracy of ML systems on a standar…
Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation Open
Black-box finetuning is an emerging interface for adapting state-of-the-art language models to user needs. However, such access may also let malicious actors undermine model safety. To demonstrate the challenge of defending finetuning inte…
Approaching Human-Level Forecasting with Language Models Open
Forecasting future events is important for policy and decision making. In this work, we study whether language models (LMs) can forecast at the level of competitive human forecasters. Towards this goal, we develop a retrieval-augmented LM …
Overthinking the Truth: Understanding how Language Models Process False Demonstrations Open
Modern language models can imitate complex patterns through few-shot learning, enabling them to complete challenging tasks without fine-tuning. However, imitation can also lead models to reproduce inaccuracies or harmful content if present…
Eliciting Latent Predictions from Transformers with the Tuned Lens Open
We analyze transformers from the perspective of iterative inference, seeking to understand how model predictions are refined layer by layer. To do so, we train an affine probe for each block in a frozen pretrained model, making it possible…
Verifying Source Citations in the Hadith Literature Open
Historians rely on hadiths (narratives about Muhammad) as a source for writing the history of early Islam. Each hadith is preceded by an isnād, which is a list of names purporting to give the sequence of individuals who transmitted it. Sch…