Sanjeev Arora
On the Impossibility of Retrain Equivalence in Machine Unlearning
Machine unlearning seeks to selectively remove the "influence" of specific training data on a model's outputs. The ideal goal is Retrain Equivalence--behavior identical to a model trained from scratch on only the retained data. This goal w…
Rethinking Thinking Tokens: LLMs as Improvement Operators
Reasoning training incentivizes LLMs to produce long chains of thought (long CoT), which among other things, allows them to explore solution strategies with self-checking. This results in higher accuracy, but inflates context length, token…
OCA: A Shiny web application for transparent overload compensation in higher education
Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors
Large language models (LLMs) now solve multi-step problems by emitting extended chains of thought. During the process, they often re-derive the same intermediate steps across problems, inflating token usage and latency. This saturation of …
The Dose Response Regarding Microbial Disease: A Mathematical View
Society experiences the incidence of both communicable and non-communicable diseases; we are also familiar with chronic diseases such as tuberculosis (TB), which take longer to cure than a cold, cough, or flu, etc…
Advancing science- and evidence-based AI policy
Policy must be informed by, but also facilitate the generation of, scientific evidence
Review Article: Ecotoxicological Impacts of Pollution on Biodiversity and Ecosystem Health in the Anthropocene
The Anthropocene epoch, characterized by accelerated industrial growth and intensified human activities, has led to widespread environmental contamination, significantly endangering biodiversity and ecosystem functionality. This review del…
Evaluating the impact of Extension of Community Healthcare Outcome (ECHO) telementoring on enhancing knowledge and perceived skills in alcohol use disorders among the nonmedical District Mental Health Programme healthcare providers in Karnataka
Background: Alcohol use disorder (AUD) is a public health problem. In India, about 5.2% of the population aged 10–75 years, that is, approximately 5.7 crore individuals, need help for their alcohol use problems, and around 20 lakhs from Ka…
Weak-to-Strong Generalization Even in Random Feature Networks, Provably
Weak-to-Strong Generalization (Burns et al., 2024) is the phenomenon whereby a strong student, say GPT-4, learns a task from a weak teacher, say GPT-2, and ends up significantly outperforming the teacher. We show that this phenomenon does …
Time to Rethink AI for Combinatorial Optimization: Classical Algorithms Remain Tough to Match
This position paper argues that the machine learning community should fundamentally rethink how AI-inspired methods are developed and evaluated for combinatorial optimization (CO). We present comprehensive empirical benchmarks comparing va…
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?
Vision Language Models (VLMs) are impressive at visual question answering and image captioning. But they underperform on multi-step visual reasoning -- even compared to LLMs on the same tasks presented in text form -- giving rise to percep…
A dual strain probiotic administered via the waterline beneficially modulates the ileal and cecal microbiome, sIgA and acute phase protein levels, and growth performance of broilers during a dysbacteriosis challenge
Intestinal dysbacteriosis is increasing in broilers due to the reduced use of antibiotics in feed. This study tested the effect of daily waterline administration of a dual-strain probiotic comprising Lactobacillus acidophilus AG01 and Bifi…
NAMAH—An Innovative Tele-ECHO Mentoring Program to Foster Well-being Among Physicians
Background: The current study aimed to develop and implement the National Assistance in Mental Health for Health Care Providers (NAMAH) module, which focused on wellness and building resilience for a cohort of physicians. Methods: The NAMA…
Can Models Learn Skill Composition from Examples?
As large language models (LLMs) become increasingly advanced, their ability to exhibit compositional generalization -- the capacity to combine learned skills in novel ways not encountered during training -- has garnered significant attenti…
Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning
We introduce Instruct-SkillMix, an automated approach for creating diverse, high quality SFT data for instruction-following. The pipeline involves two stages, each leveraging an existing powerful LLM: (1) Skill extraction: uses the LLM to …
ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty
Compositionality is a critical capability in Text-to-Image (T2I) models, as it reflects their ability to understand and combine multiple concepts from text descriptions. Existing evaluations of compositional capability rely heavily on huma…
Correction: Clinical Outcomes of Rural Patients with Diabetes Treated by ECHO-Trained Providers Versus an Academic Medical Center
Expanding Hepatitis C Virus Treatment in the New Mexico State Prison System: Using the ECHO Model for Provider and Prison Peer Education
It is critical to address hepatitis C virus (HCV) in carceral settings to achieve worldwide elimination of the virus. We describe New Mexico's (NM) experience expanding HCV treatment in state prisons, supplemented with Project ECHO (ECHO; …
AI-Assisted Generation of Difficult Math Questions
Current LLM training positions mathematical reasoning as a core capability. With publicly available sources fully tapped, there is unmet demand for diverse and challenging math questions. Relying solely on human experts is both time-consum…
Clinical Outcomes of Rural Patients with Diabetes Treated by ECHO-Trained Providers Versus an Academic Medical Center
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
Chart understanding plays a pivotal role when applying Multimodal Large Language Models (MLLMs) to real-world tasks such as analyzing scientific papers or financial reports. However, existing datasets often focus on oversimplified and homo…
Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving
Metacognitive knowledge refers to humans' intuitive knowledge of their own thinking and reasoning processes. Today's best LLMs clearly possess some reasoning processes. The paper gives evidence that they also have metacognitive knowledge, …
The Hidden Threat: Exposing OSINT Exploitation in Cyber Attacks
The use of open-source intelligence (OSINT) has become an important tool for hackers in modern cyber warfare. This paper examines how attackers use publicly available information to target individuals and organizations. We explore various …
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Public LLMs such as Llama 2-Chat underwent alignment training and were considered safe. Recently, Qi et al. [2024] reported that even benign fine-tuning on seemingly safe datasets can give rise to unsafe behaviors in the models. The cur…
Language Models as Science Tutors
NLP has recently made exciting progress toward training language models (LMs) with strong scientific problem-solving skills. However, model development has not focused on real-life use-cases of LMs for science, including applications in ed…
LESS: Selecting Influential Data for Targeted Instruction Tuning
Instruction tuning has unlocked powerful capabilities in large language models (LLMs), effectively using combined datasets to develop general-purpose chatbots. However, real-world applications often require a specialized suite of skills (e.…
Unlearning via Sparse Representations
Machine "unlearning", which involves erasing knowledge about a "forget set" from a trained model, can prove to be costly and infeasible by existing techniques. We propose a nearly compute-free zero-shot unlearning technique based…
Curriculum-based Faculty Training in Networking: Knowledge and Self-efficacy Outcomes
Although the advantages of developmental networks are well-known, most faculty do not know how to participate in such networks actively. Additionally, institutions face challenges in teaching faculty the best practices of networking. This …
Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models
With LLMs shifting their role from statistical modeling of language to serving as general-purpose AI agents, how should LLM evaluations change? Arguably, a key ability of an AI agent is to flexibly combine, as needed, the basic skills it h…
A Quadratic Synchronization Rule for Distributed Deep Learning
In distributed deep learning with data parallelism, synchronizing gradients at each training step can cause a huge communication overhead, especially when many nodes work together to train large models. Local gradient methods, such as Loca…