Explanipedia

You Don’t Need Robust Machine Learning to Manage Adversarial Attack Risks

Edward Raff, Michel Benaroch, Andrew Farris · 2025

The robustness of modern machine learning (ML) models has become an increasing concern within the community. The ability to subvert a model into making errant predictions using seemingly inconsequential changes to input is startling, as is…

EMBER2024 - A Benchmark Dataset for Holistic Evaluation of Malware Classifiers

Robert J. Joyce, Gerald R. Miller, Phil Roth, Richard Zak, Elliott Zaresky-Williams, et al. · 2025

Computer science Geography

A lack of accessible data has historically restricted malware analysis research, and practitioners have relied heavily on datasets provided by industry sources to advance. Existing public datasets are limited by narrow scope - most include…

Quick Draw Bandits: Quickly Optimizing in Nonstationary Environments with Extremely Many Arms

D. Everett, Fred Lu, Edward Raff, Fernando Camacho, James Holt · 2025

Computer science

Canonical algorithms for multi-armed bandits typically assume a stationary reward environment where the size of the action space (number of arms) is small. More recently developed methods typically relax only one of these assumptions: exis…
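For context on the "canonical" setting the abstract contrasts against, here is a minimal sketch of UCB1, one well-known bandit algorithm that assumes a small, fixed set of arms with stationary rewards. The abstract does not name UCB1; choosing it, the Bernoulli arms, and the function name ucb1 are all illustrative assumptions, and this is not the Quick Draw Bandits method.

```python
import math
import random


def ucb1(arm_probs, rounds, seed=0):
    """Run UCB1 on hypothetical Bernoulli arms whose success probabilities
    never change (the stationary assumption). Returns pulls per arm."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k    # number of pulls per arm
    totals = [0.0] * k  # summed reward per arm

    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # play every arm once to initialize its estimate
        else:
            # Choose the arm with the highest upper confidence bound:
            # empirical mean + sqrt(2 ln t / n_i)
            arm = max(
                range(k),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


counts = ucb1([0.2, 0.5, 0.8], rounds=2000)
print(counts)
```

Every arm is visited at least once and the confidence term grows with the number of arms, which is one reason this style of algorithm struggles when the action space is extremely large.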

Adversarial Machine Learning Attacks on Financial Reporting via Maximum Violated Multi-Objective Attack

Edward Raff, Karen Kukla, Michel Benaroch, Joseph Comprix · 2025

Bad actors, primarily distressed firms, have the incentive and desire to manipulate their financial reports to hide their distress and derive personal gains. As attackers, these firms are motivated by potentially millions of dollars and th…

ClarAVy: A Tool for Scalable and Accurate Malware Family Labeling

Robert J. Joyce, D. Everett, Maya Fuchs, Edward Raff, James Holt · 2025

Computer science

Determining the family to which a malicious file belongs is an essential component of cyberattack investigation, attribution, and remediation. Performing this task manually is time consuming and requires expert knowledge. Automated tools u…

Disassembly as Weighted Interval Scheduling with Learned Weights

Antonio Flores-Montoya, Joon Seo Lim, Adam Seitz, Anil K. Sood, Edward Raff, et al. · 2025

Disassembly is the first step of a variety of binary analysis and transformation techniques, such as reverse engineering, or binary rewriting. Recent disassembly approaches consist of three phases: an exploration phase, that overapproximat…
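The title frames disassembly as weighted interval scheduling. As a rough illustration only, the textbook dynamic program below picks a maximum-weight set of non-overlapping intervals. The intervals here are hypothetical candidate instructions given as byte ranges with made-up, fixed weights; the paper learns its weights, and this sketch is not the authors' implementation.

```python
import bisect
from typing import List, Tuple

# Hypothetical candidate: (start, end, weight); end is exclusive, like an
# instruction occupying byte offsets [start, end).
Interval = Tuple[int, int, float]


def weighted_interval_scheduling(
    intervals: List[Interval],
) -> Tuple[float, List[Interval]]:
    """Select non-overlapping intervals with maximum total weight (O(n log n))."""
    items = sorted(intervals, key=lambda iv: iv[1])  # sort by end offset
    ends = [iv[1] for iv in items]
    n = len(items)
    best = [0.0] * (n + 1)  # best[j] = optimal weight using the first j intervals

    for j in range(1, n + 1):
        start, _, weight = items[j - 1]
        # p = how many earlier intervals end at or before this start (compatible)
        p = bisect.bisect_right(ends, start, 0, j - 1)
        best[j] = max(best[j - 1], best[p] + weight)

    # Walk the table backwards to recover which intervals were chosen
    chosen: List[Interval] = []
    j = n
    while j > 0:
        start, _, weight = items[j - 1]
        p = bisect.bisect_right(ends, start, 0, j - 1)
        if best[p] + weight >= best[j - 1]:
            chosen.append(items[j - 1])
            j = p
        else:
            j -= 1
    chosen.reverse()
    return best[n], chosen


# Overlapping candidate decodings of the same bytes
candidates = [(0, 4, 3.0), (1, 3, 2.0), (3, 6, 2.5), (4, 8, 4.0)]
total, chosen = weighted_interval_scheduling(candidates)
print(total, chosen)  # 7.0 [(0, 4, 3.0), (4, 8, 4.0)]
```

Overlapping candidates compete for the same bytes, so the program keeps the consistent subset with the highest combined score.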

Can LLMs Obfuscate Code? A Systematic Analysis of Large Language Models into Assembly Code Obfuscation

Seyedreza Mohseni, Siamak Mohammadi, Deepa Tilwani, Yash Saxena, Gerald Ketu Ndawula, et al. · 2025

Computer science

Malware authors often employ code obfuscations to make their malware harder to detect. Existing tools for generating obfuscated code often require access to the original source code (e.g., C++ or Java), and adding new obfuscations is a non…

What Do Machine Learning Researchers Mean by “Reproducible”?

Edward Raff, Michel Benaroch, Sagar Samtani, Andrew Farris · 2025

Computer science Psychology

The concern that Artificial Intelligence (AI) and Machine Learning (ML) are entering a "reproducibility crisis" has spurred significant research in the past few years. Yet with each paper, it is often unclear what someone means by "reprodu…

Differentially Private Iterative Screening Rules for Linear Regression

Amol Khanna, Fred Lu, Edward Raff · 2025

Linear $L_1$-regularized models have remained one of the simplest and most effective tools in data science. Over the past decade, screening rules have risen in popularity as a way to eliminate features when producing the sparse regression …

Multi-layer Radial Basis Function Networks for Out-of-distribution Detection

Amol Khanna, C.C. Ling, D. H. Everett, Edward Raff, Nathan Inkawhich · 2025

Computer science Mathematics Materials science

Existing methods for out-of-distribution (OOD) detection use various techniques to produce a score, separate from classification, that determines how "OOD" an input is. Our insight is that OOD detection can be simplified by using a neura…

Can LLMs Obfuscate Code? A Systematic Analysis of Large Language Models into Assembly Code Obfuscation

Seyedreza Mohseni, Siamak Mohammadi, Deepa Tilwani, Yash Saxena, Gerald Ndwula, et al. · 2024

Computer science

Malware authors often employ code obfuscations to make their malware harder to detect. Existing tools for generating obfuscated code often require access to the original source code (e.g., C++ or Java), and adding new obfuscations is a non…

Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context

Nilanjana Das, Edward Raff, Manas Gaur · 2024

Computer science Psychology Biology

As AI systems become deeply embedded in social media platforms, we've uncovered a concerning security vulnerability that goes beyond traditional adversarial attacks. It becomes important to assess the risks of LLMs before the general p…

What Do Machine Learning Researchers Mean by "Reproducible"?

Edward Raff, Michel Benaroch, Sagar Samtani, Andrew Farris · 2024

Computer science

The concern that Artificial Intelligence (AI) and Machine Learning (ML) are entering a "reproducibility crisis" has spurred significant research in the past few years. Yet with each paper, it is often unclear what someone means by "reprodu…

Living off the Analyst: Harvesting Features from Yara Rules for Malware Detection

Siddhant Gupta, Fred Lu, Andrew Barlow, Edward Raff, Francis Ferraro, et al. · 2024

Computer science

A strategy used by malicious actors is to "live off the land," where benign systems and tools already available on a victim's systems are used and repurposed for the malicious actor's intent. In this work, we ask if there is a way for anti…

Stabilizing Linear Passive-Aggressive Online Learning with Weighted Reservoir Sampling

Skyler Wu, Fred Lu, Edward Raff, James B. Holt · 2024

Computer science Mathematics

Online learning methods, like the seminal Passive-Aggressive (PA) classifier, are still highly effective for high-dimensional streaming data, out-of-core processing, and other throughput-sensitive applications. Many such algorithms rely on…
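To make the abstract's reference to the Passive-Aggressive classifier concrete, here is a minimal sketch of the standard PA-I update for a linear binary classifier: stay passive when the hinge loss is zero, otherwise take the smallest step that fixes the margin, capped by an aggressiveness parameter C. The function name pa1_update and the toy stream are hypothetical, and the weighted reservoir sampling stabilization from the paper is not shown.

```python
from typing import List, Sequence


def pa1_update(w: List[float], x: Sequence[float], y: int, C: float = 1.0) -> List[float]:
    """One PA-I step for a linear classifier with labels y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)  # hinge loss
    if loss == 0.0:
        return w  # passive: example already classified with margin >= 1
    sq_norm = sum(xi * xi for xi in x)
    if sq_norm == 0.0:
        return w  # an all-zero input cannot move the weights
    tau = min(C, loss / sq_norm)  # aggressive step size, capped by C
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


# Stream a few hypothetical examples through the learner, one at a time
stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([1.0, 1.0], 1)]
w = [0.0, 0.0]
for x, y in stream:
    w = pa1_update(w, x, y)
print(w)  # [1.5, -0.5]
```

Each update touches only the current example, which is what makes this family attractive for streaming and out-of-core data, and also why the final weights can swing with the last few examples seen.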

Is Function Similarity Over-Engineered? Building a Benchmark

Rebecca Saul, Chang Liu, Noah Fleischmann, Richard Zak, Kristopher Micinski, et al. · 2024

Computer science Geography Biology

Binary analysis is a core component of many critical security tasks, including reverse engineering, malware analysis, and vulnerability detection. Manual analysis is often time-consuming, but identifying commonly-used or previously-seen fu…

A Walsh Hadamard Derived Linear Vector Symbolic Architecture

Mohammad Mahmudul Alam, Alexander Oberle, Edward Raff, Stella Biderman, Tim Oates, et al. · 2024

Mathematics Computer science Biology

Vector Symbolic Architectures (VSAs) are one approach to developing Neuro-symbolic AI, where two vectors in $\mathbb{R}^d$ are 'bound' together to produce a new vector in the same space. VSAs support the commutativity and associativity of …
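To illustrate what binding, commutativity, and associativity mean in a VSA, the sketch below uses a simple, well-known alternative: element-wise multiplication of random bipolar vectors (MAP-style binding). This is an assumed stand-in chosen for clarity, not the Walsh Hadamard derived operator proposed in the paper, and the helper names are hypothetical.

```python
import random
from typing import List


def random_bipolar(d: int, rng: random.Random) -> List[int]:
    """Random hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(d)]


def bind(a: List[int], b: List[int]) -> List[int]:
    """MAP-style binding: element-wise product, which stays in the same space."""
    return [x * y for x, y in zip(a, b)]


def cosine(a: List[int], b: List[int]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))


rng = random.Random(0)
d = 1024
role, filler, other = (random_bipolar(d, rng) for _ in range(3))

bound = bind(role, filler)
assert bind(role, filler) == bind(filler, role)                             # commutative
assert bind(bind(role, filler), other) == bind(role, bind(filler, other))  # associative

# Each +1/-1 entry is its own inverse, so binding with role again recovers filler
recovered = bind(bound, role)
print(cosine(recovered, filler), cosine(bound, filler))
```

The bound vector is nearly orthogonal to its inputs, yet either input can be recovered exactly by binding again; real VSAs differ mainly in how binding and unbinding are defined for real-valued vectors.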

Position: Challenges and Opportunities for Differential Privacy in the U.S. Federal Government

Amol Khanna, Adam McCormick, André T. Nguyen, C Aguirre, Edward Raff · 2024

Business Political science Computer science

In this article, we seek to elucidate challenges and opportunities for differential privacy within the federal government setting, as seen by a team of differential privacy researchers, privacy lawyers, and data scientists working closely …

Neural Normalized Compression Distance and the Disconnect Between Compression and Classification

John Hurwitz, Charles Nicholas, Edward Raff · 2024

Computer science Materials science

It is generally well understood that predictive classification and compression are intrinsically related concepts in information theory. Indeed, many deep learning methods are explained as learning a kind of compression, and that better co…
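The link between compression and classification is often made concrete through the Normalized Compression Distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C gives a compressed length. The sketch below assumes zlib as the compressor C; the paper's title indicates a neural compressor instead, so this only illustrates the classic formula, not the authors' setup.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of zlib-compressed data (stand-in compressor C)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


a = b"the quick brown fox jumps over the lazy dog " * 20
b = b"the quick brown fox jumps over the lazy cat " * 20
c = bytes(range(256)) * 4
print(ncd(a, b), ncd(a, c))
```

Similar inputs share structure the compressor can reuse, so ncd(a, b) comes out much smaller than ncd(a, c); a nearest-neighbor classifier over this distance is the usual way compression is turned into classification.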

High-Dimensional Distributed Sparse Classification with Scalable Communication-Efficient Global Updates

Fred Lu, Ryan R. Curtin, Edward Raff, Francis Ferraro, James Holt · 2024

Computer science

As the size of datasets used in statistical learning continues to grow, distributed training of models has attracted increasing attention. These methods partition the data and exploit parallelism to reduce memory and runtime, but suffer in…

More Options for Prelabor Rupture of Membranes, A Bayesian Analysis

Ashley Klein, Edward Raff, Elisabeth Seamon, Lily Foley, Timothy Bussert · 2024

Computer science Biology

An obstetric goal for a laboring mother is to achieve a vaginal delivery as it reduces the risks inherent in major abdominal surgery (i.e., a Cesarean section). Various medical interventions may be used by a physician to increase the likel…

Feature Selection from Differentially Private Correlations

Ryan Swope, Amol Khanna, Philip Doldo, Saptarshi Roy, Edward Raff · 2024

Computer science Philosophy

Data scientists often seek to identify the most important features in high-dimensional datasets. This can be done through $L_1$-regularized regression, but this can become inefficient for very high-dimensional datasets. Additionally, high-…

Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context

Nilanjana Das, Edward Raff, Manas Gaur · 2024

Computer science Psychology History

Previous research on testing the vulnerabilities in Large Language Models (LLMs) using adversarial attacks has primarily focused on nonsensical prompt injections, which are easily detected upon manual or automated review (e.g., via byte en…

WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions

Seyedali Mohammadi, Edward Raff, Jinendra Malekar, Vedant Palit, Francis Ferraro, et al. · 2024

Computer science Chemistry

Language Models (LMs) are being proposed for mental health applications where the heightened risk of adverse outcomes means predictive performance may not be a sufficient litmus test of a model's utility in clinical practice. A model that …

Optimizing the Optimal Weighted Average: Efficient Distributed Sparse Classification

Fred Lu, Ryan R. Curtin, Edward Raff, Francis Ferraro, James Holt · 2024

Computer science Mathematics

While distributed training is often viewed as a solution to optimizing linear models on increasingly large datasets, inter-machine communication costs of popular distributed approaches can dominate as data dimensionality increases. Recent …

Assemblage: Automatic Binary Dataset Construction for Machine Learning

Chang Liu, Rebecca Saul, Yihao Sun, Edward Raff, Maya Fuchs, et al. · 2024

Computer science Geography Mathematics

Binary code is pervasive, and binary analysis is a key task in reverse engineering, malware classification, and vulnerability discovery. Unfortunately, while there exist large corpora of malicious binaries, obtaining high-quality corpora o…

Attribution in Scientific Literature: New Benchmark and Methods

Deepa Tilwani, Yash Saxena, Ali Mohammadi, Edward Raff, Amit Sheth, et al. · 2024

Computer science Geography

Large language models (LLMs) present a promising yet challenging frontier for automated source citation in scientific communication. Previous approaches to citation generation have been limited by citation ambiguity and LLM overgeneralizat…

SoK: A Review of Differentially Private Linear Models For High-Dimensional Data

Amol Khanna, Edward Raff, Nathan Inkawhich · 2024

Computer science

Linear models are ubiquitous in data science, but are particularly prone to overfitting and data memorization in high dimensions. To guarantee the privacy of training data, differential privacy can be used. Many papers have proposed optimi…

Comparison of Two Methods of Antepartum Anticoagulation: Continuation of Enoxaparin until Scheduled Induction of Labor Versus Transitioning to Heparin with Spontaneous Labor

Marcia DesJardin, Edward Raff, Brian James, Angelina Mozier, Nicholas Baranco, et al. · 2024

Medicine Computer science

Pregnancy is a hypercoagulable state. There is a lack of strong evidence-based guidance regarding management when anticoagulation is required to prevent or treat venous thromboembolism during pregnancy. In practice, some patients are presc…
