Irina Rish
Critical role of EEG signals in assessment of sex-specific insights in neurological diagnostics via machine learning approach
Early detection and diagnosis of neurological pathology are essential for timely treatment and intervention. While deep learning has shown promise in analyzing brain imaging data, the influence of sex-specific patterns in electroencephalog…
Influence Functions for Efficient Data Selection in Reasoning
Fine-tuning large language models (LLMs) on chain-of-thought (CoT) data shows that a small amount of high-quality data can outperform massive datasets. Yet, what constitutes "quality" remains ill-defined. Existing reasoning methods rely on…
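The abstract names influence functions as the data-selection tool. As a minimal illustration (my simplification, not the paper's method), influence-style scores are often approximated by the dot product of training-example and test-example gradients, dropping the inverse-Hessian term; the toy linear model and all names below are hypothetical:

```python
import numpy as np

# Toy linear model with squared-error loss: loss_i(w) = 0.5 * (x_i @ w - y_i)**2.
# First-order influence of a training point on a test point, approximated
# (TracIn-style) by the gradient dot product; the Hessian inverse is dropped.

def grad(w, x, y):
    """Gradient of the squared-error loss at a single example."""
    return (x @ w - y) * x

def influence_score(w, x_train, y_train, x_test, y_test):
    """Higher score => upweighting the training point reduces the test loss
    more, to first order."""
    return grad(w, x_test, y_test) @ grad(w, x_train, y_train)

rng = np.random.default_rng(0)
w = rng.normal(size=3)
x_test, y_test = rng.normal(size=3), 1.0

# Rank a small pool of candidate training points by influence.
pool = [(rng.normal(size=3), float(rng.normal())) for _ in range(5)]
scores = [influence_score(w, x, y, x_test, y_test) for x, y in pool]
ranking = np.argsort(scores)[::-1]  # most influential first
print(ranking)
```

Selecting the top-ranked examples is one way such scores could feed a "small but high-quality" fine-tuning set.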
Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?
AI agents are vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external content or tool outputs cause unintended or harmful behavior. Inspired by the well-established concept of firewalls, we show t…
Warming Up for Zeroth-Order Federated Pre-Training with Low Resource Clients
Federated learning enables collaborative model training across numerous edge devices without requiring participants to share data; however, memory and communication constraints on these edge devices may preclude their participation in trai…
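Zeroth-order training sidesteps backpropagation memory by estimating gradients from loss evaluations alone, which is what makes low-resource clients plausible participants. A minimal sketch of the standard two-point random-direction estimator (my illustration, not the paper's exact algorithm; step sizes and counts are arbitrary):

```python
import numpy as np

# Two-point zeroth-order gradient estimator: only forward passes (loss
# evaluations) are needed, no backprop, so memory stays near inference cost.

def zo_gradient(loss_fn, w, eps=1e-3, rng=None):
    """Estimate grad loss_fn(w) along a single random direction u:
    g_hat = (L(w + eps*u) - L(w - eps*u)) / (2*eps) * u."""
    if rng is None:
        rng = np.random.default_rng()
    u = rng.normal(size=w.shape)
    scale = (loss_fn(w + eps * u) - loss_fn(w - eps * u)) / (2 * eps)
    return scale * u

# Usage: minimize a simple quadratic with zeroth-order SGD.
loss = lambda w: float(np.sum(w ** 2))
w = np.ones(4)
rng = np.random.default_rng(0)
for _ in range(500):
    w = w - 0.01 * zo_gradient(loss, w, rng=rng)
print(loss(w))
```

In a federated setting, each client could send back only the scalar `scale` for a shared random seed, which is where the communication savings come from.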
A Guide to Robust Generalization: The Impact of Architecture, Pre-training, and Optimization Strategy
Deep learning models operating in the image domain are vulnerable to small input perturbations. For years, robustness to such perturbations was pursued by training models from scratch (i.e., with random initializations) using specialized l…
Revisiting Replay and Gradient Alignment for Continual Pre-Training of Large Language Models
Training large language models (LLMs) typically involves pre-training on massive corpora, only to restart the process entirely when new data becomes available. A more efficient and resource-conserving approach would be continual pre-traini…
GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities
The rapid evolution of software libraries poses a considerable hurdle for code generation, necessitating continuous adaptation to frequent version updates while preserving backward compatibility. While existing code evolution benchmarks pr…
Spectra 1.1: Scaling Laws and Efficient Inference for Ternary Language Models
Large language models (LLMs) are increasingly used across research and industry applications, yet their inference efficiency remains a significant challenge. As the computational power of modern GPU architectures continuously improves, the…
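For readers unfamiliar with ternary models: each weight is restricted to {-1, 0, +1} times a shared scale, so matrix multiplies reduce to additions and subtractions plus one rescale. A hedged sketch of one common ternarization recipe (absmean scaling; this is my illustration, not the Spectra training procedure):

```python
import numpy as np

# Ternary weight quantization sketch: map each weight to {-1, 0, +1} times
# a per-matrix scale, enabling add/subtract-only matmuls at inference.

def ternarize(W):
    """Absmean ternarization: scale by mean |W|, then round and clip."""
    scale = np.mean(np.abs(W)) + 1e-12
    W_t = np.clip(np.round(W / scale), -1, 1)
    return W_t.astype(np.int8), scale

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
W_t, scale = ternarize(W)
W_hat = scale * W_t  # dequantized approximation of the original weights
print(W_t)
print(np.linalg.norm(W - W_hat) / np.linalg.norm(W))  # relative error
```

Note that, unlike post-training quantization, the papers above train in the ternary regime from the start; the sketch only shows the representation.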
Random Initialization Can't Catch Up: The Advantage of Language Model Transfer for Time Series Forecasting
Recent works have demonstrated the effectiveness of adapting pre-trained language models (LMs) for forecasting time series in the low-data regime. We build upon these findings by analyzing the effective transfer from language models to tim…
Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning
This work aims to understand how scaling improves language models, specifically in terms of training dynamics. We find that language models undergo loss deceleration early in training: an abrupt slowdown in the rate of loss improvement, re…
Artificial neural networks for magnetoencephalography: a review of an emerging field
Objective. Magnetoencephalography (MEG) is a cutting-edge neuroimaging technique that measures the intricate brain dynamics underlying cognitive processes with an unparalleled combination of high temporal and spatial precision. While MEG …
MEEGNet: An open source python library for the application of convolutional neural networks to MEG
Artificial Neural Networks (ANNs) are rapidly gaining traction in neuroscience, proving invaluable for decoding and modeling brain signals from techniques such as electroencephalography (EEG) and functional magnetic resonance imaging (fMRI…
Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training
The ever-growing availability of unlabeled data presents both opportunities and challenges for training artificial intelligence systems. While self-supervised learning (SSL) has emerged as a powerful paradigm for extracting meaningful repr…
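The title contrasts cosine decay with an "infinite" learning-rate schedule. A minimal sketch of the two shapes (phase lengths and peak/floor values below are illustrative, not the paper's settings): cosine decay commits to a total step budget up front, whereas a warmup-then-constant plateau with a short terminal anneal can keep absorbing new data indefinitely.

```python
import math

# Cosine decay needs the total horizon in advance; the "infinite" schedule
# (warmup -> open-ended constant plateau -> brief anneal) does not.

def cosine_lr(step, total, peak=3e-4, floor=3e-5):
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * step / total))

def infinite_lr(step, warmup=100, cooldown_start=None, cooldown_len=100,
                peak=3e-4, floor=3e-5):
    if step < warmup:                       # linear warmup
        return peak * step / warmup
    if cooldown_start is None or step < cooldown_start:
        return peak                         # constant plateau, open-ended
    t = min((step - cooldown_start) / cooldown_len, 1.0)
    return peak + (floor - peak) * t        # linear anneal only at the end

# The plateau LR is independent of any total-step budget:
print(infinite_lr(50), infinite_lr(5000), infinite_lr(5050, cooldown_start=5000))
```

Resuming continual pre-training then means re-entering the plateau instead of re-warming from a fully decayed rate.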
Artificial Neural Networks for Magnetoencephalography: A review of an emerging field
Magnetoencephalography (MEG) is a cutting-edge neuroimaging technique that measures the intricate brain dynamics underlying cognitive processes with an unparalleled combination of high temporal and spatial precision. MEG data analytics has…
CHIRP: A Fine-Grained Benchmark for Open-Ended Response Evaluation in Vision-Language Models
The proliferation of Vision-Language Models (VLMs) in the past several years calls for rigorous and comprehensive evaluation methods and benchmarks. This work analyzes existing VLM evaluation techniques, including automated metrics, AI-bas…
Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference
Realtime environments change even as agents perform action inference and learning, thus requiring high interaction frequencies to effectively minimize regret. However, recent advances in machine learning involve larger neural networks with…
Critical Role of EEG Signals in Assessment of Sex-Specific Insights in Neurological Diagnostics via Machine Learning Approach
Early detection and diagnosis of pathology are essential for efficient treatment and therapeutic interventions. The emergence of Artificial Intelligence (AI) and deep machine learning techniques has demonstrated the promising capability o…
RedPajama: an Open Dataset for Training Large Language Models
Large language models are increasingly becoming a cornerstone technology in artificial intelligence, the sciences, and society as a whole, yet the optimal strategies for dataset composition and filtering remain largely elusive. Many of the…
Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching
In inverse reinforcement learning (IRL), an agent seeks to replicate expert demonstrations through interactions with the environment. Traditionally, IRL is treated as an adversarial game, where an adversary searches over reward models, and…
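Successor features summarize a policy by the discounted sum of state features it visits, so imitation can be cast as driving the agent's successor features toward the expert's, with no reward model or adversary. A toy sketch of that quantity (my simplification; the trajectories, features, and three-state setup below are hypothetical):

```python
import numpy as np

# Successor features: psi(pi) = E[sum_t gamma^t * phi(s_t)].
# Matching psi_agent to psi_expert reduces imitation to a feature-matching
# objective instead of an adversarial reward search.

def successor_features(trajectories, phi, gamma=0.9):
    """Average discounted feature sum over a set of state trajectories."""
    psis = []
    for traj in trajectories:
        psi = sum(gamma ** t * phi(s) for t, s in enumerate(traj))
        psis.append(psi)
    return np.mean(psis, axis=0)

phi = lambda s: np.eye(3)[s]           # one-hot features over 3 states
expert = [[0, 1, 2, 2], [0, 1, 2, 2]]  # expert always reaches state 2
agent  = [[0, 0, 1, 2], [0, 1, 1, 2]]  # agent dawdles in earlier states

psi_e = successor_features(expert, phi)
psi_a = successor_features(agent, phi)
gap = np.linalg.norm(psi_a - psi_e)    # matching objective to minimize
print(psi_e, gap)
```

Minimizing `gap` over the agent's policy is the non-adversarial objective the title alludes to; in practice the successor features would be learned, not estimated from enumerated trajectories.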
GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models
The rapid evolution of software libraries presents a significant challenge for code generation models, which must adapt to frequent version updates while maintaining compatibility with previous versions. Existing code completion benchmarks…
Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning
Decoder-only Transformers often struggle with complex reasoning tasks, particularly arithmetic reasoning requiring multiple sequential operations. In this work, we identify representation collapse in the model's intermediate layers as a ke…
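One standard way to discourage representation collapse is a variance-covariance regularizer on batches of hidden states. The sketch below is in that spirit but is my illustration, not the Seq-VCR loss; the hinge target and weighting are arbitrary:

```python
import numpy as np

# Variance-covariance penalty on a batch of intermediate representations:
# penalize low per-dimension variance (collapse) and high cross-dimension
# covariance (redundancy), VICReg-style.

def variance_covariance_penalty(H, var_target=1.0, eps=1e-4):
    """H: (batch, dim) matrix of hidden states."""
    H = H - H.mean(axis=0, keepdims=True)
    std = np.sqrt(H.var(axis=0) + eps)
    var_loss = np.mean(np.maximum(0.0, var_target - std))  # hinge on std
    cov = (H.T @ H) / (H.shape[0] - 1)
    off_diag = cov - np.diag(np.diag(cov))
    cov_loss = np.sum(off_diag ** 2) / H.shape[1]
    return var_loss + cov_loss

rng = np.random.default_rng(0)
spread = rng.normal(size=(32, 8))                       # healthy batch
collapsed = np.tile(rng.normal(size=(1, 8)), (32, 1))   # fully collapsed batch
print(variance_covariance_penalty(spread), variance_covariance_penalty(collapsed))
```

A collapsed batch (all rows identical) is heavily penalized by the variance term, which is the failure mode the paper targets in intermediate layers.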
Context is Key: A Benchmark for Forecasting with Essential Textual Information
Forecasting is a critical task in decision-making across numerous domains. While historical numerical data provide a start, they fail to convey the complete context for reliable and accurate predictions. Human forecasters frequently rely o…
VFA: Vision Frequency Analysis of Foundation Models and Human
Machine learning models often struggle with distribution shifts in real-world scenarios, whereas humans exhibit robust adaptation. Models that better align with human perception may achieve higher out-of-distribution generalization. In thi…
Spectra: Surprising Effectiveness of Pretraining Ternary Language Models at Scale
Rapid advancements in GPU computational power have outpaced memory capacity and bandwidth growth, creating bottlenecks in Large Language Model (LLM) inference. Post-training quantization is the leading method for addressing memory-related b…
Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent
Understanding the mechanisms behind decisions taken by large foundation models in sequential decision making tasks is critical to ensuring that such systems operate transparently and safely. In this work, we perform exploratory analysis on…
Towards Adversarially Robust Vision-Language Models: Insights from Design Choices and Prompt Formatting Techniques
Vision-Language Models (VLMs) have witnessed a surge in both research and real-world applications. However, as they are becoming increasingly prevalent, ensuring their robustness against adversarial attacks is paramount. This work systemat…
Lost in Translation: The Algorithmic Gap Between LMs and the Brain
Language Models (LMs) have achieved impressive performance on various linguistic tasks, but their relationship to human language processing in the brain remains unclear. This paper examines the gaps and overlaps between LMs and the brain a…