Timothy Mann
The impact of a focused behavioral intervention on brain cannabinoid signaling and interoceptive function: Implications for mood and anxiety
The Wim Hof method (WHM) is a behavioral intervention technique that consists of deep breathing exercises, cold exposure and meditation. In light of the crucial role of the cannabinoid system in modulating neurotransmitter release through …
MuZero with Self-competition for Rate Control in VP9 Video Compression
Video streaming usage has seen a significant rise as entertainment, education, and business increasingly rely on online video. Optimizing video compression has the potential to increase access and quality of content to users, and reduce en…
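The self-competition mechanism lends itself to a compact sketch: rather than hand-tuning a reward against a fixed target, the agent is rewarded for beating a running baseline of its own past scores. A minimal sketch; the EMA baseline and the win/loss reward below are illustrative assumptions, not the paper's exact formulation.

    # Hypothetical sketch of a self-competition reward signal.
    # The agent "wins" (+1) when its episode score beats a running
    # baseline of its own historical scores, and "loses" (-1) otherwise.
    class SelfCompetitionReward:
        def __init__(self, decay=0.99):
            self.baseline = None   # EMA of past episode scores
            self.decay = decay

        def __call__(self, episode_score):
            if self.baseline is None:
                self.baseline = episode_score
            reward = 1.0 if episode_score > self.baseline else -1.0
            self.baseline = (self.decay * self.baseline
                             + (1 - self.decay) * episode_score)
            return reward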
Data Augmentation Can Improve Robustness
Adversarial training suffers from robust overfitting, a phenomenon where the robust test accuracy starts to decrease during training. In this paper, we focus on reducing robust overfitting by using common data augmentation schemes. We demo…
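One pairing this line of work emphasizes is combining augmentation with model weight averaging to damp robust overfitting. A minimal PyTorch sketch of an exponential moving average of weights; the decay value is illustrative.

    import copy
    import torch

    def make_ema(model):
        # Frozen copy whose weights will track an exponential moving
        # average of the trained model's weights.
        ema = copy.deepcopy(model)
        for p in ema.parameters():
            p.requires_grad_(False)
        return ema

    @torch.no_grad()
    def update_ema(ema, model, decay=0.995):
        for pe, pm in zip(ema.parameters(), model.parameters()):
            pe.mul_(decay).add_(pm, alpha=1 - decay)

    # After each optimizer step on augmented adversarial batches, call
    # update_ema(ema, model) and evaluate robustness with the EMA weights.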
Improving Robustness using Generated Data
Recent work argues that robust training requires substantially larger datasets than those required for standard classification. On CIFAR-10 and CIFAR-100, this translates into a sizable robust-accuracy gap between models trained solely on …
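The abstract is truncated here, but the gap is closed by supplementing the original training set with samples from a generative model. A hedged sketch of per-batch mixing; the 70% generated fraction is an illustrative assumption.

    import numpy as np

    def mixed_batch(original, generated, batch_size, generated_fraction=0.7):
        # Draw a fixed fraction of each batch from the generated data and
        # the remainder from the original training set.
        n_gen = int(batch_size * generated_fraction)
        idx_gen = np.random.randint(len(generated), size=n_gen)
        idx_org = np.random.randint(len(original), size=batch_size - n_gen)
        return np.concatenate([generated[idx_gen], original[idx_org]])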
Defending Against Image Corruptions Through Adversarial Augmentations
Modern neural networks excel at image classification, yet they remain vulnerable to common image corruptions such as blur, speckle noise or fog. Recent methods that focus on this problem, such as AugMix and DeepAugment, introduce defenses …
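A cheap stand-in for adversarially chosen augmentations is worst-of-K selection over a fixed corruption set. The sketch below illustrates that simpler variant, not the paper's learned augmentation model.

    def worst_of_k_loss(model, x, y, corruptions, loss_fn):
        # Evaluate each candidate corruption and train on the one the
        # model currently finds hardest, a cheap proxy for adversarially
        # chosen augmentations.
        worst = None
        for corrupt in corruptions:
            loss = loss_fn(model(corrupt(x)), y)
            if worst is None or loss > worst:
                worst = loss
        return worst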
Fixing Data Augmentation to Improve Adversarial Robustness
Adversarial training suffers from robust overfitting, a phenomenon where the robust test accuracy starts to decrease during training. In this paper, we focus on both heuristics-driven and data-driven augmentations as a means to reduce robu…
Robust Constrained Reinforcement Learning for Continuous Control with Model Misspecification
Many real-world physical control systems are required to satisfy constraints upon deployment. Furthermore, real-world systems are often subject to effects such as non-stationarity, wear-and-tear, uncalibrated sensors and so on. Such effect…
Balancing Constraints and Rewards with Meta-Gradient D4PG
Deploying Reinforcement Learning (RL) agents to solve real-world applications often requires satisfying complex system constraints. Often the constraint thresholds are incorrectly set due to the complex nature of a system or the inability …
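For reference, the non-meta baseline in this setting is the standard Lagrangian-relaxation update on the constraint multiplier, sketched below; the update rule and learning rate are the textbook version, not the paper's meta-gradient procedure.

    def lagrange_update(lmbda, avg_cost, threshold, lr=1e-3):
        # Standard Lagrangian-relaxation step: raise the penalty when the
        # constraint is violated, relax it otherwise, keeping lmbda >= 0.
        return max(0.0, lmbda + lr * (avg_cost - threshold))

    # The penalized reward the agent maximizes is then
    #   r_penalized = r - lmbda * c
    # for per-step reward r and constraint cost c.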
Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples
Adversarial training and its variants have become de facto standards for learning robust deep neural networks. In this paper, we explore the landscape around adversarial training in a bid to uncover its limits. We systematically study the …
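The adversarial training being probed is the usual min-max procedure: an inner PGD maximization crafts a norm-bounded perturbation, then the outer step descends on the adversarial loss. A minimal PyTorch sketch (input-range clamping omitted for brevity):

    import torch
    import torch.nn.functional as F

    def adversarial_training_step(model, opt, x, y, eps, alpha, steps):
        # Inner maximization: find an L-inf bounded perturbation by PGD.
        delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
        for _ in range(steps):
            loss = F.cross_entropy(model(x + delta), y)
            grad, = torch.autograd.grad(loss, delta)
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
            delta = delta.detach().requires_grad_(True)
        # Outer minimization: descend on the adversarial loss.
        opt.zero_grad()
        F.cross_entropy(model(x + delta.detach()), y).backward()
        opt.step()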
The NodeHopper: Enabling Low Latency Ranking with Constraints via a Fast Dual Solver
Modern recommender systems need to deal with multiple objectives like balancing user engagement with recommending diverse and fresh content. An appealing way to optimally trade these off is by imposing constraints on the ranking according …
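The generic dual relaxation behind constrained ranking is easy to sketch: penalize each item's score by a multiplier times its cost, then search the multiplier until the slate fits the budget. The bisection below illustrates the idea, not the NodeHopper solver itself.

    import numpy as np

    def rank_with_budget(reward, cost, k, budget, iters=50):
        # Bisect on the Lagrange multiplier until the top-k items by
        # penalized score satisfy the total cost budget.
        lo, hi = 0.0, 1e3
        for _ in range(iters):
            lmbda = 0.5 * (lo + hi)
            top = np.argsort(reward - lmbda * cost)[::-1][:k]
            if cost[top].sum() > budget:
                lo = lmbda   # constraint violated: penalize cost more
            else:
                hi = lmbda
        return np.argsort(reward - hi * cost)[::-1][:k]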
Non-Stationary Delayed Bandits with Intermediate Observations
Online recommender systems often face long delays in receiving feedback, especially when optimizing for some long-term metrics. While mitigating the effects of delays in learning is well-understood in stationary environments, the problem b…
Achieving Robustness in the Wild via Adversarial Mixing With Disentangled Representations
Recent research has made the surprising finding that state-of-the-art deep learning models sometimes fail to generalize to small variations of the input. Adversarial training has been shown to be an effective approach to overcome this prob…
Robust Reinforcement Learning for Continuous Control with Model Misspecification
We provide a framework for incorporating robustness -- to perturbations in the transition dynamics which we refer to as model misspecification -- into continuous control Reinforcement Learning (RL) algorithms. We specifically focus on inco…
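In the tabular case, the robust backup is simply a minimum over an uncertainty set of transition models inside the usual Bellman maximization. A minimal numpy sketch over a finite candidate set (the paper's continuous-control machinery is not shown):

    import numpy as np

    def robust_value_iteration(P_set, R, gamma=0.9, iters=200):
        # Worst-case (robust) Bellman backup: back up each action under
        # the least favorable transition model in the uncertainty set.
        # P_set: list of (S, A, S) kernels; R: (S, A) reward table.
        S, A = R.shape
        V = np.zeros(S)
        for _ in range(iters):
            # Q[k, s, a]: value of (s, a) under candidate model k.
            Q = np.stack([R + gamma * (P @ V) for P in P_set])
            V = Q.min(axis=0).max(axis=1)  # min over models, max over actions
        return V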
An Alternative Surrogate Loss for PGD-based Adversarial Testing
Adversarial testing methods based on Projected Gradient Descent (PGD) are widely used for searching norm-bounded perturbations that cause the inputs of neural networks to be misclassified. This paper takes a deeper look at these methods an…
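A common alternative surrogate in this setting is a margin (logit-difference) loss in place of cross-entropy. The sketch below shows an untargeted margin variant; the paper's specific multi-targeted scheme is not reproduced here.

    import torch

    def pgd_attack(model, x, y, eps, alpha, steps):
        # PGD maximizing the margin (best wrong logit minus true logit)
        # instead of cross-entropy. Input-range clamping omitted.
        delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
        for _ in range(steps):
            logits = model(x + delta)
            true = logits.gather(1, y[:, None]).squeeze(1)
            wrong = logits.scatter(1, y[:, None], float('-inf')).max(1).values
            margin = (wrong - true).sum()
            grad, = torch.autograd.grad(margin, delta)
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
            delta = delta.detach().requires_grad_(True)
        return (x + delta).detach()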
A Dual Approach to Verify and Train Deep Networks
This paper addresses the problem of formally verifying desirable properties of neural networks, i.e., obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (e.g., robustness to bounded …
Active Roll-outs in MDP with Irreversible Dynamics
In Reinforcement Learning (RL), regret guarantees scaling with the square root of the time horizon have been shown to hold only for communicating Markov decision processes (MDPs) where any two states are connected. This essentially means t…
Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates
We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation. The two…
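The two estimators being traded off are standard. A tabular sketch of each (the paper's per-state adaptive switching between them is not shown):

    import numpy as np

    def td0_evaluation(trajectories, n_states, alpha=0.1, gamma=0.99):
        # Tabular TD(0): bootstrap each state's value from the next state.
        V = np.zeros(n_states)
        for traj in trajectories:               # traj: [(s, r, s_next), ...]
            for s, r, s_next in traj:
                V[s] += alpha * (r + gamma * V[s_next] - V[s])
        return V

    def mc_evaluation(trajectories, n_states, gamma=0.99):
        # Monte Carlo: average full discounted returns, no bootstrapping.
        returns, counts = np.zeros(n_states), np.zeros(n_states)
        for traj in trajectories:
            G = 0.0
            for s, r, _ in reversed(traj):
                G = r + gamma * G
                returns[s] += G
                counts[s] += 1
        return returns / np.maximum(counts, 1)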
A Bayesian Approach to Robust Reinforcement Learning
Robust Markov Decision Processes (RMDPs) intend to ensure robustness with respect to changing or adversarial system behavior. In this framework, transitions are modeled as arbitrary elements of a known and properly structured uncertainty s…
On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models
Recent work has shown that it is possible to train deep neural networks that are provably robust to norm-bounded adversarial perturbations. Most of these methods are based on minimizing an upper bound on the worst-case loss over all possib…
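The bound propagation itself takes only a few lines: carry an elementwise box [l, u] through each layer, using the midpoint/radius form for affine maps and monotonicity for ReLU. A numpy sketch:

    import numpy as np

    def ibp_affine(l, u, W, b):
        # Propagate a box through x -> W @ x + b: the midpoint maps
        # exactly, and |W| bounds how much the radius can grow.
        mid, rad = (u + l) / 2, (u - l) / 2
        mid2 = W @ mid + b
        rad2 = np.abs(W) @ rad
        return mid2 - rad2, mid2 + rad2

    def ibp_relu(l, u):
        # ReLU is monotone, so the bounds map through directly.
        return np.maximum(l, 0), np.maximum(u, 0)

    # Chaining these layer by layer yields sound output bounds; training
    # against the worst-case logits implied by the final box is the IBP
    # training objective.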
Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems
Predicting delayed outcomes is an important problem in recommender systems (e.g., if customers will finish reading an ebook). We formalize the problem as an adversarial, delayed online learning problem and consider how a proxy for the dela…
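The setup admits a simple illustration: act on the proxy signal immediately, and learn a correction once the delayed outcome finally arrives. The scalar correction model below is a deliberately toy assumption, not the paper's algorithm.

    class ProxyOutcomePredictor:
        # Toy sketch: a running estimate of how the delayed true outcome
        # relates to the immediately observable proxy.
        def __init__(self, lr=0.05):
            self.scale = 1.0   # learned proxy -> outcome correction
            self.lr = lr

        def predict(self, proxy):
            return self.scale * proxy

        def update(self, proxy, delayed_outcome):
            # Called only once the true outcome arrives.
            err = delayed_outcome - self.predict(proxy)
            self.scale += self.lr * err * proxy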
Temporal Difference Learning with Neural Networks - Study of the Leakage Propagation Problem
Temporal-Difference learning (TD) [Sutton, 1988] with function approximation can converge to solutions that are worse than those obtained by Monte-Carlo regression, even in the simple case of on-policy evaluation. To increase our understan…
Learning Robust Options
Robust reinforcement learning aims to produce policies that have strong guarantees even in the face of environments/transition models whose parameters have strong uncertainty. Existing work uses value-based methods and the usual primitive …
A Dual Approach to Scalable Verification of Deep Networks
This paper addresses the problem of formally verifying desirable properties of neural networks, i.e., obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (robustness to bounded norm a…
Soft-Robust Actor-Critic Policy-Gradient
Robust Reinforcement Learning aims to derive optimal behavior that accounts for model uncertainty in dynamical systems. However, previous studies have shown that by considering the worst case scenario, robust policies can be overly conserv…
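The contrast with the worst-case criterion fits in a few lines; the weights over candidate models below are illustrative.

    import numpy as np

    def worst_case_value(values):
        # Classic robust criterion: min over candidate models.
        return np.min(values)

    def soft_robust_value(values, weights):
        # Soft-robust criterion: average over a distribution on models,
        # trading worst-case conservatism for expected performance.
        return np.dot(weights, values)

    # e.g. a policy's value under three candidate dynamics models:
    # worst_case_value([5.0, 9.0, 10.0])                    -> 5.0
    # soft_robust_value([5.0, 9.0, 10.0], [0.2, 0.5, 0.3])  -> 8.5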
Optimizing Slate Recommendations via Slate-CVAE
The slate recommendation problem aims to find the ordering of a subset of documents to be presented on a surface that we call a slate. The definition of a slate changes depending on the underlying application, but a typical goal is to maximize user enga…
Beyond Greedy Ranking: Slate Optimization via List-CVAE
The conventional solution to the recommendation problem greedily ranks individual document candidates by prediction scores. However, this method fails to optimize the slate as a whole, and hence, often struggles to capture biases caused by…
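A minimal conditional VAE over a flat slate embedding captures the idea: condition generation on the user response you want, and decode the whole slate at once instead of ranking items greedily. The dimensions and architecture below are illustrative assumptions, not the paper's exact design.

    import torch
    import torch.nn as nn

    class SlateCVAE(nn.Module):
        def __init__(self, slate_dim, cond_dim, latent_dim=16, hidden=64):
            super().__init__()
            self.enc = nn.Sequential(
                nn.Linear(slate_dim + cond_dim, hidden), nn.ReLU())
            self.mu = nn.Linear(hidden, latent_dim)
            self.logvar = nn.Linear(hidden, latent_dim)
            self.dec = nn.Sequential(
                nn.Linear(latent_dim + cond_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, slate_dim))

        def forward(self, slate, cond):
            # Encode the observed slate together with its user response,
            # sample a latent, and decode conditioned on the response.
            h = self.enc(torch.cat([slate, cond], dim=-1))
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
            recon = self.dec(torch.cat([z, cond], dim=-1))
            return recon, mu, logvar

    # Train with reconstruction + KL; at serving time, sample z from the
    # prior, condition on the *ideal* response, and decode a whole slate
    # in one shot rather than greedily ranking individual items.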