Devansh Arpit
Causal Layering via Conditional Entropy
Causal discovery aims to recover information about an unobserved causal graph from the observable data it generates. Layerings are orderings of the variables which place causes before effects. In this paper, we provide ways to recover laye…
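As a concrete illustration of what a layering is (an illustrative toy example using networkx, not the paper's conditional-entropy method): any topological order of a causal DAG places causes before effects.

import networkx as nx

# Hypothetical 4-variable causal DAG: X1 -> X2 -> X4 and X1 -> X3 -> X4.
g = nx.DiGraph([("X1", "X2"), ("X2", "X4"), ("X1", "X3"), ("X3", "X4")])

# A topological order places every cause before all of its effects,
# which is the property a layering of the variables must satisfy.
print(list(nx.topological_sort(g)))  # e.g. ['X1', 'X2', 'X3', 'X4']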
Editing Arbitrary Propositions in LLMs without Subject Labels
Large Language Model (LLM) editing modifies factual information in LLMs. Locate-and-Edit (L&E) methods accomplish this by finding where relevant information is stored within the neural network, and editing the weights at that location. Th…
BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents
The massive successes of large language models (LLMs) encourage the emerging exploration of LLM-augmented Autonomous Agents (LAAs). An LAA is able to generate actions with its core LLM and interact with environments, which facilitates the …
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
Recent months have seen the emergence of a powerful new trend in which large language models (LLMs) are augmented to become autonomous language agents capable of performing objective oriented multi-step tasks on their own, rather than mere…
REX: Rapid Exploration and eXploitation for AI Agents
In this paper, we propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX. Existing AutoGPT-style techniques have inherent limitations, such as a heavy reliance on precise descriptions for decision-maki…
On the Unlikelihood of D-Separation
Causal discovery aims to recover a causal graph from data generated by it; constraint based methods do so by searching for a d-separating conditioning set of nodes in the graph via an oracle. In this paper, we provide analytic evidence tha…
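For context on the d-separation oracle mentioned above, a minimal sketch with networkx (assuming a networkx release that ships the d-separation test; the toy chain graph is made up):

import networkx as nx

# Toy chain X -> Z -> Y: conditioning on {Z} blocks the path between X and Y.
g = nx.DiGraph([("X", "Z"), ("Z", "Y")])

print(nx.d_separated(g, {"X"}, {"Y"}, set()))   # False: empty conditioning set
print(nx.d_separated(g, {"X"}, {"Y"}, {"Z"}))   # True: {Z} is a d-separating set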
Salesforce CausalAI Library: A Fast and Scalable Framework for Causal Analysis of Time Series and Tabular Data
We introduce the Salesforce CausalAI Library, an open-source library for causal analysis using observational data. It supports causal discovery and causal inference for tabular and time series data, of discrete, continuous and heterogeneou…
Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization
In Domain Generalization (DG) settings, models trained independently on a given set of training domains have notoriously chaotic performance on distribution shifted test domains, and stochasticity in optimization (e.g. seed) plays a big ro…
Learning Rich Nearest Neighbor Representations from Self-supervised Ensembles
Pretraining convolutional neural networks via self-supervision, and applying them in transfer learning, is an incredibly fast-growing field that is rapidly and iteratively improving performance across practically all image domains. Meanwhi…
Momentum Contrastive Autoencoder: Using Contrastive Learning for Latent Space Distribution Matching in WAE
Wasserstein autoencoder (WAE) shows that matching two distributions is equivalent to minimizing a simple autoencoder (AE) loss under the constraint that the latent space of this AE matches a pre-specified prior distribution. This latent sp…
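For reference, the WAE objective the snippet refers to combines a reconstruction cost with a penalty matching the aggregate latent distribution $Q_Z$ to the prior $P_Z$ (standard WAE notation, stated here for context; $c$ is a reconstruction cost, $G$ the decoder, $\lambda$ a penalty weight):

$$
\min_{Q(Z \mid X)} \; \mathbb{E}_{P_X}\, \mathbb{E}_{Q(Z \mid X)} \big[ c(X, G(Z)) \big] \;+\; \lambda \, \mathcal{D}_Z(Q_Z, P_Z)
$$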
Merlion: A Machine Learning Library for Time Series
We introduce Merlion, an open-source machine learning library for time series. It features a unified interface for many commonly used models and datasets for anomaly detection and forecasting on both univariate and multivariate time series…
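A minimal usage sketch, following the quick-start pattern from the Merlion README (module paths and defaults may differ across versions; the synthetic data is only there to make the example self-contained):

import numpy as np
import pandas as pd
from merlion.utils import TimeSeries
from merlion.models.defaults import DefaultDetector, DefaultDetectorConfig

# Synthetic univariate series, indexed by time.
idx = pd.date_range("2021-01-01", periods=300, freq="H")
df = pd.DataFrame({"value": np.random.randn(300)}, index=idx)
train_data = TimeSeries.from_pd(df[:200])
test_data = TimeSeries.from_pd(df[200:])

# Train a default anomaly detector and label anomalies on the held-out window.
model = DefaultDetector(DefaultDetectorConfig())
model.train(train_data=train_data)
anomaly_labels = model.get_anomaly_label(time_series=test_data)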
Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization
The early phase of training a deep neural network has a dramatic effect on the local curvature of the loss function. For instance, using a small learning rate does not guarantee stable optimization because the optimization trajectory has a…
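As background on the quantity named in the title, the trace of the Fisher Information Matrix is one standard summary of this early-phase curvature (a background definition, not a result from the paper; $p_\theta(y \mid x)$ is the model's predictive distribution):

$$
\operatorname{Tr}\big(F(\theta)\big) \;=\; \mathbb{E}_{x \sim \mathcal{D},\; \hat{y} \sim p_\theta(\cdot \mid x)} \Big[ \big\lVert \nabla_\theta \log p_\theta(\hat{y} \mid x) \big\rVert^2 \Big]
$$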
The Break-Even Point on Optimization Trajectories of Deep Neural Networks
The early phase of training of deep neural networks is critical for their final performance. In this work, we study how the hyperparameters of stochastic gradient descent (SGD) used in the early phase of training affect the rest of the opt…
Neural Bayes: A Generic Parameterization Method for Unsupervised Representation Learning
We introduce a parameterization method called Neural Bayes which allows computing statistical quantities that are in general difficult to compute and opens avenues for formulating new objectives for unsupervised representation learning. Sp…
Predicting with High Correlation Features
It has been shown that instead of learning actual object features, deep networks tend to exploit non-robust (spurious) discriminative features that are shared between training and test sets. Therefore, while they achieve state of the art p…
Entropy Penalty: Towards Generalization Beyond the IID Assumption
It has been shown that instead of learning actual object features, deep networks tend to exploit non-robust (spurious) discriminative features that are shared between training and test sets. Therefore, while they achieve state of the art p…
How to Initialize your Network? Robust Initialization for WeightNorm & ResNets
Residual networks (ResNet) and weight normalization play an important role in various deep learning applications. However, parameter initialization strategies have not been studied previously for weight normalized networks and, in practice…
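For context, the weight normalization reparameterization referenced in the title decouples the direction and scale of each weight vector (a standard background definition, not the paper's proposed initialization scheme):

$$
w \;=\; g \, \frac{v}{\lVert v \rVert}, \qquad g \in \mathbb{R}, \; v \in \mathbb{R}^{d}
$$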
The Benefits of Over-parameterization at Initialization in Deep ReLU Networks
It has been noted in existing literature that over-parameterization in ReLU networks generally improves performance. While there could be several factors involved behind this, we prove some desirable theoretical properties at initializatio…
h-detach: Modifying the LSTM Gradient Towards Better Optimization
Recurrent neural networks are known for their notorious exploding and vanishing gradient problem (EVGP). This problem becomes more evident in tasks where the information needed to correctly solve them exists over long time scales, because E…
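A rough PyTorch-style sketch of the idea the title points to, stochastically detaching the gradient path through the LSTM hidden state h while leaving the cell-state path untouched (an illustrative reading, not the authors' released implementation; the detach probability is a made-up hyperparameter):

import torch
import torch.nn as nn

class HDetachLSTM(nn.Module):
    """Runs an LSTMCell over a sequence, occasionally detaching h."""

    def __init__(self, input_size, hidden_size, p_detach=0.25):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.p_detach = p_detach

    def forward(self, x):  # x: (seq_len, batch, input_size)
        h = x.new_zeros(x.size(1), self.cell.hidden_size)
        c = x.new_zeros(x.size(1), self.cell.hidden_size)
        outputs = []
        for t in range(x.size(0)):
            if self.training and torch.rand(1).item() < self.p_detach:
                h = h.detach()  # block gradients through h at this step only
            h, c = self.cell(x[t], (h, c))
            outputs.append(h)
        return torch.stack(outputs)

# Example: out = HDetachLSTM(16, 32)(torch.randn(50, 8, 16))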
On the Spectral Bias of Deep Neural Networks
It is well known that over-parametrized deep neural networks (DNNs) are an overly expressive class of functions that can memorize even random data with $100\%$ training accuracy. This raises the question why they do not easily overfit real…
On the Spectral Bias of Neural Networks
Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with $100\%$ accuracy. In this work, we present properties of neural networks that complement this aspect of expressivity.…
A Walk with SGD
We present novel empirical observations regarding how stochastic gradient descent (SGD) navigates the loss landscape of over-parametrized deep neural networks (DNNs). These observations expose the qualitatively different roles of learning …
Variational Bi-LSTMs
Recurrent neural networks like long short-term memory (LSTM) are important architectures for sequential prediction tasks. LSTMs (and RNNs in general) model sequences along the forward time direction. Bidirectional LSTMs (Bi-LSTMs) on the o…
Three Factors Influencing Minima in SGD
We investigate the dynamical and convergent properties of stochastic gradient descent (SGD) applied to Deep Neural Networks (DNNs). Characterizing the relation between learning rate, batch size and the properties of the final minima, such …
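As a pointer to the relation being characterized, here is the common continuous-time view of SGD in which the learning rate $\eta$ and batch size $B$ enter the gradient noise essentially through their ratio (a standard approximation used in this line of work, stated only for context; $\Sigma(\theta)$ is the per-sample gradient covariance):

$$
\theta_{t+1} = \theta_t - \frac{\eta}{B} \sum_{i \in \mathcal{B}_t} \nabla \ell_i(\theta_t), \qquad \mathrm{d}\theta \approx -\nabla L(\theta)\,\mathrm{d}t + \sqrt{\frac{\eta}{B}}\, \Sigma(\theta)^{1/2}\, \mathrm{d}W_t
$$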
Fraternal Dropout
Recurrent neural networks (RNNs) are an important class of neural network architectures, useful for language modeling and sequential prediction. However, optimizing RNNs is known to be harder than optimizing feed-forward neural networks. A …