Devansh Arpit
Causal Layering via Conditional Entropy
Causal discovery aims to recover information about an unobserved causal graph from the observable data it generates. Layerings are orderings of the variables which place causes before effects. In this paper, we provide ways to recover laye…
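As a concrete illustration of what a layering is (an illustrative toy example using networkx, not the paper's conditional-entropy method): any topological order of a causal DAG places causes before effects.

import networkx as nx

# Hypothetical 4-variable causal DAG: X1 -> X2 -> X4 and X1 -> X3 -> X4.
g = nx.DiGraph([("X1", "X2"), ("X2", "X4"), ("X1", "X3"), ("X3", "X4")])

# A topological order places every cause before all of its effects,
# which is the property a layering of the variables must satisfy.
print(list(nx.topological_sort(g)))  # e.g. ['X1', 'X2', 'X3', 'X4']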
Editing Arbitrary Propositions in LLMs without Subject Labels
Large Language Model (LLM) editing modifies factual information in LLMs. Locate-and-Edit (L&E) methods accomplish this by finding where relevant information is stored within the neural network, and editing the weights at that location. Th…
BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents
The massive successes of large language models (LLMs) encourage the emerging exploration of LLM-augmented Autonomous Agents (LAAs). An LAA is able to generate actions with its core LLM and interact with environments, which facilitates the …
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
Recent months have seen the emergence of a powerful new trend in which large language models (LLMs) are augmented to become autonomous language agents capable of performing objective oriented multi-step tasks on their own, rather than mere…
REX: Rapid Exploration and eXploitation for AI Agents
In this paper, we propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX. Existing AutoGPT-style techniques have inherent limitations, such as a heavy reliance on precise descriptions for decision-maki…
On the Unlikelihood of D-Separation
Causal discovery aims to recover a causal graph from data generated by it; constraint based methods do so by searching for a d-separating conditioning set of nodes in the graph via an oracle. In this paper, we provide analytic evidence tha…
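For context on the d-separation oracle mentioned above, a minimal sketch with networkx (assuming a networkx release that ships the d-separation test; the toy chain graph is made up):

import networkx as nx

# Toy chain X -> Z -> Y: conditioning on {Z} blocks the path between X and Y.
g = nx.DiGraph([("X", "Z"), ("Z", "Y")])

print(nx.d_separated(g, {"X"}, {"Y"}, set()))   # False: empty conditioning set
print(nx.d_separated(g, {"X"}, {"Y"}, {"Z"}))   # True: {Z} is a d-separating set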
Salesforce CausalAI Library: A Fast and Scalable Framework for Causal Analysis of Time Series and Tabular Data
We introduce the Salesforce CausalAI Library, an open-source library for causal analysis using observational data. It supports causal discovery and causal inference for tabular and time series data, of discrete, continuous and heterogeneou…
Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization
In Domain Generalization (DG) settings, models trained independently on a given set of training domains have notoriously chaotic performance on distribution shifted test domains, and stochasticity in optimization (e.g. seed) plays a big ro…
Learning Rich Nearest Neighbor Representations from Self-supervised Ensembles
Pretraining convolutional neural networks via self-supervision, and applying them in transfer learning, is an incredibly fast-growing field that is rapidly and iteratively improving performance across practically all image domains. Meanwhi…
Momentum Contrastive Autoencoder: Using Contrastive Learning for Latent Space Distribution Matching in WAE
Wasserstein autoencoder (WAE) shows that matching two distributions is equivalent to minimizing a simple autoencoder (AE) loss under the constraint that the latent space of this AE matches a pre-specified prior distribution. This latent sp…
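For reference, the WAE objective the snippet refers to combines a reconstruction cost with a penalty matching the aggregate latent distribution $Q_Z$ to the prior $P_Z$ (standard WAE notation, stated here for context; $c$ is a reconstruction cost, $G$ the decoder, $\lambda$ a penalty weight):

$$
\min_{Q(Z \mid X)} \; \mathbb{E}_{P_X}\, \mathbb{E}_{Q(Z \mid X)} \big[ c(X, G(Z)) \big] \;+\; \lambda \, \mathcal{D}_Z(Q_Z, P_Z)
$$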
Merlion: A Machine Learning Library for Time Series
We introduce Merlion, an open-source machine learning library for time series. It features a unified interface for many commonly used models and datasets for anomaly detection and forecasting on both univariate and multivariate time series…
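A minimal usage sketch, following the quick-start pattern from the Merlion README (module paths and defaults may differ across versions; the synthetic data is only there to make the example self-contained):

import numpy as np
import pandas as pd
from merlion.utils import TimeSeries
from merlion.models.defaults import DefaultDetector, DefaultDetectorConfig

# Synthetic univariate series, indexed by time.
idx = pd.date_range("2021-01-01", periods=300, freq="H")
df = pd.DataFrame({"value": np.random.randn(300)}, index=idx)
train_data = TimeSeries.from_pd(df[:200])
test_data = TimeSeries.from_pd(df[200:])

# Train a default anomaly detector and label anomalies on the held-out window.
model = DefaultDetector(DefaultDetectorConfig())
model.train(train_data=train_data)
anomaly_labels = model.get_anomaly_label(time_series=test_data)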
Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization
The early phase of training a deep neural network has a dramatic effect on the local curvature of the loss function. For instance, using a small learning rate does not guarantee stable optimization because the optimization trajectory has a…
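As background on the quantity named in the title, the trace of the Fisher Information Matrix is one standard summary of this early-phase curvature (a background definition, not a result from the paper; $p_\theta(y \mid x)$ is the model's predictive distribution):

$$
\operatorname{Tr}\big(F(\theta)\big) \;=\; \mathbb{E}_{x \sim \mathcal{D},\; \hat{y} \sim p_\theta(\cdot \mid x)} \Big[ \big\lVert \nabla_\theta \log p_\theta(\hat{y} \mid x) \big\rVert^2 \Big]
$$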
The Break-Even Point on Optimization Trajectories of Deep Neural Networks
The early phase of training of deep neural networks is critical for their final performance. In this work, we study how the hyperparameters of stochastic gradient descent (SGD) used in the early phase of training affect the rest of the opt…
Neural Bayes: A Generic Parameterization Method for Unsupervised Representation Learning
We introduce a parameterization method called Neural Bayes which allows computing statistical quantities that are in general difficult to compute and opens avenues for formulating new objectives for unsupervised representation learning. Sp…
Predicting with High Correlation Features
It has been shown that instead of learning actual object features, deep networks tend to exploit non-robust (spurious) discriminative features that are shared between training and test sets. Therefore, while they achieve state of the art p…
Entropy Penalty: Towards Generalization Beyond the IID Assumption
It has been shown that instead of learning actual object features, deep networks tend to exploit non-robust (spurious) discriminative features that are shared between training and test sets. Therefore, while they achieve state of the art p…
How to Initialize your Network? Robust Initialization for WeightNorm & ResNets
Residual networks (ResNet) and weight normalization play an important role in various deep learning applications. However, parameter initialization strategies have not been studied previously for weight normalized networks and, in practice…
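For context, the weight normalization reparameterization referenced in the title decouples the direction and scale of each weight vector (a standard background definition, not the paper's proposed initialization scheme):

$$
w \;=\; g \, \frac{v}{\lVert v \rVert}, \qquad g \in \mathbb{R}, \; v \in \mathbb{R}^{d}
$$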
The Benefits of Over-parameterization at Initialization in Deep ReLU Networks
It has been noted in existing literature that over-parameterization in ReLU networks generally improves performance. While there could be several factors involved behind this, we prove some desirable theoretical properties at initializatio…
h-detach: Modifying the LSTM Gradient Towards Better Optimization
Recurrent neural networks are known for their notorious exploding and vanishing gradient problem (EVGP). This problem becomes more evident in tasks where the information needed to correctly solve them exists over long time scales, because E…
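A rough PyTorch-style sketch of the idea the title points to, stochastically detaching the gradient path through the LSTM hidden state h while leaving the cell-state path untouched (an illustrative reading, not the authors' released implementation; the detach probability is a made-up hyperparameter):

import torch
import torch.nn as nn

class HDetachLSTM(nn.Module):
    """Runs an LSTMCell over a sequence, occasionally detaching h."""

    def __init__(self, input_size, hidden_size, p_detach=0.25):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.p_detach = p_detach

    def forward(self, x):  # x: (seq_len, batch, input_size)
        h = x.new_zeros(x.size(1), self.cell.hidden_size)
        c = x.new_zeros(x.size(1), self.cell.hidden_size)
        outputs = []
        for t in range(x.size(0)):
            if self.training and torch.rand(1).item() < self.p_detach:
                h = h.detach()  # block gradients through h at this step only
            h, c = self.cell(x[t], (h, c))
            outputs.append(h)
        return torch.stack(outputs)

# Example: out = HDetachLSTM(16, 32)(torch.randn(50, 8, 16))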
On the Spectral Bias of Deep Neural Networks
It is well known that over-parametrized deep neural networks (DNNs) are an overly expressive class of functions that can memorize even random data with $100\%$ training accuracy. This raises the question why they do not easily overfit real…
On the Spectral Bias of Neural Networks
Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with $100\%$ accuracy. In this work, we present properties of neural networks that complement this aspect of expressivity.…
A Walk with SGD
We present novel empirical observations regarding how stochastic gradient descent (SGD) navigates the loss landscape of over-parametrized deep neural networks (DNNs). These observations expose the qualitatively different roles of learning …
Variational Bi-LSTMs
Recurrent neural networks like long short-term memory (LSTM) are important architectures for sequential prediction tasks. LSTMs (and RNNs in general) model sequences along the forward time direction. Bidirectional LSTMs (Bi-LSTMs) on the o…
Three Factors Influencing Minima in SGD
We investigate the dynamical and convergent properties of stochastic gradient descent (SGD) applied to Deep Neural Networks (DNNs). Characterizing the relation between learning rate, batch size and the properties of the final minima, such …
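As a pointer to the relation being characterized, here is the common continuous-time view of SGD in which the learning rate $\eta$ and batch size $B$ enter the gradient noise essentially through their ratio (a standard approximation used in this line of work, stated only for context; $\Sigma(\theta)$ is the per-sample gradient covariance):

$$
\theta_{t+1} = \theta_t - \frac{\eta}{B} \sum_{i \in \mathcal{B}_t} \nabla \ell_i(\theta_t), \qquad \mathrm{d}\theta \approx -\nabla L(\theta)\,\mathrm{d}t + \sqrt{\frac{\eta}{B}}\, \Sigma(\theta)^{1/2}\, \mathrm{d}W_t
$$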
Fraternal Dropout
Recurrent neural networks (RNNs) are an important class of neural network architectures, useful for language modeling and sequential prediction. However, optimizing RNNs is known to be harder than optimizing feed-forward neural networks. A …