Behrooz Ghorbani
OpenAI o1 System Card
The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our mo…
GPT-4o System Card
GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning…
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
In this paper, we empirically study the optimization dynamics of multi-task learning, particularly focusing on those that govern a collection of tasks with significant data imbalance. We present a simple yet effective method of pre-trainin…
Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum Bayes Risk Decoding for Machine Translation
Recent advances in machine translation (MT) have shown that Minimum Bayes Risk (MBR) decoding can be a powerful alternative to beam search decoding, especially when combined with neural-based utility functions. However, the performance of …
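For orientation, here is a minimal sketch of the two ingredients the abstract refers to: epsilon sampling (sampling only from tokens whose probability exceeds a threshold epsilon) and MBR decoding (choosing the candidate with the highest expected utility against the other candidates). The names `epsilon_sample` and `mbr_decode` and the toy Jaccard utility are illustrative placeholders, not the paper's implementation; in practice the candidates are sampled translations and the utility is a neural metric.

```python
import numpy as np

def epsilon_sample(logits, epsilon=0.02, rng=np.random.default_rng(0)):
    # Epsilon sampling: zero out tokens whose probability is below epsilon,
    # renormalize, then sample one token id from the truncated distribution.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    probs[probs < epsilon] = 0.0
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

def mbr_decode(candidates, utility):
    # MBR decoding: pick the candidate with the highest average utility
    # against all candidates (a Monte-Carlo estimate of expected utility).
    scores = [np.mean([utility(c, o) for o in candidates]) for c in candidates]
    return candidates[int(np.argmax(scores))]

# Toy usage with a Jaccard word-overlap utility standing in for a neural metric.
cands = ["the cat sat", "a cat sat", "the dog ran"]
jaccard = lambda a, b: len(set(a.split()) & set(b.split())) / len(set(a.split()) | set(b.split()))
print(mbr_decode(cands, jaccard))
```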
Scaling Laws for Multilingual Neural Machine Translation
In this work, we provide a large-scale empirical study of the scaling properties of multilingual neural machine translation models. We examine how increases in the model size affect the model performance and investigate the role of the tra…
Binarized Neural Machine Translation
The rapid scaling of language models is motivating research using low-bitwidth quantization. In this work, we propose a novel binarization technique for Transformers applied to machine translation (BMT), the first of its kind. We identify …
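As a rough illustration of what weight binarization involves (not the exact BMT scheme, whose details are in the paper), the sketch below binarizes a linear layer's weights to {-1, +1}, rescales them by their mean absolute value, and trains the underlying real-valued weights with a straight-through estimator; `BinarizeSTE` and `BinaryLinear` are hypothetical names.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    # Forward: sign(); backward: pass the gradient straight through to the real weights.
    @staticmethod
    def forward(ctx, w):
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out

class BinaryLinear(torch.nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(d_out, d_in) * 0.02)

    def forward(self, x):
        # 1-bit weights {-1, +1}, rescaled per output row so the product
        # keeps roughly the scale of the original full-precision layer.
        alpha = self.weight.abs().mean(dim=1, keepdim=True)
        w_bin = BinarizeSTE.apply(self.weight) * alpha
        return x @ w_bin.t()

layer = BinaryLinear(16, 8)
out = layer(torch.randn(4, 16))
out.sum().backward()  # gradients reach the real-valued weights via the STE
print(out.shape, layer.weight.grad.shape)
```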
Do Current Multi-Task Optimization Methods in Deep Learning Even Help?
Recent research has proposed a series of specialized optimization algorithms for deep multi-task models. It is often claimed that these multi-task optimization (MTO) methods yield solutions that are superior to the ones found by simply opt…
Adaptive Gradient Methods at the Edge of Stability
Very little is known about the training dynamics of adaptive gradient methods like Adam in deep learning. In this paper, we shed light on the behavior of these algorithms in the full-batch and sufficiently large batch settings. Specificall…
Data Scaling Laws in NMT: The Effect of Noise and Architecture
In this work, we study the effect of varying the architecture and training data quality on the data scaling properties of Neural Machine Translation (NMT). First, we establish that the test loss of encoder-decoder transformer models scales…
Examining Scaling and Transfer of Language Model Architectures for Machine Translation
Natural language understanding and generation models follow one of the two dominant architectural paradigms: language models (LMs) that process concatenated sequences in a single stack of layers, and encoder-decoder models (EncDec) that ut…
When do neural networks outperform kernel methods?
For a certain scaling of the initialization of stochastic gradient descent (SGD), wide neural networks (NN) have been shown to be well approximated by reproducing kernel Hilbert space (RKHS) methods. Recent empirical work showed that, for …
A Loss Curvature Perspective on Training Instability in Deep Learning
In this work, we study the evolution of the loss Hessian across many classification tasks in order to understand the effect the curvature of the loss has on the training dynamics. Whereas prior work has focused on how different learning ra…
Scaling Laws for Neural Machine Translation
We present an empirical study of scaling properties of encoder-decoder Transformer models used in neural machine translation (NMT). We show that cross-entropy loss as a function of model size follows a certain scaling law. Specifically (i)…
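Because the enumerated items of the abstract are cut off above, the block below shows only the generic shape such scaling laws usually take, written as a saturating power law in the parameter count N; the paper's actual fitted form (for instance, with separate encoder and decoder terms) and exponents may differ.

```latex
% Illustrative saturating power law; N = parameter count,
% L_infty = irreducible loss, N_c and alpha = fitted constants.
L(N) = L_{\infty} + \left(\frac{N_{c}}{N}\right)^{\alpha},
\qquad L_{\infty},\, N_{c},\, \alpha > 0.
```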
Linearized two-layers neural networks in high dimension
We consider the problem of learning an unknown function $f_{\star}$ on the $d$-dimensional sphere with respect to the square loss, given i.i.d. samples $\{(y_i,{\boldsymbol x}_i)\}_{i\le n}$ where ${\boldsymbol x}_i$ is a feature vector un…
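One concrete instance of the linearized models this line of work studies is random-features regression: the first layer is drawn at random and frozen, and only the second-layer coefficients are fit, here by ridge regression on a synthetic quadratic-type target. This is a generic sketch under those assumptions, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, N = 500, 10, 300                      # samples, input dimension, random features

# Synthetic inputs normalized to the unit sphere and a simple quadratic-type target.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = X[:, 0] * X[:, 1] + 0.1 * rng.standard_normal(n)

# Random-features model: frozen random first layer W, ReLU nonlinearity,
# second-layer coefficients a fit by ridge regression.
W = rng.standard_normal((N, d)) / np.sqrt(d)
Phi = np.maximum(X @ W.T, 0.0)
lam = 1e-3
a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)

print("train MSE:", float(np.mean((Phi @ a - y) ** 2)))
```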
Discussion of: “Nonparametric regression using deep neural networks with ReLU activation function”
We congratulate Johannes Schmidt-Hieber for his elegant and thought-provoking results. His article uses deep-learning-inspired methods in the context of nonparametric regression. Schmidt-Hieber defines a rich class of composition-based funct…
Limitations of Lazy Training of Two-layers Neural Networks
We study the supervised learning problem under either of the following two models: (1) Feature vectors ${\boldsymbol x}_i$ are $d$-dimensional Gaussians and responses are $y_i = f_*({\boldsymbol x}_i)$ for $f_*$ an unknown quadratic functi…
An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
To understand the dynamics of optimization in deep neural networks, we develop a tool to study the evolution of the entire Hessian spectrum throughout the optimization process. Using this, we study a number of hypotheses concerning smoothn…
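Tools of this kind are commonly built on the Lanczos algorithm, which needs only Hessian-vector products and returns Ritz values (and weights) that approximate the spectrum. The sketch below is a minimal NumPy version in which an explicit symmetric matrix stands in for an autodiff Hessian-vector product; it illustrates the general technique, not the paper's implementation.

```python
import numpy as np

def lanczos_ritz(hvp, dim, steps=30, rng=np.random.default_rng(0)):
    # Lanczos tridiagonalization driven purely by matrix-vector products `hvp`.
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    v_prev = np.zeros(dim)
    alphas, betas = [], []
    beta = 0.0
    for _ in range(steps):
        w = hvp(v) - beta * v_prev
        alpha = float(v @ w)
        w -= alpha * v
        beta = float(np.linalg.norm(w))
        alphas.append(alpha)
        betas.append(beta)
        v_prev, v = v, w / (beta + 1e-12)
    T = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
    # Eigenvalues of the small tridiagonal T (Ritz values) approximate the
    # extreme eigenvalues of the full operator and underpin density estimates.
    return np.linalg.eigvalsh(T)

# Toy "Hessian": a fixed diagonal matrix standing in for Hessian-vector products.
A = np.diag(np.linspace(-1.0, 10.0, 200))
print(lanczos_ritz(lambda v: A @ v, dim=200)[-3:])  # top Ritz values ≈ largest eigenvalues
```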
Optimal Covariance Estimation for Condition Number Loss in the Spiked Model
We study estimation of the covariance matrix under relative condition number loss $\kappa(\Sigma^{-1/2} \hat{\Sigma} \Sigma^{-1/2})$, where $\kappa(\Delta)$ is the condition number of the matrix $\Delta$, and $\hat{\Sigma}$ and $\Sigma$ are the estimated and theoretical covariance matrices. …
An Instability in Variational Inference for Topic Models
Topic models are Bayesian models that are frequently used to capture the latent structure of certain corpora of documents or images. Each data element in such a corpus (for instance each item in a collection of scientific articles) is rega…