Oriol Vinyals
Gemini Robotics: Bringing AI into the Physical World
Recent advancements in large multimodal models have led to the emergence of remarkable generalist capabilities in digital domains, yet their translation to physical agents such as robots remains a significant challenge. This report introdu…
Understanding the Impact of Value Selection Heuristics in Scheduling Problems
It has been observed that value selection heuristics have less impact than other heuristic choices when solving hard combinatorial optimization (CO) problems. It is often thought that this is because more time is spent on unsatisfiable sub…
A Practitioner's Guide to Continual Multimodal Pretraining
Multimodal foundation models serve numerous applications at the intersection of vision and language. Still, despite being pretrained on extensive data, they become outdated over time. To keep models updated, research into continual pretrai…
Capabilities of Gemini Models in Medicine
Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong genera…
Gemma: Open Models Based on Gemini Research and Technology
This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language unde…
Learning skillful medium-range global weather forecasting
Global medium-range weather forecasting is critical to decision-making across many social and economic domains. Traditional numerical weather prediction uses increased compute resources to improve forecast accuracy but does not directly us…
Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model
Training deep networks requires various design decisions regarding, for instance, their architecture, data augmentation, or optimization. In this work, we find these training variations to result in networks learning unique feature sets from…
AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning
StarCraft II is one of the most challenging simulated reinforcement learning environments; it is partially observable, stochastic, multi-agent, and mastering StarCraft II requires strategic planning over long time horizons with real-time l…
Waffling around for Performance: Visual Classification with Random Words and Broad Concepts
The visual classification performance of vision-language models such as CLIP has been shown to benefit from additional semantic knowledge from large language models (LLMs) such as GPT-3. In particular, averaging over LLM-generated class de…
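As a hedged illustration of the descriptor-averaging idea mentioned in this abstract, the Python sketch below scores an image against the mean text embedding of several class descriptors. The names text_encoder and image_encoder, and the data layout, are hypothetical stand-ins for a CLIP-style encoder pair, not the paper's or any library's actual API.

import numpy as np

def class_prototype(descriptors, text_encoder):
    """Average the normalized text embeddings of several descriptors for one class."""
    embs = np.stack([text_encoder(d) for d in descriptors])
    embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    proto = embs.mean(axis=0)
    return proto / np.linalg.norm(proto)

def zero_shot_classify(image, class_descriptors, image_encoder, text_encoder):
    """Score an image against each class prototype and return the best class name."""
    img = image_encoder(image)
    img = img / np.linalg.norm(img)
    scores = {
        name: float(img @ class_prototype(descs, text_encoder))
        for name, descs in class_descriptors.items()
    }
    return max(scores, key=scores.get)

Here class_descriptors would map each class name to a list of LLM-generated (or, per the paper's title, random) descriptor strings; the encoders are assumed to return vectors of equal dimension.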
Faster sorting algorithms discovered using deep reinforcement learning
Optimizing Memory Mapping Using Deep Reinforcement Learning
Resource scheduling and allocation is a critical component of many high impact systems ranging from congestion control to cloud computing. Finding more optimal solutions to these problems often has significant impact on resource and time s…
GraphCast: Learning skillful medium-range global weather forecasting
Global medium-range weather forecasting is critical to decision-making across many social and economic domains. Traditional numerical weather prediction uses increased compute resources to improve forecast accuracy, but cannot directly use…
Competition-level code generation with AlphaCode
Programming is a powerful and ubiquitous problem-solving tool. Systems that can assist programmers or even generate programs themselves could make programming more productive and accessible. Recent transformer-based neural network models s…
Emergent Abilities of Large Language Models
Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of larg…
Integrating Language Guidance into Vision-based Deep Metric Learning
Deep Metric Learning (DML) proposes to learn metric spaces which encode semantic similarities as embedding space distances. These spaces should be transferable to classes beyond those seen during training. Commonly, DML methods task networ…
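For context on the general setup this abstract describes (semantic similarity encoded as embedding-space distance), the minimal sketch below shows a standard triplet loss. It is a generic textbook illustration of DML, not the language-guidance method proposed in the paper.

import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss pushing d(anchor, positive) below d(anchor, negative) by a margin."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)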
A Generalist Agent
Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi…
Flamingo: a Visual Language Model for Few-Shot Learning
Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this …
Training Compute-Optimal Large Language Models
We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. We find that current large language models are significantly undertrained, a consequence of the recent focus…
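As a rough, hedged illustration of the compute-optimal trade-off studied here, the sketch below combines the commonly cited approximation of roughly 6*N*D training FLOPs with an assumed heuristic of about 20 training tokens per parameter. The constants, budget, and function name are illustrative assumptions, not figures taken from the paper.

def compute_optimal_split(compute_budget_flops: float, tokens_per_param: float = 20.0):
    """Return an approximate compute-optimal (parameters, tokens) pair.

    With C ~ 6 * N * D and the assumed heuristic D ~ tokens_per_param * N,
    it follows that N ~ sqrt(C / (6 * tokens_per_param)) and D ~ tokens_per_param * N.
    """
    n_params = (compute_budget_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    n, d = compute_optimal_split(1e23)  # example budget: 1e23 FLOPs
    print(f"~{n / 1e9:.0f}B parameters, ~{d / 1e12:.2f}T tokens")

Under these assumptions, a 1e23 FLOP budget works out to roughly 29B parameters trained on roughly 0.58T tokens; larger models with fewer tokens would be "undertrained" in the sense the abstract uses.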
Non-isotropy Regularization for Proxy-based Deep Metric Learning
Deep Metric Learning (DML) aims to learn representation spaces on which semantic relations can simply be expressed through predefined distance metrics. Best performing approaches commonly leverage class proxies as sample stand-ins for bett…
HiP: Hierarchical Perceiver
General perception systems such as Perceivers can process arbitrary modalities in any combination and are able to handle up to a few hundred thousand inputs. They achieve this generality by using exclusively global attention operations. Th…
General-purpose, long-context autoregressive modeling with Perceiver AR
Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression. However, the most commonly used autoregressive models, Transformers, are prohibitively …
MuZero with Self-competition for Rate Control in VP9 Video Compression
Video streaming usage has seen a significant rise as entertainment, education, and business increasingly rely on online video. Optimizing video compression has the potential to increase access and quality of content to users, and reduce en…
Unified Scaling Laws for Routed Language Models
The performance of a language model has been shown to be effectively modeled as a power-law in its parameter count. Here we study the scaling behaviors of Routing Networks: architectures that conditionally use only a subset of their parame…
Guest Editorial: Non-Euclidean Machine Learning
Over the past decade, deep learning has had a revolutionary impact on a broad range of fields such as computer vision and image processing, computational photography, medical imaging and speech and language analysis and synthesis etc. Deep…
Improving language models by retrieving from trillions of tokens
We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) ob…
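To make the retrieval step described in this abstract concrete, here is a minimal, hypothetical Python sketch of chunk-level nearest-neighbour retrieval by embedding similarity. The chunk size, the placeholder embedding, and all function names are assumptions for illustration and do not reproduce RETRO's actual retriever or database.

import numpy as np

CHUNK_SIZE = 64  # assumed chunk length, chosen for illustration only

def chunk(tokens, size=CHUNK_SIZE):
    """Split a token sequence into fixed-size chunks."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def embed(chunk_tokens, dim=128):
    """Placeholder chunk embedding over string tokens; a real retriever would use a frozen text encoder."""
    vecs = []
    for tok in chunk_tokens:
        seed = sum(ord(c) for c in tok) % (2**32)
        vecs.append(np.random.default_rng(seed).standard_normal(dim))
    v = np.mean(vecs, axis=0)
    return v / (np.linalg.norm(v) + 1e-8)

def build_index(corpus_tokens):
    """Embed every chunk of the corpus once, ahead of time."""
    chunks = chunk(corpus_tokens)
    return chunks, np.stack([embed(c) for c in chunks])

def retrieve(query_tokens, chunks, index, k=2):
    """Return the k database chunks most similar to the query chunk."""
    scores = index @ embed(query_tokens)  # cosine similarity on unit vectors
    return [chunks[i] for i in np.argsort(-scores)[:k]]

In the spirit of the abstract, the retrieved chunks would then be provided as extra conditioning context to the language model while it predicts the next chunk of the input.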
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based…
Applying and improving AlphaFold at CASP14
We describe the operation and improvement of AlphaFold, the system that was entered by the team AlphaFold2 to the “human” category in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The AlphaFold system entered in CA…
Efficient Visual Pretraining with Contrastive Detection
Self-supervised pretraining has been shown to yield powerful representations for transfer learning. These performance gains come at a large computational cost, however, with state-of-the-art methods requiring an order of magnitude more comp…
Author response for "Applying and improving AlphaFold at CASP14"
Perceiver IO: A General Architecture for Structured Inputs & Outputs
A central goal of machine learning is the development of systems that can solve many problems in as many data domains as possible. Current architectures, however, cannot be applied beyond a small set of stereotyped settings, as they bake i…