Oriol Vinyals
Gemini Robotics: Bringing AI into the Physical World
Recent advancements in large multimodal models have led to the emergence of remarkable generalist capabilities in digital domains, yet their translation to physical agents such as robots remains a significant challenge. This report introdu…
Understanding the Impact of Value Selection Heuristics in Scheduling Problems
It has been observed that value selection heuristics have less impact than other heuristic choices when solving hard combinatorial optimization (CO) problems. It is often thought that this is because more time is spent on unsatisfiable sub…
A Practitioner's Guide to Continual Multimodal Pretraining
Multimodal foundation models serve numerous applications at the intersection of vision and language. Still, despite being pretrained on extensive data, they become outdated over time. To keep models updated, research into continual pretrai…
Capabilities of Gemini Models in Medicine
Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong genera…
Gemma: Open Models Based on Gemini Research and Technology
This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language unde…
Learning skillful medium-range global weather forecasting
Global medium-range weather forecasting is critical to decision-making across many social and economic domains. Traditional numerical weather prediction uses increased compute resources to improve forecast accuracy but does not directly us…
Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model
Training deep networks requires various design decisions regarding, for instance, their architecture, data augmentation, or optimization. In this work, we find these training variations to result in networks learning unique feature sets from…
AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning
StarCraft II is one of the most challenging simulated reinforcement learning environments; it is partially observable, stochastic, multi-agent, and mastering StarCraft II requires strategic planning over long time horizons with real-time l…
Waffling around for Performance: Visual Classification with Random Words and Broad Concepts
The visual classification performance of vision-language models such as CLIP has been shown to benefit from additional semantic knowledge from large language models (LLMs) such as GPT-3. In particular, averaging over LLM-generated class de…
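As a hedged illustration of the descriptor-averaging idea mentioned in this abstract, the Python sketch below scores an image against the mean text embedding of several class descriptors. The names text_encoder and image_encoder, and the data layout, are hypothetical stand-ins for a CLIP-style encoder pair, not the paper's or any library's actual API.

import numpy as np

def class_prototype(descriptors, text_encoder):
    """Average the normalized text embeddings of several descriptors for one class."""
    embs = np.stack([text_encoder(d) for d in descriptors])
    embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    proto = embs.mean(axis=0)
    return proto / np.linalg.norm(proto)

def zero_shot_classify(image, class_descriptors, image_encoder, text_encoder):
    """Score an image against each class prototype and return the best class name."""
    img = image_encoder(image)
    img = img / np.linalg.norm(img)
    scores = {
        name: float(img @ class_prototype(descs, text_encoder))
        for name, descs in class_descriptors.items()
    }
    return max(scores, key=scores.get)

Here class_descriptors would map each class name to a list of LLM-generated (or, per the paper's title, random) descriptor strings; the encoders are assumed to return vectors of equal dimension.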
Faster sorting algorithms discovered using deep reinforcement learning
Optimizing Memory Mapping Using Deep Reinforcement Learning
Resource scheduling and allocation is a critical component of many high impact systems ranging from congestion control to cloud computing. Finding more optimal solutions to these problems often has significant impact on resource and time s…
GraphCast: Learning skillful medium-range global weather forecasting
Global medium-range weather forecasting is critical to decision-making across many social and economic domains. Traditional numerical weather prediction uses increased compute resources to improve forecast accuracy, but cannot directly use…
Competition-level code generation with AlphaCode
Programming is a powerful and ubiquitous problem-solving tool. Systems that can assist programmers or even generate programs themselves could make programming more productive and accessible. Recent transformer-based neural network models s…
Emergent Abilities of Large Language Models
Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of larg…
Integrating Language Guidance into Vision-based Deep Metric Learning
Deep Metric Learning (DML) proposes to learn metric spaces which encode semantic similarities as embedding space distances. These spaces should be transferable to classes beyond those seen during training. Commonly, DML methods task networ…
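For context on the general setup this abstract describes (semantic similarity encoded as embedding-space distance), the minimal sketch below shows a standard triplet loss. It is a generic textbook illustration of DML, not the language-guidance method proposed in the paper.

import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss pushing d(anchor, positive) below d(anchor, negative) by a margin."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)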
A Generalist Agent
Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi…
Flamingo: a Visual Language Model for Few-Shot Learning
Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this …
Training Compute-Optimal Large Language Models
We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. We find that current large language models are significantly undertrained, a consequence of the recent focus…
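As a rough, hedged illustration of the compute-optimal trade-off studied here, the sketch below combines the commonly cited approximation of roughly 6*N*D training FLOPs with an assumed heuristic of about 20 training tokens per parameter. The constants, budget, and function name are illustrative assumptions, not figures taken from the paper.

def compute_optimal_split(compute_budget_flops: float, tokens_per_param: float = 20.0):
    """Return an approximate compute-optimal (parameters, tokens) pair.

    With C ~ 6 * N * D and the assumed heuristic D ~ tokens_per_param * N,
    it follows that N ~ sqrt(C / (6 * tokens_per_param)) and D ~ tokens_per_param * N.
    """
    n_params = (compute_budget_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    n, d = compute_optimal_split(1e23)  # example budget: 1e23 FLOPs
    print(f"~{n / 1e9:.0f}B parameters, ~{d / 1e12:.2f}T tokens")

Under these assumptions, a 1e23 FLOP budget works out to roughly 29B parameters trained on roughly 0.58T tokens; larger models with fewer tokens would be "undertrained" in the sense the abstract uses.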
Non-isotropy Regularization for Proxy-based Deep Metric Learning
Deep Metric Learning (DML) aims to learn representation spaces on which semantic relations can simply be expressed through predefined distance metrics. Best performing approaches commonly leverage class proxies as sample stand-ins for bett…
HiP: Hierarchical Perceiver
General perception systems such as Perceivers can process arbitrary modalities in any combination and are able to handle up to a few hundred thousand inputs. They achieve this generality by using exclusively global attention operations. Th…
General-purpose, long-context autoregressive modeling with Perceiver AR
Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression. However, the most commonly used autoregressive models, Transformers, are prohibitively …
MuZero with Self-competition for Rate Control in VP9 Video Compression
Video streaming usage has seen a significant rise as entertainment, education, and business increasingly rely on online video. Optimizing video compression has the potential to increase access and quality of content to users, and reduce en…
Unified Scaling Laws for Routed Language Models
The performance of a language model has been shown to be effectively modeled as a power-law in its parameter count. Here we study the scaling behaviors of Routing Networks: architectures that conditionally use only a subset of their parame…
Guest Editorial: Non-Euclidean Machine Learning
Over the past decade, deep learning has had a revolutionary impact on a broad range of fields such as computer vision and image processing, computational photography, medical imaging and speech and language analysis and synthesis etc. Deep…
Improving language models by retrieving from trillions of tokens
We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) ob…
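To make the retrieval step described in this abstract concrete, here is a minimal, hypothetical Python sketch of chunk-level nearest-neighbour retrieval by embedding similarity. The chunk size, the placeholder embedding, and all function names are assumptions for illustration and do not reproduce RETRO's actual retriever or database.

import numpy as np

CHUNK_SIZE = 64  # assumed chunk length, chosen for illustration only

def chunk(tokens, size=CHUNK_SIZE):
    """Split a token sequence into fixed-size chunks."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def embed(chunk_tokens, dim=128):
    """Placeholder chunk embedding over string tokens; a real retriever would use a frozen text encoder."""
    vecs = []
    for tok in chunk_tokens:
        seed = sum(ord(c) for c in tok) % (2**32)
        vecs.append(np.random.default_rng(seed).standard_normal(dim))
    v = np.mean(vecs, axis=0)
    return v / (np.linalg.norm(v) + 1e-8)

def build_index(corpus_tokens):
    """Embed every chunk of the corpus once, ahead of time."""
    chunks = chunk(corpus_tokens)
    return chunks, np.stack([embed(c) for c in chunks])

def retrieve(query_tokens, chunks, index, k=2):
    """Return the k database chunks most similar to the query chunk."""
    scores = index @ embed(query_tokens)  # cosine similarity on unit vectors
    return [chunks[i] for i in np.argsort(-scores)[:k]]

In the spirit of the abstract, the retrieved chunks would then be provided as extra conditioning context to the language model while it predicts the next chunk of the input.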
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based…
Applying and improving AlphaFold at CASP14
We describe the operation and improvement of AlphaFold, the system that was entered by the team AlphaFold2 to the “human” category in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The AlphaFold system entered in CA…
Efficient Visual Pretraining with Contrastive Detection
Self-supervised pretraining has been shown to yield powerful representations for transfer learning. These performance gains come at a large computational cost, however, with state-of-the-art methods requiring an order of magnitude more comp…
Author response for "Applying and improving AlphaFold at CASP14"
Perceiver IO: A General Architecture for Structured Inputs & Outputs
A central goal of machine learning is the development of systems that can solve many problems in as many data domains as possible. Current architectures, however, cannot be applied beyond a small set of stereotyped settings, as they bake i…