Tal Remez
Discrete Flow Matching
Flow Matching and diffusion models have emerged as powerful generative paradigms for continuous variables such as images and videos, but their application to high-dimensional discrete data, such as language, is still limited. In this…
The Larger the Better? Improved LLM Code-Generation via Budget Reallocation
It is a common belief that large language models (LLMs) are better than smaller ones. However, larger models also require significantly more time and compute during inference. This raises the question: what happens when both models ope…
Masked Audio Generation using a Single Non-Autoregressive Transformer
We introduce MAGNeT, a masked generative sequence modeling method that operates directly over several streams of audio tokens. Unlike prior work, MAGNeT consists of a single-stage, non-autoregressive transformer. During training, we pr…
Code Llama: Open Foundation Models for Code
We release Code Llama, a family of large language models for code based on Llama 2, providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following abil…
EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis
Recent work has shown that it is possible to resynthesize high-quality speech based not on text but on low-bitrate discrete units that have been learned in a self-supervised fashion and can therefore capture expressive aspects of speech …
Simple and Controllable Music Generation
We tackle the task of conditional music generation. We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen is comprised …
Textually Pretrained Speech Language Models
Speech language models (SpeechLMs) process and generate acoustic data only, without textual supervision. In this work, we propose TWIST, a method for training SpeechLMs using a warm start from a pretrained textual language model. We show …
ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement
Prior works on improving speech quality with visual input typically study each type of auditory distortion separately (e.g., separation, inpainting, video-to-speech) and present tailored algorithms. This paper proposes to unify these subje…
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
We introduce AudioScopeV2, a state-of-the-art universal audio-visual on-screen sound separation system which is capable of learning to separate sounds and associate them with on-screen objects by looking at in-the-wild videos. We identify …
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech
In this paper we present VDTTS, a Visually-Driven Text-to-Speech model. Motivated by dubbing, VDTTS takes advantage of video frames as an additional input alongside text, and generates speech that matches the video signal. We demonstrate h…
Translatotron 2: Robust direct speech-to-speech translation
We present Translatotron 2, a neural direct speech-to-speech translation model that can be trained end-to-end. Translatotron 2 consists of a speech encoder, a phoneme decoder, a mel-spectrogram synthesizer, and an attention module that con…
Translatotron 2: High-quality direct speech-to-speech translation with voice preservation
We present Translatotron 2, a neural direct speech-to-speech translation model that can be trained end-to-end. Translatotron 2 consists of a speech encoder, a linguistic decoder, an acoustic synthesizer, and a single attention module that …
Improving On-Screen Sound Separation for Open-Domain Videos with Audio-Visual Self-Attention
We introduce a state-of-the-art audio-visual on-screen sound separation system which is capable of learning to separate sounds and associate them with on-screen objects by looking at in-the-wild videos. We identify limitations of previous …
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Recent progress in deep learning has enabled many advances in sound separation and visual scene understanding. However, extracting sound sources which are apparent in natural videos remains an open problem. In this work, we present AudioSc…
Shape Correspondence with Isometric and Non-Isometric Deformations
The registration of surfaces with non-rigid deformation, especially non-isometric deformations, is a challenging problem. When applying such techniques to real scans, the problem is compounded by topological and geometric inconsistencies b…
Deep Functional Maps: Structured Prediction for Dense Shape Correspondence
We introduce a new framework for learning dense correspondence between deformable 3D shapes. Existing learning based approaches model shape correspondence as a labelling problem, where each point of a query shape receives a label identifyi…
Efficient Deformable Shape Correspondence via Kernel Matching
We present a method to match three dimensional shapes under non-isometric deformations, topology changes and partiality. We formulate the problem as matching between a set of pair-wise and point-wise descriptors, imposing a continuity prio…
Deep Class Aware Denoising
The increasing demand for high image quality in mobile devices brings forth the need for better computational enhancement techniques, and image denoising in particular. At the same time, the images captured by these devices can be categori…
Deep Convolutional Denoising of Low-Light Images
The Poisson distribution is used to model noise in photon-limited imaging. While canonical examples include relatively exotic types of sensing such as spectral imaging or astronomy, the problem is relevant to regular photography now more than…
Cloud Dictionary: Sparse Coding and Modeling for Point Clouds
With the development of range sensors such as LIDAR and time-of-flight cameras, 3D point cloud scans have become ubiquitous in computer vision applications, the most prominent ones being gesture recognition and autonomous driving. Parsimon…
FPGA system for real-time computational extended depth of field imaging using phase aperture coding
We present a proof-of-concept end-to-end system for computational extended depth of field (EDOF) imaging. The acquisition is performed through a phase-coded aperture implemented by placing a thin wavelength-dependent optical mask inside th…
Image reconstruction from dense binary pixels
Recently, the dense binary pixel Gigavision camera has been introduced, emulating a digital version of photographic film. While it seems to be a promising solution for HDR imaging, its output is not directly usable and requires an image r…
Spatially Coherent Random Forests
Spatially Coherent Random Forest (SCRF) extends Random Forest to create spatially coherent labeling. Each split function in SCRF is evaluated based on a traditional information gain measure that is regularized by a spatial coherency term. …