Opher Lieber
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
We present Jamba-1.5, new instruction-tuned large language models based on our Jamba architecture. Jamba is a hybrid Transformer-Mamba mixture of experts architecture, providing high throughput and low memory usage across context lengths, …
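The low memory usage at long context comes mainly from replacing most attention layers with Mamba layers, whose state does not grow with sequence length, so only the remaining attention layers keep a per-token KV cache. A rough sizing sketch; every dimension below is an illustrative assumption, not the published Jamba-1.5 configuration:

```python
# Rough KV-cache sizing sketch. All parameters are illustrative assumptions,
# not the actual Jamba-1.5 configuration.

def kv_cache_bytes(n_attention_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    # Each attention layer stores one key and one value vector per token per KV head.
    return 2 * n_attention_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

SEQ_LEN = 256_000               # long-context setting (assumed)
N_LAYERS = 32                   # total layers (assumed)
N_KV_HEADS, HEAD_DIM = 8, 128   # grouped-query attention dims (assumed)

full_transformer = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, SEQ_LEN)
hybrid = kv_cache_bytes(N_LAYERS // 8, N_KV_HEADS, HEAD_DIM, SEQ_LEN)  # e.g. 1-in-8 attention layers

print(f"all-attention KV cache: {full_transformer / 1e9:.1f} GB")
print(f"hybrid KV cache:        {hybrid / 1e9:.1f} GB")
```

With these assumed dimensions the hybrid cache is one eighth the size of the all-attention one, which is the mechanism behind the low memory usage across context lengths.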
Jamba: A Hybrid Transformer-Mamba Language Model
We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model …
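A structural sketch of that interleaving: a per-layer schedule mixing Mamba and attention token mixers, with the dense MLP swapped for an MoE layer periodically. The specific ratios below (one attention layer per eight, MoE every second layer) should be read as assumptions, and the strings are placeholders rather than real layer implementations:

```python
# Sketch of a Jamba-style layer schedule: mostly Mamba layers with an attention
# layer interleaved periodically, and MoE replacing the dense MLP every few layers.
# The ratios (attn_every=8, moe_every=2) are illustrative assumptions.

def jamba_layer_schedule(n_layers=32, attn_every=8, moe_every=2):
    schedule = []
    for i in range(n_layers):
        mixer = "attention" if i % attn_every == attn_every - 1 else "mamba"
        mlp = "moe" if i % moe_every == moe_every - 1 else "dense"
        schedule.append((mixer, mlp))
    return schedule

for idx, (mixer, mlp) in enumerate(jamba_layer_schedule()):
    print(f"layer {idx:2d}: {mixer:9s} + {mlp}")
```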
MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning
Huge language models (LMs) have ushered in a new era for AI, serving as a gateway to natural-language-based knowledge tasks. Although an essential element of modern AI, LMs are also inherently limited in a number of ways. We discuss these …
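A toy sketch of the MRKL dispatch pattern: a router forwards each input to the module best suited to it, here a discrete calculator for arithmetic and a language model otherwise. Both modules and the keyword routing rule are invented for illustration; the routing logic in a real MRKL system is considerably richer:

```python
import re

# Toy MRKL-style dispatch: route each query to an "expert" module.
# The modules and the regex-based router are illustrative stand-ins.

def calculator(query: str) -> str:
    # Extract a bare arithmetic expression and evaluate it.
    expr = re.sub(r"[^0-9+\-*/(). ]", "", query).strip()
    return str(eval(expr))  # fine for a sketch; use a real expression parser in practice

def language_model(query: str) -> str:
    return f"<LM answer for: {query!r}>"  # placeholder for an actual LM call

def route(query: str) -> str:
    if re.search(r"\d+\s*[-+*/]\s*\d+", query):
        return calculator(query)
    return language_model(query)

print(route("What is 123456 * 789?"))   # -> discrete reasoning module
print(route("Who founded AI21 Labs?"))  # -> language model
```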
Standing on the Shoulders of Giant Frozen Language Models
Huge pretrained language models (LMs) have demonstrated surprisingly good zero-shot capabilities on a wide variety of tasks. This gives rise to the appealing vision of a single, versatile model with a wide range of functionalities across d…
PMI-Masking: Principled masking of correlated spans
Masking tokens uniformly at random constitutes a common flaw in the pretraining of Masked Language Models (MLMs) such as BERT. We show that such uniform masking allows an MLM to minimize its training objective by latching onto shallow loca…
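The proposed remedy is to mask correlated spans jointly, scoring candidate spans by pointwise mutual information: for a bigram, PMI(x, y) = log [ p(x, y) / (p(x) p(y)) ], and the paper extends this to longer n-grams. A minimal bigram-only sketch of the scoring step, on a made-up corpus:

```python
import math
from collections import Counter

# Bigram PMI over a toy corpus. PMI-Masking scores longer n-grams with an
# extended measure and masks high-scoring spans jointly; this sketch only
# ranks bigrams to illustrate the scoring step. The corpus is made up.

corpus = ("new york is a big city . editorial staff live in new york "
          "and a big apple is a nickname for new york").split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())

def pmi(x, y):
    p_xy = bigrams[(x, y)] / n_bi
    p_x, p_y = unigrams[x] / n_uni, unigrams[y] / n_uni
    return math.log(p_xy / (p_x * p_y))

ranked = sorted(bigrams, key=lambda b: pmi(*b), reverse=True)
for x, y in ranked[:5]:
    print(f"{x} {y}: PMI = {pmi(x, y):.2f}")
```

High-PMI bigrams such as "new york" are exactly the correlated spans that uniform random masking lets the model predict from shallow local cues, which is why they are masked as a unit.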