Out-of-Vocabulary Sampling Boosts Speculative Decoding
2025-06-02 • Nadav Timor, Jonathan Mamou, Oren Pereg, Hongyang Zhang, David Harel
Speculative decoding relies on fast and accurate drafters. Recent state-of-the-art language models employ larger and larger vocabularies, which significantly slows down drafters. One promising approach to boost the efficiency of speculative decoding is to use drafters with smaller vocabularies. However, existing sampling methods cannot draw out-of-vocabulary tokens, creating a tradeoff between drafters' vocabulary size and acceptance rates. This paper introduces Redistributing Drafter Kernels (RDK), the first out-…
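As background on the tradeoff the abstract describes, here is a minimal sketch of the standard token-level accept/resample rule used in speculative sampling (not RDK itself, which the paper introduces). When the drafter has a smaller vocabulary, every out-of-vocabulary token gets zero draft probability, so it can only ever enter the output through the residual resampling step after a rejection, never as a drafted token. The function name and the zero-padded draft distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_accept(p_target, q_draft, drafted_token):
    """Standard token-level speculative sampling check (a generic sketch, not RDK).

    p_target, q_draft: probability vectors over the *target* vocabulary.
    A drafter with a smaller vocabulary assigns q_draft = 0 to every
    out-of-vocabulary token, so such tokens can never be drafted and only
    appear through the residual resampling step below.
    """
    p, q = p_target[drafted_token], q_draft[drafted_token]
    if rng.random() < min(1.0, p / max(q, 1e-12)):
        return drafted_token                        # accepted: keep the drafted token
    residual = np.maximum(p_target - q_draft, 0.0)  # rejected: resample from (p - q)+
    residual /= residual.sum()
    return rng.choice(len(p_target), p=residual)
```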
Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies
2025-01-31 • Nadav Timor, Jonathan Mamou, Daniel Korat, Moshe Berchansky, Oren Pereg, Gaurav Jain, Roy Schwartz, Moshe Wasserblat, David Harel
Accelerating the inference of large language models (LLMs) is a critical challenge in generative AI. Speculative decoding (SD) methods offer substantial efficiency gains by generating multiple tokens using a single target forward pass. However, existing SD approaches require the drafter and target models to share the same vocabulary, thus limiting the pool of possible drafters, often necessitating the training of a drafter from scratch. We present three new SD methods that remove this shared-vocabulary constraint.…
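One intuitive way to bridge mismatched vocabularies, in the spirit of (though not identical to) the lossless algorithms the paper presents, is to round-trip drafted tokens through text and re-encode them with the target's tokenizer before verification. The checkpoints and helper below are illustrative placeholders, not the paper's implementation.

```python
from transformers import AutoTokenizer

# Placeholder checkpoints: a small-vocabulary drafter and a large-vocabulary target.
drafter_tok = AutoTokenizer.from_pretrained("gpt2")
target_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

def redraft_for_target(draft_ids):
    """Decode the drafter's token ids to text, then re-encode with the target's
    tokenizer, so the target model can verify the speculation in its own vocabulary."""
    text = drafter_tok.decode(draft_ids, skip_special_tokens=True)
    return target_tok.encode(text, add_special_tokens=False)
```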
Dynamic Speculation Lookahead Accelerates Speculative Decoding of Large Language Models
2024-05-07 • Jonathan Mamou, Oren Pereg, Daniel Korat, Moshe Berchansky, Nadav Timor, Moshe Wasserblat, Roy Schwartz
Speculative decoding is commonly used for reducing the inference latency of large language models. Its effectiveness depends highly on the speculation lookahead (SL), the number of tokens generated by the draft model at each iteration. In this work we show that the common practice of using the same SL for all iterations (static SL) is suboptimal. We introduce DISCO (DynamIc SpeCulation lookahead Optimization), a novel method for dynamically selecting the SL. Our experiments with four datasets show that DISCO reache…
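To make the static-versus-dynamic SL distinction concrete, the toy drafting loop below stops speculating when the drafter's own top-token probability drops below a threshold, rather than always drafting a fixed number of tokens. DISCO itself learns when to stop with a trained classifier; the threshold rule here is only a stand-in, and the model and parameter names are assumptions.

```python
import torch

@torch.no_grad()
def draft_tokens(drafter, input_ids, max_lookahead=8, stop_threshold=0.5):
    """Toy dynamic-lookahead drafting loop (a stand-in for a learned stopping rule).

    Instead of always drafting a fixed number of tokens (static SL), stop early
    when the drafter's top-token probability falls below `stop_threshold`.
    `drafter` is assumed to be a Hugging Face causal LM; batch size 1 is assumed.
    """
    drafted = []
    for _ in range(max_lookahead):
        logits = drafter(input_ids).logits[:, -1, :]
        probs = torch.softmax(logits, dim=-1)
        confidence, next_id = probs.max(dim=-1)
        drafted.append(next_id.item())
        input_ids = torch.cat([input_ids, next_id[:, None]], dim=-1)
        if confidence.item() < stop_threshold:
            break  # low confidence: hand the drafted prefix to the target for verification
    return drafted
```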
Distributed Speculative Inference (DSI): Speculation Parallelism for Provably Faster Lossless Language Model Inference
2024-05-23 • Nadav Timor, Jonathan Mamou, Daniel Korat, Moshe Berchansky, Oren Pereg, Moshe Wasserblat, Tomer Galanti, Michal Gordon, David Harel
This paper introduces distributed speculative inference (DSI), a novel inference algorithm that is provably faster than speculative inference (SI) [leviathan2023, chen2023, miao2024, sun2025, timor2025] and standard autoregressive inference (non-SI). Like other SI algorithms, DSI operates on frozen language models (LMs), requiring no training or architectural modifications, and it preserves the target distribution. Prior studies on SI have demonstrated empirical speedups over non-SI, but rely on sufficiently fast …
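The toy sketch below illustrates the general idea of speculation parallelism: the target verifies the current block on a worker thread while the drafter already speculates past it, instead of alternating draft and verify serially. The `draft_block` and `verify_block` callables are hypothetical placeholders standing in for drafter and target calls; DSI's actual orchestration and its losslessness and speedup guarantees are given in the paper, not here.

```python
from concurrent.futures import ThreadPoolExecutor

def overlapped_decode(draft_block, verify_block, prompt, num_rounds=4):
    """Toy sketch of overlapping drafting with verification.

    draft_block(text) -> str   : drafter proposes a block of text continuing `text`.
    verify_block(text, block) -> str : target verifies `block` and returns the
                                       accepted text so far (hypothetical helpers).
    """
    prefix = prompt
    block = draft_block(prefix)
    with ThreadPoolExecutor(max_workers=2) as pool:
        for _ in range(num_rounds):
            pending = pool.submit(verify_block, prefix, block)  # target verifies block i ...
            lookahead = draft_block(prefix + block)             # ... while the drafter works on block i+1
            accepted = pending.result()                         # text accepted by the target so far
            # Reuse the lookahead only if the whole block was accepted; otherwise the
            # speculation is off the accepted path and must be redrafted.
            block = lookahead if accepted == prefix + block else draft_block(accepted)
            prefix = accepted
    return prefix
```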
Efficient Few-Shot Learning Without Prompts
2022-09-22 • Lewis Tunstall, Nils Reimers, Unso Eun Seo Jo, Luke Bates, Daniel Korat, Moshe Wasserblat, Oren Pereg
Recent few-shot methods, such as parameter-efficient fine-tuning (PEFT) and pattern exploiting training (PET), have achieved impressive results in label-scarce settings. However, they are difficult to employ since they are subject to high variability from manually crafted prompts, and typically require billion-parameter language models to achieve high accuracy. To address these shortcomings, we propose SetFit (Sentence Transformer Fine-tuning), an efficient and prompt-free framework for few-shot fine-tuning of Sen…
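A stripped-down sketch of the prompt-free idea: embed the handful of labeled examples with a Sentence Transformer and fit a lightweight classification head on the embeddings. Full SetFit additionally fine-tunes the encoder contrastively on pairs built from those examples, which this sketch omits; the checkpoint name and toy data are illustrative.

```python
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# Frozen-encoder variant of the prompt-free recipe: embed few-shot examples,
# then train a small classifier head on the embeddings. No prompts, no LLM.
encoder = SentenceTransformer("paraphrase-mpnet-base-v2")  # illustrative checkpoint

train_texts = ["great film, loved it", "utter waste of time"]  # tiny labeled set (toy data)
train_labels = [1, 0]

head = LogisticRegression().fit(encoder.encode(train_texts), train_labels)

def predict(texts):
    """Embed unlabeled texts and classify them with the fitted head."""
    return head.predict(encoder.encode(texts))

print(predict(["one of the best movies this year"]))
```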