Exploring foci of
2024-02-01
A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains
2024-02-01 • Alon Jacovi, Yonatan Bitton, Bernd Bohnet, Jonathan Herzig, Or Honovich, Michael S. Tseng, Michael Collins, Roee Aharoni, Mor Geva
Prompting language models to provide step-by-step answers (e.g., "Chain-of-Thought") is the prominent approach for complex reasoning tasks, where more accurate reasoning chains typically improve downstream task performance. Recent literature discusses automatic methods to verify reasoning chains in order to evaluate and improve their correctness. However, no fine-grained step-level datasets are available to enable thorough evaluation of such verification methods, hindering progress in this direction. We introduce REVEAL: Reasonin…
2024-10-20
Keep Guessing? When Considering Inference Scaling, Mind the Baselines
2024-10-20 • Gal Yona, Or Honovich, Omer Levy, Roee Aharoni
Scaling inference compute in large language models (LLMs) through repeated sampling consistently increases the coverage (fraction of problems solved) as the number of samples increases. We conjecture that this observed improvement is partially due to the answer distribution of standard evaluation benchmarks, which is skewed towards a relatively small set of common answers. To test this conjecture, we define a baseline that enumerates answers according to their prevalence in the training set. Experiments spanning t…
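The prevalence baseline described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `prevalence_ranking` and `coverage_at_k`, the exact-match comparison, and the toy answer lists are all assumptions made for the example.

```python
from collections import Counter


def prevalence_ranking(train_answers):
    """Return distinct answers ordered from most to least frequent in the training set."""
    counts = Counter(train_answers)
    return [answer for answer, _ in counts.most_common()]


def coverage_at_k(ranked_answers, test_answers, k):
    """Fraction of test problems whose gold answer is among the first k enumerated guesses."""
    guesses = set(ranked_answers[:k])
    solved = sum(1 for gold in test_answers if gold in guesses)
    return solved / len(test_answers)


# Toy data (hypothetical): gold answers seen in training and at test time.
train = ["2", "4", "2", "10", "2", "4"]
test = ["2", "4", "7", "2"]

ranking = prevalence_ranking(train)
for k in (1, 2, 3):
    print(k, coverage_at_k(ranking, test, k))
```

Because the baseline never looks at the question, any coverage it reaches comes from the skew of the answer distribution alone, which is the comparison point the abstract argues for.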
2023-01-01
Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
2023-01-01 • Or Honovich, Thomas Scialom, Omer Levy, Timo Schick
Instruction tuning enables pretrained language models to perform new tasks from inference-time natural language descriptions. These approaches rely on vast amounts of human supervision in the form of crowdsourced datasets or user interactions. In this work, we introduce Unnatural Instructions: a large dataset of creative and diverse instructions, collected with virtually no human labor. We collect 64,000 examples by prompting a language model with three seed examples of instructions and eliciting a fourth. This se…
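The collection step described in this abstract (three seed examples in the prompt, a fourth elicited from the model) could look something like the sketch below. The prompt layout, the field names `instruction` and `input`, and the helper `build_elicitation_prompt` are hypothetical choices for illustration; the abstract does not specify the actual template.

```python
def build_elicitation_prompt(seed_examples):
    """Format three seed examples and leave a fourth slot open for the model to fill."""
    if len(seed_examples) != 3:
        raise ValueError("expected exactly three seed examples")
    parts = []
    for i, example in enumerate(seed_examples, start=1):
        parts.append(
            f"Example {i}\n"
            f"Instruction: {example['instruction']}\n"
            f"Input: {example['input']}\n"
        )
    # The open-ended fourth slot is what the language model is asked to complete.
    parts.append("Example 4\nInstruction:")
    return "\n".join(parts)


# Hypothetical seed examples, invented for this sketch.
seeds = [
    {"instruction": "Translate the sentence to French.", "input": "Good morning."},
    {"instruction": "Sort the numbers in ascending order.", "input": "3, 1, 2"},
    {"instruction": "Summarize the text in one sentence.", "input": "The meeting was moved to Friday."},
]

prompt = build_elicitation_prompt(seeds)
print(prompt)
```

The returned string would then be sent to whatever language model is being sampled; that call is left out here because the abstract does not name an API.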
2023-01-01
Instruction Induction: From Few Examples to Natural Language Task Descriptions
2023-01-01 • Or Honovich, Uri Shaham, Samuel R. Bowman, Omer Levy
Large language models are able to perform a task by conditioning on a few input-output demonstrations - a paradigm known as in-context learning. We show that language models can explicitly infer an underlying task from a few demonstrations by prompting them to generate a natural language instruction that fits the examples. To explore this ability, we introduce the instruction induction challenge, compile a dataset consisting of 24 tasks, and define a novel evaluation metric based on executing the generated instruc…
2023-01-01
DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering
2023-01-01 • Ella Neeman, Roee Aharoni, Or Honovich, Leshem Choshen, Idan Szpektor, Omri Abend
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.