Explanipedia

ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments Open

Gili Lior, Eliya Habba, Shahar Levy, Avi Caciularu, Gabriel Stanovsky · 2025

LLMs are highly sensitive to prompt phrasing, yet standard benchmarks typically report performance using a single prompt, raising concerns about the reliability of such evaluations. In this work, we argue for a stochastic method of moments…

More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG Open

Shahar Levy, Nir Mazor, Michael Hassid · 2025

Retrieval-Augmented Generation (RAG) enhances the accuracy of Large Language Model (LLM) responses by leveraging relevant external documents during generation. Although previous studies noted that retrieving many documents can degrade perf…

SEAM: A Stochastic Benchmark for Multi-Document Tasks Open

Gili Lior, Avi Caciularu, Arie Cattan, Shahar Levy, Ori Shapira, et al. · 2024

Computer science Geography

Various tasks, such as summarization, multi-hop question answering, or coreference resolution, are naturally phrased over collections of real-world documents. Such tasks present a unique set of challenges, revolving around the lack of cohe…

Collecting a Large-Scale Gender Bias Dataset for Coreference Resolution and Machine Translation Open

Shahar Levy, Koren Lazar, Gabriel Stanovsky · 2021

Computer science Psychology Physics

Recent works have found evidence of gender bias in models of machine translation and coreference resolution using mostly synthetic diagnostic datasets. While these quantify bias in a controlled experiment, they often do so on a small scale…

Cell-type specific outcome representation in primary motor cortex Open

Maria Lavzin, Shahar Levy, Hadas Benisty, Uri Dubin, Zohar Brosh, et al. · 2020

Psychology Computer science Medicine

Adaptive movements are critical to animal survival. To guide future actions, the brain monitors different outcomes, including achievement of movement and appetitive goals. The nature of outcome signals and their neuronal and network realiz…

Rigorous Analytical Model for Metasurface Microscopic Design with Interlayer Coupling Open

Shahar Levy, Yaniv Kerzhner, A. J. Epstein · 2019

Computer science Physics Engineering

We present a semianalytical method for designing meta-atoms in multilayered metasurfaces (MSs), relying on a rigorous model developed for multielement metagratings. Notably, this model properly accounts for near-field coupling effects, all…
