Diffusion-Driven Inertial Generated Data for Smartphone Location Classification

Noa Cohen, Rotem Dror, Itzik Klein · 2025

Despite the crucial role of inertial measurements in motion tracking and navigation systems, the time-consuming and resource-intensive nature of collecting extensive inertial data has hindered the development of robust machine learning mod…

The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs

Nitay Calderon, Roi Reichart, Rotem Dror · 2025

The "LLM-as-an-annotator" and "LLM-as-a-judge" paradigms employ Large Language Models (LLMs) as annotators, judges, and evaluators in tasks traditionally performed by humans. LLM annotations are widely used, not only in NLP research but al…

State of What Art? A Call for Multi-Prompt LLM Evaluation

Moran Mizrahi, Guy Kaplan, Dan Malkin, Rotem Dror, Dafna Shahaf, et al. · 2024

Computer science Engineering Physics

Recent advances in LLMs have led to an abundance of evaluation benchmarks, which typically rely on a single instruction template per task. We create a large-scale collection of instruction paraphrases and comprehensively analyze the brittl…

State of What Art? A Call for Multi-Prompt LLM Evaluation

Moran Mizrahi, Guy Kaplan, Dan Malkin, Rotem Dror, Dafna Shahaf, et al. · 2023

Computer science Engineering Business

Recent advances in large language models (LLMs) have led to the development of various evaluation benchmarks. These benchmarks typically rely on a single instruction template for evaluating all LLMs on a specific task. In this paper, we co…

DMLR: Data-centric Machine Learning Research -- Past, Present and Future

Luis Oala, Manil Maskey, Lilith Bat-Leah, Alicia Parrish, Nezihe Merve Gürel, et al. · 2023

Computer science Political science Mathematics

Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and meetings prior, in this report we outline the relevance of community engagement and infrastructure development for the creation of next-generation public datasets tha…

The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics

Christoph Leiter, Juri Opitz, Daniel Deutsch, Yang Gao, Rotem Dror, et al. · 2023

Computer science Geography Economics

With an increasing number of parameters and pre-training data, generative large language models (LLMs) have shown remarkable capabilities to solve tasks with minimal or no task-related examples. Notably, LLMs have been successfully employe…

Human-in-the-loop Schema Induction

Tianyi Zhang, Isaac Tham, Zhaoyi Hou, Jiaxuan Ren, Liyang Zhou, et al. · 2023

Computer science Engineering

Schema induction builds a graph representation explaining how events unfold in a scenario. Existing approaches have been based on information retrieval (IR) and information extraction (IE), often with limited human curation. We demonstrate…

The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics

Christoph Leiter, Juri Opitz, Daniel Deutsch, Yang Gao, Rotem Dror, et al. · 2023

Computer science Economics Biology

Generative large language models (LLMs) have seen many breakthroughs over the last year. With an increasing number of parameters and pre-training data, they have shown remarkable capabilities to solve tasks with minimal or no task-related …

Human-in-the-loop Schema Induction

Tianyi Zhang, Isaac Tham, Zhaoyi Hou, Jiaxuan Ren, Leon Zhou, et al. · 2023

Computer science Art History

Tianyi Zhang, Isaac Tham, Zhaoyi Hou, Jiaxuan Ren, Leon Zhou, Hainiu Xu, Li Zhang, Lara Martin, Rotem Dror, Sha Li, Heng Ji, Martha Palmer, Susan Windisch Brown, Reece Suchocki, Chris Callison-Burch. Proceedings of the 61st Annual Meeting …

Zero-Shot On-the-Fly Event Schema Induction

Rotem Dror, Haoyu Wang, Dan Roth · 2023

Computer science Physics

What are the events involved in a pandemic outbreak? What steps should be taken when planning a wedding? The answers to these questions can be found by collecting many documents on the complex event of interest, extracting relevant informa…

On the Limitations of Reference-Free Evaluations of Generated Text

Daniel Deutsch, Rotem Dror, Dan Roth · 2022

Computer science Philosophy Economics

There is significant interest in developing evaluation metrics which accurately estimate the quality of generated text without the aid of a human-written reference text, which can be time consuming and expensive to collect or entirely unav…

Zero-Shot On-the-Fly Event Schema Induction

Rotem Dror, Haoyu Wang, Dan Roth · 2022

Computer science Physics

What are the events involved in a pandemic outbreak? What steps should be taken when planning a wedding? The answers to these questions can be found by collecting many documents on the complex event of interest, extracting relevant informa…

Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics

Daniel Deutsch, Rotem Dror, Dan Roth · 2022

Computer science Mathematics Philosophy

How reliably an automatic summarization evaluation metric replicates human judgments of summary quality is quantified by system-level correlations. We identify two ways in which the definition of the system-level correlation is inconsisten…

RESIN-11: Schema-guided Event Prediction for 11 Newsworthy Scenarios

Xinya Du, Zixuan Zhang, Sha Li, Pengfei Yu, Hongwei Wang, et al. · 2022

Art Philosophy Computer science

Xinya Du, Zixuan Zhang, Sha Li, Pengfei Yu, Hongwei Wang, Tuan Lai, Xudong Lin, Ziqi Wang, Iris Liu, Ben Zhou, Haoyang Wen, Manling Li, Darryl Hannan, Jie Lei, Hyounghun Kim, Rotem Dror, Haoyu Wang, Michael Regan, Qi Zeng, Qing Lyu, Charle…

A Statistical Analysis of Summarization Evaluation Metrics using Resampling Methods

Daniel Deutsch, Rotem Dror, Dan Roth · 2021

Computer science Mathematics Economics

The quality of a summarization evaluation metric is quantified by calculating the correlation between its scores and human annotations across a large number of summaries. Currently, it is unclear how precise these correlation estimates are…

A Statistical Analysis of Summarization Evaluation Metrics Using Resampling Methods

Daniel Deutsch, Rotem Dror, Dan Roth · 2021

Computer science Mathematics Physics

The quality of a summarization evaluation metric is quantified by calculating the correlation between its scores and human annotations across a large number of summaries. Currently, it is unclear how precise these correlation estimates are…

The Structured Weighted Violations MIRA

Dor Ringel, Rotem Dror, Roi Reichart · 2020

Computer science

We present the Structured Weighted Violation MIRA (SWVM), a new structured prediction algorithm that is based on a hybridization between MIRA (Crammer and Singer, 2003) and the structured weighted violations perceptron (SWVP) (Dror and Re…

Deep Dominance - How to Properly Compare Deep Neural Models

Rotem Dror, Segev Shlomov, Roi Reichart · 2019

Computer science Economics

Comparing between Deep Neural Network (DNN) models based on their performance on unseen data is crucial for the progress of the NLP field. However, these models have a large number of hyper-parameters and, being non-convex, their convergen…

Appendix - Recommended Statistical Significance Tests for NLP Tasks

Rotem Dror, Roi Reichart · 2018

Computer science Mathematics Biology

Statistical significance testing plays an important role when drawing conclusions from experimental results in NLP papers. Particularly, it is a valuable tool when one would like to establish the superiority of one algorithm over another. …

The Hitchhiker’s Guide to Testing Statistical Significance in Natural Language Processing

Rotem Dror, Gili Baumer, Segev Shlomov, Roi Reichart · 2018

Computer science Mathematics Biology

Statistical significance testing is a standard statistical tool designed to ensure that experimental results are not coincidental. In this opinion/theoretical paper we discuss the role of statistical significance testing in Natural Langua…

Replicability Analysis for Natural Language Processing: Testing Significance with Multiple Datasets

Rotem Dror, Gili Baumer, Marina Bogomolov, Roi Reichart · 2017

Computer science Mathematics Philosophy

With the ever growing amount of textual data from a large variety of languages, domains, and genres, it has become standard to evaluate NLP algorithms on multiple datasets in order to ensure a consistent performance across heterogeneous se…

Replicability Analysis for Natural Language Processing: Testing Significance with Multiple Datasets

Rotem Dror, Gili Baumer, Marina Bogomolov, Roi Reichart · 2017

Computer science Mathematics Philosophy

With the ever-growing amounts of textual data from a large variety of languages, domains, and genres, it has become standard to evaluate NLP algorithms on multiple datasets in order to ensure consistent performance across heterogeneous set…

The Structured Weighted Violations Perceptron Algorithm

Rotem Dror, Roi Reichart · 2016

Computer science Mathematics Political science

We present the Structured Weighted Violations Perceptron (SWVP) algorithm, a new structured prediction algorithm that generalizes the Collins Structured Perceptron (CSP). Unlike CSP, the update rule of SWVP explicitly exploits the internal…

The Structured Weighted Violations Perceptron Algorithm

Rotem Dror, Roi Reichart · 2016

Computer science Mathematics

We present the Structured Weighted Violations Perceptron (SWVP) algorithm, a new perceptron algorithm for structured prediction, that generalizes the Collins Structured Perceptron (CSP, (Collins, 2002)). Unlike CSP, the update rule of SWVP…

Rotem Dror