Rotem Dror
Diffusion-Driven Inertial Generated Data for Smartphone Location Classification
Despite the crucial role of inertial measurements in motion tracking and navigation systems, the time-consuming and resource-intensive nature of collecting extensive inertial data has hindered the development of robust machine learning mod…
The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs
The "LLM-as-an-annotator" and "LLM-as-a-judge" paradigms employ Large Language Models (LLMs) as annotators, judges, and evaluators in tasks traditionally performed by humans. LLM annotations are widely used, not only in NLP research but al…
State of What Art? A Call for Multi-Prompt LLM Evaluation
Recent advances in LLMs have led to an abundance of evaluation benchmarks, which typically rely on a single instruction template per task. We create a large-scale collection of instruction paraphrases and comprehensively analyze the brittl…
State of What Art? A Call for Multi-Prompt LLM Evaluation
Recent advances in large language models (LLMs) have led to the development of various evaluation benchmarks. These benchmarks typically rely on a single instruction template for evaluating all LLMs on a specific task. In this paper, we co…
DMLR: Data-centric Machine Learning Research -- Past, Present and Future
Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and meetings prior, in this report we outline the relevance of community engagement and infrastructure development for the creation of next-generation public datasets tha…
The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics
With an increasing number of parameters and pre-training data, generative large language models (LLMs) have shown remarkable capabilities to solve tasks with minimal or no task-related examples. Notably, LLMs have been successfully employe…
Human-in-the-loop Schema Induction
Schema induction builds a graph representation explaining how events unfold in a scenario. Existing approaches have been based on information retrieval (IR) and information extraction (IE), often with limited human curation. We demonstrate…
The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics
Generative large language models (LLMs) have seen many breakthroughs over the last year. With an increasing number of parameters and pre-training data, they have shown remarkable capabilities to solve tasks with minimal or no task-related …
Human-in-the-loop Schema Induction
Tianyi Zhang, Isaac Tham, Zhaoyi Hou, Jiaxuan Ren, Leon Zhou, Hainiu Xu, Li Zhang, Lara Martin, Rotem Dror, Sha Li, Heng Ji, Martha Palmer, Susan Windisch Brown, Reece Suchocki, Chris Callison-Burch. Proceedings of the 61st Annual Meeting …
Zero-Shot On-the-Fly Event Schema Induction
What are the events involved in a pandemic outbreak? What steps should be taken when planning a wedding? The answers to these questions can be found by collecting many documents on the complex event of interest, extracting relevant informa…
On the Limitations of Reference-Free Evaluations of Generated Text
There is significant interest in developing evaluation metrics which accurately estimate the quality of generated text without the aid of a human-written reference text, which can be time-consuming and expensive to collect or entirely unav…
Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics
How reliably an automatic summarization evaluation metric replicates human judgments of summary quality is quantified by system-level correlations. We identify two ways in which the definition of the system-level correlation is inconsisten…
RESIN-11: Schema-guided Event Prediction for 11 Newsworthy Scenarios
Xinya Du, Zixuan Zhang, Sha Li, Pengfei Yu, Hongwei Wang, Tuan Lai, Xudong Lin, Ziqi Wang, Iris Liu, Ben Zhou, Haoyang Wen, Manling Li, Darryl Hannan, Jie Lei, Hyounghun Kim, Rotem Dror, Haoyu Wang, Michael Regan, Qi Zeng, Qing Lyu, Charle…
A Statistical Analysis of Summarization Evaluation Metrics using Resampling Methods
The quality of a summarization evaluation metric is quantified by calculating the correlation between its scores and human annotations across a large number of summaries. Currently, it is unclear how precise these correlation estimates are…
A Statistical Analysis of Summarization Evaluation Metrics Using Resampling Methods
The quality of a summarization evaluation metric is quantified by calculating the correlation between its scores and human annotations across a large number of summaries. Currently, it is unclear how precise these correlation estimates are…
The Structured Weighted Violations MIRA
We present the Structured Weighted Violation MIRA (SWVM), a new structured prediction algorithm that is based on a hybridization between MIRA (Crammer and Singer, 2003) and the structured weighted violations perceptron (SWVP) (Dror and Re…
Deep Dominance - How to Properly Compare Deep Neural Models
Comparing Deep Neural Network (DNN) models based on their performance on unseen data is crucial for the progress of the NLP field. However, these models have a large number of hyper-parameters and, being non-convex, their convergen…
Appendix - Recommended Statistical Significance Tests for NLP Tasks
Statistical significance testing plays an important role when drawing conclusions from experimental results in NLP papers. Particularly, it is a valuable tool when one would like to establish the superiority of one algorithm over another. …
The Hitchhiker’s Guide to Testing Statistical Significance in Natural Language Processing
Statistical significance testing is a standard statistical tool designed to ensure that experimental results are not coincidental. In this opinion/theoretical paper, we discuss the role of statistical significance testing in Natural Langua…
Replicability Analysis for Natural Language Processing: Testing Significance with Multiple Datasets
With the ever growing amount of textual data from a large variety of languages, domains, and genres, it has become standard to evaluate NLP algorithms on multiple datasets in order to ensure a consistent performance across heterogeneous se…
Replicability Analysis for Natural Language Processing: Testing Significance with Multiple Datasets
With the ever-growing amounts of textual data from a large variety of languages, domains, and genres, it has become standard to evaluate NLP algorithms on multiple datasets in order to ensure consistent performance across heterogeneous set…
The Structured Weighted Violations Perceptron Algorithm
We present the Structured Weighted Violations Perceptron (SWVP) algorithm, a new structured prediction algorithm that generalizes the Collins Structured Perceptron (CSP). Unlike CSP, the update rule of SWVP explicitly exploits the internal…
The Structured Weighted Violations Perceptron Algorithm
We present the Structured Weighted Violations Perceptron (SWVP) algorithm, a new perceptron algorithm for structured prediction that generalizes the Collins Structured Perceptron (CSP; Collins, 2002). Unlike CSP, the update rule of SWVP…