David Esiobu
Lost in Inference: Rediscovering the Role of Natural Language Inference for Large Language Models
In the recent past, a popular way of evaluating natural language understanding (NLU) was to consider a model's ability to perform natural language inference (NLI) tasks. In this paper, we investigate if NLI tasks that are rarely used for…
Evaluation data contamination in LLMs: how do we measure it and (when) does it matter?
Hampering the interpretation of benchmark scores, evaluation data contamination has become a growing concern in the evaluation of LLMs, and an active area of research studies its effects. While evaluation data contamination is easily under…
ROBBIE: Robust Bias Evaluation of Large Generative Language Models
As generative large language models (LLMs) grow more performant and prevalent, we must develop sufficiently comprehensive tools to measure and improve their fairness. Different prompt-based datasets can be used to measure social bias across mult…
Llama 2: Open Foundation and Fine-Tuned Chat Models
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dial…
A Theory on Adam Instability in Large-Scale Machine Learning
We present a theory for the previously unexplained divergent behavior noticed in the training of large language models. We argue that the phenomenon is an artifact of the dominant optimization algorithm used for training, called Adam. We o…
ROBBIE: Robust Bias Evaluation of Large Generative Language Models
David Esiobu, Xiaoqing Tan, Saghar Hosseini, Megan Ung, Yuchen Zhang, Jude Fernandes, Jane Dwivedi-Yu, Eleonora Presani, Adina Williams, Eric Smith. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 20…