Bertie Vidgen
Classification is a RAG problem: A case study on hate speech detection
Robust content moderation requires classification systems that can quickly adapt to evolving policies without costly retraining. We present classification using Retrieval-Augmented Generation (RAG), which shifts traditional classification …
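As a rough illustration only (not the paper's implementation), the sketch below shows the general shape of classification-as-RAG: policy clauses sit in a retrievable store, the relevant clauses are fetched per input, and an LLM is asked to apply them, so a policy update means editing the store rather than retraining. The policy wording, the toy lexical retriever, and the stub LLM call are all assumptions made for this sketch.

```python
# Illustrative sketch of retrieval-augmented classification for content moderation.
# All names, the policy text, the toy retriever, and the stub LLM are hypothetical.

from dataclasses import dataclass

@dataclass
class PolicyClause:
    label: str   # e.g. "hateful" / "not_hateful"
    text: str    # the policy wording the classifier should apply

# Hypothetical, minimal policy store; in practice this would be the moderation
# policy split into retrievable clauses, updated without retraining any model.
POLICY = [
    PolicyClause("hateful", "Content that attacks a person or group on the basis of a protected attribute."),
    PolicyClause("not_hateful", "Criticism of ideas, institutions, or public figures without reference to protected attributes."),
]

def retrieve(query: str, clauses: list[PolicyClause], k: int = 2) -> list[PolicyClause]:
    """Toy lexical retriever (token overlap); a real system would use embeddings."""
    q_tokens = set(query.lower().split())
    scored = sorted(clauses, key=lambda c: len(q_tokens & set(c.text.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(text: str, clauses: list[PolicyClause]) -> str:
    """Compose a prompt that pairs the input with the retrieved policy clauses."""
    policy_block = "\n".join(f"- [{c.label}] {c.text}" for c in clauses)
    return (
        "Classify the message against the policy clauses below.\n"
        f"Policy:\n{policy_block}\n\n"
        f"Message: {text}\n"
        "Answer with exactly one label."
    )

def classify(text: str, llm) -> str:
    """Retrieve relevant policy clauses, then ask the LLM to apply them."""
    prompt = build_prompt(text, retrieve(text, POLICY))
    return llm(prompt).strip()

if __name__ == "__main__":
    fake_llm = lambda prompt: "not_hateful"  # stand-in for a real model call
    print(classify("I disagree with the new policy proposal.", fake_llm))
```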
Why human–AI relationships need socioaffective alignment
Humans strive to design safe AI systems that align with our goals and remain under our control. However, as AI capabilities advance, we face a new challenge: the emergence of deeper, more persistent relationships between humans and AI syst…
SafetyPrompts: A Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
The last two years have seen a rapid growth in concerns around the safety of large language models (LLMs). Researchers and practitioners have met these concerns by creating an abundance of datasets for evaluating and improving LLM safety. …
MSTS: A Multimodal Safety Test Suite for Vision-Language Models
Vision-language models (VLMs), which process image and text inputs, are increasingly integrated into chat assistants and other consumer AI applications. Without proper safeguards, however, VLMs may give harmful advice (e.g. how to self-har…
LMUnit: Fine-grained Evaluation with Natural Language Unit Tests
As language models become integral to critical workflows, assessing their behavior remains a fundamental challenge -- human evaluation is costly and noisy, while automated metrics provide only coarse, difficult-to-interpret signals. We int…
The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources
Foundation model development attracts a rapidly expanding body of contributors, scientists, and applications. To help shape responsible development practices, we introduce the Foundation Model Development Cheatsheet: a growing collection o…
Risks and Opportunities of Open-Source Generative AI
Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about the potential risks…
WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting
We introduce WorkBench: a benchmark dataset for evaluating agents' ability to execute tasks in a workplace setting. WorkBench contains a sandbox environment with five databases, 26 tools, and 690 tasks. These tasks represent common busines…
Near to Mid-term Risks and Opportunities of Open-Source Generative AI
In the next few years, applications of Generative AI are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about poten…
The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
Human feedback is central to the alignment of Large Language Models (LLMs). However, open questions remain about methods (how), domains (where), people (who) and objectives (to what end) of feedback processes. To navigate these questions, …
Introducing v0.5 of the AI Safety Benchmark from MLCommons
This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models.…
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
Without proper safeguards, large language models will readily follow malicious instructions and generate toxic content. This risk motivates safety efforts such as red-teaming and large-scale feedback learning, which aim to make models both…
FinanceBench: A New Benchmark for Financial Question Answering
FinanceBench is a first-of-its-kind test suite for evaluating the performance of LLMs on open book financial question answering (QA). It comprises 10,231 questions about publicly traded companies, with corresponding answers and evidence st…
SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models
The past year has seen rapid acceleration in the development of large language models (LLMs). However, without proper steering and safeguards, LLMs will readily follow malicious instructions, provide unsafe advice, and generate toxic conte…
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
Human feedback is increasingly used to steer the behaviours of Large Language Models (LLMs). However, it is unclear how to collect and incorporate feedback in a way that is efficient, effective and unbiased, especially for highly subjectiv…
The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models
In this paper, we address the concept of "alignment" in large language models (LLMs) through the lens of post-structuralist socio-political theory, specifically examining its parallels to empty signifiers. To establish a shared vocabulary …
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
Large language models (LLMs) are used to generate content for a wide range of tasks, and are set to reach a growing audience in coming years due to integration in product interfaces like ChatGPT or search engines like Bing. This intensifie…
SemEval-2023 Task 10: Explainable Detection of Online Sexism
Online sexism is a widespread and harmful phenomenon. Automated tools can assist the detection of sexism at scale. Binary detection, however, disregards the diversity of sexist content, and fails to provide clear explanations for why somet…
Improving the Detection of Multilingual Online Attacks with Rich Social Media Data from Singapore
Janosch Haber, Bertie Vidgen, Matthew Chapman, Vibhor Agarwal, Roy Ka-Wei Lee, Yong Keong Yap, Paul Röttger. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.
How can we combat online misinformation? A systematic overview of current interventions and their efficacy
The spread of misinformation is a pressing global problem that has elicited a range of responses from researchers, policymakers, civil society and industry. Over the past decade, these stakeholders have developed many interventions to tack…
Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning
Annotating abusive language is expensive, logistically complex and creates a risk of psychological harm. However, most machine learning research has prioritized maximizing effectiveness (i.e., F1 or accuracy score) rather than data efficie…
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models
Hate speech detection models are typically evaluated on held-out test sets. However, this risks painting an incomplete and potentially misleading picture of model performance because of increasingly well-documented systematic gaps and bias…
Radical Right On Twitter (ROT)
We collected the Radical Right On Twitter dataset (ROT7) to advance research into radical right activity online. The resource addresses a lack of data in this field, particularly data that relates to the activity of radical right actors. T…