Alyssa Lees
A New Generation of Perspective API: Efficient Multilingual Character-level Transformers
On the world wide web, toxic content detectors are a crucial line of defense against potentially hateful and offensive messages. As such, building highly effective classifiers that enable a safer internet is an important research area. Mor…
Lost in Distillation: A Case Study in Toxicity Modeling
In an era of increasingly large pre-trained language models, knowledge distillation is a powerful tool for transferring information from a large model to a smaller one. In particular, distillation is of tremendous benefit when it comes to …
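The abstract's core idea, transferring knowledge from a large teacher model to a smaller student, is most commonly implemented as a KL-divergence loss between temperature-softened output distributions. A minimal sketch of that standard formulation (the generic technique as introduced by Hinton et al., not necessarily the exact loss used in this paper):

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T produces a softer distribution.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) over softened distributions, scaled by T^2
    # so gradients keep comparable magnitude across temperatures.
    p = softmax(teacher_logits, T)  # teacher's soft targets
    q = softmax(student_logits, T)  # student's predictions
    return (T ** 2) * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

In practice this term is combined with the ordinary cross-entropy on hard labels; the temperature and mixing weight are hyperparameters.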
SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification
The paper describes the SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification (MAMI), which explores the detection of misogynous memes on the web by taking advantage of available texts and images. The task has been organised in …
Sentence/Table Pair Data from Wikipedia for Pre-training with Distant-Supervision
This is the dataset used for pre-training in "ReasonBERT: Pre-trained to Reason with Distant Supervision", EMNLP'21. There are two files: sentence_pairs_for_pretrain_no_tokenization.tar.gz -> Contains only sentences as evidence, Text-only t…
ReasonBERT: Pre-trained to Reason with Distant Supervision
We present ReasonBert, a pre-training method that augments language models with the ability to reason over long-range relations and multiple, possibly hybrid contexts. Unlike existing pre-training methods that only harvest learning signals…
TURL: Table Understanding through Representation Learning
Relational tables on the Web store a vast amount of knowledge. Owing to the wealth of such tables, there has been tremendous progress on a variety of tasks in the area of table understanding. However, existing work generally relies on heav…
Embedding Semantic Taxonomies
A common step in developing an understanding of a vertical domain, e.g. shopping, dining, movies, medicine, etc., is curating a taxonomy of categories specific to the domain. These human created artifacts have been the subject of research …
Jigsaw @ AMI and HaSpeeDe2: Fine-Tuning a Pre-Trained Comment-Domain BERT Model
The Google Jigsaw team produced submissions for two of the EVALITA 2020 (Basile et al. 2020) shared tasks, based in part on the technology that powers the publicly available PerspectiveAPI comment evaluation service. We present a basic des…
What is Fair? Exploring Pareto-Efficiency for Fairness Constrained Classifiers
The potential for learned models to amplify existing societal biases has been broadly recognized. Fairness-aware classifier constraints, which apply equality metrics of performance across subgroups defined on sensitive attributes such as r…
Fairness Sample Complexity and the Case for Human Intervention
With the aim of building machine learning systems that incorporate standards of fairness and accountability, we explore explicit subgroup sample complexity bounds. The work is motivated by the observation that classifier predictions for re…