Benoît Crabbé
RETROcode: Leveraging a Code Database for Improved Natural Language to Code Generation
As text and code resources have expanded, large-scale pre-trained models have shown promising capabilities in code generation tasks, typically employing supervised fine-tuning with problem statement-program pairs. However, increasing model…
NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data
Large Language Models (LLMs) have shown impressive abilities in data annotation, opening the way for new approaches to solve classic NLP problems. In this paper, we show how to use LLMs to create NuNER, a compact language representation mo…
Assessing the Capacity of Transformer to Abstract Syntactic Representations: A Contrastive Analysis Based on Long-distance Agreement
Many studies have shown that transformers are able to predict subject-verb agreement, demonstrating their ability to uncover an abstract representation of the sentence in an unsupervised way. Recently, Li et al. (2021) found that transform…
Assessing the Capacity of Transformer to Abstract Syntactic Representations: A Contrastive Analysis Based on Long-distance Agreement
The long-distance agreement, evidence for syntactic structure, is increasingly used to assess the syntactic generalization of Neural Language Models. Much work has shown that transformers are capable of high accuracy in varied agreement ta…
The impact of lexical and grammatical processing on generating code from natural language
Considering the seq2seq architecture of TranX for natural language to code translation, we identify four key components of importance: grammatical constraints, lexical preprocessing, input representations, and copy mechanisms. To study the…
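One of the components named in this abstract, the copy mechanism, can be illustrated with a minimal sketch: the decoder mixes a generation distribution over the target vocabulary with a copy distribution over source tokens, so that identifiers mentioned in the problem statement can be copied verbatim into the generated code. All names, shapes, and the gating formulation here are hypothetical simplifications, not the actual TranX implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def copy_or_generate(dec_state, src_states, src_tokens, vocab, W_gen, W_copy, w_gate):
    """Mix a generation distribution over `vocab` with a copy distribution
    over the source tokens, gated by a scalar p_copy (hypothetical sketch)."""
    gen_scores = W_gen @ dec_state                    # one score per vocab item
    copy_scores = src_states @ (W_copy @ dec_state)   # one score per source token
    p_gen = softmax(gen_scores)
    p_copy_dist = softmax(copy_scores)
    p_copy = 1.0 / (1.0 + np.exp(-float(w_gate @ dec_state)))  # gate in (0, 1)
    # final distribution: scatter the copy mass onto the matching vocab ids
    final = (1.0 - p_copy) * p_gen
    for tok, pc in zip(src_tokens, p_copy_dist):
        if tok in vocab:
            final[vocab[tok]] += p_copy * pc
    return final
```

The point of the gate is that the model can learn when to trust the vocabulary (keywords, operators) and when to reuse a rare identifier from the input.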
How Distributed are Distributed Representations? An Observation on the Locality of Syntactic Information in Verb Agreement Tasks
This work addresses the question of the localization of syntactic information encoded in the transformers representations. We tackle this question from two perspectives, considering the object-past participle agreement in French, by identi…
Unifying Parsing and Tree-Structured Models for Generating Sentence Semantic Representations
Are Transformers a Modern Version of ELIZA? Observations on French Object Verb Agreement
Many recent works have demonstrated that unsupervised sentence representations of neural networks encode syntactic information by observing that neural language models are able to predict the agreement between a verb and its subject. We ta…
Word order in French: the role of animacy
A major goal of the quantitative study of syntax has been to identify factors that have predictive power on speaker choices in the face of word-order or valence alternations (e.g. Arnold et al. 2000; Bresnan et al. 2007; Bresnan & Ford…
Can RNNs learn Recursive Nested Subject-Verb Agreements?
One of the fundamental principles of contemporary linguistics states that language processing requires the ability to extract recursively nested tree structures. However, it remains unclear whether and how this code could be implemented in…
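The kind of stimulus such studies rely on can be sketched as a small generator of center-embedded agreement items: each added level nests a relative clause, and each verb must agree in number with its matching subject. The toy lexicon and the alternating-number scheme below are illustrative assumptions, not the materials of the paper.

```python
def nested_agreement(depth):
    """Build a center-embedded subject-verb agreement item with `depth`
    nested relative clauses (toy English lexicon, hypothetical)."""
    nouns = {"sg": "the boy", "pl": "the boys"}
    verbs = {"sg": "sees", "pl": "see"}
    # alternate grammatical number at each embedding level
    numbers = ["sg" if i % 2 == 0 else "pl" for i in range(depth + 1)]
    subjects = [nouns[n] for n in numbers]
    # verbs surface inside-out: the innermost subject's verb comes first
    verb_seq = [verbs[n] for n in reversed(numbers)]
    words = subjects[:1]
    for s in subjects[1:]:
        words += ["that", s]
    return " ".join(words + verb_seq)

# depth 1 yields: "the boy that the boys see sees" -- the final verb must
# agree with the outermost subject across the intervening clause.
```

A model that tracks the nesting must match each verb to the subject at the same depth, which is exactly what makes these items a probe for recursive structure.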
Contrasting distinct structured views to learn sentence embeddings
We propose a self-supervised method that builds sentence embeddings from the combination of diverse explicit syntactic structures of a sentence. We assume structure is crucial to building consistent representations as we expect sentence me…
How Many Layers and Why? An Analysis of the Model Depth in Transformers
Antoine Simoulin, Benoit Crabbé. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop. 2021.
Unlexicalized Transition-based Discontinuous Constituency Parsing
Lexicalized parsing models are based on the assumptions that (i) constituents are organized around a lexical head and (ii) bilexical statistics are crucial to solve ambiguities. In this paper, we introduce an unlexicalized transition-based…
Using Wiktionary as a resource for WSD: the case of French verbs
As opposed to word sense induction, word sense disambiguation (WSD) has the advantage of using interpretable senses, but requires annotated data, which are quite rare for most languages except English (Miller et al. 1993; Fellbaum, 1998).…
Taraldsen’s generalization in diachrony: evidence from a diachronic corpus
Multilingual Lexicalized Constituency Parsing with Word-Level Auxiliary Tasks
We introduce a constituency parser based on a bi-LSTM encoder adapted from recent work (Cross and Huang, 2016b; Kiperwasser and Goldberg, 2016), which can incorporate a lower level character biLSTM (Ballesteros et al., 2015; Plank et al., …
Incremental Discontinuous Phrase Structure Parsing with the GAP Transition
This article introduces a novel transition system for discontinuous lexicalized constituent parsing called SR-GAP. It is an extension of the shift-reduce algorithm with an additional gap transition. Evaluation on two German treebanks shows…
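The core idea of a gap transition can be illustrated with a toy shift-reduce parser: GAP sets the stack top aside so that a later reduction can combine non-adjacent items, producing a discontinuous constituent. This is a deliberate simplification for illustration; the actual SR-GAP configurations, transitions, and oracle in the article differ.

```python
class GapParser:
    """Toy shift-reduce parser with a GAP transition (hypothetical
    simplification of the idea behind SR-GAP)."""
    def __init__(self, tokens):
        self.stack, self.deque, self.buffer = [], [], list(tokens)

    def shift(self):
        # push the next input token onto the stack
        self.stack.append(self.buffer.pop(0))

    def gap(self):
        # set the stack top aside so a later reduce can reach the item below it
        self.deque.append(self.stack.pop())

    def reduce(self, label):
        # combine the two accessible items into a constituent; any gapped
        # material ends up outside the new node -> a discontinuous constituent
        right = self.stack.pop()
        left = self.stack.pop()
        while self.deque:
            self.stack.append(self.deque.pop())
        self.stack.append((label, left, right))
```

For input A B C, the sequence shift, shift, gap, shift, reduce("X") builds a constituent X over A and C while B remains outside it, which a plain shift-reduce system cannot do.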
Boosting for Efficient Model Selection for Syntactic Parsing
Natural Language Processing, 60 years after the Chomsky-Schützenberger hierarchy
Overview of Natural Language Processing, 60 years after the Chomsky-Schützenberger hierarchy
Neural Greedy Constituent Parsing with Dynamic Oracles
Dynamic oracle training has shown substantial improvements for dependency parsing in various settings, but has not been explored for constituent parsing. The present article introduces a dynamic oracle for transition-based constituent parsi…