Thomas François
UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency Assessment
We introduce UniversalCEFR, a large-scale multilingual and multidimensional dataset of texts annotated with CEFR (Common European Framework of Reference) levels in 13 languages. To enable open research in automated readability and language…
Generating Contexts for ESP Vocabulary Exercises with LLMs
The current paper addresses the need for language students and teachers to have access to a large number of pedagogically sound contexts for vocabulary acquisition and testing. We investigate the automatic derivation of contexts for a voca…
iRead4Skills Dataset 1: corpora by complexity level for FR, PT and SP
The iRead4Skills Dataset 1: corpora by level of complexity for FR, PT and SP is a collection of written texts of several genres and levels of complexity, in txt format, compiled under the scope of the project iRead4Skills – Intelligent Read…
Approaching Semantic Text Similarity with Hybrid Methods: a Case Study on French
Jargon: A Suite of Language Models and Evaluation Tasks for French Specialized Domains
Pretrained Language Models (PLMs) are the de facto backbone of most state-of-the-art NLP systems. In this paper, we introduce a family of domain-specific pretrained PLMs for French, focusing on three important domains: transcribed speech, …
Word Sense Disambiguation for Automatic Translation of Medical Dialogues into Pictographs
Word sense disambiguation is an NLP task embedded in different applications. We propose to evaluate its contribution to the automatic translation of French texts into pictographs, in the context of communication between doctors and patient…
TCFLE-8: a Corpus of Learner Written Productions for French as a Foreign Language and its Application to Automated Essay Scoring
Automated Essay Scoring (AES) aims to automatically assess the quality of essays. Automation enables large-scale assessment, improvements in consistency, reliability, and standardization. Those characteristics are of particular relevance i…
Proceedings of the 11th Workshop on Natural Language Processing for Computer-Assisted Language Learning (NLP4CALL 2022)
The workshop series on Natural Language Processing (NLP) for Computer-Assisted Language Learning (NLP4CALL) is a meeting place for researchers working on the integration of Natural Language Processing and Speech Technologies in CALL system…
Towards a Verb Profile: distribution of verbal tenses in FFL textbooks and in learner productions
Morphological inflection is known to be difficult to master for L2 learners. In this paper, we examine the state of the use of inflection in the verbal tense system among learners of French, and contrast it with the use in FFL textbooks. T…
Revisiting simplification in corpus-based translation studies: Insights from readability research
Ever since the publication of Laviosa’s (1998a; 1998b) pioneering work, the study of lexico-syntactic simplification has held centre stage in corpus translation research concerned with the typical features of translated texts. The simplifi…
Dialogue systems for language learning: A meta-analysis
The present study offers a meta-analysis of effectiveness studies on dialogue-based CALL, systems affording a learner practice in a foreign language (L2) by interacting with a conversational agent (“bot”). Through a systematic inclusion an…
Plume-induced sinking of the intracontinental lithosphere as a fundamentally new mechanism of subduction initiation.
Although many different mechanisms for subduction initiation have been proposed, few of them are viable in terms of agreement with observations and reproducibility in numerical experiments. In particular, it has recently been demo…
Simplification of literary and scientific texts to improve reading fluency and comprehension in beginning readers of French
Reading comprehension and fluency are crucial for successful academic learning and achievement. Yet, a rather large percentage of children still have enormous difficulties in understanding a written text at the end of primary school. In th…
CENTAL at TSAR-2022 Shared Task: How Does Context Impact BERT-Generated Substitutions for Lexical Simplification?
Rodrigo Wilkens, David Alfter, Rémi Cardon, Isabelle Gribomont, Adrien Bibal, Patrick Watrin, Marie-Catherine de Marneffe, Thomas François. Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022). 202…
PADDLe: a Platform to Identify Complex Words for Learners of French as a Foreign Language (FFL)
Annotations of word difficulty by readers provide invaluable insights into lexical complexity. Yet, there is currently a paucity of tools allowing researchers to gather such annotations in an adaptable and simple manner. This article prese…
HECTOR: A Hybrid TExt SimplifiCation TOol for Raw Texts in French
Reducing the complexity of texts by applying an Automatic Text Simplification (ATS) system has been sparking interest in the area of Natural Language Processing (NLP) for several years and a number of methods and evaluation campaigns have …
Ninth Workshop on Speech and Language Processing for Assistive Technologies (SLPAT-2022)
We present MozoLM, an open-source language model microservice package intended for use in AAC text-entry applications, with a particular focus on the design principles of the library. The intent of the library is to allow the ensembling of …
Is Attention Explanation? An Introduction to the Debate
Adrien Bibal, Rémi Cardon, David Alfter, Rodrigo Wilkens, Xiaoou Wang, Thomas François, Patrick Watrin. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
Investigating the Medical Coverage of a Translation System into Pictographs for Patients with an Intellectual Disability
Communication between physician and patients can lead to misunderstandings, especially for disabled people. An automatic system that translates natural language into a pictographic language is one of the solutions that could help to overco…
Linguistic Corpus Annotation for Automatic Text Simplification Evaluation
Rémi Cardon, Adrien Bibal, Rodrigo Wilkens, David Alfter, Magali Norré, Adeline Müller, Patrick Watrin, Thomas François. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022.
Concept-based instruction for applied L2 acquisition
This study’s objective is to present an overview of experimental applications of Concept-Based Instruction (CBI) for Second Language Acquisition. CBI aims to describe complex grammatical notions in a thorough manner in order to facilitate …
Plain language practices of professional writers in Quebec
This article investigates the plain language practices of professional writers in Quebec, using a survey. We contacted 55 professional writers and asked them to complete an online survey about how they apply plain language in their work, a…
Experiments for the adaptation of Text2Picto to French
The Dutch Text2Picto system (Sevens, 2018; Vandeghinste et al., 2015) aims to automatically translate text into pictographs for people with an intellectual disability in the context of Augmentative and Alternative Communication (AAC). The …
Extending a Text-to-Pictograph System to French and to Arasaac
We present an adaptation of the Text-to-Picto system, initially designed for Dutch, and extended to English and Spanish. The original system, aimed at people with an intellectual disability, automatically translates text into pictographs (…
FrenLyS: A Tool for the Automatic Simplification of French General Language Texts
Lexical simplification (LS) aims at replacing words considered complex in a sentence by simpler equivalents. In this paper, we present the first automatic LS service for French, FrenLyS, which offers different techniques to generate, select…
Reading with maculopathy: the inhibitory effect of word neighborhood size is modulated by word predictability and reading proficiency
Background: For normally sighted readers, word neighborhood size (i.e., the total number of words that can be formed from a single word by changing only one letter) has a facilitator effect on word recognition. When reading with central fi…