Vincent Perot
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Retrieval augmented generation (RAG) combines the generative abilities of large language models (LLMs) with external knowledge sources to provide more accurate and up-to-date responses. Recent RAG advancements focus on improving retrieval …
CodecLM: Aligning Language Models with Tailored Synthetic Data
Instruction tuning has emerged as the key in aligning large language models (LLMs) with specific task instructions, thereby mitigating the discrepancy between the next-token prediction objective and users' actual goals. To reduce the labor…
Noise-Aware Training of Layout-Aware Language Models
A visually rich document (VRD) utilizes visual features along with linguistic cues to disseminate information. Training a custom extractor that identifies named entities from a document requires a large number of instances of the target do…
Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding
Table-based reasoning with large language models (LLMs) is a promising direction to tackle many table understanding tasks, such as table-based question answering and fact verification. Compared with generic reasoning, table-based reasoning…
LMDX: Language Model-based Document Information Extraction and Localization
Large Language Models (LLM) have revolutionized Natural Language Processing (NLP), improving state-of-the-art and exhibiting emergent capabilities across various tasks. However, their application in extracting information from visually ric…
FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
The recent advent of self-supervised pre-training techniques has led to a surge in the use of multimodal learning in form document understanding. However, existing approaches that extend the mask language modeling to other modalities requi…
Chen-Yu Lee, Chun-Liang Li, Hao Zhang, Timothy Dozat, Vincent Perot, Guolong Su, Xiang Zhang, Kihyuk Sohn, Nikolay Glushnev, Renshen Wang, Joshua Ainslie, Shangbang Long, Siyang Qin, Yasuhisa Fujii, Nan Hua, Tomas Pfister. Proceedings of t…
QueryForm: A Simple Zero-shot Form Entity Query Framework
Zero-shot transfer learning for document understanding is a crucial yet under-investigated scenario to help reduce the high cost involved in annotating document entities. We present a novel query-based framework, QueryForm, that extracts e…
DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning
Continual learning aims to enable a single model to learn a sequence of tasks without catastrophic forgetting. Top-performing methods usually require a rehearsal buffer to store past pristine examples for experience replay, which, however,…
FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction
Sequence modeling has demonstrated state-of-the-art performance on natural language and document understanding tasks. However, it is challenging to correctly serialize tokens in form-like documents in practice due to their variety of layou…
Learning to Prompt for Continual Learning
The mainstream paradigm behind continual learning has been to adapt the model parameters to non-stationary data distributions, where catastrophic forgetting is the central challenge. Typical methods rely on a rehearsal buffer or known task…