Wenhan Xiong
Law of the Weakest Link: Cross Capabilities of Large Language Models
The development and evaluation of Large Language Models (LLMs) have largely focused on individual capabilities. However, this overlooks the intersection of multiple abilities across different types of expertise that are often required for …
FLAME: Factuality-Aware Alignment for Large Language Models
Alignment is a standard procedure to fine-tune pre-trained large language models (LLMs) to follow natural language instructions and serve as helpful AI assistants. We have observed, however, that the conventional alignment process fails to…
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Tra…
Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning
This paper introduces Scene-LLM, a 3D-visual-language model that enhances embodied agents' abilities in interactive 3D indoor environments by integrating the reasoning strengths of Large Language Models (LLMs). Scene-LLM adopts a hybrid 3D…
The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task
The study explores the effectiveness of the Chain-of-Thought approach, known for its proficiency in language tasks by breaking them down into sub-tasks and intermediate steps, in improving vision-language tasks that demand sophisticated pe…
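For intuition only, the sketch below builds a chain-of-thought style prompt for a vision-language question, asking the model to describe the relevant visual evidence and reason over it before answering; the prompt wording is an illustrative assumption, not the prompt used in the paper.

```python
# Illustrative sketch: a chain-of-thought prompt for a vision-language question,
# contrasted with a direct-answer prompt. Wording is an assumption.

def direct_prompt(question: str) -> str:
    return f"Look at the image and answer the question.\nQuestion: {question}\nAnswer:"

def chain_of_thought_prompt(question: str) -> str:
    # Break the task into intermediate steps: describe the relevant visual
    # evidence, reason over it, then commit to a short final answer.
    return (
        "Look at the image and answer the question.\n"
        f"Question: {question}\n"
        "First, describe the objects and relations in the image that are relevant.\n"
        "Then, reason step by step over that description.\n"
        "Finally, give the answer in one short phrase.\n"
        "Reasoning:"
    )

if __name__ == "__main__":
    q = "How many people are sitting to the left of the red car?"
    print(chain_of_thought_prompt(q))
```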
Sub-network Discovery and Soft-masking for Continual Learning of Mixed Tasks
Continual learning (CL) has two main objectives: preventing catastrophic forgetting (CF) and encouraging knowledge transfer (KT). The existing literature has mainly focused on overcoming CF. Some work has also been done on KT when the tasks ar…
Effective Long-Context Scaling of Foundation Models
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens. Our model series is built through continual pretraining from Llama 2 with longer training sequences and on a dataset where long texts …
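As a minimal sketch of one ingredient the abstract mentions, continual pretraining on longer training sequences, the snippet below packs tokenized documents into fixed 32,768-token training examples. The stand-in tokenizer, end-of-document id, and greedy packing scheme are assumptions for illustration, not the paper's data recipe.

```python
# Minimal sketch: concatenate documents (separated by a placeholder EOS id)
# and cut the token stream into fixed-length 32,768-token training sequences.
from typing import Iterable, Iterator, List

SEQ_LEN = 32_768
EOS_ID = 2  # placeholder end-of-document token id (assumption)

def pack_sequences(docs_token_ids: Iterable[List[int]],
                   seq_len: int = SEQ_LEN) -> Iterator[List[int]]:
    buffer: List[int] = []
    for ids in docs_token_ids:
        buffer.extend(ids + [EOS_ID])
        while len(buffer) >= seq_len:
            yield buffer[:seq_len]
            buffer = buffer[seq_len:]
    # the final partial sequence is dropped here (padding is another option)

# Toy usage with a stand-in "tokenizer" (hashing whitespace tokens).
docs = ["a long document ..." * 1000, "another long document ..." * 2000]
token_streams = ([hash(w) % 32000 for w in d.split()] for d in docs)
for seq in pack_sequences(token_streams, seq_len=1024):  # small length for the demo
    print(len(seq))
```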
LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models
Today's large language models (LLMs) typically train on short text segments (e.g., <4K tokens) due to the quadratic complexity of their Transformer architectures. As a result, their performance suffers drastically on inputs longer than tho…
Code Llama: Open Foundation Models for Code
We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following abil…
Prompting Large Language Models with Speech Recognition Abilities
Large language models have proven themselves highly flexible, able to solve a wide range of generative tasks, such as abstractive summarization and open-ended question answering. In this paper we extend the capabilities of LLMs by directly…
Text-guided 3D Human Generation from 2D Collections
3D human modeling has been widely used for engaging interaction in gaming, film, and animation. The customization of these characters is crucial for creativity and scalability, which highlights the importance of controllability. In this wo…
Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality
Contrastively trained vision-language models have achieved remarkable progress in vision and language representation learning, leading to state-of-the-art models for various downstream multimodal tasks. However, recent research has highlighted severe limitations of these models in their ability to perform composit…
Multi-Head State Space Model for Speech Recognition
State space models (SSMs) have recently shown promising results on small-scale sequence and language modelling tasks, rivalling and outperforming many attention-based approaches. In this paper, we propose a multi-head state space (MH-SSM) …
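To make the "multi-head" idea concrete, here is a toy sketch that runs several independent diagonal linear state-space recurrences in parallel (the "heads") and concatenates their outputs, by analogy with multi-head attention. The diagonal parameterization and dimensions are simplifications for illustration, not the MH-SSM architecture from the paper.

```python
# Toy multi-head state space layer: each head runs an independent linear
# recurrence h_t = A * h_{t-1} + B x_t, y_t = C h_t, and head outputs are
# concatenated along the feature dimension.
import numpy as np

def ssm_head(x, A, B, C):
    # x: (T, d_in); A: (d_state,) diagonal transition; B: (d_state, d_in); C: (d_out, d_state)
    h = np.zeros(A.shape[0])
    ys = []
    for t in range(x.shape[0]):
        h = A * h + B @ x[t]      # diagonal state transition keeps the recurrence cheap
        ys.append(C @ h)
    return np.stack(ys)           # (T, d_out)

def multi_head_ssm(x, heads):
    # heads: list of (A, B, C) parameter tuples, one per head
    return np.concatenate([ssm_head(x, *p) for p in heads], axis=-1)

rng = np.random.default_rng(0)
T, d_in, d_state, d_out, n_heads = 16, 8, 4, 4, 3
heads = [(rng.uniform(0.5, 0.99, d_state),               # stable diagonal A
          rng.normal(size=(d_state, d_in)) * 0.1,
          rng.normal(size=(d_out, d_state)) * 0.1) for _ in range(n_heads)]
x = rng.normal(size=(T, d_in))
print(multi_head_ssm(x, heads).shape)  # (16, 12): per-head outputs concatenated
```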
VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation
We propose a new two-stage pre-training framework for video-to-text generation tasks such as video captioning and video question answering: A generative encoder-decoder model is first jointly pre-trained on massive image-text data to learn…
3DGen: Triplane Latent Diffusion for Textured Mesh Generation
Latent diffusion models for image generation have crossed a quality threshold which enabled them to achieve mass adoption. Recently, a series of works have made advancements towards replicating this success in the 3D domain, introducing te…
CLIP-Layout: Style-Consistent Indoor Scene Synthesis with Semantic Furniture Embedding
Indoor scene synthesis involves automatically picking and placing furniture appropriately on a floor plan, so that the scene looks realistic and is functionally plausible. Such scenes can serve as homes for immersive 3D experiences, or be …
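The title's "semantic furniture embedding" suggests scoring candidate furniture against a scene style with CLIP embeddings. The snippet below is only a guess at that flavor, not the paper's pipeline: it uses the Hugging Face CLIP text encoder to rank furniture descriptions by similarity to a style prompt; the model name, prompts, and furniture list are illustrative assumptions.

```python
# Rough sketch (illustrative guess, not the paper's method): embed furniture
# descriptions and a target style prompt with CLIP's text encoder, then rank
# furniture by cosine similarity to the style.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

furniture = ["a mid-century walnut coffee table",
             "a neon gaming chair",
             "a scandinavian light-oak bookshelf"]
style = "a warm mid-century modern living room"

with torch.no_grad():
    f_emb = model.get_text_features(**tokenizer(furniture, padding=True, return_tensors="pt"))
    s_emb = model.get_text_features(**tokenizer([style], return_tensors="pt"))

scores = torch.nn.functional.cosine_similarity(f_emb, s_emb)
for item, score in sorted(zip(furniture, scores.tolist()), key=lambda t: -t[1]):
    print(f"{score:.3f}  {item}")
```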
Adapting Pretrained Text-to-Text Models for Long Text Sequences
We present an empirical study of adapting an existing pretrained text-to-text model for long-sequence inputs. Through a comprehensive study along three axes of the pretraining pipeline – model architecture, optimization objective, and pret…
Bridging the Training-Inference Gap for Dense Phrase Retrieval
Building dense retrievers requires a series of standard procedures, including training and validating neural models and creating indexes for efficient search. However, these procedures are often misaligned in that training objectives do no…
SCROLLS: Standardized CompaRison Over Long Language Sequences
Uri Shaham, Elad Segal, Maor Ivgi, Avia Efrat, Ori Yoran, Adi Haviv, Ankit Gupta, Wenhan Xiong, Mor Geva, Jonathan Berant, Omer Levy. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022.
NLP benchmarks have largely focused on short texts, such as sentences and paragraphs, even though long texts comprise a considerable amount of natural language in the wild. We introduce SCROLLS, a suite of tasks that require reasoning over…
Simple Local Attentions Remain Competitive for Long-Context Tasks
Wenhan Xiong, Barlas Oguz, Anchit Gupta, Xilun Chen, Diana Liskovich, Omer Levy, Scott Yih, Yashar Mehdad. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022.
Many NLP tasks require processing long contexts beyond the length limit of pretrained models. In order to scale these models to longer text sequences, many efficient long-range attention variants have been proposed. Despite the abundance o…
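For intuition, the sketch below builds a sliding-window ("local") attention mask, one of the simple efficient-attention patterns the title refers to, and applies it in a plain softmax attention step. The window size and masking convention are illustrative choices, not the paper's exact configuration.

```python
# Illustrative sliding-window (local) attention: each position may attend
# only to tokens within `window` positions of itself, reducing the quadratic
# attention cost to roughly O(T * window).
import numpy as np

def local_attention_mask(seq_len: int, window: int) -> np.ndarray:
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window  # True where allowed

def masked_softmax_attention(q, k, v, mask):
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -1e9)                  # block out-of-window positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
T, d = 12, 8
q = k = v = rng.normal(size=(T, d))
out = masked_softmax_attention(q, k, v, local_attention_mask(T, window=2))
print(out.shape)  # (12, 8)
```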
Boosted Dense Retriever
Patrick Lewis, Barlas Oguz, Wenhan Xiong, Fabio Petroni, Scott Yih, Sebastian Riedel. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022.
We propose DrBoost, a dense retrieval ensemble inspired by boosting. DrBoost is trained in stages: each component model is learned sequentially and specialized by focusing only on retrieval mistakes made by the current ensemble. The final …
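The abstract describes staged training in which each new component specializes on the retrieval mistakes of the current ensemble. The toy sketch below mimics that loop with stand-in components (random linear projections chosen greedily) and sums component scores as the ensemble score; the real DrBoost components are learned dense encoders, and the scoring scheme here is an assumption for illustration only.

```python
# Toy sketch of boosting-style staged training for retrieval: each stage adds
# the candidate component that fixes the most queries the current ensemble
# still gets wrong. Components are random projections standing in for
# learned encoders.
import numpy as np

rng = np.random.default_rng(0)
n_queries, n_passages, dim, comp_dim = 50, 200, 64, 8
Q = rng.normal(size=(n_queries, dim))
P = rng.normal(size=(n_passages, dim))
gold = rng.integers(0, n_passages, size=n_queries)       # gold passage per query

def component_scores(W):
    return (Q @ W) @ (P @ W).T                            # (n_queries, n_passages)

ensemble = np.zeros((n_queries, n_passages))
components = []
for stage in range(4):
    wrong = ensemble.argmax(axis=1) != gold               # current retrieval mistakes
    best_W, best_fixed = None, -1
    for _ in range(20):                                   # greedy candidate search
        W = rng.normal(size=(dim, comp_dim)) / np.sqrt(dim)
        s = ensemble + component_scores(W)
        fixed = np.sum(s[wrong].argmax(axis=1) == gold[wrong])
        if fixed > best_fixed:
            best_W, best_fixed = W, fixed
    components.append(best_W)
    ensemble += component_scores(best_W)
    print(f"stage {stage}: accuracy {np.mean(ensemble.argmax(axis=1) == gold):.2f}")
```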
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation
Most existing vision-language pre-training methods focus on understanding tasks and use BERT-like objectives (masked language modeling and image-text matching) during pretraining. Although they perform well in many understanding downstream…
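As a reminder of what the BERT-like masked language modeling objective mentioned in the abstract looks like, here is a minimal token-masking sketch; the 15% masking rate and the placeholder token ids follow the common BERT convention, not details from this paper.

```python
# Minimal masked language modeling (MLM) sketch: randomly replace ~15% of
# token ids with a [MASK] id and keep the originals as prediction targets.
import numpy as np

MASK_ID = 103          # placeholder [MASK] token id (assumption)
IGNORE_INDEX = -100    # positions the loss should ignore

def mask_tokens(token_ids, mask_prob=0.15, seed=0):
    rng = np.random.default_rng(seed)
    ids = np.array(token_ids)
    labels = np.full_like(ids, IGNORE_INDEX)
    mask = rng.random(ids.shape) < mask_prob
    labels[mask] = ids[mask]        # predict the original ids at masked slots
    ids[mask] = MASK_ID
    return ids, labels

inputs, labels = mask_tokens([101, 2023, 2003, 1037, 7279, 102])
print(inputs, labels)
```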