Xilin Wei
SIM-CoT: Supervised Implicit Chain-of-Thought
Implicit Chain-of-Thought (CoT) methods offer a token-efficient alternative to explicit CoT reasoning in Large Language Models (LLMs), but a persistent performance gap has limited their adoption. We identify a core latent instability issue…
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
While Rotary Position Embedding (RoPE) and its variants are widely adopted for their long-context capabilities, the extension of the 1D RoPE to video, with its complex spatio-temporal structure, remains an open challenge. This work first i…
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Creating AI systems that can interact with environments over long periods, similar to human cognition, has been a longstanding research goal. Recent advancements in multimodal large language models (MLLMs) have made significant strides in …
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs
Generating natural and meaningful responses to communicate with multi-modal human inputs is a fundamental capability of Large Vision-Language Models (LVLMs). While current open-source LVLMs demonstrate promising performance in simplified sc…
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
We present the ShareGPT4Video series, aiming to facilitate the video understanding of large video-language models (LVLMs) and the video generation of text-to-video models (T2VMs) via dense and precise captions. The series comprises: 1) Sha…
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
We introduce InternLM-XComposer2, a cutting-edge vision-language model excelling in free-form text-image composition and comprehension. This model goes beyond conventional vision-language understanding, adeptly crafting interleaved text-im…
Exploring a novel seven-gene marker and mitochondrial gene TMEM38A for predicting cervical cancer radiotherapy sensitivity using machine learning algorithms
Background: Radiotherapy plays a crucial role in the management of cervical cancer (CC), as the development of resistance by cancer cells to radiotherapeutic interventions is a significant factor contributing to treatment failure in patient…
CARP: Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning
The CARP dataset consists of 4,886 middle school computation-intensive algebra problems, and each problem is associated with a natural language solution and an annotated EFG. Our annotated EFG explicitly depicts the step-by-step reasoning …
Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning
Chain-of-thought prompting (CoT) and tool augmentation have been validated in recent work as effective practices for improving large language models (LLMs) to perform step-by-step reasoning on complex math-related tasks. However, most exis…