Explanipedia

Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models

Dannong Xu, Wei Pang · 2025

The rapid growth of visual tokens in multimodal large language models (MLLMs) leads to excessive memory consumption and inference latency, especially when handling high-resolution images and videos. Token pruning is a technique used to mit…

Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents

Jun Chen, Dannong Xu, Junjie Fei, Chun-Mei Feng, Mohamed Elhoseiny · 2024

Large multimodal models (LMMs) have achieved impressive progress in vision-language understanding, yet they face limitations in real-world applications requiring complex reasoning over a large number of images. Existing benchmarks for mult…

Comparison and analysis of the machine learning in the movie subtitle document classification

Buxiao Chu, Dannong Xu, Dongze Wu · 2023

With the rapid iteration of film and technology, the quantity of movie-related subtitle documents has expanded rapidly. However, the issue of classifying these documents is becoming non-negligible. This paper aims to compare different …
