Dannong Xu
Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models
The rapid growth of visual tokens in multimodal large language models (MLLMs) leads to excessive memory consumption and inference latency, especially when handling high-resolution images and videos. Token pruning is a technique used to mit…
Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents
Large multimodal models (LMMs) have achieved impressive progress in vision-language understanding, yet they face limitations in real-world applications requiring complex reasoning over a large number of images. Existing benchmarks for mult…
Comparison and analysis of the machine learning in the movie subtitle document classification
With the rapid iteration of film and technology, the number of movie-related subtitle documents has expanded rapidly. However, the issue of classifying these documents is becoming non-negligible. This paper aims to compare different …