Hui Xiong
See&Trek: Training-Free Spatial Prompting for Multimodal Large Language Model
We introduce SEE&TREK, the first training-free prompting framework tailored to enhance the spatial understanding of Multimodal Large Language Models (MLLMs) under vision-only constraints. While prior efforts have incorporated modalities li…
VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding
Multimodal large language models (MLLMs) demonstrate exceptional performance in vision-language tasks, yet their processing of long videos is constrained by input context length and high computational costs. Sparse frame sampling thus beco…
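The abstract above mentions sparse frame sampling as a way to fit long videos into a limited input context. As a generic illustration only (this is not the VSI paper's method; the function name and frame budget are assumptions), uniform sparse sampling can be sketched as:

```python
# Minimal sketch of uniform sparse frame sampling for long-video input.
# Hypothetical helper; the real VSI method selects keyframes, not just
# uniformly spaced frames.

def sample_frame_indices(num_frames: int, budget: int) -> list[int]:
    """Pick at most `budget` frame indices from a video of `num_frames`
    frames, evenly spaced and in temporal order."""
    if num_frames <= budget:
        return list(range(num_frames))
    step = num_frames / budget  # fractional stride between picks
    return [int(i * step) for i in range(budget)]

# A 100-frame clip reduced to an 8-frame budget:
print(sample_frame_indices(100, 8))  # → [0, 12, 25, 37, 50, 62, 75, 87]
```

The point of such sampling is that MLLM cost grows with the number of frames fed in, so the budget, not the video length, bounds compute.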
Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation
Effective content moderation is essential for video platforms to safeguard user experience and uphold community standards. While traditional video classification models effectively handle well-defined moderation tasks, they struggle with c…
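The title describes a filter-and-refine cascade. In general terms (a sketch of the generic cascade pattern, not the paper's actual system; the scorer names and thresholds are assumptions), the idea is to let a cheap first-stage model discard easy negatives so an expensive model only sees the remainder:

```python
# Generic two-stage filter-and-refine cascade. Scorers and thresholds
# are placeholders, not the paper's components.

def cascade(items, cheap_score, expensive_score, filter_threshold=0.2):
    """Return items flagged by the expensive model, after a cheap
    high-recall filter drops clearly benign items early."""
    flagged = []
    for item in items:
        if cheap_score(item) < filter_threshold:
            continue  # clearly benign: skip the costly stage
        if expensive_score(item) >= 0.5:  # high-precision second stage
            flagged.append(item)
    return flagged

# Toy scorers: risk proportional to the item value.
risk = lambda x: x / 10
print(cascade([1, 2, 3, 6, 9], risk, risk))  # → [6, 9]
```

The cascade trades a small recall loss at the filter stage for a large reduction in expensive-model invocations, which is what makes it viable at industrial scale.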
Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study
Large language models (LLMs) have shown impressive capabilities across tasks such as mathematics, coding, and reasoning, yet their learning ability, which is crucial for adapting to dynamic environments and acquiring new knowledge, remains…
Deep Generative Architectures for Automated Music Composition: Optimizing Neural Structures and Multimodal Inputs for Style-Conscious Melody and Harmony Generation
This study explores the application of deep generative models in the field of intelligent composition, focusing on the impact of network architecture optimization and multimodal input integration on music style fidelity and emotional expre…
ScIRGen: Synthesize Realistic and Large-Scale RAG Dataset for Scientific Research
Scientific researchers need intensive information about datasets to effectively evaluate and develop theories and methodologies. The information needs regarding datasets are implicitly embedded in particular research tasks, rather than exp…
On the Transferability and Discriminability of Representation Learning in Unsupervised Domain Adaptation
In this paper, we addressed the limitation of relying solely on distribution alignment and source-domain empirical risk minimization in Unsupervised Domain Adaptation (UDA). Our information-theoretic analysis showed that this standard adve…
LLMs as Better Recommenders with Natural Language Collaborative Signals: A Self-Assessing Retrieval Approach
Incorporating collaborative information (CI) effectively is crucial for leveraging LLMs in recommendation tasks. Existing approaches often encode CI using soft tokens or abstract identifiers, which introduces a semantic misalignment with t…
GCAL: Adapting Graph Models to Evolving Domain Shifts
This paper addresses the challenge of graph domain adaptation on evolving, multiple out-of-distribution (OOD) graphs. Conventional graph domain adaptation methods are confined to single-step adaptation, making them ineffective in handling …
Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs
Large language models (LLMs) excel at complex tasks thanks to advances in their reasoning abilities. However, existing methods overlook the trade-off between reasoning effectiveness and efficiency, often encouraging unnecessarily long reas…
From Events to Enhancement: A Survey on Event-Based Imaging Technologies
Event cameras, offering high dynamic range and low latency, have emerged as disruptive technologies in imaging. Despite growing research on leveraging these benefits for different imaging tasks, a comprehensive study of recent advances and…
Unleashing the Power of Large Language Model for Denoising Recommendation
Recommender systems are crucial for personalizing user experiences but often depend on implicit feedback data, which can be noisy and misleading. Existing denoising studies involve incorporating auxiliary information or learning strategies…
A Survey of Reasoning with Foundation Models: Concepts, Methodologies, and Outlook
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artifi…
TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning
Large language models (LLMs) have shown promise in automating travel planning, yet they often fall short in addressing nuanced spatiotemporal rationality. While existing benchmarks focus on basic plan validity, they neglect critical aspect…
Cognitive Disentanglement for Referring Multi-Object Tracking
As a significant application of multi-source information fusion in intelligent transportation perception systems, Referring Multi-Object Tracking (RMOT) involves localizing and tracking specific objects in video sequences based on language…
From Understanding to Excelling: Template-Free Algorithm Design through Structural-Functional Co-Evolution
Large language models (LLMs) have greatly accelerated the automation of algorithm generation and optimization. However, current methods such as EoH and FunSearch mainly rely on predefined templates and expert-specified functions that focus…
SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models
In recent years, the rapid advancement of Artificial Intelligence (AI) technologies, particularly Large Language Models (LLMs), has revolutionized the paradigm of scientific discovery, establishing AI-for-Science (AI4Science) as a dynamic …