Explanipedia

Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum

Zhuoning Guo, Mingxin Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, et al. · 2025

The prevailing video retrieval paradigm is structurally misaligned, as narrow benchmarks incentivize correspondingly limited data and single-task training. Therefore, universal capability is suppressed due to the absence of a diagnostic ev…

E2Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker

Qi Liu, Yanzhao Zhang, Mingxin Li, Dingkun Long, Pengjun Xie, et al. · 2025

Text embedding models serve as a fundamental component in real-world search applications. By mapping queries and documents into a shared embedding space, they deliver competitive retrieval performance with high efficiency. However, their r…

Supervised Fine-Tuning or Contrastive Learning? Towards Better Multimodal LLM Reranking

Ziqi Dai, Dingkun Long, Pengjun Xie, Meishan Zhang, Wenjie Li, et al. · 2025

In information retrieval, training reranking models mainly focuses on two types of objectives: metric learning (e.g. contrastive loss to increase the predicted scores on relevant query-document pairs) and classification (binary label predi…

Demystifying deep search: a holistic evaluation with hint-free multi-hop questions and factorised metrics

Michael Song, R H Liu, Xinyu Wang, Yong Jiang, Pengjun Xie, et al. · 2025

RAG (Retrieval-Augmented Generation) systems and web agents are increasingly evaluated on multi-hop deep search tasks, yet current practice suffers from two major limitations. First, most benchmarks leak the reasoning path in the question …

Scaling Generalist Data-Analytic Agents

Shuofei Qiao, Yanqiu Zhao, Zhisong Qiu, Xiaobin Wang, Jintian Zhang, et al. · 2025

Data-analytic agents are emerging as a key catalyst for automated scientific discovery and for the vision of Innovating AI. Current approaches, however, rely heavily on prompt engineering over proprietary models, while open-source models s…

Towards General Agentic Intelligence via Environment Scaling

Runnan Fang, Shan Cai, Baixuan Li, Jialong Wu, Guangyu Li, et al. · 2025

Advanced agentic intelligence is a prerequisite for deploying Large Language Models in practical, real-world applications. Diverse real-world APIs demand precise, robust function-calling intelligence, which needs agents to develop these ca…

Scaling Agents via Continual Pre-training

Liping Su, Zhen Zhang, Guangyu Li, Sanyuan Chen, Chenxi Wang, et al. · 2025

Large language models (LLMs) have evolved into agentic systems capable of autonomous tool use and multi-step reasoning for complex problem-solving. However, post-training approaches building upon general-purpose foundation models consisten…

WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents

Zile Qiao, Guoxin Chen, Xuanzhong Chen, Donglei Yu, Wenbiao Yin, et al. · 2025

Recent advances in deep-research systems have demonstrated the potential for AI agents to autonomously discover and synthesize knowledge from external sources. In this paper, we introduce WebResearcher, a novel framework for building such …

WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

Kuan‐Ching Li, Zhongwang Zhang, Haowen Yin, Rui Ye, Yida Zhao, et al. · 2025

Transcending human cognitive limitations represents a critical frontier in LLM training. Proprietary agentic systems like DeepResearch have demonstrated superhuman capabilities on extremely complex information-seeking benchmarks such as Br…

WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research

Zijian Li, Xin Guan, Jie Zhang, Shen Huang, Houquan Zhou, et al. · 2025

This paper tackles open-ended deep research (OEDR), a complex challenge where AI agents must synthesize vast web-scale information into insightful reports. Current approaches are plagued by dual-fold limitations: static research p…

Memp: Exploring Agent Procedural Memory

Runnan Fang, Yuan Liang, Jing Wu, Shuofei Qiao, Pengjun Xie, et al. · 2025

Large Language Models (LLMs) based agents excel at diverse tasks, yet they suffer from brittle procedural memory that is manually engineered or entangled in static parameters. In this work, we investigate strategies to endow agents with a …

DynamicBench: Evaluating Real-Time Report Generation in Large Language Models

Jingyao Li, Hao Sun, Zile Qiao, Yong Jiang, Pengjun Xie, et al. · 2025

Traditional benchmarks for large language models (LLMs) typically rely on static evaluations through storytelling or opinion expression, which fail to capture the dynamic requirements of real-time information processing in contemporary app…

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Yanzhao Zhang, Mingxin Li, Dingkun Long, Xintong Zhang, Huan Lin, et al. · 2025

In this work, we introduce the Qwen3 Embedding series, a significant advancement over its predecessor, the GTE-Qwen series, in text embedding and reranking capabilities, built upon the Qwen3 foundation models. Leveraging the Qwen3 LLMs' ro…

WebDancer: Towards Autonomous Information Seeking Agency

Jing Wu, Baixuan Li, Runnan Fang, Weihua Yin, Liwen Zhang, et al. · 2025

Addressing intricate real-world problems necessitates in-depth information seeking and multi-step reasoning. Recent progress in agentic systems, exemplified by Deep Research, underscores the potential for autonomous multi-step research. In…

VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning

Qiuchen Wang, Ruixue Ding, Yu Zeng, Ze-hui Chen, Lin Chen, et al. · 2025

Effectively retrieving, reasoning and understanding visually rich information remains a challenge for RAG methods. Traditional text-based methods cannot handle visual-related information. On the other hand, current vision-based RAG approac…

EvolveSearch: An Iterative Self-Evolving Search Agent

Dingchu Zhang, Yida Zhao, Jing Wu, Baixuan Li, Weihua Yin, et al. · 2025

The rapid advancement of large language models (LLMs) has transformed the landscape of agentic information seeking capabilities through the integration of tools such as search engines and web browsers. However, current mainstream approache…

MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability

Weiqi Wu, Xin‐Yuan Guan, Shen Huang, Yong Jiang, Pengjun Xie, et al. · 2025

Retrieval-Augmented Language Models (RALMs) represent a classic paradigm where models enhance generative capabilities using external knowledge retrieved via a specialized module. Recent advancements in Agent techniques enable Large Languag…

ZeroSearch: Incentivize the Search Capability of LLMs without Searching

Haowei Sun, Zile Qiao, Jiayan Guo, Yingyan Hou, Pengjun Xie, et al. · 2025

Effective information searching is essential for enhancing the reasoning and generation capabilities of large language models (LLMs). Recent research has explored using reinforcement learning (RL) to improve LLMs' search capabilities by in…

Agentic Knowledgeable Self-awareness

Shuofei Qiao, Zhisong Qiu, Baolong Ren, Xiangyuan Ru, Ningyu Zhang, et al. · 2025

Large Language Models (LLMs) have achieved considerable performance across various agentic planning tasks. However, traditional agent planning approaches adopt a "flood irrigation" methodology that indiscriminately injects gold trajectorie…

SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement

Runnan Fang, Xiaobin Wang, Yuan Liang, Shuofei Qiao, Jing Wu, et al. · 2025

In the interaction between agents and their environments, agents expand their capabilities by planning and executing actions. However, LLM-based agents face substantial challenges when deployed in novel environments or required to navigate…

Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference

Zhuo Chen, Xinyu Wang, Yong Jiang, Zhen Zhang, Xinyu Geng, et al. · 2025

Despite the advancements made in Vision Large Language Models (VLLMs), like text Large Language Models (LLMs), they have limitations in addressing questions that require real-time information or are knowledge-intensive. Indiscriminately ad…

ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents

Qiuchen Wang, Ruixue Ding, Zehui Chen, WU Wei-qi, Shihang Wang, et al. · 2025

Understanding information from visually rich documents remains a significant challenge for traditional Retrieval-Augmented Generation (RAG) methods. Existing benchmarks predominantly focus on image-based question answering (QA), overlookin…

Towards Text-Image Interleaved Retrieval

Xin Zhang, Ziqi Dai, Yongqi Li, Yanzhao Zhang, Dingkun Long, et al. · 2025

Current multimodal information retrieval studies mainly focus on single-image inputs, which limits real-world applications involving multiple images and text-image interleaved content. In this work, we introduce the text-image interleaved …

LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs -- No Silver Bullet for LC or RAG Routing

Kuan Li, Liwen Zhang, Yong Jiang, Pengjun Xie, Fei Huang, et al. · 2025

Effectively incorporating external knowledge into Large Language Models (LLMs) is crucial for enhancing their capabilities and addressing real-world needs. Retrieval-Augmented Generation (RAG) offers an effective method for achieving this …

OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking

Zekun Xi, Wenbiao Yin, J. Fang, Jialong Wu, Runnan Fang, et al. · 2025

Machine writing with large language models often relies on retrieval-augmented generation. However, these approaches remain confined within the boundaries of the model's predefined scope, limiting the generation of content with rich inform…

Unsupervised Query Routing for Retrieval Augmented Generation

Feiteng Mu, Liwen Zhang, Yong Jiang, Wenjie Li, Zhen Zhang, et al. · 2025

Query routing for retrieval-augmented generation aims to assign an input query to the most suitable search engine. Existing works rely heavily on supervised datasets that require extensive manual annotation, resulting in high costs and lim…

WebWalker: Benchmarking LLMs in Web Traversal

Jing Wu, Weihua Yin, Yong Jiang, Zhenglin Wang, Zekun Xi, et al. · 2025

Retrieval-augmented generation (RAG) demonstrates remarkable performance across tasks in open-domain question-answering. However, traditional search engines may retrieve shallow content, limiting the ability of LLMs to handle complex, mult…

ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions

Jingheng Ye, Yong Jiang, Xiaobin Wang, Yinghui Li, Yangning Li, et al. · 2025

Agentic Knowledgeable Self-awareness

Shuofei Qiao, Zhisong Qiu, Baolong Ren, Xiaobin Wang, Xiangyuan Ru, et al. · 2025

Pengjun Xie