Pengjun Xie
Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum
The prevailing video retrieval paradigm is structurally misaligned, as narrow benchmarks incentivize correspondingly limited data and single-task training. Therefore, universal capability is suppressed due to the absence of a diagnostic ev…
E2Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker
Text embedding models serve as a fundamental component in real-world search applications. By mapping queries and documents into a shared embedding space, they deliver competitive retrieval performance with high efficiency. However, their r…
Supervised Fine-Tuning or Contrastive Learning? Towards Better Multimodal LLM Reranking
In information retrieval, training reranking models mainly focuses on two types of objectives: metric learning (e.g. contrastive loss to increase the predicted scores on relevant query-document pairs) and classification (binary label predi…
Demystifying deep search: a holistic evaluation with hint-free multi-hop questions and factorised metrics
RAG (Retrieval-Augmented Generation) systems and web agents are increasingly evaluated on multi-hop deep search tasks, yet current practice suffers from two major limitations. First, most benchmarks leak the reasoning path in the question …
Scaling Generalist Data-Analytic Agents
Data-analytic agents are emerging as a key catalyst for automated scientific discovery and for the vision of Innovating AI. Current approaches, however, rely heavily on prompt engineering over proprietary models, while open-source models s…
Towards General Agentic Intelligence via Environment Scaling
Advanced agentic intelligence is a prerequisite for deploying Large Language Models in practical, real-world applications. Diverse real-world APIs demand precise, robust function-calling intelligence, which needs agents to develop these ca…
Scaling Agents via Continual Pre-training
Large language models (LLMs) have evolved into agentic systems capable of autonomous tool use and multi-step reasoning for complex problem-solving. However, post-training approaches building upon general-purpose foundation models consisten…
WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents
Recent advances in deep-research systems have demonstrated the potential for AI agents to autonomously discover and synthesize knowledge from external sources. In this paper, we introduce WebResearcher, a novel framework for building such …
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
Transcending human cognitive limitations represents a critical frontier in LLM training. Proprietary agentic systems like DeepResearch have demonstrated superhuman capabilities on extremely complex information-seeking benchmarks such as Br…
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research
This paper tackles open-ended deep research (OEDR), a complex challenge where AI agents must synthesize vast web-scale information into insightful reports. Current approaches are plagued by dual-fold limitations: static research p…
Memp: Exploring Agent Procedural Memory
Large Language Model (LLM)-based agents excel at diverse tasks, yet they suffer from brittle procedural memory that is manually engineered or entangled in static parameters. In this work, we investigate strategies to endow agents with a …
DynamicBench: Evaluating Real-Time Report Generation in Large Language Models
Traditional benchmarks for large language models (LLMs) typically rely on static evaluations through storytelling or opinion expression, which fail to capture the dynamic requirements of real-time information processing in contemporary app…
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
In this work, we introduce the Qwen3 Embedding series, a significant advancement over its predecessor, the GTE-Qwen series, in text embedding and reranking capabilities, built upon the Qwen3 foundation models. Leveraging the Qwen3 LLMs' ro…
WebDancer: Towards Autonomous Information Seeking Agency
Addressing intricate real-world problems necessitates in-depth information seeking and multi-step reasoning. Recent progress in agentic systems, exemplified by Deep Research, underscores the potential for autonomous multi-step research. In…
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning
Effectively retrieving, reasoning and understanding visually rich information remains a challenge for RAG methods. Traditional text-based methods cannot handle visual-related information. On the other hand, current vision-based RAG approac…
EvolveSearch: An Iterative Self-Evolving Search Agent
The rapid advancement of large language models (LLMs) has transformed the landscape of agentic information seeking capabilities through the integration of tools such as search engines and web browsers. However, current mainstream approache…
MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability
Retrieval-Augmented Language Models (RALMs) represent a classic paradigm where models enhance generative capabilities using external knowledge retrieved via a specialized module. Recent advancements in Agent techniques enable Large Languag…
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
Effective information searching is essential for enhancing the reasoning and generation capabilities of large language models (LLMs). Recent research has explored using reinforcement learning (RL) to improve LLMs' search capabilities by in…
Agentic Knowledgeable Self-awareness
Large Language Models (LLMs) have achieved considerable performance across various agentic planning tasks. However, traditional agent planning approaches adopt a "flood irrigation" methodology that indiscriminately injects gold trajectorie…
SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement
In the interaction between agents and their environments, agents expand their capabilities by planning and executing actions. However, LLM-based agents face substantial challenges when deployed in novel environments or required to navigate…
Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference
Despite the advancements made in Vision Large Language Models (VLLMs), like text Large Language Models (LLMs), they have limitations in addressing questions that require real-time information or are knowledge-intensive. Indiscriminately ad…
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
Understanding information from visually rich documents remains a significant challenge for traditional Retrieval-Augmented Generation (RAG) methods. Existing benchmarks predominantly focus on image-based question answering (QA), overlookin…
Towards Text-Image Interleaved Retrieval
Current multimodal information retrieval studies mainly focus on single-image inputs, which limits real-world applications involving multiple images and text-image interleaved content. In this work, we introduce the text-image interleaved …
LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs -- No Silver Bullet for LC or RAG Routing
Effectively incorporating external knowledge into Large Language Models (LLMs) is crucial for enhancing their capabilities and addressing real-world needs. Retrieval-Augmented Generation (RAG) offers an effective method for achieving this …
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
Machine writing with large language models often relies on retrieval-augmented generation. However, these approaches remain confined within the boundaries of the model's predefined scope, limiting the generation of content with rich inform…
Unsupervised Query Routing for Retrieval Augmented Generation
Query routing for retrieval-augmented generation aims to assign an input query to the most suitable search engine. Existing works rely heavily on supervised datasets that require extensive manual annotation, resulting in high costs and lim…
WebWalker: Benchmarking LLMs in Web Traversal
Retrieval-augmented generation (RAG) demonstrates remarkable performance across tasks in open-domain question-answering. However, traditional search engines may retrieve shallow content, limiting the ability of LLMs to handle complex, mult…
ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions