Zenglin Xu
Bridging VLMs and Embodied Intelligence with Deliberate Practice Policy Optimization
Developing a universal and versatile embodied intelligence system presents two primary challenges: the critical embodied data bottleneck, where real-world data is scarce and expensive, and the algorithmic inefficiency of existing methods, …
A Polynomial-time Algorithm for Online Sparse Linear Regression with Improved Regret Bound under Weaker Conditions
In this paper, we study the problem of online sparse linear regression (OSLR) where the algorithms are restricted to accessing only $k$ out of $d$ attributes per instance for prediction, which was proved to be NP-hard. Previous work gave p…
NEXUS-O: An Omni-Perceptive and -Interactive Model for Language, Audio, and Vision
IndexNet: Timestamp and Variable-Aware Modeling for Time Series Forecasting
Multivariate time series forecasting (MTSF) plays a vital role in a wide range of real-world applications, such as weather prediction and traffic flow forecasting. Although recent advances have significantly improved the modeling of tempor…
From Implicit Exploration to Structured Reasoning: Leveraging Guideline and Refinement for LLMs
Large language models (LLMs) have advanced general-purpose reasoning, showing strong performance across diverse tasks. However, existing methods often rely on implicit exploration, where the model follows stochastic and unguided reasoning …
Introducing Academia AI and Applications: a new platform for responsible and interdisciplinary AI research
FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation
Modern recommendation systems face significant challenges in processing multimodal sequential data, particularly in temporal dynamics modeling and information flow coordination. Traditional approaches struggle with distribution discrepanci…
Efficient Network Automatic Relevance Determination
We propose Network Automatic Relevance Determination (NARD), an extension of ARD for linearly probabilistic models, to simultaneously model sparse relationships between inputs $X \in \mathbb R^{d \times N}$ and outputs $Y \in \mathbb R^{m …
Simple Yet Effective: Extracting Private Data Across Clients in Federated Fine-Tuning of Large Language Models
Federated fine-tuning of large language models (FedLLMs) presents a promising approach for achieving strong model performance while preserving data privacy in sensitive domains. However, the inherent memorization ability of LLMs makes them…
Heterogeneous Group-Based Reinforcement Learning for LLM-based Multi-Agent Systems
Large Language Models (LLMs) have achieved remarkable success across diverse natural language processing tasks, yet their deployment in real-world applications is hindered by fixed knowledge cutoffs and difficulties in generating controlla…
FL@FM-TheWebConf'25: International Workshop on Federated Foundation Models for the Web 2025
GeoPro-Net: Learning Interpretable Spatiotemporal Prediction Models Through Statistically-Guided Geo-Prototyping
The problem of forecasting spatiotemporal events such as crimes and accidents is crucial to public safety and city management. Besides accuracy, interpretability is also a key requirement for spatiotemporal forecasting models to justify th…
Position: LLMs Can be Good Tutors in English Education
While recent efforts have begun integrating large language models (LLMs) into English education, they often rely on traditional approaches to learning tasks without fully embracing educational methodologies, thus lacking adaptability to la…
From Implicit Exploration to Structured Reasoning: Guideline and Refinement for LLMs
Towards Incomplete Multimodal Learning with Prompt-Based Hierarchical Knowledge Distillation
From Continuous Pre-Training to Alignment: A Comprehensive Toolkit for Large Language Models in Federated Learning
Enhancing Progressive Ensemble Learning via Normalized Extra-Gradient Initialization
FinHEAR: Human Expertise and Adaptive Risk-Aware Temporal Reasoning for Financial Decision-Making
GeoPro-Net: Learning Interpretable Spatiotemporal Prediction Models through Statistically-Guided Geo-Prototyping
The problem of forecasting spatiotemporal events such as crimes and accidents is crucial to public safety and city management. Besides accuracy, interpretability is also a key requirement for spatiotemporal forecasting models to justify th…
CENTAUR: Bridging the Impossible Trinity of Privacy, Efficiency, and Performance in Privacy-Preserving Transformer Inference
With the growing deployment of pre-trained models like Transformers on cloud platforms, privacy concerns about model parameters and inference data are intensifying. Existing Privacy-Preserving Transformer Inference (PPTI) frameworks face t…
Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack
Recent advancements in pre-trained large language models (LLMs) have significantly influenced various domains. Adapting these models for specific tasks often involves fine-tuning (FT) with private, domain-specific data. However, privacy co…
Understanding Generalization of Federated Learning: the Trade-off between Model Stability and Optimization
Federated Learning (FL) is a distributed learning approach that trains machine learning models across multiple devices while keeping their local data private. However, FL often faces challenges due to data heterogeneity, leading to inconsi…
CUPID: Improving Battle Fairness and Position Satisfaction in Online MOBA Games with a Re-matchmaking System
The multiplayer online battle arena (MOBA) genre has gained significant popularity and economic success, attracting considerable research interest within the Human-Computer Interaction community. Enhancing the gaming experience requires a …
MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning
In recent years, multimodal benchmarks for general domains have guided the rapid development of multimodal models on general tasks. However, the financial field has its peculiarities. It features unique graphical images (e.g., candlestick …
FedCVD: The First Real-World Federated Learning Benchmark on Cardiovascular Disease Data
Cardiovascular diseases (CVDs) are currently the leading cause of death worldwide, highlighting the critical need for early diagnosis and treatment. Machine learning (ML) methods can help diagnose CVDs early, but their performance relies o…
TimeCNN: Refining Cross-Variable Interaction on Time Point for Time Series Forecasting
Time series forecasting is extensively applied across diverse domains. Transformer-based models demonstrate significant potential in modeling cross-time and cross-variable interaction. However, we notice that the cross-variable correlation…
M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning
Multimodal Large Language Models (MLLMs) demonstrate remarkable performance across a wide range of domains, with increasing emphasis on enhancing their zero-shot generalization capabilities for unseen tasks across various modalities. Instr…
Can we only use guideline instead of shot in prompt?
Currently, prompting techniques can be mainly divided into two categories: 1) Shot method implicitly inspires the model to answer the question by mimicking the steps in the given example, e.g., the few-shot CoT. 2) Guideline method explicitly…