Explanipedia

Sensitivity-Aware Post-Training Quantization for Deep Neural Networks Open

Zhicong Zheng, Haokun Li, Yi‐Jen Chen, Mingkui Tan, Qingguo Du · 2025

Model quantization reduces neural network parameter precision to achieve compression, but often compromises accuracy. Existing post-training quantization (PTQ) methods employ iterative parameter updates to preserve accuracy under high comp…

Adapt in the Wild: Test-Time Entropy Minimization with Sharpness and Feature Regularization Open

Shuaicheng Niu, Guohao Chen, Deyu Chen, Yifan Zhang, Jiaxiang Wu, et al. · 2025

Test-time adaptation (TTA) may fail to improve or even harm the model performance when test data have: 1) mixed distribution shifts, 2) small batch sizes, 3) online imbalanced label distribution shifts. This is often a key obstacle prevent…

Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis Open

Zhuokun Chen, Jugang Fan, Zhuowei Yu, Bohan Zhuang, Mingkui Tan · 2025

Visual autoregressive modeling, based on the next-scale prediction paradigm, exhibits notable advantages in image quality and model scalability over traditional autoregressive and diffusion models. It generates images by progressively refi…

Deep Electromagnetic Structure Design Under Limited Evaluation Budgets Open

Shijian Zheng, F. Jin, Shuhai Zhang, Quan Xue, Mingkui Tan · 2025

Electromagnetic structure (EMS) design plays a critical role in developing advanced antennas and materials, but remains challenging due to high-dimensional design spaces and expensive evaluations. While existing methods commonly employ hig…

Curse of High Dimensionality Issue in Transformer for Long-context Modeling Open

Shuhai Zhang, You Zeng, Yuheng Chen, Zhiquan Wen, Qianyue Wang, et al. · 2025

Transformer-based large language models (LLMs) excel in natural language processing tasks by capturing long-range dependencies through self-attention mechanisms. However, long-context modeling faces significant computational inefficiencies…

Test-Time Learning for Large Language Models Open

Jinwu Hu, Zhitian Zhang, Guohao Chen, Xin-Jian Wen, Chao Shuai, et al. · 2025

While Large Language Models (LLMs) have exhibited remarkable emergent capabilities through extensive pre-training, they still face critical limitations in generalizing to specialized domains and handling diverse linguistic variations, know…

Enhancing User-Oriented Proactivity in Open-Domain Dialogues with Critic Guidance Open

Yijie Wang, Jinwu Hu, Ziteng Huang, Kunyang Lin, Zitian Zhang, et al. · 2025

Open-domain dialogue systems aim to generate natural and engaging conversations, providing significant practical value in real applications such as social robotics and personal assistants. The advent of large language models (LLMs) has gre…

CHRIS: Clothed Human Reconstruction with Side View Consistency Open

Dong Liu, Yifan Yang, Zixiong Huang, Yuxin Gao, Mingkui Tan · 2025

Creating a realistic clothed human from a single-view RGB image is crucial for applications like mixed reality and filmmaking. Despite some progress in recent years, mainstream methods often fail to fully utilize side-view information, as …

Dynamic Compressing Prompts for Efficient Inference of Large Language Models Open

Jicheng Hu, Wei Zhang, Yufeng Wang, Yu‐Chen Hu, Bin Xiao, et al. · 2025

Large Language Models (LLMs) have shown outstanding performance across a variety of tasks, partly due to advanced prompting techniques. However, these techniques often require lengthy prompts, which increase computational costs and can hin…

Understanding Emotional Body Expressions via Large Language Models Open

Hai‐Feng Lu, Junwen Chen, Feng Liang, Mingkui Tan, Runhao Zeng, et al. · 2025

Emotion recognition based on body movements is vital in human-computer interaction. However, existing emotion recognition methods predominantly focus on enhancing classification accuracy, often neglecting the provision of textual explanati…

Zero-Shot Skeleton-Based Action Recognition With Prototype-Guided Feature Alignment Open

Kai Zhou, Shuhai Zhang, Zeng You, Jinwu Hu, Mingkui Tan, et al. · 2025

Computer science Philosophy

Zero-shot skeleton-based action recognition aims to classify unseen skeleton-based human actions without prior exposure to such categories during training. This task is extremely challenging due to the difficulty in generalizing from known…

Daily Assistance for Amyotrophic Lateral Sclerosis Patients Based on a Wearable Multimodal Brain-Computer Interface Mouse Open

Ya Jiang, Kendi Li, Y X Liang, Di Chen, Mingkui Tan, et al. · 2024

Medicine Computer science Psychology

Amyotrophic lateral sclerosis (ALS) is a chronic, progressive neurodegenerative disease that mainly causes damage to upper and lower motor neurons. This leads to a progressive deterioration in the voluntary mobility of the upper and lower …

Core Context Aware Transformers for Long Context Language Modeling Open

Yuheng Chen, You Zeng, Shuhai Zhang, Haokun Li, Yuanlong Li, et al. · 2024

Computer science History

Transformer-based Large Language Models (LLMs) have exhibited remarkable success in extensive tasks primarily attributed to self-attention mechanism, which requires a token to consider all preceding tokens as its context to compute attenti…

Understanding Emotional Body Expressions via Large Language Models Open

Hai‐Feng Lu, Junwen Chen, Feng Liang, Mingkui Tan, Runhao Zeng, et al. · 2024

Psychology Computer science

Emotion recognition based on body movements is vital in human-computer interaction. However, existing emotion recognition methods predominantly focus on enhancing classification accuracy, often neglecting the provision of textual explanati…

Adversarial Purification by Consistency-aware Latent Space Optimization on Data Manifolds Open

Shuhai Zhang, Jiahao Yang, Hui Luo, Jie Chen, Li Wang, et al. · 2024

Computer science Mathematics

Deep neural networks (DNNs) are vulnerable to adversarial samples crafted by adding imperceptible perturbations to clean data, potentially leading to incorrect and dangerous predictions. Adversarial purification has been an effective means…

Efficient Dynamic Ensembling for Multiple LLM Experts Open

Jinwu Hu, Yufeng Wang, Shuhai Zhang, Kai Zhou, Guohao Chen, et al. · 2024

Computer science

LLMs have demonstrated impressive performance across various language tasks. However, the strengths of LLMs can vary due to different architectures, model sizes, areas of training data, etc. Therefore, ensemble reasoning for the strengths …

Towards Long Video Understanding via Fine-detailed Video Story Generation Open

You Zeng, Zhiquan Wen, Yaofo Chen, Xin Li, Runhao Zeng, et al. · 2024

Computer science

Long video understanding has become a critical task in computer vision, driving advancements across numerous applications from surveillance to content retrieval. Existing video understanding methods suffer from two challenges when dealing …

Enhancing Perception Capabilities of Multimodal LLMs with Training-Free Fusion Open

Zhuokun Chen, Jennifer Hu, Zeshuai Deng, Yufeng Wang, Bohan Zhuang, et al. · 2024

Psychology Computer science Geography

Multimodal LLMs (MLLMs) equip language models with visual capabilities by aligning vision encoders with language models. Existing methods to enhance the visual perception of MLLMs often involve designing more powerful vision encoders, whic…

LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences Open

Hongyan Zhi, Peihao Chen, Junyan Li, Shuailei Ma, Xinyu Sun, et al. · 2024

Computer science Psychology

Research on 3D Vision-Language Models (3D-VLMs) is gaining increasing attention, which is crucial for developing embodied AI within 3D scenes, such as visual navigation and embodied question answering. Due to the high density of visual fea…

Open-World Drone Active Tracking with Goal-Centered Rewards Open

Haowei Sun, Jinwu Hu, Zhirui Zhang, Haoyuan Tian, Xianyan Xie, et al. · 2024

Computer science Geography Psychology

Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations, providing a more practical solution for effective tracking in dynamic environments. However, accurate D…

Robust 3D Semantic Occupancy Prediction with Calibration-free Spatial Transformation Open

Zhuangwei Zhuang, Ziyin Wang, Sitao Chen, Lizhao Liu, Hui Luo, et al. · 2024

Computer science Mathematics Engineering

3D semantic occupancy prediction, which seeks to provide accurate and comprehensive representations of environment scenes, is important to autonomous driving systems. For autonomous cars equipped with multi-camera and LiDAR, it is critical…

A protein fitness predictive framework based on feature combination and intelligent searching Open

Zhihui Zhang, Zhixuan Li, Qianyue Wang, Hanlin Wu, Manli Yang, et al. · 2024

Computer science Biology Mathematics

Machine learning (ML) constructs predictive models by understanding the relationship between protein sequences and their functions, enabling efficient identification of protein sequences with high fitness values without falling into local …

Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs Open

Yanyuan Qiao, Wenqi Lyu, Hui Wang, Zixu Wang, Zerui Li, et al. · 2024

Computer science Chemistry Philosophy

Vision-and-Language Navigation (VLN) tasks require an agent to follow textual instructions to navigate through 3D environments. Traditional approaches use supervised learning methods, relying heavily on domain-specific datasets to train VL…

CoNav: A Benchmark for Human-Centered Collaborative Navigation Open

Changhao Li, Xinyu Sun, Peihao Chen, Jugang Fan, Zixu Wang, et al. · 2024

Computer science Geography

Human-robot collaboration, in which the robot intelligently assists the human with the upcoming task, is an appealing objective. To achieve this goal, the agent needs to be equipped with a fundamental collaborative navigation ability, wher…

MAGIC: Map-Guided Few-Shot Audio-Visual Acoustics Modeling Open

Diwei Huang, Kunyang Lin, Peihao Chen, Qing Du, Mingkui Tan · 2024

Art Computer science Physics

Few-shot audio-visual acoustics modeling seeks to synthesize the room impulse response in arbitrary locations with few-shot observations. To sufficiently exploit the provided few-shot data for accurate acoustic modeling, we present a *map-…

G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images Open

Zixiong Huang, Qi Chen, Libo Sun, Yifan Yang, Naizhou Wang, et al. · 2024

Computer science Mathematics Sociology

Novel view synthesis aims to generate new view images of a given view image collection. Recent attempts address this problem relying on 3D geometry priors (e.g., shapes, sizes, and positions) learned from multi-view images. However, such m…

HiLo: Detailed and Robust 3D Clothed Human Reconstruction with High-and Low-Frequency Information of Parametric Models Open

Yifan Yang, Dong Liu, Shuhai Zhang, Zeshuai Deng, Zixiong Huang, et al. · 2024

Computer science Mathematics

Reconstructing 3D clothed human involves creating a detailed geometry of individuals in clothing, with applications ranging from virtual try-on, movies, to games. To enable practical and widespread applications, recent advances propose to …

AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework Open

Xiang Li, Zhenyu Li, Shi Chen, Yong Xu, Qing Du, et al. · 2024

Business Computer science Geography

The task of financial analysis primarily encompasses two key areas: stock trend prediction and the corresponding financial question answering. Currently, machine learning and deep learning algorithms (ML&DL) have been widely applied for st…

Mingkui Tan