Mingkui Tan
Sensitivity-Aware Post-Training Quantization for Deep Neural Networks
Model quantization reduces neural network parameter precision to achieve compression, but often compromises accuracy. Existing post-training quantization (PTQ) methods employ iterative parameter updates to preserve accuracy under high comp…
Adapt in the Wild: Test-Time Entropy Minimization with Sharpness and Feature Regularization
Test-time adaptation (TTA) may fail to improve or even harm the model performance when test data have: 1) mixed distribution shifts, 2) small batch sizes, 3) online imbalanced label distribution shifts. This is often a key obstacle prevent…
Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis
Visual autoregressive modeling, based on the next-scale prediction paradigm, exhibits notable advantages in image quality and model scalability over traditional autoregressive and diffusion models. It generates images by progressively refi…
Deep Electromagnetic Structure Design Under Limited Evaluation Budgets
Electromagnetic structure (EMS) design plays a critical role in developing advanced antennas and materials, but remains challenging due to high-dimensional design spaces and expensive evaluations. While existing methods commonly employ hig…
Curse of High Dimensionality Issue in Transformer for Long-context Modeling
Transformer-based large language models (LLMs) excel in natural language processing tasks by capturing long-range dependencies through self-attention mechanisms. However, long-context modeling faces significant computational inefficiencies…
Test-Time Learning for Large Language Models
While Large Language Models (LLMs) have exhibited remarkable emergent capabilities through extensive pre-training, they still face critical limitations in generalizing to specialized domains and handling diverse linguistic variations, know…
Enhancing User-Oriented Proactivity in Open-Domain Dialogues with Critic Guidance
Open-domain dialogue systems aim to generate natural and engaging conversations, providing significant practical value in real applications such as social robotics and personal assistants. The advent of large language models (LLMs) has gre…
CHRIS: Clothed Human Reconstruction with Side View Consistency
Creating a realistic clothed human from a single-view RGB image is crucial for applications like mixed reality and filmmaking. Despite some progress in recent years, mainstream methods often fail to fully utilize side-view information, as …
Dynamic Compressing Prompts for Efficient Inference of Large Language Models
Large Language Models (LLMs) have shown outstanding performance across a variety of tasks, partly due to advanced prompting techniques. However, these techniques often require lengthy prompts, which increase computational costs and can hin…
Understanding Emotional Body Expressions via Large Language Models
Emotion recognition based on body movements is vital in human-computer interaction. However, existing emotion recognition methods predominantly focus on enhancing classification accuracy, often neglecting the provision of textual explanati…
Zero-Shot Skeleton-Based Action Recognition With Prototype-Guided Feature Alignment
Zero-shot skeleton-based action recognition aims to classify unseen skeleton-based human actions without prior exposure to such categories during training. This task is extremely challenging due to the difficulty in generalizing from known…
Daily Assistance for Amyotrophic Lateral Sclerosis Patients Based on a Wearable Multimodal Brain-Computer Interface Mouse
Amyotrophic lateral sclerosis (ALS) is a chronic, progressive neurodegenerative disease that mainly causes damage to upper and lower motor neurons. This leads to a progressive deterioration in the voluntary mobility of the upper and lower …
Core Context Aware Transformers for Long Context Language Modeling
Transformer-based Large Language Models (LLMs) have exhibited remarkable success across a wide range of tasks, primarily attributed to the self-attention mechanism, which requires a token to consider all preceding tokens as its context to compute attenti…
Adversarial Purification by Consistency-aware Latent Space Optimization on Data Manifolds
Deep neural networks (DNNs) are vulnerable to adversarial samples crafted by adding imperceptible perturbations to clean data, potentially leading to incorrect and dangerous predictions. Adversarial purification has been an effective means…
Efficient Dynamic Ensembling for Multiple LLM Experts
LLMs have demonstrated impressive performance across various language tasks. However, the strengths of LLMs can vary due to different architectures, model sizes, areas of training data, etc. Therefore, ensemble reasoning for the strengths …
Towards Long Video Understanding via Fine-detailed Video Story Generation
Long video understanding has become a critical task in computer vision, driving advancements across numerous applications from surveillance to content retrieval. Existing video understanding methods suffer from two challenges when dealing …
Enhancing Perception Capabilities of Multimodal LLMs with Training-Free Fusion
Multimodal LLMs (MLLMs) equip language models with visual capabilities by aligning vision encoders with language models. Existing methods to enhance the visual perception of MLLMs often involve designing more powerful vision encoders, whic…
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences
Research on 3D Vision-Language Models (3D-VLMs) is gaining increasing attention, which is crucial for developing embodied AI within 3D scenes, such as visual navigation and embodied question answering. Due to the high density of visual fea…
Open-World Drone Active Tracking with Goal-Centered Rewards
Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations, providing a more practical solution for effective tracking in dynamic environments. However, accurate D…
Robust 3D Semantic Occupancy Prediction with Calibration-free Spatial Transformation
3D semantic occupancy prediction, which seeks to provide accurate and comprehensive representations of environment scenes, is important to autonomous driving systems. For autonomous cars equipped with multi-camera and LiDAR, it is critical…
A protein fitness predictive framework based on feature combination and intelligent searching
Machine learning (ML) constructs predictive models by understanding the relationship between protein sequences and their functions, enabling efficient identification of protein sequences with high fitness values without falling into local …
Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs
Vision-and-Language Navigation (VLN) tasks require an agent to follow textual instructions to navigate through 3D environments. Traditional approaches use supervised learning methods, relying heavily on domain-specific datasets to train VL…
CoNav: A Benchmark for Human-Centered Collaborative Navigation
Human-robot collaboration, in which the robot intelligently assists the human with the upcoming task, is an appealing objective. To achieve this goal, the agent needs to be equipped with a fundamental collaborative navigation ability, wher…
MAGIC: Map-Guided Few-Shot Audio-Visual Acoustics Modeling
Few-shot audio-visual acoustics modeling seeks to synthesize the room impulse response in arbitrary locations with few-shot observations. To sufficiently exploit the provided few-shot data for accurate acoustic modeling, we present a *map-…
G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images
Novel view synthesis aims to generate new view images of a given view image collection. Recent attempts address this problem relying on 3D geometry priors (e.g., shapes, sizes, and positions) learned from multi-view images. However, such m…
HiLo: Detailed and Robust 3D Clothed Human Reconstruction with High-and Low-Frequency Information of Parametric Models
Reconstructing a 3D clothed human involves creating a detailed geometry of individuals in clothing, with applications ranging from virtual try-on and movies to games. To enable practical and widespread applications, recent advances propose to …
AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework
The task of financial analysis primarily encompasses two key areas: stock trend prediction and the corresponding financial question answering. Currently, machine learning and deep learning algorithms (ML&DL) have been widely applied for st…