Weisi Lin
Spatio-Temporal Characteristics of Ship Carbon Emissions in Port of New York and New Jersey Based on AIS Data
Shipping is a major source of carbon emissions and faces an urgent need for decarbonization. Research on vessel carbon emissions not only characterizes regional emission patterns but also provides critical evidence for targeted mitigation …
YOLO-SAM an end-to-end framework for efficient real time object detection and segmentation
Although effective and practical YOLO methods have dominated the field of object detection, they rely on predefined and trained object categories, which limits their broad application. To overcome this limitation, YOLO-World enhances YOLO'…
Compressed Feature Quality Assessment: Dataset and Baselines
The widespread deployment of large models in resource-constrained environments has underscored the need for efficient transmission of intermediate feature representations. In this context, feature coding, which compresses features into com…
Domain Crossover Non-Rigid Registration for 3D Human Meshes
Non-rigid registration is essential for reconstructing dynamic and incomplete 3D human meshes, yet traditional methods often fail to achieve robust alignment in the sequence of high-motion deformations and missing geometry. We propose a do…
DT-UFC: Universal Large Model Feature Coding via Peaky-to-Balanced Distribution Transformation
Like image coding in visual data transmission, feature coding is essential for the distributed deployment of large models by significantly reducing transmission and storage burden. However, prior studies have mostly targeted task- or model…
LLM-TTA: Leveraging Large Language Models for Test-Time Adaptation in Image Quality Assessment
Recent test-time adaptation (TTA) methods for image quality assessment (IQA) aim to mitigate the distribution shift between training and testing data by adapting the batch normalization layers of the base IQA model. These methods typically…
Cross-Embodiment Dexterous Hand Articulation Generation via Morphology-Aware Learning
Dexterous grasping with multi-fingered hands remains challenging due to high-dimensional articulations and the cost of optimization-based pipelines. Existing end-to-end methods require training on large-scale datasets for specific hands, l…
DiffPCN: Latent Diffusion Model Based on Multi-view Depth Images for Point Cloud Completion
Latent diffusion models (LDMs) have demonstrated remarkable generative capabilities across various low-level vision tasks. However, their potential for point cloud completion remains underexplored due to the unstructured and irregular natu…
Single Point, Full Mask: Velocity-Guided Level Set Evolution for End-to-End Amodal Segmentation
Amodal segmentation aims to recover complete object shapes, including occluded regions with no visual appearance, whereas conventional segmentation focuses solely on visible areas. Existing methods typically rely on strong prompts, such as…
Shape Distribution Matters: Shape-specific Mixture-of-Experts for Amodal Segmentation under Diverse Occlusions
Amodal segmentation aims to predict complete object masks, covering both visible and occluded regions. This task poses significant challenges due to complex occlusions and extreme shape variation, from rigid furniture to highly deformab…
LMM4Edit: Benchmarking and Evaluating Multimodal Image Editing with LMMs
The rapid advancement of Text-guided Image Editing (TIE) enables image modifications through text prompts. However, current TIE models still struggle to balance image quality, editing alignment, and consistency with the original image, lim…
Knowledge-guided Complex Diffusion Model for PolSAR Image Classification in Contourlet Domain
Diffusion models have demonstrated exceptional performance across various domains due to their ability to model and generate complicated data distributions. However, when applied to PolSAR data, traditional real-valued diffusion models fac…
Customizable ROI-Based Deep Image Compression
Region of Interest (ROI)-based image compression optimizes bit allocation by prioritizing ROI for higher-quality reconstruction. However, as the users (including human clients and downstream machine tasks) become more diverse, ROI-based im…
Just Noticeable Difference for Large Multimodal Models
Just noticeable difference (JND), the minimum change that the human visual system (HVS) can perceive, has been studied for decades. Although recent work has extended this line of research into machine vision, there has been a scarcity of s…
Cross-architecture universal feature coding via distribution alignment
Feature coding has become increasingly important in scenarios where semantic representations rather than raw pixels are transmitted and stored. However, most existing methods are architecture-specific, targeting either CNNs or Transformers…
Image Quality Assessment for Embodied AI
Embodied AI has developed rapidly in recent years, but it is still mainly deployed in laboratories, with various real-world distortions limiting its application. Traditionally, Image Quality Assessment (IQA) methods are applied to p…
Multi-Feature Lightweight DeeplabV3+ Network for Polarimetric SAR Image Classification with Attention Mechanism
Polarimetric Synthetic Aperture Radar (PolSAR) is an advanced remote sensing technology that provides rich polarimetric information. Deep learning methods have proven to be an effective tool for PolSAR image classification. However, relying…
FakeScope: Large Multimodal Expert Model for Transparent AI-Generated Image Forensics
The rapid and unrestrained advancement of generative artificial intelligence (AI) presents a double-edged sword: while enabling unprecedented creativity, it also facilitates the generation of highly convincing deceptive content, underminin…
Embedding Compression Distortion in Video Coding for Machines
Currently, video transmission serves not only the Human Visual System (HVS) for viewing but also machine perception for analysis. However, existing codecs are primarily optimized for pixel-domain and HVS-perception metrics rather than the …
Squeeze Out Tokens from Sample for Finer-Grained Data Governance
Widely observed data scaling laws, in which error falls off as a power of the training size, demonstrate the diminishing returns of unselective data expansion. Hence, data governance is proposed to downsize datasets through pruning non-inf…
Image Quality Assessment: From Human to Machine Preference
Image Quality Assessment (IQA) based on human subjective preferences has undergone extensive research in the past decades. However, with the development of communication protocols, the visual data consumption volume of machines has gradual…
Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA
Amodal segmentation aims to infer the complete shape of occluded objects, even when the occluded region's appearance is unavailable. However, current amodal segmentation methods lack the capability to interact with users through text input…
Teaching LMMs for Image Quality Scoring and Interpreting
Image quality scoring and interpreting are two fundamental components of Image Quality Assessment (IQA). The former quantifies image quality, while the latter enables descriptive question answering about image quality. Traditionally, these…
Multi-Feature Lightweight DeeplabV3 Network for Polarimetric SAR Image Classification with Attention Mechanism
Polarimetric Synthetic Aperture Radar (PolSAR) is an advanced remote sensing technology that provides rich polarimetric information. Deep learning methods have proven to be an effective tool for PolSAR image classification. However, relying…
Riemannian Complex Hermit Positive Definite Convolution Network for Polarimetric SAR Image Classification
Deep learning methods can effectively learn high-level semantic features in Euclidean space for PolSAR images, but they need to convert the complex covariance matrix into a feature vector or complex-valued vector as the network input. However, th…
BEAT: Balanced Frequency Adaptive Tuning for Long-Term Time-Series Forecasting
Time-series forecasting is crucial for numerous real-world applications including weather prediction and financial market modeling. While temporal-domain methods remain prevalent, frequency-domain approaches can effectively capture multi-s…