Yun Fu
Representation Potentials of Foundation Models for Multimodal Alignment: A Survey
Foundation models learn highly transferable representations through large-scale pretraining on diverse data. An increasing body of research indicates that these representations exhibit a remarkable degree of similarity across architectures…
MTS-DMAE: Dual-Masked Autoencoder for Unsupervised Multivariate Time Series Representation Learning
Unsupervised multivariate time series (MTS) representation learning aims to extract compact and informative representations from raw sequences without relying on labels, enabling efficient transfer to diverse downstream tasks. In this pape…
AdaSports-Traj: Role- and Domain-Aware Adaptation for Multi-Agent Trajectory Modeling in Sports
Trajectory prediction in multi-agent sports scenarios is inherently challenging due to the structural heterogeneity across agent roles (e.g., players vs. ball) and dynamic distribution gaps across different sports domains. Existing unified…
A Competency Framework for Electric Vehicle Maintenance Technicians: Addressing the Environmental, Social, and Governance (ESG) Imperatives of the BEV Industry
The fast-expanding market of battery electric vehicles (BEVs) demands industry-specific competence requirements for maintenance technicians. We have therefore generated a knowledge structure of BEV maintenance through a literature review a…
Trajectory Prediction Meets Large Language Models: A Survey
Recent advances in large language models (LLMs) have sparked growing interest in integrating language-driven techniques into trajectory prediction. By leveraging their semantic and reasoning capabilities, LLMs are reshaping how autonomous …
A Competency Framework for Electric Vehicle Maintenance Technicians: Addressing the ESG Imperatives of the BEV Industry
The rapid evolution of the battery electric vehicle (BEV) industry calls for a robust, specialized competency framework for maintenance technicians. This study develops such a framework through an extensive literature review and analysis o…
Constructing machine learning-based risk prediction model for osteoarthritis in population aged 45 and above: NHANES 2011–2018
Osteoarthritis is a widespread chronic joint disease, becoming increasingly prevalent, particularly among individuals over the age of 45. This condition causes joint pain and dysfunction, significantly disrupting daily life. The objective …
Boosting Large Language Models with Mask Fine-Tuning
The model is usually kept integral in the mainstream large language model (LLM) fine-tuning protocols. No works have questioned whether maintaining the integrity of the model is indispensable for performance. In this work, we introduce Mas…
REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder
We present a novel perspective on learning video embedders for generative modeling: rather than requiring an exact reproduction of an input video, an effective embedder should focus on synthesizing visually plausible reconstructions. This …
Slicing Vision Transformer for Flexible Inference
Vision Transformers (ViTs) are known for their scalability. In this work, we aim to scale down a ViT to fit an environment with dynamically changing resource constraints. We observe that smaller ViTs are intrinsically the sub-networks of a l…
LightAvatar: Efficient Head Avatar as Dynamic Neural Light Field
Recent works have shown that neural radiance fields (NeRFs) on top of parametric models have reached SOTA quality to build photorealistic head avatars from a monocular video. However, one major limitation of the NeRF-based avatars is the s…
High-throughput Image-based Clustering of CAR-T/Tumor Cocultures for Rapid and Facile Hit Identification
Chimeric antigen receptor T cells are important because of their potential to treat various diseases. As deep learning continues to advance, using unsupervised methods to classify medical images has become a significant focus because…
Graphical Abstract: Angew. Chem. Int. Ed. 21/2024
Metal-Organic Frameworks: In their Research Article (e202319177), Lei Li, Qiang Xu et al. report a metal-organic framework-derived Zr/Ti bimetallic oxide solid solution anchored with Au nanoparticles through crystal engineering and derivatio…
Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
Visual program synthesis is a promising approach to exploit the reasoning abilities of large language models for compositional computer vision tasks. Previous work has used few-shot prompting with frozen LLMs to synthesize visual programs.…
OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising
Trajectory prediction is fundamental in computer vision and autonomous driving, particularly for understanding pedestrian behavior and enabling proactive decision-making. Existing approaches in this field often assume precise and complete …
SkipDiff: Adaptive Skip Diffusion Model for High-Fidelity Perceptual Image Super-resolution
It is well-known that image quality assessment usually meets with the problem of perception-distortion (p-d) tradeoff. The existing deep image super-resolution (SR) methods either focus on high fidelity with pixel-level objectives or high …
Don't Judge by the Look: Towards Motion Coherent Video Representation
Current training pipelines in object recognition neglect hue jittering during data augmentation, not only because it brings appearance changes that are detrimental to classification, but also because its implementation is inefficient in practice. In…
Encryption and Decryption Using Deep Neural Network
An auto-encoder which can be split into two parts is designed. The two parts can work well separately. The top half is an abstract network which is trained by supervised learning and can be used to classify and regress. The bottom half is …
Rethinking Neighborhood Consistency Learning on Unsupervised Domain Adaptation
Unsupervised domain adaptation (UDA) involves predicting unlabeled data in a target domain by using labeled data from the source domain. However, recent advances in pseudo-labeling (PL) methods have been hampered by noisy pseudo-labels tha…
Layout Sequence Prediction From Noisy Mobile Modality
Trajectory prediction plays a vital role in understanding pedestrian movement for applications such as autonomous driving and robotics. Current trajectory prediction models depend on long, complete, and accurately observed sequences fro…
Exploring Question Decomposition for Zero-Shot VQA
Visual question answering (VQA) has traditionally been treated as a single-step task where each question receives the same amount of effort, unlike natural human question-answering strategies. We explore a question decomposition strategy f…
Camouflaged Image Synthesis Is All You Need to Boost Camouflaged Detection
Camouflaged objects that blend into natural scenes pose significant challenges for deep-learning models to detect and synthesize. While camouflaged object detection is a crucial task in computer vision with diverse real-world applications,…
BEV-DG: Cross-Modal Learning under Bird's-Eye View for Domain Generalization of 3D Semantic Segmentation
Cross-modal Unsupervised Domain Adaptation (UDA) aims to exploit the complementarity of 2D-3D data to overcome the lack of annotation in a new domain. However, UDA methods rely on access to the target domain during training, meaning the tr…
Graphical Abstract: Angew. Chem. Int. Ed. 30/2023
Supramolecular Chemistry: In their Communication (e202305525), Mark J. MacLachlan et al. use a single Pt–Pt bond to glue two cyclometalated hosts together and generate a flytrap-like molecule. This compound traps inorganic ions, assembles and disas…
Hybrid Pixel-Unshuffled Network for Lightweight Image Super-resolution
Convolutional neural network (CNN) has achieved great success on image super-resolution (SR). However, most deep CNN-based SR models take massive computations to obtain high performance. Downsampling features for multi-resolution fusion is…
Layout Representation Learning with Spatial and Structural Hierarchies
We present a novel hierarchical modeling method for layout representation learning, the core of design documents (e.g., user interface, poster, template). Existing works on layout representation often ignore element hierarchies, which is a…
Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!
Finetuning a large vision language model (VLM) on a target dataset after large scale pretraining is a dominant paradigm in visual question answering (VQA). Datasets for specialized tasks such as knowledge-based VQA or VQA in non natural-im…
SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds
Text-to-image diffusion models can create stunning images from natural language descriptions that rival the work of professional artists and photographers. However, these models are large, with complex network architectures and tens of den…