Explanipedia

BLIP3o-NEXT: Next Frontier of Native Image Generation Open

Jiuhai Chen, Le Xue, Zhiyang Xu, Xichen Pan, Shusheng Yang, et al. · 2025

We present BLIP3o-NEXT, a fully open-source foundation model in the BLIP3 series that advances the next frontier of native image generation. BLIP3o-NEXT unifies text-to-image generation and image editing within a single architecture, demon…

UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG Open

Can Qin, Zeyuan Chen, Ran Xu · 2025

Multimodal retrieval-augmented generation (MM-RAG) is a key approach for applying large language models (LLMs) and agents to real-world knowledge bases, yet current evaluations are fragmented, focusing on either text or images in isolation…

CoDA: Coding LM via Diffusion Adaptation Open

Haolin Chen, Shiyu Wang, Can Qin, Bo Pang, Zuxin Liu, et al. · 2025

Diffusion language models promise bidirectional context and infilling capabilities that autoregressive coders lack, yet practical systems remain heavyweight. We introduce CoDA, a 1.7B-parameter diffusion coder trained on TPU with a fully o…

HoliTom: Holistic Token Merging for Fast Video Large Language Models Open

Kang Shao, Kai Tao, Can Qin, Haoxuan You, Sui Yang, et al. · 2025

Video large language models (video LLMs) excel at video comprehension but face significant computational inefficiency due to redundant video tokens. Existing token pruning methods offer solutions. However, approaches operating within the L…

Design and Adaptability Analysis of Integrated Pressurization–Gas Lifting Multifunctional Compressor for Enhanced Shale Gas Production Flexibility Open

Kunyi Wu, Lin Qu, Jun Zhou, Yan He, Yuting Wu, et al. · 2025

Shale gas development has made significant contributions to the increase in natural gas production capacity in recent years, particularly in promoting the transformation of the energy structure and enhancing energy autonomy. However, with …

Two-Stage Rapid Expansion Optimization Method for Complex Natural Gas Pipeline Networks Integrating Congestion Identification and Multiple Expansion Modes Open

Jinghong Peng, Jing Yang, Jun Zhou, Can Qin, Guangchuan Liang, et al. · 2025

DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models Open

Kai Tao, Can Qin, Haoxuan You, Yang Sui, Huan Wang · 2024

Video large language models (VLLMs) have significantly advanced recently in processing complex video content, yet their inference efficiency remains constrained because of the high computational cost stemming from the thousands of visual t…

A Data Augmentation Method and the Embedding Mechanism for Detection of Pulmonary Nodules on Small Samples Open

Yang Liu, Yong Hou, Can Qin, Xiaomei Li, S J Li, et al. · 2024

Lung Computed Tomography (CT) screening for pulmonary nodules provides an effective method for early diagnosis. The deep-learning-based computer-aided detection (CAD) system effectively identifies and precisely localizes suspicious pulmona…

Too much social media? Unveiling the effects of determinants in social media fatigue Open

Can Qin, Ying Li, Tian Wang, Jing Zhao, Ling Tong, et al. · 2024

Introduction: With the boom in social media, many people spend a lot of time on these platforms. Among them, some developed negative emotions, such as fatigue, depression, or disinterest in communicating, and used social media temporarily o…

STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical Question-Answering Open

Guohao Sun, Can Qin, Huazhu Fu, Linwei Wang, Zhiqiang Tao · 2024

Large Vision-Language Models (LVLMs) have shown significant potential in assisting medical diagnosis by leveraging extensive biomedical datasets. However, the advancement of medical image understanding and reasoning critically depends on b…

M3SOT: Multi-Frame, Multi-Field, Multi-Space 3D Single Object Tracking Open

Jiaming Liu, Yue Wu, Maoguo Gong, Qiguang Miao, Wenping Ma, et al. · 2024

3D Single Object Tracking (SOT) stands as a forefront task of computer vision, proving essential for applications like autonomous driving. Sparse and occluded data in scene point clouds introduce variations in the appearance of tracked object…

Multi-Period Optimal Configuration and Scheduling of Natural Gas Storage Facilities: A Holistic Approach to Ensure Pipeline Network Supply Stability and Economy Open

Jinghong Peng, Shitao Liu, Jing Yang, Jun Zhou, Guangchuan Liang, et al. · 2024

M3SOT: Multi-frame, Multi-field, Multi-space 3D Single Object Tracking Open

Jiaming Liu, Yue Wu, Maoguo Gong, Qiguang Miao, Wenping Ma, et al. · 2023

3D Single Object Tracking (SOT) stands as a forefront task of computer vision, proving essential for applications like autonomous driving. Sparse and occluded data in scene point clouds introduce variations in the appearance of tracked object…

Camouflaged Image Synthesis Is All You Need to Boost Camouflaged Detection Open

Haichao Zhang, Can Qin, Yu Yin, Yun Fu · 2023

Camouflaged objects that blend into natural scenes pose significant challenges for deep-learning models to detect and synthesize. While camouflaged object detection is a crucial task in computer vision with diverse real-world applications,…

UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild Open

Can Qin, Shu Zhang, Ning Yu, Yihao Feng, Xinyi Yang, et al. · 2023

Achieving machine autonomy and human control often represent divergent objectives in the design of interactive AI systems. Visual generative foundation models such as Stable Diffusion show promise in navigating these goals, especially when…

Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations Open

Vibashan VS, Ning Yu, Xing Chen, Can Qin, Mingfei Gao, et al. · 2023

Existing instance segmentation models learn task-specific information using manual mask annotations from base (training) categories. These mask annotations require tremendous human effort, limiting the scalability to annotate novel (new) c…

GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation Open

Can Qin, Ning Yu, Xing Chen, Shu Zhang, Zeyuan Chen, et al. · 2023

Text-to-image (T2I) models based on diffusion processes have achieved remarkable success in controllable image generation using user-provided captions. However, the tight coupling between the current text encoder and image decoder in T2I m…

HIVE: Harnessing Human Feedback for Instructional Visual Editing Open

Shu Zhang, Xinyi Yang, Yihao Feng, Can Qin, Chia-Chih Chen, et al. · 2023

Incorporating human feedback has been shown to be crucial to align text generated by large language models to human preferences. We hypothesize that state-of-the-art instructional image editing models, where outputs are generated based on …

Image as Set of Points Open

Xu Ma, Yuqian Zhou, Huan Wang, Can Qin, Bin Sun, et al. · 2023

What is an image and how to extract latent features? Convolutional Networks (ConvNets) consider an image as organized pixels in a rectangular shape and extract features via convolutional operation in local region; Vision Transformers (ViTs…

Making Reconstruction-based Method Great Again for Video Anomaly Detection Open

Yizhou Wang, Can Qin, Yue Bai, Yi Xu, Xu Ma, et al. · 2023

Anomaly detection in videos is a significant yet challenging problem. Previous approaches based on deep neural networks employ either reconstruction-based or prediction-based approaches. Nevertheless, existing reconstruction-based methods …

Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Pruning Open

Huan Wang, Can Qin, Yue Bai, Yun Fu · 2023

The state of neural network pruning has been noticed to be unclear and even confusing for a while, largely due to "a lack of standardized benchmarks and metrics" [3]. To standardize benchmarks, first, we need to answer: what kind of compar…

Unveiling the power of transfer learning towards efficient artificial intelligence Open

Can Qin · 2023

Large-scale models, abundant data, and dense computation are the pivotal pillars of deep neural networks. The present-day deep learning models have made significant strides in various areas such as Computer Vision (CV), Natural Language Pr…

A Close Look at Spatial Modeling: From Attention to Convolution Open

Xu Ma, Huan Wang, Can Qin, Kunpeng Li, Xingchen Zhao, et al. · 2022

Vision Transformers have shown great promise recently for many vision tasks due to the insightful architecture design and attention mechanism. By revisiting the self-attention responses in Transformers, we empirically observe two interesti…

Detection and ranging of small targets on water based on binocular camera and improved YOLOv5 algorithm Open

Yongguo Li, Caiyin Xu, Can Qin, Xiangyan Li, Xuan Tang · 2022

In order to meet the needs of intelligent ships to capture and grasp small targets while navigating on water and to be able to sense and avoid small targets, a water target detection method based on the YOLOv5-s algorithm is proposed, and …

MemREIN: Rein the Domain Shift for Cross-Domain Few-Shot Learning Open

Yi Xu, Lichen Wang, Yizhou Wang, Can Qin, Yulun Zhang, et al. · 2022

Few-shot learning aims to enable models to generalize to new categories (query instances) with only limited labeled samples (support instances) from each category. The metric-based mechanism is a promising direction which compares feature embeddi…

Recent Advances on Neural Network Pruning at Initialization Open

Huan Wang, Can Qin, Yue Bai, Yulun Zhang, Yun Fu · 2022

Neural network pruning typically removes connections or neurons from a pretrained converged model; while a new pruning paradigm, pruning at initialization (PaI), attempts to prune a randomly initialized network. This paper offers the first…

Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual MLP Framework Open

Xu Ma, Can Qin, Haoxuan You, Haoxi Ran, Yun Fu · 2022

Point cloud analysis is challenging due to irregularity and unordered data structure. To capture the 3D geometries, prior works mainly rely on exploring sophisticated local geometric extractors using convolution, graph, or attention mechan…

Self-directed online machine learning for topology optimization Open

Changyu Deng, Yizhou Wang, Can Qin, Yun Fu, Wei Lu · 2022

Semi-Supervised Domain Adaptive Structure Learning Open

Can Qin, Lichen Wang, Qianqian Ma, Yu Yin, Huan Wang, et al. · 2022

Semi-supervised domain adaptation (SSDA) is quite a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains. Unfortunately, a simple combination of domain…

SLA$^2$P: Self-supervised Anomaly Detection with Adversarial Perturbation Open

Yizhou Wang, Can Qin, Rongzhe Wei, Yi Xu, Yue Bai, et al. · 2021

Anomaly detection is a fundamental yet challenging problem in machine learning due to the lack of label information. In this work, we propose a novel and powerful framework, dubbed as SLA$^2$P, for unsupervised anomaly detection. After ext…
