Gui-Song Xia
Efficient Point-Based Neural Network For Finish Line Extraction
Aim or purpose: Accurate extraction of finish lines from prepared teeth is critical for optimizing CAD/CAM workflows and is essential for enabling controlled automatic crown design. This study proposes a lightweight point-based neural netw…
DragOSM: Extract Building Roofs and Footprints from Aerial Images by Aligning Historical Labels
Extracting polygonal roofs and footprints from remote sensing images is critical for large-scale urban analysis. Most existing methods rely on segmentation-based models that assume clear semantic boundaries of roofs, but these approaches s…
Creative Abrasion: Cultural Distance and Migrant Entrepreneurship
CC-Former: Urban Flood Mapping from InSAR Coherence with Vision Transformer: Libya and Storm Daniel as Test Case
Urban flooding is a recurring and distressing issue with severe consequences, including the destruction of densely populated infrastructure and loss of life. Mapping inundated urban areas using synthetic aperture radar (SAR) data is crucia…
Evolving superpixel-level affinity based on contrastive learning and good neighbors for hyperspectral image clustering
Vision-Language Modeling Meets Remote Sensing: Models, datasets, and perspectives
Vision-language modeling (VLM) aims to bridge the information gap between images and natural language. Under the new paradigm of first pre-training on massive image-text pairs and then fine-tuning on task-specific data, VLM in the remote s…
Seeing through Satellite Images at Street Views
This paper studies the task of SatStreet-view synthesis, which aims to render photorealistic street-view panorama images and videos given any satellite image and specified camera positions or trajectories. We formulate to learn neural radi…
SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model
Remote sensing has become critical for understanding environmental dynamics, urban planning, and disaster management. However, traditional remote sensing workflows often rely on explicit segmentation or detection methods, which struggle to…
VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis
This paper develops a Versatile and Honest vision language Model (VHM) for remote sensing image analysis. VHM is built on a large-scale remote sensing image-text dataset with rich-content captions (VersaD), and an honest instruction datase…
Learning Fine-grained Domain Generalization via Hyperbolic State Space Hallucination
Fine-grained domain generalization (FGDG) aims to learn a fine-grained representation that can be well generalized to unseen target domains when only trained on the source domain data. Compared with generic domain generalization, FGDG is p…
UniCalib: Targetless LiDAR-Camera Calibration via Probabilistic Flow on Unified Depth Representations
Precise LiDAR-camera calibration is crucial for integrating these two sensors into robotic systems to achieve robust perception. In applications like autonomous driving, online targetless calibration enables a prompt sensor misalignment co…
Model Hemorrhage and the Robustness Limits of Large Language Models
Large language models (LLMs) demonstrate strong performance across natural language processing tasks, yet undergo significant performance degradation when modified for deployment through quantization, pruning, or decoding strategy adjustme…
Mitigating the Impact of Prominent Position Shift in Drone-based RGBT Object Detection
Drone-based RGBT object detection plays a crucial role in many around-the-clock applications. However, real-world drone-viewed RGBT data suffers from the prominent position shift problem, i.e., the position of a tiny object differs greatly…
RefDrone: A Challenging Benchmark for Referring Expression Comprehension in Drone Scenes
Drones have become prevalent robotic platforms with diverse applications, showing significant potential in Embodied Artificial Intelligence (Embodied AI). Referring Expression Comprehension (REC) enables drones to locate objects based on n…
Mask Clustering-Based Annotation Engine for Large-Scale Submeter Land Cover Mapping
Recent advances in remote sensing technology have made submeter resolution imagery increasingly accessible, offering remarkable detail for fine-grained land cover analysis. However, its full potential remains underutilized - particularly f…
Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient
UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation
Recently, text-to-image generation models have achieved remarkable advancements, particularly with diffusion models facilitating high-quality image synthesis from textual descriptions. However, these models often struggle with achieving pr…
Institutional Environment and Productive Entrepreneurship
In the context of intensifying global competition, productive entrepreneurship plays an important role in industrial upgrading and sustainable economic development. This study explores how the institutional environment affects productive e…
Oriented Tiny Object Detection: A Dataset, Benchmark, and Dynamic Unbiased Learning
Detecting oriented tiny objects, which are limited in appearance information yet prevalent in real-world applications, remains an intricate and under-explored problem. To address this, we systematically introduce a new dataset, benchmark, an…
Tiny Object Detection with Single Point Supervision
Tiny objects, with their limited spatial resolution, often resemble point-like distributions. As a result, bounding box prediction using point-level supervision emerges as a natural and cost-effective alternative to traditional box-level s…
QuadricsReg: Large-Scale Point Cloud Registration using Quadric Primitives
In the realm of large-scale point cloud registration, designing a compact symbolic representation is crucial for efficiently processing vast amounts of data, ensuring registration robustness against significant viewpoint variations and occ…
MCVO: A Generic Visual Odometry for Arbitrarily Arranged Multi-Cameras
Making multi-camera visual SLAM systems easier to set up and more robust to the environment is attractive for vision robots. Existing monocular and binocular vision SLAM systems have narrow sensing Field-of-View (FoV), resulting in degener…
LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References
Change detection, which typically relies on the comparison of bi-temporal images, is significantly hindered when only a single image is available. Comparing a single image with an existing map, such as OpenStreetMap, which is continuously …
Intensity Field Decomposition for Tissue-Guided Neural Tomography
Cone-beam computed tomography (CBCT) typically requires hundreds of X-ray projections, which raises concerns about radiation exposure. While sparse-view reconstruction reduces the exposure by using fewer projections, it struggles to achiev…
Partial Distribution Matching via Partial Wasserstein Adversarial Networks
This paper studies the problem of distribution matching (DM), which is a fundamental machine learning problem seeking to robustly align two probability distributions. Our approach is established on a relaxed formulation, called partial dis…
Exploring Scene Affinity for Semi-Supervised LiDAR Semantic Segmentation
This paper explores scene affinity (AIScene), namely intra-scene consistency and inter-scene correlation, for semi-supervised LiDAR semantic segmentation in driving scenes. Adopting teacher-student training, AIScene employs a teacher netwo…
Time-series satellite remote sensing reveals gradually increasing war damage in the Gaza Strip
War-related urban destruction is a significant global concern, impacting national security, social stability, people’s survival and economic development. The effects of urban geomorphology and complex geological contexts during conflicts, …
DMTG: One-Shot Differentiable Multi-Task Grouping
We aim to address Multi-Task Learning (MTL) with a large number of tasks by Multi-Task Grouping (MTG). Given N tasks, we propose to identify the best task groups from 2^N candidates and train the model weights simultaneously…
Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference
Humans can easily deduce the relative pose of a previously unseen object, without labeling or training, given only a single query-reference image pair. This is arguably achieved by incorporating i) 3D/2.5D shape perception from a single im…