Hiroshi Murase
Partial CLIP is Enough: Chimera-Seg for Zero-shot Semantic Segmentation
Zero-shot Semantic Segmentation (ZSS) aims to segment both seen and unseen classes using supervision from only seen classes. Beyond adaptation-based methods, distillation-based approaches transfer vision-language alignment of vision-langua…
BiXFormer: A Robust Framework for Maximizing Modality Effectiveness in Multi-Modal Semantic Segmentation
Utilizing multi-modal data enhances scene understanding by providing complementary semantic and geometric information. Existing methods fuse features or distill knowledge from multiple modalities into a unified representation, improving ro…
CQVPR: Landmark-aware Contextual Queries for Visual Place Recognition
Visual Place Recognition (VPR) aims to estimate the location of the given query image within a database of geo-tagged images. To identify the exact location in an image, detecting landmarks is crucial. However, in some scenarios, such as u…
Toward Explainable End-to-End Driving Models via Simplified Objectification Constraints
End-to-end driving models (E2EDMs) convert environmental information into driving actions using a complex transformation, which gives E2EDMs high prediction accuracy. Due to the black-box nature of this transformation, E2EDMs have l…
Frozen is better than learning: A new design of prototype-based classifier for semantic segmentation
Semantic segmentation models comprise an encoder to extract features and a classifier for prediction. However, the learning of the classifier suffers from the ambiguity which is caused by two factors: (1) the weights of a classifier for si…
Generalizable Semantic Vision Query Generation for Zero-shot Panoptic and Semantic Segmentation
Zero-shot Panoptic Segmentation (ZPS) aims to recognize foreground instances and background stuff without images containing unseen categories in training. Due to the visual data sparsity and the difficulty of generalizing from seen to unse…
Category-Level Object Pose Estimation in Heavily Cluttered Scenes by Generalized Two-Stage Shape Reconstructor
In this paper, we propose a method for robust estimation of the pose of an unknown object instance in an object category from a depth image, even if it is occluded. In cluttered scenes, objects are often mutually occluded, and at the same …
Subjective Baggage-Weight Estimation Based on Human Walking Behavior
We address a new computer vision problem of subjective baggage-weight estimation, where the term subjective weight is defined as how heavy the person feels. In this paper, we propose a method named G2SW+ (Gait to Subjective Weight plus), w…
Improving the Estimation of Partial Discharge Direction Using a Four-Terminal Surface Current Sensor
To improve the insulation diagnosis of a medium-voltage (MV) switchgear, we have developed a four-terminal surface current sensor and a method to estimate the direction of partial discharge occurrence. In this method, the accuracy of the d…
CLIP Is Also a Good Teacher: A New Learning Framework for Inductive Zero-shot Semantic Segmentation
Generalized Zero-shot Semantic Segmentation aims to segment both seen and unseen categories only under the supervision of the seen ones. To tackle this, existing methods adopt the large-scale Vision Language Models (VLMs) which obtain outs…
More Persuasive Explanation Method for End-to-End Driving Models
With the rapid development of autonomous driving technology, a variety of high-performance end-to-end driving models (E2EDMs) are being proposed. In order to understand the computational methods of E2EDMs, pixel-level explanation methods …
Evolution of Object Detection Technologies
This paper presents the history and evolution of object detection technologies in recent years. Since the introduction of face detection technology by P. Viola and M. Jones in the early 2000s, various object detection methods have been propose…
SDOF-Tracker: Fast and Accurate Multiple Human Tracking by Skipped-Detection and Optical-Flow
Multiple human tracking is a fundamental problem in understanding the context of a visual scene. Although both accuracy and speed are required in real-world applications, recent tracking methods based on deep learning focus on accuracy and…
Implicit Interaction with an Autonomous Personal Mobility Vehicle: Relations of Pedestrians’ Gaze Behavior with Situation Awareness and Perceived Risks
Interactions between pedestrians and automated vehicles (AVs) will increase significantly as AVs become more popular. However, pedestrians often do not have enough trust in AVs, particularly when they are confused about an AV's intention …
Masked Face Recognition With Mask Transfer and Self-Attention Under the COVID-19 Pandemic
Face masks pose a new challenge to face recognition systems, especially against the background of the COVID-19 pandemic. In this paper, a method for mitigating the negative effects of mask defects on face recognition is proposed. Firs…
Context-Aware Contribution Estimation for Feature Aggregation in Video Face Recognition
The difficulties in video-based face recognition, such as dramatic pose variations and low quality, can be alleviated by leveraging the rich complementary information between the frames. However, limited by the mini-batch training strategy…
Gaits Generation from a Mimetic Word based on Sound Symbolism
The Japanese language is known to have a rich vocabulary of mimetic words, which have the property of sound symbolism: the phonemes that compose the mimetic words are strongly related to the impression of various phenomena. Especially, human g…
SDOF-Tracker: Fast and Accurate Multiple Human Tracking by Skipped-Detection and Optical-Flow
Multiple human tracking is a fundamental problem for scene understanding. Although both accuracy and speed are required in real-world applications, recent tracking methods based on deep learning have focused on accuracy and require substan…
Interaction Detection Between Vehicles and Vulnerable Road Users: A Deep Generative Approach with Attention
Intersections where vehicles are permitted to turn and interact with vulnerable road users (VRUs) like pedestrians and cyclists are among the most challenging locations for automated and accurate recognition of road users' behavior…