Explanipedia

BasketLiDAR: The First LiDAR-Camera Multimodal Dataset for Professional Basketball MOT

Ryunosuke Hayashi, Kohei Torimi, Rokuto Nagata, Kazuma Ikeda, Ozora Sako, et al. · 2025

Real-time 3D trajectory player tracking in sports plays a crucial role in tactical analysis, performance evaluation, and enhancing spectator experience. Traditional systems rely on multi-camera setups, but are constrained by the inherently…

Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos

Yasushige Ishikawa, Shota Nakada, Hokuto Munakata, Kazuhiro Saito, Tatsuya Komatsu, et al. · 2025

In this paper, we propose Language-Guided Contrastive Audio-Visual Masked Autoencoders (LG-CAV-MAE) to improve audio-visual representation learning. LG-CAV-MAE integrates a pretrained text encoder into contrastive audio-visual masked autoe…

Iterative Event-based Motion Segmentation by Variational Contrast Maximization

Ryo Yamaki, Yoshimitsu Aoki · 2025

Event cameras provide rich signals that are suitable for motion estimation since they respond to changes in the scene. As any visual changes in the scene produce event data, it is paramount to classify the data into different motions (i.e.…

Postoperative Knee Extensor Strength After Medial Patellofemoral Ligament Reconstruction Using Superficial Slip of the Quadriceps Tendon and the Factors That Affect Strength Recovery

Chiharu Inoue, Yoshimitsu Aoki, Kazunori Yasuda, Eiji Kondo, Satoru Kaneko, et al. · 2025

Medicine

Background: Medial patellofemoral ligament reconstruction (MPFLR) using the quadriceps tendon can avoid complications related to the fixation of other graft types to the patella. However, there is concern about postoperative loss of knee e…

Leveraging LLMs with Iterative Loop Structure for Enhanced Social Intelligence in Video Question Answering

Erika Mori, Yue Qiu, Hirokatsu Kataoka, Yoshimitsu Aoki · 2025

Social intelligence, the ability to interpret emotions, intentions, and behaviors, is essential for effective communication and adaptive responses. As robots and AI systems become more prevalent in caregiving, healthcare, and education, th…

Relationship Between Vertical Ground Reaction Force and Acceleration from Wearable Inertial Measurement Units During Single-Leg Drop Landing After Anterior Cruciate Ligament Reconstruction

Makoto Suzuki, Tomoya Ishida, Hisashi Matsumoto, Kazuhiko Kondo, Shota Yamaguchi, et al. · 2025

Geology Medicine Physics

The purpose of this study was to clarify the relationship between vertical ground reaction force (VGRF) and acceleration from wearable inertial measurement units (IMUs) during single-leg drop landing after anterior cruciate ligament recons…

Text-guided Synthetic Geometric Augmentation for Zero-shot 3D Understanding

Kohei Torimi, Ryosuke Yamada, Daichi Otsuka, Kensho Hara, Yuki Asano, et al. · 2025

Computer science Physics Mathematics

Zero-shot recognition models require extensive training data for generalization. However, in zero-shot 3D classification, collecting 3D data and captions is costly and labor-intensive, posing a significant barrier compared to 2D vision. Rec…

A Comprehensive Analysis of a Social Intelligence Dataset and Response Tendencies Between Large Language Models (LLMs) and Humans

Erika Mori, Yue Qiu, Hirokatsu Kataoka, Yoshimitsu Aoki · 2025

Computer science Psychology Physics

In recent years, advancements in the interaction and collaboration between humans and have garnered significant attention. Social intelligence plays a crucial role in facilitating natural interactions and seamless communication between hum…

DynamicVLN: Incorporating Dynamics into Vision-and-Language Navigation Scenarios

Y. Sun, Yue Qiu, Yoshimitsu Aoki · 2025

Computer science Physics

Traditional Vision-and-Language Navigation (VLN) tasks require an agent to navigate static environments using natural language instructions. However, real-world road conditions such as vehicle movements, traffic signal fluctuations, pedest…

BoundMatch: Boundary Detection Applied to Semi-Supervised Segmentation

Haruya Ishikawa, Yoshimitsu Aoki · 2025

Semi-supervised semantic segmentation (SS-SS) aims to mitigate the heavy annotation burden of dense pixel labeling by leveraging abundant unlabeled images alongside a small labeled set. While current consistency regularization methods achi…

RECA: A Pipeline for Refinement of Compressed Artifacts in Image Super-Resolution Training

Go Ohtani, Hirokatsu Kataoka, Yoshimitsu Aoki · 2025

Training datasets for image super-resolution (SR) are often constructed from web images. However, these images are typically stored in JPEG format, introducing compression artifacts that degrade SR performance. To ensure data quality, conv…

Relationship Between Quadriceps Strength at 6 Months Postoperatively and Improvement in Patient-Reported Knee Function After Anterior Cruciate Ligament Reconstruction

Tomoya Ishida, Makoto Suzuki, Hisashi Matsumoto, Mina Samukawa, Satoru Kaneko, et al. · 2025

Medicine

Background: Understanding the factors associated with poor recovery over time after anterior cruciate ligament reconstruction (ACLR) helps clinicians identify patients who are at risk and targets for an intervention. Purpose: To determine …

Acoustic-based 3D Human Pose Estimation Robust to Human Position

Yusuke Oumi, Yuto Shibata, Go Irie, Akisato Kimura, Yoshimitsu Aoki, et al. · 2024

Computer science Business Engineering

This paper explores the problem of 3D human pose estimation from only low-level acoustic signals. The existing active acoustic sensing-based approach for 3D human pose estimation implicitly assumes that the target user is positioned along …

Pre-training with Synthetic Patterns for Audio

Yasushige Ishikawa, Tatsuya Komatsu, Yoshimitsu Aoki · 2024

Computer science Geography

In this paper, we propose to pre-train audio encoders using synthetic patterns instead of real audio data. Our proposed framework consists of two key elements. The first one is Masked Autoencoder (MAE), a self-supervised learning framework…

Data Collection-free Masked Video Modeling

Yuchi Ishikawa, Masayoshi Kondo, Yoshimitsu Aoki · 2024

Computer science Mathematics

Pre-training video transformers generally requires a large amount of data, presenting significant challenges in terms of data collection costs and concerns related to privacy, licensing, and inherent biases. Synthesizing data is one of the…

Rethinking Image Super-Resolution from Training Data Perspectives

Go Ohtani, Ryu Tadokoro, R Yamada, Yuki M. Asano, Iro Laina, et al. · 2024

Computer science Geography

In this work, we investigate the understudied effect of the training data used for image super-resolution (SR). Most commonly, novel SR methods are developed and benchmarked on common training datasets such as DIV2K and DF2K. However, we i…

RetinaViT: Efficient Visual Backbone for Online Video Streams

Tomoyuki Suzuki, Yoshimitsu Aoki · 2024

Computer science Biology Physics

In online video understanding, which has a wide range of real-world applications, inference speed is crucial. Many approaches involve frame-level visual feature extraction, which often represents the biggest bottleneck. We propose RetinaVi…

Poster 241: Comparison of Short-Term Clinical Outcomes of Medial Patellofemoral Ligament Reconstruction Using Superficial Quadriceps Tendon and Using Hamstring Tendon in Patella Instability

Yuki Suzuki, Yoshimitsu Aoki, Chiharu Inoue, Sho Matsumoto, Satoru Kaneko, et al. · 2024

Medicine

Objectives: Medial patellofemoral ligament reconstruction (MPFLR) is widely acknowledged as a therapeutic approach for patella instability. While hamstring autografts are widely used in MPFLR, there are concerns regarding complications ass…

Proto-Adapter: Efficient Training-Free CLIP-Adapter for Few-Shot Image Classification

N. Kato, Yoshiki Nota, Yoshimitsu Aoki · 2024

Computer science

Large vision-language models, such as Contrastive Vision-Language Pre-training (CLIP), pre-trained on large-scale image–text datasets, have demonstrated robust zero-shot transfer capabilities across various downstream tasks. To further enh…

Secrets of Event-Based Optical Flow, Depth and Ego-Motion Estimation by Contrast Maximization

Shintaro Shiba, Yannick Klose, Yoshimitsu Aoki, Guillermo Gallego · 2024

Computer science Physics Geography

Event cameras respond to scene dynamics and provide signals naturally suitable for motion estimation with advantages, such as high dynamic range. The emerging field of event-based vision motivates a revisit of fundamental computer vision t…

3D Human Scan With A Moving Event Camera

K. Kohyama, Shintaro Shiba, Yoshimitsu Aoki · 2024

Computer science Physics

Capturing a 3D human body is one of the important tasks in computer vision with a wide range of applications such as virtual reality and sports analysis. However, conventional frame cameras are limited by their temporal resolution and dyna…

PCT: Perspective Cue Training Framework for Multi-Camera BEV Segmentation

Haruya Ishikawa, Takumi Iida, Yoshinori Konishi, Yoshimitsu Aoki · 2024

Computer science Geography

Generating annotations for bird's-eye-view (BEV) segmentation presents significant challenges due to the scenes' complexity and the high manual annotation cost. In this work, we address these challenges by leveraging the abundance of unlab…

MaskDiffusion: Exploiting Pre-trained Diffusion Models for Semantic Segmentation

Yasufumi Kawano, Yoshimitsu Aoki · 2024

Computer science Physics

Semantic segmentation is essential in computer vision for various applications, yet traditional approaches face significant challenges, including the high cost of annotation and extensive training for supervised learning. Additionally, due…

TAG: Guidance-free Open-Vocabulary Semantic Segmentation

Yasufumi Kawano, Yoshimitsu Aoki · 2024

Computer science Philosophy

Semantic segmentation is a crucial task in computer vision, where each pixel in an image is classified into a category. However, traditional methods face significant challenges, including the need for pixel-level annotations and extensive …

Improving Perceptual Loss with CLIP for Super-Resolution

Go Ohtani, Hirokatsu Kataoka, Yoshimitsu Aoki · 2024

Computer science Engineering Mathematics

Perceptual loss, calculated by VGG network pre-trained on ImageNet, has been widely employed in the past for super-resolution tasks, enabling the generation of photo-realistic images. However, it has been reported that grid-like artifacts …

Synthetic Document Images with Diverse Shadows for Deep Shadow Removal Networks

Yuhi Matsuo, Yoshimitsu Aoki · 2024

Computer science Psychology

Shadow removal for document images is an essential task for digitized document applications. Recent shadow removal models have been trained on pairs of shadow images and shadow-free images. However, obtaining a large, diverse dataset for d…

Yoshimitsu Aoki