Hanseok Ko
Performance Evaluation Metrics for Empathetic LLMs
With the rapid advancement of large language models (LLMs), recent systems have demonstrated increasing capability in understanding and expressing human emotions. However, no objective and standardized metric currently exists to evaluate h…
Training a Team of Language Models as Options to Build an SQL-Based Memory
Despite the rapid progress in the capabilities of large language models, they still lack a reliable and efficient method of storing and retrieving new information conveyed over the course of their interaction with users upon deployment. In…
The Karmic Theory of Inequality
This paper introduces the Karmic Theory of Inequality (KTI) with the formal explanation for the persistence of individual differences through the lens of Buddhist philosophy. The theory conceptualizes inequality as the cumulative outcome o…
Time-Series Representation Feature Refinement with a Learnable Masking Augmentation Framework in Contrastive Learning
In this study, we propose a novel framework for time-series representation learning that integrates a learnable masking-augmentation strategy into a contrastive learning framework. Time-series data pose challenges due to their temporal dep…
WaveVC: Speech and Fundamental Frequency Consistent Raw Audio Voice Conversion
Voice conversion (VC) is a task for changing the speech of a source speaker to the target voice while preserving linguistic information of the source speech. The existing VC methods typically use mel-spectrogram as both input and output, s…
Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices
In recent times, there has been a growing interest in utilizing personalized large models on low-spec devices, such as mobile and CPU-only devices. However, utilizing a personalized large model in the on-device is inefficient, and sometime…
Towards Multi-domain Face Landmark Detection with Synthetic Data from Diffusion model
Recently, deep learning-based facial landmark detection for in-the-wild faces has achieved significant improvement. However, there are still challenges in face landmark detection in other domains (e.g. cartoon, caricature, etc). This is du…
4D Facial Avatar Reconstruction From Monocular Video via Efficient and Controllable Neural Radiance Fields
We present an efficient approach for monocular 4D facial avatar reconstruction using a dynamic neural radiance field (NeRF). Over the years, NeRFs have been popular methods for 3D scene representation, but lack computational efficiency and…
SYRFA: SYnthetic-to-Real Adaptation via Feature Alignment for Video Anomaly Detection
Video Anomaly Detection (VAD) has garnered significant attention in computer vision, especially with the exponential growth of surveillance videos. Recently, the synthetic dataset has been released to address the imbalance problem between …
Cognitive Refined Augmentation for Video Anomaly Detection in Weak Supervision
Weakly supervised video anomaly detection is a methodology that assesses anomaly levels in individual frames based on labeled video data. Anomaly scores are computed by evaluating the deviation of distances derived from frames in an unbias…
ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models
Generating novel views of an object from a single image is a challenging task. It requires an understanding of the underlying 3D structure of the object from an image and rendering high-quality, spatially consistent new views. While recent…
Voice as a Biomarker to Detect Acute Decompensated Heart Failure: Pilot Study for the Analysis of Voice Using Deep Learning Models
Background: Acute decompensated heart failure (ADHF) is a systemic congestion state requiring timely management. Admission for ADHF is closely related to the readmission and post-discharge mortality in patients, which makes it imperative to…
WaveVC: Speech and Fundamental Frequency Consistent Raw Audio Voice Conversion
Voice conversion (VC) is a task for changing the speech of a source speaker to the target voice style while preserving linguistic information of the source speech. Existing VC methods require a separate vocoder because they output mel-spec…
MPE4G: Multimodal Pretrained Encoder for Co-Speech Gesture Generation
When virtual agents interact with humans, gestures are crucial to delivering their intentions with speech. Previous multimodal co-speech gesture generation models required encoded features of all modalities to generate gestures. If some in…
Spatial-temporal Transformer-guided Diffusion based Data Augmentation for Efficient Skeleton-based Action Recognition
Recently, skeleton-based human action has become a hot research topic because the compact representation of human skeletons brings new blood to this research domain. As a result, researchers began to notice the importance of using RGB or o…
Reference Guided Image Inpainting using Facial Attributes
Image inpainting is a technique of completing missing pixels such as occluded region restoration, distracting objects removal, and facial completion. Among these inpainting tasks, facial completion algorithm performs face inpainting accord…
Single Cell Training on Architecture Search for Image Denoising
Neural Architecture Search (NAS) for automatically finding the optimal network architecture has shown some success with competitive performances in various computer vision tasks. However, NAS in general requires a tremendous amount of comp…
3D human motion generation from the text via gesture action classification and the autoregressive model
In this paper, a deep learning-based model for 3D human motion generation from the text is proposed via gesture action classification and an autoregressive model. The model focuses on generating special gestures that express human thinking…
DIFAI: Diverse Facial Inpainting using StyleGAN Inversion
Image inpainting is an old problem in computer vision that restores occluded regions and completes damaged images. In the case of facial image inpainting, most of the methods generate only one result for each masked image, even though t…
Pose-Guided Graph Convolutional Networks for Skeleton-Based Action Recognition
Graph convolutional networks (GCNs), which can model the human body skeletons as spatial and temporal graphs, have shown remarkable potential in skeleton-based action recognition. However, in the existing GCN-based methods, graph-structure…
Controllable Face Manipulation and UV Map Generation by Self-supervised Learning
Although manipulating facial attributes by Generative Adversarial Networks (GANs) has been remarkably successful recently, there are still some challenges in explicit control of features such as pose, expression, lighting, etc. Recent meth…
Injecting 3D Perception of Controllable NeRF-GAN into StyleGAN for Editable Portrait Image Synthesis
Over the years, 2D GANs have achieved great successes in photorealistic portrait generation. However, they lack 3D understanding in the generation process, thus they suffer from multi-view inconsistency problem. To alleviate the issue, man…
Generate and Edit Your Own Character in a Canonical View
Recently, synthesizing personalized characters from a single user-given portrait has received remarkable attention as a drastic popularization of social media and the metaverse. The input image is not always in frontal view, thus it is imp…
Efficient dynamic filter for robust and low computational feature extraction
Unseen noise signal which is not considered in a model training process is difficult to anticipate and would lead to performance degradation. Various methods have been investigated to mitigate unseen noise. In our previous work, an Instanc…
Searching similar weather maps using convolutional autoencoder and satellite images
A weather forecaster predicts the weather by analyzing current weather map images generated by a satellite. In this analyzing process, the accuracy of the prediction depends highly on the forecaster’s experience which is needed to recollec…