Explanipedia

TrueCount: Improving Open-World Object Counting with Visual-Language Models and Dynamic Multi-Modal Inputs

Ziqiang Shi, R. G. Liu, Jun Takahashi, Shan Jiang · 2025

Generative Modelling with High-Order Langevin Dynamics

Ziqiang Shi, Rujie Liu · 2024

Diffusion generative modelling (DGM) based on stochastic differential equations (SDEs) with score matching has achieved unprecedented results in data generation. In this paper, we propose a novel fast high-quality generative modelling meth…

ItôWave: Itô Stochastic Differential Equation Is All You Need For Wave Generation

Shoule Wu, Ziqiang Shi · 2022

In this paper, we propose a vocoder based on a pair of forward and reverse-time linear stochastic differential equations (SDE). The solutions of this SDE pair are two stochastic processes, one of which turns the distribution of wave, that …

Iton: End-to-End Audio Generation with Ito Stochastic Differential Equations

Ziqiang Shi, Shoule Wu · 2022

Schrowave: Realistic Voice Generation by Solving Two-Stage Conditional Schrodinger Bridge Problems

Shoule Wu, Ziqiang Shi · 2022

Multi-modal Affect Analysis using standardized data within subjects in the Wild

Sachihiro Youoku, Takahisa Yamamoto, Junya Saito, Akiyoshi Uchida, Xiaoyu Mi, et al. · 2021

Human affective recognition is an important factor in human-computer interaction. However, methods developed with in-the-wild data are not yet accurate enough for practical use. In this paper, we introduce the affective recognition m…

ItôTTS and ItôWave: Linear Stochastic Differential Equation Is All You Need For Audio Generation

Shoule Wu, Ziqiang Shi · 2021

In this paper, we propose to unify the two aspects of voice synthesis, namely text-to-speech (TTS) and vocoder, into one framework based on a pair of forward and reverse-time linear stochastic differential equations (SDE). The solutions of…

HiCOMEX: Facial Action Unit Recognition Based on Hierarchy Intensity Distribution and COMEX Relation Learning

Ziqiang Shi, Liu Liu, Zhongling Liu, Rujie Liu, Xiaoyu Mi, et al. · 2020

The detection of facial action units (AUs) has been widely studied, and has become the subject of competition, owing to its wide-ranging applications. In this paper, we propose a novel framework for AU detection from a single input image by grasping the …

Toward the pre-cocktail party problem with TasTas+

Anyan Shi, Jiqing Han, Ziqiang Shi · 2020

Deep neural networks with dual-path bi-directional long short-term memory (BiLSTM) blocks have proved very effective in sequence modeling, especially in speech separation, e.g. DPRNN-TasNet \cite{luo2019dual}, TasTas \cite{shi2020s…

Toward Speech Separation in The Pre-Cocktail Party Problem with TasTas

Ziqiang Shi, Jiqing Han · 2020

In this note, we propose to use TasTas \cite{shi2020speech} for an end-to-end approach to monaural speech separation in the pre-cocktail party problem. Our experiments on the public WSJ0-5mix data corpus result in 10.41dB SDR improvement…

Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss

Ziqiang Shi, Rujie Liu, Jiqing Han · 2020

Deep neural networks with dual-path bi-directional long short-term memory (BiLSTM) blocks have proved very effective in sequence modeling, especially in speech separation. This work investigates how to extend dual-path BiLSTM to re…

SingCubic: Cyclic Incremental Newton-type Gradient Descent with Cubic Regularization for Non-Convex Optimization

Ziqiang Shi · 2020

In this work, we generalize and unify two recent, completely different works, \cite{shi2015large} and \cite{cartis2012adaptive}, into one by proposing the cyclic incremental Newton-type gradient descent with cubic regulariz…

Hodge and Podge: Hybrid Supervised Sound Event Detection with Multi-Hot MixMatch and Composition Consistence Training

Ziqiang Shi, Liu Liu, Huibin Lin, Rujie Liu · 2020

In this paper, we propose a method called Hodge and Podge for sound event detection. We demonstrate Hodge and Podge on the dataset of Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Challenge Task 4. This task aims …

LaFurca: Iterative Multi-Stage Refined End-to-End Monaural Speech Separation Based on Context-Aware Dual-Path Deep Parallel Inter-Intra Bi-LSTM

Ziqiang Shi, Rujie Liu, Jiqing Han · 2020

Deep neural networks with dual-path bi-directional long short-term memory (BiLSTM) blocks have proved very effective in sequence modeling, especially in speech separation, e.g. DPRNN-TasNet \cite{luo2019dual}. In this paper, we pro…

LaFurca: Iterative Refined Speech Separation Based on Context-Aware Dual-Path Parallel Bi-LSTM

Ziqiang Shi, Rujie Liu, Jiqing Han · 2020

Deep neural networks with dual-path bi-directional long short-term memory (BiLSTM) blocks have proved very effective in sequence modeling, especially in speech separation, e.g. DPRNN-TasNet \cite{luo2019dual}. In this paper, we pro…

HODGEPODGE: Sound event detection based on ensemble of semi-supervised learning methods

Ziqiang Shi, Liu Liu, Huibin Lin, Rujie Liu, Anyan Shi · 2019

In this paper, we present a method called HODGEPODGE for large-scale detection of sound events using weakly labeled, synthetic, and unlabeled data proposed in the Detection and Classification of Acoustic Scenes and Events (…

Learning from Adversarial Features for Few-Shot Classification

Wei Shen, Ziqiang Shi, Jun Sun · 2019

Many recent few-shot learning methods concentrate on designing novel model architectures. In this paper, we instead show that with a simple backbone convolutional network we can even surpass state-of-the-art classification accuracy. The es…

FurcaNeXt: End-to-end monaural speech separation with dynamic gated dilated temporal convolutional networks

Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Jiqing Han, et al. · 2019

Deep dilated temporal convolutional networks (TCN) have proved very effective in sequence modeling. In this paper we propose several improvements of TCN for an end-to-end approach to monaural speech separation, which consist of 1)…

FurcaNet: An end-to-end deep gated convolutional, long short-term memory, deep neural networks for single channel speech separation

Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Shoji Hayakawa, et al. · 2019

Deep gated convolutional networks have proved very effective in single channel speech separation. However, current state-of-the-art frameworks often consider training the gated convolutional networks in time-frequency (TF) domain…

Is CQT more suitable for monaural speech separation than STFT? An empirical study

Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Jiqing Han · 2019

The short-time Fourier transform (STFT) is used as the front end of many popular and successful monaural speech separation methods, such as deep clustering (DPCL), permutation invariant training (PIT) and their variants. Since the frequenc…

Deep Clustering With Constant Q Transform For Multi-Talker Single Channel Speech Separation

Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Shoji Hayakawa, et al. · 2018

Deep clustering is a state-of-the-art deep learning-based method for multi-talker speaker-independent speech separation. It solves the label ambiguity problem by mapping time-frequency (TF) bins of the mixed spectrogram to an emb…

A Double Joint Bayesian Approach for J-Vector Based Text-dependent Speaker Verification

Ziqiang Shi, Mengjiao Wang, Liu Liu, Huibin Lin, Rujie Liu · 2017

J-vector has been proved to be very effective in text-dependent speaker verification with short-duration speech. However, the current state-of-the-art back-end classifiers, e.g. joint Bayesian model, cannot make full use of such deep featu…

Multi-view Probability Linear Discrimination Analysis for Multi-view Vector Based Text Dependent Speaker Verification

Ziqiang Shi, Liu Liu, Rujie Liu · 2017

Multi-view (Joint) Probability Linear Discrimination Analysis for Multi-view Feature Verification

Ziqiang Shi, Liu Liu, Mengjiao Wang, Rujie Liu · 2017

Multi-view features have proved very effective in many multimedia applications. However, the current back-end classifiers cannot make full use of such features. In this paper, we propose a method to model the multi-faceted informa…

A better convergence analysis of the block coordinate descent method for large scale machine learning

Ziqiang Shi, Rujie Liu · 2016

This paper considers the problem of unconstrained minimization of large scale smooth convex functions having block-coordinate-wise Lipschitz continuous gradients. The block coordinate descent (BCD) methods are among the first optimization …

Empirical study of PROXTONE and PROXTONE+ for Fast Learning of Large Scale Sparse Models

Ziqiang Shi, Rujie Liu · 2016

PROXTONE is a novel and fast method for optimization of large scale non-smooth convex problems \cite{shi2015large}. In this work, we apply the PROXTONE method to large scale non-smooth non-convex problems, for example traini…

Ziqiang Shi