Explanipedia

Flux homomorphism and bilinear form constructed from Shelukhin's quasimorphism Open

Morimichi Kawasaki, Mitsuaki Kimura, Shuhei Maruyama, Takahiro Matsushita, Masato Mimura · 2025

Given a closed connected symplectic manifold $(M,ω)$, we construct an alternating $\mathbb{R}$-bilinear form $\mathfrak{b}=\mathfrak{b}_{μ_{\mathrm{Sh}}}$ on the real first cohomology of $M$ from Shelukhin's quasimorphism $μ_{\mathrm{Sh}}$…

Non-extendability of Shelukhin's quasimorphism and non-triviality of Reznikov's class Open

Morimichi Kawasaki, Mitsuaki Kimura, Shuhei Maruyama, Takahiro Matsushita, Masato Mimura · 2025

Shelukhin constructed a quasimorphism on the universal covering of the group of Hamiltonian diffeomorphisms for a general closed symplectic manifold. In the present paper, we prove the non-extendability of that quasimorphism for certain sy…

Microphone Array Geometry Independent Multi-Talker Distant ASR: NTT System for the DASR Task of the CHiME-8 Challenge Open

Naoyuki Kamo, Naohiro Tawara, Atsushi Ando, Takatomo Kano, Hiroshi Satō, et al. · 2025

In this paper, we introduce a multi-talker distant automatic speech recognition (DASR) system we designed for the DASR task 1 of the CHiME-8 challenge. Our system performs speaker counting, diarization, and ASR. It handles various recordin…

Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding Open

Takafumi Moriya, Takanori Ashihara, Masato Mimura, Hiroshi Sato, Kohei Matsuura, et al. · 2024

Computer science Materials science Mathematics

A hybrid autoregressive transducer (HAT) is a variant of neural transducer that models blank and non-blank posterior distributions separately. In this paper, we propose a novel internal acoustic model (IAM) training strategy to enhance HAT…
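
The separation of blank and non-blank posteriors can be sketched as follows. This is a minimal illustration assuming the commonly used HAT factorization, in which a sigmoid of a dedicated blank logit gives the blank probability and a softmax over the non-blank vocabulary is scaled by the remaining probability mass. The function name and the toy logits are hypothetical, and neither the IAM training strategy nor dual blank thresholding is modeled here.

```python
import numpy as np


def hat_output_distribution(blank_logit: float, label_logits: np.ndarray) -> np.ndarray:
    """Combine a separate blank logit and non-blank label logits into one distribution.

    Index 0 of the result is the blank probability; indices 1..K are the labels.
    Assumed HAT-style factorization (illustrative sketch, not the paper's code):
      P(blank)   = sigmoid(b)
      P(label k) = (1 - sigmoid(b)) * softmax(l)_k
    """
    p_blank = 1.0 / (1.0 + np.exp(-blank_logit))
    shifted = label_logits - np.max(label_logits)  # subtract max for numerical stability
    label_posterior = np.exp(shifted) / np.sum(np.exp(shifted))
    return np.concatenate(([p_blank], (1.0 - p_blank) * label_posterior))


# Toy logits for one (frame, label-history) position; values are made up.
dist = hat_output_distribution(0.0, np.array([2.0, 1.0, 0.1]))
print(dist.round(3), dist.sum())
```

Because the label distribution is normalized on its own, it can be inspected or thresholded independently of the blank probability, which is the property the separate modeling provides.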

Alignment-Free Training for Transducer-based Multi-Talker ASR Open

Takafumi Moriya, Shota Horiguchi, Marc Delcroix, Ryo Masumura, Takanori Ashihara, et al. · 2024

Computer science Physics

Extending the RNN Transducer (RNNT) to recognize multi-talker speech is essential for wider automatic speech recognition (ASR) applications. Multi-talker RNNT (MT-RNNT) aims to achieve recognition without relying on costly front-end source…

NTT Multi-Speaker ASR System for the DASR Task of CHiME-8 Challenge Open

Naoyuki Kamo, Naohiro Tawara, Atsushi Ando, Takatomo Kano, Hiroshi Satō, et al. · 2024

Computer science Engineering Philosophy

We present a distant automatic speech recognition (DASR) system developed for the CHiME-8 DASR track. It uses a diarization-first pipeline. For diarization, we use end-to-end diarization with vector clustering (EEND-VC) followed by …

Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation Open

Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Masato Mimura, Takatomo Kano, et al. · 2024

Computer science Engineering Chemistry

This paper introduces a novel approach called sentence-wise speech summarization (Sen-SSum), which generates text summaries from a spoken document in a sentence-by-sentence manner. Sen-SSum combines the real-time processing of automatic sp…

SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling Open

Hiroshi Sato, Takafumi Moriya, Masato Mimura, Shota Horiguchi, Tsubasa Ochiai, et al. · 2024

Computer science Mathematics Chemistry

Real-time target speaker extraction (TSE) is intended to extract the desired speaker's voice from the observed mixture of multiple speakers in a streaming manner. Implementing real-time TSE is challenging as the computational complexity mu…

Invariant quasimorphisms and generalized mixed Bavard duality Open

Morimichi Kawasaki, Mitsuaki Kimura, Shuhei Maruyama, Takahiro Matsushita, Masato Mimura · 2024

Mathematics

This article provides an expository account of the celebrated duality theorem of Bavard and three of its strengthenings. The Bavard duality theorem connects scl (stable commutator length) and quasimorphisms on a group. Calegari extended the f…
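
For reference, the classical Bavard duality theorem can be stated as follows, where $Q(G)$ denotes the space of homogeneous quasimorphisms on $G$, $H^1(G;\mathbb{R})$ the subspace of homomorphisms $G \to \mathbb{R}$, $D(\phi)$ the defect, and a supremum over the empty set is read as $0$. The strengthenings surveyed in the article replace these objects by invariant quasimorphisms on a normal subgroup; their exact hypotheses are not reproduced here.

```latex
\[
  D(\phi) = \sup_{g,h \in G} \bigl|\phi(gh) - \phi(g) - \phi(h)\bigr|,
  \qquad
  \mathrm{scl}_G(x) = \sup_{\phi \in Q(G) \setminus H^1(G;\mathbb{R})} \frac{\phi(x)}{2D(\phi)}
  \quad (x \in [G,G]).
\]
```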

Coarse group theoretic study on stable mixed commutator length Open

Morimichi Kawasaki, Mitsuaki Kimura, Shuhei Maruyama, Takahiro Matsushita, Masato Mimura · 2023

Mathematics Physics

Let $G$ be a group and $N$ a normal subgroup of $G$. We study the large scale behavior, not the exact values themselves, of the stable mixed commutator length $\mathrm{scl}_{G,N}$ on the mixed commutator subgroup $[G,N]$; when $N=G$, $\mathrm{scl}_{G,N}$ eq…
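
Spelled out, writing mixed commutators as $[g,x] = gxg^{-1}x^{-1}$ with $g \in G$ and $x \in N$, the mixed commutator length and its stabilization are as follows. The limit exists because $n \mapsto \mathrm{cl}_{G,N}(y^n)$ is subadditive, and for $N = G$ these reduce to the usual commutator length and stable commutator length of $G$.

```latex
\[
  [G,N] = \bigl\langle\, [g,x] : g \in G,\ x \in N \,\bigr\rangle,
  \qquad
  \mathrm{cl}_{G,N}(y) = \min\bigl\{ k : y = [g_1,x_1]\cdots[g_k,x_k],\ g_i \in G,\ x_i \in N \bigr\},
\]
\[
  \mathrm{scl}_{G,N}(y) = \lim_{n \to \infty} \frac{\mathrm{cl}_{G,N}(y^n)}{n}
  \qquad (y \in [G,N]).
\]
```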

Time-domain Speech Enhancement Assisted by Multi-resolution Frequency Encoder and Decoder Open

Hao Shi, Masato Mimura, Longbiao Wang, Jianwu Dang, Tatsuya Kawahara · 2023

Computer science Mathematics Engineering

Time-domain speech enhancement (SE) has recently been intensively investigated. Among recent works, DEMUCS introduces multi-resolution STFT loss to enhance performance. However, some resolutions used for STFT contain non-stationary signals…
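
The multi-resolution STFT loss mentioned above compares spectrograms of the clean and enhanced waveforms at several STFT settings. A minimal NumPy sketch, assuming the common formulation that adds a spectral-convergence term and a log-magnitude L1 term per resolution and averages over resolutions. The three (n_fft, hop, window) settings are assumed values, the signals are synthetic, and the paper's multi-resolution frequency encoder and decoder are not modeled.

```python
import numpy as np

# Assumed (n_fft, hop, win_length) settings; illustrative, not taken from the paper.
RESOLUTIONS = [(512, 50, 240), (1024, 120, 600), (2048, 240, 1200)]


def stft_magnitude(x: np.ndarray, n_fft: int, hop: int, win_length: int) -> np.ndarray:
    """Magnitude STFT with a Hann window zero-padded (centered) to n_fft samples.

    Requires len(x) >= n_fft; returns an array of shape (frames, n_fft // 2 + 1).
    """
    window = np.hanning(win_length)
    left = (n_fft - win_length) // 2
    window = np.pad(window, (left, n_fft - win_length - left))
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))


def single_resolution_loss(clean, enhanced, n_fft, hop, win_length, eps=1e-7):
    mag = stft_magnitude(clean, n_fft, hop, win_length)
    mag_hat = stft_magnitude(enhanced, n_fft, hop, win_length)
    # Spectral convergence: relative Frobenius-norm error of the magnitudes.
    sc = np.linalg.norm(mag - mag_hat) / (np.linalg.norm(mag) + eps)
    # Mean absolute error between log magnitudes.
    log_mag = np.mean(np.abs(np.log(mag + eps) - np.log(mag_hat + eps)))
    return sc + log_mag


def multi_resolution_stft_loss(clean, enhanced, resolutions=RESOLUTIONS) -> float:
    """Average the single-resolution loss over several STFT configurations."""
    return float(np.mean([single_resolution_loss(clean, enhanced, *r) for r in resolutions]))


rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)  # 1 s of synthetic audio at an assumed 16 kHz
noisy = clean + 0.1 * rng.standard_normal(16000)
print(multi_resolution_stft_loss(clean, clean), multi_resolution_stft_loss(clean, noisy))
```

Using several window lengths trades time resolution against frequency resolution, so no single STFT setting dominates the training signal.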

End-to-End Generation of Written-style Transcript of Speech from Parliamentary Meetings Open

Masato Mimura, Tatsuya Kawahara · 2023

Computer science Art Psychology

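Because conventional speech recognition systems are designed to faithfully reproduce every word that appears in the input speech, they do not necessarily output sentences that are easy for humans to read, even when recognition accuracy is high. In contrast, this study describes a new speech recognition approach that directly outputs highly readable, written-style sentences from speech while making the necessary edits, such as deleting fillers and speech errors, inserting punctuation and dropped particles, and correcting colloquial expressions. Our approach uses a single neural network for speech-to-written-style…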

Survey on invariant quasimorphisms and stable mixed commutator length Open

Morimichi Kawasaki, Mitsuaki Kimura, Shuhei Maruyama, Takahiro Matsushita, Masato Mimura · 2022

Mathematics

A homogeneous quasimorphism $ϕ$ on a normal subgroup $N$ of $G$ is said to be $G$-invariant if $ϕ(gxg^{-1}) = ϕ(x)$ for every $g \in G$ and for every $x \in N$. Invariant quasimorphisms have naturally appeared in symplectic geometry and th…
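
The notions in this definition can be written out as follows: $\phi\colon N \to \mathbb{R}$ is a quasimorphism if its defect $D(\phi)$ is finite, it is homogeneous if it is additive on powers, and it is $G$-invariant if it is constant on $G$-conjugates. Every homogeneous quasimorphism is automatically invariant under conjugation by elements of $N$ itself, so $G$-invariance is a genuine extra condition only for conjugation by elements outside $N$.

```latex
\[
  D(\phi) = \sup_{x,y \in N} \bigl|\phi(xy) - \phi(x) - \phi(y)\bigr| < \infty,
  \qquad
  \phi(x^n) = n\,\phi(x) \quad (x \in N,\ n \in \mathbb{Z}),
\]
\[
  \phi(gxg^{-1}) = \phi(x) \quad (g \in G,\ x \in N).
\]
```

Any quasimorphism $\psi$ on $N$ has a homogenization $\bar\psi(x) = \lim_{n \to \infty} \psi(x^n)/n$, which is a homogeneous quasimorphism at bounded distance from $\psi$.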

Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM Open

Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, et al. · 2022

Computer science Mathematics Philosophy

Connectionist temporal classification (CTC)-based models are attractive in automatic speech recognition (ASR) because of their non-autoregressive nature. To take advantage of text-only data, language model (LM) integration approaches such…
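
The non-autoregressive property shows up in best-path (greedy) CTC decoding: each frame's label is chosen independently by an argmax, after which repeated labels are merged and blanks are dropped. A minimal NumPy sketch, assuming a blank index of 0 and a (frames × vocabulary) array of log posteriors; it is not the phone-conditioned masked-LM error correction proposed in the paper.

```python
import numpy as np

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(log_probs: np.ndarray) -> list[int]:
    """Best-path CTC decoding.

    log_probs has shape (T, V): per-frame log posteriors over V symbols.
    The argmax is taken for all frames at once, with no dependence on
    previously emitted tokens; repeats are then collapsed and blanks removed.
    """
    best_path = np.argmax(log_probs, axis=-1)
    tokens: list[int] = []
    prev = None
    for label in best_path:
        label = int(label)
        if label != prev and label != BLANK:
            tokens.append(label)
        prev = label
    return tokens


# Frame-level path 1 1 _ 1 2 2 _ (with _ = blank), encoded as near one-hot log posteriors.
path = [1, 1, 0, 1, 2, 2, 0]
log_probs = np.log(np.eye(3)[path] + 1e-9)
print(ctc_greedy_decode(log_probs))  # -> [1, 1, 2]
```

The blank between the two 1s is what lets CTC emit the same token twice in a row.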

Distilling the Knowledge of BERT for CTC-based ASR Open

Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara · 2022

Computer science

Connectionist temporal classification (CTC)-based models are attractive because of their fast inference in automatic speech recognition (ASR). Language model (LM) integration approaches such as shallow fusion and rescoring can improve the…

Invariant quasimorphisms for groups acting on the circle and non-equivalence of SCL Open

Shuhei Maruyama, Takahiro Matsushita, Masato Mimura · 2022

Mathematics Physics

We construct invariant quasimorphisms for groups acting on the circle. Furthermore, we provide a criterion for the non-extendability of the resulting quasimorphisms and an explicit formula which relates the values of our quasimorphisms to t…
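
The classical example behind quasimorphisms for circle actions is Poincaré's translation number, a homogeneous quasimorphism on the universal covering group of the orientation-preserving homeomorphisms of the circle. The sketch below only estimates it numerically for two hand-picked lifts; it is not the construction of the paper, and the example maps `rotation` and `perturbed` are hypothetical.

```python
import math
from typing import Callable


def translation_number(f: Callable[[float], float], x0: float = 0.0, n: int = 100_000) -> float:
    """Estimate lim_{k -> inf} (f^k(x0) - x0) / k for a lift f of a circle homeomorphism.

    Assumes f is increasing and satisfies f(x + 1) = f(x) + 1. For such f,
    f^n(x0) - x0 differs from n * tau by at most 1, so the estimate is within 1/n of tau.
    """
    x = x0
    for _ in range(n):
        x = f(x)
    return (x - x0) / n


def rotation(x: float) -> float:
    # Lift of the rigid rotation by 0.3 (hypothetical example).
    return x + 0.3


def perturbed(x: float) -> float:
    # Increasing lift (derivative 1 + 0.1 cos(2 pi x) > 0) commuting with x -> x + 1.
    return x + 0.25 + 0.1 * math.sin(2 * math.pi * x) / (2 * math.pi)


print(translation_number(rotation), translation_number(perturbed))
```

Homogeneity can be checked numerically: the estimate for `perturbed` composed with itself is, up to about 1/n, twice the estimate for `perturbed`.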

Mixed commutator lengths, wreath products and general ranks Open

Morimichi Kawasaki, Mitsuaki Kimura, Shuhei Maruyama, Takahiro Matsushita, Masato Mimura · 2022

Mathematics Physics

In the present paper, for a pair $(G,N)$ of a group $G$ and its normal subgroup $N$, we consider the mixed commutator length $\mathrm{cl}_{G,N}$ on the mixed commutator subgroup $[G,N]$. We focus on the setting of wreath products: $ (G,N)=…

Synthesizing waveform sequence-to-sequence to augment training data for sequence-to-sequence speech recognition Open

Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara · 2021

Computer science Biology Art

Sequence-to-sequence (seq2seq) automatic speech recognition (ASR) has recently achieved state-of-the-art performance with fast decoding and a simple architecture. However, it requires a large amount of training data and cannot use te…

ASR Rescoring and Confidence Estimation with ELECTRA Open

Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara · 2021

Computer science Philosophy

In automatic speech recognition (ASR) rescoring, the hypothesis with the fewest errors should be selected from the n-best list using a language model (LM). However, LMs are usually trained to maximize the likelihood of correct word sequenc…
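
Conventional n-best rescoring combines each hypothesis's ASR score log-linearly with a weighted LM score and keeps the best total. A minimal sketch, with made-up scores and a hypothetical weight `lm_weight`; the ELECTRA-based scoring studied in the paper is not reproduced here, though a different scorer could be substituted for `lm_score`.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    asr_score: float  # log-probability from the ASR model (toy value)
    lm_score: float   # log-probability from an external LM (toy value)


def rescore(nbest: list[Hypothesis], lm_weight: float = 0.5) -> Hypothesis:
    """Return the hypothesis maximizing asr_score + lm_weight * lm_score."""
    return max(nbest, key=lambda h: h.asr_score + lm_weight * h.lm_score)


nbest = [
    Hypothesis("i scream", asr_score=-3.0, lm_score=-9.0),
    Hypothesis("ice cream", asr_score=-3.2, lm_score=-6.0),
]
print(rescore(nbest).text)  # prints: ice cream
```

With `lm_weight=0.0` the ASR score alone decides and "i scream" wins; the weight is normally tuned on held-out data.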

The space of non-extendable quasimorphisms Open

Morimichi Kawasaki, Mitsuaki Kimura, Shuhei Maruyama, Takahiro Matsushita, Masato Mimura · 2021

Mathematics Computer science Physics

For a pair $(G,N)$ of a group $G$ and its normal subgroup $N$, we consider the space of quasimorphisms and quasi-cocycles on $N$ non-extendable to $G$. To treat this space, we establish the five-term exact sequence of cohomology relative t…

Commuting symplectomorphisms on a surface and the flux homomorphism Open

Morimichi Kawasaki, Mitsuaki Kimura, Takahiro Matsushita, Masato Mimura · 2021

Mathematics Physics Computer science

Let $(S,ω)$ be a closed connected oriented surface of genus $l \geq 2$, equipped with a symplectic form $ω$. We show that the cup product of the fluxes of commuting symplectomorphisms vanishes. This result may be regarded as…

Automatic Speech Recognition for the Archive of Ainu Folklores Open

Kohei Matsuura, Masato Mimura, Tatsuya Kawahara · 2021

Computer science Philosophy

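This paper describes our work on automatic speech recognition of Ainu folktales (uwepeker). First, we built an Ainu speech corpus of the Saru dialect from Ainu-language archive data provided by two museums. Next, using this corpus, we built a speech recognition system based on an attention model and examined four recognition units: phones, syllables, word pieces, and words. We found that syllable units gave the highest recognition accuracy, and in both the speaker-closed and speaker-open conditions, in terms of phone recognition accuracy…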

Constellations in prime elements of number fields Open

Wataru Kai, Masato Mimura, Akihiro Munemasa, Shin-ichiro Seki, Kiyoto Yoshino · 2020

Mathematics Computer science Chemistry

Given any number field, we prove that there exist arbitrarily shaped constellations consisting of pairwise non-associate prime elements of the ring of integers. This result extends the celebrated Green-Tao theorem on arithmetic progression…
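
For $K = \mathbb{Q}(i)$, the ring of integers is the Gaussian integers $\mathbb{Z}[i]$, and a brute-force search already finds small constellations. The sketch below assumes that a constellation of shape $S$ means a set $\{a + bs : s \in S\}$ with $b \neq 0$; the helper names and the search bound are hypothetical, and finding one example for a small shape says nothing about the existence proof for arbitrary shapes and number fields.

```python
from itertools import combinations, product

GaussInt = tuple[int, int]  # (a, b) represents a + b*i
UNITS: list[GaussInt] = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def is_rational_prime(n: int) -> bool:
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def is_gaussian_prime(z: GaussInt) -> bool:
    """a + bi is prime in Z[i] iff a^2 + b^2 is prime (a, b both nonzero),
    or one coordinate is 0 and the other is +/- p with p prime and p = 3 mod 4."""
    a, b = z
    if a != 0 and b != 0:
        return is_rational_prime(a * a + b * b)
    m = abs(a) + abs(b)  # the nonzero coordinate, up to sign
    return is_rational_prime(m) and m % 4 == 3


def mul(z: GaussInt, w: GaussInt) -> GaussInt:
    return (z[0] * w[0] - z[1] * w[1], z[0] * w[1] + z[1] * w[0])


def add(z: GaussInt, w: GaussInt) -> GaussInt:
    return (z[0] + w[0], z[1] + w[1])


def are_associates(z: GaussInt, w: GaussInt) -> bool:
    return any(mul(u, w) == z for u in UNITS)


def find_constellation(shape: list[GaussInt], bound: int = 10):
    """Brute-force search for a, b (b != 0) with every a + b*s (s in shape)
    a Gaussian prime and the resulting primes pairwise non-associate."""
    coords = range(-bound, bound + 1)
    for a in product(coords, repeat=2):
        for b in product(coords, repeat=2):
            if b == (0, 0):
                continue
            points = [add(a, mul(b, s)) for s in shape]
            if all(is_gaussian_prime(p) for p in points) and not any(
                are_associates(p, q) for p, q in combinations(points, 2)
            ):
                return a, b, points
    return None


# Shape {0, 1, i}: look for a, a + b, a + b*i.
print(find_constellation([(0, 0), (1, 0), (0, 1)]))
```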

CTC-Synchronous Training for Monotonic Attention Model Open

Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara · 2020

Computer science Mathematics Biology

Monotonic chunkwise attention (MoChA) has been studied for online streaming automatic speech recognition (ASR) within a sequence-to-sequence framework. In contrast to connectionist temporal classification (CTC), backward probabilitie…

Enhancing Monotonic Multihead Attention for Streaming ASR Open

Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara · 2020

Computer science Mathematics

We investigate monotonic multihead attention (MMA), which extends hard monotonic attention to Transformer-based automatic speech recognition (ASR) for online streaming applications. For streaming inference, all monotonic attention (MA) hea…

End-to-end Music-mixed Speech Recognition Open

Jeongwoo Woo, Masato Mimura, Kazuyoshi Yoshii, Tatsuya Kawahara · 2020

Computer science Mathematics Engineering

Automatic speech recognition (ASR) of multimedia content is a promising application, but speech in such content is frequently mixed with background music, which degrades ASR performance. In this study, …

Distilling the Knowledge of BERT for Sequence-to-Sequence ASR Open

Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, et al. · 2020

Computer science Engineering Biology

Attention-based sequence-to-sequence (seq2seq) models have achieved promising results in automatic speech recognition (ASR). However, because these models decode from left to right, they have no access to right-side context. We levera…

Masato Mimura