Committee |
Date Time |
Place |
Paper Title / Authors |
Abstract |
Paper # |
SP, IPSJ-SLP, EA, SIP |
2023-03-01 09:30 |
Okinawa |
(Primary: On-site, Secondary: Online) |
A Study on Scheduled Sampling for Neural Transducer-based ASR Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura (NTT) EA2022-100 SIP2022-144 SP2022-64 |
In this paper, we propose scheduled sampling approaches suited for the recurrent neural network-transducer (RNNT) that i... |
EA2022-100 SIP2022-144 SP2022-64 pp.147-152 |
SP, IPSJ-SLP, EA, SIP |
2023-03-01 15:50 |
Okinawa |
(Primary: On-site, Secondary: Online) |
The linguistic influence on speaker verification based on Self-Supervised Learning Tomoka Wakamatsu (Tokyo Metropolitan Univ.), Atsushi Ando (NTT), Sayaka Shiota (Tokyo Metropolitan Univ.), Ryo Masumura (NTT), Hitoshi Kiya (Tokyo Metropolitan Univ.) EA2022-118 SIP2022-162 SP2022-82 |
In recent years, statistical models utilizing Self-Supervised Learning (SSL) have been employed in various fields. It ha... |
EA2022-118 SIP2022-162 SP2022-82 pp.247-252 |
NLC, IPSJ-NL, SP, IPSJ-SLP |
2022-11-30 15:30 |
Tokyo |
(Primary: On-site, Secondary: Online) |
Semi-supervised joint training of text to speech and automatic speech recognition using unpaired text data Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura (NTT) NLC2022-14 SP2022-34 |
This paper presents a novel joint training of text to speech (TTS) and automatic speech recognition (ASR) with small amo... |
NLC2022-14 SP2022-34 pp.27-32 |
EA, SIP, SP, IPSJ-SLP |
2022-03-02 10:20 |
Okinawa |
(Primary: On-site, Secondary: Online) |
A Study on Hybrid RNN-T/Attention-based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration Takafumi Moriya, Takanori Ashihara, Atsushi Ando, Hiroshi Sato, Tomohiro Tanaka, Kohei Matsuura, Ryo Masumura, Marc Delcroix (NTT), Takahiro Shinozaki (Tokyo Tech) EA2021-78 SIP2021-105 SP2021-63 |
In this paper we propose improvements to our recently proposed hybrid RNN-T/Attention architecture that includes a share... |
EA2021-78 SIP2021-105 SP2021-63 pp.90-95 |
NLC, IPSJ-NL, SP, IPSJ-SLP |
2021-12-03 11:00 |
Online |
Online |
Multi-speaker Audiobook Speech Synthesis using Discrete Character Acting Styles Acquired by VQVAE Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Yuki Saito (UT), Yusuke Ijima, Ryo Masumura (NTT), Hiroshi Saruwatari (UT) NLC2021-26 SP2021-47 |
In this paper, we propose a method of extracting discrete character acting styles using vector quantized variational aut... |
NLC2021-26 SP2021-47 pp.42-47 |
NLC |
2020-09-10 15:25 |
Online |
Online |
Unsupervised Domain Adaptation for Dialogue Sequence Labeling
-- Application to Contact Center Tasks -- Shota Orihashi, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura (NTT) NLC2020-8 |
This paper presents an unsupervised domain adaptation for utterance-level sequence labeling of conversation in a contact... |
NLC2020-8 pp.34-39 |
SP, EA, SIP |
2020-03-02 13:00 |
Okinawa |
Okinawa Industry Support Center (Cancelled, but the technical report was issued) |
Japanese dialect speech classification using sequence-to-one neural networks Ryo Imaizumi (TMU), Ryo Masumura (NTT), Sayaka Shiota, Hitoshi Kiya (TMU) EA2019-108 SIP2019-110 SP2019-57 |
The language specific to a certain region is called a dialect, and the task of identifying which dialect the input speec... |
EA2019-108 SIP2019-110 SP2019-57 pp.41-46 |
SP, EA, SIP |
2020-03-02 13:00 |
Okinawa |
Okinawa Industry Support Center (Cancelled, but the technical report was issued) |
[Poster Presentation]
Neural Voice Activity Detection using Multiple Auxiliary Networks Ryo Masumura, Kiyoaki Matsui, Yuma Koizumi, Takanobu Oba (NTT) EA2019-109 SIP2019-111 SP2019-58 |
|
EA2019-109 SIP2019-111 SP2019-58 pp.47-52 |
SP, EA, SIP |
2020-03-02 13:00 |
Okinawa |
Okinawa Industry Support Center (Cancelled, but the technical report was issued) |
Data augmentation for ASR system by using locally time-reversed speech
-- Temporal inversion of feature sequence -- Takanori Ashihara, Tomohiro Tanaka, Takafumi Moriya, Ryo Masumura, Yusuke Shinohara, Makio Kashino (NTT) EA2019-110 SIP2019-112 SP2019-59 |
Data augmentation is one of the techniques to mitigate overfitting and improve robustness against several acoustic varia... |
EA2019-110 SIP2019-112 SP2019-59 pp.53-58 |
SP, EA, SIP |
2020-03-02 13:00 |
Okinawa |
Okinawa Industry Support Center (Cancelled, but the technical report was issued) |
The Effectiveness of Additional Context in DNN-based Spontaneous Speech Synthesis Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi (UTokyo), Yusuke Ijima, Ryo Masumura (NTT), Hiroshi Saruwatari (UTokyo) EA2019-112 SIP2019-114 SP2019-61 |
In DNN-based speech synthesis, contexts, which are input features of DNN, can be used not only for the representation of... |
EA2019-112 SIP2019-114 SP2019-61 pp.65-70 |
SP, EA, SIP |
2020-03-02 15:45 |
Okinawa |
Okinawa Industry Support Center (Cancelled, but the technical report was issued) |
Performance evaluation of distilling knowledge using encoder-decoder for CTC-based automatic speech recognition systems Takafumi Moriya, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara (NTT) EA2019-131 SIP2019-133 SP2019-80 |
We present a novel training approach for connectionist temporal classification (CTC)-based automatic speech recognition... |
EA2019-131 SIP2019-133 SP2019-80 pp.175-180 |
SP, EA, SIP |
2020-03-03 09:00 |
Okinawa |
Okinawa Industry Support Center (Cancelled, but the technical report was issued) |
LARGE-CONTEXT POINTER-GENERATOR NETWORKS FOR SPOKEN-TO-WRITTEN STYLE CONVERSION Mana Ihori, Akihiko Takashima, Ryo Masumura (NTT) EA2019-142 SIP2019-144 SP2019-91 |
This paper introduces a spoken-to-written style conversion method that is suitable for handling a series of text such as... |
EA2019-142 SIP2019-144 SP2019-91 pp.237-242 |
SP |
2019-08-28 17:00 |
Kyoto |
Kyoto Univ. |
Speech Emotion Classification based on Multi-Label Emotion Existence Estimation Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono (NTT) SP2019-16 |
This paper presents a novel speech emotion classification that addresses the ambiguous nature of emotions in speech. Mos... |
SP2019-16 pp.39-44 |
EA, SIP, SP |
2019-03-14 13:30 |
Nagasaki |
i+Land nagasaki (Nagasaki-shi) |
[Poster Presentation]
Modeling learners’ pronunciation variations and its application to automatic phoneme error detection Haoyu Zhang, Daisuke Saito, Nobuaki Minematsu (UTokyo), Satoshi Kobashikawa, Ryo Masumura (NTT) EA2018-119 SIP2018-125 SP2018-81 |
|
EA2018-119 SIP2018-125 SP2018-81 pp.119-124 |
EA, SIP, SP |
2019-03-15 10:25 |
Nagasaki |
i+Land nagasaki (Nagasaki-shi) |
Neural Language Models based on Conditional Hierarchical Recurrent Encoder-Decoder for Multi-Party Conversational Speech Recognition Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Takanobu Oba, Yushi Aono (NTT) EA2018-131 SIP2018-137 SP2018-93 |
This paper presents fully neural network based language models (LMs) that can leverage long-range conversational context... |
EA2018-131 SIP2018-137 SP2018-93 pp.191-196 |
EA, SIP, SP |
2019-03-15 10:50 |
Nagasaki |
i+Land nagasaki (Nagasaki-shi) |
Likability Estimation Model Training of Call-center Agents Based on Annotators' Skills Hosana Kamiyama, Atsushi Ando, Ryo Masumura, Satoshi Kobashikawa, Yushi Aono (NTT) EA2018-132 SIP2018-138 SP2018-94 |
This paper proposes a new technique for estimating the likability of call-center agents.
Most techniques of likability ... |
EA2018-132 SIP2018-138 SP2018-94 pp.197-202 |
NLC, IPSJ-IFAT |
2019-02-07 14:45 |
Kyoto |
Ryukoku University Omiya Campus |
Call Scene Segmentation based on Neural Networks with Conversational Contexts Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Hosana Kamiyama, Takanobu Oba, Yushi Aono (NTT) NLC2018-39 |
Call scene segmentation that automatically splits contact center dialogues into several call scenes is useful for constr... |
NLC2018-39 pp.21-26 |
SP |
2018-08-27 15:30 |
Kyoto |
Kyoto Univ. |
Neural Error Corrective Language Models with Multiple Hypotheses Tomohiro Tanaka, Ryo Masumura, Yushi Aono (NTT) SP2018-29 |
|
SP2018-29 pp.31-36 |
SIP, EA, SP, MI (Joint) |
2018-03-19 13:00 |
Okinawa |
|
[Poster Presentation]
Quantitative and corpus-based analysis of pronunciation diversity observed in Japanese English Suguru Kabashima, Haoyu Zhang, Daisuke Saito, Nobuaki Minematsu (Univ. of Tokyo), Satoshi Kobashikawa, Ryo Masumura (NTT) EA2017-113 SIP2017-122 SP2017-96 |
In foreign language teaching, corrective feedback to learners' pronunciation is regarded as highly important and automa... |
EA2017-113 SIP2017-122 SP2017-96 pp.69-74 |
SP, SIP, EA |
2017-03-01 12:40 |
Okinawa |
Okinawa Industry Support Center |
[Poster Presentation]
Prosodic Word Embeddings for DNN-based speech synthesis Yusuke Ijima, Nobukatsu Hojo, Ryo Masumura, Taichi Asami (NTT) EA2016-109 SIP2016-164 SP2016-104 |
This paper proposes novel word embeddings with prosodic information (prosodic word embeddings) for DNN-based speech sy... |
EA2016-109 SIP2016-164 SP2016-104 pp.153-158 |