Committee | Date Time | Place | Paper Title / Authors | Abstract | Paper #
SP, IPSJ-MUS, IPSJ-SLP |
2024-06-15 13:50 |
Tokyo |
(Primary: On-site, Secondary: Online) |
[Poster Presentation]
A voice synthesizer operated by fingers to control its vocal-tract area function. Amane Koriki, Masashi Ito (Tohtech) |
(To be available after the conference date) |
|
SIP, SP, EA, IPSJ-SLP |
2024-03-01 09:30 |
Okinawa |
(Primary: On-site, Secondary: Online) |
An experimental survey on speaker embedding spaces for controlling speaker identity in speech synthesis system Wakuto Morita, Daisuke Saito, Nobuaki Minematsu (Univ. of Tokyo) EA2023-93 SIP2023-140 SP2023-75 |
This study investigated the influence of the discriminability of speaker encoders on speech synthesis models that can co... |
EA2023-93 SIP2023-140 SP2023-75 pp.190-195 |
SIP, SP, EA, IPSJ-SLP |
2024-03-01 09:30 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Multi-Dialect Speech Synthesis with Interpretable Accent Latent Variable Based on VQ-VAE Kazuki Yamauchi, Yuki Saito, Hiroshi Saruwatari (UTokyo) EA2023-98 SIP2023-145 SP2023-80 |
In this paper, we address two tasks: "Intra-dialect Text-to-Speech (TTS)," aiming to synthesize speech in the same diale... |
EA2023-98 SIP2023-145 SP2023-80 pp.220-225 |
SIP, SP, EA, IPSJ-SLP |
2024-03-01 10:40 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Intermediate speaker speech synthesis between two speakers using x-vector speaker space Sota Hosoi, Takahiro Kinouchi, Yukoh Wakabayashi, Norihide Kitaoka (TUT) EA2023-103 SIP2023-150 SP2023-85 |
Recent advancements in speech synthesis technologies have enabled the synthesis of speeches of speakers not in the train... |
EA2023-103 SIP2023-150 SP2023-85 pp.250-255 |
SIP, SP, EA, IPSJ-SLP |
2024-03-01 10:40 |
Okinawa |
(Primary: On-site, Secondary: Online) |
An Investigation on the Speech Recovery from EEG Signals Using Transformer Tomoaki Mizuno (The Univ. of Electro-Communications), Takuya Kishida (Aichi Shukutoku Univ.), Natsue Yoshimura (Tokyo Tech), Toru Nakashika (The Univ. of Electro-Communications) EA2023-108 SIP2023-155 SP2023-90 |
Synthesizing full speech from electroencephalography (EEG) signals is a challenging task. In this paper, speech reconstru... |
EA2023-108 SIP2023-155 SP2023-90 pp.277-282 |
SIP, SP, EA, IPSJ-SLP |
2024-03-01 16:35 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Discrimination of rotation direction of virtual sound source in binaural synthesis using sound source radiation characteristics Orie Nishiyama (Chiba Institute of Technology), Toshiharu Horiuchi, Shota Okubo (KDDI Research, Inc.), Yoshifumi Chisaki (Chiba Institute of Technology) EA2023-125 SIP2023-172 SP2023-107 |
In order to provide the sensation of being there, research has been conducted on realistic communication that acquires, ... |
EA2023-125 SIP2023-172 SP2023-107 pp.376-381 |
SP, NLC, IPSJ-SLP, IPSJ-NL |
2023-12-03 10:00 |
Tokyo |
Kikai-Shinko-Kaikan Bldg. (Primary: On-site, Secondary: Online) |
Improvement of Tacotron2 text-to-speech model based on masking operation and positional attention mechanism Tong Ma, Daisuke Saito, Nobuaki Minematsu (Univ. of Tokyo) NLC2023-17 SP2023-37 |
|
NLC2023-17 SP2023-37 pp.19-24 |
SP, NLC, IPSJ-SLP, IPSJ-NL |
2023-12-03 11:05 |
Tokyo |
Kikai-Shinko-Kaikan Bldg. (Primary: On-site, Secondary: Online) |
[Poster Presentation]
Self-supervised learning model based emotion transfer and intensity control technology for expressive speech synthesis Wei Li, Nobuaki Minematsu, Daisuke Saito (Univ. of Tokyo) NLC2023-21 SP2023-41 |
Emotion transfer techniques, which transfer the speaking style from the reference speech to the target speech, are wi... |
NLC2023-21 SP2023-41 pp.43-48 |
PRMU, IPSJ-CVIM, IPSJ-DCC, IPSJ-CGVI |
2023-11-17 09:20 |
Tottori |
(Primary: On-site, Secondary: Online) |
Co-speech Gesture Generation with Variational Auto Encoder Shihichi Ka, Koichi Shinoda (Tokyo Tech) PRMU2023-29 |
Co-speech gesture generation is the study of generating gestures from speech. In prior works, deterministic methods lear... |
PRMU2023-29 pp.74-79 |
SP, IPSJ-MUS, IPSJ-SLP |
2023-06-23 13:50 |
Tokyo |
(Primary: On-site, Secondary: Online) |
[Poster Presentation]
MS-Harmonic-Net++ vs SiFi-GAN: Comparison of fundamental frequency controllable fast neural waveform generative models. Sota Shimizu (Kobe Univ./NICT), Takuma Okamoto (NICT), Ryoichi Takashima (Kobe Univ.), Yamato Ohtani (NICT), Tetsuya Takiguchi (Kobe Univ.), Tomoki Toda (Nagoya Univ./NICT), Hisashi Kawai (NICT) SP2023-5 |
Although Harmonic-Net+ has been proposed as a fundamental frequency (fo) and speech rate (SR) controllable fast neural v... |
SP2023-5 pp.20-25 |
SP, IPSJ-MUS, IPSJ-SLP |
2023-06-24 13:50 |
Tokyo |
(Primary: On-site, Secondary: Online) |
Fast Neural Waveform Generation Model With Fully Connected Upsampling Haruki Yamashita (Kobe Univ/NICT), Takuma Okamoto (NICT), Ryoichi Takashima (Kobe Univ), Yamato Ohtani (NICT), Tetsuya Takiguchi (Kobe Univ), Tomoki Toda (Nagoya Univ/NICT), Hisashi Kawai (NICT) SP2023-15 |
In recent years, text-to-speech synthesis has been required to improve inference speed while maintaining quality.
... |
SP2023-15 pp.73-78 |
SP, IPSJ-MUS, IPSJ-SLP |
2023-06-24 13:50 |
Tokyo |
(Primary: On-site, Secondary: Online) |
Effect of pause length ratio in speech on the perception of speech rate Maho Tamakawa, Shuichi Sakamoto (Tohoku Univ.) SP2023-23 |
The goal of this study is to investigate the mechanism of the perception of speech rate. In this preliminary study, we i... |
SP2023-23 pp.114-118 |
SP, IPSJ-MUS, IPSJ-SLP |
2023-06-24 13:50 |
Tokyo |
(Primary: On-site, Secondary: Online) |
Evaluation of multi-speaker text-to-speech synthesis using a corpus for speech recognition with x-vectors for various speech styles Koki Hida (Wakayama Univ/NICT), Takuma Okamoto (NICT), Ryuichi Nisimura (Wakayama Univ), Yamato Ohtani (NICT), Tomoki Toda (Nagoya Univ/NICT), Hisashi Kawai (NICT) SP2023-25 |
We have implemented multi-speaker end-to-end text-to-speech synthesis based on JETS using x-vectors as speaker embedding... |
SP2023-25 pp.125-130 |
SP, IPSJ-SLP, EA, SIP |
2023-02-28 09:10 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Comparison of fundamental frequency controllable fast neural waveform generative models. Sota Shimizu (Kobe Univ./NICT), Takuma Okamoto (NICT), Ryoichi Takashima, Tetsuya Takiguchi (Kobe Univ.), Tomoki Toda (Nagoya Univ./NICT), Hisashi Kawai (NICT) EA2022-75 SIP2022-119 SP2022-39 |
Neural vocoders, which reconstruct speech waveforms from acoustic features with deep neural networks, have significantly... |
EA2022-75 SIP2022-119 SP2022-39 pp.1-6 |
SP, IPSJ-SLP, EA, SIP |
2023-02-28 09:30 |
Okinawa |
(Primary: On-site, Secondary: Online) |
MS-FC-HiFiGAN: Fast Neural Waveform Generation Model With Learnable Lightweight Upsampling Haruki Yamashita (Kobe Univ/NICT), Takuma Okamoto (NICT), Ryoichi Takashima, Tetsuya Takiguchi (Kobe Univ), Tomoki Toda (Nagoya Univ/NICT), Hisashi Kawai (NICT) EA2022-76 SIP2022-120 SP2022-40 |
In recent years, text-to-speech synthesis has been required to improve inference speed while maintaining quality.
... |
EA2022-76 SIP2022-120 SP2022-40 pp.7-12 |
SP, IPSJ-SLP, EA, SIP |
2023-02-28 09:50 |
Okinawa |
(Primary: On-site, Secondary: Online) |
End-to-End Speech Synthesis Based on Articulatory Movements Captured by Real-time MRI Yuto Otani, Shun Sawada, Hidefumi Ohmura, Kouichi Katsurada (Tokyo Univ. Sci.) EA2022-77 SIP2022-121 SP2022-41 |
We propose an end-to-end deep learning model for speech synthesis based on articulatory movements captured by real-time ... |
EA2022-77 SIP2022-121 SP2022-41 pp.13-18 |
SP, IPSJ-SLP, EA, SIP |
2023-02-28 13:00 |
Okinawa |
(Primary: On-site, Secondary: Online) |
[Invited Talk]
Multiple sound spot synthesis meets multilingual speech synthesis
-- Implementation is really all we need -- Takuma Okamoto (NICT) EA2022-87 SIP2022-131 SP2022-51 |
A multilingual multiple sound spot synthesis system is implemented as a user interface for real-time speech translation ... |
EA2022-87 SIP2022-131 SP2022-51 pp.73-76 |
SP, IPSJ-SLP, EA, SIP |
2023-03-01 11:00 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Representation and Prediction of Accent Phrase Prosodic Features in Japanese Text-to-Speech Masaki Sato, Shinnosuke Takamichi, Hiroshi Saruwatari (The Univ. of Tokyo) EA2022-108 SIP2022-152 SP2022-72 |
In order to use speech synthesis in a variety of situations such as dialogue systems and emotional expression in audiobo... |
EA2022-108 SIP2022-152 SP2022-72 pp.197-202 |
SP, IPSJ-SLP, EA, SIP |
2023-03-01 11:20 |
Okinawa |
(Primary: On-site, Secondary: Online) |
An Investigation of Text-to-Speech Synthesis Using Voice Conversion and x-vector Embedding Sympathizing Emotion of Input Audio for Spoken Dialogue Systems Shunichi Kohara, Masanobu Abe, Sunao Hara (Okayama Univ.) EA2022-109 SIP2022-153 SP2022-73 |
In this paper, we propose a Text-to-Speech synthesis method to synthesize the same emotional expression as the input spe... |
EA2022-109 SIP2022-153 SP2022-73 pp.203-208 |
EA, US (Joint) |
2022-12-22 13:30 |
Hiroshima |
Satellite Campus Hiroshima |
[Poster Presentation]
Quality Improvement of Children's Speech with Multiple Inputs of Speaker Vectors in a General Purpose Vocoder Satoshi Yoshida, Ken'ichi Furuya (Oita Univ.), Hideyuki Mizuno (SUS) EA2022-64 |
Neural vocoders used in speech synthesis are capable of synthesizing high-quality speech that is indistinguishable from ... |
EA2022-64 pp.18-23 |