Committee |
Date Time |
Place |
Paper Title / Authors |
Abstract |
Paper # |
SP, IPSJ-MUS, IPSJ-SLP |
2023-06-23 13:50 |
Tokyo |
(Primary: On-site, Secondary: Online) |
[Poster Presentation]
MS-Harmonic-Net++ vs SiFi-GAN: Comparison of fundamental frequency controllable fast neural waveform generative models. Sota Shimizu (Kobe Univ./NICT), Takuma Okamoto (NICT), Ryoichi Takashima (Kobe Univ.), Yamato Ohtani (NICT), Tetsuya Takiguchi (Kobe Univ.), Tomoki Toda (Nagoya Univ./NICT), Hisashi Kawai (NICT) SP2023-5 |
Although Harmonic-Net+ has been proposed as a fundamental frequency (fo) and speech rate (SR) controllable fast neural v... |
SP2023-5 pp.20-25 |
SP, IPSJ-MUS, IPSJ-SLP |
2023-06-24 13:50 |
Tokyo |
(Primary: On-site, Secondary: Online) |
Fast Neural Waveform Generation Model With Fully Connected Upsampling Haruki Yamashita (Kobe Univ/NICT), Takuma Okamoto (NICT), Ryoichi Takashima (Kobe Univ), Yamato Ohtani (NICT), Tetsuya Takiguchi (Kobe Univ), Tomoki Toda (Nagoya Univ/NICT), Hisashi Kawai (NICT) SP2023-15 |
In recent years, in text-to-speech synthesis, it is required to improve the inference speed while keeping the quality. ... |
SP2023-15 pp.73-78 |
SP, IPSJ-MUS, IPSJ-SLP |
2023-06-24 13:50 |
Tokyo |
(Primary: On-site, Secondary: Online) |
Evaluation of multi-speaker text-to-speech synthesis using a corpus for speech recognition with x-vectors for various speech styles Koki Hida (Wakayama Univ/NICT), Takuma Okamoto (NICT), Ryuichi Nisimura (Wakayama Univ), Yamato Ohtani (NICT), Tomoki Toda (Nagoya Univ/NICT), Hisashi Kawai (NICT) SP2023-25 |
We have implemented multi-speaker end-to-end text-to-speech synthesis based on JETS using x-vectors as speaker embedding... |
SP2023-25 pp.125-130 |
SP, IPSJ-SLP, EA, SIP |
2023-02-28 09:10 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Comparison of fundamental frequency controllable fast neural waveform generative models. Sota Shimizu (Kobe Univ./NICT), Takuma Okamoto (NICT), Ryoichi Takashima, Tetsuya Takiguchi (Kobe Univ.), Tomoki Toda (Nagoya Univ./NICT), Hisashi Kawai (NICT) EA2022-75 SIP2022-119 SP2022-39 |
Neural vocoders, which reconstruct speech waveforms from acoustic features with deep neural networks, have significantly... |
EA2022-75 SIP2022-119 SP2022-39 pp.1-6 |
SP, IPSJ-SLP, EA, SIP |
2023-02-28 09:30 |
Okinawa |
(Primary: On-site, Secondary: Online) |
MS-FC-HiFiGAN: Fast Neural Waveform Generation Model With Learnable Lightweight Upsampling Haruki Yamashita (Kobe Univ/NICT), Takuma Okamoto (NICT), Ryoichi Takashima, Tetsuya Takiguchi (Kobe Univ), Tomoki Toda (Nagoya Univ/NICT), Hisashi Kawai (NICT) EA2022-76 SIP2022-120 SP2022-40 |
In recent years, in text-to-speech synthesis, it is required to improve the inference speed while keeping the quality. ... |
EA2022-76 SIP2022-120 SP2022-40 pp.7-12 |
SP, IPSJ-MUS, IPSJ-SLP |
2022-06-17 13:00 |
Online |
Online |
A Study of Speech Recognition Result Correction Using BERT for Speech Translation Tadashi Ogura, Masakiyo Fujimoto, Peng Shen, Xugang Lu, Hisashi Kawai (NICT) SP2022-4 |
Speech translation (ST) technology consists of automatic speech recognition (ASR) and machine translation technologies. ... |
SP2022-4 pp.10-13 |
SP, EA, SIP |
2020-03-02 09:20 |
Okinawa |
Okinawa Industry Support Center (Cancelled but technical report was issued) |
Investigation of neural speech rate conversion with multi-speaker WaveNet vocoder Takuma Okamoto (NICT), Keisuke Matsubara (Kobe Univ./NICT), Tomoki Toda (Nagoya Univ./NICT), Yoshinori Shiga, Hisashi Kawai (NICT) EA2019-101 SIP2019-103 SP2019-50 |
Speech rate conversion technology, which can expand or compress speech waveforms without changing pitch of sound, is con... |
EA2019-101 SIP2019-103 SP2019-50 pp.1-6 |
SP, IPSJ-SLP (Joint) |
2018-07-26 16:45 |
Shizuoka |
Sago-Royal-Hotel (Hamamatsu) |
Single channel noisy speech recognition based on combination of noisy speech and enhanced speech Masakiyo Fujimoto, Hisashi Kawai (NICT) SP2018-19 |
In many cases, single channel speech enhancement seriously deteriorates speech recognition accuracy due to the influence... |
SP2018-19 pp.15-20 |
PRMU |
2015-12-21 09:30 |
Nagano |
|
Evaluation of Automatic Prototype-Model Size Optimization in Large Geometric Margin Minimum Classification Error Training Masahiro Ogino (Doshisha Univ.), Hideyuki Watanabe (NICT), Shigeru Katagiri, Miho Osaki (Doshisha Univ.), Xugang Lu, Hisashi Kawai (NICT) PRMU2015-100 |
To develop a method for finding an appropriate class model size, which leads to accurate classification over unseen patter... |
PRMU2015-100 pp.1-6 |
SP, IPSJ-SLP (Joint) |
2015-07-16 17:20 |
Nagano |
Katakura Suwako Hotel |
Experimental evaluation of network size effect in speaker adaptive trained DNNs embedding linear transformation networks Tsubasa Ochiai (Doshisha Univ./NICT), Shigeki Matsuda (Doshisha Univ.), Hideyuki Watanabe, Xugang Lu, Hisashi Kawai (NICT), Shigeru Katagiri (Doshisha Univ.) SP2015-41 |
Recently we proposed a novel speaker adaptation method that applied the Speaker Adaptive Training (SAT) concept to DNN-... |
SP2015-41 pp.31-36 |
SP |
2012-06-14 16:00 |
Kanagawa |
NTT Atsugi R&D Center |
Perceptual evaluation of synthesized speech reflecting "personalities" Minoru Tsuzaki (KCUA), Keiichi Tokuda (NITEC), Hisashi Kawai (KDDI R&D Labs), Yoshinori Shiga, Jinfu Ni (NICT), Keiichiro Oura, Sayaka Shiota (NITEC) SP2012-39 |
Perceptual evaluation tests were performed for talker selection methods in the application of the speaker adaptation fra... |
SP2012-39 pp.33-38 |
SP |
2011-07-22 14:15 |
Hokkaido |
Jozankei Grand Hotel |
Network-based Spoken Dialog System Development Platform: WFSTDM builder Chiori Hori, Hisashi Kawai, Hideki Kashioka (NICT) SP2011-46 |
|
SP2011-46 pp.29-34 |
SP |
2011-03-04 14:15 |
Tokyo |
Faculty of Engineering, The University of Tokyo |
Estimation of perceptual talker space using Japanese-English bilingual corpus Minoru Tsuzaki (Kyoto City Univ. of Arts), Keiichi Tokuda (Nagoya Inst. of Tech.), Hisashi Kawai, Jinfu Ni (NICT) SP2010-116 |
This paper reconfirms that talker identity can be transmitted even under the across-linguistic circumstances using a bil... |
SP2010-116 pp.7-12 |
SP |
2011-01-28 11:00 |
Kyoto |
NICT |
Iterative Mapping Function Estimation and Environment Structure Refinement in the Online Phase of the ESSEM Approach Yu Tsao, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura (NICT) SP2010-112 |
Recently, we proposed an ensemble speaker and speaking environment modeling (ESSEM) approach to improve automatic speech... |
SP2010-112 pp.55-58 |
EA |
2007-12-07 13:50 |
Tokyo |
NHK Science & Technical Research Laboratories |
Speech Input Method by Using a Small Number of Microphones Toshiharu Horiuchi, Hao Yuan, Hisashi Kawai (KDDI R&D Labs.) EA2007-90 |
This report describes a simple method for noise suppression under distant-talking environment, to deal with relative cha... |
EA2007-90 pp.25-30 |
SP |
2007-11-28 |
Chiba |
Chiba Institute of Technology |
Utterance-based Mean and Segmental Variance Normalization for Robust Speech Recognition Toshiki Endo, Hisashi Kawai (KDDI Labs.) SP2007-90 |
|
SP2007-90 pp.25-30 |
IN, ICM, LOIS (Joint) |
2006-01-19 11:15 |
Kyoto |
Kyoto Univ. |
Study of Performance Enhancement of Header Compression over Wireless Links of Asymmetric Characteristics Norihiro Fukumoto, Hideaki Yamada, Hisashi Kawai (KDDI Labs.) |
A variety of conversational multimedia services have been developed and provided over FMC (Fixed Mobile Convergence) env... |
IN2005-132 pp.19-24 |
MoNA |
2005-09-08 09:50 |
Kyoto |
Kei-Han-Na NICT |
A Speech Translation System Using Mobile Devices and a Field Experiment Genichiro Kikui, Toshiyuki Takezawa, Masahide Mizushima, Yutaka Ashikari, Satoshi Nakamura, Yutaka Sasaki, Hisashi Kawai, Tohru Shimizu, Seiichi Yamamoto (ATR) |
This paper introduces a speech-to-speech translation system using mobile devices as user terminals. Instead of packing e... |
MoMuC2005-32 pp.11-16 |
NS, IN |
2005-03-03 10:30 |
Okinawa |
Okinawa Zanpa-misaki Royal |
A Quality Control Mechanism for Multimedia Streams based on the Multi-RTCP Scheme for QoS Reporting over Wireless IP-Based Networks Norihiro Fukumoto, Hideaki Yamada, Hisashi Kawai (KDDI R&D Labs.) |
A variety of IP-based multimedia applications have been developed and provided due to the penetration of the infrastruct... |
NS2004-221 IN2004-221 pp.125-130 |