Presentation | 2023-03-02 [Invited Talk] -- Yuma Koizumi |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Machine learning tasks that deal with acoustic signals can be broadly classified into "recognizing sounds" and "generating sounds". Sound generation tasks in particular, such as text-to-speech (TTS) and speech enhancement (SE), have made remarkable progress in recent years alongside the development of deep generative models. This invited talk introduces the kinds of sound generation tasks that exist and their problem settings, and outlines how diffusion models are used in these tasks. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Sound generation / text-to-speech / neural vocoder / denoising diffusion probabilistic models |
Paper # | PRMU2022-87,IBISML2022-94 |
Date of Issue | 2023-02-23 (PRMU, IBISML) |
Conference Information | |
Committee | PRMU / IBISML / IPSJ-CVIM |
---|---|
Conference Date | 2023/3/2(2days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Future University Hakodate |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | Seiichi Uchida(Kyushu Univ.) / Masashi Sugiyama(Univ. of Tokyo) |
Vice Chair | Takuya Funatomi(NAIST) / Mitsuru Anpai(Denso IT Lab.) / Toshihiro Kamishima(AIST) / Koji Tsuda(Univ. of Tokyo) |
Secretary | Takuya Funatomi(CyberAgent) / Mitsuru Anpai(Univ. of Tokyo) / Toshihiro Kamishima(NTT) / Koji Tsuda(Hokkaido Univ.) |
Assistant | Nakamasa Inoue(Tokyo Inst. of Tech.) / Yasutomo Kawanishi(Riken) / Yoshinobu Kawahara(Osaka Univ.) / Taiji Suzuki(Tokyo Inst. of Tech.) |
Paper Information | |
Registration To | Technical Committee on Pattern Recognition and Media Understanding / Technical Committee on Information-Based Induction Sciences and Machine Learning / Special Interest Group on Computer Vision and Image Media |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | [Invited Talk] -- |
Sub Title (in English) | |
Keyword(1) | Sound generation |
Keyword(2) | text-to-speech |
Keyword(3) | neural vocoder |
Keyword(4) | denoising diffusion probabilistic models |
1st Author's Name | Yuma Koizumi |
1st Author's Affiliation | Google Research |
Date | 2023-03-02 |
Paper # | PRMU2022-87,IBISML2022-94 |
Volume (vol) | vol.122 |
Number (no) | PRMU-404,IBISML-405 |
Page | pp.149-149(PRMU), pp.149-149(IBISML) |
#Pages | 1 |
Date of Issue | 2023-02-23 (PRMU, IBISML) |