Committee |
Date Time |
Place |
Paper Title / Authors |
Abstract |
Paper # |
SIS |
2024-03-14 13:00 |
Kanagawa |
Kanagawa Institute of Technology (Primary: On-site, Secondary: Online) |
On Time-Position Detection of Signals under Noise Considering Threshold -- Applications of Fractal Dimension Filters -- Hideo Shibayama (Shibaura Institute of Technology), Yoshiaki Makabe (Kanagawa Institute of Technology), Kenji Muto (Shibaura Institute of Technology), Tomoaki Kimura (Kanagawa Institute of Technology) SIS2023-45 |
Conflicts due to neighborhood noise can occur even when the sound pressure level is low. In such cases, the sound pressu... [more] |
SIS2023-45 pp.1-6 |
CAS, CS |
2024-03-14 13:30 |
Okinawa |
|
Characterization of Semantic Communications in Speech Signal Transmission Futo Iwanaga, Daisuke Umehara (Kyoto Inst. of Tech.) CAS2023-118 CS2023-111 |
In recent years, the volume of data in data communication has surged, Characterization of Semantic Communications in Spe... [more] |
CAS2023-118 CS2023-111 pp.41-46 |
CAS, CS |
2024-03-14 15:55 |
Okinawa |
|
Residual Noise Removal in Sound Source Separation Signal by Spectral Replacement Taiga Saito, Kenji Suyama (Tokyo Denki Univ.) CAS2023-122 CS2023-115 |
Although a sound source separation method based on a multiplication of multiple weighted sum circuits has high suppression... [more] |
CAS2023-122 CS2023-115 pp.64-69 |
IE, MVE, CQ, IMQ (Joint) [detail] |
2024-03-14 16:20 |
Okinawa |
Okinawa Sangyo Shien Center (Primary: On-site, Secondary: Online) |
Impression of Character's Responding Behavior Corresponding to User's Speech Proactiveness Naoki Matsumura, Tomoko Yonezawa (Kansai Univ.) IMQ2023-51 IE2023-106 MVE2023-80 |
This study aimed to improve speech proactivity in foreign-language face-to-face dialogue. We have examined a system to e... [more] |
IMQ2023-51 IE2023-106 MVE2023-80 pp.208-213 |
SIP, SP, EA, IPSJ-SLP [detail] |
2024-02-29 10:30 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Multi-task learning with age information model for highly accurate elderly speech recognition. Shine Takumi, Kinouchi Takahiro, Wakabayashi Yukoh, Kitaoka Norihide (TUT) EA2023-64 SIP2023-111 SP2023-46 |
The speech recognition of the elderly is less accurate, especially in smart speaker speech recognition, due to aging-rel... [more] |
EA2023-64 SIP2023-111 SP2023-46 pp.19-24 |
SIP, SP, EA, IPSJ-SLP [detail] |
2024-02-29 15:45 |
Okinawa |
(Primary: On-site, Secondary: Online) |
|
We have developed automatic speech recognition and dialect identification techniques by using COJADS, a corpus of Japane... [more] |
|
SIP, SP, EA, IPSJ-SLP [detail] |
2024-02-29 16:45 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Multiple Lag Window Pairs for Estimation of Fundamental Frequency and Periodicity Measure of Speech Signals Michiki Koshimori (UEC), Shigeki Sagayama (UTokyo/UEC), Toru Nakashika (UEC) EA2023-75 SIP2023-122 SP2023-57 |
Extending the main concept of modified autocorrelation method in LPC, we investigate lag windows, lag window pairs, and ... [more] |
EA2023-75 SIP2023-122 SP2023-57 pp.85-90 |
SIP, SP, EA, IPSJ-SLP [detail] |
2024-03-01 09:30 |
Okinawa |
(Primary: On-site, Secondary: Online) |
An experimental survey on speaker embedding spaces for controlling speaker identity in speech synthesis system Wakuto Morita, Daisuke Saito, Nobuaki Minematsu (Univ. of Tokyo) EA2023-93 SIP2023-140 SP2023-75 |
This study investigated the influence of the discriminability of speaker encoders on speech synthesis models that can co... [more] |
EA2023-93 SIP2023-140 SP2023-75 pp.190-195 |
SIP, SP, EA, IPSJ-SLP [detail] |
2024-03-01 09:30 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Selecting N-Lowest Scores for Training MOS Prediction Models Yuto Kondo, Hirokazu Kameoka, Kou Tanaka, Takuhiro Kaneko (NTT) EA2023-94 SIP2023-141 SP2023-76 |
Automatic speech quality assessment (SQA) is a task to evaluate the quality of speech samples without resorting to time-... [more] |
EA2023-94 SIP2023-141 SP2023-76 pp.196-201 |
SIP, SP, EA, IPSJ-SLP [detail] |
2024-03-01 09:30 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Improving training recipe of Remixed2Remixed for speech enhancement Li Li, Shogo Seki (CyberAgent) EA2023-95 SIP2023-142 SP2023-77 |
In the use of deep learning for speech enhancement, supervised learning models that use pairs of clean speech and artifi... [more] |
EA2023-95 SIP2023-142 SP2023-77 pp.202-207 |
SIP, SP, EA, IPSJ-SLP [detail] |
2024-03-01 09:30 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Multi-Dialect Speech Synthesis with Interpretable Accent Latent Variable based on VQ-VAE Kazuki Yamauchi, Yuki Saito, Hiroshi Saruwatari (UTokyo) EA2023-98 SIP2023-145 SP2023-80 |
In this paper, we address two tasks: "Intra-dialect Text-to-Speech (TTS)," aiming to synthesize speech in the same diale... [more] |
EA2023-98 SIP2023-145 SP2023-80 pp.220-225 |
SIP, SP, EA, IPSJ-SLP [detail] |
2024-03-01 09:30 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Constructing and Evaluating a Batch Voice Input System for Electronic Medical Records Using Large Language Models Ryo Maejima, Norihide Kitaoka (TUT) EA2023-99 SIP2023-146 SP2023-81 |
This study aims to develop an electronic medical record with a voice input interface that lets users input several items... [more] |
EA2023-99 SIP2023-146 SP2023-81 pp.226-231 |
SIP, SP, EA, IPSJ-SLP [detail] |
2024-03-01 09:30 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Domain adaptation of speech recognition model based on multilingual SSL model with only nonparallel corpus. Takahiro Kinouchi (TUT), Atsunori Ogawa (NTT), Yukoh Wakabayashi (TUT), Kengo Ohta (NITAC), Norihide Kitaoka (TUT) EA2023-100 SIP2023-147 SP2023-82 |
Automatic speech recognition (ASR) models are used in various services and businesses, and each domain’s recognition acc... [more] |
EA2023-100 SIP2023-147 SP2023-82 pp.232-237 |
SIP, SP, EA, IPSJ-SLP [detail] |
2024-03-01 09:30 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Evaluation of Automatic Speech Recognition for Deaf and Hard-of-Hearing People by Speaker Adaptation. Kaito Takahashi, Takahiro Kinouchi, Yukoh Wakabayashi (TUT), Kengo Ohta (NITAC), Akio Kobayashi (Yamato Univ.), Norihide Kitaoka (TUT) EA2023-102 SIP2023-149 SP2023-84 |
Communication between normal-hearing people and the deaf generally relies on sign language, written communication, and spe... [more] |
EA2023-102 SIP2023-149 SP2023-84 pp.244-249 |
SIP, SP, EA, IPSJ-SLP [detail] |
2024-03-01 10:40 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Intermediate speaker speech synthesis between two speakers using x-vector speaker space Sota Hosoi, Takahiro Kinouchi, Yukoh Wakabayashi, Norihide Kitaoka (TUT) EA2023-103 SIP2023-150 SP2023-85 |
Recent advancements in speech synthesis technologies have enabled the synthesis of speech from speakers not in the train... [more] |
EA2023-103 SIP2023-150 SP2023-85 pp.250-255 |
SIP, SP, EA, IPSJ-SLP [detail] |
2024-03-01 10:40 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Speech representation based on VAE assuming gamma distribution for latent variables and observation Nanako Imaichi, Toru Nakashika (UEC) EA2023-104 SIP2023-151 SP2023-86 |
Recently, deep generative models that can represent complex relationships in data generation have been attracting attent... [more] |
EA2023-104 SIP2023-151 SP2023-86 pp.256-261 |
SIP, SP, EA, IPSJ-SLP [detail] |
2024-03-01 10:40 |
Okinawa |
(Primary: On-site, Secondary: Online) |
An Investigation into Weighting Strategies for Model Averaging in Continual Learning for Automatic Speech Recognition Kentaro Shinayama, Hiroshi Sato, Tomoharu Iwata, Takeshi Mori, Taichi Asami (NTT) EA2023-105 SIP2023-152 SP2023-87 |
In recent years, the application scope of speech recognition AI has expanded, enabling the acquisition of diverse data d... [more] |
EA2023-105 SIP2023-152 SP2023-87 pp.262-267 |
SIP, SP, EA, IPSJ-SLP [detail] |
2024-03-01 10:40 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Substitution of Implicit Linguistic Information in Beam Search Decoding Using CTC-based Speech Recognition Models Tatsunari Takagi, Yukoh Wakabayashi (TUT), Atsunori Ogawa (NTT), Norihide Kitaoka (TUT) EA2023-106 SIP2023-153 SP2023-88 |
The rise of neural networks in the field of automatic speech recognition has notably improved the accuracy of speech rec... [more] |
EA2023-106 SIP2023-153 SP2023-88 pp.268-273 |
SIP, SP, EA, IPSJ-SLP [detail] |
2024-03-01 10:40 |
Okinawa |
(Primary: On-site, Secondary: Online) |
An Investigation on the Speech Recovery from EEG Signals Using Transformer Tomoaki Mizuno (The Univ. of Electro-Communications), Takuya Kishida (Aichi Shukutoku Univ.), Natsue Yoshimura (Tokyo Tech), Toru Nakashika (The Univ. of Electro-Communications) EA2023-108 SIP2023-155 SP2023-90 |
Synthesizing full speech from electroencephalography (EEG) signals is a challenging task. In this paper, speech reconstru... [more] |
EA2023-108 SIP2023-155 SP2023-90 pp.277-282 |
SIP, SP, EA, IPSJ-SLP [detail] |
2024-03-01 15:25 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Investigation of objective intelligibility metrics based on speech foundation models for Clarity Prediction Challenge 2 Katsuhiko Yamamoto (CyberAgent) EA2023-119 SIP2023-166 SP2023-101 |
Speech Foundation Models (SFMs), which use components like the encoder layer of Whisper, have been suggested to separate... [more] |
EA2023-119 SIP2023-166 SP2023-101 pp.334-339 |