Presentation 2020-12-02
Sentence Boundary Disambiguation for Miscellaneous Writings in Japanese
Sanae Yamashita, Noriyuki Okumura
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Informal text such as Twitter posts (tweets) contains many miscellaneous writings with casual expressions. Japanese miscellaneous writings often contain unknown words, abbreviations, and varied end-of-sentence expressions, which makes morphological analysis and sentence splitting difficult. This research examines two issues: what kinds of words appear in miscellaneous writings, and how sentence splitting differs from person to person. We also propose a method for sentence boundary disambiguation. We estimated the ends of sentences (EOS) in miscellaneous writings using rule-based judgment and a CRF with a human-annotated EOS corpus. The resulting $F_{1}$ score is 0.86.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) sentence boundary disambiguation / miscellaneous writings
Paper # NLC2020-16,SP2020-19
Date of Issue 2020-11-25 (NLC, SP)

Conference Information
Committee NLC / IPSJ-NL / SP / IPSJ-SLP
Conference Date 2020/12/2(2days)
Place (in Japanese) (See Japanese page)
Place (in English) Online
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair Kazutaka Shimada(Kyushu Inst. of Tech.) / Satoshi Sekine(RIKEN) / Hisashi Kawai(NICT) / Norihide Kitaoka(Toyohashi Univ. of Tech.)
Vice Chair Mitsuo Yoshida(Toyohashi Univ. of Tech.) / Takeshi Kobayakawa(NHK)
Secretary Mitsuo Yoshida(Univ. of Tokyo) / Takeshi Kobayakawa(Hiroshima Univ. of Economics) / (Denso IT Laboratory) / (Otaru Univ. of Commerce) / (Ibaraki Univ.)
Assistant Kanjin Takahashi(Sansan) / Ko Mitsuda(NTT) / / Yusuke Ijima(NTT)

Paper Information
Registration To Technical Committee on Natural Language Understanding and Models of Communication / Special Interest Group on Natural Language / Technical Committee on Speech / Special Interest Group on Spoken Language Processing
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Sentence Boundary Disambiguation for Miscellaneous Writings in Japanese
Sub Title (in English)
Keyword(1) sentence boundary disambiguation
Keyword(2) miscellaneous writings
1st Author's Name Sanae Yamashita
1st Author's Affiliation National Institute of Technology, Akashi College(NIT, Akashi College)
2nd Author's Name Noriyuki Okumura
2nd Author's Affiliation Otemae University(Otemae Univ.)
Date 2020-12-02
Paper # NLC2020-16,SP2020-19
Volume (vol) vol.120
Number (no) NLC-270,SP-271
Page pp.19-24(NLC), pp.19-24(SP)
#Pages 6
Date of Issue 2020-11-25 (NLC, SP)