Presentation 2018-07-28
Improve accuracy of predicting confidential words in judicial precedents
Masakazu Kanazawa, Atsushi Ito, Yuya Kiryu, Kazuyuki Yamaswa, Takehiko Kasahara
Abstract(in Japanese) (See Japanese page)
Abstract(in English) A judicial system into which information technology has been introduced is called a Cyber Court. In the Japanese judicial field, confidential words such as personal names and place names are manually replaced with meaningless words, because Japanese values do not allow the disclosure of personal information. Constructing a comprehensive dictionary for detecting confidential words is not easy. We have previously proposed two models that predict confidential words automatically using neural networks, with long short-term memory (LSTM) and continuous bag-of-words (CBOW) as the language models. First, we showed that CBOW makes it possible to detect the words surrounding a confidential word. We then proposed two models that apply LSTM to predict confidential words from the neighboring words. The first model imitates the anonymization work done by a human, and the second is based on CBOW. The results show that the first model is more effective at predicting confidential words than a simple LSTM model. We expected the paraphrasing ability of the second model to increase the possibility of finding other paraphraseable words. One model is the Bi-directional LSTM LR model and the other is Sum-LSTM, which is based on the CBOW model. Both proposed models were effective for predicting words in general, but only the Bi-directional LSTM LR model was effective for predicting confidential words. This may be because Sum-LSTM is based on CBOW: CBOW is effective at paraphrasing words, so Sum-LSTM inherits that mechanism. Consequently, when Sum-LSTM predicted a word whose answer was “confidential,” the CW_PPL (confidential word perplexity) became worse, because the model could instead choose paraphrases such as “plaintiff,” “defendant,” “doctor,” and “teacher.” That the model knows paraphrases of the confidential words means that embedding vectors for the confidential words were generated successfully, and hence that the model could recognize the meaning of “confidential.” Nevertheless, the prediction accuracy did not improve, which points to a problem in how the probability for the prediction task is calculated. To solve this problem, these paraphraseable words could be excluded from the candidates when calculating the probability. It is also important to examine scores other than PPL.
We then consider how to improve the accuracy of predicting confidential words. First, we focus on the parameters of the neural network. We increase the window size, because we expect accuracy to be higher when more of the words surrounding the target word are used (that is, when the window size is larger). These experiments are in progress, but we have not yet obtained results because they require more memory and computation time. Next, we plan to combine a proper noun dictionary with the neural network, but we do not discuss this in this paper.
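The effect of excluding paraphraseable words when calculating the probability can be illustrated with a small sketch. It is not the authors' implementation: the token name `<confidential>`, the function names, and the toy probability distribution are all hypothetical, and CW_PPL is assumed here to be ordinary perplexity restricted to the positions whose answer is the confidential token.

```python
import math


def perplexity(probs):
    """Perplexity from the probabilities a model gave to the correct words."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


def renormalize_without(dist, excluded):
    """Drop the excluded candidates and rescale the rest to sum to 1."""
    kept = {w: p for w, p in dist.items() if w not in excluded}
    total = sum(kept.values())
    return {w: p / total for w, p in kept.items()}


def cw_ppl(predictions, target="<confidential>", excluded=frozenset()):
    """Assumed CW_PPL: perplexity over positions whose answer is `target`.

    `predictions` is a list of (distribution, answer) pairs, where each
    distribution maps candidate words to probabilities.
    """
    probs = []
    for dist, answer in predictions:
        if answer != target:
            continue  # only confidential positions count
        if excluded:
            dist = renormalize_without(dist, excluded)
        probs.append(dist[target])
    return perplexity(probs)


# Toy data (made up): the model splits its mass between the confidential
# token and a paraphrase such as "plaintiff".
predictions = [
    ({"<confidential>": 0.25, "plaintiff": 0.5, "the": 0.25}, "<confidential>"),
    ({"the": 0.5, "court": 0.5}, "the"),
]

baseline = cw_ppl(predictions)
filtered = cw_ppl(predictions, excluded={"plaintiff"})
print(baseline, filtered)  # the filtered value is lower (better)
```

In this toy case, removing the paraphrase from the candidate set moves its probability mass back to the remaining candidates, so the score at the confidential position improves without any change to the model itself.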
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Bi-directional LSTM / CBOW / ppl / cw-ppl / window size
Paper # TL2018-12
Date of Issue 2018-07-21 (TL)

Conference Information
Committee TL
Conference Date 2018/7/28(2 days)
Place (in Japanese) (See Japanese page)
Place (in English) Keio University
Topics (in Japanese) (See Japanese page)
Topics (in English) Human Language Processing and Learning
Chair Hiroshi Sano(Tokyo Univ. of Foreign Studies)
Vice Chair Tadahisa Kondo(Kogakuin Univ.) / Kazuhiro Takeuchi(Osaka Electro-Comm. Univ.)
Secretary Tadahisa Kondo(Kobe Gakuin Univ.) / Kazuhiro Takeuchi(Kyoto Inst. of Tech.)
Assistant Nobuyuki Jincho(Waseda Univ.) / Akinori Takada(Ferris Univ.) / Akio Ishikawa(KDDI Research)

Paper Information
Registration To Technical Committee on Thought and Language
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Improve accuracy of predicting confidential words in judicial precedents
Sub Title (in English)
Keyword(1) Bi-directional LSTM / CBOW / ppl / cw-ppl / window size
1st Author's Name Masakazu Kanazawa
1st Author's Affiliation Utsunomiya University(Utsunomiya Univ.)
2nd Author's Name Atsushi Ito
2nd Author's Affiliation Utsunomiya University(Utsunomiya Univ.)
3rd Author's Name Yuya Kiryu
3rd Author's Affiliation KDDI Corporation(KDDI)
4th Author's Name Kazuyuki Yamaswa
4th Author's Affiliation TKC Corporation(TKC)
5th Author's Name Takehiko Kasahara
5th Author's Affiliation Toin Yokohama University(Toin Yokohama Univ.)
Date 2018-07-28
Paper # TL2018-12
Volume (vol) vol.118
Number (no) TL-163
Page pp.1-6(TL)
#Pages 6
Date of Issue 2018-07-21 (TL)