Presentation 2014-06-21
Determining the Number of Topics for LDA Method and Evaluating Extracted Topics : With an Application to Twitter Streaming Data
Iwao FUJINO, Yuko HOSHINO
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Topic models are an emerging approach to summarizing data, especially text data, in terms of a small set of latent variables. The most useful implementation of a topic model is the LDA method, an unsupervised machine learning technique for identifying latent topic information in a massive document collection. However, the LDA method sometimes gives results that are hard to understand or meaningless. To address this problem, in this paper we propose a method for refining the results of LDA and for ranking topics according to a significance criterion. Our study is based on two assumptions. The first is that the correlation coefficient between any two different topics should be zero under ideal conditions. The second is that the quality of a topic can be defined as its deviation from the usual word distribution. Starting from these two assumptions, we provide a concrete method for determining the number of topics when using the LDA method to extract topics from document data, and for ranking the LDA results in order of quality. To confirm the proposed methods, we conducted experiments on Twitter streaming data. The results of these experiments show that our methods work efficiently, as expected.
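
The two assumptions in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of the mean absolute Pearson correlation as a single score, and the choice of reference distribution are all assumptions made for illustration. It takes a topic-word probability matrix (one row per topic), measures how far the topics are from being mutually uncorrelated, and ranks topics by their Jensen-Shannon (JS) divergence from a reference word distribution.

```python
import numpy as np


def mean_abs_topic_correlation(topic_word):
    """Mean absolute Pearson correlation over all distinct topic pairs.

    Hypothetical helper: values near 0 mean the topics are close to the
    'uncorrelated' ideal named in the abstract. Rows must not be constant.
    """
    corr = np.corrcoef(np.asarray(topic_word, dtype=float))  # K x K
    k = corr.shape[0]
    upper = corr[np.triu_indices(k, k=1)]  # each pair (i < j) once
    return float(np.mean(np.abs(upper)))


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so it lies in [0, 1])."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def rank_topics(topic_word, reference):
    """Return (topic index, JS divergence) pairs, most divergent first.

    `reference` stands in for the 'usual word distribution'; using the
    corpus-wide word frequencies here would be one possible assumption.
    """
    scores = [js_divergence(row, reference) for row in topic_word]
    order = np.argsort(scores)[::-1]
    return [(int(i), float(scores[i])) for i in order]
```

Under these assumptions, one could fit LDA for several candidate topic counts, pass each fitted topic-word matrix to `mean_abs_topic_correlation`, and prefer the count with the smallest score, then use `rank_topics` to order the topics of the chosen model.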
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Topic model / LDA(Latent Dirichlet Allocation) / Correlation coefficient / JS divergence / Twitter
Paper # DE2014-16
Date of Issue

Conference Information
Committee DE
Conference Date 2014/6/14 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Data Engineering (DE)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Determining the Number of Topics for LDA Method and Evaluating Extracted Topics : With an Application to Twitter Streaming Data
Sub Title (in English)
Keyword(1) Topic model
Keyword(2) LDA(Latent Dirichlet Allocation)
Keyword(3) Correlation coefficient
Keyword(4) JS divergence
Keyword(5) Twitter
1st Author's Name Iwao FUJINO
1st Author's Affiliation School of information and telecommunication engineering, Tokai University
2nd Author's Name Yuko HOSHINO
2nd Author's Affiliation School of information and telecommunication engineering, Tokai University
Date 2014-06-21
Paper # DE2014-16
Volume (vol) vol.114
Number (no) 101
Page pp.-
#Pages 6