Abstract / Keywords |
Talk title |
2008-09-22 12:45
[Invited Talk] Data Stream Processing Research at IMC of East China Normal University ○Aoying Zhou, Cheqing Jin, Weining Qian (East China Normal Univ.) DE2008-49 |
Abstract |
(Japanese) |
Data stream processing has attracted growing attention in both research and industry because of its broad potential applications. In this talk, we briefly introduce the research work that has been done in our group. Our research interests in data streams are frequent item(set) mining, clustering, and burst detection. Some work on practical applications and some considerations for future work will be introduced as well.
For the basic problem of mining frequent items over data streams, we propose an algorithm called hCount. It has low space complexity and low per-tuple processing cost, and it achieves high recall and precision. For mining frequent itemsets, we develop a new false-negative frequent itemset mining algorithm. It obtains a condensed representation of the frequent itemsets in transactional data streams by discovering a false-negative collection of special itemsets that, with high probability, covers the frequent itemsets with respect to the set-inclusion relationship among itemsets.
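As a rough illustration of hash-based frequent-item counting over a stream, the following minimal sketch uses a generic Count-Min-style table of counters. It is not the hCount algorithm itself, whose details are not given in this abstract; the class name `CountMinSketch`, the function `frequent_items`, and the parameters `depth`, `width`, and `support` are assumptions made for the example.

```python
import hashlib


class CountMinSketch:
    """Generic Count-Min-style sketch (an assumption, not the authors' hCount)."""

    def __init__(self, depth=4, width=256):
        self.depth = depth    # number of hash rows
        self.width = width    # counters per row
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One salted hash per row, so the rows behave like different hash functions.
        digest = hashlib.blake2b(
            repr(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum never underestimates.
        return min(
            self.table[row][self._index(row, item)] for row in range(self.depth)
        )


def frequent_items(stream, support, depth=4, width=256):
    """Return {item: estimated count} for items estimated to reach support * n."""
    sketch = CountMinSketch(depth, width)
    candidates = set()
    n = 0
    for item in stream:
        n += 1
        sketch.add(item)
        # Remember an item whenever it looks frequent at the time it arrives.
        if sketch.estimate(item) >= support * n:
            candidates.add(item)
    # Final filter against the total stream length.
    return {
        x: sketch.estimate(x)
        for x in candidates
        if sketch.estimate(x) >= support * n
    }


stream = ["a"] * 50 + ["b"] * 30 + [f"rare{i}" for i in range(20)]
result = frequent_items(stream, support=0.2)
```

Because an estimate never falls below the true count, an item whose true frequency reaches `support * n` is always reported; hash collisions can only add false positives. The candidate set is left unbounded here for brevity.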
Our research on data stream mining has also focused on clustering of data streams. SWClustering is the algorithm we proposed to cluster data streams over sliding windows, and EHCF (Exponential Histogram of Cluster Features) is the synopsis that maintains the statistical information of clusters in sliding windows. With SWClustering, both the changing distribution of clusters and the evolving behavior of individual clusters can be captured. CluDistream clusters distributed data streams and can effectively handle a huge volume of data containing noisy, corrupted, or incomplete records generated in a distributed environment. In CluDistream, which is based on EM (Expectation Maximization) algorithms, each data record is assigned to a cluster with a certain degree of membership.
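To make the idea of a sliding-window cluster synopsis more concrete, the sketch below combines additive cluster features (record count, linear sum, sum of squares) with a generic exponential-histogram merge rule. It is an assumption-laden illustration of the general technique, not the EHCF structure as defined by the authors; the names `Bucket`, `CFHistogram`, `window`, and `max_per_size` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    n: int      # number of records summarized
    ls: list    # per-dimension linear sum
    ss: list    # per-dimension sum of squares
    t: int      # timestamp of the newest record in the bucket


class CFHistogram:
    """Exponential histogram over cluster features for one cluster (illustrative)."""

    def __init__(self, window, max_per_size=2):
        self.window = window              # sliding-window length in time units
        self.max_per_size = max_per_size  # allowed buckets of equal size
        self.buckets = []                 # ordered oldest first

    def insert(self, point, t):
        # Each new record starts as its own bucket of size 1.
        self.buckets.append(Bucket(1, list(point), [x * x for x in point], t))
        # Drop buckets whose newest record has left the window.
        self.buckets = [b for b in self.buckets if b.t > t - self.window]
        # Cascade merges: too many buckets of one size -> merge the two oldest.
        size = 1
        while True:
            same = [i for i, b in enumerate(self.buckets) if b.n == size]
            if len(same) <= self.max_per_size:
                break
            i = same[0]  # equal-size buckets are adjacent, so i + 1 is the next one
            a, b = self.buckets[i], self.buckets[i + 1]
            merged = Bucket(
                a.n + b.n,
                [x + y for x, y in zip(a.ls, b.ls)],
                [x + y for x, y in zip(a.ss, b.ss)],
                b.t,
            )
            self.buckets[i:i + 2] = [merged]
            size *= 2

    def summary(self):
        """Return (record count, centroid) over the buckets still in the window."""
        n = sum(b.n for b in self.buckets)
        if n == 0:
            return 0, []
        dims = len(self.buckets[0].ls)
        centroid = [sum(b.ls[d] for b in self.buckets) / n for d in range(dims)]
        return n, centroid


hist = CFHistogram(window=4)
for t in range(1, 11):
    hist.insert([float(t)], t)
count, centroid = hist.summary()
```

Because cluster features are additive, merging two buckets is a component-wise sum, and the number of buckets grows only logarithmically with the number of records in the window. The count is approximate in general, since the oldest bucket may still hold a few records that have already expired.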
Another important piece of work is on burst detection, or monitoring, over data streams. The fractal analysis method is adapted to enable the monitoring of both monotonic and non-monotonic aggregates on time-changing data streams. The monotonicity property of aggregate monitoring is revealed, and a monotonic search space is built to reduce the time overhead of detecting bursts from O(m) to O(log m), where m is the number of windows to be monitored. With the help of a novel piecewise fractal model, the statistical summary is compressed to fit in limited main memory, so that high aggregates on windows of any length can be detected accurately and efficiently online.
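The drop from O(m) to O(log m) can be illustrated with a simplified case: for a monotonic aggregate such as the sum of non-negative values, the aggregate over the most recent w items never decreases as w grows, so a binary search over the sorted window lengths finds the shortest bursting window. The sketch below assumes a single fixed threshold and exact prefix sums kept in memory; it does not reproduce the fractal model of the talk, and the function name `shortest_bursting_window` and its parameters are hypothetical.

```python
from itertools import accumulate


def shortest_bursting_window(values, window_lengths, threshold):
    """Index of the shortest window whose recent sum reaches threshold, or None.

    Assumes values are non-negative and window_lengths is sorted ascending,
    so the sum over the most recent w values is non-decreasing in w.
    """
    prefix = [0] + list(accumulate(values))  # prefix[i] = sum of the first i values
    total = prefix[-1]

    def recent_sum(w):
        w = min(w, len(values))              # clamp windows longer than the stream
        return total - prefix[len(values) - w]

    lo, hi = 0, len(window_lengths)
    while lo < hi:                           # O(log m) probes instead of m
        mid = (lo + hi) // 2
        if recent_sum(window_lengths[mid]) >= threshold:
            hi = mid                         # this window and all longer ones burst
        else:
            lo = mid + 1
    return lo if lo < len(window_lengths) else None


values = [1, 0, 2, 5, 1, 4]
lengths = [1, 2, 3, 4, 5, 6]
first = shortest_bursting_window(values, lengths, threshold=10)
```

Every window length at or beyond the returned index also exceeds the threshold, so a single search classifies all m windows. Non-monotonic aggregates do not have this property and need a different treatment.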
A practical data stream processing system for analyzing telecommunication network flow data will also be introduced in this talk. |
Keywords |
(Japanese) |
Data stream processing / Frequent item / Clustering / Burst detection |
Bibliographic information |
IEICE Tech. Rep., vol. 108, no. 211, DE2008-49, pp. 39-40, Sept. 2008. |
Report number |
DE2008-49 |
Publication date |
2008-09-14 (DE) |
ISSN |
Print edition: ISSN 0913-5685 Online edition: ISSN 2432-6380 |
Copyright |
The copyright of papers published in the Technical Reports belongs to IEICE. (Permission numbers: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034) |