The 2018 International Symposium on Information Theory and Its Applications (ISITA2018)


Session Number: Mo-AM-1-2



Scalable Machine Learning on Compact Data Representations

Yasuo Tabei


Publication Date: 2018/10/18

Online ISSN: 2188-5079




With massive high-dimensional data now commonplace in research and industry, there is a strong and growing demand for more scalable computational techniques for data analysis and knowledge discovery. In this paper, we review scalable algorithms for learning statistical models on high-dimensional data. In particular, we introduce two techniques based on lossless and lossy compression. The first is a method using grammar compression. Grammar compression is a lossless compression scheme for text and has been successfully applied to binary data matrices for scalable learning of statistical models. The second is a family of lossy compression methods called feature maps (FMs). Recently, many FMs for kernel approximation have been proposed and used in practical applications. These methods, of which we present a brief survey in this paper, open the door to large-scale analyses of massive, high-dimensional data.
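To make the first technique concrete, here is a minimal, illustrative sketch of a Re-Pair-style grammar compressor: it repeatedly replaces the most frequent adjacent symbol pair with a fresh nonterminal until no pair repeats, producing a small grammar that regenerates the input. This is a generic textbook rendering, not the specific algorithm or implementation used in the paper; the function names and the choice of starting nonterminals at 256 are assumptions for the example.

```python
from collections import Counter

def repair_compress(seq):
    """Re-Pair-style grammar compression (sketch).

    Repeatedly replaces the most frequent adjacent pair of symbols
    with a new nonterminal, recording the rule, until every pair
    occurs at most once. Returns (compressed sequence, rules).
    """
    rules = {}
    next_sym = 256  # nonterminal ids start above byte values (assumption)
    seq = list(seq)
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:
            break
        rules[next_sym] = pair
        out, i = [], 0
        while i < len(seq):  # greedy left-to-right replacement
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_sym += 1
    return seq, rules

def expand(sym, rules):
    """Recursively expand a symbol back into the original terminals."""
    if sym not in rules:
        return [sym]
    a, b = rules[sym]
    return expand(a, rules) + expand(b, rules)

# Round-trip example: "abababab" compresses to two nonterminal symbols.
original = list(b"abababab")
compressed, rules = repair_compress(original)
restored = [t for s in compressed for t in expand(s, rules)]
```

Statistical models can then be trained directly on the grammar's compact rule set rather than the full data matrix, which is the source of the scalability gains the paper surveys.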
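For the second technique, one widely used FM for kernel approximation is random Fourier features (Rahimi and Recht), which maps inputs to a randomized low-dimensional space where inner products approximate an RBF kernel. The sketch below is a generic illustration under that choice of kernel, not necessarily one of the specific FMs surveyed in the paper; the function name and parameters are assumptions for the example.

```python
import numpy as np

def rff_features(X, D, gamma, seed=0):
    """Random Fourier features approximating the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2).

    Z = sqrt(2/D) * cos(X W + b) with W ~ N(0, 2*gamma) and
    b ~ Uniform[0, 2*pi), so that Z(x) . Z(y) ~ k(x, y).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Compare the approximate kernel matrix Z Z^T with the exact RBF kernel.
X = np.random.default_rng(1).normal(size=(5, 3))
Z = rff_features(X, D=4000, gamma=0.5)
K_approx = Z @ Z.T
K_exact = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
```

A linear model trained on `Z` then behaves like a kernel machine on the original data, at cost linear in the number of samples, which is the scalability argument behind FM-based learning.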