# Best Paper Award

## Probabilistic Models Based on Non-Negative Matrix Factorization for Inconsistent Resolution Dataset Analysis

### Masahiro KOHJIMA, Tatsushi MATSUBAYASHI, Hiroshi SAWADA

#### [Trans. Inf. & Syst. (JPN Edition) , Vol.J100-D, No.4 April 2017]

• Masahiro KOHJIMA

• Tatsushi MATSUBAYASHI

In recent years, the significance of data analysis has been widely recognized in various fields. Data collected for this purpose is often heterogeneous due to various realistic restrictions.
For example, if a customer in a retail store shows his or her membership card, the purchase history information is obtained in individual units (fine-grained data); if the customer does not show his or her membership card, the cashier estimates the attributes (such as sex and age) of the customer by his or her appearance to obtain purchase history information in group units (coarse-grained data).
As a result, the collected dataset consists of inconsistent resolution data.
Non-negative Matrix Factorization (NMF) is well-known as an important method in data analysis.
It is extended to be able to handle multiple data in an integrated manner, called Non-negative Multiple Matrix Factorization (NMMF), however no previous method including NMMF takes the difference in data resolution into consideration.
This research thus focuses on this point and extends NMF to be able to handle multiple inconsistent resolution data simultaneously.
Here, the essential part considers the data analysis problem that assumes the two conditions (A1) a common user set and (A2) independent and identically distributed random variables.
For example, suppose that a retail store introduced membership cards and all customers have been required to show membership cards since then.
Before and after the introduction, (A1) means that customers are unchanged and (A2) means that the probability of each customer purchasing items also is unchanged.
This paper presents an NMF-based solution method for the basic data analysis problem with (A1) and (A2).
Based on this method, it then illustrates how another problem that does not match with the conditions (A1) and (A2) can be solved with a similar approach. By this, it demonstrates that this research can serve as a foundational approach to various different analysis problems over inconsistent resolution data.
To summarize, this paper extends the important data analysis method, NMF, to be able to handle multiple inconsistent resolution data.
Hence, it is expected to advance and promote this research area and to be applied in various areas.
Due to the major contributions described above, this paper is deserving of the IEICE Best Paper Award.