International Symposium on Nonlinear Theory and its Applications
2017
Session Number:A2L-D
Number:A2L-D-2
Layer Specificity of Acquired Memory Duration in Multilayer LSTM Networks
Kazuki Hatanaka, Jun-Nosuke Teramae, Naoki Wakamiya
pp.162-165
Publication Date:2017/12/4
Online ISSN:2188-5079
DOI:10.34385/proc.29.A2L-D-2
Summary:
The LSTM network is a recurrent neural network that has recently achieved impressive performance on machine learning tasks involving sequential data. The network consists of many LSTM units, each of which can store past inputs in an internal memory variable. When the total number of LSTM units is fixed, the performance of the network generally increases as the number of layers increases. It remains unclear, however, why deeper LSTM networks achieve higher performance. As a first step toward answering this question, we analyze layer-wise differences in the temporal dynamics of the memory variable of each unit. We found that units in deeper layers retain their memory for longer durations, on average, than those in shallower layers. We also found that memory durations are more broadly distributed among units in deeper layers than in shallower layers. These results suggest that units in different layers play different roles in memorization.
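The notion of a unit's memory duration can be illustrated with a small sketch. The code below is an assumption-laden toy, not the authors' procedure: it implements a single NumPy LSTM layer with random weights, perturbs it with one impulse input, and counts how many steps each unit's cell state (the memory variable) stays measurably different from an unperturbed trajectory.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    # One LSTM step; gate pre-activations stacked as [input, forget, cell, output].
    n = h.shape[0]
    z = W @ x + U @ h + b
    i = 1.0 / (1.0 + np.exp(-z[:n]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[n:2 * n]))   # forget gate
    g = np.tanh(z[2 * n:3 * n])             # candidate cell update
    o = 1.0 / (1.0 + np.exp(-z[3 * n:]))    # output gate
    c_new = f * c + i * g                   # memory variable update
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def memory_durations(W, U, b, n, T=200, eps=1e-3):
    # Per-unit memory duration: number of steps the cell state of a
    # perturbed trajectory (one impulse input, then zeros) stays at
    # least eps away from an unperturbed (all-zero-input) trajectory.
    h, c = lstm_step(np.ones(1), np.zeros(n), np.zeros(n), W, U, b)
    h0, c0 = lstm_step(np.zeros(1), np.zeros(n), np.zeros(n), W, U, b)
    durations = np.full(n, T)  # units that never decay keep the max value
    for t in range(T):
        h, c = lstm_step(np.zeros(1), h, c, W, U, b)
        h0, c0 = lstm_step(np.zeros(1), h0, c0, W, U, b)
        decayed = (np.abs(c - c0) < eps) & (durations == T)
        durations[decayed] = t
    return durations

# Example with random weights (illustrative only; a trained network
# would be needed to observe the layer-wise effect reported above).
rng = np.random.default_rng(0)
n = 8                                    # number of LSTM units in the layer
W = rng.normal(0.0, 0.5, (4 * n, 1))     # input weights (scalar input)
U = rng.normal(0.0, 0.5, (4 * n, n))     # recurrent weights
b = np.zeros(4 * n)
d = memory_durations(W, U, b, n)
print(d)  # one duration (in steps) per unit
```

Comparing the distribution of `d` across layers of a trained multilayer LSTM is one way to quantify the layer specificity the abstract describes; the impulse-vs-baseline definition of duration used here is a simplifying assumption.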