Best Paper Award

Training of Gated Recurrent Units Constrained to Locally Stable to Prevent the Gradient Exploding

Sekitoshi KANAI, Yasuhiro FUJIWARA, Sotetsu IWAMURA, Shuichi ADACHI

[IEICE TRANS. INF. & SYST., Vol. J102-D No. 8 AUGUST 2019]

This paper proposes a new, effective training method for the Gated Recurrent Unit (GRU), a neural network model that has been widely used for time series analysis in recent years. The authors treat the GRU as a nonlinear dynamical system and address the gradient explosion problem in GRU training through an analysis of its dynamics. The proposed method trains a GRU efficiently while avoiding gradient explosion. Its effectiveness and the accuracy of the learned models are verified in detail through experiments on real data.

Specifically, the authors focus on the fact that a major cause of gradient explosion is a phenomenon generally called "bifurcation", which occurs when the stability of an equilibrium point changes. To avoid bifurcation during training, they propose optimizing a GRU under a constraint on the maximum value of the spectral norm of the weight matrix of the recurrent loop, and they devise a way to perform this optimization efficiently. Evaluation experiments using language and music data showed that the proposed method suppresses gradient explosion more effectively than the existing method and achieves higher accuracy.
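As a rough illustration of what a spectral norm constraint on a weight matrix looks like in practice, the sketch below clips the singular values of a matrix so that its largest singular value does not exceed a bound. It is not the authors' algorithm: the function name `clip_spectral_norm`, the bound `max_norm`, and the value 1.5 are assumptions made for this example, and the full SVD used here is the straightforward approach rather than the efficient procedure the paper is credited with.

```python
import numpy as np


def clip_spectral_norm(W, max_norm):
    """Return a copy of W whose spectral norm is at most max_norm.

    Hypothetical helper for illustration: it computes a full SVD and
    clips every singular value above max_norm down to max_norm.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_clipped = np.minimum(s, max_norm)
    # Scale the columns of U by the clipped singular values, then recombine.
    return (U * s_clipped) @ Vt


# Stand-in for a recurrent weight matrix (random values, not trained weights).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))

# In a training loop, a step like this would follow each gradient update.
W_proj = clip_spectral_norm(W, max_norm=1.5)

print("spectral norm before:", np.linalg.norm(W, 2))
print("spectral norm after: ", np.linalg.norm(W_proj, 2))
```

A matrix that already satisfies the bound is returned unchanged up to floating-point error, so the step only alters the weights when the constraint is violated.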

Although similar training methods that constrain the maximum singular value have already been proposed for other models (such as CNNs), the authors applied the idea to the more complicated GRU model. This is the major novelty of the paper. The authors first derived an optimization method based on eigenvalue constraints theoretically, and then devised an improved method based on singular value constraints that achieves the same effect at lower computational cost. They further extended the method to multi-layer GRUs. The paper contains unique ideas and is recognized as highly original.
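One standard fact helps explain why a singular value constraint can stand in for an eigenvalue constraint: for any square matrix, the largest eigenvalue magnitude (the spectral radius) never exceeds the largest singular value (the spectral norm). The symbols W (a square weight matrix) and c (a bound) below are illustrative; the citation does not state the threshold used in the paper.

```latex
\rho(W) \;=\; \max_i \lvert \lambda_i(W) \rvert \;\le\; \sigma_{\max}(W) \;=\; \lVert W \rVert_2 ,
\qquad\text{so}\qquad
\sigma_{\max}(W) \le c \;\Longrightarrow\; \rho(W) \le c .
```

Bounding the largest singular value is therefore a sufficient condition for bounding every eigenvalue magnitude.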

Because GRUs have recently come into wide use in various fields, gradient explosion during training is an important issue, and the development of a stable training method for GRUs is significant. The discussion based on mathematical theory is careful and its logic is clear. In addition to the theoretical discussion, evaluations are conducted using real data, which is also a strong point. The experiments and their analysis are generally appropriate, and the effectiveness of the proposed method is sufficiently reliable. The paper offers very useful results and contributes greatly to many readers of the Society. For the above reasons, this paper is considered suitable for the Society's Paper Award.
