Best Paper Award 2014

Application of Out-of-Order Execution to Parallel Data Processing Systems and Evaluation of Its Effectiveness

Hiroyuki Yamada, Kazuo Goda, Masaru Kitsuregawa

[Trans. Inf. & Syst. (JPN Edition), Vol. J97-D No.4, Apr. 2014]

  Today, many kinds of information are stored as digital data and used to improve business efficiency and to support more sophisticated decision-making, in corporations as well as in academic research. The volume of such data is growing rapidly, so developing fundamental technologies for processing large-scale data is an extremely important problem.
  The authors previously applied the out-of-order execution principle to a relational database engine, but that work did not consider parallel distributed data processing. In this paper, the authors propose a novel parallel data processing system called Hadooode, which applies the out-of-order execution principle to Hadoop and Hive. In the proposed system, each node decomposes a task into subtasks, and the I/O of those subtasks is issued asynchronously both to the node's own local disks and to the secondary storage of remote nodes. This approach makes all I/O in the parallel data processing system asynchronous, which can greatly increase the efficiency of data processing. The out-of-order execution principle is also applied to Hive to improve the efficiency of query processing. Whereas the original Hive scans the entire dataset to process a query, Hadooode optimizes queries by using indexes, which is particularly effective for queries with low selectivity.
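  The idea of decomposing an index-driven task into subtasks whose I/O is issued concurrently and consumed in completion order can be illustrated with a small sketch. This is not the authors' implementation: the in-memory "disks", the index layout and the names `read_block`, `lookup` and `full_scan` are all hypothetical, and storage latency is merely simulated with `asyncio.sleep`. The full scan is included only as a contrast with Hive's scan-everything behaviour.

```python
import asyncio
import random

# Hypothetical stand-ins for local and remote secondary storage:
# each disk maps a block id to the (key, value) rows stored in that block.
DISKS = {
    "local": {0: [("a", 1), ("b", 2)], 1: [("c", 3)]},
    "remote": {2: [("a", 4)], 3: [("d", 5), ("a", 6)]},
}

# Hypothetical index: key -> locations (disk, block) of blocks containing that key.
INDEX = {"a": [("local", 0), ("remote", 2), ("remote", 3)]}


async def read_block(disk, block):
    # Simulated asynchronous block read; each request has a different latency,
    # so completions may arrive in any order.
    await asyncio.sleep(random.uniform(0.0, 0.01))
    return DISKS[disk][block]


async def lookup(key):
    # Decompose the lookup into one subtask per block the index points at,
    # and issue all of their reads at once instead of one after another.
    subtasks = [asyncio.create_task(read_block(d, b)) for d, b in INDEX.get(key, [])]
    matches = []
    # Consume results in completion order (out of order), not issue order.
    for finished in asyncio.as_completed(subtasks):
        rows = await finished
        matches.extend(v for k, v in rows if k == key)
    return sorted(matches)


def full_scan(key):
    # Baseline: read every block on every disk, as a scan-based plan would.
    return sorted(
        v
        for blocks in DISKS.values()
        for rows in blocks.values()
        for k, v in rows
        if k == key
    )


if __name__ == "__main__":
    print(asyncio.run(lookup("a")))
    print(full_scan("a"))
```

  Both calls return the same rows, but the index-driven lookup touches only the three blocks listed for key `"a"` and overlaps their reads, while the scan visits every block. With few matching rows (low selectivity), the gap between the two grows with the size of the dataset.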
  The paper also presents experimental evaluations. In the first experiment, the performance of the proposed system was compared with that of variants in which asynchronous access to local and/or remote secondary storage was partially or completely disabled. The proposed system was more than 100 times faster than these variants for queries with low selectivity. The proposed system was also shown to scale better with the number of nodes than other Hadoop-based systems.
  The proposed system is therefore a highly promising approach to the effective utilization of big data, and this paper fully deserves the Best Paper Award.