International Technical Conference on Circuits/Systems, Computers and Communications
2016
Session Number:P3
Number:P3-7
Spark vs. Virtualized Spark: A Performance Analysis
Wenjing Jin, Jae W. Lee
pp.1019-1020
Publication Date:2016/7/10
Online ISSN:2188-5079
DOI:10.34385/proc.61.P3-7
Summary:
Apache Spark is an open-source framework for scalable big data processing. OpenStack is a popular virtualization framework that provides Infrastructure as a Service (IaaS) in the cloud. Deploying Spark on OpenStack provides many benefits, such as on-demand resource scaling and greater availability and flexibility. However, virtualized Spark is likely to have very different performance characteristics from native Spark. This paper aims to quantify the cost of virtualization on a Spark cluster. Our experiments demonstrate that (i) virtualized Spark with four nodes is about 1.58X slower than native Spark, and (ii) network, CPU, and garbage collection (GC) all contribute to this slowdown. Overall, network waiting time and CPU time contribute the most to the increased execution time, while GC time shows the highest rate of increase.
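
The last sentence of the summary separates two measures: how much each component contributes to the added execution time, and how fast each component grows relative to its native value. The Python sketch below shows one plausible way to compute both. It is not the authors' method. The function names and all timing values are hypothetical placeholders chosen for illustration, not measurements from the paper.

```python
def slowdown(native, virtualized):
    """Ratio of total virtualized time to total native time."""
    return sum(virtualized.values()) / sum(native.values())


def breakdown(native, virtualized):
    """Per-component share of the total increase and relative growth rate.

    Both arguments map a component name to its time in seconds.
    """
    total_increase = sum(virtualized[name] - native[name] for name in native)
    result = {}
    for name in native:
        delta = virtualized[name] - native[name]
        result[name] = {
            # Fraction of the overall added time due to this component.
            "share_of_increase": delta / total_increase,
            # Growth of this component relative to its own native time.
            "growth_rate": delta / native[name],
        }
    return result


# Hypothetical placeholder timings in seconds (NOT taken from the paper).
native = {"network_wait": 40.0, "cpu": 50.0, "gc": 2.0}
virtualized = {"network_wait": 60.0, "cpu": 70.0, "gc": 6.0}

report = breakdown(native, virtualized)
for name, stats in report.items():
    print(f"{name}: share={stats['share_of_increase']:.2f}, "
          f"growth={stats['growth_rate']:.2f}")
print(f"overall slowdown: {slowdown(native, virtualized):.2f}x")
```

With these placeholder values, a small component such as GC can have the largest growth rate while accounting for only a small share of the total increase, which is the pattern the summary describes.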