Summary

IEICE Information and Communication Technology Forum

2017

Session Number:SESSION07

Session:

Number:SESSION07_2

Malicious PDF Detection Scheme Using the Useful Feature Based on Non-Frequent Keywords in a File

Hiroya Kato, Shuichiro Haruta, Iwao Sasase

Publication Date:2017/10/1

Online ISSN:2188-5079

DOI:10.34385/proc.50.SESSION07_2

Summary:
Detecting malicious PDF (Portable Document Format) files is imperative. Among malicious PDF detection schemes, we focus on one that exploits the fact that the frequency of internal components called keywords differs between legitimate and malicious PDFs. That scheme uses the keywords that appear frequently in the dataset to detect malicious PDFs. However, the conventional scheme may ignore keywords that appear only in legitimate PDFs or only in malicious PDFs. If some of the ignored keywords carry useful features, that scheme cannot detect malicious PDFs that contain them. In this paper, we propose a malicious PDF detection scheme using a useful feature based on non-frequent keywords in a file. To evaluate such keywords precisely, we utilize csub, which represents the difference between the keywords that appear only in the legitimate dataset and those that appear only in the malicious dataset. Furthermore, we use nkeyword, which denotes the number of non-duplicate keywords that appear in a file. In this way, we can evaluate keywords that the conventional scheme ignores. To prevent csub and nkeyword from degrading the detection performance, we derive a feature that quantitatively represents the maliciousness of a PDF by applying fuzzy inference to these two features. Our scheme uses this feature together with the conventional scheme’s features to detect malicious PDFs. Through computer simulation with a real dataset, we demonstrate that our scheme can reduce both false positives and false negatives.
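
To make the two features concrete, the following is a minimal sketch of one possible reading of the summary, not the authors' implementation. It assumes that keywords are PDF name objects such as /JavaScript, that csub is the number of a file's keywords seen only in malicious training files minus the number seen only in legitimate ones, and that nkeyword is the count of distinct keywords in the file. The function names, the regular expression, and the toy byte strings are all hypothetical, and the fuzzy inference step is not covered.

```python
import re

# Assumption: a "keyword" is a PDF name object, e.g. /JavaScript or /OpenAction.
KEYWORD_RE = re.compile(rb"/[A-Za-z][A-Za-z0-9]*")


def extract_keywords(pdf_bytes):
    """Return the set of distinct keywords found in raw PDF bytes."""
    return {m.decode("ascii") for m in KEYWORD_RE.findall(pdf_bytes)}


def exclusive_keywords(legit_sets, malicious_sets):
    """Split training keywords into legitimate-only and malicious-only sets."""
    legit = set().union(*legit_sets)
    malicious = set().union(*malicious_sets)
    return legit - malicious, malicious - legit


def features(keywords, only_legit, only_malicious):
    """Compute (csub, nkeyword) for one file under the assumptions above."""
    # Assumed definition: malicious-only hits minus legitimate-only hits.
    csub = len(keywords & only_malicious) - len(keywords & only_legit)
    # Number of non-duplicate keywords in the file.
    nkeyword = len(keywords)
    return csub, nkeyword


# Toy training data (hypothetical fragments, not real PDF files).
legit_train = [
    extract_keywords(b"<< /Type /Catalog /Pages 2 0 R >>"),
    extract_keywords(b"<< /Type /Page /Font 5 0 R >>"),
]
malicious_train = [
    extract_keywords(b"<< /Type /Catalog /OpenAction 3 0 R /JavaScript 4 0 R >>"),
]
only_legit, only_malicious = exclusive_keywords(legit_train, malicious_train)

sample = extract_keywords(
    b"<< /Type /Catalog /Pages 2 0 R /OpenAction 3 0 R /JavaScript 4 0 R >>"
)
csub, nkeyword = features(sample, only_legit, only_malicious)
print(csub, nkeyword)
```

Under this reading, a positive csub means the file leans toward keywords observed only in malicious samples. The summary combines csub and nkeyword through fuzzy inference rather than using them directly, so the raw values here are only inputs to that later step.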