Application of Hybrid Machine Learning to Detect and Remove Malware
Anti-malware software traditionally relies on signature-based and heuristic-based detection. These detection systems must be manually updated with new behaviors before they can detect new, unknown, or adapted malware. Our goal is to create a malware detection solution that serves three purposes: to automatically identify and classify unknown files on a spectrum of malware severity; to introduce a hybrid machine learning approach to detect modified malware traces; and to increase the accuracy of detection results. Our solution draws on data mining and machine learning concepts and algorithms. We perform two types of data mining on samples, extracting n-grams and Portable Executable (PE) features, which serve as input to our machine learning environment. We also introduce a new hybrid learning approach that uses both supervised and unsupervised machine learning in a two-layer protocol. In the first layer, a supervised algorithm classifies each file as either malware or benign. In the second layer, the files classified as malware are categorized and placed on a severity spectrum using the unsupervised self-organizing feature map (SOFM) algorithm.
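The two-layer protocol described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the supervised algorithm, so a nearest-centroid rule stands in for it; the SOFM is a small one-dimensional map written in plain Python; the function names, the three-entry n-gram vocabulary, and the toy byte strings are hypothetical; and PE feature extraction is omitted. How map units would be translated into severity levels is not specified in the abstract, so the sketch only reports the best matching unit.

```python
import math
import random
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams in a file's raw contents."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def to_vector(counts: Counter, vocab: list) -> list:
    """Relative frequency of each vocabulary n-gram (hypothetical feature layout)."""
    total = sum(counts.values()) or 1
    return [counts[g] / total for g in vocab]


def centroid(vectors: list) -> list:
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def classify(x, benign_centroid, malware_centroid) -> str:
    """Layer 1 stand-in: nearest-centroid rule (assumed, not the authors' classifier)."""
    closer_to_malware = math.dist(x, malware_centroid) < math.dist(x, benign_centroid)
    return "malware" if closer_to_malware else "benign"


def nearest(units, x) -> int:
    """Index of the map unit whose weights are closest to x (best matching unit)."""
    return min(range(len(units)), key=lambda i: math.dist(units[i], x))


def train_sofm(samples, n_units=4, epochs=50, seed=0):
    """Layer 2: train a 1-D self-organizing feature map on malware vectors."""
    rng = random.Random(seed)
    dim = len(samples[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1 - epoch / epochs
        lr = 0.5 * frac                          # decaying learning rate
        radius = max(1.0, n_units / 2 * frac)    # shrinking neighbourhood
        for x in samples:
            bmu = nearest(units, x)
            for j, w in enumerate(units):
                # Gaussian neighbourhood: units near the BMU move more.
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return units


# Toy data invented for this sketch; not real samples.
vocab = [b"\x90\x90", b"MZ", b"\x00\x00"]
benign_files = [b"MZ" + b"\x00" * 20, b"MZ" + b"\x00" * 30]
malware_files = [b"MZ" + b"\x90" * 20, b"MZ" + b"\x90" * 10 + b"\x00" * 10]

benign_vecs = [to_vector(byte_ngrams(f), vocab) for f in benign_files]
malware_vecs = [to_vector(byte_ngrams(f), vocab) for f in malware_files]
b_c, m_c = centroid(benign_vecs), centroid(malware_vecs)
sofm = train_sofm(malware_vecs, n_units=3)

unknown = to_vector(byte_ngrams(b"MZ" + b"\x90" * 15), vocab)
if classify(unknown, b_c, m_c) == "malware":
    print("malware, map unit", nearest(sofm, unknown))
else:
    print("benign")
```

Only files that pass the first layer as malware reach the map, which mirrors the ordering in the abstract. A realistic system would use a far larger n-gram vocabulary, add PE features to the vector, and replace the nearest-centroid rule with a trained classifier.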