An Android Malware Detection Architecture based on Ensemble Learning

  • Mehmet Ozdemir Department of Computer Engineering, Gebze Institute of Technology, Gebze, Kocaeli
  • Ibrahim Sogukpinar Department of Computer Engineering, Gebze Institute of Technology, Gebze, Kocaeli
Keywords: Ensemble Learning, Multiple Classifier Systems, Mixture of Experts, Selective Ensemble, Malware Detection, Android Malware


In anomaly-based Android malware detection, different types of features have been used to represent applications, and many algorithms have been applied to evaluate these features. Although researchers have reported accurate results, we suggest an ensemble learning approach to further improve accuracy, sensitivity, and generalization. In this study, we propose an ensemble learning system whose base learners are built on different feature subsets, which are extracted and processed with multiple methods. The base learners are then selected with a proposed selective ensemble approach based on three criteria: accuracy, sensitivity, and diversity.
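To make the three selection criteria concrete, the following is a minimal Python sketch of one possible selective ensemble procedure. It is not the method proposed in the paper, whose details the abstract does not give. The greedy strategy, the equal weighting of the criteria, the pairwise disagreement measure used for diversity, and all function and feature-subset names (`select_ensemble`, `permissions`, `api_calls`, `strings`) are illustrative assumptions. Labels are assumed to be binary, with 1 meaning malware.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred):
    """True positive rate: fraction of malware samples (label 1) detected."""
    on_malware = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(on_malware) / len(on_malware) if on_malware else 0.0


def disagreement(pred_a, pred_b):
    """Pairwise diversity: fraction of samples on which two learners differ."""
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)


def select_ensemble(y_true, candidates, k, weights=(1.0, 1.0, 1.0)):
    """Greedily pick k base learners from their validation predictions.

    candidates maps a learner name to its list of 0/1 predictions.
    The weighted-sum score is an assumption made for illustration only.
    """
    w_acc, w_sens, w_div = weights
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:
        def score(name):
            pred = remaining[name]
            # Diversity is the mean disagreement with learners already chosen.
            div = (
                sum(disagreement(pred, candidates[s]) for s in selected)
                / len(selected)
                if selected
                else 0.0
            )
            return (
                w_acc * accuracy(y_true, pred)
                + w_sens * sensitivity(y_true, pred)
                + w_div * div
            )

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


def majority_vote(predictions):
    """Combine 0/1 predictions from several learners by strict majority."""
    return [int(2 * sum(votes) > len(votes)) for votes in zip(*predictions)]


# Hypothetical validation labels and predictions of three base learners,
# each imagined as trained on a different feature subset.
y_val = [1, 1, 1, 0, 0, 0]
candidates = {
    "permissions": [1, 1, 0, 0, 0, 0],
    "api_calls": [1, 1, 0, 0, 0, 0],
    "strings": [0, 1, 1, 0, 0, 1],
}
chosen = select_ensemble(y_val, candidates, k=2)
ensemble_output = majority_vote([candidates[name] for name in candidates])
```

In this toy data the second pick favors the learner that disagrees with the first one over an equally accurate duplicate, which is the intuition behind adding diversity as a criterion alongside accuracy and sensitivity.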



How to Cite
Ozdemir, M., & Sogukpinar, I. (2014). An Android Malware Detection Architecture based on Ensemble Learning. Transactions on Machine Learning and Artificial Intelligence, 2(3), 90-106.