Machine learning-based approach for designing and implementing a collaborative fraud detection model through CDR and traffic analysis

Authors

  • Eric Michel DEUSSOM DJOMADJI, Department of Electrical and Electronic Engineering, College of Technology, University of Buea; Division of Information and Communications Technology, NASPT, University of Yaoundé I
  • Bequerelle MATEMTSAP MBOU, Division of Information and Communications Technology, NASPT, University of Yaoundé I
  • Aurelle TCHAGNA KOUANOU, Department of Computer Engineering, College of Technology, University of Buea, Cameroon; Division of Information and Communications Technology, NASPT, University of Yaoundé I
  • Michael EKONDE SONE, Department of Electrical and Electronic Engineering, College of Technology, University of Buea
  • Parfait BAYONBOG, Division of Information and Communications Technology, NASPT, University of Yaoundé I

DOI:

https://doi.org/10.14738/tmlai.104.12854

Keywords:

Fraud, Traffic, Personal Data, CDR, Machine Learning, K-Means

Abstract

Fraud in telecommunications networks is a constantly growing phenomenon that causes enormous financial losses for both individual users and telecommunications operators. Many researchers have proposed approaches to this problem, but these approaches still need to be improved to be effective. Detecting fraud is difficult, and it is no surprise that many fraud detection schemes have serious limitations. Different types of fraud may require different systems, each with different procedures, parameter adjustments, database interfaces, and case management tools and capabilities. This article uses the K-Means algorithm to detect fraud based on Call Detail Record (CDR) and traffic analysis in the telecommunications industry. Our algorithm compares the traffic and the CDRs generated in the network and checks for abnormal behavior; if any is found, our model is used to confirm whether the users suspected of fraud are really fraudsters. To build our model, we used real-world CDR data collected in November 2021. Our model combines the Differential Privacy model, to protect users' personal information, with the k-means algorithm, to group users into clusters. These clusters represent non-fraudulent users with similar characteristics according to the criteria used to build the model. Users whose abnormal behavior can be likened to that of fraudsters are those who lie far from the cluster centers. A representation in the plane makes user behavior easier to visualize. We validated our model by evaluating our segmentation method. The interpretation of the results indicates that our approach gives better results. Our approach can be used by any telecommunications operator to reduce the impact of fraud on internet services.
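
The clustering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two per-user features, the sample values, the deterministic initialization, and the distance threshold of 3.0 are all assumptions made for illustration, and the differential privacy step is omitted. Users are grouped with a plain k-means, and any user who lies far from every cluster center is flagged as a suspect.

```python
import math

# Hypothetical, already normalized per-user features:
# (call activity from CDRs, data traffic volume). Values are made up.
USERS = {
    "u1": (1.0, 1.0), "u2": (1.2, 0.8), "u3": (0.8, 1.2), "u4": (1.1, 0.9),
    "u5": (5.0, 5.0), "u6": (5.2, 4.8), "u7": (4.8, 5.2), "u8": (5.0, 5.1),
    "u9": (9.0, 1.0),  # behaves unlike either group
}


def kmeans(points, k, max_iterations=100):
    """Plain Lloyd's k-means on a list of equal-length tuples.

    Initial centroids are taken at evenly spaced positions in the input
    (an assumption that keeps the sketch reproducible); a real system
    would more likely use a randomized scheme such as k-means++.
    """
    n = len(points)
    centroids = [points[i * n // k] for i in range(k)]
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # assignments no longer change
            break
        centroids = updated
    return centroids


def flag_suspects(users, centroids, threshold):
    """Return ids of users farther than `threshold` from every centroid."""
    return [
        user_id
        for user_id, features in users.items()
        if min(math.dist(features, c) for c in centroids) > threshold
    ]


centroids = kmeans(list(USERS.values()), k=2)
suspects = flag_suspects(USERS, centroids, threshold=3.0)
print(suspects)
```

With these made-up values only `u9` is flagged. Note that an outlier included in the fitting data pulls its cluster center toward itself, so in practice the threshold and the number of clusters would need to be chosen and validated on real data.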

Published

2022-08-23

How to Cite

DEUSSOM DJOMADJI, E. M., MATEMTSAP MBOU, B., TCHAGNA KOUANOU, A., EKONDE SONE, M., & BAYONBOG, P. (2022). Machine learning-based approach for designing and implementing a collaborative fraud detection model through CDR and traffic analysis. Transactions on Machine Learning and Artificial Intelligence, 10(4), 46–58. https://doi.org/10.14738/tmlai.104.12854