Hierarchy Website Fingerprint Using N-gram Byte Distribution
Keywords: Website fingerprinting, Traffic analysis, N-gram byte distribution.
According to www.internetlivestats.com, there are over 1 billion websites on the World Wide Web (WWW) today, whereas in 1991 there was only a single website. Classifying websites by traffic analysis has become a difficult problem because of the sheer number of websites on the internet. None of the approaches proposed in the literature classifies more than 100 websites, a trivial number compared with the total number of websites on the internet. In this paper, a two-level website classification technique is proposed. At the first level, the traffic is classified into a general category such as sports, news, social, health, or education. Then, for finer-grained information, the packet can be classified within that category to identify which website it came from.
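The two-level idea can be illustrated with a minimal Python sketch. The abstract does not specify the features or the classifier beyond the n-gram byte distribution named in the title, so everything concrete here is an assumption made for illustration: the function names (`ngram_distribution`, `centroid`, `train`, `classify`), the use of an averaged distribution as each class profile, and the Manhattan distance for nearest-profile matching are all hypothetical choices, not the method of the paper.

```python
from collections import Counter


def ngram_distribution(payload: bytes, n: int = 1) -> dict:
    """Relative frequency of each byte n-gram in a payload."""
    total = len(payload) - n + 1
    if total <= 0:
        return {}
    counts = Counter(payload[i:i + n] for i in range(total))
    return {gram: c / total for gram, c in counts.items()}


def centroid(payloads, n: int = 1) -> dict:
    """Average the n-gram distributions of several sample payloads."""
    acc = Counter()
    for p in payloads:
        for gram, freq in ngram_distribution(p, n).items():
            acc[gram] += freq
    return {gram: s / len(payloads) for gram, s in acc.items()}


def manhattan(p: dict, q: dict) -> float:
    """Assumed distance measure: sum of absolute frequency differences."""
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in set(p) | set(q))


def nearest(dist: dict, profiles: dict):
    """Label of the profile closest to the given distribution."""
    return min(profiles, key=lambda label: manhattan(dist, profiles[label]))


def train(training: dict, n: int = 1):
    """Build profiles from {category: {site: [payload, ...]}}."""
    category_profiles, site_profiles = {}, {}
    for category, sites in training.items():
        # Level 1 profile: all payloads of the category pooled together.
        pooled = [p for samples in sites.values() for p in samples]
        category_profiles[category] = centroid(pooled, n)
        # Level 2 profiles: one per website inside the category.
        site_profiles[category] = {
            site: centroid(samples, n) for site, samples in sites.items()
        }
    return category_profiles, site_profiles


def classify(payload: bytes, category_profiles, site_profiles, n: int = 1):
    """Level 1 picks the category; level 2 picks a site within it."""
    dist = ngram_distribution(payload, n)
    category = nearest(dist, category_profiles)
    site = nearest(dist, site_profiles[category])
    return category, site
```

In this sketch, the second level compares a packet only against the websites of the category chosen at the first level, which is what keeps the number of candidate classes small at each step. A call would look like `classify(packet_bytes, *train(training_data))`, where `training_data` is a hypothetical labeled sample set.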