A Novel Approach to Compute Confusion Matrix for Classification of n-Class Attributes with Feature Selection
Keywords: Confusion Matrix, Classifiers, Feature Selection, Weighted Average Confusion Matrix, Classification Accuracy, Weighted Average Accuracy
Abstract: The confusion matrix is a useful tool for measuring the ability of classifiers to classify multi-class objects. Computing classification accuracy from a confusion matrix is straightforward for 2-class attributes but quite cumbersome for multi-class attributes. In this work, we propose a novel approach that transforms an n × n confusion matrix for n-class attributes into an equivalent 2 × 2 weighted average confusion matrix (WACM). The suitability of the WACM is demonstrated on a classification problem using a web service data set. We compute the accuracy of four classifiers, namely Naïve Bayes (NB), Genetic Programming (GP), the Instance-Based Lazy Learner (IB1), and the Decision Tree (J48), with and without feature selection. The WACM is then applied to the confusion matrix obtained after feature selection, which further improves classification accuracy.
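To make the transformation concrete, the following is a minimal sketch of one plausible way to collapse an n × n confusion matrix into a 2 × 2 weighted average matrix. It assumes a one-vs-rest reduction, where each class yields a 2 × 2 matrix (TP, FN, FP, TN) that is averaged with a weight proportional to the class's support; the exact weighting scheme of the proposed WACM may differ, so the function names and the weighting choice here are illustrative assumptions, not the authors' definitive method.

```python
import numpy as np

def wacm(cm):
    """Collapse an n x n confusion matrix into a 2 x 2 weighted average
    confusion matrix (WACM).

    Sketch only: assumes a one-vs-rest reduction in which each class i
    contributes a 2 x 2 matrix [[TP, FN], [FP, TN]], weighted by the
    fraction of instances belonging to class i (its support)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.shape[0]
    total = cm.sum()
    out = np.zeros((2, 2))
    for i in range(n):
        tp = cm[i, i]                      # class i correctly predicted
        fn = cm[i, :].sum() - tp           # class i predicted as another class
        fp = cm[:, i].sum() - tp           # other classes predicted as class i
        tn = total - tp - fn - fp          # everything else
        weight = cm[i, :].sum() / total    # support of class i
        out += weight * np.array([[tp, fn], [fp, tn]])
    return out

def accuracy(cm2):
    """Classification accuracy of a 2 x 2 confusion matrix: (TP + TN) / total."""
    return (cm2[0, 0] + cm2[1, 1]) / cm2.sum()
```

Because the per-class weights sum to one, the WACM preserves the total instance count of the original matrix, so the familiar 2-class accuracy formula can be read off directly from the collapsed matrix.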