Difficulty-Level Classification for English Writings
Keywords: Accuracy, Difficulty level, F-measure, Machine learning
Abstract
The popularity of e-books has grown in recent years. As the number of e-books continues to increase, categorizing every book manually takes a significant amount of time. If English texts can be classified by difficulty level, it becomes possible to recommend foreign-language books that match a reader's English proficiency. This study extracted eleven types of attributes from English text data, with the aim of classifying English texts by difficulty level through machine learning. The texts were learned and classified using leave-one-out cross-validation. To improve accuracy, further experiments varied the size of the text data and applied attribute selection. As a result, accuracy improved to 77.04% and the F-measure to 63.96%.
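The evaluation procedure named in the abstract (leave-one-out cross-validation scored by accuracy and F-measure) can be illustrated with a minimal sketch. The abstract does not state which classifier or which F-measure averaging was used, so the code assumes a simple nearest-centroid classifier and a macro-averaged F-measure (the unweighted mean of per-class F1 scores). The function names and the two-feature toy data are hypothetical and merely stand in for the study's eleven attributes. Each text is held out once, the classifier is trained on all the remaining texts, and the prediction for the held-out text is recorded. The metrics are then computed over all recorded predictions.

```python
from collections import defaultdict
from math import dist


def nearest_centroid_predict(train_X, train_y, x):
    """Hypothetical stand-in classifier: pick the class whose centroid is
    closest (Euclidean distance) to the feature vector x."""
    sums = {}
    counts = defaultdict(int)
    for features, label in zip(train_X, train_y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    centroids = {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}
    return min(centroids, key=lambda lab: dist(centroids[lab], x))


def leave_one_out(X, y):
    """Hold out each sample exactly once, train on all the others,
    and return the prediction for every held-out sample."""
    predictions = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predictions.append(nearest_centroid_predict(train_X, train_y, X[i]))
    return predictions


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f_measure(y_true, y_pred):
    """Assumed averaging: unweighted mean of per-class F1 = 2PR / (P + R)."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy feature vectors (hypothetical), two per text instead of eleven.
    X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]]
    y = ["easy", "easy", "easy", "hard", "hard", "hard"]
    preds = leave_one_out(X, y)
    print(accuracy(y, preds), macro_f_measure(y, preds))  # prints: 1.0 1.0
```

Because every text is held out exactly once, leave-one-out requires one training run per text. This is affordable for a small corpus, but the cost grows linearly with the amount of data.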