Kannada Named Entity Recognition and Classification using Support Vector Machine

  • Amarappa S, Jawaharlal Nehru National College of Engineering
  • S V Sathyanarayana, Department of E & C, Jawaharlal Nehru National College of Engineering, Shimoga - 577 204, India
Keywords: Natural Language Processing, Hyperplane, Support vectors, Named Entity Recognition, Classification, Support vector machine, Training Corpus, Test Corpus.


Named Entity Recognition and Classification (NERC) is the process of identifying proper nouns in text and classifying them into predefined categories such as person name, location, organization, date and time. Kannada NERC is an essential and challenging task, and this paper aims to develop a novel model for it based on the Support Vector Machine (SVM). The model uses tf-idf and part-of-speech (POS) features extracted from a manually created training corpus. It is trained and tested with four kernels: polynomial, RBF, sigmoid and linear. The details of implementation and performance evaluation are discussed. The experiments are conducted on a training corpus of 1,51,440 tokens and test corpora of 7,000, 11,000, 15,000, 20,000, 30,000, 40,000 and 50,000 tokens. With a linear kernel, the model achieves an average precision, recall and F1-measure of 87%, 88% and 87.5% respectively on the test corpus of 7,000 tokens.
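
The precision, recall and F1-measure figures quoted above can be illustrated with a minimal sketch of token-level scoring. The abstract does not say how the averages were formed, so the macro-averaging over entity classes used here is an assumption. The tag names, the toy `gold` and `pred` sequences and the helper functions are hypothetical and are not taken from the paper.

```python
from collections import Counter


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def per_class_scores(gold, pred, ignore=("O",)):
    """Return {label: (precision, recall, f1)} for token-level tags.

    Labels listed in `ignore` (here the non-entity tag "O", an assumed
    convention) are left out of the result.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it was wrong
            fn[g] += 1  # missed an occurrence of the true label g
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        if label in ignore:
            continue
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        scores[label] = (precision, recall, f1_score(precision, recall))
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall and F1 over all classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Hypothetical toy data: one tag per token.
gold = ["PERSON", "O", "LOCATION", "O", "ORGANIZATION", "PERSON", "O", "LOCATION"]
pred = ["PERSON", "O", "LOCATION", "PERSON", "ORGANIZATION", "O", "O", "LOCATION"]

scores = per_class_scores(gold, pred)
avg_precision, avg_recall, avg_f1 = macro_average(scores)
print(scores)
print(round(avg_precision, 3), round(avg_recall, 3), round(avg_f1, 3))
```

As a consistency check on the reported numbers, `f1_score(0.87, 0.88)` evaluates to roughly 0.875, which agrees with the 87.5% F1-measure stated for the linear kernel.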

Author Biography

Amarappa S, Jawaharlal Nehru National College of Engineering

Department of Telecommunication Engineering

Associate Professor and HOD


