Hindi Word Sense Disambiguation using variants of Simplified Lesk measure
Keywords:
Hindi word sense disambiguation, Dictionary based Word Sense Disambiguation, Lesk-based Word Sense Disambiguation
Abstract
This paper evaluates the performance of a Lesk-like algorithm for Hindi Word Sense Disambiguation (WSD). The algorithm disambiguates an ambiguous word by measuring the similarity between each of its sense definitions and the context in which it occurs. Three scoring functions have been investigated for measuring this similarity: direct overlap, frequency of matching words, and frequency of matching words excluding the target word. We evaluate the effects of context window size, stop word elimination and stemming on the Hindi WSD task. We also investigate the effect of the number of senses on the task. The evaluation has been carried out on a manually created sense inventory consisting of 60 polysemous Hindi nouns. The maximum overall precision of 54.54% was observed when both stemming and stop word removal were performed and frequency-based scoring excluding the target word was used. The best case yields a significant improvement of 10.4% in precision and 21.3% in recall over the baseline performance. In general, precision decreased as the number of senses increased.
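The three scoring functions named in the abstract can be illustrated with a minimal sketch. The abstract does not give their exact definitions, so the formulations below are assumptions: direct overlap counts the distinct words shared by the context and the sense definition, and the frequency-based scores sum how often each distinct context word occurs in the sense definition. The function names, the English toy data and the choice to abstain when every score is zero are all hypothetical and serve only as an illustration; tokens are assumed to be already stemmed and stop-word filtered.

```python
from collections import Counter


def direct_overlap(context, gloss, target=None):
    """Count the distinct words shared by the context and the sense definition."""
    return len(set(context) & set(gloss))


def frequency_overlap(context, gloss, target=None):
    """Sum how often each distinct context word occurs in the sense definition."""
    gloss_counts = Counter(gloss)
    return sum(gloss_counts[word] for word in set(context))


def frequency_overlap_no_target(context, gloss, target=None):
    """Same as frequency_overlap, but matches on the target word are ignored."""
    gloss_counts = Counter(gloss)
    return sum(gloss_counts[word] for word in set(context) if word != target)


def disambiguate(target, context, senses, score):
    """Return the id of the highest-scoring sense, or None if nothing matches.

    Abstaining on an all-zero score is an assumption of this sketch, not a
    detail taken from the paper.
    """
    scores = {sense_id: score(context, gloss, target)
              for sense_id, gloss in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Toy English data (hypothetical, not from the paper's sense inventory).
# Each sense is a token list built from a definition plus an example sentence.
senses = {
    "bank_river": ["sloping", "land", "beside", "river", "bank"],
    "bank_finance": ["institution", "keeps", "money", "bank", "lends", "money"],
}
context = ["deposited", "money", "bank"]

chosen = disambiguate("bank", context, senses, frequency_overlap_no_target)
```

In this toy case the target word "bank" occurs in both sense token lists, so it adds the same amount to every sense score without helping to separate the senses. Excluding it leaves only "money" as evidence, which is the intuition behind the third scoring function in this sketch.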