Hindi Word Sense Disambiguation using variants of Simplified Lesk measure


  • Satyendr Singh School of Engineering & Technology, BML Munjal University, Gurgaon, India
  • Goldie Gabrani School of Engineering & Technology, BML Munjal University, Gurgaon, India
  • Tanveer Siddiqui * Department of Electronics & Communication, University of Allahabad, Allahabad, India


Hindi word sense disambiguation, Dictionary based Word Sense Disambiguation, Lesk-based Word Sense Disambiguation


This paper evaluates the performance of a Lesk-like algorithm for Hindi Word Sense Disambiguation (WSD). The algorithm disambiguates a word using the similarity between each sense definition and the context of the ambiguous word. Three scoring functions are investigated for measuring this similarity: direct overlap, frequency of matching words, and frequency of matching words excluding the target word. We evaluate the effects of context window size, stop word elimination, and stemming on the Hindi WSD task, and also investigate the effect of the number of senses. The evaluation was carried out on a manually created sense inventory consisting of 60 polysemous Hindi nouns. The maximum overall precision of 54.54% was observed when both stemming and stop word removal were performed and frequency-based scoring excluding the target word was used. This best case yields a significant improvement of 10.4% in precision and 21.3% in recall over the baseline performance. In general, precision decreased as the number of senses increased.
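The three scoring functions named above can be illustrated with a minimal Python sketch. This is one plausible reading of the abstract, not the authors' implementation: the function names, the exact counting rules, and the toy English sense inventory (`SENSES`, `CONTEXT`) are all assumptions introduced for illustration, and tokenization, stop word removal, and stemming are assumed to have been applied already.

```python
from collections import Counter

def direct_overlap(context, gloss, target):
    # Assumed reading: number of distinct words shared by context and gloss.
    return len(set(context) & set(gloss))

def frequency_overlap(context, gloss, target):
    # Assumed reading: each distinct context word contributes the number
    # of times it occurs in the gloss.
    gloss_counts = Counter(gloss)
    return sum(gloss_counts[w] for w in set(context))

def frequency_overlap_excluding_target(context, gloss, target):
    # Same as frequency_overlap, but matches on the target word are ignored.
    gloss_counts = Counter(gloss)
    return sum(gloss_counts[w] for w in set(context) if w != target)

def disambiguate(context, target, senses, score):
    # Score every sense and return the best one; None if nothing overlaps.
    scores = {name: score(context, gloss, target) for name, gloss in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# Hypothetical toy data (English for readability; not from the paper).
SENSES = {
    "finance": ["bank", "institution", "accepts", "money", "deposits", "money", "loans"],
    "river": ["bank", "sloping", "land", "beside", "river", "water"],
}
CONTEXT = ["he", "deposited", "money", "in", "the", "bank"]

for fn in (direct_overlap, frequency_overlap, frequency_overlap_excluding_target):
    print(fn.__name__, disambiguate(CONTEXT, "bank", SENSES, fn))
```

In this sketch, `disambiguate` returns `None` when no sense overlaps the context, leaving the instance unanswered. Excluding the target word matters because the target typically appears in every candidate definition, so its matches add to every score without helping to separate the senses.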


