Myanmar (Burmese) String Similarity Measures based on Phoneme Similarity
Keywords:
Myanmar character, Burmese, String similarity metrics, Phonetic Similarity, Grapheme-to-Phoneme (G2P), Ripple Down Rules-Based (RDR)
Abstract
String similarity measurement is useful in a wide range of applications. It plays an important role in machine learning, information retrieval, natural language processing, error encoding, and bioinformatics. It is also a fundamental operation in data science, where it supports data cleaning and integration. Applications such as spell checking, duplicate detection, similar-word search, and retrieval tasks all rely on string similarity. Grapheme-to-Phoneme (G2P) conversion is the task of predicting the pronunciation of a word from its graphemic (written) form. In this study, string similarity metrics are calculated for Burmese (the Myanmar language) based on phoneme similarity and phonetic similarity. Similarity distance is measured between the dataset entries and the query words, both of which are converted with a G2P model and with phonetic encoding mapping tables. Because previous string similarity approaches are not suitable for fuzzy string matching in Burmese, a tonal language, this study proposes measuring string similarity with phoneme similarity and phonetic mapping approaches.
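The idea of converting both the query word and the dataset entry before measuring distance can be illustrated with a minimal sketch. It is not the implementation used in the study: the names `TOY_TABLE`, `phonetic_key`, and `phonetic_distance` are hypothetical, and the two-entry mapping table is a toy assumption (it treats the medial signs U+103B and U+103C as sounding alike) standing in for the study's G2P model and phonetic encoding tables. Levenshtein distance is used as one example of a string distance.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute; cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Toy mapping table (an assumption for illustration only):
# U+103B MYANMAR CONSONANT SIGN MEDIAL YA and
# U+103C MYANMAR CONSONANT SIGN MEDIAL RA are mapped to the same code "Y".
TOY_TABLE = {"\u103b": "Y", "\u103c": "Y"}


def phonetic_key(word, table=TOY_TABLE):
    """Replace each character by its phonetic code; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in word)


def phonetic_distance(query, candidate, table=TOY_TABLE):
    """Edit distance measured on the phonetic keys rather than on the raw spellings."""
    return levenshtein(phonetic_key(query, table), phonetic_key(candidate, table))


if __name__ == "__main__":
    w1 = "\u1000\u103b"  # KA + medial YA
    w2 = "\u1000\u103c"  # KA + medial RA
    print(levenshtein(w1, w2))        # distance on raw spellings
    print(phonetic_distance(w1, w2))  # distance after phonetic mapping
```

Under this toy table, two spellings that differ only in the mapped signs are one edit apart as raw strings but identical after mapping, which is the kind of fuzzy match that a purely character-based metric would miss.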