Myanmar Spelling Error Classification: An Empirical Study of Tsetlin Machine Techniques
Keywords:
Spelling error type classification, Spell checker, Tsetlin Machine, fastText, Natural Language Processing, Myanmar Language
Abstract
Accurate spelling and grammar checking is fundamental to building language tools for the Myanmar language. Classifying spelling error types is crucial in spell checkers and other language processing tools because it enables more accurate, context-aware corrections. Error type classification assigns each spelling error in written text to a distinct category. To address the lack of such resources for Myanmar, we developed a parallel spelling corpus that pairs misspelled words with their corrected forms, together with a corpus that labels each error by type. This paper presents an empirical study of the Tsetlin Machine for Myanmar spelling error type classification, including comprehensive parameter tuning and a performance comparison with fastText, a state-of-the-art text classification model. Our results indicate that the Tsetlin Machine performs comparably to fastText on phonetic errors but less effectively on the other error classes.
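A Tsetlin Machine learns clauses over Boolean input features, so each entry of a parallel (misspelled, corrected, error type) corpus has to be turned into a 0/1 vector before training, whereas fastText consumes plain text lines prefixed with a `__label__` tag. The sketch below shows one possible way to prepare both inputs. It is not the encoding used in this study: the syllable-presence features, the assumption that words are already split into syllables, the helper names `build_vocab`, `booleanize` and `to_fasttext_line`, and the `|||` separator token are all hypothetical choices made for illustration.

```python
"""Sketch: turn parallel (misspelled, corrected, error-type) entries into
Boolean vectors for a Tsetlin Machine and labelled lines for fastText.

Assumptions (hypothetical, not taken from the paper): words are already
split into syllables, features are syllable presence on each side, and the
fastText line uses a "|||" token to separate the two sides.
"""
from typing import Dict, List, Sequence, Tuple

Entry = Tuple[List[str], List[str]]  # (misspelled syllables, corrected syllables)


def build_vocab(entries: Sequence[Entry]) -> Dict[str, int]:
    """Give every syllable seen on either side a stable column index."""
    vocab: Dict[str, int] = {}
    for wrong, right in entries:
        for syl in wrong + right:
            if syl not in vocab:
                vocab[syl] = len(vocab)
    return vocab


def booleanize(wrong: List[str], right: List[str], vocab: Dict[str, int]) -> List[int]:
    """Return a 0/1 vector of length 2 * len(vocab).

    The first half marks syllables present in the misspelled form, the
    second half those present in the corrected form. Syllables missing
    from the vocabulary are ignored.
    """
    n = len(vocab)
    x = [0] * (2 * n)
    for syl in wrong:
        if syl in vocab:
            x[vocab[syl]] = 1
    for syl in right:
        if syl in vocab:
            x[n + vocab[syl]] = 1
    return x


def to_fasttext_line(wrong: List[str], right: List[str], label: str) -> str:
    """Format one training line using fastText's default "__label__" prefix."""
    return f"__label__{label} " + " ".join(wrong) + " ||| " + " ".join(right)


if __name__ == "__main__":
    # Placeholder syllables; a real run would use segmented Myanmar text.
    entries = [(["a", "b"], ["a", "c"])]
    vocab = build_vocab(entries)
    print(booleanize(["a", "b"], ["a", "c"], vocab))  # [1, 1, 0, 1, 0, 1]
    print(to_fasttext_line(["a", "b"], ["a", "c"], "phonetic"))  # __label__phonetic a b ||| a c
```

Stacking the `booleanize` rows gives the feature matrix a multiclass Tsetlin Machine implementation expects (for example, pyTsetlinMachine's `MultiClassTsetlinMachine`, which is typically given a NumPy integer array; check its documentation for the exact constructor arguments), while the `to_fasttext_line` output can be written to a text file for fastText's supervised training mode. Concatenating separate presence vectors for the two sides keeps the information about which syllables were changed, which a single merged bag of syllables would lose.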