Word to Word translation between Myanmar (Burmese) and other Ethnic languages

Authors

  • May Myat Myat Khaing The Language Understanding Laboratory, Myanmar.
  • Ye Kyaw Thu National Electronics and Computer Technology Center, Pathum Thani, Thailand; Language Understanding Lab., Myanmar.
  • Thazin Myint Oo The Language Understanding Laboratory, Myanmar.
  • Thet Thet Zin University of Information Technology, Yangon, Myanmar.
  • Nang Aeindray Kyaw TAIST-Tokyo Tech, Japan.
  • Hay Man Htun Kasetsart University, Bangkok, Thailand.
  • Thida San University of Technology (Yatanarpon Cyber City), Pyin Oo Lwin; Myanmar Institute of Information Technology, Myanmar.
  • Zun Hlaing Moe Myanmar Institute of Information Technology; University of Technology (Yatanarpon Cyber City), Pyin Oo Lwin, Myanmar.
  • Hnin Aye Thant Department of Information Science, University of Technology (Yatanarpon Cyber City), Pyin Oo Lwin Township, Mandalay Division, Myanmar.

Keywords:

Bilingual lexicons, word-to-word translation, Myanmar (Burmese) and ethnic languages

Abstract

In Myanmar, natural language processing (NLP) research and development still face many challenges, and building bilingual lexicons between Myanmar (Burmese) and other ethnic languages is one of them. This paper presents word-to-word translation between Myanmar (Burmese) and ethnic languages, extracted from sentence-level parallel corpora. This is the first time that a bilingual or cross-lingual lexicon has been developed between Myanmar and other ethnic languages.
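As a rough illustration of extracting word translations from sentence-level parallel corpora, the sketch below scores each source-target word pair by pointwise mutual information over sentence co-occurrence, PMI(x, y) = log(c(x, y) N / (c(x) c(y))), where counts are numbers of sentence pairs and N is the corpus size, and keeps the highest-scoring candidates. This is a plain PMI baseline, not the count-based model of [1] used in this paper. The function name `extract_lexicon`, its `top_k` and `min_count` parameters, and the assumption that both sides are already word-segmented and space-separated (Myanmar script does not mark word boundaries with spaces) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def extract_lexicon(pairs, top_k=3, min_count=1):
    """Hypothetical PMI baseline: map each source word to its top_k target words.

    pairs: list of (source_sentence, target_sentence) strings, assumed to be
    pre-segmented with spaces between words.
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    n = 0  # number of sentence pairs
    for src, tgt in pairs:
        s_words = set(src.split())  # count each word at most once per sentence
        t_words = set(tgt.split())
        n += 1
        src_count.update(s_words)
        tgt_count.update(t_words)
        for x in s_words:
            for y in t_words:
                joint[(x, y)] += 1  # sentence-level co-occurrence

    candidates = defaultdict(list)
    for (x, y), c in joint.items():
        if c < min_count:
            continue  # drop rare pairs; PMI overrates them
        pmi = math.log(c * n / (src_count[x] * tgt_count[y]))
        candidates[x].append((pmi, y))

    lexicon = {}
    for x, scored in candidates.items():
        scored.sort(key=lambda t: (-t[0], t[1]))  # best PMI first, ties by word
        lexicon[x] = [y for _, y in scored[:top_k]]
    return lexicon
```

With toy placeholder tokens, `extract_lexicon([("a b", "A B"), ("a c", "A C"), ("b c", "B C")], top_k=1)` maps `a` to `A`, `b` to `B` and `c` to `C`. On real corpora, raising `min_count` above 1 limits the well-known bias of PMI toward rare words.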

We construct language pairs between Myanmar (Burmese) and ethnic languages for a bilingual (cross-lingual) lexicon covering 12 Myanmar (Burmese) and ethnic languages. To obtain this dataset, we use a count-based bilingual lexicon extraction model based on the observation that not only source and target words but also source words themselves can be highly correlated [1]. Measured by out-of-vocabulary (OOV) rate, word-to-word translation between Myanmar (Burmese) and ethnic languages reaches an acceptable level for all language pairs. We also conducted human evaluation for some ethnic language pairs, and some of those pairs reached a satisfactory level.
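The OOV-based evaluation can be pictured as measuring how many words of a held-out test set have no entry in the extracted lexicon. The exact definition used in the paper is not given here, so the sketch below assumes the rate is computed over space-separated test tokens, or optionally over distinct word types; `oov_rate` and its `by_type` flag are hypothetical names.

```python
from typing import Iterable, Mapping


def oov_rate(test_sentences: Iterable[str], lexicon: Mapping[str, object],
             by_type: bool = False) -> float:
    """Hypothetical helper: fraction of test words with no lexicon entry.

    Assumes pre-segmented, space-separated sentences. With by_type=True each
    distinct word is counted once; otherwise every token occurrence counts.
    """
    tokens = [w for sentence in test_sentences for w in sentence.split()]
    if by_type:
        tokens = list(set(tokens))  # distinct word types only
    if not tokens:
        return 0.0  # empty test set: nothing is out of vocabulary
    missing = sum(1 for w in tokens if w not in lexicon)
    return missing / len(tokens)
```

For example, with `lexicon = {"a": ["A"], "b": ["B"]}` and test sentences `["a b c", "a d"]`, two of five tokens are unknown (token OOV rate 0.4) and two of four types are unknown (type OOV rate 0.5). A low OOV rate shows coverage only; whether the listed translations are correct still needs human evaluation.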

Author Biographies

May Myat Myat Khaing, The Language Understanding Laboratory, Myanmar.

May Myat Myat Khaing is a member of the Language Understanding Lab., Myanmar.

She is interested in natural language processing (NLP), including machine translation and speech processing, as well as data science.

Ye Kyaw Thu, National Electronics and Computer Technology Center, Pathum Thani, Thailand; Language Understanding Lab., Myanmar.

Ye Kyaw Thu is a Visiting Professor in the Language Semantic Technology Research Team (LST), Artificial Intelligence Research Unit (AINRU), National Electronics and Computer Technology Center (NECTEC), Thailand, and an Affiliate Professor at the Cambodia Academy of Digital Technology (CADT), Cambodia. He is also a founder of the Language Understanding Lab., Myanmar. His research lies in the fields of artificial intelligence (AI), natural language processing (NLP) and human-computer interaction (HCI). He actively supervises and co-supervises undergraduate, master’s and doctoral students at several universities, including Assumption University (AU), Kasetsart University (KU), King Mongkut’s Institute of Technology Ladkrabang (KMITL) and Sirindhorn International Institute of Technology (SIIT).

Thazin Myint Oo, The Language Understanding Laboratory, Myanmar.

Thazin Myint Oo received the B.C.Sc. (Hons) and M.C.Sc. degrees from the University of Computer Studies, Yangon, in 2005 and 2008, respectively. She is now a Ph.D. candidate at Assumption University, Thailand, and a lab member of Natural Language Understanding, Myanmar. Her research interests include machine translation and natural language processing.

Thet Thet Zin, University of Information Technology, Yangon, Myanmar.

Thet Thet Zin received the M.C.Sc. and Ph.D. (IT) degrees from the University of Computer Studies, Yangon (UCSY), Myanmar, in 2007 and 2012, respectively. Her current research interests are natural language processing, machine learning and computer vision.

Nang Aeindray Kyaw, TAIST-Tokyo Tech, Japan.

Nang Aeindray Kyaw is a master’s candidate in Artificial Intelligence and Internet of Things (AIoT) at TAIST-Tokyo Tech, Thailand. She holds a Bachelor of Engineering (Information Science and Technology) degree from the University of Technology (Yatanarpon Cyber City), Pyin Oo Lwin, Myanmar. She is strongly interested in natural language processing (NLP).

Hay Man Htun, Kasetsart University, Bangkok, Thailand.

Hay Man Htun is an M.E. candidate in Artificial Intelligence and Internet of Things (AIoT) at Kasetsart University (KU), Bangkok, Thailand.

She holds a Bachelor of Engineering (Information Science Technology) degree from the University of Technology (Yatanarpon Cyber City), Pyin Oo Lwin, Myanmar. She is also a member of the NLP Research Lab., UTYCC.

Her current master’s thesis research focuses on speech translation.

She is strongly interested in natural language processing (NLP), including machine translation, as well as speech processing, image processing, machine learning, and deep learning.

Thida San, University of Technology (Yatanarpon Cyber City), Pyin Oo Lwin; Myanmar Institute of Information Technology, Myanmar.

Thida San is a Ph.D. candidate at the University of Technology (Yatanarpon Cyber City), Pyin Oo Lwin, and a faculty member at the Myanmar Institute of Information Technology (MIIT), Myanmar.

Her current doctoral thesis research focuses on text-to-speech between Myanmar Braille and Myanmar written text. She is interested in natural language processing (NLP), speech processing, big data analysis, and deep learning.

Zun Hlaing Moe, Myanmar Institute of Information Technology; University of Technology (Yatanarpon Cyber City), Pyin Oo Lwin, Myanmar.

Zun Hlaing Moe is a faculty member at the Myanmar Institute of Information Technology (MIIT) and a Ph.D. candidate at the University of Technology (Yatanarpon Cyber City), Pyin Oo Lwin, Myanmar. Her current doctoral thesis research focuses on machine translation between Myanmar Braille and Myanmar written text in both directions. She is interested in natural language processing (NLP), particularly machine translation, as well as big data analysis and deep learning.

Hnin Aye Thant, Department of Information Science, University of Technology (Yatanarpon Cyber City), Pyin Oo Lwin Township, Mandalay Division, Myanmar.

Hnin Aye Thant is currently a Professor and Head of the Department of Information Science at the University of Technology (Yatanarpon Cyber City), Pyin Oo Lwin Township, Mandalay Division, Myanmar. She received her Ph.D. (IT) degree from the University of Computer Studies, Yangon, Myanmar, in 2005. Her current responsibilities include managing professional teachers, instructional design for e-learning content development, and teaching. She has 14 years of teaching experience in information technology, specializing in programming languages (C, C++, Java and Assembly), data structures, design and analysis of algorithms/parallel algorithms, database management systems, web application development, operating systems, data mining and natural language processing. She is a member of the research group on “Neural Network Machine Translation between Myanmar Sign Language to Myanmar Written Text” and of the Myanmar NLP Lab at UTYCC. She is also a Master Instructor and Coaching Expert of the USAID COMET Mekong Learning Center, in which role she has trained 190 instructors from ten Technological Universities, twelve Computer Universities and UTYCC in a Professional Development course that moves teaching from a teacher-centered to a learner-centered approach. This model aims to reduce the skills gap between universities and industries and to build students’ work-readiness skills.

References

Fung, P., “A statistical view on bilingual lexicon extraction: from parallel corpora to non-parallel corpora,” in Machine Translation and the Information Soup, 1998, pp. 1–17.

Gū, J., Shavarani, H. S., and Sarkar, A., “Pointer-based fusion of bilingual lexicons into neural machine translation,” arXiv preprint arXiv:1909.07907, 2019.

Jakubina, L. and Langlais, P., “A Comparison of Methods for Identifying the Translation of Words in a Comparable Corpus: Recipes and Limits,” Computación y Sistemas, vol. 20, no. 3, pp. 449–458, 2016.

Levy, O., Søgaard, A., and Goldberg, Y., “A strong baseline for learning cross-lingual word embeddings from sentence alignments,” in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, 2017, pp. 765–774.

Levy, O. and Goldberg, Y., “Neural word embedding as implicit matrix factorization,” in Advances in Neural Information Processing Systems, 2014, pp. 2177–2185.

Lison, P., Tiedemann, J., and Kouylekov, M., “OpenSubtitles2018: Statistical rescoring of sentence alignments in large, noisy parallel corpora,” in Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J., “Distributed representations of words and phrases and their compositionality,” in Advances in Neural Information Processing Systems, 2013, pp. 3111–3119.

Nwet, K., Soe, K., and Thein, N., “Developing Word-aligned Myanmar-English Parallel Corpus based on the IBM Models,” International Journal of Computer Applications, 2013.

Papineni, K., Roukos, S., Ward, T., and Zhu, W., “BLEU: a Method for Automatic Evaluation of Machine Translation,” IBM Research Report RC22176, 2001.

Boonkwan, P. and Supnithi, T., “Technical report for the network-based ASEAN language translation public service project,” NECTEC, 2013.

Ramesh, S. H. and Sankaranarayanan, K. P., “Neural machine translation for low resource languages using bilingual lexicon induced from comparable corpora,” in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, New Orleans, Louisiana, USA, June 2018, pp. 112–119.

Ruder, S., Vulić, I., and Søgaard, A., “A survey of cross-lingual embedding models,” arXiv preprint arXiv:1706.04902, 2017.

Somers, H., “Bilingual parallel corpora and language engineering,” in Proc. Anglo-Indian Workshop: Language Engineering for South-Asian Languages, 2001.

Gutierrez-Vasques, X. and Mijangos, V., “Low-resource bilingual lexicon extraction using graph based word embeddings,” arXiv:1706.04902 [cs.CL], 2017.

Choe, Y. J., Park, K., and Kim, D., “word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs,” arXiv:1911.01549 [cs.CL], 2019.

Conneau, A., Lample, G., Ranzato, M., Denoyer, L., and Jégou, H., “Word Translation Without Parallel Data,” in International Conference on Learning Representations (ICLR), 2018.

Hay Man Htun, Ye Kyaw Thu, Hlaing Myat Nwe, May Thu Win, and Naw Naw, “Statistical Machine Translation System Combinations on Phrase-based, Hierarchical Phrase-based and Operation Sequence Model for Burmese and Pa’O Language Pair,” Journal of Intelligent Informatics and Smart Technology, October 2021, pp. 1–9.

Nang Aeindray Kyaw, Ye Kyaw Thu, Hlaing Myat Nwe, Phyu Phyu Tar, Nandar Win Min, and Thepchai Supnithi, “A Study of Three Statistical Machine Translation Methods for Myanmar (Burmese) and Shan (Tai Long) Language Pair,” in Proceedings of the 15th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP 2020), Bangkok, Thailand, Nov 18–20, 2020, pp. 218–223.

Thazin Myint Oo, Ye Kyaw Thu, Khin Mar Soe, and Thepchai Supnithi, “Statistical Machine Translation between Myanmar and Myeik,” in Proceedings of the 12th International Conference on Future Computer and Communication (ICFCC 2020), Yangon, Myanmar, Feb 26–28, 2020, pp. 36–45.

Hnin Yi Aye, Yuzana Win, and Ye Kyaw Thu, “Statistical Machine Translation between Myanmar (Burmese) and Chin (Mizo) Language,” in Proceedings of The 23rd Oriental COCOSDA 2020, Yangon, Myanmar, Nov 5–7, 2020, pp. 211–216.

Zar Zar Linn, Ye Kyaw Thu, and Pushpa B. Patil, “Statistical Machine Translation between Myanmar (Burmese) and Kayah Languages,” Journal of Intelligent Informatics and Smart Technology, April 2020, pp. 62–68.

Thazin Myint Oo, Ye Kyaw Thu, and Khin Mar Soe, “Statistical Machine Translation between Myanmar (Burmese) and Rakhine (Arakanese),” in Proceedings of ICCA2018, Yangon, Myanmar, Feb 22–23, 2018, pp. 304–311.

Wikipedia contributors, “Arakanese language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Arakanese_language

Wikipedia contributors, “Burmese language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Burmese_language

Wikipedia contributors, “Hakha Chin language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Hakha_Chin_language

Wikipedia contributors, “Jingpho language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Jingpho_language

Wikipedia contributors, “Languages of Myanmar,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Languages_of_Myanmar

Wikipedia contributors, “Mon language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Mon_language

Wikipedia contributors, “N-gram,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/N-gram#Examples

Wikipedia contributors, “Pa’O language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Pa’O_language

Wikipedia contributors, “Rawang language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Rawang_language

Wikipedia contributors, “Red Karen language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Red_Karen_language

Wikipedia contributors, “S’gaw Karen language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/S’gaw_Karen_language

Wikipedia contributors, “Shan language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Shan_language

Wikipedia contributors, “Western Pwo language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Western_Pwo_language


Published

2026-01-25

How to Cite

Khaing MMM, Thu YK, Oo TM, Zin TT, Kyaw NA, Htun HM, San T, Moe ZH, Thant HA. Word to Word translation between Myanmar (Burmese) and other Ethnic languages. j.intell.inform. [internet]. 2026 Jan. 25 [cited 2026 Feb. 12];8(Oct). available from: https://ph05.tci-thaijo.org/index.php/JIIST/article/view/260