Word-to-Word Translation between Myanmar (Burmese) and Other Ethnic Languages
Keywords:
Bilingual lexicons, word-to-word translation, Myanmar (Burmese) and ethnic languages
Abstract
In Myanmar, NLP research and development still face many challenges. One of them is the lack of bilingual lexicons between Myanmar (Burmese) and other ethnic languages. This paper presents word-to-word translation between Myanmar (Burmese) and ethnic languages, using lexicons extracted from sentence-level parallel corpora. This is the first time that a bilingual or cross-lingual lexicon has been developed between Myanmar and other ethnic languages.
The language pairs between Myanmar (Burmese) and ethnic languages are constructed for the bilingual or cross-lingual lexicon based on 12 Myanmar (Burmese) and ethnic languages. To obtain this dataset, we use a count-based bilingual lexicon extraction model based on the observation that not only source and target words but also source words themselves can be highly correlated [1]. Judged by the out-of-vocabulary (OOV) rate, word-to-word translation between Myanmar (Burmese) and ethnic languages reaches an acceptable level for all language pairs. We also conducted a human evaluation for some ethnic language pairs, and the results for the pairs evaluated were at a satisfactory level.
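As an illustration of the count-based idea, the following Python sketch extracts a lexicon from sentence-level co-occurrence counts and computes an OOV rate. It is a simplified stand-in and not the model of [1]: it scores candidate pairs with pointwise mutual information and does not correct for correlations among source words. The function names `extract_lexicon` and `oov_rate`, the toy corpus of placeholder tokens, and the definition of the OOV rate as the fraction of test tokens with no lexicon entry are all assumptions made for this example.

```python
from collections import Counter
from math import log


def extract_lexicon(pairs, top_k=1):
    """Build a source-to-target lexicon from tokenized sentence pairs.

    pairs: list of (source_tokens, target_tokens).
    Scoring is plain PMI over sentence-level co-occurrence (an assumption
    for this sketch, not the scoring function used in the paper).
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    n = len(pairs)
    for src, tgt in pairs:
        s_set, t_set = set(src), set(tgt)  # count each word once per sentence
        src_count.update(s_set)
        tgt_count.update(t_set)
        joint.update((s, t) for s in s_set for t in t_set)

    candidates = {}
    for (s, t), c in joint.items():
        pmi = log(c * n / (src_count[s] * tgt_count[t]))
        candidates.setdefault(s, []).append((pmi, c, t))

    lexicon = {}
    for s, cands in candidates.items():
        # Highest PMI first; ties broken by raw count, then alphabetically.
        ranked = sorted(cands, key=lambda x: (-x[0], -x[1], x[2]))
        lexicon[s] = [t for _, _, t in ranked[:top_k]]
    return lexicon


def oov_rate(tokens, lexicon):
    """Fraction of tokens that have no entry in the lexicon (assumed definition)."""
    if not tokens:
        return 0.0
    missing = sum(1 for tok in tokens if tok not in lexicon)
    return missing / len(tokens)


# Toy parallel corpus with placeholder tokens (hypothetical data).
toy_pairs = [
    (["a", "b"], ["x", "y"]),
    (["a", "c"], ["x", "z"]),
    (["b", "c"], ["y", "z"]),
]
toy_lexicon = extract_lexicon(toy_pairs)
toy_oov = oov_rate(["a", "b", "d", "e"], toy_lexicon)
```

In the toy corpus each source token co-occurs most consistently with exactly one target token, so the top-ranked candidate is unambiguous. On real parallel data, tokenization and a minimum-count threshold would matter a great deal, and both are left out here.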
References
Fung, P., “A statistical view on bilingual lexicon extraction: from parallel corpora to non-parallel corpora,” in Machine Translation and the Information Soup, 1998, pp. 1–17.
Gū, J., Shavarani, H. S., and Sarkar, A., “Pointer-based fusion of bilingual lexicons into neural machine translation,” arXiv preprint arXiv:1909.07907, 2019.
Jakubina, L. and Langlais, P., “A Comparison of Methods for Identifying the Translation of Words in a Comparable Corpus: Recipes and Limits,” Comp. y Sist., vol. 20, no. 3, pp. 449–458, 2016.
Levy, O., Søgaard, A., and Goldberg, Y., “A strong baseline for learning cross-lingual word embeddings from sentence alignments,” in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, 2017, pp. 765–774.
Levy, O. and Goldberg, Y., “Neural word embedding as implicit matrix factorization,” in Advances in Neural Information Processing Systems, 2014, pp. 2177–2185.
Lison, P., Tiedemann, J., and Kouylekov, M., “OpenSubtitles2018: Statistical rescoring of sentence alignments in large, noisy parallel corpora,” in Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J., “Distributed representations of words and phrases and their compositionality,” in Advances in Neural Information Processing Systems, 2013, pp. 3111–3119.
Nwet, K., Soe, K., and Thein, N., “Developing Word-aligned Myanmar-English Parallel Corpus based on the IBM Models,” International Journal of Computer Applications, 2013.
Papineni, K., Roukos, S., Ward, T., and Zhu, W., “BLEU: a Method for Automatic Evaluation of Machine Translation,” IBM Research Report RC22176, 2001.
Prachya, B. and Thepchai, S., “Technical report for the network-based ASEAN language translation public service project,” NECTEC, 2013.
Ramesh, S. H. and Sankaranarayanan, K. P., “Neural machine translation for low resource languages using bilingual lexicon induced from comparable corpora,” in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, New Orleans, Louisiana, USA, June 2018, pp. 112–119.
Ruder, S., Vulić, I., and Søgaard, A., “A survey of cross-lingual embedding models,” arXiv preprint arXiv:1706.04902, 2017.
Somers, H., “Bilingual parallel corpora and language engineering,” in Proc. Anglo-Indian Workshop: Language Engineering for South-Asian Languages, 2001.
Ximena Gutierrez-Vasques and Victor Mijangos, “Low-resource bilingual lexicon extraction using graph based word embeddings,” arXiv:1710.02569 [cs.CL], 2017.
Yo Joong Choe, Kyubyong Park, and Dongwoo Kim, “word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs,” arXiv:1911.01549 [cs.CL], 2019.
Alexis Conneau, Guillaume Lample, Marc’Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou, “Word Translation Without Parallel Data,” ICLR 2018.
Hay Man Htun, Ye Kyaw Thu, Hlaing Myat Nwe, May Thu Win, and Naw Naw, “Statistical Machine Translation System Combinations on Phrase-based, Hierarchical Phrase-based and Operation Sequence Model for Burmese and Pa’O Language Pair,” Journal of Intelligent Informatics and Smart Technology, October 2021, pp. 1–9.
Nang Aeindray Kyaw, Ye Kyaw Thu, Hlaing Myat Nwe, Phyu Phyu Tar, Nandar Win Min, and Thepchai Supnithi, “A Study of Three Statistical Machine Translation Methods for Myanmar (Burmese) and Shan (Tai Long) Language Pair,” in Proceedings of the 15th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP 2020), Bangkok, Thailand, Nov 18–20, 2020, pp. 218–223.
Thazin Myint Oo, Ye Kyaw Thu, Khin Mar Soe, and Thepchai Supnithi, “Statistical Machine Translation between Myanmar and Myeik,” in Proceedings of the 12th International Conference on Future Computer and Communication (ICFCC 2020), Yangon, Myanmar, Feb 26–28, 2020, pp. 36–45.
Hnin Yi Aye, Yuzana Win, and Ye Kyaw Thu, “Statistical Machine Translation between Myanmar (Burmese) and Chin (Mizo) Language,” in Proceedings of The 23rd Oriental COCOSDA 2020, Yangon, Myanmar, Nov 5–7, 2020, pp. 211–216.
Zar Zar Linn, Ye Kyaw Thu, and Pushpa B. Patil, “Statistical Machine Translation between Myanmar (Burmese) and Kayah Languages,” Journal of Intelligent Informatics and Smart Technology, April 2020, pp. 62–68.
Thazin Myint Oo, Ye Kyaw Thu, and Khin Mar Soe, “Statistical Machine Translation between Myanmar (Burmese) and Rakhine (Arakanese),” in Proceedings of ICCA2018, Yangon, Myanmar, Feb 22–23, 2018, pp. 304–311.
Wikipedia contributors, “Arakanese language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Arakanese_language
Wikipedia contributors, “Burmese language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Burmese_language
Wikipedia contributors, “Hakha Chin language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Hakha_Chin_language
Wikipedia contributors, “Jingpho language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Jingpho_language
Wikipedia contributors, “Languages of Myanmar,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Languages_of_Myanmar
Wikipedia contributors, “Mon language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Mon_language
Wikipedia contributors, “N-gram,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/N-gram#Examples
Wikipedia contributors, “Pa’O language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Pa’O_language
Wikipedia contributors, “Rawang language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Rawang_language
Wikipedia contributors, “Red Karen language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Red_Karen_language
Wikipedia contributors, “S’gaw Karen language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/S’gaw_Karen_language
Wikipedia contributors, “Shan language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Shan_language
Wikipedia contributors, “Western Pwo language,” Wikipedia, The Free Encyclopedia. [Online]. Available: https://en.wikipedia.org/wiki/Western_Pwo_language
