Statistical Machine Translation between Myanmar (Burmese) and Kayah
Keywords:
Statistical Machine Translation, Phrase-based, Hierarchical phrase-based, Operation Sequence Model, Myanmar-Kayah, under-resourced
Abstract
This paper contributes the first evaluation of the quality of Statistical Machine Translation (SMT) between the Myanmar (Burmese) and Kayah (Kayah Li) languages. We also developed a Myanmar-Kayah parallel corpus of 6,590 sentences, built from the Myanmar-language side of the ASEAN MT corpus. The experiments were carried out using three statistical machine translation approaches: Phrase-based Statistical Machine Translation (PBSMT), Hierarchical Phrase-based Statistical Machine Translation (HPBSMT), and the Operation Sequence Model (OSM). The results show that the HPBSMT approach achieves the highest BLEU score for Myanmar-to-Kayah translation, while the OSM approach achieves the highest BLEU score for Kayah-to-Myanmar translation.
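Since the three approaches are compared by BLEU score, a minimal sketch of how a corpus-level BLEU score can be computed may help make the metric concrete. This is an illustration only, not the evaluation script used in the experiments. It assumes a single reference per sentence, input that has already been segmented into tokens, n-grams up to length 4, and no smoothing. The names `ngrams` and `corpus_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per hypothesis (assumption).

    hypotheses, references: parallel lists of token lists.
    Returns a score in [0, 1]. No smoothing is applied, so the score is 0
    whenever some n-gram order has no match at all.
    """
    matched = [0] * max_n  # clipped n-gram matches, per order
    total = [0] * max_n    # n-grams in the hypotheses, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matched) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions (uniform weights).
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty: penalize hypotheses that are shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

Because Myanmar script does not mark word boundaries with spaces, the score depends on the segmentation applied before calling `corpus_bleu`; the hypotheses and references should be segmented the same way.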