Statistical Machine Translation of Myanmar Dialects


  • Thazin Myint Oo
  • Thepchai Supnithi


Statistical Machine Translation, Parallel Corpus Development, Myanmar (Burmese), Rakhine (Arakanese), Dawei (Tavoyan), Myeik (Beik)


This work contributes the first evaluation of machine translation quality between Standard Myanmar and other Myanmar dialects. The main challenge for machine translation of Myanmar dialects is the lack of data resources. To address this, we developed three Myanmar dialect parallel corpora based on the Myanmar portion of the ASEAN MT corpus: Myanmar-Rakhine (18K), Myanmar-Myeik (10K) and Myanmar-Dawei (9K). We carried out 10-fold cross-validation experiments with three statistical machine translation approaches: phrase-based, hierarchical phrase-based, and the operation sequence model (OSM). In addition, we studied two segmentation units: word and syllable. The results show that all three approaches give high and comparable BLEU and RIBES scores between Myanmar and the three dialects (Rakhine, Dawei and Myeik) in both directions. The OSM approach achieved the highest BLEU and RIBES scores of the three approaches for both word and syllable segmentation. Moreover, we found that syllable segmentation is better suited to translation quality than word-level segmentation.
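The 10-fold cross-validation setup mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `k_fold_splits`, the fixed shuffle seed, and the toy sentence pairs are all assumptions made for illustration. It only shows how a parallel corpus might be partitioned so that each sentence pair is held out for testing exactly once.

```python
import random


def k_fold_splits(pairs, k=10, seed=0):
    """Yield (train, test) lists for k-fold cross-validation.

    `pairs` is a list of (source, target) sentence pairs.
    The name, the seed, and the striding scheme are illustrative assumptions.
    """
    indices = list(range(len(pairs)))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k nearly equal folds
    for held_out in range(k):
        test_ids = set(folds[held_out])
        train = [pairs[i] for i in indices if i not in test_ids]
        test = [pairs[i] for i in folds[held_out]]
        yield train, test


# Toy stand-in for a Myanmar-dialect parallel corpus (hypothetical data).
corpus = [(f"src sentence {i}", f"tgt sentence {i}") for i in range(25)]
splits = list(k_fold_splits(corpus, k=10))
```

In a setup of this kind, a translation system would be trained on each `train` portion and scored on the matching `test` portion, with the metric (for example BLEU or RIBES) averaged over the ten folds.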


