Thai Text-to-Speech Synthesis: A Review

Authors

  • Chai Wutiwiwatchai, National Electronics and Computer Technology Center, National Science and Technology Development Agency, Thailand
  • Chatchawarn Hansakunbuntheung, National Electronics and Computer Technology Center, National Science and Technology Development Agency, Thailand
  • Anocha Rugchatjaroen, National Electronics and Computer Technology Center, National Science and Technology Development Agency, Thailand
  • Sittipong Saychum, National Electronics and Computer Technology Center, National Science and Technology Development Agency, Thailand
  • Sawit Kasuriya, National Electronics and Computer Technology Center, National Science and Technology Development Agency, Thailand
  • Patcharika Chootrakool, National Electronics and Computer Technology Center, National Science and Technology Development Agency, Thailand

Keywords

Thai text-to-speech synthesis, Thai text processing

Abstract

Thai text-to-speech synthesis (TTS) has been researched and developed for decades. Several systems have been released commercially or to the public, while global TTS technology continues to advance with novel algorithms such as deep neural networks (DNNs). Because the Thai language has special characteristics that complicate computer processing, much research has focused on Thai text analysis for TTS pre-processing. Automatic word segmentation, phrase boundary detection, and F0 prediction for tonal Thai are among the issues that challenge Thai TTS development. This article reviews research related to Thai TTS, focusing on the last decade (2007-2016). Although research in this area has been continuous, challenging problems remain unsolved and require further work. Finally, the article discusses existing issues that call for extensive future research.



Published

2024-02-12

How to Cite

Wutiwiwatchai C, Hansakunbuntheung C, Rugchatjaroen A, Saychum S, Kasuriya S, Chootrakool P. Thai Text-to-Speech Synthesis: A Review. j.intell.inform. [Internet]. 2024 Feb. 12 [cited 2024 Dec. 23];2(October). Available from: https://ph05.tci-thaijo.org/index.php/JIIST/article/view/119