Towards Electronic Version of the Royin Thai Dictionary from Information-Heavily Semi-structured Data Source
Keywords:
Dictionary Development, Electronic Dictionary, Information Extraction, Semi-structured Source
Abstract
To make knowledge of Thai words more widely available, it has been decided to digitise the Royin dictionary. This work describes the processes for extracting information from the printed version of the dictionary. Because the source is in a semi-structured format, an automatic type-detection method is used to extract the respective details into a database, with the patterns and formatting of the source serving as hints for extraction. Ambiguities encountered during extraction, and their solutions, are also discussed. As a result, lexical entries are stored systematically with distinguishable details, and entries are connected to one another by interoperable relations. In the evaluation, the automatic extraction processes handled more than 80% of entries overall, and the remaining ambiguous entries were sent to experts for decision-making.
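The general idea of pattern-based type detection with an expert-review fallback can be illustrated with a minimal sketch. It is not the method used in this work: the entry layout (headword, optional bracketed pronunciation, part-of-speech abbreviation, definition), the abbreviations `n`, `v`, `adj`, and the names `ENTRY_PATTERN`, `extract_entry` and `process` are all assumptions made for illustration only.

```python
import re
from typing import Optional

# Hypothetical entry layout (an assumption, not the Royin format):
#   headword [pronunciation] pos. definition
# The bracketed pronunciation is optional.
ENTRY_PATTERN = re.compile(
    r"^(?P<headword>\S+)\s+"
    r"(?:\[(?P<pronunciation>[^\]]+)\]\s+)?"
    r"(?P<pos>n|v|adj)\.\s+"
    r"(?P<definition>.+)$"
)


def extract_entry(line: str) -> Optional[dict]:
    """Return the detected fields of one entry, or None if the line
    does not fit the assumed pattern."""
    match = ENTRY_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.groupdict()


def process(lines):
    """Split lines into automatically extracted entries and lines
    that would be passed to a human expert for a decision."""
    extracted, needs_review = [], []
    for line in lines:
        entry = extract_entry(line)
        if entry is None:
            needs_review.append(line)
        else:
            extracted.append(entry)
    return extracted, needs_review


if __name__ == "__main__":
    sample = ["kin [kin] v. to eat", "baan n. house", "???"]
    entries, review = process(sample)
    print(entries)
    print(review)
```

In this sketch, any line that does not match the assumed pattern is set aside rather than guessed at, mirroring the split between automatically handled entries and those sent to experts.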