Extracting Co-occurrences of Emojis and Words as Important Features for Human Trafficking Detection Models

Authors

  • Chawit Wiriyakun, Faculty of Engineering and Technology, Mahanakorn University of Technology, Bangkok, 10530, Thailand.
  • Werasak Kurutach, Faculty of Engineering and Technology, Mahanakorn University of Technology, Bangkok, 10530, Thailand.

Keywords:

Human Trafficking, Emoji, Machine Learning

Abstract

Human trafficking is an illegal activity and a major humanitarian problem that governments of most countries are trying to prevent. Recently, traffickers have been using social media to promote and advertise their business, especially prostitution. Emoji, together with special words whose meaning is recognized only within their community, are used to convey messages in these advertisements. This makes it harder for law enforcement officers to track and prevent the activities. In this paper, we propose a feature selection approach that focuses on the co-occurrence of emoji and important words for training machine learning (ML) models to detect human trafficking advertisements on social media. In our experiments, we employed three ML models to compare our work against the baseline models of E. Tong et al. using the Trafficking-10k data set. The results show that our method significantly outperforms the baselines in terms of F1-score.
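The idea of emoji and word co-occurrence features can be illustrated with a minimal sketch. This is not the authors' implementation: the code-point ranges used to recognize emoji, the tokenization, the function names, and the sample advertisements are all assumptions made for illustration. It only shows how (emoji, word) pairs that appear together in the same advertisement might be counted across a corpus.

```python
from collections import Counter
from itertools import product

# Assumption: a few code-point ranges stand in for the full Unicode emoji list.
# Variation selectors and multi-code-point emoji sequences are not handled.
EMOJI_RANGES = (
    (0x1F300, 0x1FAFF),  # pictographs, emoticons, transport, supplemental symbols
    (0x2600, 0x27BF),    # miscellaneous symbols and dingbats
)


def is_emoji(ch):
    """Return True if the character falls in one of the assumed emoji ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def split_emojis_and_words(text):
    """Separate a text into its emoji characters and its lowercased words."""
    emojis = [ch for ch in text if is_emoji(ch)]
    # Replace emoji with spaces so words glued to an emoji still tokenize.
    cleaned = "".join(" " if is_emoji(ch) else ch for ch in text)
    words = [w.strip(".,!?").lower() for w in cleaned.split()]
    return emojis, [w for w in words if w]


def cooccurrence_features(text):
    """Mark each distinct (emoji, word) pair present in one advertisement."""
    emojis, words = split_emojis_and_words(text)
    return Counter(product(set(emojis), set(words)))


def corpus_cooccurrence(ads):
    """Count in how many advertisements each (emoji, word) pair co-occurs."""
    total = Counter()
    for ad in ads:
        total.update(cooccurrence_features(ad))
    return total


# Hypothetical sample texts, not taken from any data set.
ads = ["new in town 🌹 sweet", "sweet 🌹 available now"]
pair_counts = corpus_cooccurrence(ads)
```

Pairs with high counts in `pair_counts` could then be considered as candidate features for a classifier. How the paper ranks or selects such pairs is not stated in the abstract, so that step is left out here.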

Author Biographies

Chawit Wiriyakun, Faculty of Engineering and Technology, Mahanakorn University of Technology, Bangkok, 10530, Thailand.

Chawit Wiriyakun received the B.Eng. (Information and Communication Engineering) degree from Mahanakorn University of Technology in 2011. He received the M.S. (Information Technology) degree from Kasetsart University in 2014. He is currently an engineer at the National Science and Technology Development Agency (NSTDA) in Thailand. He is also a Ph.D. student at the Faculty of Engineering and Technology, Mahanakorn University of Technology.

Werasak Kurutach, Faculty of Engineering and Technology, Mahanakorn University of Technology, Bangkok, 10530, Thailand.

Werasak Kurutach received his Ph.D. in Computer Science and Engineering from The University of New South Wales, Australia, in 1996. He is currently an associate professor in computer engineering at Mahanakorn University of Technology, Bangkok, Thailand.

References

E. Tong, A. Zadeh, C. Jones, and L.-P. Morency, “Combating Human Trafficking with Multimodal Deep Models,” in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017. pp. 1547-1556.

J. Zhu, L. Li, and C. Jones, “Identification and Detection of Human Trafficking Using Language Models,” in European Intelligence and Security Informatics Conference (EISIC), 2019. pp. 24-31.

M.C. Lee, C. Vajiac, A. Kulshrestha, S. Levy, N. Park, C. Jones, R. Rabbany, and C. Faloutsos, “InfoShield: Generalizable Information-Theoretic Human-Trafficking Detection,” in 37th IEEE International Conference on Data Engineering (ICDE), 2021. pp. 1116-1127.

Q. Bai, Q. Dan, Z. Mu, and M. Yang, “A Systematic Review of Emoji: Current Research and Future Perspectives,” in Frontiers in Psychology (10), 2019.

R. McAlister, “Webscraping as an Investigation Tool to Identify Potential Human Trafficking Operations in Romania,” in Proceedings of the ACM Web Science Conference (WebSci '15), 2015. pp. 1–2.

S. Roshan, S. V. Kumar, and M. Kumar, “Project spear: Reporting human trafficking using crowdsourcing,” in 4th IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics (UPCON), 2017. pp. 295-299.

M. Hultgren, M. E. Jennex, J. Persano, and C. Ornatowski, “Using Knowledge Management to Assist in Identifying Human Sex Trafficking,” in 49th Hawaii International Conference on System Sciences (HICSS), 2016. pp. 4344-4353.

R. Kapoor, M. Kejriwal, and P. Szekely, “Using Contexts and Constraints for Improved Geotagging of Human Trafficking Webpages,” in Proceedings of the Fourth International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data, Article 3. 2017.

A. Mensikova and C.A. Mattmann, “Ensemble Sentiment Analysis to Identify Human Trafficking in Web Data,” in Proceedings of ACM Workshop on Graph Techniques for Adversarial Activity Analytics, 2018. 6 pages.

M. Hernández-Álvarez, “Detection of Possible Human Trafficking in Twitter,” in International Conference on Information Systems and Software Technologies (ICI2ST), 2019. pp. 187-191.

M. Diaz and A. Panangadan, “Natural Language-based Integration of Online Review Data sets for Identification of Sex Trafficking Businesses,” in IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI), 2020. pp. 259-264.

Full Emoji List, https://unicode.org/emoji/charts/full-emoji-list.html, last accessed 2021/08/04.

Published

2022-04-07

How to Cite

1. Wiriyakun C, Kurutach W. Extracting Co-occurrences of Emojis and Words as Important Features for Human Trafficking Detection Models. j.intell.inform. [internet]. 2022 Apr. 7 [cited 2025 Aug. 9];7(April):12. Available from: https://ph05.tci-thaijo.org/index.php/JIIST/article/view/153