Extracting Co-occurrences of Emojis and Words as Important Features for Human Trafficking Detection Models
Keywords:
Human Trafficking, Emoji, Machine Learning
Abstract
Human trafficking is an illegal activity and a major humanitarian problem that the governments of most countries are trying to prevent. Recently, traffickers have been using social media on the Internet to promote and advertise their business, especially prostitution. In these advertisements they use emoji, along with special words whose meaning is recognized only within their community, to convey messages. This makes it harder for law enforcement officers to track and prevent these activities. In this paper, we propose a feature selection approach that focuses on the co-occurrence of emoji and important words for training machine learning (ML) models to detect human trafficking advertisements on social media. In our experiments, we employed three ML models and compared them against the baseline models of E. Tong et al. on the Trafficking-10k data set. The results show that our method significantly outperforms the baselines in terms of F1-score.
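The co-occurrence idea described in the abstract can be illustrated with a minimal sketch. It assumes binary features of the form "emoji|word", set to 1 when an emoji and a word from a given list of important words appear in the same advertisement. This is not the authors' procedure: how the paper selects important words and weights pairs is not specified here, the emoji regular expression only approximates a few Unicode emoji blocks rather than the full Unicode emoji list, and the names `EMOJI_PATTERN`, `WORD_PATTERN`, `cooccurrence_pairs`, `build_features`, and the sample texts and word list are all hypothetical.

```python
import re
from itertools import product

# Rough emoji matcher (assumption): covers the main pictograph blocks
# U+1F300-U+1FAFF and the miscellaneous symbols/dingbats U+2600-U+27BF.
# It only approximates the full Unicode emoji list.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# Simple lowercase word tokenizer (assumption).
WORD_PATTERN = re.compile(r"[a-z0-9']+")


def cooccurrence_pairs(text, important_words):
    """Return the set of 'emoji|word' pairs that appear together in one text."""
    emojis = set(EMOJI_PATTERN.findall(text))
    words = {w for w in WORD_PATTERN.findall(text.lower()) if w in important_words}
    # Every emoji in the text is paired with every important word in the text.
    return {f"{e}|{w}" for e, w in product(emojis, words)}


def build_features(texts, important_words):
    """Build a shared pair vocabulary and one binary feature row per text."""
    pair_sets = [cooccurrence_pairs(t, important_words) for t in texts]
    vocab = sorted(set().union(*pair_sets))
    index = {pair: i for i, pair in enumerate(vocab)}
    rows = []
    for pairs in pair_sets:
        row = [0] * len(vocab)
        for pair in pairs:
            row[index[pair]] = 1  # 1 = this emoji and word co-occur in the text
        rows.append(row)
    return vocab, rows


if __name__ == "__main__":
    # Hypothetical example texts and important-word list.
    ads = ["🌹 New in town 🌹", "Available tonight 🍒", "Call me"]
    vocab, rows = build_features(ads, {"new", "town", "available"})
    print(vocab)  # ['🌹|new', '🌹|town', '🍒|available']
    print(rows)   # [[1, 1, 0], [0, 0, 1], [0, 0, 0]]
```

The resulting rows are plain 0/1 vectors, so they could be passed to any standard classifier. Presence features were chosen here only for simplicity; pair counts or TF-IDF weights would be equally plausible variants.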
References
E. Tong, A. Zadeh, C. Jones, and L.-P. Morency, “Combating Human Trafficking with Multimodal Deep Models,” in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017, pp. 1547-1556.
J. Zhu, L. Li, and C. Jones, “Identification and Detection of Human Trafficking Using Language Models,” in European Intelligence and Security Informatics Conference (EISIC), 2019, pp. 24-31.
M. C. Lee, C. Vajiac, A. Kulshrestha, S. Levy, N. Park, C. Jones, R. Rabbany, and C. Faloutsos, “InfoShield: Generalizable Information-Theoretic Human-Trafficking Detection,” in 37th IEEE International Conference on Data Engineering (ICDE), 2021, pp. 1116-1127.
Q. Bai, Q. Dan, Z. Mu, and M. Yang, “A Systematic Review of Emoji: Current Research and Future Perspectives,” Frontiers in Psychology, vol. 10, 2019.
R. McAlister, “Webscraping as an Investigation Tool to Identify Potential Human Trafficking Operations in Romania,” in Proceedings of the ACM Web Science Conference (WebSci '15), 2015, pp. 1-2.
S. Roshan, S. V. Kumar, and M. Kumar, “Project spear: Reporting human trafficking using crowdsourcing,” in 4th IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics (UPCON), 2017, pp. 295-299.
M. Hultgren, M. E. Jennex, J. Persano, and C. Ornatowski, “Using Knowledge Management to Assist in Identifying Human Sex Trafficking,” in 49th Hawaii International Conference on System Sciences (HICSS), 2016, pp. 4344-4353.
R. Kapoor, M. Kejriwal, and P. Szekely, “Using Contexts and Constraints for Improved Geotagging of Human Trafficking Webpages,” in Proceedings of the Fourth International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data, 2017, Article 3.
A. Mensikova and C. A. Mattmann, “Ensemble Sentiment Analysis to Identify Human Trafficking in Web Data,” in Proceedings of the ACM Workshop on Graph Techniques for Adversarial Activity Analytics, 2018, 6 pages.
M. Hernández-Álvarez, “Detection of Possible Human Trafficking in Twitter,” in International Conference on Information Systems and Software Technologies (ICI2ST), 2019, pp. 187-191.
M. Diaz and A. Panangadan, “Natural Language-based Integration of Online Review Data Sets for Identification of Sex Trafficking Businesses,” in IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI), 2020, pp. 259-264.
“Full Emoji List,” https://unicode.org/emoji/charts/full-emoji-list.html, last accessed 2021/08/04.