Implementation of Machine Learning for Spam Detection and Topic Modeling for Emails in Bahasa Indonesia

Korean Artificial Intelligence Society (한국인공지능학회)
Korean Journal of Artificial Intelligence (인공지능연구)
Vol. 12, No. 4
December 2024

pp. 9-19 (11 pages)
DOI : 10.24225/kjai.2024.12.4.9

Indonesia ranks fifth among countries of origin for spammers. Spam therefore needs urgent attention, especially spam written in Bahasa Indonesia (the Indonesian language), and one way to tackle it is to find the best-performing spam detection model. This study aims to compare machine learning models for spam detection, to model the topics of spam emails, and to design an implementation as a REST API. Spam detection is performed with machine learning algorithms, namely Long Short-Term Memory (LSTM), K-Nearest Neighbours (KNN), Naive Bayes, Random Forest, AdaBoost, and Support Vector Machine (SVM), combined with two preprocessing steps: slang conversion and translation. Latent Dirichlet Allocation (LDA) is then used for topic modeling of the spam emails. The results show that the slang-conversion and translation steps can improve both accuracy and F1-score, and that LSTM outperforms the other methods, with an accuracy of 93.15% and an F1-score of 93.01%. In addition, the emails categorized as spam fall into five main topics: promotions, job vacancies, educational offers, bulletins and news, and investment and finance. A REST API was successfully developed that separates spam into categories based on promotional and other topics.
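The abstract names the pipeline's ingredients without detailing them. As a minimal, dependency-free sketch, assuming a tiny hypothetical slang dictionary and a handful of made-up Indonesian emails (neither taken from the study), the code below applies slang conversion, trains a multinomial Naive Bayes classifier (one of the baselines compared against LSTM), and reports accuracy and F1-score. The translation step is omitted, and `SLANG`, `normalize`, `MultinomialNB`, and `accuracy_f1` are illustrative names, not the authors' code.

```python
import math
import re
from collections import Counter

# Hypothetical slang-to-standard dictionary (illustrative only; the study's
# actual slang lexicon is not described in the abstract).
SLANG = {"yg": "yang", "utk": "untuk", "dgn": "dengan", "gak": "tidak", "tdk": "tidak"}


def normalize(text):
    """Lowercase, split into letter/digit tokens, and replace slang tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [SLANG.get(tok, tok) for tok in tokens]


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(normalize(doc))
        # Vocabulary = every token seen in any class.
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, doc):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[c] / n_docs)
            denom = self.totals[c] + self.alpha * v
            for tok in normalize(doc):
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    def predict(self, docs):
        return [self.predict_one(doc) for doc in docs]


def accuracy_f1(y_true, y_pred, positive="spam"):
    """Return (accuracy, F1-score of the positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy, made-up emails (not from the study's dataset).
TRAIN_DOCS = [
    "promo gratis diskon besar utk anda klik sekarang",
    "lowongan kerja gaji tinggi tanpa pengalaman daftar sekarang",
    "investasi untung besar dijamin transfer sekarang",
    "rapat tim besok jam sembilan di ruang meeting",
    "tolong kirim laporan keuangan yg sudah direvisi",
    "jadwal kuliah minggu depan dipindah ke hari senin",
]
TRAIN_LABELS = ["spam", "spam", "spam", "ham", "ham", "ham"]
TEST_DOCS = ["gratis diskon utk anda klik sekarang", "tolong kirim jadwal rapat besok"]
TEST_LABELS = ["spam", "ham"]

if __name__ == "__main__":
    model = MultinomialNB().fit(TRAIN_DOCS, TRAIN_LABELS)
    print(accuracy_f1(TEST_LABELS, model.predict(TEST_DOCS)))  # (1.0, 1.0)
```

Laplace smoothing (`alpha=1.0`) keeps a word that never appeared in one class from zeroing out that class's score, and summing logarithms avoids numeric underflow on long emails. The study's reported F1-score may be averaged differently (for example, macro-averaged across classes); this sketch scores only the `spam` class.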

1. Introduction

2. Related Works

3. Methodology

4. Result and Analysis

5. Conclusions

References
