Implementation of the Random Forest Algorithm for Classifying Public Sentiment Towards the Free Nutritious Meal Program
Abstract:
The free nutritious meal program in Indonesia has drawn public attention and reactions on social media, especially on X. This study analyzes public sentiment towards the program using the Random Forest algorithm. Tweets were collected from X and labeled as positive (2,371 tweets) or negative (432 tweets) using the InSet Lexicon. The optimal Random Forest model was selected through hyperparameter tuning with GridSearchCV. The best performance, an accuracy of 87.54% and an AUC of 0.8723, was obtained with max_features = , n_estimators = 100, max_depth = 40, min_samples_split = 2, and min_samples_leaf = 2. These results indicate that Random Forest is effective for classifying public opinion on X about the program. The word cloud visualization shows that “jepang” (Indonesian for “Japan”) is the most frequent word in positively labeled tweets, while “program” dominates negatively labeled tweets. The findings can inform the government's evaluation of the program.
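The labeling step can be illustrated with a minimal Python sketch of lexicon-based scoring in the spirit of the InSet Lexicon, where each word carries an integer weight and a tweet's polarity follows from the sum of its word weights. The lexicon entries, the whitespace tokenizer, and the rule that zero-score tweets are discarded as neutral are assumptions for illustration, not the study's exact preprocessing. The same tokens are then counted per class, which is the frequency data a word cloud visualizes.

```python
from collections import Counter

# Hypothetical miniature lexicon in the style of InSet: each word carries an
# integer weight from -5 (very negative) to +5 (very positive). The real
# InSet Lexicon is much larger; these entries are for illustration only.
LEXICON = {
    "bagus": 3,
    "bergizi": 3,
    "gratis": 2,
    "sehat": 4,
    "mahal": -3,
    "boros": -3,
    "gagal": -4,
}


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    tokens = (tok.strip(".,!?\"'") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def lexicon_score(tokens):
    """Sum the lexicon weights of the tokens; unknown words contribute 0."""
    return sum(LEXICON.get(tok, 0) for tok in tokens)


def label_tweet(text):
    """Assumed rule: score > 0 -> positive, score < 0 -> negative,
    score == 0 -> None (treated as neutral and dropped)."""
    score = lexicon_score(tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None


def top_words(tweets, labels, target, n=5, stopwords=frozenset()):
    """Count word frequencies within one sentiment class (word-cloud input)."""
    counts = Counter()
    for text, label in zip(tweets, labels):
        if label == target:
            counts.update(tok for tok in tokenize(text) if tok not in stopwords)
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical example tweets, not data from the study.
    tweets = [
        "Makan bergizi gratis itu bagus!",
        "Program ini mahal dan boros",
        "Anak anak jadi sehat",
        "Program gagal, anggaran boros",
        "Hari ini hujan",
    ]
    labels = [label_tweet(t) for t in tweets]
    print(labels)
    print(top_words(tweets, labels, "negative", n=3))
```

In a real pipeline, a stopword list would normally be passed to top_words so that function words do not dominate the word cloud.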
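GridSearchCV evaluates every combination in a user-supplied parameter grid by cross-validation and keeps the combination with the best mean validation score. This plain-Python sketch reproduces that mechanism under stated assumptions: evaluate is a hypothetical callback that, in a real pipeline, would fit a Random Forest on the training indices and return its score on the validation indices, and the folds are contiguous and unshuffled. scikit-learn's GridSearchCV with an integer cv and a classifier uses stratified folds instead, which matters for a class ratio as skewed as 2,371 to 432.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds (no shuffling).

    Yields (train_indices, validation_indices) pairs, one per fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def grid_search(param_grid, evaluate, n_samples, k=5):
    """Exhaustive search: score every parameter combination by mean k-fold score.

    param_grid: dict mapping parameter name -> list of candidate values.
    evaluate: hypothetical callable(params, train_idx, val_idx) -> score.
    Returns (best_params, best_mean_score).
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    # product() enumerates the full Cartesian grid, as GridSearchCV does.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = mean(evaluate(params, tr, va) for tr, va in k_fold_indices(n_samples, k))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The cost grows multiplicatively: a grid with 3 values for each of 5 parameters already means 3^5 = 243 combinations, each trained k times.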
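The two reported metrics can be computed directly. Accuracy is the fraction of correct predictions; ROC AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties counting one half), which the Mann-Whitney U statistic gives exactly. The AUC is informative here because the classes are imbalanced: on the full labeled set, always predicting “positive” would already be correct for 2,371 / 2,803 ≈ 84.6% of tweets, assuming the test split keeps roughly that ratio. This is a from-scratch sketch; sklearn.metrics.accuracy_score and roc_auc_score are the standard implementations, and which class the study treated as positive when computing the AUC is not stated.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC via the Mann-Whitney U statistic.

    y_true: 1 for the positive class, 0 for the negative class.
    scores: predicted probability (or any score) for the positive class.
    Tied scores receive their average rank, so ties count as 0.5.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for m in range(i, j + 1):
            ranks[order[m]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

For example, roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]) is 0.75: of the four positive-negative pairs, three are ranked correctly.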
Keywords:
Free Nutritious Meal Program, Sentiment Classification, Random Forest, GridSearchCV