APPLICATION OF THE C4.5 ALGORITHM FOR PREDICTING STUDENT ACHIEVEMENT AT MTS PGII BANJAR BASED ON ACADEMIC AND NON-ACADEMIC FACTORS
DOI: https://doi.org/10.25157/jmsig.v2i2.5928
Keywords: C4.5, Performance Prediction, Data Mining, Educational Data Mining, Madrasah Tsanawiyah
Abstract
Student academic performance is an important indicator of how well the learning process in an educational institution is working. This study applies the C4.5 algorithm to predict student performance at MTs PGII Banjar from academic and non-academic factors. The research takes a quantitative approach, using computational experiments that follow the CRISP-DM methodology. The data cover 652 students of MTs PGII Banjar from the academic years 2021/2022 to 2023/2024, selected by purposive sampling. The research variables comprise academic factors (subject grades, attendance) and non-academic factors (learning motivation, parental support, socioeconomic status). The C4.5 algorithm was run in RapidMiner Studio with a minimum of 5 instances per leaf, a confidence factor of 0.25, and a minimum gain threshold of 0.01. The resulting prediction model achieved an accuracy of 78.68%, precision of 77.84%, recall of 78.12%, and an F1-score of 77.98%. An AUC-ROC value of 0.842 indicates excellent discrimination capability. Validation with 10-fold cross-validation showed consistent performance, with a low standard deviation (0.57%). Information gain analysis identifies the Mathematics grade as the strongest predictor (0.847), followed by the Science grade (0.723), attendance level (0.689), and learning motivation (0.634). The generated decision tree yields 23 classification rules with an average confidence of 84.2%; these rules can serve as an early warning system for identifying at-risk students. The model can be implemented as a decision support system to improve the quality of academic management through data-driven decision making.
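To make the attribute ranking and the reported F1-score concrete, the following minimal Python sketch computes the gain ratio that C4.5 uses to choose splits, and recomputes F1 from the precision and recall stated above. It is an illustration only, not the authors' RapidMiner workflow: the function names, the toy records, and the discretised attribute values ("high"/"low", "good"/"poor") are all hypothetical, and the sketch handles categorical attributes only, whereas C4.5 also thresholds continuous ones.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of a categorical attribute: information gain / split information."""
    total = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information is the entropy of the attribute's own value distribution.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy records (not data from the study).
math_grade = ["high", "high", "low", "low", "high", "low"]
attendance = ["good", "poor", "good", "poor", "good", "good"]
outcome = ["pass", "pass", "fail", "fail", "pass", "fail"]

print(gain_ratio(math_grade, outcome))   # 1.0: this attribute separates the classes perfectly
print(gain_ratio(attendance, outcome))   # close to zero: uninformative in this toy data
print(round(f1_score(77.84, 78.12), 2))  # 77.98, matching the reported F1-score
```

In the toy data the hypothetical Mathematics attribute would be chosen as the root split because it has the higher gain ratio, which mirrors how an attribute ranking like the one in the abstract translates into tree structure.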