Predicting Student's Performance Using Combined Heterogeneous Classification Models
ABSTRACT
With the development of information technology, universities have become increasingly concerned with their students' data. Educational data mining helps extract useful information from this data by analysing and predicting student performance. This paper compares and analyses a number of established and recent algorithms, including logistic regression, K-nearest neighbour, decision tree, support vector machine, naive Bayes, multilayer perceptron, random forest (RF), gradient boosting, Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Light Gradient Boosting Machine (LightGBM), to predict students' academic performance. In the experiments, every classifier produced accurate results. CatBoost was the most accurate individual classifier, reaching 93.15% in the student status prediction model, followed by XGBoost at 93% and RF at 92.9%. The combined heterogeneous model achieved 93.46% accuracy.
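To illustrate what combining heterogeneous classifiers can look like, the sketch below merges the label predictions of several different models by hard (majority) voting. The abstract does not state how the heterogeneous model combines its members, so majority voting is an assumption made here for illustration only. The function name majority_vote, the three model outputs, and the "pass"/"fail" labels are all hypothetical and are not taken from the paper's data.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists by hard (majority) voting.

    predictions: a list with one inner list of labels per model;
    all inner lists must cover the same samples in the same order.
    """
    combined = []
    # zip(*predictions) yields, for each sample, the tuple of labels
    # that the individual models assigned to it.
    for votes in zip(*predictions):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs of three different classifiers for five students.
# These values are invented for illustration, not taken from the paper.
model_a = ["pass", "fail", "pass", "pass", "fail"]
model_b = ["pass", "pass", "pass", "fail", "fail"]
model_c = ["fail", "fail", "pass", "pass", "fail"]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
```

With an odd number of models and two labels, as in this example, a tie cannot occur. A combined model of this kind can outperform its members when the individual classifiers make different mistakes, since an error by one model can be outvoted by the others.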