Evaluating Machine Learning Algorithms for Predictive Modeling of Large-scale Event Attendance

Deni Kurnianto Nugroho, Marwan Noor Fauzy, Kardilah Rohmat Hidayat

Abstract


Predicting attendance at large-scale public events is critical for resource planning, logistics, and safety management. This study investigates the performance of various machine learning models in forecasting event attendance from metadata features such as event type, venue, location, date, and duration. The dataset comprises 19,526 event records obtained from a U.S. government open data repository, covering multiple years and diverse event categories. Model performance was evaluated using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the Coefficient of Determination (R²). Among the models tested, ensemble methods, particularly the Gradient Boosting Regressor and XGBoost, outperformed the others, achieving the lowest MAE (61.37 and 59.52, respectively) and the highest R² values (0.22 and 0.15). These results suggest superior generalization capability in capturing complex nonlinear patterns in the data. In contrast, linear models and simpler non-parametric methods such as Decision Trees and K-Nearest Neighbors (KNN) exhibited weaker predictive accuracy, with R² scores close to or below 0.14. While the R² values indicate that metadata alone captures only a limited portion of attendance dynamics, the relatively low MAE across models implies that reasonable point predictions are still achievable. These findings highlight the potential of ensemble-based methods for baseline forecasting tasks. Furthermore, the study underscores the importance of incorporating richer feature sets, such as pricing, weather, promotional activity, and social sentiment, for future model improvement. This research provides a foundational benchmark for data-driven attendance forecasting and offers practical implications for event organizers seeking scalable, automated prediction tools to support strategic planning.
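As a minimal sketch of the evaluation protocol described above, the three reported metrics (MAE, RMSE, R²) can be computed from held-out attendance predictions as follows. The attendance values in the usage example are illustrative placeholders, not drawn from the paper's dataset:

```python
import math

def mae(y_true, y_pred):
    # Mean Absolute Error: average magnitude of prediction errors.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root Mean Squared Error: penalizes large errors more heavily than MAE.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    # Coefficient of Determination: 1 - (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative usage with made-up attendance figures:
actual    = [100, 150, 200]
predicted = [110, 140, 190]
print(mae(actual, predicted))   # 10.0
print(rmse(actual, predicted))  # 10.0
print(round(r2(actual, predicted), 2))  # 0.94
```

A model whose predictions do no better than always guessing the mean attendance yields R² = 0, which is why the reported scores of 0.14–0.22 signal that metadata alone explains only a modest share of the variance even when MAE is acceptably low.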




DOI: https://doi.org/10.29040/ijcis.v6i3.249





This work is licensed under a Creative Commons Attribution 4.0 International License.