Stroke Prediction Using Extreme Gradient Boosting

Danang Triantoro Murdiansyah

Abstract


Stroke is one of the leading causes of disability and death among adults worldwide. Early treatment is critical, because it keeps a stroke from progressing to a severe stage. Predicting stroke in a person before the disease progresses further is therefore very important. This study predicts stroke using a machine learning algorithm, Extreme Gradient Boosting, also known as XGBoost. XGBoost was chosen because it has strong potential for prediction (classification) tasks, and researchers have widely used it to achieve good results on a variety of machine learning problems. In this study, the model built with XGBoost is compared with machine learning models used in earlier work, namely Stacking, Random Forest, and Majority Voting. Test results show that XGBoost performs well on all evaluation metrics, including an accuracy of 95.4%. However, XGBoost in this study does not outperform Stacking and Random Forest; Stacking performs best, with an accuracy of 98%.

Keywords


Stroke; Prediction; Classification; XGBoost; Extreme Gradient Boosting





DOI: http://dx.doi.org/10.26798/jiko.v8i2.1295






Copyright (c) 2024 Danang Triantoro Murdiansyah


JIKO (Jurnal Informatika dan Komputer)

Published by
Lembaga Penelitian dan Pengabdian Masyarakat
Universitas Teknologi Digital Indonesia (formerly STMIK AKAKOM)

Jl. Raya Janti (Majapahit) No. 143 Yogyakarta, 55198
Tel. (0274) 486664

Website : https://www.utdi.ac.id/

e-ISSN : 2477-3964 
p-ISSN : 2477-4413