Predicting Breast Cancer Recurrence and Metastasis: A Hybrid Approach Using Chi-Square and ADA Boost on the SEER Dataset
DOI: https://doi.org/10.64149/J.Carcinog.24.2s.150-166
Abstract
Breast cancer remains one of the most prevalent malignancies among women globally and contributes significantly to cancer-related mortality. Despite advances in diagnostic methods and treatments, managing breast cancer is still challenging because of the risks of recurrence and metastasis. We propose a hybrid approach to feature reduction that combines the Chi-square test with the leave-one-out method, followed by the application of various machine learning and ensemble learning techniques. This research uses the SEER database for the years 2017 to 2022, which contains detailed information on patient demographics, tumor characteristics, treatment modalities, and metastasis status, comprising a total of 79 features. Pre-processing refined the dataset to 24 features and 105,404 instances, retaining the key factors contributing to recurrence and metastasis. The analysis concentrated on a sample of 5,387 cases of breast cancer-related deaths. Statistical evaluations revealed that approximately 30% of the deaths occurred in patients aged 85 and above, with HR+/HER2- being the predominant breast cancer subtype. The majority of patients had a single tumor (2,143 cases). To predict recurrence and metastasis, multiple machine learning models, including K-nearest neighbors, logistic regression, random forest, AdaBoost, and gradient boosting, were trained and rigorously tested. With the proposed approach, these algorithms achieved 100% accuracy in predicting recurrence and metastasis. This research emphasizes the development of interpretable and actionable predictive models to assist in clinical decision-making. By identifying significant features and enhancing predictive accuracy, the models aim to improve patient outcomes in breast cancer management through early and personalized treatment plans.
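The Chi-square step of the feature reduction described above can be illustrated with a minimal sketch. It assumes categorical features and a binary outcome label, and it ranks features by their Pearson Chi-square statistic against that label. The function names (`chi_square_score`, `rank_features`), the column names, and the toy records are hypothetical and are not taken from the SEER data or from the authors' implementation. The leave-one-out step and the downstream classifiers are not shown.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic for one categorical feature vs. a class label."""
    n = len(labels)
    observed = Counter(zip(feature, labels))  # joint counts per (value, label) cell
    feature_totals = Counter(feature)         # row totals of the contingency table
    label_totals = Counter(labels)            # column totals of the contingency table
    score = 0.0
    for f_val, f_count in feature_totals.items():
        for l_val, l_count in label_totals.items():
            expected = f_count * l_count / n  # expected count under independence
            diff = observed.get((f_val, l_val), 0) - expected
            score += diff * diff / expected
    return score


def rank_features(columns, labels):
    """Return feature names sorted by descending chi-square score, plus the scores."""
    scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Toy, made-up records for illustration only (NOT SEER data).
columns = {
    "subtype": ["A", "A", "B", "B", "A", "B", "A", "B"],
    "laterality": ["L", "R", "L", "R", "L", "R", "L", "R"],
}
labels = [1, 1, 0, 0, 1, 0, 1, 0]  # hypothetical binary outcome

ranking, scores = rank_features(columns, labels)
print(ranking, scores)
```

A higher score indicates a stronger association between a feature and the outcome, so a reduction step under these assumptions would keep the top-ranked features and discard the rest before training any classifier.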




