Machine Learning Models for Predicting Carcinogenesis Pathways Using Genomic and Environmental Data
DOI: https://doi.org/10.64149/J.Carcinog.24.4s.966-974
Keywords: Precision Oncology, Predictive Modeling, Environmental Exposure, Genomic Data, Carcinogenesis Pathways, Machine Learning
Abstract
Advances in machine learning (ML) provide a powerful framework for understanding the complex interplay between genetic and environmental factors in carcinogenesis. This work develops and evaluates predictive ML models that classify the primary oncogenic pathways leading to cancer development in humans. By integrating high-dimensional genomic data (such as somatic mutations, copy number variations, and gene expression profiles) with structured environmental exposure data (such as toxin exposures and lifestyle variables), the models capture complex, non-linear relationships that traditional statistical methods frequently miss. For this multi-class classification task, we compare ensemble methods such as Random Forest and Gradient Boosting with more complex deep neural networks (DNNs). The models are trained and validated on large-scale cohorts such as The Cancer Genome Atlas (TCGA), and key predictive features are identified through feature importance analysis. The results show that ML models can accurately predict the primary carcinogenesis pathways, demonstrating their value as a tool for etiological research. This approach enables a more personalized understanding of cancer etiology, which is critical for advancing precision oncology and targeted prevention strategies.
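The integration and feature-ranking steps described above can be illustrated with a small sketch. The following is a minimal, dependency-free Python example under stated assumptions, not the authors' pipeline. It assumes genomic and environmental features arrive as per-patient dictionaries keyed by a shared patient ID, fuses them by simple concatenation (early fusion), and ranks features with permutation importance, which is one common form of feature importance analysis. A nearest-centroid classifier stands in for the Random Forest, Gradient Boosting, and DNN models compared in the study; the feature names (`gene_x_expr`, `toxin_dose`), pathway labels, and toy data are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical per-patient records: patient ID -> {feature name: value}.
Records = Dict[str, Dict[str, float]]
Predictor = Callable[[List[List[float]]], List[str]]


def fuse_features(genomic: Records, environmental: Records) -> Tuple[List[str], Dict[str, List[float]]]:
    """Early fusion: concatenate genomic and environmental features per patient.

    Only patients present in both sources are kept (an inner join on patient ID).
    """
    g_names = sorted(next(iter(genomic.values())))
    e_names = sorted(next(iter(environmental.values())))
    fused = {
        pid: [genomic[pid][n] for n in g_names] + [environmental[pid][n] for n in e_names]
        for pid in sorted(genomic.keys() & environmental.keys())
    }
    return g_names + e_names, fused


def fit_nearest_centroid(X: List[List[float]], y: List[str]) -> Predictor:
    """Stand-in classifier (not RF/GB/DNN): predict the class whose mean vector is closest."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

    def predict(rows: List[List[float]]) -> List[str]:
        # Squared Euclidean distance to each class centroid; pick the smallest.
        return [
            min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])))
            for row in rows
        ]

    return predict


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Predictor, X: List[List[float]], y: List[str],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data (hypothetical): the label depends only on gene_x_expr; toxin_dose is noise.
ids = [f"P{i:02d}" for i in range(20)]
genomic = {pid: {"gene_x_expr": 5.0 if i % 2 else 1.0} for i, pid in enumerate(ids)}
environmental = {pid: {"toxin_dose": float(i % 5)} for i, pid in enumerate(ids)}
labels = {pid: "pathway_B" if i % 2 else "pathway_A" for i, pid in enumerate(ids)}

names, fused = fuse_features(genomic, environmental)
X = [fused[pid] for pid in sorted(fused)]
y = [labels[pid] for pid in sorted(fused)]
predict = fit_nearest_centroid(X, y)
for name, score in zip(names, permutation_importance(predict, X, y)):
    print(f"{name}: {score:.3f}")
```

Because the model enters only as a `predict` function, the same ranking routine works with any fitted classifier; in practice one would more likely call scikit-learn's `sklearn.inspection.permutation_importance` on a fitted `RandomForestClassifier` or `GradientBoostingClassifier`. On this toy data the uninformative `toxin_dose` column scores exactly zero, because shuffling it shifts the distance to both class centroids by the same amount.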




