ABSTRACT
Background: Breast cancer (BCa) predisposition has a 30% genetic component and a 70% environmental (E) and health and lifestyle (HLS) component. Exposure to exogenous (viruses, chemicals and radiation) or endogenous (estrogen) carcinogens contributes to the etiology of BCa. Available tools estimate the statistical probability of risk at the population level but not at the individual level. Our aim is to build predictive models of BCa risk for personalized screening. Hypothesis: HLS/E information collected by population biobanks could serve as a surrogate to identify risk factors and help build machine-learning-based predictive models. Methods: We collected data (378 features related to diet, HLS and E) from 810 healthy subjects and 576 BCa cases. We divided the data into training and validation sets. We implemented the models with WEKA and tested 13 different algorithms. Results and Conclusions: Using HLS/E factors as features (age, ethnicity, type of food intake, social involvement, traveling, physical activity and body measurements) produced a good predictive model: the Bayes Network classifier achieved an accuracy of 87.65% on the training set (10-fold cross-validation) and 95.68% on the validation set. Serum profiling (molecular/metabolome) of the subjects may provide mechanistic insights into disease etiology. Our model could aid in screening individuals who are predisposed to breast cancer.
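For readers unfamiliar with the evaluation scheme named in the Methods, the following is a minimal sketch of how a 10-fold cross-validation accuracy can be computed. It is not the study's WEKA pipeline. The helper names (`k_fold_indices`, `cross_val_accuracy`, `fit_majority`, `predict_majority`) are hypothetical, the classifier is a majority-class placeholder rather than a Bayes Network, and the labels are toy data that only reuse the class counts reported above. Running the folds over all 1386 subjects is also an assumption, since the abstract does not state the size of the training set.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_val_accuracy(features, labels, fit, predict, k=10, seed=0):
    """Pooled accuracy over k folds: each sample is scored exactly once,
    by a model that was fitted without it."""
    folds = k_fold_indices(len(labels), k, seed)
    correct = 0
    for held_out in folds:
        held_out_set = set(held_out)
        train = [i for i in range(len(labels)) if i not in held_out_set]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, features[i]) == labels[i] for i in held_out)
    return correct / len(labels)

# Placeholder classifier (assumption): always predicts the most common
# training label. A real model would take its place via fit/predict.
def fit_majority(train_features, train_labels):
    return max(set(train_labels), key=train_labels.count)

def predict_majority(model, sample_features):
    return model

# Toy labels with the class counts reported in the abstract (810 healthy,
# 576 cases); the feature vectors are empty placeholders, not study data.
labels = ["healthy"] * 810 + ["case"] * 576
features = [[] for _ in labels]
baseline = cross_val_accuracy(features, labels, fit_majority, predict_majority)
print(f"majority-class placeholder accuracy: {baseline:.4f}")
```

Passing `fit` and `predict` as arguments keeps the fold logic independent of the model, so any classifier with the same two-function shape could be evaluated the same way.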