BREAST CANCER RISK PREDICTION USING DATA MINING CLASSIFICATION TECHNIQUES
Keywords: breast cancer, classification, prediction, risk factors, naïve Bayes, J48 decision trees
Breast cancer poses a serious threat to life. It is the second leading cause of cancer death in women today and the most common cancer among women in developing countries such as Nigeria, where no services are in place to aid its early detection. A number of studies have applied data mining techniques to predict breast cancer risk. Building on this work, this study uses two data mining classification techniques, the naïve Bayes and J48 decision tree algorithms, to predict breast cancer risk in Nigerian patients. The performance of both techniques was evaluated to determine the more efficient and effective model. The J48 decision tree achieved higher accuracy and lower error rates than the naïve Bayes method, and the evaluation criteria showed it to be the more effective and efficient technique for predicting breast cancer risk among patients at the study location.
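The comparison described above pits naïve Bayes against J48, Weka's implementation of the C4.5 decision tree, on accuracy and error rate. The plain-Python sketch below illustrates that kind of comparison under stated assumptions: all attributes are categorical; the tree is a simplified, unpruned gain-ratio tree in the spirit of C4.5 (not Weka's J48, which also prunes and handles numeric attributes); "error rate" is taken to be 1 − accuracy; and the attribute names (`family_history`, `age_group`) and toy records are hypothetical, not the study's data.

```python
import math
from collections import Counter, defaultdict

# Records are dicts of categorical risk-factor values; labels are class strings.
# Attribute names used in the example are hypothetical placeholders.

def train_naive_bayes(rows, labels, alpha=1.0):
    """Categorical naive Bayes with Laplace (add-alpha) smoothing."""
    class_counts = Counter(labels)
    counts = Counter()          # counts[(attr, value, cls)] = matching training rows
    values = defaultdict(set)   # distinct values seen for each attribute
    for row, y in zip(rows, labels):
        for attr, v in row.items():
            counts[(attr, v, y)] += 1
            values[attr].add(v)
    return class_counts, counts, values, alpha

def predict_naive_bayes(model, row):
    class_counts, counts, values, alpha = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for cls, c in class_counts.items():
        # Sum log-probabilities to avoid underflow from multiplying many terms.
        score = math.log(c / n)
        for attr, v in row.items():
            k = len(values[attr])
            score += math.log((counts[(attr, v, cls)] + alpha) / (c + alpha * k))
        if score > best_score:
            best, best_score = cls, score
    return best

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """C4.5-style gain ratio: information gain divided by split information."""
    n = len(labels)
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[row[attr]].append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    gain = entropy(labels) - remainder
    return gain / split_info if split_info > 0 else 0.0

def build_tree(rows, labels, attrs):
    """Unpruned tree; internal nodes are (attr, branches, majority), leaves are labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) <= 0:
        return majority
    branches = {}
    for v in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == v]
        branches[v] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 [a for a in attrs if a != best])
    return (best, branches, majority)

def predict_tree(tree, row):
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        if row.get(attr) not in branches:
            return majority  # unseen value: fall back to the node's majority class
        tree = branches[row[attr]]
    return tree

def accuracy_and_error(y_true, y_pred):
    """Accuracy and a simple error rate, assumed here to be 1 - accuracy."""
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return acc, 1 - acc

if __name__ == "__main__":
    # Toy records invented for illustration only; they are NOT the study's data.
    rows = [
        {"family_history": "yes", "age_group": "50+"},
        {"family_history": "yes", "age_group": "<50"},
        {"family_history": "no", "age_group": "50+"},
        {"family_history": "no", "age_group": "<50"},
    ]
    labels = ["high", "high", "low", "low"]
    nb = train_naive_bayes(rows, labels)
    tree = build_tree(rows, labels, ["family_history", "age_group"])
    for name, predict in [("naive Bayes", lambda r: predict_naive_bayes(nb, r)),
                          ("gain-ratio tree", lambda r: predict_tree(tree, r))]:
        acc, err = accuracy_and_error(labels, [predict(r) for r in rows])
        print(f"{name}: accuracy={acc:.2f}, error rate={err:.2f}")
```

Evaluating on the training records, as the toy usage does, only checks that the code runs; a genuine comparison of the kind reported above would score each model on held-out data or with k-fold cross-validation.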