8th ISF
COMP032 - DNA Methylation Biomarker Analysis for Early-Stage Lung Cancer Detection and Subtype Classification Using Machine Learning
Lung cancer is the second most common cancer type, and patients are often diagnosed only at the metastatic stage. It is classified into three major subtypes: small cell lung cancer (SCLC), lung adenocarcinoma (LUAD), and lung squamous cell carcinoma (LUSC). In this study, we identified cancer-specific DNA methylation sites to serve as biomarkers for lung cancer detection and subtype classification. Machine learning models were trained on DNA methylation data from the TCGA and GEO databases. A Random Forest Classifier (RFC) served as the baseline model and was compared with three additional algorithms: Naïve Bayes, Support Vector Machine, and XGBoost. To improve model performance, we investigated different pre-processing and feature selection methods. Lung tumor detection achieved its best F1 score of 0.99 after filter-based feature selection, while classification of the three lung cancer subtypes achieved its best F1 score of 0.97 after filter, embedded, and wrapper selection. RFC outperformed the other algorithms. By tracing candidate methylation sites back to their genes of origin, we found that genes identified by the lung cancer detection model are associated with cancer cell suppression, proliferation, and metastasis pathways. Genes identified by the subtype classification model, including RUNX3, SIX3, CMIP, and A2TML1, are likewise related to specific lung cancer subtypes.
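As a rough illustration of two ideas named in the abstract, filter-based feature selection and the F1 score, the following is a minimal sketch in plain Python. It is not the project's pipeline: the abstract does not state which filter criterion was used, so the mean-difference ranking here is an assumption, and the function names (`filter_select`, `f1_score`) and the toy beta values are hypothetical.

```python
from statistics import mean


def filter_select(samples, labels, k):
    """Return indices of the k features whose class means differ most.

    Assumption: a simple mean-difference filter on methylation beta values;
    the abstract does not name the filter criterion actually used.
    """
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        tumor = [row[j] for row, y in zip(samples, labels) if y == 1]
        normal = [row[j] for row, y in zip(samples, labels) if y == 0]
        scores.append(abs(mean(tumor) - mean(normal)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: rows are samples, columns are hypothetical CpG sites.
beta = [
    [0.90, 0.50, 0.20],
    [0.85, 0.45, 0.25],
    [0.10, 0.50, 0.22],
    [0.15, 0.55, 0.18],
]
labels = [1, 1, 0, 0]  # 1 = tumor, 0 = normal

selected = filter_select(beta, labels, k=1)

# Illustrative threshold rule on the selected site (a stand-in for a trained model).
predictions = [1 if row[selected[0]] > 0.5 else 0 for row in beta]
score = f1_score(labels, predictions)
```

In a real analysis the filter would be fitted on training data only and the F1 score computed on held-out samples, so that the selection step does not leak information into the evaluation.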