Single Nucleotide Polymorphism Association in Type 2 Diabetes Mellitus Using Random Forest Regression
Abstract
Precision medicine can be developed by determining the association between genomic data, represented by single nucleotide polymorphisms (SNPs), and the phenotype of type 2 diabetes mellitus (T2D). Because SNPs are very abundant, they must be sorted and filtered before the association is carried out. The purpose of this paper was to associate SNPs with T2D phenotypes. SNPs were ranked by importance score to select the significant ones, and the selected SNPs were associated with the T2D phenotype using random forest regression. Epistasis was also examined to show the interactions among SNPs that affect the phenotype. This paper obtained 301 important SNPs, and the top ten SNPs are associated with five T2D candidate proteins. Evaluation of the proposed models showed a mean absolute error (MAE) of 0.062. These results indicate that random forest regression succeeds both in associating SNPs with the phenotype and in examining epistasis between two SNPs.
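The workflow summarized in the abstract (rank SNPs by importance score, keep the top-ranked ones, fit a random forest regression, and report MAE) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the genotype matrix and phenotype are synthetic, the effect sizes and the choice of SNPs 3 and 7 as causal are hypothetical, and scikit-learn's impurity-based `feature_importances_` stands in for whatever importance score the paper actually computed. Epistasis testing is not covered.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic genotypes (hypothetical data): 300 individuals x 20 SNPs,
# coded as minor-allele counts 0, 1, or 2.
n_samples, n_snps = 300, 20
genotypes = rng.binomial(2, 0.3, size=(n_samples, n_snps))

# Hypothetical quantitative phenotype driven by SNPs 3 and 7 plus small noise.
phenotype = (
    0.5 * genotypes[:, 3]
    + 0.3 * genotypes[:, 7]
    + rng.normal(0.0, 0.05, size=n_samples)
)

X_train, X_test, y_train, y_test = train_test_split(
    genotypes, phenotype, test_size=0.25, random_state=0
)

# Step 1: rank SNPs by random forest importance score (highest first).
ranker = RandomForestRegressor(n_estimators=200, random_state=0)
ranker.fit(X_train, y_train)
ranking = np.argsort(ranker.feature_importances_)[::-1]

# Step 2: keep only the top-ranked SNPs (top_k is an arbitrary choice here).
top_k = 5
selected = ranking[:top_k]

# Step 3: associate the selected SNPs with the phenotype and evaluate by MAE
# on held-out individuals.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train[:, selected], y_train)
mae = mean_absolute_error(y_test, model.predict(X_test[:, selected]))

print("Top-ranked SNP indices:", selected.tolist())
print("Held-out MAE:", round(float(mae), 3))
```

On real data the genotype matrix would come from a genotyping file and the cutoff would be chosen by validation rather than fixed in advance. The MAE printed here has no relation to the 0.062 reported in the paper.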
© Jurnal Nasional Teknik Elektro dan Teknologi Informasi, under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License.