Implementation of Ensemble Methods on Classification of CDK2 Inhibitor as Anti-Cancer Agent

Isman Kurniawan(1*), Mela Mai Anggraini(2), Annisa Aditsania(3), Erwin Budi Setiawan(4)

(1) School of Computing, Telkom University, Bandung
(2) School of Computing, Telkom University, Bandung
(3) School of Computing, Telkom University, Bandung
(4) School of Computing, Telkom University, Bandung
(*) Corresponding Author


Cancer is known as the second leading cause of death worldwide, with about 7-10 million cancer deaths occurring every year. Chemotherapy is currently the main treatment for cancer. However, chemotherapy is known to cause side effects, and cancer cells can develop resistance to certain drugs. It is therefore necessary to develop new drugs that reduce side effects and provide a better therapeutic effect. Anti-cancer drugs are commonly developed by targeting the Cyclin-Dependent Kinase 2 (CDK2) enzyme. Conventional drug design is neither effective nor efficient for obtaining new drug candidates, because the biological activity of a compound is unknown until it has been synthesized. In this study, we aim to develop a model that predicts the activity of CDK2 inhibitors using ensemble methods, i.e., XGBoost, Random Forest, and AdaBoost. Several fingerprints, i.e., EState, Extended, MACCS, and PubChem, were calculated and used as feature variables. The results show that Random Forest with the PubChem fingerprint gives the best performance, with Matthews Correlation Coefficient (MCC) and Area Under the ROC Curve (AUC) values of 0.979 and 0.999, respectively. This study demonstrates the potential of ensemble methods combined with fingerprints for bioactivity prediction, especially for CDK2 inhibitors as anti-cancer agents.
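The two reported metrics can be made concrete with a small, self-contained sketch. MCC ranges from -1 to 1 (1 is perfect agreement, 0 is no better than chance), and AUC ranges from 0 to 1 (0.5 is random ranking, 1 is perfect ranking of actives above inactives). The code below is an illustrative standard-library implementation, not the authors' evaluation code; the toy labels and scores are hypothetical and are not data from the study. Library equivalents such as scikit-learn's `matthews_corrcoef` and `roc_auc_score` would normally be used in practice.

```python
import math
from typing import Sequence


def matthews_corrcoef(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """MCC for binary labels (assumed: 1 = active inhibitor, 0 = inactive).

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random active is scored above a random
    inactive (Mann-Whitney formulation); tied scores count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy labels and classifier scores (not from the study).
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    # Threshold the scores at 0.5 to obtain hard class predictions for MCC.
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(f"MCC = {matthews_corrcoef(y_true, y_pred):.3f}")
    print(f"AUC = {roc_auc(y_true, scores):.3f}")
```

On the toy data the script prints MCC = 0.333 and AUC = 0.889. MCC needs hard predictions (here a 0.5 threshold, itself an assumption), whereas AUC uses the raw scores, which is why a model can report a very high AUC alongside a somewhat lower MCC. The pairwise AUC loop is quadratic in the number of samples and is meant only for illustration.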


QSAR; CDK2; XGBoost; random forest; AdaBoost


Copyright (c) 2023 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Copyright of:
IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
ISSN 1978-1520 (print); ISSN 2460-7258 (online)
is a scientific journal publishing research results in Computing and Cybernetics Systems.
A publication of IndoCEISS.
Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281
Fax: +62274 555133
