A Machine Learning Approach for Prediction of Gibberellic Acid Metabolic Enzymes in Monocotyledonous Plants
DOI:
https://doi.org/10.14738/tmlai.24.375
Keywords:
GA, SVM, WEKA, BLAST, HMMER
Abstract
Gibberellins (GAs) are among the most important phytohormones; they control many aspects of plant growth and influence developmental processes such as seed germination, stem elongation and floral induction. More than 130 GAs have been identified; however, only a small number of them are biologically active. In this study, five enzymes of the GA metabolic pathway in monocots were studied, namely ent-copalyl-diphosphate synthase (CPS), ent-kaurene synthase (KS), ent-kaurene oxidase (KO), GA 20-oxidase (GA20ox), and GA 2-oxidase (GA2ox). We have designed and implemented a high-performance prediction tool for these enzymes using machine learning algorithms. ‘GAPred’ is a web-based system that provides a comprehensive collection of enzymes of the GA metabolic pathway and a systematic framework for analysing these enzymes in monocots. WEKA-based (Naïve Bayes) and Support Vector Machine (SVM) based modules were developed using dipeptide composition, and high accuracies were obtained. In addition, BLAST and Hidden Markov Model (HMMER-based) modules were developed for searching sequence databases for homologs of enzymes of the GA metabolic pathway and for making protein sequence alignments.
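The abstract names dipeptide composition as the feature representation but does not spell out how it is computed. The sketch below uses the commonly used definition: a protein sequence is mapped to a 400-dimensional vector holding the fraction of each ordered pair of the 20 standard amino acids among all adjacent residue pairs. The function name `dipeptide_composition`, the constants, and the choice to skip pairs containing non-standard residues are illustrative assumptions, not details taken from GAPred.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, in a fixed order that defines the feature layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide fractions for a protein sequence.

    Assumption: pairs containing a non-standard residue (e.g. X, B, Z) are
    skipped and do not count towards the total.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    # Slide a window of width 2 over the sequence (overlapping pairs).
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short or made only of non-standard residues.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    # Toy sequence, not a real enzyme: its adjacent pairs are AC, CA, AC.
    features = dipeptide_composition("ACAC")
    print(len(features))
    print(features[DIPEPTIDES.index("AC")])
```

A vector of this form can be passed as one feature row to an SVM or Naïve Bayes classifier. Because its length is fixed, sequences of different lengths become directly comparable.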