Predictive models are of utmost importance for reliable property predictions of drug candidates. Ensemble methods have recently proved very successful at improving the predictive ability of regression and classification models and were thus quickly adopted in chemoinformatics. A large number of methods exists, yet little information is available on so-called hetero-ensembles, which combine different machine learning methods to lower the prediction error.
William Kew and John B.O. Mitchell, Biomedical Science Research Complex and EaStCHEM School of Chemistry, University of St Andrews, Scotland, UK, have systematically studied how the ‘wisdom of the crowds’ principle can be applied to hetero-ensemble predictors. Using a standardized workflow with a stringent validation scheme, they compared 15 machine learning methods against two different hetero-ensemble methods. The first hetero-ensemble was a simple linearly stacked ensemble built from the benchmarked machine learning methods. The second, a so-called greedy ensemble, weights the contribution of each ensemble member according to its performance.
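To make the two combination schemes concrete, here is a minimal sketch of both, assuming regression members whose predictions are already collected in a matrix (ideally out-of-fold predictions, so the combiner is not fitted to overfitted values). It is not the authors' implementation. The linear stack is assumed to be an ordinary least-squares combiner with an intercept, and the greedy ensemble is sketched as Caruana-style forward selection with replacement using RMSE as the criterion; the function names `linear_stack`, `greedy_ensemble` and `rmse`, the `n_iter` parameter, and the synthetic data are all hypothetical.

```python
import numpy as np

def rmse(pred, y):
    """Root-mean-square error between predictions and targets."""
    return float(np.sqrt(np.mean((pred - y) ** 2)))

def linear_stack(preds, y):
    """Assumed linear stack: OLS fit of y on member predictions plus an intercept.

    preds has shape (n_samples, n_models); returns (intercept, coefficients).
    """
    X = np.column_stack([np.ones(len(y)), preds])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1:]

def greedy_ensemble(preds, y, n_iter=50):
    """Assumed greedy scheme: forward selection with replacement.

    Each step adds the member that gives the lowest RMSE for the averaged
    ensemble. Weights are selection counts normalised to sum to 1, taken
    from the step with the lowest RMSE seen so far.
    """
    n_models = preds.shape[1]
    counts = np.zeros(n_models)
    running_sum = np.zeros(len(y))
    best_weights, best_err = None, np.inf
    for step in range(1, n_iter + 1):
        # RMSE of the ensemble average if member j were added at this step
        errors = [rmse((running_sum + preds[:, j]) / step, y) for j in range(n_models)]
        j_best = int(np.argmin(errors))
        counts[j_best] += 1
        running_sum += preds[:, j_best]
        if errors[j_best] < best_err:
            best_err = errors[j_best]
            best_weights = counts / counts.sum()
    return best_weights, best_err

# Hypothetical usage: three noisy "members" of differing quality
rng = np.random.default_rng(0)
y = rng.normal(size=200)
preds = np.column_stack([y + rng.normal(scale=s, size=200) for s in (0.3, 0.5, 1.0)])
w, err = greedy_ensemble(preds, y)
b0, b = linear_stack(preds, y)
print("greedy weights:", np.round(w, 3), "RMSE:", round(err, 4))
print("linear stack coefficients:", np.round(b, 3))
```

Because the first greedy step simply picks the single best member, and later steps are kept only if they lower the error, the selected ensemble can never do worse than its best member on the data used for selection. On genuinely held-out data this guarantee no longer holds strictly, which is why a stringent validation scheme matters.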
None of the individual models performed best across the broad range of regression problems, but the greedy hetero-ensemble performed consistently as well as, or better than, any of its individual members. The greedy ensemble therefore makes model selection among the individual methods unnecessary while performing at least as well as the best ensemble member. This minimizes the risk of choosing a suboptimal method.
- Greedy and Linear Ensembles of Machine Learning Methods Outperform Single Approaches for QSPR Regression Problems,
William Kew, John B. O. Mitchell,
Mol. Inf. 2015, 34, 634–647.
DOI: 10.1002/minf.201400122
This article is part of the Special Issue: Chemoinformatics in the United Kingdom