Machine Learning Approaches for Predicting Catalyst Performance

Machine Learning Approaches for Predicting Catalyst Performance

Author: Johannes Voss, Vera Köster

Predicting catalyst performance under reaction conditions is challenging due to the complexity of catalytic surfaces, their in-situ evolution, various reaction paths, and the presence of solid-liquid interfaces in electrochemistry. Dr. Johannes Voss, SLAC National Accelerator Laboratory, SUNCAT Center for Interface Science and Catalysis, Menlo Park, CA, USA, and colleagues show how relatively simple machine-learning models can be found.

The team demonstrates the predictive power of their strategy with experiments on scandium-doped antimony oxide as a catalyst for the oxygen reduction reaction (ORR). Platinum-based systems are currently the most efficient ORR catalysts. The researchers achieve a modest increase in ORR onset potential over the undoped oxide.


What did you do?

Our team applied machine learning to discover human-interpretable models for the performance of catalysts composed of nonprecious elements for the oxygen reduction reaction (ORR).

We then derived strategies to improve the performance through changes in the catalyst composition and correspondingly synthesized new catalysts.

Finally, we experimentally confirmed the machine-learning predicted, improved performance of these nonprecious transition-metal antimony oxide ORR catalysts.


Why are you doing this?

Platinum-based systems are currently the most efficient catalysts for the ORR, and efficient ORR catalysts are of crucial importance for fuel-cell technology. Replacing platinum with earth-abundant elements will reduce the cost of these catalysts, and transition-metal antimony oxides are one class of materials with promising ORR performance. The catalyst performance, however, requires further improvement.

Atomic-level computer simulations of model catalysts are a powerful tool for understanding catalytic reactions and for screening for new catalyst materials. Real catalysts, in particular supported oxides, display structural complexity and dynamic changes during reaction that are often too complex to be simulated at an atomic or electronic level.

We have, therefore, employed machine learning to bridge between measured ORR performance on practical catalysts and simple atomic-level simulations. Our aim here was to find models that, based on catalyst composition-dependent electronic and atomic properties and experimental conditions, will predict and give insight into realistic catalyst performance.


What is new and cool about your work?

Our machine-learning work treated simulated atomic-level information on the same footing as the more macroscopic parameters of the experimentally tested catalysts. With only small amounts of experimental training, testing, and validation data, we were able to search for models with predictive power by employing the technique of genetic programming. We algorithmically enforced mathematical simplicity and compatibility of physical units and were able to interpret and rationalize the discovered models. Arriving at these models and using them to design and then test new catalysts required an integrated effort of data science, experiments, and atomic-level computer simulations.


What is the main significance of your results? What are your key findings?

Our results show how important it is for modeling supported oxide catalyst performance to consider not only atomic and electronic metrics of the catalyst particles, but also macroscopic parameters, e.g., of the conductive support. Only when considering such a full range of parameters does it make sense to compare the performance of different supported oxide catalysts and to develop models to predict and improve their performance.


What is the longer-term vision for your research?

We learned how useful interpretable machine learning and physical constraints are for model discovery in heterogeneous catalysis even with only small amounts of data.

Considering this work on transition-metal antimony oxides a prototype study, we will extend our efforts to integrate experimental and simulated catalysis data to other catalysts and reactions, and will expand our sharing of catalysis models and integrated data on our platform is a web platform designed for computational catalysis research, offering thousands of reaction energies and barriers derived from density functional theory (DFT) calculations on surface systems


What part of your work was the most challenging?

Understanding the influence of the many parameters governing performance is key for developing new catalysts. To enable such understanding for practical catalysts, experiments spanning a range of these parameters are necessary. Here, not only data on performant catalysts are needed, but data on poorly performing catalysts are relevant for developing predictive models, too.

Unless the experiments can be performed and carefully characterized by a self-driving laboratory, the amount of experimental catalysis data will typically be limited in comparison to the much more readily automated computer simulations. Limited amounts of data render the use of machine-learning techniques difficult. Our solution to this problem was to employ a constrained model search by using genetic programming, which allowed us to guide the algorithm to identify mathematically reasonable models and avoid overfitting.


In short, what is the advantage of combining theory, experiment, and data science?

I believe that tightly integrated efforts of theory, experiment, and data science will be increasingly important in the field of heterogeneous catalysis due to its inherent complexity and the resulting limitations to new insights that these approaches would often yield without their integration.


Thank you for these insights.


The article they talked about


Johannes Voss is staff scientist at SLAC National Accelerator Laboratory. He leads the data science and electronic structure method efforts at the SUNCAT Center for Interface Science and Catalysis.


Leave a Reply

Kindly review our community guidelines before leaving a comment.

Your email address will not be published. Required fields are marked *