ChatGPT Used for Text Mining of MOF Syntheses in the Literature


Author: ChemistryViews

Large language models, such as the GPT series of models used in ChatGPT, are trained on large amounts of text and can predict the probabilities of sequences of words in a given language. This can be used for a variety of applications, e.g., to generate a probable text output based on a user input. The chemical literature also contains vast amounts of text, and it can be challenging to quickly perform a comprehensive literature review and extract useful data and insights for a specific application. Large language models could help with this issue.

Omar M. Yaghi, University of California, Berkeley, USA, and King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia, and colleagues have used ChatGPT to automate text mining and quickly create datasets on difficult-to-aggregate research about metal-organic frameworks (MOFs). The team curated 228 relevant peer-reviewed research papers, and then used ChatGPT to process the relevant sections in the papers and to extract, clean up, and organize the data. ChatGPT successfully extracted 26,257 distinct synthesis parameters for ca. 800 MOFs reported in the selected research articles. It mined the synthetic conditions of the MOFs with high accuracy and very quickly.
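The "extract, clean up, and organize" steps can be pictured roughly as in the following sketch. It is not the team's actual pipeline: the prompt wording, the field names, the helper functions, and the sample reply are all hypothetical and only illustrative, and the call to the language model itself is left out.

```python
import re

# Hypothetical field names; the schema used in the study is not given here.
FIELDS = ("compound", "metal_source", "linker", "solvent", "temperature", "time")


def build_prompt(paragraph: str) -> str:
    """Assemble an illustrative extraction prompt (the wording is an assumption)."""
    field_list = ", ".join(FIELDS)
    return (
        "Extract the following MOF synthesis parameters from the text, "
        f"one 'field: value' pair per line ({field_list}). "
        "Write N/A for anything not stated.\n\n" + paragraph
    )


def parse_reply(reply: str) -> dict:
    """Turn a 'field: value' reply into a record, keeping only known fields."""
    record = {field: None for field in FIELDS}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower().replace(" ", "_")
        value = value.strip()
        # Skip malformed lines, unknown fields, and explicit "not stated" markers.
        if sep and key in record and value and value.upper() != "N/A":
            record[key] = value
    return record


def temperature_celsius(text):
    """Convert a string such as '120 °C' or '393 K' to degrees Celsius."""
    if text is None:
        return None
    match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*(°?\s*C|K)\s*", text)
    if match is None:
        return None
    value, unit = float(match.group(1)), match.group(2)
    return value - 273.15 if unit == "K" else value


# Made-up model reply for illustration only; not data from the study.
sample_reply = """compound: MOF-X
metal source: Zn(NO3)2·6H2O
linker: H2BDC
solvent: DMF
temperature: 393 K
time: N/A"""

record = parse_reply(sample_reply)
record["temperature_c"] = temperature_celsius(record["temperature"])
```

Asking for a fixed "field: value" layout keeps the parsing step simple, and normalizing units afterwards means records from different papers can be compared in one table.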

The extracted datasets can then be used to inform predictive models, which might help chemists to develop new MOFs. Using the data gathered by text mining, the team created a machine-learning model that achieved 87 % accuracy in predicting MOF experimental crystallization outcomes. According to the researchers, the text-mining approach can be easily transferred to other contexts with minimal coding knowledge. Further exploration of large language models for AI-assisted chemistry could, thus, be useful for accelerating research.

