Researchers have used ChatGPT to build a chemistry lab assistant that summarizes synthesis information from papers with high accuracy (J. Am. Chem. Soc. 2023, DOI: 10.1021/jacs.3c05819). In particular, the program extracted over 26,000 parameters from peer-reviewed articles on metal-organic frameworks (MOFs) and their supporting information. Once trained, the interactive chatbot can answer questions about the preparation of MOFs quickly and accurately.
“We’ve always been interested in simplifying and speeding up chemical synthesis,” says Omar Yaghi from the University of California, Berkeley, lead author of the study. The ChatGPT models mined the supporting information of hundreds of MOF papers, where information on synthesis is unstructured and sparse, often spread over hundreds of pages. To cope with that volume, “we developed a filtering strategy that excludes the least relevant sections (like references, crystal coordinates, acknowledgments), increasing the efficiency,” adds Yaghi.
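The kind of filtering Yaghi describes can be illustrated with a minimal Python sketch. It is not the team's actual method, which the article does not detail. It assumes, purely for illustration, that a document's sections are separated by blank lines, that each section starts with a heading line, and that a short keyword list (`EXCLUDED_HEADINGS`, a hypothetical name) is enough to spot sections unlikely to describe a synthesis.

```python
import re

# Hypothetical keywords marking sections that rarely contain synthesis details.
EXCLUDED_HEADINGS = (
    "references",
    "acknowledgments",
    "acknowledgements",
    "crystal coordinates",
)


def split_sections(text):
    """Split text into (heading, body) pairs.

    Assumes blank lines separate sections and the first line of each
    section is its heading.
    """
    sections = []
    for block in re.split(r"\n\s*\n", text.strip()):
        heading, _, body = block.partition("\n")
        sections.append((heading.strip(), body.strip()))
    return sections


def keep_relevant(sections, excluded=EXCLUDED_HEADINGS):
    """Drop sections whose heading contains any excluded keyword."""
    return [
        (heading, body)
        for heading, body in sections
        if not any(keyword in heading.lower() for keyword in excluded)
    ]


# Usage with a made-up document; "MOF-X" is a placeholder name.
sample = (
    "Synthesis of MOF-X\nMix the linker and the metal salt.\n\n"
    "References\n[1] A placeholder citation.\n\n"
    "Acknowledgments\nWe thank our colleagues."
)
kept = keep_relevant(split_sections(sample))
```

Only the sections that survive the filter would then be passed to the language model, which shortens the text it has to read for each paper.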
Large language models like ChatGPT are prone to so-called hallucinations: responses that seem correct but aren’t. The team minimized such misleading statements with careful prompt engineering. “It’s a means of training ChatGPT,” says Yaghi. “We ensure the prompt contains information . . . to help improve the response.” This approach instructs the model to acknowledge uncertainty rather than fabricate answers. For example, when asked about a MOF not present in the training database, the program will simply say: “I do not know.” This iterative approach also helps researchers get structured responses, such as bulleted lists and step-by-step syntheses of many MOFs, that reference the correct sources.
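A prompt along these lines might be assembled as in the following Python sketch. The wording of the instructions and the function name `build_prompt` are assumptions made for illustration, not the prompts used in the study. The sketch only builds the text that would be sent to a chat model and does not call any API.

```python
def build_prompt(question, passages):
    """Build a prompt that supplies context and asks the model to admit uncertainty.

    `passages` is a list of text excerpts, for example synthesis paragraphs
    taken from papers. The instruction wording below is hypothetical.
    """
    # Number the passages so the model can cite them in its answer.
    context = "\n".join(
        f"[{i}] {passage}" for i, passage in enumerate(passages, start=1)
    )
    return (
        "Answer using only the numbered context below.\n"
        'If the context does not contain the answer, reply exactly: "I do not know."\n'
        "Format the answer as a bulleted list and cite the passage numbers used.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Usage with made-up inputs; "MOF-X" is a placeholder name.
prompt = build_prompt(
    "How is MOF-X prepared?",
    ["MOF-X was prepared by heating the linker with the metal salt."],
)
```

Supplying the relevant passages in the prompt and giving the model an explicit fallback reply are two common ways to discourage invented answers, and numbering the passages makes it easier to check that a response cites the correct source.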
This process of curation could have taken a chemist months, but ChatGPT scans and registers synthetic procedures in a fraction of the time, says Yaghi. “It only takes one minute per paper.”
The team envisions researchers applying the publicly available ChatGPT model to other fields of chemistry after training it with the relevant papers and datasets. Eventually, the chatbot could predict the outcome of chemical reactions or propose potential synthetic routes by drawing on its knowledge of chemical space.