Credit: Jason Alden | At Imperial College London, a new PhD-training program will teach students like Oliver Newton (left) and Lenka Cuprova to use high-tech synthesis equipment that is already common in industry.
High-tech synthesis equipment is relatively common in the pharmaceutical industry, where researchers use high-throughput robotic platforms to optimize reaction conditions or screen catalysts. But these facilities are still rare in academia, and many PhD students in synthetic chemistry have little or no experience with the technology. Academic programs are now aiming to close that skills gap and redefine what it means to be a synthetic chemist. One of these programs is based at the Centre for Rapid Online Analysis of Reactions (ROAR), an automated synthesis suite that opened this year at Imperial College London. ROAR’s launch is part of a broader drive to transform synthetic chemistry into a truly data-driven discipline and may herald a time when automated synthesis suites are as ubiquitous at universities as nuclear magnetic resonance facilities.
A stint in process development at BASF, a role analyzing Australia’s natural gas supply chain, and a summer in private equity. It’s not your typical résumé for a chemistry student, but Filip Horváth-Gerber is not your traditional chemist. And the PhD he started this month is not your traditional chemistry program—at least not yet.
The EPSRC Centre for Doctoral Training (CDT) in Next Generation Synthesis and Reaction Technology at Imperial College London is a new, 4-year PhD program that aims to equip its students for the future of synthetic chemistry. Time-honored lab staples like round-bottom flasks and chromatography columns are out. Instead, students will make molecules using high-throughput robotic platforms and continuous-flow reactors. They will study reactions in real time, using cutting-edge analytical instruments to record the waxing and waning of chemical intermediates. And they’ll learn to wrangle the huge data sets this work generates.
Much of their training will happen at Imperial’s £4.7 million ($6.1 million) Centre for Rapid Online Analysis of Reactions (ROAR), a pristine suite of automated synthesis equipment that opened earlier this year. “The facility brings everything together: the chemistry, the chemical engineering, and the data science,” Horváth-Gerber says.
High-tech synthesis suites such as ROAR’s are now relatively common in the pharmaceutical industry. But they are still rare in academia, and many PhD students in synthetic chemistry have little or no experience using this equipment. “There is definitely a skills gap between academic training and the experimental methods used routinely in industry,” says Steven Raw, associate director of process chemistry at AstraZeneca and a member of ROAR’s advisory board.
“On the whole, the UK is pioneering this approach” to training, says the University of Southampton’s Richard J. Whitby. Whitby leads Dial-a-Molecule, a UK network that since 2010 has been coordinating data-driven synthesis efforts, and ROAR is its flagship facility.
What makes ROAR unique is that it also operates as a user facility for the whole chemistry community. As with a synchrotron or a telescope, any researcher can bid for time at ROAR and work with technicians there to run experiments on the equipment. Contrast this with one-off labs at universities around the globe where scientists are building custom automated synthesis systems, or with automated synthesis suites at research institutes that are exclusively for in-house staff.
Mimi Hii, ROAR’s director, hopes that open facilities like hers will help transform synthetic chemistry into a truly data-driven discipline. Automation enables researchers to gather and share huge amounts of data about chemical reactions, which can then be used to optimize processes and even forecast the outcomes of entirely new reactions. ROAR may therefore offer a glimpse of what synthetic chemistry is poised to become, and it heralds a time when automated synthesis suites will be as ubiquitous as university nuclear magnetic resonance facilities. “The ambition is that in 10–20 years, every chemistry department should have something like this,” Hii says.
Since the 1960s, chemists have contemplated the idea of a universal synthesis machine—a wonder device with smart software that could compute how to build any given molecule, and robotics to execute the synthesis (Nature 2019, DOI: 10.1038/s41586-019-1288-y). “In principle, you would just draw the molecule and press Make on your app,” says Ian W. Davies, who is principal of the Princeton Catalysis Initiative and was previously involved in pioneering high-throughput synthesis techniques at Merck & Co. A multitalented robot chemist may still be a distant dream, but scientists are making progress toward its becoming a reality.
Robotic systems that run multiple simultaneous reactions have been widely used in industry since the 1990s to optimize reaction conditions or screen catalysts, for example. Their robotic arms dispense reagents into racks of vials or into plates containing up to 1,536 individual wells, which serve as miniature reaction vessels. These systems are certainly fast, but they still need a lot of tending by human acolytes.
Meanwhile, various computer programs are increasingly adept at suggesting plausible sequences of reactions to make a target molecule. Last year, for example, MilliporeSigma released Synthia, a program that dissects a target molecule and then offers a synthetic route for stitching it back together, a process known as retrosynthesis. (Synthia evolved from Chematica, a system developed by Bartosz A. Grzybowski at the Ulsan National Institute of Science and Technology.) And in July, CAS, a division of the American Chemical Society, launched its own computer-aided retrosynthesis planner as part of SciFinder-n, a workflow tool based on CAS’s chemical database. ACS publishes C&EN.
These programs are still works in progress—retrosynthetic route prediction is certainly not a solved problem. And predicting exactly how to run the reactions proposed by such programs is vastly harder. Each step in a calculated synthetic route will have different optimum conditions, including temperatures, pressures, solvents, catalysts, and other reagents. Right now, “there is virtually no way to predict exactly how to get a reaction to work successfully at the first attempt,” says Matthew Gaunt, who leads the CDT in Automated Chemical Synthesis Enabled by Digital Molecular Technologies (SynTech) at the University of Cambridge.
The answer, many researchers believe, is more data—and lots of it. Researchers have previously tried to train machine-learning algorithms by feeding them data from the chemical literature, but this comes with a lot of drawbacks. For starters, much of the information in a published chemistry paper is not in a machine-readable format, and often it is not linked to the underlying raw data. Published chemistry also tends to be highly biased toward conditions that scientists have previously determined to work for a particular reaction (Nature 2019, DOI: 10.1038/s41586-019-1540-5). All too often, chemists don’t quantify details such as a room’s temperature or the exact time that a reaction took to be completed. And chemists have a bad habit of not providing details about reactions that did not produce the outcome they hoped for, leaving an enormous amount of potentially useful information unpublished, further skewing a computer’s training set. “Synthetic chemists in academic labs are not collecting the right data and not reporting it in the right way,” says Benjamin J. Deadman, ROAR’s facility manager.
Many researchers argue that to make progress in computer-assisted reaction planning, algorithms need to be trained on fresh, high-quality data sets that record the outcomes of hundreds or thousands of iterations of the same reaction, each run with slightly different substrates or conditions. “Right now, though, there are very few robust training sets,” Princeton’s Davies says. “If you can make more of those data sets available, that would really help things move along in this area.”
That’s where automation comes in. High-throughput systems are often seen as a way to simply do faster chemistry, but their real value is in generating vast numbers of accurate data points. By analyzing these data, statistical tools could identify the optimum reaction conditions for a particular substrate, and machine-learning algorithms could tease out trends to make accurate predictions about the best ways to make new molecules. For example, Abigail G. Doyle’s team at Princeton University last year used high-throughput techniques to generate a data set containing 4,608 iterations of a palladium-catalyzed C–N coupling reaction. Then the researchers used the data to train a machine-learning algorithm, which accurately predicted the yields of reactions involving entirely new substrates (Science 2018, DOI: 10.1126/science.aar5169). Meanwhile, the Massachusetts Institute of Technology is partnering with pharmaceutical and other companies in the Machine Learning for Pharmaceutical Discovery and Synthesis Consortium, which aims to apply data libraries and machine learning to drug-discovery problems.
This area is a major frontier in synthesis, but not many academic labs have the equipment or skills to carry out the work. State-of-the-art automated synthesis machines can cost hundreds of thousands of dollars, and few individual labs have the resources to invest in them. Also, synthetic chemists have traditionally not been trained to handle big data sets and tend to be apprehensive about adopting new technologies. “People in academia just feel that this technology is inaccessible,” says Timothy Cernak at the University of Michigan, who previously worked on technology-enabled synthesis at Merck & Co.
ROAR isn’t going to solve these problems on its own, but Hii hopes it can help break down the barriers. “It’s about changing the mind-set of the whole community,” Hii says.
Rather than assembling its own tailor-made automated synthesis platforms, as some academic researchers are doing, ROAR has taken a different approach. It is furnished with commercial machines that are commonly found in industry labs. The idea is that these systems are better suited to support researchers with a broad range of needs, whatever their levels of expertise. Novice users have less of a hurdle with off-the-shelf technologies, says Jason Chen, scientific director of Scripps Research’s Automated Synthesis Facility in California, which has been serving in-house researchers since 2017. “If someone just needs a tool to solve their problem, then they’re more willing to try if it’s a known quantity.”
ROAR’s suite includes high-throughput robotic reaction platforms from Unchained Labs, which can weigh reagents with an accuracy of 0.1 mg and dispense liquids and solids into racks carrying up to ninety-six 1 mL vials. One machine boasts a row of eight small pressurized vessels that can operate at 200 °C and 28 bar and may be sampled at any point during a reaction without altering the pressure.
The equipment takes months of training to use properly, so Deadman and his colleagues are on hand to plan the setup and provide guidance to visiting researchers on how to get the most out of the facility during a typical weeklong stint.
The machines run small-scale, highly parallel versions of the processes used in traditional chemistry labs, where reactants are mixed in a vessel to produce a discrete batch of product. “We’ve spent a decade miniaturizing high-throughput batch chemistry so we can run more combinations with similar amounts of material,” says Neal W. Sach, an associate research fellow at Pfizer who has worked in high-throughput experimentation for the past decade. He says that the smallest batch reactor vials now hold just 100 μL of solvent—any smaller and the liquid would evaporate in the blink of an eye.
This practical limit is prompting some teams to shift to flow systems, in which reactions happen continuously as reagents are pumped through tubes, producing a steady stream of products. Sach’s group has demonstrated a flow system that could run a series of 5,760 reactions in just a few days, using only about 50 μg of substrate in each reaction, two orders of magnitude less than a batch system would allow (Science 2018, DOI: 10.1126/science.aap9112).
Flow reactors are also well suited to study the impact of fluctuating variables like temperature and pressure, and they can work safely at higher temperatures and pressures than batch reactors, Deadman says. ROAR has several continuous-flow reactors, such as ThalesNano’s Phoenix, which can operate at up to 450 °C and 200 bar. During his PhD studies, Horváth-Gerber will use ROAR’s flow reactors to study the safety aspects of scaling up reactions for continuous manufacturing.
The final part of ROAR’s lab hosts equipment that can analyze individual reactions in real time. Machines like the EasyMax 102 can precisely control conditions including temperature, pH, and stirring rate, while various probes use infrared spectrometry, conductivity, and other measurements to monitor how the reaction mixture changes over time. The information collected is vital for understanding the kinetics of a reaction—in other words, how changes in these parameters affect the rate—which can help reveal the overall reaction mechanism.
Benjamin M. Partridge at the University of Sheffield was one of the first external researchers to bring his research to ROAR. He is developing a copper-catalyzed reaction to transform alkyl boron compounds into alkylamines, potentially offering a milder synthetic route to the amines than the traditional methods. He’s using ROAR’s tools to understand the kinetics of the reaction so he can fine-tune his catalyst. “This facility is fantastic,” he says. “It gives me access to a really good suite of analytical instruments that I don’t have dedicated access to. Plus I have the expertise of the technicians, who have essentially taught us how to perform these experiments.”
Imperial College has provided £1.1 million of ROAR’s funding, while industry partners have chipped in £800,000 in cash and instruments. ROAR also won a £2.8 million grant from the UK’s Engineering and Physical Sciences Research Council (EPSRC), which means that academic users like Partridge do not have to pay for access, although industrial users are expected to contribute. Hii is also developing collaborations with academic researchers in other countries, including South Africa, to enable even wider use of the facility.
Users get access to ROAR by submitting proposals to a review committee, which awards instrument time according to scientific merit. The facility’s current funding runs through the end of 2020, and Hii aims to build up case studies of ROAR’s work to win further support.
ROAR will also make all the data generated by academic users at the facility publicly accessible 1 year after they are collected. Users must agree to this stipulation when they sign up for the facility. “It’s about democratizing the data,” Hii says. She has already been contacted by researchers who are eager to get their hands on ROAR data so that they can advance their own synthesis-planning algorithms.
“To the best of my knowledge, the ROAR model is unique,” Scripps’s Chen says. A handful of similar university department facilities exist in the US—at the California Institute of Technology and the University of Pennsylvania, for example—but none has the same remit to serve as an open-access community facility. He sees ROAR as a positive step in expanding the use of automation beyond a privileged few labs to a wider range of outside users. “They’ve got great equipment and great staffing,” he says. “I really hope they succeed, because it’ll be good for the field.”
ROAR may not be unique for long. The US National Center for Advancing Translational Sciences (NCATS, part of the National Institutes of Health) has various initiatives aimed at advancing the use of technology in synthesis, with an eye toward accelerating drug development. Several researchers familiar with its programs say that NCATS is considering creating a national facility for automated synthesis that would improve researchers’ access to these tools. Meanwhile, Gaunt says that the University of Cambridge is planning an automated synthesis facility that will be accessible to external researchers from academia and industry.
Interestingly, Gaunt’s SynTech CDT aims to attract not just chemistry students but also engineers and data scientists. “We want to train chemists in new techniques for small-molecule synthesis and teach engineers and data scientists that small-molecule synthesis has a lot of challenges to offer for their areas of research,” Gaunt says.
This multidisciplinary approach is a key part of the CDT model, which for the past decade has been one of the three routes for doctoral training in the UK. Rather than focusing on a traditional 3-year apprenticeship to a single supervisor, a CDT offers broader scientific research training with multiple supervisors, often focusing on interdisciplinary collaboration.
Graduates from CDTs like SynTech who are literate in synthesis, machine learning, and computation should be well placed to join the multidisciplinary research teams already found in industry. “We’re confident these students are going to be snapped up like hotcakes,” Gaunt says.
The students graduating from the new CDTs will also help shape a field that still faces some daunting challenges.
Chemists are just beginning to teach machine-learning algorithms how to understand the chemical quirks of different molecules, for example. To do this, they use dozens of chemical descriptors, often generated computationally, such as bond lengths, dipole moments, and vibrational frequencies. Together, these provide a fingerprint of the reactivity of a molecule that not only helps predict the outcome of a particular reaction but can also reveal details about its mechanism. At the moment, though, nobody really knows what the right descriptors for any particular reaction will be, Gaunt says.
The reaction data used to train these algorithms are also being presented in a variety of different formats, making it harder to share the information with other groups. “There is absolutely no standard format,” says the University of Southampton’s Whitby.
Still, Whitby says, the community is starting to grapple with this problem. Last year, for example, the Pistoia Alliance, a nonprofit collaboration founded by AstraZeneca, GlaxoSmithKline, Novartis, and Pfizer, released the Unified Data Model, a file format for collating information about chemical reactions.
This lack of standardization is not helped by the proliferation of bespoke automated synthesis systems scattered among universities. “People have to work towards more unification—we can’t have 100 different labs developing 100 different systems and expect everyone to adopt them,” says Stephen G. Newman of the University of Ottawa, who works on automated synthesis.
Meanwhile, commercial automated synthesis systems are still evolving. Hii notes that it would be helpful for flow systems to gather more fine-grained data, for example, so that they are able to analyze reaction mixtures at time points less than 1 s apart. She says ROAR is providing feedback to instrumentation companies to suggest potential improvements to their equipment.
Finding better ways to unite software and hardware could even help develop self-optimizing robotic systems, with the results feeding back into reaction-planning software to inform the next round of experiments. “That cycle is not closed yet,” Sach says. “But it will be, for sure.”
Ultimately, expense remains a significant barrier to wider adoption. “The equipment either has to become so good that it becomes necessary to use, or it has to become cheaper and user friendly,” Newman says. “The bar is really high for technology to become commonplace in every single lab.”
But if the technology does become a standard fixture in academic departments, it will be vital to have a new generation of chemists with the skills to exploit it fully. The reality is that even with advanced technology, synthetic chemistry is still hard, Cernak says. “If you run 96 reactions and something goes wrong, you’ve got 96 problems to solve.”
Still, Scripps’s Chen is optimistic that the obstacles in laboratory automation and data processing can be overcome. “There is zero doubt in my mind that if you look at the success of high-throughput teams in industry, a data-driven approach is here to stay.”
Mark Peplow is a freelance science writer based in the UK.