

Physical Chemistry

How The Internet Ignited Modern Computational Chemistry

The power of interconnected computers has taken molecular modeling from the Pony Express to the cloud

by Elizabeth K. Wilson
August 16, 2015 | A version of this story appeared in Volume 93, Issue 32

An illustration of a brain connected to various forms of electronics.
Credit: Shutterstock/C&EN

It was only 50 years ago, but it could have been hundreds. In the 1960s, academic computational chemists shared their computer programs via a Pony Express-type service run by scientists at Indiana University, Bloomington, called the Quantum Chemistry Program Exchange.

Members learned about new software in circulated newsletters, and for a small fee, they could order the programs’ source codes, which were sent by mail on computer punch cards or magnetic tape.


◾ Distributed computing: In which numerous individual computers perform small tasks and send their results to a main computer center.

◾ Grid computing: In which infrastructures of connected clusters of computers are made available to groups, largely in academia and government.

◾ Cloud computing: In which commercial organizations supply on-demand computing power from banks of connected servers.

Doing the actual computational science was just as ponderous, recalls Henry Rzepa, a chemistry professor at Imperial College London. As a graduate student at the University of Texas, Austin, in the 1970s, Rzepa spent days in a dedicated computation center, wrestling with punch cards. “It was a lot of tedious, repetitious work, punctuated by the occasional discovery,” he says.

Then came the 1980s. The growing development of the Internet swiftly made such laborious communication, and the slow scientific progress it imposed, a distant memory. The seemingly simple act of connecting computers to one another completely transformed the computational landscape, eventually leading to today’s ability to perform molecular calculations on demand, with almost limitless computing power.

Thirty years ago, though, few laypeople had e-mail, let alone dial-up modems. But that didn’t stop academic institutions from anticipating the massive scientific paradigm shift that was about to occur.

In 1985, for example, a consortium of Dutch chemists formed the Dutch National Facility for Computer Assisted Organic Synthesis & Computer Assisted Molecular Modelling. The center developed ways to link together computers at distant facilities. At a 1987 conference in the Netherlands titled “Chemical Structures: The International Language of Chemistry,” attendees reported on the design of a user-friendly graphics menu interface that gave “even the novice user direct access to the module(s) of his choice.”

Chemical structure of protein binding site.
Credit: Courtesy of Natalie Tatum/Newcastle University
This simulation, produced by a cloud-based program from the Cambridge Crystallographic Data Centre, shows how an antituberculosis drug (green) might dock into a transcriptional repressor protein (gold). The measured X-ray structure of the drug (gray) is shown for comparison.

But it was the World Wide Web that really opened up the floodgates to progress in computational chemistry, Rzepa says. In 1994, Rzepa and his colleagues published a prescient paper in Chemical Communications, “Chemical Applications of the World-Wide-Web System” (DOI: 10.1039/c39940001907).

Online Volunteers Help Tackle Big Scientific Challenges

by Alán Aspuru-Guzik

When I was an undergraduate student in Mexico in the late 1990s, I was fascinated by the power of distributed computing projects, which harness many individual computers to carry out complex calculations and analysis. These projects are, in a way, the greenest form of computing. They use otherwise unused CPU (central processing unit) cycles from volunteer donors around the world.

In particular, the SETI@home project at the University of California, Berkeley, launched in 1999, was an inspiration: Over the years, this search for extraterrestrial intelligence in radio telescope signals has been powered by the idle CPU cycles of hundreds of thousands of volunteer machines around the world.

When I became an assistant professor at Harvard University back in 2006, I was excited about the possibility of using distributed computing to run the theoretical calculations needed to discover novel materials.

So in collaboration with the IBM World Community Grid, my group and I started the Harvard Clean Energy Project (CEP), an effort to find novel organic electronic materials capable of converting sunlight into energy. Having consumed more than 35,000 CPU years of computing time, CEP is the largest computational quantum chemistry project carried out to date.

CEP and subsequent projects in my group have taught us how to more efficiently design materials. With our experimental collaborators, we have discovered new types of organic molecules for flow batteries and organic light-emitting diodes.

One of the most satisfying aspects of the project over the years has been the interaction with the project participants in the online forums. Seeing the enthusiasm for scientific discovery among the citizens of the world makes me optimistic that the Internet will continue delivering revolutionary tools that can help us tackle the scientific challenges associated with the 21st century.

Alán Aspuru-Guzik is a professor of chemistry at Harvard University

Suddenly, chemists could turn the scads of numbers—bond angles, dipole moments, and the like—they’d been using to represent molecules into two- and three-dimensional pictures. Rzepa credits in particular the open-source molecular structure viewing program Jmol for harnessing the power of the Web, “showing how you could take computational chemistry software and a Web browser and convert it into rotating pictures.”

The World Wide Web also allowed scientists to harness the power of the personal computers that had entered homes en masse since the 1980s. Most computers spend the majority of their time sitting idle, their processors unused. Instead, scientists realized, these computers could perform small tasks during their downtime, sending results to a central computing center. The collected results could then be used to solve big problems. This strategy is now known as distributed computing.
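The pattern described above can be sketched in a few lines of code. This is a toy illustration only, with hypothetical function names chosen for this example; real projects such as Folding@home use far more elaborate work schedulers and network protocols.

```python
# Toy sketch of distributed computing: a coordinator splits one large job
# into small, independent work units; "volunteer" machines each compute a
# unit on their own; the coordinator combines the returned results.

def make_work_units(data, unit_size):
    """Split a large dataset into independent work units."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]

def volunteer_compute(unit):
    """The small task each idle computer runs (here, a stand-in sum)."""
    return sum(x * x for x in unit)

def coordinator(data, unit_size=4):
    """Farm out units and combine results at a central computing center."""
    units = make_work_units(data, unit_size)
    # In a real project these run in parallel on thousands of machines.
    results = [volunteer_compute(u) for u in units]
    return sum(results)

print(coordinator(list(range(10))))  # prints 285, same as one big sum
```

Because each work unit is independent, the answer is identical no matter how the job is split or in what order the results come back, which is what makes the scheme tolerant of slow or unreliable volunteer machines.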

In 1999, scientists at the University of California, Berkeley, famously launched SETI@home, in which people volunteered to use their home computers to analyze radio telescope data for signs of intelligent life elsewhere in the universe.

Around that time, Vijay Pande was just starting his career as an assistant chemistry professor at Stanford University. “I wanted to do something big,” he says. “The limiting factor in computation was the paucity of computer power.”

Pande recognized the potential for distributed computing to solve complicated computational problems in chemistry. He developed methods to break up large calculations into many small ones to predict how a protein folds. His lab launched Folding@home in October 2000.

Fifteen years later, Folding@home is still going strong, with more than 140,000 participants. It has been joined by numerous other distributed computing projects such as Rosetta@home, which predicts protein structures, and climateprediction.net, which models climate change.

Meanwhile, in the 1990s, academicians and governments began connecting large, geographically distant computer clusters, creating so-called grids. Grids could be used by many different groups and gave scientists unprecedented computing power without having to build their own supercomputing facilities.

These grids’ more commercial cousin, what is now called “the cloud,” also makes use of large systems of linked computers. Unlike systems of linked supercomputers, which require time sharing, the cloud is a tremendously flexible resource, providing as much on-demand computing power as needed, for as long as it’s needed. Largely run by companies such as Amazon or Google, the cloud demands even less technological commitment on the part of a scientist. Pharmaceutical companies have embraced the cloud, purchasing cloud computing time to search drug databases or to perform docking calculations on compound libraries.

Today, most scientists, even academic researchers, agree the future of computational chemistry lies largely in the cloud.

Paul Davie, who manages the Cambridge Crystallographic Data Centre’s site at Rutgers University, sees access to the cloud as a “game changer” for smaller biotech companies. In the cloud, these companies have at hand a wealth of computing resources without having to invest in a large computer. “It’s like renting a good hotel room instead of buying a house,” he says.

Initially, Davie says, pharmaceutical and biotech companies balked at the idea of the cloud, in part because of security concerns. Then, they realized that companies such as Amazon have invested a tremendous amount in security. “Their reputation depends on security resources,” Davie says. “I think that’s been accepted.”

Of course, the cloud can’t solve every chemical problem. Some types of problems, such as lengthy molecular dynamics simulations, will always require frequent communication among the speedy processors of supercomputers.

Still, academic chemists, Pande says, are also realizing the benefits of the cloud’s instant availability for short-term projects. “Universities don’t build their own phone systems,” Pande observes, so “there’s no reason to put together their own [computer] clusters—especially when companies are doing it at extremely low cost.” ◾

