Issue Date: May 16, 2011
Learning To Love The Cloud
The phenomenon of cloud computing, essentially outsourcing jobs that were once handled by supercomputers or smaller clusters of on-site computers to Internet-based services run by outside companies or facilities, has swept the pharmaceutical industry.
The rest of the scientific community, however, has been slower to embrace the cloud, questioning whether it can be an effective tool for computational research. Academic and government scientists have been wary of the significant alterations they’d need to make to their software codes, which were originally designed to be used on clusters or supercomputers.
And a cloud, which could contain tens of thousands of individual processors, serves some types of computational problems better than others. It lends itself well, for example, to the simultaneous screening of millions of compounds for drug discovery. But for other problems that require a lot of communication between processors, such as calculating the electronic structure of large molecules, the cloud performs poorly compared with in-house clusters.
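Why screening suits the cloud while tightly coupled calculations do not can be sketched with a deliberately simple toy model. It assumes, purely for illustration, that a job's work divides evenly across processors and that every added processor contributes a fixed amount of communication overhead. The two overhead values are hypothetical stand-ins for a fast in-house interconnect and a slower cloud network, not measurements, and the function names are invented for this sketch.

```python
# Toy scaling model (illustrative only): the work splits evenly across
# processors, but each added processor adds a fixed communication overhead.
# The overhead values used below are hypothetical, not measured.


def wall_clock_hours(work_cpu_hours: float, n_procs: int, comm_hours_per_proc: float) -> float:
    """Compute time shrinks as 1/n; communication overhead grows linearly with n."""
    return work_cpu_hours / n_procs + comm_hours_per_proc * n_procs


def best_time(work_cpu_hours: float, comm_hours_per_proc: float, max_procs: int) -> tuple[float, int]:
    """Return (shortest wall-clock hours, processor count achieving it) over 1..max_procs."""
    return min(
        (wall_clock_hours(work_cpu_hours, p, comm_hours_per_proc), p)
        for p in range(1, max_procs + 1)
    )


WORK = 2400.0  # CPU-hours, e.g. 100 CPUs busy for a day

# Independent tasks (compound screening): no communication, so more processors always help.
screen_t, screen_p = best_time(WORK, 0.0, 10_000)

# Tightly coupled job: hypothetical per-processor overheads for a fast
# in-house interconnect versus a slower cloud network.
cluster_t, cluster_p = best_time(WORK, 0.001, 10_000)
cloud_t, cloud_p = best_time(WORK, 0.01, 10_000)

print(f"screening: {screen_t:.2f} h on {screen_p} processors")
print(f"coupled, cluster network: {cluster_t:.2f} h on {cluster_p} processors")
print(f"coupled, cloud network:   {cloud_t:.2f} h on {cloud_p} processors")
```

Under these made-up numbers, the screening job finishes in about a quarter of an hour on all 10,000 processors, while the coupled job bottoms out near 3 hours with the fast network and near 10 hours with the slow one, using only a few hundred processors; beyond that point, adding more of the cloud's processors only makes it slower.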
The many traditional bottlenecks to swift, unfettered computation include machine access, communication between processors, and cost. The cloud, with its instant availability, almost limitless processors, and low overhead, potentially offers greater data management capacity, higher speed, and lower cost.
But at least for now, the applications for chemists are limited to very specific tasks, says Shane Canon, who directs research on Magellan, a cloud test facility at Lawrence Berkeley National Laboratory’s National Energy Research Scientific Computing Center. Magellan also includes a cloud test bed at Argonne National Laboratory’s Argonne Leadership Computing Facility. The project received $32 million from the Department of Energy through the American Recovery & Reinvestment Act of 2009.
Still, “the response from the science community was tepid,” says Armando Fox, an adjunct associate professor of electrical engineering and computer sciences at the University of California, Berkeley. But proponents of cloud computing, including Fox, are working to change that.
Fox recently described the scientific promise of the cloud in Science (DOI: 10.1126/science.1198981), noting that new cloud-computing hardware is more compatible with scientific applications.
Marina G. Guenza is one of the few but committed cloud devotees in the academic chemistry community. “Cloud computing mimics the easy access for the user that is typical of an Internet website, where the user is not aware of which specific machine has the program loaded and running,” says Guenza, a chemistry professor at the University of Oregon who was an architect of the university’s Applied Computational Instrument for Scientific Synthesis. ACISS, as the cloud test facility has been dubbed, was built with a nearly $2 million grant from the recovery act.
Scientists from all corners of the university, including chemists, physicists, and biologists, have applied for access to the ACISS cloud. Guenza thinks the facility will allow her lab to perform molecular dynamics simulations of coarse-grained macromolecular systems containing larger numbers of molecules and spanning a range of timescales. These simulations, she adds, “are useful for the study of important systems with applications in emerging technology and in biology.”
“We envision ACISS becoming a major tool for our lab,” Guenza says. “Through ACISS, our trajectory data and codes will be easily shared among the members of our group and remotely with our collaborators.”
Geraldine Richmond, a University of Oregon chemistry professor who is also participating in ACISS, performs molecular dynamics simulations of molecular processes at liquid surfaces. Her lab’s computational methods—which include a mix of classical, ab initio, and density functional theory—have very high central processing unit (CPU) and memory requirements.
“We are delighted to have access to the ACISS cloud,” Richmond says. For her research group, she adds, the cloud will allow for much higher throughput. “The cloud will allow our simulations to run at much higher speeds than our current computer cluster, enabling us to get data for our research that was previously unobtainable due to computational limits.” In fact, she says, if her lab were given the time and access, the cloud could replace its existing computing cluster. “Currently, we are limited by available computational resources.”
In addition to government programs, large companies such as Microsoft, Google, and Amazon are encouraging scientists to experiment with the cloud’s potential by funding cloud test bed programs and offering grants. The recently announced Google Exacycle for Visiting Faculty grant program, for instance, offers university researchers access to the company’s cloud for intensive computational research.
But the phenomenon has been relatively slow to catch on among scientists. Only a few computational chemistry codes, such as the NWChem package and Schrödinger’s software, are being tailored to allow computation in the cloud. For now, cloud computing makes the most sense for scientists, whether commercial or academic, who need to perform large but infrequent bouts of data crunching.
“Our analysis shows that where cloud systems are going to be competitive is in a situation where there’s a user that has large needs that are very infrequent,” Canon says. “If workloads are extremely ‘bursty,’ it’s going to look attractive.” Acquiring and operating a large system is expensive, he notes.
“You can think of cloud as a form of outsourced information technology that you can access at whatever reasonable scale only at the moment you want it,” elaborates Peter S. Shenkin, vice president of Schrödinger. “Imagine that twice a year you want to screen a list of a million drug candidates against a target, a project that’s going to take over 100 CPUs for a day. If you’re using the cloud, you don’t have to build a data center or worry about a system administrator. You just pay for computational use as you need it.”
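Canon's "bursty" argument and Shenkin's twice-a-year screening scenario can be checked with a rough rent-versus-own calculation. This is a minimal sketch: the per-CPU-hour cloud price and the yearly cost of owning and running a 100-CPU cluster are hypothetical placeholders, not figures from the article or from any provider, and the helper names are invented for illustration.

```python
# Rent-versus-own sketch for a "bursty" workload. All dollar figures are
# hypothetical placeholders, not quotes from any cloud provider.

HOURS_PER_YEAR = 24 * 365


def annual_cloud_cost(cpu_hours_per_year: float, price_per_cpu_hour: float) -> float:
    """Pay-as-you-go: cost scales only with the CPU-hours actually used."""
    return cpu_hours_per_year * price_per_cpu_hour


def break_even_utilization(owned_annual_cost: float, n_cpus: int, price_per_cpu_hour: float) -> float:
    """Fraction of the year an owned n_cpus cluster must be busy before renting
    the same CPU-hours from a cloud would cost as much as owning the cluster."""
    return owned_annual_cost / (n_cpus * HOURS_PER_YEAR * price_per_cpu_hour)


# Shenkin's scenario: twice a year, 100 CPUs for one day.
N_CPUS = 100
bursty_cpu_hours = 2 * N_CPUS * 24  # 4,800 CPU-hours per year

PRICE = 0.10           # hypothetical dollars per CPU-hour
OWNED_COST = 50_000.0  # hypothetical yearly cost to buy, power, and administer 100 CPUs

cloud_cost = annual_cloud_cost(bursty_cpu_hours, PRICE)
actual_util = bursty_cpu_hours / (N_CPUS * HOURS_PER_YEAR)
needed_util = break_even_utilization(OWNED_COST, N_CPUS, PRICE)

print(f"cloud cost per year: ${cloud_cost:,.0f}")
print(f"owned-cluster utilization: {actual_util:.2%} (break-even at {needed_util:.0%})")
```

With these placeholder prices, a cluster bought for this job would sit idle more than 99% of the year, far below the utilization at which owning it starts to pay off; the conclusion holds across a wide range of plausible prices because the workload is so intermittent.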
Cloud purveyors, recognizing the potential boon of luring scientists to their services, are also now attempting to configure their systems for high-performance computing. Companies such as Amazon and SGI are now offering access to clouds that contain graphics processing units, or GPUs. Once the province of video gamers, GPUs, whose many processors give them lightning calculation speeds, are now a popular tool for computational scientists (C&EN, Nov. 1, 2010, page 27).
“Systems like SGI’s Cyclone cloud behave essentially like a cluster, but it’s offered in a cloud model,” Canon says. “On those systems, the performance is probably going to be what you get on traditional clusters.”
Shenkin notes that some problems, such as molecular dynamics simulations, in which participating computers need to check in with one another regularly, do not always run efficiently in the cloud. But because such simulations can increasingly be run on a single computer equipped with fast GPUs, it is becoming more feasible to conduct them in the cloud.
Using commercial software in the cloud also has the potential to circumvent some of the complications of licensing, Shenkin explains. Many companies, including Schrödinger, use “floating licenses” for their software, which allow a customer to run a set number of copies at the same time. If, for example, a customer holds 10 such licenses for Schrödinger’s docking program Glide and has a large job, it could run 10 simultaneous jobs on 10 different computers. But dividing the screening of, say, 2 million compounds into only 10 jobs means the calculation will take a month to complete.
“And some companies have only one license,” Shenkin says. It could take such a company nearly the whole year to do the same screening job on a single computer with a single license. But with the cloud’s thousands of available processors, the same company could get that year’s worth of computation power by running 365 processors for a day, or one processor for a year, or anything in between, at the same cost. Shenkin says Schrödinger’s partnership with cloud middleware purveyor Cycle Computing has already attracted several clients.
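The arithmetic behind Shenkin's licensing example can be sketched as follows. The sketch assumes that each license runs one job on one processor at a constant rate (derived from the article's figure of 10 jobs finishing 2 million compounds in about a month), and that cloud time is billed linearly per CPU-day at a hypothetical price. How licensing itself is charged in the cloud is not modeled, since the article does not say.

```python
# Wall-clock time versus cost for the Glide screening example.
# Illustrative assumptions: one license runs one job on one processor,
# throughput is constant, and cloud time is billed linearly per CPU-day
# at a hypothetical price. Cloud licensing terms are not modeled.

COMPOUNDS = 2_000_000
# Derived from the article: 10 simultaneous jobs take about 30 days.
PER_PROCESSOR_PER_DAY = COMPOUNDS / (10 * 30)  # roughly 6,667 compounds per day


def wall_clock_days(n_processors: int) -> float:
    """Days to screen every compound with n_processors working in parallel."""
    return COMPOUNDS / (PER_PROCESSOR_PER_DAY * n_processors)


def cloud_cost(n_processors: int, price_per_cpu_day: float) -> float:
    """Linear billing: n processors for t days cost n * t CPU-days."""
    return n_processors * wall_clock_days(n_processors) * price_per_cpu_day


PRICE_PER_CPU_DAY = 2.40  # hypothetical dollars per CPU-day

for n in (1, 10, 300):
    days = wall_clock_days(n)
    cost = cloud_cost(n, PRICE_PER_CPU_DAY)
    print(f"{n:>3} processors: {days:6.1f} days, ${cost:,.2f}")
```

Under these assumptions, a single license needs about 300 days, consistent with "nearly the whole year," while 300 processors finish in a day. The bill is identical in every case because the total number of CPU-days never changes; only the wall-clock time does.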
Fox envisions that more scientists will eventually embrace the cloud as its architecture evolves to better suit scientific work. “I understand that many scientists are skeptical that the cloud will replace traditional computing,” he says. “But for every researcher doing big science projects, there are a hundred doing medium-sized projects who wish they had the resources of the cloud. I think this is going to create a new middle class.”
- Chemical & Engineering News
- ISSN 0009-2347
- Copyright © American Chemical Society