Physical Chemistry

Turbocharging Computers

New computer architectures, algorithms, and hardware increase computing power for computational chemists

by ELIZABETH K. WILSON, C&EN WEST COAST NEWS BUREAU
September 27, 2004 | A version of this story appeared in Volume 82, Issue 39

HOMEBOUND
Credit: COURTESY OF V. PANDE
The PC grid folding@home calculated ligand binding energies for the protein FKBP.

Back in the day, a supercomputer was a big, self-contained box, usually built by Cray or IBM. It cost an arm and a leg, and groups of scientists had to fight for precious use time.

Today, high-performance computing machines come in all sizes and designs, from the more traditional supercomputers like IBM's Blue Gene to the wildly popular and cheap Linux clusters, and more recently, huge, loosely connected "grids" of desktop PCs. Computer use time is still coveted, but scientists are more likely to have a high-performance machine at their own company, in their own department, or even in their own lab.

And as technology advances, demand increases, whether the problems involve molecular kinetics or thermodynamics. Particularly in the pharmaceutical industry, researchers want and need ever-more-turbocharged computing power to model the behaviors of great libraries of compounds or to determine how proteins fold.

To keep up with the need for faster and more accurate computing, scientists are putting more muscle--and brains--into their machines.

"As large leaps are made in computing power, we are able not only to do longer and bigger calculations, but we also have the opportunity to reevaluate our basic approach to a scientific problem," said Wendy Cornell, a director in the molecular systems group at Merck. She was coorganizer of a symposium on the topic that was held last month at the American Chemical Society national meeting in Philadelphia. The symposium was sponsored by the Division of Computers in Chemistry.

Dozens of chemists and computer scientists turned out to hear speakers from all walks of high-performance computing. Attendees learned that computers are being beefed up with new chips that accelerate calculation speed and new algorithms that replace age-old strategies by solving problems with fewer calculations. New physical architectures and software are poised to take advantage of massively parallel machines like Blue Gene. And the Linux cluster continues to evolve.

BLUE GENE PREP
Credit: IBM IMAGES
Simulations of rhodopsin embedded in a lipid bilayer (top) test the capabilities of Blue Gene prototypes and of Blue Matter software; IBM simulations of the trpzip2 protein (bottom) helped solve an experimental mystery.

In just the past five years, Linux clusters have become major supercomputing players. Consisting of groups of inexpensive machines cobbled together, clusters in many ways have supplanted traditional supercomputers. As the name implies, clusters run on the open-source operating system Linux. Many pharmaceutical companies have at least one large cluster, ranging from tens to hundreds of machines. Vendors are now building machines such as blade servers to facilitate cluster building.

CLUSTERS, THOUGH, are not without their problems. They're bulky, they get hot, and they take a lot of effort to maintain. The individual machines, known as nodes, still can't talk to each other as quickly as nodes in a traditional supercomputer--although new technologies are narrowing this gap. Keith Milligan, scientific systems administrator at Locus Pharmaceuticals, ran into numerous challenges as he and his colleagues tried to triple the size of their company's cluster. As a case in point, a program they used for job scheduling, known as OpenPBS, balked at dealing with any dead or dysfunctional machines and sometimes refused to schedule jobs at all. "We discovered fairly early on that once you expanded clusters from 300 to over 1,000 machines, OpenPBS couldn't scale up," Milligan said.

Milligan, along with Matthew Clark, director of scientific computation, set out to rework the cluster's architecture, settling on a design they call Titan and replacing OpenPBS with Sun Grid Engine. Additionally, a program called Run Manager, developed at Locus, lets scientists submit and manage their simulation jobs through a Web browser. "Computational chemists love it," Milligan said.
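To make that workflow concrete, here is a minimal sketch, not Locus's Run Manager itself, of how a front end might hand a job to Sun Grid Engine from Python: write a short job script and submit it with qsub. The job name, directives, and simulation command are illustrative assumptions.

```python
# A minimal sketch (not Locus's Run Manager) of handing a simulation job to
# Sun Grid Engine: write a throwaway job script, then submit it with qsub.
import subprocess
import tempfile

def submit_job(command: str, job_name: str = "md_run") -> str:
    """Write a temporary SGE job script, submit it, and return qsub's output."""
    script = "\n".join([
        "#!/bin/sh",
        "#$ -N " + job_name,   # job name shown by qstat
        "#$ -cwd",             # run from the submission directory
        "#$ -j y",             # merge stdout and stderr
        command,
        "",
    ])
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script)
        path = f.name
    result = subprocess.run(["qsub", path], capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Example (assumes an SGE cluster and a hypothetical simulation binary):
# print(submit_job("./run_simulation --config params.in", job_name="ligand_scan"))
```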

The PC grid is a relative newcomer to the high-performance computing world. Most desktop computers sit idle at times. Tapping this downtime to perform tasks was an idea researchers had "just an inkling about five years ago," said symposium coorganizer Vijay S. Pande, chemistry professor at Stanford University.

The individual processors inside today's desktop computers are as fast as those in supercomputers. What can be ramped up is the number of processors and the speed with which they talk to each other. "Those are the variables you can play with," Pande said. PC grids exploit the former.

Pande's folding@home project exemplifies the PC grid concept: More than 150,000 participating desktop computers around the world are connected through the Internet, simulating the folding of small proteins. Though this grid has far more processors than a typical 1,000-processor supercomputer, there's a trade-off in communication between the processors, which is much faster on a supercomputer.

Additionally, you can't just throw more processors at a problem to make the computer solve it faster. Some problems are difficult to break up into little pieces that can run independently of each other. "It's hard to split a problem involving 1,000 atoms among 100,000 computers," Pande said. He likens it to having a baby in one month by splitting the task among nine women.

Pande's group has devised a way to partition such a calculation among numerous processors by exploiting the randomness of individual fluctuations: many short simulations, run independently, can provide almost as much information as one long simulation.
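A minimal sketch of the statistical idea, not the folding@home code itself: if folding is a rare, single-barrier event with rate k, each short trajectory of length t folds with probability roughly 1 - exp(-kt), so counting folding events across many independent short runs recovers k without any single run lasting the full folding time. The rate and trajectory length below are illustrative assumptions.

```python
import math
import random

def short_trajectory_folds(k_true: float, t_short: float) -> bool:
    """Stand-in for one short MD run: did folding occur within t_short?"""
    return random.random() < 1.0 - math.exp(-k_true * t_short)

def estimate_rate(k_true: float, t_short: float, n_traj: int) -> float:
    """Estimate the folding rate from the fraction of short runs that folded."""
    folded = sum(short_trajectory_folds(k_true, t_short) for _ in range(n_traj))
    return folded / (n_traj * t_short)   # valid while k_true * t_short << 1

k_true = 1.0e-4    # "true" folding rate, events per nanosecond (assumed)
t_short = 10.0     # each trajectory covers only 10 ns of simulated time
print(f"estimated rate: {estimate_rate(k_true, t_short, n_traj=100_000):.2e}")
```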

CLEARLY PARALLEL
Credit: CLEARSPEED PHOTO
ClearSpeed's CS301 chip with dozens of parallel processors accelerates the performance of desktop PCs.

For example, consider the drug target protein FKBP. Molecules that inhibit FKBP are immunosuppressants and therefore potentially valuable in preventing transplant organ rejection. There's lots of experimental data on FKBP, so it was a good system for Pande's group to use to test their methodology. They calculated the free energy of ligands binding to FKBP using folding@home, and their results agreed well with experiment.
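The article does not say which estimator Pande's group used; one textbook route to such binding free energies is Zwanzig's free energy perturbation formula, dG = -kT ln⟨exp(-dU/kT)⟩, averaged over configurations sampled in a reference state. The sketch below applies that formula to synthetic energy differences standing in for per-snapshot data harvested from many distributed trajectories.

```python
import math
import random

KT = 0.593  # kT in kcal/mol at roughly 298 K

def fep_free_energy(delta_u_samples, kt=KT):
    """Zwanzig estimator: dG = -kT * ln(<exp(-dU/kT)>) over reference-state samples."""
    weights = [math.exp(-du / kt) for du in delta_u_samples]
    return -kt * math.log(sum(weights) / len(weights))

# Synthetic dU samples (kcal/mol); in a real study these would come from
# snapshots of many independent simulations, e.g. collected on a PC grid.
samples = [random.gauss(2.0, 1.0) for _ in range(10_000)]
print(f"estimated dG = {fep_free_energy(samples):.2f} kcal/mol")
```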

One big barrier to simulating biomolecular dynamics is timescale. Proteins fold on the order of microseconds or even milliseconds. But to resolve bond vibrations, you need time steps on the order of femtoseconds. To simulate a single folding event would require billions of time steps, and you'd then need to run hundreds of those to get adequate statistics. "This is really the reason protein folding is so hard to study," said William C. Swope, a computational chemist at IBM Almaden Research Center in California.
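The arithmetic behind that statement is simple; the sketch below assumes a 1-femtosecond integration step, a typical choice for resolving bond vibrations.

```python
# Time steps needed to reach folding timescales at a 1 fs integration step.
dt_fs = 1.0
for label, total_fs in [("1 microsecond", 1.0e9), ("1 millisecond", 1.0e12)]:
    print(f"{label}: {total_fs / dt_fs:.0e} time steps")
# 1 microsecond: 1e+09 time steps
# 1 millisecond: 1e+12 time steps
```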

IBM is gearing up to complete its ambitious massively parallel Blue Gene supercomputer, which, when finished, will be the fastest, most powerful machine in the world. Its major focus will be the thermodynamics and kinetics of protein folding.

To that end, scientists have built a number of Blue Gene racks that will serve as modules for the full-sized Blue Gene. In fact, two of those prototypes, a two-rack and a four-rack, are on the list of the world's top 10 supercomputers.

Swope and colleagues, including computational chemist Jed Pitera, are using molecular dynamics applications that will take advantage of Blue Gene's massively parallel architecture.

ALGORITHMIC
As the number of computer processors increases, the benefit of using the neutral territory method becomes greater.

TO FURTHER STRENGTHEN the connections between simulation and experiment, the group examined the melting of trpzip2, a peptide developed at Genentech. Chemistry professor Martin Gruebele at the University of Illinois, Urbana-Champaign, couldn't explain the different melting temperatures he obtained with different spectroscopic techniques. The IBM simulations showed that the various experiments were probing different parts of the protein's structure, which were stable at different temperatures. The group is also studying the effects of membrane composition on protein dynamics and activation using rhodopsin embedded in a lipid bilayer as a test case. This project, led by Michael Pitman at the IBM T. J. Watson Research Center in Yorktown Heights, N.Y., employs one of the Blue Gene prototypes as well as software being developed for Blue Gene, called Blue Matter.

Meanwhile, at Novartis Institutes for Biomedical Research, the strategy is to embrace all forms of supercomputing platform, whether a large traditional supercomputer, a Linux cluster, or a PC grid. Research scientist Dmitri Mikhailov espouses such diversity "so we're not putting all our bets on a single platform," he said.

TRIMMED DOWN
The region from which a processor must import data (blue) to a cube (green) is much smaller in the neutral territory method (right) than in a traditional method (left).

The informatics group is developing ways to categorize computational methods by the hardware they require. Some demand an expensive supercomputer, but other tasks are ideally suited to PC grids, which are the cheapest option of all.

As always, speed is the name of the game, and improvements in one aspect of computing can have a big effect. A new algorithmic strategy for running molecular dynamics simulations of biomolecules on parallel computer systems greatly reduces the amount of data that must be communicated between processors, thus speeding computation.

As David E. Shaw, chief scientist at D. E. Shaw Research & Development, explained at the meeting, these simulations typically require the calculation of the forces exerted on each atom by all other atoms that lie within a given "interaction radius." These calculations are so time-consuming that the simulation of, say, a protein swimming in water for a period of a millisecond would require a great number of processors working together. But the time required for all these processors to communicate with each other limits this approach's effectiveness.

A common technique for such simulations divides up the space containing the molecule and water into a bunch of cubes. Each processor is responsible for calculating all forces acting on the atoms within a specific cube. To do this, the processor must "import" data from a roughly hemispherical region containing about half of those atoms that lie within the interaction radius. "It's like entertaining the other atoms within your own home," Shaw explained.

But in Shaw's new algorithm, which he calls the neutral territory method, each pair of atoms instead meets within some cube in which neither atom lives. The meeting places are chosen in such a way that the amount of data imported by each processor is much smaller, allowing a much larger number of processors to share the computational burden.
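A back-of-envelope comparison, not Shaw's exact analysis, illustrates why the advantage grows with processor count. Assume cubic boxes of side b and interaction radius R: the traditional scheme imports roughly half a sphere of radius R, while the neutral territory scheme imports a "plate" of boxes in the home box's own plane plus a "tower" of boxes along the remaining axis. The counts below are continuum volume estimates with illustrative numbers.

```python
import math

def half_shell_boxes(R: float, b: float) -> float:
    """Traditional import: roughly half a sphere of radius R, counted in boxes."""
    return (2.0 * math.pi / 3.0) * (R / b) ** 3

def neutral_territory_boxes(R: float, b: float) -> float:
    """Neutral territory import: a plate (disk of radius R, one box thick)
    plus a tower (column of height 2R), counted in boxes."""
    plate = math.pi * (R / b) ** 2
    tower = 2.0 * (R / b)
    return plate + tower

R = 12.0  # interaction radius in angstroms (illustrative)
for b in (12.0, 6.0, 3.0):  # smaller boxes mean more processors sharing the work
    print(f"box side {b:4.1f} A: half-shell ~{half_shell_boxes(R, b):6.1f} boxes, "
          f"neutral territory ~{neutral_territory_boxes(R, b):6.1f} boxes")
```

With large boxes (few processors), the two schemes import comparable amounts of data; as the boxes shrink relative to the interaction radius, the half-sphere term grows much faster, which is the trend shown in the accompanying figure.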

Another strategy for speeding computation comes from ClearSpeed, a company that produces chips that accelerate a computer's ability to perform calculations. As Simon McIntosh-Smith, director of architecture at ClearSpeed, explained, though Linux clusters are a boon to scientists, they still want more. "What they'd really like is their own cluster, so they can use it all the time," McIntosh-Smith remarked. "But it's not practical because it's too expensive and just takes up too much space."

ClearSpeed's chip accelerators aim to give a desktop PC the capabilities of a small cluster. The trick, McIntosh-Smith said, is massive data parallelism. Each chip contains a tightly packed collection of dozens of parallel processors. The processors themselves don't run that fast, but the sheer number of them provides a problem-solving advantage.
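As a loose software analogy for that kind of data parallelism, not a model of the CS301 itself, the sketch below applies the same pairwise-energy formula to a whole array of distances at once rather than looping over them one at a time; the hardware achieves the same effect with many physical processing elements. The potential parameters and distances are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
r = rng.uniform(3.0, 10.0, size=1_000_000)   # synthetic pair distances, angstroms

def lj_scalar(distances, eps=0.2, sigma=3.4):
    """One pair at a time: the serial style a single core would use."""
    energies = []
    for d in distances:
        sr6 = (sigma / d) ** 6
        energies.append(4.0 * eps * (sr6 * sr6 - sr6))
    return energies

def lj_data_parallel(distances, eps=0.2, sigma=3.4):
    """Same formula applied to every element of the array in one operation."""
    sr6 = (sigma / distances) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

print(lj_data_parallel(r)[:3])
```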

ClearSpeed tested its chips on a Linux-based PC using the open-source molecular dynamics code GROMACS. With the chips, GROMACS ran four times faster than with conventional chips, McIntosh-Smith said.

The company is set to announce a new line of chips, with even more acceleration power, on Oct. 6 at the Fall Processor Forum in San Jose, Calif.

"The increased performance comes from faster clock speed and from even more data parallelism," McIntosh-Smith said. "That's our road map for the future."
