Analytical Chemistry

Spellbinding Data

New online databases combine structure and binding of protein-ligand complexes

by ELIZABETH K. WILSON, C&EN WEST COAST NEWS BUREAU
July 11, 2005 | A version of this story appeared in Volume 83, Issue 28


DATA, BASICALLY
Photo at left: Smith (from left), Benson, and Carlson assembled Binding MOAD. Photo at right: Shaomeng Wang (front) and (from left) Renxiao Wang, graduate student Yipin Lu, and postdoc Xueliang Fang created PDBbind.
Credit: UNIVERSITY OF MICHIGAN PHOTOS

Nothing's more frustrating to a researcher than having access to a pile of information but no easy way to use it.

Those who develop computational strategies to find new HIV-thwarting protease inhibitors or cancer-battling tyrosine kinase inhibitors know the feeling all too well. They want to anticipate and screen potential drugs' ability to interlock with, and inhibit the action of, proteins.

There's a great body of knowledge that could help them--the experimentally determined three-dimensional structures of known protein-ligand complexes and explicit details about the strength of their bonds--but it's scattered among thousands of research papers spanning decades.

In what could be described as a labor of love, two research groups have painstakingly gathered this dispersed information into two large online databases encompassing all known protein-ligand complex structures and their binding information.

Assistant medicinal chemistry professor Heather A. Carlson and associate internal medicine professor and assistant medicinal chemistry professor Shaomeng Wang, both at the University of Michigan, and their teams independently devoted several years of their labs' time to scouring all the available protein-ligand complex literature for binding data.

Carlson's Binding MOAD (Mother Of All Databases) (Prot. Struct. Func. Gen., published online June 21, http://dx.doi.org/10.1002/prot.20512) and Wang’s PDBbind (J. Med. Chem. 2004, 47, 2977; 2005, 48, 4111) are an order of magnitude larger than any other of their kind, each with structure and binding data for more than 1,000 complexes. Although they're similar, Binding MOAD and PDBbind have different features that complement each other, observers say.

With the databases, which are free to academic researchers and nonprofit agencies, scientists can dispense with time-consuming efforts to cull a few structures from library searches. Instead, they can devote themselves to creating models that predict the best binders or to discovering patterns in the binding properties of different complexes.

"What Heather and Shaomeng have done is an incredible service to all of us," says Alexander Tropsha, professor in the School of Pharmacy at the University of North Carolina, Chapel Hill.

"For us, it's a terrific research tool," says University of California, San Francisco, pharmaceutical chemistry professor Brian K. Shoichet, who uses Binding MOAD.

Although it's easy to develop a docking or virtual screening program that's good at predicting the behavior of a small subset of complexes, Shoichet notes, these databases challenge researchers to widen their horizons. "Now there's no excuse not to test things out on a really diverse set of compounds," he says.

The structures in Binding MOAD and PDBbind are mined from the Protein Data Bank (PDB), the world's repository for large biomolecule structural information. PDB currently contains around 31,500 protein structures, and thousands are added every year.

A number of public databases, such as BIND, Binding DB, KiBank, and PDSP Ki Database, contain large amounts of binding information. But what's been missing is a large database that interweaves binding affinity data and 3-D structures, both of which are necessary for developing models that predict protein-ligand binding. Until recently, resources were limited to a handful of similarly named efforts, including the Protein Ligand Database and the Ligand-Protein Database, which contain structure and binding data on several hundred protein-ligand complexes.

Unbeknownst to each other, Carlson and Wang, who have similar research interests and who both have appointments at the University of Michigan's School of Pharmacy, decided to tackle the problem. Wang's team, including researcher Renxiao Wang (no relation), and Carlson's team, including graduate students Mark L. Benson and Richard D. Smith, winnowed the structures in PDB down to about 6,000 complexes. The groups examined each of the 6,000 papers referenced by the PDB listings by hand, looking for binding data.

Their synergistic efforts came to light when Wang's graduate student, on whose committee Carlson served, described PDBbind during an oral exam. "I said, 'Shaomeng, I think we have to talk--we are doing almost the same thing,' " Carlson recalls.

The overlap resulted in unusually rigorous quality control: The two groups agreed to exchange their data, allowing them to verify it in the process. "We thought it would be beneficial to both groups and also for the general science community," Wang says.

Each group labored to separate valid ligands from invalid ones, such as salts and buffers. Their focuses are different. For example, PDBbind eliminates any complexes without binding data. Binding MOAD groups proteins into families, selecting the tightest binder as a representative.
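
As a rough illustration of the two curation rules just described, the Python sketch below applies each one to a handful of made-up records. The record fields, identifiers, family labels, and affinity values are all hypothetical and are not taken from either database. The sketch also assumes that a lower dissociation constant means tighter binding and that a family with no binding data simply gets no representative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Complex:
    pdb_id: str              # hypothetical identifier, not a real PDB entry
    family: str              # hypothetical protein family label
    kd_nm: Optional[float]   # dissociation constant in nM; None if no binding data


def keep_with_binding_data(complexes: List[Complex]) -> List[Complex]:
    """PDBbind-style rule: drop any complex that has no binding data."""
    return [c for c in complexes if c.kd_nm is not None]


def tightest_per_family(complexes: List[Complex]) -> Dict[str, Complex]:
    """Binding MOAD-style rule: one representative per family, the tightest binder.

    Assumption: families with no measured affinity are skipped entirely.
    """
    best: Dict[str, Complex] = {}
    for c in complexes:
        if c.kd_nm is None:
            continue
        current = best.get(c.family)
        # A smaller Kd is treated as tighter binding.
        if current is None or c.kd_nm < current.kd_nm:
            best[c.family] = c
    return best


# Made-up example records for illustration only.
records = [
    Complex("AAA1", "protease", 5.0),
    Complex("AAA2", "protease", 0.2),
    Complex("BBB1", "kinase", None),
    Complex("BBB2", "kinase", 40.0),
    Complex("CCC1", "lectin", None),
]

with_data = keep_with_binding_data(records)
representatives = tightest_per_family(records)
```

With these records, the first rule keeps three of the five complexes, and the second names one representative each for the protease and kinase families. The real databases apply considerably more curation than this, including the separation of valid ligands from salts and buffers.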

THE DAY IT WAS POSTED online, Binding MOAD, which is funded by a Beckman Foundation Young Investigator Award, grabbed 10 users in six different countries. Since its first incarnation was released in 2004, PDBbind has attracted nearly 400 users worldwide.

John Mitchell, creator of the Protein Ligand Database and molecular informatics lecturer at Cambridge University, says the databases will increase the reliability of computational affinity prediction.

Carlson and Michael Gilson, a professor at the University of Maryland Biotechnology Institute and creator of the Binding DB database, are considering linking their two databases. This step would allow researchers to, say, analyze binding data in Binding DB and corresponding structural data in Binding MOAD.

In addition to performing regular updates, Carlson and Wang say they intend to expand their databases' capabilities. Carlson wants to incorporate protein flexibility into Binding MOAD to help researchers study how motions affect binding affinity. Wang plans to add binding assay details as well as protein-protein and protein-nucleic acid complexes to PDBbind.

Researchers such as Gilson, Mitchell, and Carlson predict that the future of data mining lies in teaching computers to do the work for them. With a sophisticated language-processing program, a computer could intelligently scan papers, including tables and graphs.

"If a computer could take a PDF file and pull out information like binding data, temperature, and pH, that would make our lives so much easier," Carlson says.
