Issue Date: December 24, 2012
Chemical Structure: Databases Grow In Popularity
If journal citations measure a piece of work’s impact on the scientific community, then it’s clear what’s been most important to chemists during the past decade: molecular structural data.
“The Cambridge Structural Database: A Quarter of a Million Crystal Structures and Rising” is a treatise published in 2002 on the exponentially expanding collection of small-molecule crystal structures curated by the Cambridge Crystallographic Data Centre (CCDC) in England (Acta Cryst. 2002, B58, 380). The paper had garnered 4,689 citations as of Dec. 12, the most of any paper in the chemical sciences published in 2002, according to an analysis conducted for C&EN by the American Chemical Society’s Chemical Abstracts Service (CAS).
The Cambridge Structural Database (CSD) was designed to house molecular structures and property data of small molecules of interest to chemists and life scientists, including organic and organometallic compounds of up to 1,000 atoms. Since its inception in 1965, the database has grown from a fledgling project managed by a few staff members who collected data by hand to a state-of-the-art facility with dozens of team members who design and manage software for molecular searches, analysis, and visualization.
CSD complements, and sometimes overlaps with, three other major structural databases, also started in the 1960s and 1970s. These are the Protein Data Bank, managed by Rutgers University and the University of California, San Diego; the Inorganic Crystal Structure Database, managed by FIZ Karlsruhe–Leibniz Institute for Information Infrastructure, in Germany; and CRYSTMET, a database for metals and alloys, managed by Toth Information Systems, in Canada.
In the 2002 paper, CCDC’s then-director Frank H. Allen predicted that CSD would collect half a million crystal structures by 2010. It came in just under the wire, reaching that number on Dec. 1, 2010. The database now contains more than 630,000 structures and is growing by more than 40,000 structures each year. “We are clearly heading for a million rather rapidly,” says Allen, who retired from the CCDC directorship in 2008.
By comparison, the CAS Registry, the world’s largest collection of publicly disclosed substance information, ended 2002 with information on slightly more than 20.7 million small molecules, Schenck notes. “Reflecting the phenomenal growth in chemistry research worldwide, the CAS Registry added its 70 millionth small molecule in early December,” he adds.
“I’m not surprised CSD is being so heavily used and cited, as it does contain so much information about molecular structure and intermolecular interactions,” says Sarah L. (Sally) Price, a professor of physical chemistry at University College London.
Price, who has cited CSD dozens of times in her publications, says her ability to retrieve crystal structures of families of related molecules and then analyze them using CCDC’s software has been “an essential starting point for looking at supramolecular behavior.”
CSD data have innumerable other uses, Allen notes, for scientists studying conformer generation, protein-ligand docking, and solid-state phenomena such as drug polymorphism and cocrystallization.
“The types of papers that use and cite CSD continue to surprise and delight us,” says Colin Groom, current director of CCDC.
- Chemical & Engineering News
- ISSN 0009-2347
- Copyright © American Chemical Society