CHEMISTS ARE undeniably creative, but synthetically speaking they're fond of the familiar. That's one take-home message from a new study of the structures of nearly all the organic compounds in the Chemical Abstracts Service (CAS) Registry.
The analysis shows that "a relatively small number of framework shapes dominates organic chemistry," says research scientist Alan H. Lipkus, who carried out the study with Qiong Yuan, Karen A. Lucas, and other colleagues at CAS, a division of the American Chemical Society (J. Org. Chem., DOI: 10.1021/jo8001276). The results suggest that some areas of chemical "structure space" may warrant more intensive exploration.
"We found more than 800,000 different framework shapes among the 24 million compounds we studied, and yet half of the compounds can be described by only 143 of those shapes," Lipkus says.
The authors derived the frameworks by paring each molecule down to the shape of its rings and the linkers that connect them. Thus, both cyclohexane and aniline become a hexagon, which is the most common framework shape in the CAS Registry.
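The paring-down step can be illustrated with a small sketch. This is not the CAS software or the authors' algorithm; it is a minimal illustration that assumes a molecule is given as a list of bonds between numbered heavy atoms, with atom types and bond orders ignored. The function name `framework_shape` and the atom numbering are hypothetical. Atoms with fewer than two neighbors are stripped repeatedly, so only rings and the linkers between them remain.

```python
def framework_shape(bonds):
    """Return the ring-and-linker skeleton of a molecular graph.

    bonds: iterable of (atom, atom) pairs between heavy atoms.
    Result: set of frozenset({atom, atom}) edges left after pruning.
    """
    # Build an adjacency map: atom -> set of bonded atoms.
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Repeatedly remove atoms that have fewer than two neighbors.
    # Side chains unravel from their tips; ring atoms and atoms on a
    # path between two rings always keep at least two neighbors.
    terminal = [atom for atom, nbrs in neighbors.items() if len(nbrs) < 2]
    while terminal:
        atom = terminal.pop()
        if atom not in neighbors:
            continue  # already removed
        for nbr in neighbors.pop(atom):
            neighbors[nbr].discard(atom)
            if len(neighbors[nbr]) < 2:
                terminal.append(nbr)

    return {frozenset((a, b)) for a, nbrs in neighbors.items() for b in nbrs}


# Hypothetical atom numbering: atoms 0-5 form the six-membered ring.
cyclohexane = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
aniline = cyclohexane + [(0, 6)]  # atom 6 is the amino nitrogen

print(framework_shape(cyclohexane) == framework_shape(aniline))  # True
```

Because both molecules share the same ring numbering here, the two skeletons can be compared directly. Comparing shapes across arbitrary molecules would need a graph-isomorphism check or a canonical labeling, which this sketch leaves out.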
The findings echo those of smaller previous studies in which drug-related compounds were found to have limited structural diversity, notes Ad P. IJzerman, a medicinal chemistry professor at Leiden University, in the Netherlands, who has dubbed the common frameworks "chemical clichés." These earlier studies included an investigation of the diversity of about 5,000 drugs by Guy W. Bemis, research fellow, and Mark A. Murcko, chief technology officer, of Vertex Pharmaceuticals in Cambridge, Mass. (J. Med. Chem. 1996, 39, 2887), and a structural analysis of a database of 250,000 potential drugs by IJzerman and colleagues (J. Chem. Inf. Model. 2006, 46, 553).
The new study "makes clear that the apparent lack of scaffold diversity among drug molecules is not unique but is also found among organic molecules in general," Bemis says.
Lipkus used a version of CAS's SubScape software, which is available to subscribers of SciFinder, CAS's primary search and navigation tool. After analyzing and visualizing structural frameworks in the CAS Registry with the software, he was surprised to find that the frequency with which any particular framework is represented in the registry apparently conforms to a power law. This means that most frameworks occur infrequently while a few recur many times.
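A power law has the form y = C·x^(-b), which plots as a straight line with slope -b on log-log axes. The sketch below shows one generic way to estimate that slope by a least-squares fit in log space. It is an illustration only, not the method used in the study; the function name `power_law_slope` and the toy numbers are assumptions invented for the example, and a log-log fit is a crude check rather than a rigorous test for power-law behavior.

```python
import math


def power_law_slope(xs, ys):
    """Least-squares slope of log(y) versus log(x).

    For data following y = C * x**(-b), the returned slope is -b.
    All xs and ys must be positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    return sxy / sxx


# Toy data (made up, not from the CAS Registry): values generated
# from an exact power law with exponent 2, so the fit recovers it.
sizes = [1, 2, 4, 8, 16]
counts = [1000 * s ** -2 for s in sizes]

print(round(power_law_slope(sizes, counts), 6))  # slope close to -2 by construction
```

A steeply negative slope corresponds to the pattern described above: a long tail of frameworks that appear only a few times and a small head of frameworks that recur very often.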
"The study reinforces the view that the exploration of organic chemistry space has tended to concentrate on a relatively small number of structural types," Lipkus says. "This suggests that there are many underexplored regions in this space."
Bemis notes that "the findings give chemists a good reason to believe that all the useful molecules have not yet been made. There is plenty of room to play around on the low-frequency side of the power-law graph. The analysis in this paper is one of the few that point us toward molecules that don't yet exist." Medicinal chemists, for instance, might zero in on less well-represented frameworks to find novel leads for drugs.
Lipkus speculates that to minimize synthesis costs, chemists repeatedly return to familiar structures as the basis for derivatives. But IJzerman cautions that "this cost-effectiveness seems to preclude our chemical astronauts from exploring lesser known regions of chemical space. In that sense, the Lipkus paper should be regarded as an invitation to do just that."