
Dealing With Data Deluge

Chemical informatics professionals help scientists cope with and benefit from information overload

by Sarah Everts
July 17, 2006 | A version of this story appeared in Volume 84, Issue 29

DATA DELUGE
Credit: Photo By Amanda Yarnell
Kajosalo (right) and chemistry professor Catherine Drennan (left) of MIT discuss how Drennan, a crystallographer, can best manage diffraction data.

If you prefer blogs over beakers and would rather work with data than generate them at a lab bench, then a career in chemical information may be the path for you. People employed in chemical information use their techie tendencies and chemistry know-how to help scientists find, analyze, and manage data.

STREAMLINE
Credit: Photo By Jeff Glenn/Dow
Rothman (back) pores over data management schematics with Dow colleague Jan Baumgras.

They are academic librarians who show big research groups how to share experimental data and relevant papers by using internal lab group blogs. They are the patent searchers in industry who track down every last reference to a possible drug lead (even if it was an oral presentation in an esoteric language in a faraway country) before patent applications are written. They are chemical curators, individuals who sift through volumes of published papers to annotate chemical databases that researchers use to plan reaction schemes or to search for protein structures. They are computer whizzes who use their chemistry background to design software to better manage the myriad data that new technology is allowing researchers to generate.

As the volume of scientific information and variety of places to acquire it begin to overwhelm even the most enthusiastic seeker, chemical informaticians in academia, government, and industry escort scientists through the data deluge and back to their lab bench with essential, organized information in hand.

"The best part of the job is the problem solving, cracking the puzzle to find information that people need to do their research," says Mary Talmadge-Grebenar, associate director of information and knowledge integration at Bristol-Myers Squibb and a biochemist who has been helping industrial scientists track down obscure information since the 1980s. "You learn something new every day. Scientists will come to you with a moiety and ask if any other drugs have it. Or they'll ask all the potential ways to make product C from product A."

If you are a patent searcher, industry scientists and patent attorneys are relying on you to alert them if their proposed research infringes on a competitor's intellectual property, helping them avoid fruitless work. In a "patentability" search, lawyers rely on patent searchers to scour conference presentations, journals, hard-bound books, and even patent publications in foreign languages, says Donna Kaye Wilson, of Pfizer. "And if you don't find it, it could still exist. But you cannot leave any stone unturned."

The trick is knowing where to look. "There used to be just one place to find information: inside the library," says Michaeleen Trimarchi, the chemistry librarian at Scripps Research Institute's Kresge Library. "It's like movies. It used to be that if you wanted to watch a film, you had to go to a theater.

"Now you also have the option of buying a movie or renting. And you have the additional option of getting the movie at the video store, online at Netflix, or off digital satellite. There's a multitude of choices," Trimarchi says. "It's the same with science information. Many people find the addition of databases and online information sources to be overwhelming."

Consequently, chemistry librarians spend less time in the stacks and more time in labs and classrooms, showing students and faculty how to search beyond Google to retrieve information on molecules, reactions, papers, and patents.

And there's a plethora of information available. This past June, Elsevier MDL's Beilstein database added its 10 millionth structure-searchable reaction. In 2005, the American Chemical Society's Chemical Abstracts Service added almost 2 million small molecules to its chemical database, three times the number of molecules added two decades earlier, in 1985. The number of patents and papers produced is also rising dramatically. In 1907, 11,847 paper and patent abstracts were added to the CAS collection. CAS predicts 1 million abstracts will be entered this year, bringing the total number to 25 million.

Librarians also teach people how to set up RSS feeds, which alert receivers that a new website, blog entry, or paper devoted to their field of research has gone online. "These days, librarians absolutely have to be tech savvy," says Erja Kajosalo, Massachusetts Institute of Technology's chemistry librarian.

Faced with a growing trend of chemical libraries closing, many chemistry librarians are applying their digital knowledge and organization skills directly in the lab setting. For example, Kajosalo is involved in a new project to help MIT research groups store, share, and manage enormous data sets, such as those acquired by X-ray crystallographers and NMR spectroscopists.

It's a growing trend at other universities, too. "The researchers we've talked to recognize that they don't have the skills or the time for dealing with the organization of their data: formatting it, enhancing it, curating, and/or archiving it. At Purdue University, they look to librarians to help them," wrote D. Scott Brandt, associate dean for research at Purdue Libraries, in a recent online discussion forum.

A lab's data may be disorganized, but librarians aren't, trained as they are to manage reams of information. At Purdue, there's also a move to match librarians with data-intensive research projects, even going so far as to possibly include chemical librarians in grant applications and publications, says Jeremy R. Garritano, chemical information specialist at Purdue's chemistry library.

One should not be tempted to think that chemistry knowledge is optional in chemical information jobs. Although library or informatics graduate degrees are required for some positions, the general consensus is that informatics can be picked up on the job, but not the chemistry know-how. "I get the sense that faculty and students that I work with appreciate that I understand what they are talking about," Kajosalo says.

A chemistry background and strong communication skills allow Kajosalo to figure out what kind of information a person wants or needs, "which is often different from what they ask for."

This is what L. David Rothman of Dow Chemical calls "solving the unstructured problem." He co-leads a team of 60 people in research computing, a third of whom are "split personality" chemists with informatics skills who design software to run high-throughput equipment, as well as collect, manage, and analyze the consequent data.

"It's really great when someone hands you all the descriptors of a problem so all you have to figure out is what it takes to solve the problem. This is usually not the case," Rothman says. "It takes talking to people about what they really need, not what they are necessarily asking for, to solve it."

Rothman is convinced that smart management of data is essential to ramp up speed and efficiency of discovery, as high-throughput technology is increasingly adopted in industry to run, for example, parallel reactions. It's a field of chemistry called cheminformatics, which sprang up to handle combinatorial chemistry and proteomics data.

The goal is to make "generating the ideas for the next experiments you are going to do" the rate-limiting step, he says. He points out that the chemical industry can benefit from pharma's head start. "Pharma has gone down this road earlier than us and has had to swallow a much more expensive pill than we will," says Rothman. "We are able to enter a world where there's a lot more commercial software available.

"Still, there are times when people on the team lock their door and we slide pizza underneath it, when they are just writing code." But as much as there's a lot of fiddling with equipment and code-writing, Rothman says "face time" with staff scientists is often essential. For example, on the June day Rothman spoke with C&EN, "a robot that was supposed to weigh things was flipping weighing pans onto the floor. You don't solve a problem like that on the other end of a Web camera connection halfway around the world."

Outside of pharmaceutical and chemical companies, chemical informatics opportunities exist at software companies that design specialized chemistry software on contract, says Julian Hayward. He's managing director of Digital Chemistry, a small start-up in the U.K. that employs chemical informaticians to design software that searches combinatorial libraries for small-molecule mimics of potential drug leads.

Credit: Photo courtesy of Bristol-Myers Squibb
Talmadge-Grebenar

Then there are the giants of the chemical information industry, such as Elsevier's MDL and CAS. In addition to employing software developers, they also hire Ph.D. and M.S. scientists to annotate their databases. Called curators, editorial analysts, or data annotators, these folks ensure the data are entered into private and public science databases accurately.

Credit: Photo courtesy of Kyle Burkhardt
Burkhardt

CAS employs more than 600 people to keep up with the constant publishing of papers. Each document can take from 30 minutes to a few hours to process, says Ida Copenhaver of CAS's editorial operations department. Each paper or patent is read carefully, and the analysts assign index terms that link the paper to related chemical information, such as structural or reactivity data. For example, a paper on gasoline would likely be linked to petroleum-related terms.

At the public Protein Data Bank, where scientists deposit their structural biology data, annotators are hired to check entries for self-consistency. "We often find mistakes that the researcher misses or wasn't aware of," says Kyle Burkhardt, data annotation leader at PDB's Research Collaboratory for Structural Bioinformatics (RCSB), which is housed at Rutgers University and the University of California, San Diego. Full-time curators process four structures a day to stay on top of the estimated 7,000 that will be submitted to PDB this year.

Here, too, most of the curators have Ph.D.s, typically in nuclear magnetic resonance spectroscopy or X-ray crystallography. "They've deposited their own structures into the database before coming to us," Burkhardt says. She has a bachelor's degree in chemistry, but says, "I'm the only one without a Ph.D. left in the annotator room.

"To succeed, you have to be really diligent and patient, with good attention to detail. You have to work on a computer all day. This can be a problem for a chemist who is used to tinkering in a lab.

"But in a lab you could do two months' worth of research and have no results," Burkhardt adds. "Working two months here, you'll definitely have processed a large amount of structures in that time. It's good to go home at night and say, 'I got something done today.' It's great to learn something new every day."

This is a typical sentiment among chemical informatics professionals. Amid an inundation of information, a career in chemical information suits computer-savvy people who like to discover new information every day, just without the lab goggles.

Education

To Degree Or Not To Degree

The only shared educational standard among people in chemical information is the chemistry background. Most chemistry librarians at universities have a master's degree in library science. In corporations, information searchers often start off as bench chemists with a science M.S. or Ph.D. and a knack for computers. They learn the essentials of searching along the way.

Database annotators almost always have a Ph.D. in chemistry or a related subject. A foreign language is considered a plus, especially for deciphering foreign patents.

If your tastes tilt toward the digital early on, there are new programs in chemical informatics or cheminformatics at Indiana University; the University of Sheffield, in England; and the University of Manchester Institute of Science & Technology, also in England. Sheffield's one-year master's program is based out of the informatics department and focuses on information storage and retrieval, designing databases to do so, and object-oriented programming. UMIST's one-year program, based out of the chemistry department, focuses more on applying in silico tools to solve chemical problems. Indiana's program is a mix of both. All three schools also have Ph.D. programs, but these cater to those who seek an academic research career in cheminformatics.
