The Incredible Vastness of Data | June 11, 2007 Issue - Vol. 85 Issue 24 | Chemical & Engineering News
Volume 85 Issue 24 | pp. 54-55
Issue Date: June 11, 2007

Cover Stories: A Century Of CAS

The Incredible Vastness of Data

In the hands of CAS, a morass of data points ends up telling epic research stories, page by page
Department: ACS News, Science & Technology

From an airplane, you can get the big picture, an entire landscape. But if it is the trees you really need to see, and not the forest, then you have to get your feet on the ground.

In 2005, CAS unveiled a new data analysis and visualization product—STN AnaVist—with which users can navigate the vast landscape of chemical research through the eyes of CAS scientists. It's a perspective that reveals all of the scientific hot spots and regional interconnections, as well as the local detail, all the way down to the full text of the patents, abstracts, papers, and other types of information that make up the global view.

The interactive portal at the right leads to two sets of visualizations—one derived from the full set of about 12,000 records that were on hand in 1907, and one derived from 19,000, or roughly 10%, of the 2007 research records that CAS scientists had created as of the first few months of the year. In each case, we show a global view of the data in which the density of points corresponds to the hottest areas of research at the time. The different colors of the points denote some of the most intense research arenas. These global views serve as snapshots of chemical history. In 1907, for example, some of the most notable locations of the research landscape denote work on the chemistry of air and other gases, and on dye chemistry. A century later, some of the hottest areas include the chemistry of biological molecules such as proteins and antibodies.

Shown along with each global portrait of research accessible through the portal are several more detailed local views that reveal subareas. Follow the colored circles to drill deeper and then deeper into the data. Individual points correspond to specific papers, patents, conference proceedings, or other relevant and retrievable records in the CAS databases. The relative proximity of points is a measure of the conceptual relatedness of the documents.
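The article does not say how STN AnaVist computes relatedness, so the following is only a minimal sketch of the general idea that closer points mean more closely related documents. It assumes a simple bag-of-words model with cosine similarity, which is a stand-in technique and not CAS's method; the function names and the sample record titles are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Return lowercased word counts for one document (bag of words)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors, from 0.0 to 1.0."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def distance(doc_a, doc_b):
    """Smaller distance means the two documents share more vocabulary."""
    return 1.0 - cosine_similarity(term_vector(doc_a), term_vector(doc_b))


# Hypothetical record titles, invented for illustration only.
docs = {
    "dye_1": "synthesis of azo dye compounds",
    "dye_2": "azo dye synthesis from aniline",
    "gas_1": "liquefaction of air and other gases",
}

if __name__ == "__main__":
    for a, b in [("dye_1", "dye_2"), ("dye_1", "gas_1"), ("dye_2", "gas_1")]:
        print(a, b, round(distance(docs[a], docs[b]), 3))
```

Under this toy measure the two dye records sit closer to each other than either does to the gas record, which mirrors how clusters of points on a map would mark a shared research area.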

These views from STN AnaVist provide the merest of glimpses of what 100 years' worth of data gathering, categorizing, analyzing, processing, and searching has made possible, notes CAS's Anthony Trippe, one of the tool's designers.

 
Chemical & Engineering News
ISSN 0009-2347
Copyright © American Chemical Society
