Issue Date: June 11, 2007
CAS Surveys Its First 100 Years
TO GRASP HOW LONG ago Chemical Abstracts began, consider all that had not yet happened in 1907. Roald Amundsen had not reached the South Pole. Both the first Model T Ford and the Titanic had yet to be constructed. World War I would not occur for another decade. Twenty years would pass before Alexander Fleming noticed that a certain Penicillium mold possessed the curious power to kill various bacteria.
But dramatic developments were already stirring in that dawn of 20th-century science. Albert Einstein, who had published his paper on special relativity in 1905, conceived the basis of his general relativity theory in 1907. That same year, Ivan Pavlov was studying conditioned reflexes, Louis Lumière developed a process for color photography, and Emil Fischer published his "Researches on the Chemistry of Proteins." A paper by Marie Curie on the atomic weight of radium also appeared in 1907. Scientists had much new information to assimilate, and clearly, they needed an information tool to keep it all in view.
Of course, the idea of a publication that indexed and summarized other publications—a "secondary" reference work—was not a new concept at the turn of the century. Abstracts and secondary information collections in the form of journals and reference works had been in existence since at least the 17th century. For example, the Philosophical Transactions of the Royal Society published abstracts and was not the first journal to do so.
Among the other publications that provided abstracts before CA were Comptes Rendus (1776), Bulletin de la Société Chimique de France (1863), Index Medicus (1879), and Engineering Index (1884). The most comparable secondary publication for chemistry was Chemisches Zentralblatt, which stemmed from Pharmaceutisches Zentralblatt, first published in 1830. In addition, well-known sources of substance information that preceded CA include the Beilstein and Gmelin handbooks, and information services evolving from them remain in use to this day.
CREATING CHEMICAL ABSTRACTS. CA was not the first of its kind, but it does occupy a special niche. This is well-expressed on a University of Texas chemistry library Web page: "The year Chemical Abstracts began indexing the world's chemical literature is the watershed date that now serves as a somewhat arbitrary demarcation between modern and historical chemistry" (www.lib.utexas.edu/chem/info/old.html).
The immediate precursor of CA was the Review of American Chemical Research, established in 1895 by Arthur A. Noyes of Massachusetts Institute of Technology. This publication was a collection of abstracts intended in part to draw attention to the work of American chemists, who were perhaps overshadowed by their counterparts in Europe. Originally part of the MIT Technology Quarterly, the Review became incorporated into the Journal of the American Chemical Society in 1897. William A. Noyes, a cousin of Arthur's, became the editor of JACS in 1902 and expanded and improved this collection of abstracts.
Soon it became evident that a separate abstracting publication was warranted. A committee of industrial chemists, who did not agree with some theoretical chemists that the American Chemical Society should be limited to "pure chemists," suggested that ACS produce an abstracting journal with the aim of ensuring that "the whole field of chemistry, the world over, be covered by abstracts." In 1906, the ACS Council authorized the creation of this publication. Named Chemical Abstracts, its first issue appeared with the cover date Jan. 1, 1907.
From its inception, CA was intended to provide a broad focus, not only in terms of international coverage but also in its aim to cover both applied and theoretical chemistry. By contrast, Chemisches Zentralblatt did not begin to include industrial chemistry until 1919. Ironically, in view of the interest in acknowledging the work of American chemists, the first abstract in CA was for an article in a German publication. From its first issue, CA clearly was both global and inclusive in its view of chemistry. As is the case at CAS today, the coverage included "biological chemistry" among other subdisciplines, and the abstracts were not only for literature but also for patents.
Considering the broad view of the chemical sciences provided by CA, Charles L. Parsons, secretary of ACS, 1907–45, remarked, "Chemical Abstracts is the bond that holds the American Chemical Society together."
William Noyes first edited CA from the National Bureau of Standards in Washington, D.C., where he served as chief chemist. But he and the fledgling publishing operation moved to the University of Illinois later in 1907, when he was invited to join the chemistry department faculty there. The CA operation moved again in 1909, when Austin M. Patterson, who succeeded Noyes as editor, relocated it to Ohio State University, Columbus, at the invitation of OSU chemistry professor William McPherson.
CA remained on the OSU campus for the next 56 years, and the publication and the university found the association mutually beneficial. While OSU faculty profited from access to the fine collection of journals CA maintained in the course of its work, the university's chemistry faculty often assisted in the editing of CA abstracts.
KEEPING UP WITH THE INFORMATION EXPLOSION. Patterson left CA in 1914 for health-related reasons. John J. Miller replaced him as CA editor, but only until the end of that year. Another dynamic figure was about to take the stage, as CA was about to begin the transition to a modern information service.
Tackling the ambitious mission to chronicle the world's chemistry-related research publications entailed many challenges. And for most of the first half-century of CA, these were admirably faced by Evan J. Crane, who became CA editor in 1915 at the age of 26 and led the organization until 1958. An inescapable challenge was the sheer magnitude of the chemistry-related literature. The list of candidate journals to be monitored for CA consisted originally of 396 titles, but by 1912 there were 600. By 1922, just seven years into Crane's editorship, there were already more than 1,000 journals on the list. That much literature was a lot to keep up with by a service relying only on a small staff long before the days of any electronic support. But additional publications continued to demand attention, and CA had more than 5,000 journals to keep track of by the early 1950s. How could CA accommodate so much new information and—just as important—bring order to the chaos?
Crane dealt with those daunting tasks in several ways. Improving the indexing of chemical information imposed some control over the chaos while making the scientific literature more accessible to the users of CA. He recognized that CA indexes were more valuable information tools than even the abstracts. Various indexes were added to CA volumes over time: From 1907, there were Author and Subject Indexes; in 1912, a Numerical Patent Index was added; in 1916, an Index of Ring Systems; in 1920, a Formula Index; in 1963, a Patent Concordance; in 1968, the CA Index Guide was created to assist in the use of the indexes; and in 1972, the Subject Index was divided into separate Chemical Substance and General Subject indexes. Over the years, the indexes and indexing policies have undergone adjustment to reflect the changing nature of the chemical sciences, but successive CA editors always carried on Crane's penchant for thorough and serviceable indexing.
Another crucial factor in the CA mission was the people who wrote the abstracts. Crane realized that processing the world's outpouring of new chemical information would quickly overwhelm the small staff he could employ in Columbus. Accordingly, he recruited additional volunteer abstractors from around the world who were willing to share in the editorial work as correspondents. These "iron men," as Crane once called them, were motivated not by money (none received more than nominal payment) but by their personal enthusiasm for the literature and belief in the value of their work.
Eventually, the team of volunteer abstractors reached almost 3,300 worldwide before the abstracting effort was shifted to a largely in-house process in the 1970s. To keep the far-flung team of abstractors in sync with editorial practices, motivated, and conscious of being members of a team, Crane devised a creative management tool: the Little CA. Published from 1930 until 1966, this irregular periodical included such varied items as brief practical tips (for example, a reminder to double-space abstract submissions), exhortations regarding "pride in workmanship," and biographical notes about some of the individual volunteers, including their nicknames and favorite magazines and radio programs.
An item headed "National Sources of Journals" that appeared in the Little CA in December 1952 notes that the 5,236 journals then being abstracted for CA came from 87 different countries. Through the years, the pages of Little CA were enlivened here and there with a sprinkling of cheery clip art cartoons and frequently by light verse written by Crane himself or by staff members such as Mildred Bird, the office librarian.
Even the global calamity of war did not entirely shut off the flow of abstracts into CA. During part of World War II, a team of volunteer abstractors in Switzerland produced abstracts in German or French and sent them via clipper planes to Columbus.
Later, when that route was blocked, special arrangements were made to receive papers on microfilm through the U.S. Office of Scientific Research & Development, the Interdepartmental Committee on the Acquisition of Foreign Periodicals, the Alien Property Custodian, and the American Library Association's Joint Committee on Importations. CA also had an office in the Library of Congress beginning in 1942 for the purpose of abstracting German documents. After the war, this facility enabled CA to abstract selected Soviet literature.
To ensure continued coverage of hard-to-obtain literature as completely as possible, CA also monitored other secondary sources, including Chemisches Zentralblatt and, for some Russian and Eastern European publications, Referativnyi Zhurnal. These sister publications significantly assisted CAS in fulfilling its mission during a difficult period.
After the war, the number of chemical publications skyrocketed. It had taken CA 30 years to publish 1 million abstracts, but only 18 years to publish another million, and only another eight years to publish a third million. The fourth million required less than five years, and so on, at a steadily increasing pace. With the increased literature output, CA faced rising costs and a chronic financial crisis.
Through 1933, ACS membership dues financed CA, and the members could receive their own copy of the publication for free. Later, a $6.00 subscription fee was added. After World War II, it became obvious that subscription fees could not be raised enough to keep up with the costs of production. In 1952, ACS established Corporation Associates so the chemical industry could help to compensate for the CA operating deficit. But in 1955, with expenses exceeding $1 million for the first time, CA faced a deficit of almost $500,000. That year, the ACS Board of Directors addressed the crisis by establishing a new break-even pricing policy: CA must from that time forward be self-supporting and priced accordingly.
Along with the move to self-sufficiency, the organization became an operating division of ACS in 1956. It was then no longer CA but officially Chemical Abstracts Service (CAS), with Crane as the first CAS director. In 1955, the operation moved to its own three-story building, constructed on the OSU campus with support from both the university and ACS. A fourth floor was added in 1961; the building on the OSU campus is today called Watts Hall, and it houses the university's department of materials science and engineering.
The late 1950s to mid-1960s marked a watershed period for the organization now known as CAS. Emblematic of an impending sea change, the Crane years came to an end with Crane's retirement in 1958. Congratulatory letters poured in from around the world in honor of Crane and the organization whose prominence he had done so much to establish. In one of these letters, dated July 25, 1958, Herman Skolnik wrote:
"E. J. Crane is Chemical Abstracts, and what is Chemical Abstracts? As a heavy user, it is to me the tallest tree in the forest of chemistry. From the top of the tree, a chemist gains a view of the whole forest and takes his bearings. ... I salute you for giving chemistry a direction and time perspective oriented to the future."
In the Oct. 31, 1958, issue of Little CA, Crane published his farewell to the staff (in an editorial whimsically titled "Swan Song of a Crane") and introduced his successor. Crane actually named three men to replace him in managing CAS. Dale B. Baker was named director, but he would be assisted by Leonard T. Capell, executive consultant, who was a nomenclature authority, and Charles L. Bernier, the new CA editor. Of Baker, Crane said, he "has demonstrated excellent administrative and business ability." Clearly, Crane had learned that maintaining an outstanding scientific organization also meant running an efficient business.
HARNESSING THE COMPUTER. Baker was a native of Bucyrus, Ohio, and had come up through the ranks of CAS, beginning as a part-time "office boy" in 1939 while attending college at OSU. After graduating with a degree in chemical engineering, he spent four years working as a supervisory chemist at E. I. du Pont de Nemours & Co., before returning to CAS as an assistant editor in 1946. From that point, he rose steadily on the editorial ladder.
Having inherited the problem of keeping up with the chemical literature, Baker sought to modernize the CAS processing system. Fortunately, he could begin from a good foundation, for CAS had established an R&D department in 1955. The purpose of the research was explained in an article by Crane in the June 27, 1955, issue of Chemical & Engineering News: better and wider service, economic operation, and establishment of special services.
In 1959, Baker hired G. Malcolm Dyson to lead the automation effort. In harnessing the computer technology of the time, CAS began to think of the information compiled for CA as a database, which was a new idea for a publisher. A system would be created whereby the various pieces of information compiled for CA issues and indexes would be brought together in a database, then processed efficiently by the computer. Various "outputs" could then be generated, in addition to CA itself.
One of the biggest problems in generating CA substance indexing quickly was determining whether a chemical substance encountered in the literature was truly new or previously described by CAS scientists. Ambiguous nomenclature made it difficult to identify substances because they may be called by a number of different names in the literature. Even systematic nomenclature, based on the molecular composition of a substance, was of limited use.
The International Union of Pure & Applied Chemistry (IUPAC) formulated rules for naming chemicals systematically. Names were organized alphabetically in the Beilstein Handbuch der organischen Chemie and the CA index, but determining what component of the name was the "parent compound" to be alphabetized was not a straightforward exercise. Molecular formulas are also not ideal for indexing purposes, since the same formula can represent more than one compound.
To determine whether a substance was new, CA indexers used to draw it by hand, then name it and manually compare it with those previously indexed. This was not only laborious but also resulted in naming the same compounds again and again over time. CAS realized that computers offered the possibility of accomplishing substance identification more efficiently.
Dyson, the CAS research director, suggested the concept of a chemical registry in the late 1950s. At first, this was tried with a file of fluorine compounds that were coded with Dyson-IUPAC linear notation. Records were stored on edge-notched cards, and each compound was given a "register number."
But the limitations of linear notation were soon evident. Harry L. Morgan of CAS built upon the work of Donald J. Gluck from DuPont to perfect a "connection table," using an algorithm to translate a two-dimensional structural diagram into a table indicating the arrangement of atoms and bonds that can be searched by computer. This Morgan algorithm became the basis of the CAS Chemical Registry System, which was introduced in experimental form in 1964 and became operational in 1965.
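To make the idea of a connection table concrete, here is a minimal Python sketch. It is an illustration only, not the CAS implementation: the helper names (`bonds_to_table`, `extended_connectivity`, `signature`), the atom numbering, and the simplified refinement loop are assumptions made for this example. It shows two substances with the same molecular formula (ethanol and dimethyl ether, both C2H6O, hydrogens omitted) yielding different tables, and an iterative neighbor-sum in the spirit of Morgan's extended-connectivity idea producing atom values that do not depend on how the atoms happened to be numbered.

```python
from collections import defaultdict

def bonds_to_table(atoms, bonds):
    """Build a connection table: atom index -> (element, sorted neighbor indices)."""
    neighbors = defaultdict(list)
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    return {i: (element, sorted(neighbors[i])) for i, element in enumerate(atoms)}

def extended_connectivity(table):
    """Simplified Morgan-style refinement (an assumption-laden sketch).

    Start from each atom's neighbor count, then repeatedly replace every
    value with the sum of its neighbors' values for as long as that
    increases the number of distinct values.
    """
    values = {i: len(nbrs) for i, (_, nbrs) in table.items()}
    distinct = len(set(values.values()))
    while True:
        new_values = {i: sum(values[n] for n in nbrs) for i, (_, nbrs) in table.items()}
        new_distinct = len(set(new_values.values()))
        if new_distinct <= distinct:
            return values
        values, distinct = new_values, new_distinct

def signature(table):
    """Numbering-independent summary: sorted (element, connectivity value) pairs."""
    values = extended_connectivity(table)
    return sorted((table[i][0], values[i]) for i in table)

# Heavy atoms only; both substances have the formula C2H6O.
ethanol = bonds_to_table(["C", "C", "O"], [(0, 1), (1, 2)])           # C-C-O
dimethyl_ether = bonds_to_table(["C", "O", "C"], [(0, 1), (1, 2)])    # C-O-C
ethanol_renumbered = bonds_to_table(["O", "C", "C"], [(0, 1), (1, 2)])

print(signature(ethanol) == signature(dimethyl_ether))      # False
print(signature(ethanol) == signature(ethanol_renumbered))  # True
```

Equal signatures in this toy version do not prove that two structures are identical; a production registry needs a full canonical numbering and handling of bond types, charges, and stereochemistry, none of which is attempted here.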
Another prominent figure in the formative years of CAS's computerization was Fred A. Tate. Tate had earned his doctorate in organic chemistry from Harvard and, in 1961, was the manager of the scientific section of Wyeth Laboratories. That year, Baker recruited him as CAS assistant director. He later became the associate director and in 1974 was appointed associate director for planning and development.
Up to the time of his death in 1980, Tate was the primary driving force behind the CAS automation efforts and especially the CAS Registry. As Baker put it, "Fred Tate personally conceived many of the key components of CAS's computer-based processing system and assembled the teams necessary to develop them." Baker also gave credit to CA Editor Russell J. Rowlett Jr., "a most outstanding key team member in the integration of the Registry into editorial and processing operations."
Punched cards were initially the means of entering structures into the CAS Registry, and later CAS acquired a special typewriter that could punch structural information on paper tape. Input methods have continued to evolve over the years: a Beehive cathode-ray terminal was introduced in the late 1970s, followed by DEC PC350 terminals, with DEC PDP 11/34 minicomputers to compile the data and IBM 3090 and 3081 mainframe computers to support the registration process. Later technology enabled the graphical input of chemical structures with Sun SPARCstation equipment.
A central component of the CAS Registry is the CAS Registry Number identifier, which avoids the ambiguity of chemical names and the unwieldiness of linear notation to identify a chemical substance. In essence, the CAS Registry Number functions as the computer address of a substance record that contains the molecular structure diagram, the systematic CA index name, other names, and additional information associated with the substance. CAS developed a distinctive format for these identifiers, using hyphens to make the numbers easier to read and recognize.
Today, a CAS Registry Number includes up to nine digits that are separated into three parts by hyphens. The first part, starting from the left, has up to six digits, the second part has two digits, and the final part is a single check digit to verify the validity of the total number. For example, a certain statin drug has a systematic chemical name of about 70 characters and is associated with several different trade names. But it is concisely identified by the CAS Registry Number 79902-63-9.
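The rule behind the check digit is not spelled out above. The sketch below assumes the commonly documented rule: take the digits of the first two parts, weight them by position counting from the right (1, 2, 3, and so on), add up the products, and keep the last digit of the sum. The function name `is_valid_cas_rn` is hypothetical, and the pattern follows the format as described in this article (a first part of up to six digits).

```python
import re

# Format as described in the article: up to six digits, two digits, one check digit.
CAS_RN_PATTERN = re.compile(r"(\d{1,6})-(\d{2})-(\d)")

def is_valid_cas_rn(rn: str) -> bool:
    """Return True if rn has the hyphenated format and a matching check digit."""
    match = CAS_RN_PATTERN.fullmatch(rn)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Rightmost body digit gets weight 1, the next gets 2, and so on (assumed rule).
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check_digit

print(is_valid_cas_rn("79902-63-9"))  # True
print(is_valid_cas_rn("79902-63-8"))  # False
```

For the statin example, the weighted sum is 3×1 + 6×2 + 2×3 + 0×4 + 9×5 + 9×6 + 7×7 = 169, whose last digit is 9, in agreement with the final part of 79902-63-9.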
CAS succeeded in speeding the production of CAS indexes, thanks to the efficiencies of the CAS Chemical Registry System, but the value of this system beyond the walls of CAS was soon evident. Many information providers adopted the CAS Registry Number system as a standard for chemical identification, including the Environmental Protection Agency for the Toxic Substances Control Act (TSCA) Inventory. Similar national inventories followed suit, as did many companies that began to use CAS Registry Numbers in Material Safety Data Sheets, product labels, and other materials reporting chemical contents.
Keeping CA up to date was of national importance, and so the development of CAS computerized processing was a likely candidate for National Science Foundation funding. CAS received some NSF funding in the late 1960s and early 1970s. Specifically, the funding was for the development of several capabilities:
- Combination of processing functions (machine-file creation, editing, proofreading) into a single, integrated computer operation.
- Computer control of nomenclature and the structural records of chemical compounds.
- Computerized typesetting of index text.
In hearings before the congressional Committee on Science & Astronautics in 1972, NSF noted that its funding of CAS projects was in the best interest of the scientific community: "The principal objective of the program has been and continues to be the overall improvement of the communication system of science, with the scientific community itself responsible for developing essential information systems and services and providing their ultimate support."
Clearly, NSF, with the agreement of Congress, assumed that CAS's computer-assisted processing and the CAS Registry itself would be self-supporting through the fees paid by scientific users who benefited from more efficient access to information.
Progress in computer-assisted processing for CAS's editorial operations proceeded hand-in-hand with the development of electronic information products for users. One of the earliest was Chemical Titles in 1961, the world's first periodical to be organized, indexed, and composed by computer. It was also the first to use keyword-in-context (KWIC) indexing.
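A keyword-in-context index lists each title once for every significant word it contains, sorted by that keyword, with the rest of the title kept alongside as context. The following is a minimal sketch of the general technique; the stop-word list, the sample titles, and the "/" wrap marker are assumptions for illustration, and the actual layout used in Chemical Titles is not described here.

```python
# Hypothetical stop-word list; a real index would use a much longer one.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to"}

def kwic_index(titles):
    """Return (keyword, rotated title, original title) entries sorted by keyword."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            key = word.strip(".,;:").lower()
            if not key or key in STOP_WORDS:
                continue
            # Rotate the title so the keyword comes first; "/" marks the wrap point.
            rotated = " ".join(words[i:] + ["/"] + words[:i]) if i else title
            entries.append((key, rotated, title))
    return sorted(entries)

# Made-up titles, for illustration only.
sample_titles = ["Synthesis of Fluorine Compounds", "Fluorine in Polymers"]
for keyword, rotated, _ in kwic_index(sample_titles):
    print(f"{keyword:<12}{rotated}")
```

Because every significant word becomes an entry point, a reader scanning the alphabetical list under "fluorine" finds both sample titles without any human indexer having assigned subject headings, which is what made the approach well suited to computer production.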
CAS soon produced other new printed and computer-readable information services: Chemical-Biological Activities (CBAC) in 1965 and Polymer Science & Technology (POST) in 1967. In the days before online searching, these services were produced on magnetic tape and could be searched in batch mode. Other CAS services soon followed: CA Condensates (1968) drew upon the CAS database for bibliographic information and keywords; CA Subject Index Alert (CASIA) in 1973 incorporated subject index entries and was thus a complement to CA Condensates.
By 1975, publishing CA issues was entirely computerized and CAS continued to introduce new products. The CA Selects series was introduced in 1976, taking advantage of the computer's ability to generate current-awareness bulletins containing only those CA abstracts pertinent to a given topic. So a scientist who could not afford (and did not need) to receive his or her own copy of CA issues could nevertheless have a focused current-awareness bulletin. Of all the benefits resulting from computerization, however, information users gained the most from their newfound ability to search CAS information online from remote computer terminals.
Through licensing agreements, CAS provided tapes of its computer-readable services to third-party vendors in the fledgling online search industry.
A significant advance in these licensed services was CA SEARCH, which brought the CA bibliographic information, keyword content, and detailed subject indexing together in a single file in 1978. The first major online system, RECON from the National Aeronautics & Space Administration, had become operational in 1969. Other early systems were DIALOG from the Lockheed Missiles & Space Co. and ORBIT from System Development Corp., both of which became licensed vendors for CA SEARCH.
CAS ENTERS THE ONLINE ARENA. In the 1970s, remote online access to CAS databases was available only through commercial vendors who licensed the data in the form of tapes from CAS. But this situation raised the concern that CAS was losing its direct contact with chemical information users and a firsthand knowledge of their information needs.
As a result, CAS introduced its own online service, CAS Online, in 1980. This began as a pilot version made available to a limited group of customers. About 500,000 substance records were available and could be searched only by screen numbers representing specific molecular structural features. Searching by screens was not the most convenient method for information users, and yet many found the new system useful.
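Searching by screens can be pictured as set matching: each substance record carries the numbers of the structural features it contains, and a query retrieves every record that carries all of the requested numbers. The sketch below is a toy illustration under that assumption. The screen numbers, their definitions, and the three-substance file are invented for the example and do not reproduce the actual CAS screen dictionary.

```python
# Hypothetical screen dictionary; real screen numbers and definitions differ.
SCREENS = {
    1: "six-membered aromatic carbon ring",
    2: "hydroxyl group",
    3: "carbon-carbon single bond",
}

# Toy file: substance name -> set of screen numbers assigned to it.
SUBSTANCES = {
    "benzene": {1},
    "phenol": {1, 2},
    "ethanol": {2, 3},
}

def search_by_screens(required, substances=SUBSTANCES):
    """Return the names of substances whose screens include every required screen."""
    required = set(required)
    return sorted(name for name, screens in substances.items()
                  if required <= screens)

print(search_by_screens([1, 2]))  # ['phenol']
print(search_by_screens([2]))     # ['ethanol', 'phenol']
```

The inconvenience noted above is visible even in this toy version: the searcher has to know which numbers stand for which features before a query can be written.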
When CAS Online was introduced to the general public, it provided access to 1.8 million substance records, about one-third of the total Registry database. Other segments of the Registry were added to CAS Online in increments as the search capacity was increased at CAS.
In November 1981, CAS introduced searching by structure or substructure diagram. Users with a specific model of intelligent graphics terminal, the Hewlett-Packard 2647A, could select structure features from a menu and then assemble them on the terminal monitor by using a graphics tablet and stylus. These terminals could display answers with well-drawn structure diagrams.
Structure-based searching remains one of the most remarkable advantages researchers have gained from online access to the wealth of substance information in the CAS Registry. A typical example is the search for new pharmaceuticals. After identifying a certain substructure that accounts for the desired medicinal effect of a known drug, a chemist might discover that the CAS Registry contains many other substances that share this substructure and perhaps the desired effect as well. To accomplish the same result through a search of chemical nomenclature might well be prohibitively difficult and time-consuming.
Another important substance-searching advance was to occur in 1990 when CAS introduced MARPAT, a database of Markush (generic) structures found in patent documents. These structures are commonly used to extend a patent's coverage to include not only specific chemicals but also hypothesized molecules represented by a generic structure. Accordingly, the new MARPAT database offered a valuable tool for comprehensive chemical patent searching.
At the end of 1983, CAS introduced the CA File, a development that was not only to change the definition of CAS Online but also to accelerate CAS's evolution into a major provider of online information services. CAS Online was now both a bibliographic and a substance search system.
With its direct connection to information users revitalized by the introduction of its own online service, CAS formed User Councils for North America, Europe, and Japan in 1983. This move carried on a tradition of welcoming input from information users. Similar bodies of the past were the CAS Advisory Board, established in 1964, and the CAS Editorial Advisory Board, created in 1975. In 1965, ACS established a standing committee on CAS as a liaison between the ACS Board of Directors and CAS; in 1978, the ACS Society Committee on CAS (CCAS) was created to provide oversight. This official role was assumed by a new ACS governing body in 1991, the Governing Board for Publishing, with general responsibility for the operations and performance of CAS and the ACS Publications Division. CCAS evolved into the Joint Board-Council Committee on CAS and provides an important forum for communication between ACS members and CAS.
In April 1984, CAS introduced the CAOLD database, containing CA abstract references for substances that had been added to the Registry as the result of a CAS initiative begun in November 1983 called the Pre-1965 Registration Project. Since the CAS Chemical Registry System did not exist until 1965, registering chemical substances indexed in CA prior to 1965 required capturing this older substance information from printed CA indexes, digitizing it with optical scanning equipment, and adding it incrementally to the online file. Later, CAS would introduce the CAplus database, incorporating the older records as well as the most recent information available from CAS.
About this same time in 1984, CAS was awarded a contract from the U.S. Department of Commerce to automate the U.S. Patent & Trademark Office, creating a searchable full-text electronic database of U.S. patents with rapidly viewable patent images. This eight-year project involved scores of CAS technical staff, highlighted CAS expertise, and confirmed CAS's leadership in sophisticated search-system technologies.
While CAS continued to enhance CAS Online, plans for a new online network were also taking shape. An agreement between ACS and the German scientific organization FIZ Karlsruhe was signed in 1983 proposing an international network of databases in subject areas beyond chemistry and chemical engineering. Two host computers—one in Germany and one in the U.S.—would be linked and would use the same search software.
Thus users could search files loaded in either country with the same command language. The network, named STN International, the Scientific & Technical Information Network, was introduced in May 1984 and offered access to CAS files and Physics Briefs. The latter was based on the publication Physics Briefs, produced jointly by FIZ Karlsruhe and the American Institute of Physics.
For CAS, the creation of STN meant that European customers could now access CAS files and get search support through a service center on their own continent. CAS Online was no longer identified as a search system and became instead the family of CAS files on STN.
In 1988, CAS introduced its first software product to simplify online searching on personal computers, which had just recently begun to proliferate. This "front-end" software, called STN Express, was an outgrowth of initiatives and cooperation with the European Patent Documentation Group (PDG) and was designed primarily with the information specialist in mind. However, the service also attracted the interest of a new audience—the chemist interested in performing his or her own searches. STN Express permitted automatic log-in to STN and offered searching without mastering a command language. Structure searching was also simplified through the use of structure templates. STN Express quickly generated interest among scientists and the information specialists who serve them. Over the next 10 years, STN Express became the most popular means of searching databases on the STN network.
Baker announced his retirement in 1986 after serving for 28 years as CAS director. Many significant developments in CAS history occurred during Baker's directorship, including the introduction of Chemical Titles, the CAS Chemical Registry System, CAS Online, and the STN International Network. Moreover, the character of CAS evolved greatly during this period as the organization once known almost solely as the publisher of Chemical Abstracts became Chemical Abstracts Service, a leading database producer and innovator in electronic information services.
It was during the Baker era, in 1965, that CAS moved off the OSU campus to a site of more than 50 acres just north of the university along Olentangy River Road. This is CAS's current location, which now encompasses three buildings, including a state-of-the-art data center completed in 2001.
Ronald L. Wigington, who formerly served as CAS director of R&D, succeeded Baker as CAS director in 1986, and the organization continued its emphasis on the delivery of electronic information services. During the same year, a third STN Service Center was established in Tokyo as the result of an agreement with the Japan Information Center for Science & Technology (JICST).
At this pivotal point, the business environment had become a critical factor in CAS history. CAS's successful entry into the online industry in the 1980s evoked a potentially disastrous response from a rival information provider. By launching STN, CAS confronted formidable competitors, as versions of the CAS databases were available through several commercial online services, including BRS, Dialog, Datastar, ORBIT, and Telesystemes-Questel. Nevertheless, Dialog Information Services filed suit against ACS in June 1990, alleging that ACS was violating federal antitrust laws by stifling competition. At the heart of this complaint was CAS's refusal to license CAS abstracts to Dialog for online access, a move that would have eliminated the one unique marketing advantage for CAS databases on STN. The lawsuit sought in excess of $150 million in actual and punitive damages in addition to the ability to license the abstracts. ACS responded with a countersuit, claiming Dialog had failed to pay fully the fees it owed CAS for the use of its files.
The Dialog-ACS dispute dragged on for years and engendered not only legal expense for both parties but also considerable acrimony, played out in the press and at public meetings. A number of information professionals sided with Dialog, a stance that mystified CAS, which believed its efforts to enter the online business gave information users an additional choice in a marketplace up to then dominated by well-entrenched commercial services, especially Dialog itself.
Concomitant with its legal and management problems, CAS faced another financial crisis in the late 1980s and early 1990s. Chairman of the Board Joseph A. Dixon said in the 1991 ACS Annual Report, "The Board of Directors became concerned in late 1989 with not only the ability of CAS to prosper, but even its ability to survive given the fast-changing and highly competitive information industry." A study group of chemical and information industry executives appointed by the board concluded that "major changes in the administrative and governance structure of CAS were critically needed."
Following up on the findings of the study, the board instituted a governing board in 1991 that would take on most of the duties previously undertaken by CCAS. The governing board was to include ACS members and nonmembers with executive experience in both industry and academia. The chairman of the ACS Board of Directors, the ACS executive director, and the CAS director would also be members of the governing board. And so, the first governing board, assembled in 1991, consisted of ACS Executive Director John K. Crum, Chairman of the Board Dixon, CAS Director Wigington, and several distinguished members of the information community: Joseph Bremner, Theodore Brown, Lester Krough, and Carlos Cuadra.
Changes were not long in coming after the empowerment of the governing board. Its intent was to emphasize the operation of CAS as a business while continuing to support the not-for-profit mission of ACS. When Wigington moved to a new position as ACS director of information technology, Clayton F. Callis, a former Monsanto executive and chairman of the ACS Board of Directors, was appointed interim CAS director in 1991. Callis ably managed CAS in his temporary capacity while the search for a new permanent director proceeded. This culminated in the appointment of Robert J. Massie in the following year.
Massie became the fifth CAS director in 1992 (his title was changed by ACS to president, CAS, in 2003) and brought with him to the organization an extensive fund of experience in the publishing business. Before joining CAS, he served as president and chief executive officer of Gale Research Inc., a subsidiary of the Thomson organization. Earlier, Massie held senior executive positions with Torstar Corp., the largest newspaper and book publishing company in Canada. A major challenge facing the new director was the financial crisis CAS was then experiencing, partly as the result of the fiscal constraints affecting many customers in both industry and academia during a difficult economy. At the same time, competition in the information industry was growing day by day.
Finances and the ongoing Dialog suit were not the only challenges to be dealt with at CAS. ACS was also being sued by the Columbus school board, which had challenged CAS's not-for-profit status and its exemption from paying the property taxes that helped to fund the Columbus public schools. Yet another difficulty for CAS management in those days was an effort by some staff members to organize a union at CAS, focusing on the hundreds of editorial scientists who build CAS databases and thus affecting the very heart of the organization.
Massie acted quickly to address these various problems. By January 1993, CAS and its management and legal team had worked out a settlement with the Columbus public schools that entailed paying the schools a one-time contribution and ongoing annual support in return for the termination of the claim and an agreement to pursue no other challenges of CAS's tax status in the future. In October 1993, ACS and Dialog announced the resolution of their dispute, as well. Both parties resolved to refocus their efforts on serving information users and even to consider future cooperative efforts. Soon thereafter, the union activity faded out at CAS. Better communication between management and staff was most likely an important factor in this peaceable resolution.
A number of changes were instituted to put CAS on sound financial footing and to solidify its leadership position in the scientific information industry. Among the most important steps were a policy of moderating price increases, an increased emphasis on customer relations, and a greatly strengthened new product development effort. The success of CAS's change in direction was reflected in a statement of ACS Board Chairman Paul Walter just two years after Massie's appointment: "In the past, quite often, customers used CAS because they had to use CAS; now, they're using CAS because they want to use CAS" (C&EN, Oct. 31, 1994, page 19).
After the move to an electronic workflow and a number of improvements in editorial operations, CAS made great strides in the timeliness and currency of the information it delivers to the public. CAS updates its principal databases with new document and substance records daily. CAS databases are now the industry's most current for patent information, with preliminary patent records available within two days after the patents are issued by nine core patent-issuing authorities. Fully indexed and processed records from these same patent offices are available within 30 days. The USPATFULL database provides convenient online access to full text and CAS indexing for published U.S. applications, as well as granted patents.
CAS IN THE END-USER REVOLUTION. Even before the Internet came into its own, CAS recognized the time was ripe for a new paradigm in scientific information search and retrieval. The proliferation of personal computers, efficient networks, and computer-literate scientists created a new opportunity to reestablish a direct connection between CAS databases and the chemists whom CA was born to serve. This was the dawn of the much-anticipated end-user revolution. CAS responded to this opportunity in 1995 by launching SciFinder, a revolutionary research tool that set out to "change the way scientists conduct research."
CAS had begun to evaluate the idea of a new desktop research tool in August 1991 and formed a new product development group in September 1992. The team included 40 staff members from the organization's research, information systems, marketing, and new product development units. Customers also were involved early in the development process. By March 1993, CAS began meeting with key individuals at large chemical companies, including information directors, research directors, and more than 300 scientists in the research groups.
After assessing user needs, CAS decided a key objective of the new product would be to give scientists more control over the direction they wanted to follow in their research. The new program would find multiple answer sets, providing possible answers despite any syntax errors, spelling mistakes, or other problems that novices encounter. CAS also wanted to give chemists faster access to scientific journals and an easy way to become aware of the new studies regularly recorded in the ever-growing CAS database. Since the vast majority of scientists were unfamiliar with information retrieval techniques and "command language," the process of asking a question had to be conversational and intuitive. This called for a graphical user interface (GUI) that busy scientists could start using with virtually no training.
SciFinder underwent extensive testing by major chemical and pharmaceutical companies worldwide, beginning with the first prototype in July 1993. When it was publicly launched in 1995, SciFinder clearly demonstrated a new and simpler approach to finding information. The basic interface is still in effect today. SciFinder's opening screen presents several pathways to knowledge: Explore, Browse, and Keep Me Posted. From a personal computer or workstation, the user explores information easily by chemical substance (expressing the query as an exact structure; molecular formula; or substance ID, such as a chemical name or CAS Registry Number), reaction, substructure, research topic, author, or document identifier. Searches are conducted in a user-friendly, question-and-answer format, using internal dictionaries and a thesaurus to look up key terms in the request phrase and increase the search power. Users can also browse through the tables of contents of more than 1,000 journals; a Keep Me Posted function monitors new literature on current subjects and alerts users to recent arrivals.
SciFinder is designed to permit easy, conversational interaction with the search system, despite the sophisticated algorithms that come into play in the background. Unlike "command line" search systems that require the user to anticipate the variety of terms a database may contain for a certain concept, SciFinder automatically takes synonyms into account. For example, if a scientist wishes to explore research involving heart disease, he or she could ask the question in SciFinder by using a natural language expression and not even need to specify terms such as "cardiovascular disease" that may have been used in the literature in place of "heart disease." SciFinder's behind-the-scenes intelligence takes the appropriate synonyms into account.
CAS's CASREACT database, which had been available on STN since 1988 with limited use, became a popular feature in the new SciFinder environment, and CAS significantly expanded its content to its current level of more than 13 million single and multistep reactions from more than 600,000 records from journals and patents.
A version of SciFinder for chemistry students and faculty, SciFinder Scholar, was launched in 1998. This product is based on the same easy-to-use interface that distinguishes SciFinder but is especially adapted for campuswide use, for which the university may arrange access for a number of concurrent users.
SciFinder was introduced before the Web became pervasive in the latter half of the 1990s, and SciFinder's implementation as a client-server product rather than a Web resource permitted the inclusion of more search capabilities and better security. However, CAS was quick to embrace Web technology and create several Web-based services:
- The first publicly available CAS website, 1994.
- STN Easy, 1996, the first STN Web interface, offering simplified access to a subset of STN databases.
- ChemPort, 1997, developed as a cooperative service by CAS, the ACS Publications Division, and STN International service centers to provide links from database records to full-text journal articles and patents on the Web.
- STN on the Web, 1999, with the full range of STN databases and search capabilities.
- CA Selects on the Web, 2000, including electronic versions of all titles in the CA Selects series of current-awareness bulletins.
A Web version of SciFinder is currently under development at CAS.
An important opportunity afforded by the Web was the ability to connect secondary information such as CAS database records with primary literature and other full-text documents in electronic form. CAS has always led scientists to chemistry-related literature and patents, since the first issue of CA. However, the possibility of immediately taking a researcher from an abstract to the text of the original document it describes added an entirely new dimension to information retrieval, and CAS and other information providers were quick to recognize the potential.
Accordingly, a new concept in information service called the ChemPort Connection was born in 1997 through an initiative of CAS, the ACS Publications Division, several other publishers, and CAS's STN partners. Once a user of CAS electronic services has performed a search and found database records of interest, he or she can simply click on the links provided and find the full text on the website of journal publishers or patent offices.
Along with the development of search services, the CAS databases have grown and evolved to reflect the changing character of chemistry. In 1990, CAS made several important additions to the information available in the CAS Registry. For the first time, biosequences were accessible, including peptide/protein sequences encountered by CAS analysts as they index journals and patents, in addition to millions of nucleic acid sequences from the GenBank file. The sequence types include those that are chemically modified and those with "uncommon" amino acids or nucleotides. Each different sequence is given a unique CAS Registry Number.
Also in the 1990s, CAS enhanced the Registry by completing a backfile registration project that assigned CAS Registry Numbers to hundreds of thousands of substances indexed prior to the introduction of the CAS Registry in 1965. In 2003, complementing its massive collection of substance records, CAS added millions of index entries to CAplus to make chemical substances and subjects searchable back to the first issue of CA in 1907. Today's Registry also includes experimental and predicted (calculated) property data, as well as tags pointing to references containing experimental property data.
In the early years of the new century, CAS has continued to add new options for exploring the literature in its databases. Beginning in 2000, an important extension of CAS's traditional handling of information has been the inclusion of cited references in CAS databases. These citations in CAS records cover both journal articles and patents, and they include both the publications cited by a given document ("cited" references) and those that cite the document ("citing" references). In 2006, more than 160 million citation records were available.
ANOTHER COMPETITIVE ISSUE ARISES. In 2005, another competitive crisis came to the fore, but this time it originated from a U.S. government agency. Part of the National Institutes of Health Molecular Libraries Initiative is a database and service called PubChem, begun for the purpose, as originally described, of storing records of small organic molecules discovered as the result of screening substances for biological activity.
NIH was to make the database available free of charge and include links to PubMed, another free NIH database that offers abstracts of biomedical literature. CAS and ACS management were concerned that this taxpayer-financed project had the potential to compete unfairly with the CAS Registry system, considering that NIH appeared to be including substances of more than biomedical interest.
ACS executives brought these concerns privately to NIH management, but news of the discussions was apparently leaked to Business Week, which published a story titled "Whose Molecules Are These?" on April 25, 2005. The story characterized ACS's questioning of the PubChem initiative as "fierce opposition," and when the text of this news item was posted to the CHMINF discussion list somewhat in advance of the magazine's publication, it set off a flurry of messages, most of them critical of the supposed ACS position.
CAS claimed that by extending PubChem to cover substances of all kinds, NIH was violating a long-standing policy of the federal government not to compete with industry. On the other side, PubChem advocates represented the project as a benefit of the Open Access movement, with the promise of making medical information freely available for the public good. The controversy soon spilled into the wider media channels, and the ACS-NIH dispute was the subject of articles in Science, Nature, and C&EN, among other outlets.
A C&EN story ("NIH and ACS Spar Over PubChem," June 13, 2005, page 23) presented both sides of the argument. Massie summarized the objection from CAS's perspective: "We are not saying that PubChem as it exists today will destroy our business. We are saying that it is a platform that can be built upon to eventually replicate the Registry." Representing the NIH view, Jeremy M. Berg is quoted, making the point that there is "no reason to duplicate what CAS has available." However, Berg goes on to say, "if the information is already available, not making it accessible is a violation of our mission."
Both the House and Senate, in approving NIH appropriations later that year, cautioned the agency in a way supportive of the ACS position. A House report, for example, advises NIH "to work with the private-sector providers to avoid unnecessary duplication and competition with private-sector chemical databases."
Adherents on both sides of the debate have seen the congressional action as vindication of their own positions. Meanwhile, PubChem has continued to grow and, by early 2007, contained records for about 10 million substances.
MOVING ON FROM HERE. CAS has continued to extend its reach further back in time with the "Scientific Century" project, which has added information from many early-20th-century and late-19th-century articles from ACS journals and others, as well as U.S. patents available in CAplus. These additional records are for papers from the initial volumes of the Journal of the American Chemical Society and the Journal of Physical Chemistry, as well as other important articles published between 1900 and 1912 that were also cited in CAS databases as recently as 1998 or later (thus indicating their ongoing significance).
Recently, CAS acquired the rights to incorporate information from Chemisches Zentralblatt into CAS databases. As of this writing, CAS has made more than 100,000 records for 19th-century and early-20th-century literature available online.
More recent product development efforts by CAS have involved data-mining techniques and in-depth database exploration, in which a wealth of relationships among chemical substances, biological activities, and thousands of research concepts can be found. Such applications present exciting prospects, made possible by the critical mass of information in CAS databases. STN AnaVist is an analysis and visualization product introduced in 2005 that has made significant strides in supporting information professionals, whose role has gone beyond the traditional search function. Those professionals now serve largely as management advisers, recognizing patterns and trends in research activity across the broad scientific terrain that the databases bring into view.
Looking back from the vantage point of the 21st century, one wonders what the scientists whose work was chronicled in the early issues of CA might think about the science of today. Marie Curie would perhaps be gratified to know about therapeutic and diagnostic applications of radioactive materials in the 21st century. Fischer might be pleased to see that proteins are more than ever a productive focus of research. One can imagine Lumière's amazement at the thought of transmitting digital color photos from Mars. And what about Einstein? Possibly, he would be intrigued to see a CA abstract from the 1990s that reports, "Recent interest in time machines has been largely fueled by the apparent ease with which such systems may be formed in general relativity."
But there is no need to wait for the construction of a time machine to embark on enlightening journeys. Many insights from the past remain fertile ground for inquiry, and the shape of future breakthroughs is most likely foreshadowed in the latest information available from CAS. Researchers can travel back and forth through ideational space and time easily, in the pages of Chemical Abstracts and the databases it engendered.
Eric Shively has worked in Communications at CAS for 29 years.
- Chemical & Engineering News
- ISSN 0009-2347
- Copyright © American Chemical Society