Volume 86 Issue 5 | pp. 26-28
Issue Date: February 4, 2008

It's All About Access

Collections of digital resources expand, enhancing current and ensuring future access to scientific data
Department: Science & Technology
Out of Date
Obsolete hardware and software threaten future access to historic scientific data.
Credit: Chemical Abstracts Service

MORE AND MORE scientific journal articles, books, and data are flowing into online archives and databases, potentially broadening access to the material for both scientists and the public. This trend raises issues such as who should have access to what material, what should be saved for the long haul, and how best to preserve the information.

Without proper precautions, a disastrous loss of access to old data could occur. Just think how hard it is to find equipment to play an eight-track tape. Even accessing the data on a 5.25-inch floppy disk is problematic. Skip ahead a century or two, through multiple transformations in software and hardware, and you'll get a sense of the nightmare bedeviling data preservationists.

Other stakeholders are more concerned with providing current access. Google, for instance, is scanning the book collections of several major libraries so readers can view the books online and then buy or borrow them. The National Institutes of Health will soon require its grant recipients to deposit their journal articles in PubMed Central, its free digital archive of biomedical and life sciences literature.

Some publishers welcome the enhanced exposure that these databases proffer. Others believe that Google and NIH are paying inadequate attention to intellectual property rights.

The Twain Meet
The American Physical Society is experimenting with the use of Asian characters for author names.
Credit: American Physical Society

Recent legislation has brought the disagreements with NIH to a head. In December 2007, President George W. Bush signed a bill including a provision requiring NIH-funded investigators to deposit electronic copies of their accepted, peer-reviewed manuscripts in PubMed Central. The manuscripts must be made publicly available in the archive within a year of publication. NIH's previous policy had called for voluntary submission of manuscripts. In January, NIH laid out the specifics of the new process, which affects manuscripts accepted for publication on or after April 7 (C&EN, Jan. 21, page 10).

Allan Adler, vice president for legal and governmental affairs at the Association of American Publishers (AAP), says the new policy is "unprecedented and inconsistent with important U.S. laws and policies regarding the conduct of scientific research and the protection of intellectual property rights." The policy "allows the agency to take important publisher property interests without compensation, including the value added to the article by the publishers' investments in the peer review process and other quality-assurance aspects of journal publication." ACS is a member of AAP.

World Access

Publishers Reach Out To Asia And Developing Nations

Greater access can be achieved in a variety of ways. One way is to make publications more welcoming for users outside the Western Hemisphere.

For instance, the Health InterNetwork Access to Research Initiative (HINARI) provides nonprofit institutions in developing countries with free or low-cost online access to biomedical journals. The International Network for the Availability of Scientific Publications (INASP) manages a similar program for a broader range of journals—including those of the American Chemical Society—and also helps to strengthen local journals.

Publishers are striving to become more inclusive in other ways. For instance, the American Physical Society (APS) is experimenting with a program for Asian authors who publish in the society's Physical Review journals. These authors can now include their names using Chinese, Japanese, or Korean characters in addition to the version in Latin characters.

In a recent editorial about the program (Phys. Rev. Lett. 2007, 99, 230001), APS Editor-In-Chief Gene D. Sprouse lists eight Chinese names that all transliterate as Wei Wang. In this particular case, the addition of the author's name in its original Chinese form would help clear up any ambiguities about which particular Wei Wang authored the paper. The program might be extended to additional languages in the future.

ACS will introduce a similar capability in its journals later this year.

OPEN ACCESS is also a hot issue in Europe. Last November, the Council of the European Union said it supported the concept of free online access to "scientific output resulting from publicly funded research ... under economically viable circumstances, including delayed open access." But it stopped short of mandating such access.

Another European body, the European Research Council, went further. The EU established ERC a year ago to fund investigator-initiated cutting-edge basic research in science, technology, and other areas. Last December, ERC released guidelines pertaining to publications authored by its grantees. "In the age of the Internet, free and efficient access to information, including scientific publications and original data, will be the key for sustained progress," according to the guideline document. ERC added that "access to unprocessed data is needed not only for independent verification of results but, more importantly, for secure preservation and fresh analysis and utilization of the data."

ERC then declared that peer-reviewed publications based on research it funds "must be deposited on publication into an appropriate research repository where available, such as PubMed Central or an institutional repository, and subsequently made open access within six months of publication." And it stated that primary data such as nucleotide/protein sequences or macromolecular atomic coordinates must be deposited in "relevant databases" within six months of publication.

Scientists have loads of data repositories from which to choose, including the Protein Data Bank archive of macromolecular structural data and the Cambridge Structural Database of crystal structures.

Some organizations are working to ensure that data persist once they're deposited in such an archive. The National Science Foundation, the Andrew W. Mellon Foundation, and other sponsors have formed an international Blue Ribbon Task Force on Sustainable Digital Preservation & Access, which held its first meeting in Washington, D.C., in January. The task force is charged with finding an economically viable plan for preserving and accessing digital data as the information migrates through computer upgrades and format and storage changes in the coming decades.

NSF is encouraging digital preservation efforts through other means as well, including its new Sustainable Digital Preservation & Access Network Partners (DataNet) program. The agency says the program is designed to develop new types of technically and economically sustainable organizations that will "provide reliable digital preservation, access, integration, and analysis capabilities for science and/or engineering data over a decades-long timeline," among other goals. NSF notes that "data" can run the gamut from text and numeric data to software, models and simulations, video, and even websites.

The agency is currently reviewing preliminary DataNet proposals submitted by academic institutions and by nonprofit, nonacademic organizations engaged in scientific research and education. Full proposals will be due in March. NSF expects to invest up to $100 million in the project over a five-year period.

SIMILAR EFFORTS are under way worldwide. For instance, the Dutch government is funding research and development at the National Library of the Netherlands to ensure permanent access to the material in the library's e-Depot digital archive. According to the library, the R&D is intended to preserve access to digital material, "which would otherwise be threatened by rapidly evolving software and hardware platforms as well as media decay." The archive was established in 2003 and contains more than 10 million digital objects drawn from the arts, humanities, and social sciences; science, technology, and medicine; and digital culture. The library has made e-Depot storage services available to publishers worldwide who wish to participate in the project, including Elsevier and the International Union of Crystallography. Publishers determine access restrictions for the material they submit to the archive.

The National Library of the Netherlands is also coordinating the Alliance for Permanent Access, which was launched in November 2007. The group consists of European research institutes, research funders, national libraries, and international publishers, including the European Science Foundation, CERN, and the Max Planck Society. The library says the alliance plans to establish a "European infrastructure to secure permanent access to the digital records of science."

In addition, the National Library of the Netherlands is participating in the Digital Repository Infrastructure Vision for European Research (DRIVER). Funded by the European Commission, this project is integrating multiple repositories from many institutions to create a virtual network of open-access scientific information.

But not all e-resource projects are so complex. For instance, several projects are under way to digitize books, including chemistry-related volumes, and make them more accessible for both scientists and the public.

Online Books
Google has scanned many chemistry-related books, which can be viewed and purchased online.
Credit: Google

One example is the Library Project run by Google, which is working with several major libraries to scan their collections into its online database. Users find books through Google Book Search, which can track down any type of book, including fiction and nonfiction, scholarly and reference, and out-of-print and rare books. One title in the chemistry section of the site is "Carbon Nanotubes: Properties and Applications."

Clicking on a search result provides bibliographic information about the book and in many cases a few sentences showing the search term in context. Additional information can include keywords, chapter titles, and a list of related books. A few sample pages, or even the entire book, can be viewed if the author or publisher has given permission or the book is no longer copyrighted. Links to stores and libraries allow the user to buy or borrow the book.

The Association of American Publishers and others have sued Google for copyright infringement. Google maintains that it's not infringing copyright because it allows publishers to opt out of the program.

The U.K.'s Joint Information Systems Committee (JISC), which funds many archive projects, is digitizing numerous resources for free online access, including dissertations. JISC acknowledges that the dissertation project "knowingly infringes copyright" but notes that an author can have a dissertation removed from the collection and that the service is not for profit.

Other organizations have avoided such controversy altogether. They include the Open Content Alliance (OCA), which is scanning out-of-copyright books for free online access. It's supported by Yahoo!, the Internet Archive, and several libraries. The online collection will also include multimedia works.

OCA and some major libraries are also providing resources for Microsoft's book-scanning project. The software company's developmental Live Search Books service provides free online searching of and access to books that are out of copyright or have been provided by publishers.

Of course, some books and data are more valuable than others. Given that archival resources are finite, those entrusted to safeguard the digital legacy of science will face difficult decisions about what digital materials to preserve, and for how long.

 
Chemical & Engineering News
ISSN 0009-2347
Copyright © American Chemical Society
