Issue Date: February 4, 2008
It's All About Access
MORE AND MORE scientific journal articles, books, and data are flowing into online archives and databases, potentially broadening access to the material for both scientists and the public. This trend raises issues such as who should have access to what material, what should be saved for the long haul, and how best to preserve the information.
Without proper precautions, a disastrous loss of access to old data could occur. Just think how hard it is to find equipment to play an eight-track tape. Even accessing the data on a 5.25-inch floppy disk is problematic. Skip ahead a century or two, through multiple transformations in software and hardware, and you'll get a sense of the nightmare bedeviling data preservationists.
Other stakeholders are more concerned with providing current access. Google, for instance, is scanning the book collections of several major libraries so readers can view the books online and then buy or borrow them. The National Institutes of Health will soon require its grant recipients to deposit their journal articles in PubMed Central, its free digital archive of biomedical and life sciences literature.
Some publishers welcome the enhanced exposure that these databases proffer. Others believe that Google and NIH are paying inadequate attention to intellectual property rights.
Recent legislation has brought the disagreements with NIH to a head. In December 2007, President George W. Bush signed a bill including a provision requiring NIH-funded investigators to deposit electronic copies of their accepted, peer-reviewed manuscripts in PubMed Central. The manuscripts must be made publicly available in the archive within a year of publication. NIH's previous policy had called for voluntary submission of manuscripts. In January, NIH laid out the specifics of the new process, which affects manuscripts accepted for publication on or after April 7 (C&EN, Jan. 21, page 10).
Allan Adler, vice president for legal and governmental affairs at the Association of American Publishers (AAP), says the new policy is "unprecedented and inconsistent with important U.S. laws and policies regarding the conduct of scientific research and the protection of intellectual property rights." The policy "allows the agency to take important publisher property interests without compensation, including the value added to the article by the publishers' investments in the peer review process and other quality-assurance aspects of journal publication." ACS is a member of AAP.
OPEN ACCESS is also a hot issue in Europe. Last November, the Council of the European Union said it supported the concept of free online access to "scientific output resulting from publicly funded research ... under economically viable circumstances, including delayed open access." But it stopped short of mandating such access.
Another European body, the European Research Council, went further. The EU established ERC a year ago to fund investigator-initiated cutting-edge basic research in science, technology, and other areas. Last December, ERC released guidelines pertaining to publications authored by its grantees. "In the age of the Internet, free and efficient access to information, including scientific publications and original data, will be the key for sustained progress," according to the guideline document. ERC added that "access to unprocessed data is needed not only for independent verification of results but, more importantly, for secure preservation and fresh analysis and utilization of the data."
ERC then declared that peer-reviewed publications based on research it funds "must be deposited on publication into an appropriate research repository where available, such as PubMed Central or an institutional repository, and subsequently made open access within six months of publication." And it stated that primary data such as nucleotide/protein sequences or macromolecular atomic coordinates must be deposited in "relevant databases" within six months of publication.
Some organizations are working to ensure that data persist once they're deposited in such an archive. The National Science Foundation, the Andrew W. Mellon Foundation, and other sponsors have formed an international Blue Ribbon Task Force on Sustainable Digital Preservation & Access, which held its first meeting in Washington, D.C., in January. The task force is charged with finding an economically viable plan for preserving and accessing digital data as the information migrates through computer upgrades and format and storage changes in the coming decades.
NSF is encouraging digital preservation efforts through other means as well, including its new Sustainable Digital Preservation & Access Network Partners (DataNet) program. The agency says the program is designed to develop new types of technically and economically sustainable organizations that will "provide reliable digital preservation, access, integration, and analysis capabilities for science and/or engineering data over a decades-long timeline," among other goals. NSF notes that "data" can run the gamut from text and numeric data to software, models and simulations, video, and even websites.
The agency is currently reviewing preliminary DataNet proposals submitted by academic institutions and by nonprofit, nonacademic organizations engaged in scientific research and education. Full proposals will be due in March. NSF expects to invest up to $100 million in the project over a five-year period.
SIMILAR EFFORTS are under way worldwide. For instance, the Dutch government is funding research and development at the National Library of the Netherlands to ensure permanent access to the material in the library's e-Depot digital archive. According to the library, the R&D is intended to preserve access to digital material, "which would otherwise be threatened by rapidly evolving software and hardware platforms as well as media decay." The archive was established in 2003 and contains more than 10 million digital objects drawn from the arts, humanities, and social sciences; science, technology, and medicine; and digital culture. The library has made e-Depot storage services available to publishers worldwide who wish to participate in the project, including Elsevier and the International Union of Crystallography. Publishers determine access restrictions for the material they submit to the archive.
The National Library of the Netherlands is also coordinating the Alliance for Permanent Access, which was launched in November 2007. The group consists of European research institutes, research funders, national libraries, and international publishers, including the European Science Foundation, CERN, and the Max Planck Society. The library says the alliance plans to establish a "European infrastructure to secure permanent access to the digital records of science."
In addition, the National Library of the Netherlands is participating in the Digital Repository Infrastructure Vision for European Research (DRIVER). Funded by the European Commission, this project is integrating multiple repositories from many institutions to create a virtual network of open-access scientific information.
But not all e-resource projects are so complex. For instance, several projects are under way to digitize books, including chemistry-related volumes, and make them more accessible to both scientists and the public.
Examples include the Library Project run by Google, which is working with several major libraries to scan their collections into its online database. Users find books through Google Book Search, which can track down any type of book, including fiction and nonfiction, scholarly and reference, and out-of-print and rare books. Examples from the chemistry section of the site include "Carbon Nanotubes: Properties and Applications."
Clicking on a search result provides bibliographic information about the book and in many cases a few sentences showing the search term in context. Additional information can include keywords, chapter titles, and a list of related books. A few sample pages, or even the entire book, can be viewed if the author or publisher has given permission or the book is no longer copyrighted. Links to stores and libraries allow the user to buy or borrow the book.
The Association of American Publishers and others have sued Google for copyright infringement. Google maintains that it's not infringing copyright because it allows publishers to opt out of the program.
The U.K.'s Joint Information Systems Committee (JISC), which funds many archive projects, is digitizing numerous resources for free online access, including dissertations. JISC acknowledges that the dissertation project "knowingly infringes copyright" but notes that an author can have a dissertation removed from the collection and that the service is not for profit.
Other organizations have avoided such controversy altogether. They include the Open Content Alliance (OCA), which is scanning out-of-copyright books for free online access. It's supported by Yahoo!, the Internet Archive, and several libraries. The online collection will also include multimedia works.
OCA and some major libraries are also providing resources for Microsoft's book-scanning project. The software company's developmental Live Search Books service provides free online searching of and access to books that are out of copyright or have been provided by publishers.
Of course, some books and data are more valuable than others. Given that archival resources are finite, those entrusted to safeguard the digital legacy of science will face difficult decisions about what digital materials to preserve, and for how long.
- Chemical & Engineering News
- ISSN 0009-2347
- Copyright © American Chemical Society