Sometimes a name isn’t sufficient to specify an individual, and only a more definitive form of identification will do. A Social Security number, for instance, is necessary to ensure that you alone are credited for money you pay into the system when you’re working and that payments issued after you retire are sent to you and not to a different person who happens to share your name.
The same needs pertain in the world of science, where researchers want to be credited for their work and to be distinguished from others with the same name. Yet, there is currently “no authoritative list of all the researchers in the U.S. with all of their publications, grants, and other achievements” such as patents, mentoring, service, and teaching, says Katy Börner, an information science professor at Indiana University, Bloomington.
A given researcher’s records associated with these activities usually aren’t linked because each activity uses a different identifier, whether it’s a university- or publisher-issued ID or some other number, Börner says. But interlinking an individual’s publication and other data across the Web via a single identifier would be very useful in tenure or funding reviews, searches for potential collaborators or competitors, and citation analyses. These benefits are motivating researchers, publishers, and scientific and governmental organizations to explore the concept of a “unique author identifier.”
Such an identifier, which could be included in papers, data sets, and grant applications as well as on an individual's website, could do far more than differentiate between two scientists who bear the same name. It could also serve as a tool to find all the publications by a single researcher, even if the author's name were misspelled on a paper or recorded as J. Doolittle instead of James Doolittle.
Likewise, if Jane Smith adopted her husband’s name after marriage and subsequently authored papers as Jane S. Williams, she would still be recognized as the same researcher. And if she moved from one institution to another during her career, a unique identifier would keep track of her.
It would also keep track of an author whose name was recorded by different publishers in different ways, such as Mueller and Müller, or the more complex case in which a researcher’s name was transliterated from, say, Chinese to English.
Existing IDs include those issued by researchers’ institutions as well as Social Security, driver’s license, and passport numbers. But these IDs aren’t practical to use as author identifiers because they need to remain private, says Howard Ratner, chief technology officer with Nature Publishing Group.
That means an alternative ID is required. Several possible schemes—including some that chemists are already trying—were discussed this past March at the National Institutes of Health Workshop on Identifiers & Disambiguation in Scholarly Work. Börner co-organized the workshop with Mike Conlon, interim director of biomedical informatics at the University of Florida, Gainesville, and principal investigator for a project funded by NIH to expand VIVO, a free online database that includes researcher profiles.
One possible mechanism for establishing unique identifiers would be to set up a central registry where researchers could obtain an ID when they publish their first paper, Börner says. Researchers would subsequently use the identifier every time they applied for a grant, submitted a paper, taught a new course, and so on.
Whatever organizations end up assigning author identifiers, the programs will need to be sustainable in terms of funding and technology. “You want to put a system in place that can be used for hundreds of years to come,” Börner says. “And ideally, it will be an international system, because science today is global, and many researchers travel from Europe to the U.S., work here for a while, and then might go on to other countries.”
“What is really needed is a global, cross-sector, cross-institutional system that research institutions and all types of publishers can share,” according to MacKenzie Smith, associate director for technology at the Massachusetts Institute of Technology Libraries.
The International Organization for Standardization, better known as ISO, will begin appointing agencies to issue International Standard Name Identifiers (ISNIs) in the third quarter of this year. A person or a legal entity such as a company will then be able to apply to an agency for an ISNI, a 16-character code made up of 15 digits plus a final check character, which is either a digit or the letter X.
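The ISNI standard (ISO 27729) computes the final check character with the ISO 7064 MOD 11-2 scheme, which lets software catch most mistyped identifiers before a lookup. The sketch below shows that calculation; the function names are illustrative, and the sample value 0000 0002 1825 0097 is the demonstration identifier ORCID uses in its own documentation (ORCID iDs, as eventually launched, share this 16-character format).

```python
DIGITS = "0123456789"


def isni_check_char(base_digits: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for 15 base digits."""
    if len(base_digits) != 15 or any(c not in DIGITS for c in base_digits):
        raise ValueError("expected exactly 15 ASCII digits")
    total = 0
    for c in base_digits:
        # Add the digit, then multiply by the radix 2 of MOD 11-2.
        total = (total + int(c)) * 2
    result = (12 - total % 11) % 11
    # A check value of 10 cannot be written as one digit, so X is used.
    return "X" if result == 10 else str(result)


def is_valid_isni(identifier: str) -> bool:
    """Validate a 16-character identifier, ignoring spaces and hyphens."""
    compact = identifier.replace(" ", "").replace("-", "").upper()
    if len(compact) != 16 or any(c not in DIGITS for c in compact[:15]):
        return False
    return isni_check_char(compact[:15]) == compact[15]


print(isni_check_char("000000021825009"))    # 7
print(is_valid_isni("0000 0002 1825 0097"))  # True
print(is_valid_isni("0000-0002-1694-233X"))  # True
```

A check character only detects typing errors such as a single wrong digit or two swapped digits; it says nothing about whether the identifier has actually been issued to anyone.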
The Open Researcher & Contributor ID (ORCID) program was initiated in November 2009 by Nature Publishing Group and Thomson Reuters but has since attracted more than 90 collaborators including individual researchers, institutions, funding agencies, other publishers, and groups such as CrossRef, of which the American Chemical Society is a member. CrossRef is a not-for-profit membership association that registers DOIs (digital object identifiers) for published journal articles.
The ORCID initiative will be run by a not-for-profit organization whose structure and funding are still being worked out, says Ratner, who cochairs the ORCID initiative along with David Kochalko, vice president for strategy and business development in health care and science at Thomson Reuters.
Researchers who obtain an alphanumeric ORCID will be able to set up an online profile page that includes information such as affiliations, publications, contact information, and links to other online information, including profiles they set up through other author identifier initiatives. The researcher will control what information remains private and what can be accessed freely by the public. A prototype of the system will be available for testing this summer.
Ratner says the ORCID system is based on the same computer code as Thomson Reuters’ established ResearcherID system but will be developed independently on different servers. The ResearcherID program will continue even after the ORCID initiative is launched because the two systems offer different features, Kochalko notes. In fact, it’s likely that many different author ID systems will coexist but will be cross-referenced via “see also” links, Börner adds.
Thomson Reuters introduced the ResearcherID program in January 2008. Researchers who register with the service are assigned a unique alphanumeric identifier and can then begin building an online profile populated with a list of publications, patents, grants, and other information, including affiliation and past institution history, URLs for a website or blog, and a description of the researcher, with access controlled by privacy settings. The profile includes links to publications (with article access regulated by subscription status); collaboration networks; and information about citation counts, average citations, and h-index rankings, a measure of scientific impact. Profiles can be searched by name, institution, keywords, or ResearcherID.
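The h-index mentioned above is defined as the largest number h such that h of a researcher's papers have each been cited at least h times. The sketch below computes it, along with total and average citations, from a plain list of per-paper citation counts; the input format and function names are assumptions for illustration, not how ResearcherID itself stores or calculates these figures.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    h = 0
    # Rank papers from most to least cited; stop once a paper's count
    # falls below its rank.
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


def citation_summary(citations: list[int]) -> dict:
    """Total, average, and h-index for one researcher's papers."""
    total = sum(citations)
    average = total / len(citations) if citations else 0.0
    return {"total": total, "average": average, "h_index": h_index(citations)}


print(citation_summary([10, 8, 5, 4, 3]))
# {'total': 30, 'average': 6.0, 'h_index': 4}
```

Sorting makes the calculation O(n log n), which is negligible for a single researcher's publication list.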
Researchers can also set up profile pages through databases such as COS Expertise, Denise Beaubien Bennett, an engineering librarian at the University of Florida, Gainesville, noted at the NIH workshop. COS Expertise is made available to institutional and corporate subscribers by ProQuest, a scholarly information products and services company based in Ann Arbor, Mich. Profiles in the database include ID numbers for individual researchers.
Some databases rely on automated “disambiguation” to sort out which authors are the same and which are different, Bennett added. They include bibliographic databases such as Thomson Reuters’ Web of Science and Elsevier’s Scopus. The Scopus Author Identifier program assigns a unique number to authors whose documents are covered by the Scopus database, which is accessed via subscription. Author Identifier then groups all the documents written by that author, using an algorithm that analyzes affiliation, publication history, subject area, and coauthors.
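Scopus does not publish its grouping algorithm, so the following is only a toy illustration of the general idea of grouping documents into presumed authors. It assumes a hypothetical rule: two records with the same name string belong together if they share an affiliation or at least one coauthor. A union-find structure then merges records transitively, so A and C end up together when A matches B and B matches C.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    doc_id: str
    author_name: str
    affiliation: str
    coauthors: frozenset


def group_records(records: list) -> list:
    """Cluster records into presumed authors using union-find."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            a, b = records[i], records[j]
            if a.author_name != b.author_name:
                continue  # hypothetical rule: compare only identical name strings
            if a.affiliation == b.affiliation or a.coauthors & b.coauthors:
                union(i, j)

    groups = {}
    for i, record in enumerate(records):
        groups.setdefault(find(i), []).append(record.doc_id)
    return sorted(groups.values())


docs = [
    Record("d1", "J. Doolittle", "Univ A", frozenset({"K. Lee"})),
    Record("d2", "J. Doolittle", "Univ B", frozenset({"K. Lee", "M. Chan"})),
    Record("d3", "J. Doolittle", "Univ B", frozenset({"P. Ortiz"})),
    Record("d4", "J. Doolittle", "Univ C", frozenset({"R. Singh"})),
]
print(group_records(docs))  # [['d1', 'd2', 'd3'], ['d4']]
```

Because this rule requires an exact name match, it would never connect "J. Doolittle" with "James Doolittle," which is one reason purely automatic grouping falls short.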
Bennett also mentioned Author-ity, a University of Illinois, Chicago, tool for identifying Medline articles written by a particular author. Author-ity looks for shared title words, journal name, coauthors, subject headings, affiliations, e-mail address, and author middle initial and suffix (Junior or III, for instance) to determine the likelihood that two articles that share an author name were written by the same person.
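To make the idea of a "likelihood" concrete, here is a simplified pairwise scorer over the same kinds of evidence Author-ity examines. It is not Author-ity's actual statistical model: every weight below is a hypothetical log-odds value chosen for illustration (a real system would estimate them from labeled data), and the logistic function converts the summed score into a probability between 0 and 1.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Hypothetical log-odds weights, for illustration only.
PRIOR = -3.0          # two same-name articles start out as "probably different"
W_TITLE_WORD = 0.4    # per shared title word
W_JOURNAL = 1.0
W_COAUTHOR = 1.5      # per shared coauthor
W_SUBJECT = 0.3       # per shared subject heading
W_AFFILIATION = 1.5
W_EMAIL = 5.0         # identical e-mail address is strong evidence
W_AGREE = 1.0         # middle initial or suffix present in both and equal
W_CONFLICT = -4.0     # present in both but different


@dataclass(frozen=True)
class Article:
    title_words: frozenset
    journal: str
    coauthors: frozenset
    subject_headings: frozenset
    affiliation: str
    email: Optional[str] = None
    middle_initial: Optional[str] = None
    suffix: Optional[str] = None


def _compare_optional(x: Optional[str], y: Optional[str]) -> float:
    """No evidence if either value is missing; otherwise agree or conflict."""
    if x is None or y is None:
        return 0.0
    return W_AGREE if x == y else W_CONFLICT


def same_author_probability(a: Article, b: Article) -> float:
    score = PRIOR
    score += W_TITLE_WORD * len(a.title_words & b.title_words)
    score += W_JOURNAL if a.journal == b.journal else 0.0
    score += W_COAUTHOR * len(a.coauthors & b.coauthors)
    score += W_SUBJECT * len(a.subject_headings & b.subject_headings)
    score += W_AFFILIATION if a.affiliation == b.affiliation else 0.0
    if a.email is not None and a.email == b.email:
        score += W_EMAIL  # different addresses are not penalized; people move
    score += _compare_optional(a.middle_initial, b.middle_initial)
    score += _compare_optional(a.suffix, b.suffix)
    return 1.0 / (1.0 + math.exp(-score))  # logistic: log-odds -> probability
```

For example, two articles sharing a journal, an affiliation, a coauthor, and a middle initial score well above 0.9, while a pair whose middle initials conflict drops close to zero even if nothing else is compared.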
“There is a lot of good algorithm development” under way to improve automated identity analysis, Börner says. For now, however, it’s still a challenge to address what she calls “the John Smith issue,” in which two John Smiths work at the same institution or even in the same department. “Similarly, people get married and they change their names, or they might change their names for other reasons, and this cannot be captured automatically,” Börner says. Sorting out who is who therefore “needs to be an effort that combines automatic processing of data and manual disambiguation.”
The free Lattes Database compiled by the National Council for Scientific & Technological Development, in Brazil, represents “a powerful example of how this might work,” Börner says. The site provides access to profiles of more than 1 million researchers and about 4,000 institutions in Brazil.
“Researchers were asked to log in to Lattes to ensure their data were complete and correct, with the incentive that this data would be used in funding decision-making,” Börner says. “The result was one of the cleanest researcher databases in existence today,” she says. “So imagine if the National Science Foundation and NIH would step up and say, ‘We need such a system.’ ” Börner believes that would entice authors in the U.S. “to do the last 10% or 15% of cleanup, which cannot be done automatically.”
The timing for approaching scientists also has to be right, Ratner says. “I have heard that there are only four times that researchers actually care about their identity,” he explains. “The first time is when they get associated with an institution—when they become a student or a professor. The second is when they go for a grant. The third is when they go to get published.” And the fourth, he says, is when somebody misuses another researcher’s ID. “You need to be able to interact with those researchers at those stages.”