House Cleaning

Coping with errors, ambiguity, and fudging in the realm of citations

by Sophie L. Rovner
May 19, 2008 | A version of this story appeared in Volume 86, Issue 21

Impact factors and other citation metrics are becoming more and more important in shaping the fates of researchers and their institutions and the journals in which they publish. The organizations that create and use these metrics are, therefore, taking steps to ensure the statistics are as accurate and meaningful as possible.

When assigning credit for a citation to a particular author, institution, or journal, one of the toughest things to deal with is citation errors in the original publication, says James Pringle, vice president of product development for Thomson Reuters' scientific business, which developed the impact factor and several other metrics. The name of the author or institution might be misspelled, for instance. Confusion can arise from another source, Pringle says. "If you have two people, both of whom are John Smith, how do you know which papers go with which author?"

Thomson Reuters uses computerized techniques to compensate for such errors and ambiguities. For example, if the system finds that two papers authored by a John Smith show the same e-mail addresses or cite a similar set of references, it's likely the papers were written by a single person rather than two different authors, Pringle says.
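The matching rule Pringle describes can be sketched in a few lines. This is only an illustration, not Thomson Reuters' actual method: the `Paper` record, the use of Jaccard overlap as the measure of "a similar set of references," and the 0.3 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    # Hypothetical record; the field names are assumptions for illustration.
    title: str
    author_name: str
    emails: set = field(default_factory=set)
    references: set = field(default_factory=set)


def reference_overlap(a, b):
    """Jaccard similarity of two papers' reference sets (0.0 if both are empty)."""
    union = a.references | b.references
    if not union:
        return 0.0
    return len(a.references & b.references) / len(union)


def likely_same_author(a, b, min_overlap=0.3):
    """Guess whether two papers carrying the same author name share one author.

    min_overlap is an assumed threshold, not a published value.
    """
    if a.author_name != b.author_name:
        return False
    # A shared e-mail address is treated as strong evidence of one person.
    if a.emails & b.emails:
        return True
    # Otherwise fall back on how similar the cited references are.
    return reference_overlap(a, b) >= min_overlap


p1 = Paper("Paper A", "John Smith", {"js@example.edu"}, {"r1", "r2", "r3"})
p2 = Paper("Paper B", "John Smith", {"js@example.edu"}, {"r9"})
p3 = Paper("Paper C", "John Smith", set(), {"r1", "r2", "r4"})
p4 = Paper("Paper D", "John Smith", set(), {"x1", "x2"})

print(likely_same_author(p1, p2))  # shared e-mail address
print(likely_same_author(p1, p3))  # overlapping references
print(likely_same_author(p1, p4))  # nothing in common
```

In this toy data, the first two pairs are attributed to one John Smith and the third is not. A production system would presumably weigh many more signals than these two.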

Thomson Reuters is also developing methods that "enable researchers to take control over their presentation to the world and essentially validate and verify the accuracy of what we are doing," Pringle says.

For example, this January, the company introduced a Web-based tool that assigns a unique identification number to every author. The ID is "a persistent link that they can paste anywhere they want," including into their manuscripts, Pringle says.

Using the site, authors can confirm that Thomson Reuters' list of their papers is correct and complete. Authors can also use the site to publicly post a list of their papers. Thomson Reuters will update citation counts for the listed papers on a weekly basis.

Thomson Reuters is also supporting the development of methods to uniquely identify institutions. One such system is the Ringgold identifier. This ID system is being developed by a group including Ringgold Inc., which provides support services to publishers and other clients involved in the journal supply chain.

At the same time that metrics producers are enhancing accuracy, they're also working to safeguard their methods from would-be cheaters.

Although Pringle claims that "it's very hard to fudge citation numbers," there are ways to try to manipulate the system. Some journal publishers reportedly encourage authors to include citations to the journal in their manuscripts. And authors themselves sometimes include excessive citations to their own work.

But Pringle notes that "self-citation in and of itself isn't bad. For authors, citing yourself is a good practice if you are doing the best work in the field. Likewise, if you're publishing an emerging journal in a new field, and most of the good work is going on within your journal, the journal is going to have a high level of self-citation."

Thomson Reuters studied self-citation by journals and found that it doesn't generally "lead to some huge change in the impact factor," Pringle says. The company makes adjustments to its database for any exceptional cases it uncovers. Similarly, with individual authors, "it's rare that high levels of self-citation lead to high numbers of total citations," Pringle says.

A scientist could also try to fudge the numbers by setting up a citation ring, in which several researchers agree to cite each other's papers. "That's more an urban legend than anything else," Pringle says. Nevertheless, "groups of scholars do tend to cite each other consistently, and that's because they're working together, they're building up a new field," he says. "That's what used to be called the 'invisible college.' Scientists naturally collaborate, and those collaborations are reflected in citation patterns. That's not fudging; it's an accurate reflection of how science evolves."

Usage factors, too, will be open to distortion. In fact, publishers will have to be monitored so they aren't tempted to "break some of the rules" to increase their journals' usage factors, says Richard Gedye, research director for the journals division at Oxford University Press in the U.K.

Say "you've got a popular article, so you put a link to it on the home page of the journal," Gedye says. "That seems to me to be a perfectly reasonable marketing device to maximize the chances of anybody who might be interested in seeing it and clicking on it," he says. On the other hand, a publisher could set up the journal site so it was impossible for a user to view the PDF version of an article without first downloading the HTML version. Such a tactic would record two "uses" for one user and "would be inappropriate because that would distort usage," he explains.

Authors also might try to influence their papers' usage. "It's possible for authors, and I know one who's admitted to having done it just to demonstrate that it was possible, to devise a computer program that will download their articles 3,000 times in five minutes," Gedye says. To some extent, this deception can be foiled by the counting rules devised by the Counting Online Usage of Networked Electronic Resources group (COUNTER). Gedye chairs the group, which is helping develop the usage factor. COUNTER's method ignores download requests for the same article that originate from the same IP address if they occur too close together in time.
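A rough sketch shows how a rule of this kind blunts a scripted burst of downloads. It is not COUNTER's specification: the 30-second window, the log format, and the choice to restart the window on every repeat are assumptions made for illustration.

```python
def count_usage(requests, window_seconds=30):
    """Count downloads per article, skipping rapid repeats from one IP address.

    requests: iterable of (timestamp_seconds, ip, article_id) tuples.
    window_seconds: assumed cutoff for "too close together in time".
    """
    last_seen = {}  # (ip, article_id) -> time of the most recent request
    counts = {}
    for ts, ip, article in sorted(requests):
        key = (ip, article)
        previous = last_seen.get(key)
        # Record every request, counted or not, so a continuous burst
        # keeps restarting the window.
        last_seen[key] = ts
        if previous is not None and ts - previous < window_seconds:
            continue  # too soon after the last request from this IP
        counts[article] = counts.get(article, 0) + 1
    return counts


# 3,000 scripted requests in five minutes from one address, plus one
# request from a second reader and a later return visit by the first.
burst = [(i * 0.1, "10.0.0.1", "article-1") for i in range(3000)]
others = [(5.0, "10.0.0.2", "article-1"), (1000.0, "10.0.0.1", "article-1")]
usage = count_usage(burst + others)
print(usage)
```

Under these assumptions the burst registers as a single use, and the article ends up with three recorded uses rather than 3,002. A script that spaces out its requests or rotates IP addresses would still slip through, which fits Gedye's caveat that the rules foil the trick only "to some extent."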

Sometimes usage data can be distorted unintentionally. For instance, when "you download a page with lots of links, Google can be configured to automatically request those links even if you haven't asked for them, so if you do ask for them, you get them quicker," Gedye says. That sort of retrieval, which is known as prefetching, shouldn't be counted as usage, he says. The next version of COUNTER's guidelines will specify that prefetched items should be excluded from the usage tallies.
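One way such an exclusion might work is sketched below. It assumes the server log retains request headers and that the client labels its prefetches with a hint header such as `X-moz`, `Purpose`, or `Sec-Purpose`; clients that prefetch without a label would not be caught. The log format is hypothetical, and nothing here is drawn from COUNTER's guidelines.

```python
# Hint headers that some browsers attach to prefetch requests.
PREFETCH_HEADERS = ("x-moz", "purpose", "sec-purpose")


def is_prefetch(headers):
    """Return True if any known hint header marks the request as a prefetch."""
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    return any("prefetch" in lowered.get(name, "") for name in PREFETCH_HEADERS)


def tally_without_prefetch(log):
    """Count requests per article, leaving out prefetched ones.

    log: iterable of (article_id, headers) pairs (an assumed log format).
    """
    counts = {}
    for article, headers in log:
        if is_prefetch(headers):
            continue  # retrieved speculatively, not requested by the reader
        counts[article] = counts.get(article, 0) + 1
    return counts


sample_log = [
    ("article-1", {"User-Agent": "ExampleBrowser"}),
    ("article-1", {"X-moz": "prefetch"}),
    ("article-2", {"Sec-Purpose": "prefetch;prerender"}),
    ("article-2", {}),
]
print(tally_without_prefetch(sample_log))
```

Here each article is credited with one use, because the two labeled prefetches are dropped from the tally.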
