Volume 86 Issue 24 | pp. 50-55
Issue Date: June 16, 2008

Fraud Busters

New tools emerge for detecting and weeding out plagiarism and data falsification in journal articles
Department: Science & Technology
Digital Detective
Garner, shown with the computers he uses in his research, developed software that can be used to screen articles for plagiarism.
Credit: UT Southwestern

LIKE THEIR FELLOW manuscript authors in other disciplines, scientists occasionally go over to the dark side. They'll drop an inconvenient data point, copy a paragraph out of someone else's paper, submit manuscripts based on the same set of results to two different journals, or even fabricate their data entirely. Sometimes the cases make a spectacular splash in the lay press. Just think of the international notoriety reaped by South Korean biomedical scientist Woo Suk Hwang when word got out that he had faked the contents of two seemingly groundbreaking stem cell papers in Science (C&EN, Jan. 16, 2006, page 25).

Of course, chemists aren't immune to the allure of taking shortcuts in publishing: Chemistry professor Pattium Chiranjeevi of Sri Venkateswara University in Tirupati, India, apparently plagiarized or falsified more than 70 research papers published between 2004 and 2007 (C&EN, Feb. 18, 2008, page 37).

Readers or referees for journals are often the ones who detect data manipulation or plagiarism. But journal publishers are increasingly taking responsibility for weeding out fraudulent submissions before the papers can make it into print. They are checking for manipulation of images in manuscripts and are beginning to employ software to hunt for plagiarism. They also offer their authors guidance on ethical matters, with varying degrees of detail.

For instance, the "Chemical Professional's Code of Conduct" of the American Chemical Society (which publishes C&EN) states that chemists should "respect the truth" and that "scientific misconduct, such as fabrication, falsification, and plagiarism, are incompatible with this code."

The editors of ACS journals have established a set of "Ethical Guidelines to Publication of Chemical Research," which state that "an author's central obligation is to present an accurate account of the research performed, as well as an objective discussion of its significance." Furthermore, "an author is obligated to perform a literature search to find, and then cite, the original publications that describe closely related work."

Individual ACS journals also offer their own sets of guidelines. The Journal of the American Chemical Society, for instance, specifies on its website that submission of a manuscript implies that the content "has not received prior publication and is not under consideration for publication elsewhere."

Other organizations list detailed requirements for authors. For example, the American Physical Society's "Guidelines for Professional Conduct" state that following publication, research data "should be retained for a reasonable period in order to be available promptly and completely to responsible scientists." These standards should make it harder for an author whose article comes under suspicion to claim that the original data have been lost or no longer exist.

Nature Publishing Group (NPG) also lays out an extensive set of rules in the "Guide to Publication Policies of the Nature Journals." The policies cover subjects including availability of data and materials, image integrity and standards, duplicate publication, and plagiarism and fabrication.

The publisher's guidelines state that plagiarism occurs "when an author attempts to pass off someone else's work as his or her own. Duplicate publication, sometimes called self-plagiarism, occurs when an author reuses substantial parts of his or her own published work without providing the appropriate references. This can range from getting an identical paper published in multiple journals to 'salami slicing,' where authors add small amounts of new data to a previous paper."

NPG makes it clear that such behavior is unacceptable. The publisher acknowledges, however, that "minor plagiarism without dishonest intent is relatively frequent, for example, when an author reuses parts of an introduction from an earlier paper." NPG states that its editors judge any case that comes to light on its own merits.

The guidelines also list several rules for the treatment of images submitted with a manuscript, emphasizing that processing should be kept to a minimum.

Likewise, the American Association for the Advancement of Science states on its website for Science that the journal "does not allow certain electronic enhancements or manipulations of micrographs, gels, or other digital images. Figures assembled from multiple photographs or images must indicate the separate parts with lines between them. ... Selective enhancement or alteration of one part of an image is not acceptable." An inappropriate change might be as minor as erasure of a blemish or as blatant as an attempt to make it appear that two different versions of the same image are really two different images.

Science checks for manipulation in submitted figures when a paper comes back from revision. First, a Science employee uses Adobe's Photoshop digital image software to look for alterations, Executive Editor Monica M. Bradford says. If that initial scrutiny arouses any misgivings, the figures are further analyzed by the journal's deputy editors and art department.

"If we think there's a problem then we go back to the author" and ask for the original data, Bradford says. "Sometimes, it's an honest mistake. But sometimes you know that it had to be done deliberately, because," for instance, "there's no possible reason why the author would cut and paste this way." In that case, editors might call upon the research integrity officer or dean at the author's institution to get the original data.

ONE OF THE MOST notorious incidents of data manipulation in Science involved stem cell researcher Hwang. The Seoul National University professor fabricated much of the content of two stem cell research papers, which were ultimately retracted by Science. Bradford credits readers with detecting the fraud. Following that incident, the journal got serious about screening papers for data manipulation.

Still, the process isn't foolproof. Bradford says a paper recently slipped through the screen and was published. Again, readers discovered some issues with the paper, this time limited to the figures. The journal is still pursuing that case with the author's institution. "It's a learning process," Bradford concedes.

Reviewers, editors, and readers also have a role to play in looking for plagiarism. Readers sometimes notify the journal that plagiarism has occurred, although Bradford says she can't recall any instance in which Science itself detected plagiarism. "But we had a case where reviewers for Biotechnology Letters discovered a paper that was plagiarizing one of our papers," she says. As it turned out, the author had a habit of plagiarizing others' work. Oddly enough, Science has also received letters to the editor that were plagiarized.

For now, Science isn't using plagiarism-detection software; its staff is looking into such software, but Bradford says some issues would have to be settled before adopting this technology. For instance, a certain degree of self-citation is legitimate in publishing, but software might flag such duplication as plagiarism. "Where do you draw the line?" Bradford asks. Before using plagiarism-detection software, the journal would need to "come to a clear sense of how we would use it and how we would apply it," she adds. "We just haven't had the time to really test it and figure out how we would implement it so that we knew it was fair and not a waste of time."

If Science adopts a manuscript-screening strategy, it probably wouldn't screen all 12,000 of the manuscripts that gush through its submissions pipeline each year, Bradford says. Instead, the journal would most likely test only the manuscripts that make it through the review process.

IN ALL, Bradford estimates that Science retracts a couple of papers per year as a result of misconduct. But even "a couple a year is disturbing," she says. "You don't want to see any misconduct. It really undermines the public trust in science." So even though Bradford and other editors of science journals don't see a lot of problematic manuscripts, each one can wreak a huge amount of damage that takes a long time to repair.

Over the past decade, the Royal Society of Chemistry (RSC) has had about six cases in which written material or data have been plagiarized, estimates Janet L. Dean, one of the society's three publishers. "Sometimes authors have copied the introduction section from another person's article, presented a proposal as their own, or included tables of experimental data or figures from other papers," she says. But Dean adds that she "can't remember a case in the last decade where an article contained falsified data."

Professional in-house editors at RSC carry out an initial review of each manuscript submitted for publication, Dean says. The editors look out for "possible breaches of ethics such as multiple submission, undue fragmentation, or authorship issues," she adds. Those issues could include self-plagiarism or submission of work without the permission of coworkers. Referees for the RSC journals look out for plagiarized material or falsified data in manuscripts. They might also discover that the name of a researcher who contributed significantly to the work has been omitted or that the name of one whose contribution didn't warrant it has been included, Dean says. Such matters might also be raised by readers after publication.

For the future, RSC is keeping tabs on the CrossCheck plagiarism detection service that CrossRef is scheduled to launch on June 20, Dean says. Both RSC and ACS are members of CrossRef, a nonprofit association of more than 2,500 publishers. CrossRef operates an online linking system that allows readers who click on a reference in one publisher's journal to jump to the cited content in a different publisher's journal.

CrossRef has set up a partnership with iParadigms, an Oakland, Calif., firm that provides Web-based services to check documents for originality. For a fee, CrossCheck will use iParadigms' online iThenticate service to check manuscripts submitted for publication against the CrossRef database of full-text journal articles provided by member publishers. Manuscripts can also be compared with content on the Internet. iThenticate generates an "originality report" that highlights suspect text and provides a side-by-side comparison with similar text in other documents.
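
To make the idea of a side-by-side comparison concrete, here is a minimal sketch in Python. It is not iThenticate's method, which the article does not describe; it simply uses the standard library's difflib to find runs of identical words shared by two texts. The function name matching_passages and the five-word cutoff are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def matching_passages(manuscript, published, min_words=5):
    """Return (manuscript_run, published_run) pairs of identical word runs.

    Illustrative only: min_words is an assumed cutoff, and real services
    are far more tolerant of punctuation and small rewordings.
    """
    a = manuscript.split()
    b = published.split()
    # Compare case-insensitively, but report the original wording.
    matcher = SequenceMatcher(
        None,
        [w.lower() for w in a],
        [w.lower() for w in b],
        autojunk=False,
    )
    passages = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_words:
            left = " ".join(a[block.a:block.a + block.size])
            right = " ".join(b[block.b:block.b + block.size])
            passages.append((left, right))
    return passages


if __name__ == "__main__":
    new_text = "We report that the enzyme binds zinc with high affinity in solution"
    old_text = "Earlier work showed that the enzyme binds zinc with high affinity at low pH"
    for left, right in matching_passages(new_text, old_text):
        print("manuscript:", left)
        print("published: ", right)
```

A tool like this can only list overlapping passages; deciding whether an overlap is a quoted method, a legitimate self-citation, or copying remains a human call.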

"The system doesn't actually detect plagiarism," CrossRef Executive Director Ed Pentz notes. "Plagiarism implies intent, and that's not something computers can identify. Instead, it identifies matching text in different documents, and then a knowledgeable person has to look at the results as part of the editorial screening process."

Trish Groves, deputy editor of the British medical journal BMJ, adds that publishers are unlikely to screen all submissions with a tool such as CrossCheck because every search that detects an overlap with another publication would have to be checked by an editor or some other staffer. She thinks it's more feasible for a publisher to use such tools to check a random subset of submissions or of papers on the verge of acceptance—and to let authors know about the screening process. "That might deter people from sending you plagiarized work," Groves says.

Truth Will Out
High-profile cases of data manipulation by authors including South Korea's Hwang led to the retraction of multiple papers.
Credit: Reprinted by permission from Macmillan Publishers Ltd. (Nature 2004, 431, 211), © 2004. Reprinted with permission from Science (2005, 308, 1777, and 2001, 294, 2138), © 2005 and 2001 AAAS

The BMJ Publishing Group, which publishes BMJ, was one of six publishers that participated in a CrossCheck pilot test from October 2007 to January 2008. During that time, the test detected only one plagiarized paper among the manuscripts that were ready for acceptance by three of the BMJ Publishing Group's journals, according to Groves.

CrossCheck's access to CrossRef's database is valuable, Groves adds. That access means that the tool can scan the full text of articles rather than just check the abstracts, as some free screening tools do.

These free tools include the Web-based eTBLAST program designed by Harold (Skip) R. Garner Jr., a professor of biochemistry and internal medicine at the University of Texas Southwestern Medical Center in Dallas, and colleagues. eTBLAST was developed to search through the contents of Medline, the National Institutes of Health's online database of biomedical journal citations and abstracts.

ORIGINALLY, the software was intended to give researchers a way to search for potential collaborators or find previous publications on a particular topic. But the developers, who have since added the capability to search additional online text databases, also realized that eTBLAST could uncover duplication in the literature.

Users insert text such as the abstract of an unpublished paper or a paragraph from a grant proposal into a box on the eTBLAST site. By looking for duplication of key words and also comparing word proximity and order, the program can track down and provide abstracts from similar papers in Medline.
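
As a rough illustration of combining key-word overlap with word order, the Python sketch below scores two texts on shared key words and on shared adjacent word pairs, then ranks a list of abstracts against a query. This is not eTBLAST's actual algorithm, which the article does not spell out. The stopword list, the equal weighting, and every function name are assumptions made for the example.

```python
import re

# Assumed, deliberately tiny stopword list for illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "with", "on", "by"}


def keywords(text):
    """Lowercase the text and keep the words that are not stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def jaccard(items_a, items_b):
    """Size of the intersection divided by size of the union."""
    a, b = set(items_a), set(items_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def bigrams(words):
    """Adjacent word pairs, a crude stand-in for proximity and order."""
    return list(zip(words, words[1:]))


def similarity(query, candidate, order_weight=0.5):
    """Blend key-word overlap with word-order overlap (weights are assumed)."""
    q, c = keywords(query), keywords(candidate)
    keyword_score = jaccard(q, c)
    order_score = jaccard(bigrams(q), bigrams(c))
    return (1 - order_weight) * keyword_score + order_weight * order_score


def rank(query, abstracts):
    """Return (score, index) pairs, most similar abstract first."""
    scored = [(similarity(query, text), i) for i, text in enumerate(abstracts)]
    return sorted(scored, reverse=True)
```

Under this scheme, two texts that use the same words in a different order score lower than a verbatim copy, which is the behavior the word-order comparison is meant to capture.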

"We can identify near-duplicate publications using our search engine," Garner says. "But neither the computer nor we can make judgment calls as to whether an article is plagiarized or otherwise unethical. That task must be left up to human reviewers such as university ethics committees and journal editors, the groups ultimately responsible for determining legitimacy."

Garner and colleagues recently ran more than 62,000 Medline abstracts through eTBLAST and then manually checked those that appeared to be duplicates. They concluded that 0.04% of the abstracts that had no authors in common were potential cases of plagiarism, while 1.35% of the abstracts that had authors in common appeared to be duplications (Bioinformatics 2008, 24, 243).

For now, eTBLAST can't detect " 'smart duplication' such as rewording another work while reproducing the substance," Garner and his colleagues write. In fact, there's an arms race of sorts going on. "There are technologies designed to aid authors to evade other similarity detection systems by rearranging sentences and using synonym substitution," they add.

But there might be ways to neutralize such deceptions. "On average only 10% of the references in related articles are shared," according to Garner's team, whereas about 60% of the references in duplicated articles are shared. They suggest that wholesale copying of references is a signature that indicates the main text of the paper was likely copied as well.
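
The reference-sharing signal can be sketched in a few lines of Python. The article does not say how Garner's team computed the shared fraction, so the denominator used here (the shorter of the two reference lists) is an assumption, as are the function names and the 0.5 threshold, which is an arbitrary illustrative value between the roughly 10% and 60% figures quoted above.

```python
def normalize(reference):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(reference.lower().split())


def shared_reference_fraction(refs_a, refs_b):
    """Fraction of the shorter reference list that also appears in the other.

    The choice of denominator is an assumption made for this sketch.
    """
    a = {normalize(r) for r in refs_a}
    b = {normalize(r) for r in refs_b}
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def looks_like_duplicate(refs_a, refs_b, threshold=0.5):
    """Flag a pair for human review; threshold is illustrative, not from the study."""
    return shared_reference_fraction(refs_a, refs_b) >= threshold


if __name__ == "__main__":
    paper_1 = ["Smith 2001", "Lee 2003", "Kim 2005", "Wu 2006", "Rao 2007"]
    paper_2 = ["smith  2001", "Lee 2003", "Kim 2005", "Doe 1999", "Roe 1998"]
    print(shared_reference_fraction(paper_1, paper_2))
    print(looks_like_duplicate(paper_1, paper_2))
```

A flag from a check like this would only nominate a pair of papers for closer reading; real reference lists would also need more careful matching than simple string normalization.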

The eTBLAST team is continuing its analysis of the Medline content and keeping a record of duplicate citations at a website called Deja Vu. Users of the site can contest a record or submit an apparent duplication to be confirmed.

"As it becomes more widely known that there are tools such as eTBLAST available and that journal editors and others can use them to look at papers during the submission process, we hope to see the numbers of potentially unethical duplications diminish," Garner says.

Chemical & Engineering News
ISSN 0009-2347
Copyright © American Chemical Society
