Plagiarism Detection Tools Offer a False Sense of Accuracy

The tools that likely brought down Harvard president Claudine Gay are improperly used on students all the time

When Katherine Pickering Antonova became a history professor in 2008, she got access to the plagiarism detection software tools Turnitin and SafeAssign. At first blush, she thought the technology would be great. She had just finished a graduate program where she had manually graded papers as a teaching assistant, meticulously checking students’ suspect phrases to see if any showed up elsewhere.

But her first use of the plagiarism checkers gave her a jolt. The software suggested the majority of her students had copied portions of their essays.

Soon she realized the lie in how the tools were described to her. “It’s not tracking plagiarism at all,” Pickering Antonova said. “It’s just flagging matching text.” Those two concepts have different standards; plagiarism is a subjective assessment of misconduct, but scholars may have matching words in their academic articles for a variety of legitimate reasons.

Plagiarism checkers are built into The City University of New York’s learning management system, where faculty members post assignments and students submit them. As at many colleges throughout the country, scanning for plagiarism in submitted assignments is the default. But fed up with false flags and the countless hours required to check potentially plagiarized passages against the source material Turnitin and SafeAssign highlight, Pickering Antonova gave up on the tools entirely a couple of years ago.

“The bots are literally worse than useless,” she said. “They do harm, and they don’t find anything I couldn’t find by myself.”

Some experts argue that Claudine Gay, Harvard’s ousted president and a widely respected political scientist, recently became the latest victim of this technology. She was forced to step down from the presidency after an accuser flagged nearly 50 examples from her writing that they called plagiarism. But many of the examples looked a lot like what Pickering Antonova considered a waste of her time when she was grading student work.

“The Voting Rights Act of 1965 is often cited as one of the most significant pieces of civil rights legislation passed in our nation’s history,” Gay wrote in one paper. Her accuser says she plagiarized David Canon’s description of the landmark law—but as the Washington Free Beacon reported in publishing the allegations, Canon himself disagrees, arguing Gay had done nothing wrong.

The controversy over Gay’s alleged plagiarism has roiled the academic community, and while much of the attention has been on the political maneuvering behind her ouster and the definition of plagiarism, some scholars have commented on the detection software that was likely behind it. The fact is, however, that students, not academics, bear the brunt of the tools’ shoddy analyses. Turnitin is the industry leader in marshaling text analysis tools to assess academic integrity, boasting partnerships with more than 20,000 institutions globally and a repository of over 1.8 billion student paper submissions (“and still counting”).

The companies that are marketing plagiarism detection tools tend to acknowledge their limitations. While they may be referred to as “plagiarism checkers,” the products are described as highlighting “text similarities” or “duplicate content.” They scan billions of webpages and scholarly articles looking for those matches and surface them for a reviewer. Some, like Grammarly’s, are marketed to writers and offer to help people add proper citations where they may have forgotten them. These tools aren’t meant to police plagiarism but to help writers avoid it. Turnitin specifically says its “Similarity Report” does not check for plagiarism.

Still, the tools are frequently used to justify giving students zeroes on their assignments—and the students most likely to get such dismissive grading are those at less-selective institutions, where faculty are overstretched and underpaid.

For her part, Pickering Antonova came to feel guilty about putting students through the stress of seeing their Turnitin results.

“They see their paper is showing up 60 percent plagiarized, and they have a heart attack,” she said.

The era of plagiarism wars

Plagiarism does not carry a legal definition. Institutions create their own plagiarism policies, and academic fields have norms about how to credit and cite sources in scholarly text. Plagiarism checkers are not designed with such nuance. It is up to users to follow up their algorithmic output with good, human judgment.

Jo Guldi, a professor of quantitative methods at Emory University, recently published “The Dangerous Art of Text Mining: A Methodology for Digital History” and jumped into the Gay plagiarism controversy with a now-deleted post on X before Christmas. She pointed out that computers can search for five-word overlaps in text but argued that such repetition does not equal plagiarism: “the technology of text mining can be used to destroy the career of any scholar at any time,” she wrote.

By phone, Guldi said that while she didn’t cover plagiarism detection in her book, the parallel is clear. Her book traces bad conclusions reached because people fail to critically analyze the data. She, too, has used Turnitin in her classes and recognized the findings cannot be taken at face value.

“You look at them and you see you have to apply judgment,” she said. “It’s always a judgment call.”

Many scholars, including those Gay is supposed to have plagiarized, have come to Gay’s defense over the course of the last month, arguing the text similarities highlighted do not rise to the level of plagiarism.

Yet her accuser has identified nearly 50 examples of overlap, pairing her writing with that of other scholars and insisting there is a pattern of academic misconduct. The sheer number of examples—and promise of more to come—helped seal Gay’s fate. And some scholars worry anyone with enemies could be next.

Ian Bogost, a professor at Washington University in St. Louis, mulled in The Atlantic what a “full-bore plagiarism war” could look like, running his own dissertation through iThenticate, a checker run by the same company as Turnitin that is marketed to researchers, publishers, and scholars.

Bill Ackman, a billionaire Harvard megadonor, signaled his commitment to participating in such a war after Business Insider launched its own grenade, publishing an analysis last week that accused his wife, Neri Oxman, of plagiarizing parts of her dissertation. Oxman got her Ph.D. at MIT in 2010 before joining the faculty and then leaving to become an entrepreneur. Suspecting someone from MIT encouraged Business Insider to take a closer look at her dissertation, Ackman posted on X that he was going to begin a “review of the work of all current @MIT faculty members, President Kornbluth, other officers of the Corporation, and its board members for plagiarism.”

He later added, “Why would we stop at MIT? Don’t we have to do a deep dive into academic integrity at Harvard as well? What about Yale, Princeton, Stanford, Penn, Dartmouth? You get the point.”

A null model

It’s unclear which tool Gay’s accuser used to identify their examples, but experts agree the accusations seem to come from a text comparison algorithm. A Markup analysis of five of Gay’s papers in the Grammarly and EasyBib plagiarism checkers did not flag any of the passages cited in the plagiarism accusations that have surfaced in recent months. Grammarly’s tool did flag instances of text overlap between Gay’s writing and other scholars’, sometimes because they were citing her paper, but sometimes because the two authors were simply describing similar things. Gay’s 2017 political science paper “A Room for One’s Own?” is the subject of more than half a dozen accusations of plagiarism that Grammarly didn’t flag, but the tool did, for example, suggest her line “The estimated coefficients and standard errors from the” may have been plagiarized from an article about diabetes in Bali.

Analyzing the same paper, Turnitin ignored several of the lines included in complaints against her, but it did flag four lines drawn from two academic papers. It also found other similarities, suggesting, for example, that the phrase “receive a 10-year stream of tax credits” warranted review.

David Smith, an associate professor of computer science at Northeastern University, has studied natural language processing and computational linguistics. He said plagiarism detection tools tend to start with what is called a “null model.” The algorithm is given very few assumptions and simply told to identify matching words across texts. To find examples in Gay’s writing, he said, “it basically took people looking through the really low-precision output of these models.”

“Somebody could have trained a better model that had higher precision,” Smith said. “That doesn’t seem to be how it went in this case.”

The result was a long list of plagiarism accusations most scholars found baffling.

Turnitin introduced its similarity check in 2000. Since then, plagiarism analyses have become the norm for editors of some academic journals as well as many college and university faculty members. Yet their use is not universal. Many faculty members, like Pickering Antonova, have decided the software isn’t worth the time and doesn’t align with their teaching goals. This has created two distinct classes of people: those who are subjected to plagiarism checkers and those who are not. For professional academics, Gay’s case highlights the concern that anyone with a high profile who makes the wrong enemy could quickly become part of the former group.

For students, it’s often just a matter of their schools’ norms. Plagiarism checkers can seem like a straightforward assessment of the originality of student work, reporting a percentage of the paper that may have been plagiarized. For faculty members who don’t have the time to look at the dozens of false flags, it can be easy to rely on the total percentage and grade accordingly.

This behavior worries Smith, the computer scientist. “Getting a quantification makes it easier to just judge a lot of student papers at scale,” he said. “That’s not what’s going on in the Claudine Gay case but is troubling about what’s going on with students’ subjection to these methods.”
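To see how a single number can overstate copying, consider a hypothetical scoring rule: report the percentage of a submission’s words that fall inside any five-word run also found in a source. The sketch below is an assumption made for illustration, not how Turnitin or any other vendor computes its score, and the function names and sample texts are invented.

```python
import re

def words(text: str) -> list[str]:
    """Lowercase words with punctuation dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())

def similarity_percent(submission: str, sources: list[str], n: int = 5) -> float:
    """Hypothetical score: percent of submission words covered by any n-word run
    that also appears in at least one source. Not any vendor's actual formula."""
    sub = words(submission)
    if not sub:
        return 0.0
    source_runs: set[tuple[str, ...]] = set()
    for src in sources:
        w = words(src)
        source_runs.update(tuple(w[i:i + n]) for i in range(len(w) - n + 1))
    covered = [False] * len(sub)
    for i in range(len(sub) - n + 1):
        if tuple(sub[i:i + n]) in source_runs:
            for j in range(i, i + n):  # mark every word inside the matching run
                covered[j] = True
    return 100.0 * sum(covered) / len(sub)

# Invented example: one boilerplate sentence matches a source verbatim.
submission = "We use county data from the census. The results are robust to alternative specifications."
source = "The results are robust to alternative specifications of the model."
print(f"{similarity_percent(submission, [source]):.1f}% matching")  # 50.0% matching
```

Under this rule, one routine sentence about robustness checks is enough to mark half the submission as matching, even though nothing in it suggests misconduct.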

Tech companies have produced a steady stream of new tools for educators concerned with students cheating, including AI detectors that followed the widespread adoption of ChatGPT. With each new tool comes a promise of scientific accuracy and cutting-edge analysis of unbiased data.

But as Claudine Gay’s case demonstrates—and the threat of the plagiarism wars promises—plagiarism detection is far from precise.

This article was originally published on The Markup and was republished under the Creative Commons Attribution-NonCommercial-NoDerivatives license.
