Making sense of online textual information and information management technologies
   
 

   
Plagiarism: Prevention, Detection, Analysis: Detection and Prevention
June 18, 2004

In our earlier article we took a look at 'term paper mills'. In this article we will look at some of the plagiarism detection software and services that are available to students and teachers alike.

Plagiarism Detection and Prevention

There are several plagiarism detection software packages and services (turnitin, mydropbox, EVE2), and at least one plagiarism prevention package (powerresearcher). Most of the detection services compare a submitted text (a student assignment) with another set of documents, reached either through a search engine or through a local database. If matches are found, a more detailed comparison is undertaken, and an 'originality report' is created and returned to the user (an individual or an educational institution). All of these, and a few others, offer additional services and software as well. Prevention software makes it easier to research and to manage projects, and so removes, to a certain extent, the need to plagiarize.
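The search and match step described above can be sketched in a few lines. This is only an illustration of the general idea, not the method used by any of the services named: we assume that texts are compared by overlapping word triples ("shingles"), and the function names, the threshold of 0.3 and the dictionary-based corpus are all our own assumptions.

```python
import re

def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in text, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def containment(submitted, source, n=3):
    """Fraction of the submission's shingles that also occur in the source."""
    sub = shingles(submitted, n)
    if not sub:
        return 0.0
    return len(sub & shingles(source, n)) / len(sub)

def originality_report(submitted, corpus, n=3, threshold=0.3):
    """List (document id, score) pairs at or above threshold, highest first.

    corpus is a hypothetical {document id: text} mapping standing in for
    a search engine or local database.
    """
    scores = ((doc_id, containment(submitted, text, n))
              for doc_id, text in corpus.items())
    hits = [(doc_id, score) for doc_id, score in scores if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

A high score only says that a large share of the submission's word triples also appear in a stored document; quoted and properly cited passages would match as well, so a report of this kind still needs a human reader.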

Glatt relies on the uniqueness of a student's style: it takes an essay, removes every fifth word, replaces each removed word with a blank, and asks the student to fill in the blanks. Glatt also offers a training program and a self-detection routine.
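The blanking procedure can be sketched as follows. Only the removal of every fifth word comes from the description above; the scoring rule (an exact match, ignoring case and punctuation) and the function names are our assumptions, and Glatt's own scoring may well differ.

```python
import re

def make_cloze(text, step=5, blank="_____"):
    """Replace every step-th word with a blank; return (cloze_text, answers)."""
    words = text.split()
    answers = []
    for i in range(step - 1, len(words), step):
        answers.append(words[i])
        words[i] = blank
    return " ".join(words), answers

def _norm(word):
    """Lowercase a word and strip punctuation other than apostrophes."""
    return re.sub(r"[^\w']", "", word).lower()

def score_cloze(answers, responses):
    """Fraction of blanks filled with the original word (assumed scoring rule)."""
    if not answers:
        return 0.0
    hits = sum(_norm(a) == _norm(r) for a, r in zip(answers, responses))
    return hits / len(answers)
```

The reasoning behind such a test is that the writer of a passage should restore its missing words more accurately than someone who copied it; where to draw the line between the two is not something this sketch attempts to settle.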

Plagiarism is not restricted to student essays. For plagiarism of program code there is a free service, MOSS (Measure Of Software Similarity), and there are some other tools at the University of Warwick. There is also JPlag, from the University of Karlsruhe, which detects similarity between pieces of code.

A very large number of educational institutions and individual educators are using these services, and some have developed their own plagiarism detection software.

Most of these seem to depend on string matching and computing power. Turnitin and its related companies (iparadigms and ithenticate) claim to have a database of 4.5 million essays/pages. It must be said that while search and match are unavoidable steps in the process of detection, they need not be the only ones. Additional routines might make the results even more reliable.

One issue with these methods is that they work on the assumption that the submitted text is plagiarized. It is on that assumption that the search and match is conducted, and the student is off the hook only when no matches are found. Other ways of identifying plagiarism could perhaps be used to supplement search and match.

Authorship Attribution/Identification

One of the earliest applications of statistics to writing was the analysis of style, known as stylometry. With the computing power available today, the reach of stylometry has grown considerably. Methods such as N-grams, cusum, zipping (compression), and many others have made further advances in stylometry possible.
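Two of the methods just named, N-grams and zipping, are simple enough to sketch. The code below is a minimal illustration under our own assumptions: character trigrams compared by cosine similarity, and a normalized compression distance built on zlib. The function names and parameter choices are ours and are not taken from any particular stylometry package; cusum is not shown.

```python
import math
import zlib
from collections import Counter

def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased, space-normalized text."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def ncd(x, y):
    """Normalized compression distance using zlib; smaller means more similar."""
    cx = len(zlib.compress(x.encode("utf-8")))
    cy = len(zlib.compress(y.encode("utf-8")))
    cxy = len(zlib.compress((x + y).encode("utf-8")))
    return (cxy - min(cx, cy)) / max(cx, cy)
```

The idea behind zipping is that a compressor which has already seen one text needs little extra space for a second text written in the same manner, so the distance between the two comes out small. Both measures are crude on short texts and are best read as rough indications.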

These advances might prove useful for plagiarism identification. In fact, it should be possible to take stylometry further, so that an individual's style is captured.

This could lead to a fundamental change in plagiarism identification: instead of building crawlers and search engines, or using existing search engines as some do, it might be easier to take substantial samples of a student's writing, identify the style, and then sift out and mark the passages which are not in that style. This is, possibly, a simpler method, and it has the advantage of not making any negative assumptions about a student's essay. There is reason to believe that it might even be a little quicker. It is also one of the ways in which a teacher would traditionally, and intuitively, identify plagiarism.
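A rough sketch of this idea follows. It assumes that style can be approximated by the relative frequencies of a few common function words, which is one classic stylometric feature and by no means the only one. The word list, the threshold of 0.5 and the function names are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# A tiny illustrative list of function words; an assumption, not a validated set.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "it", "for",
                  "with", "as", "but", "on", "not", "this", "which", "by", "or", "be")

def style_vector(text):
    """Relative frequency of each function word in text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def flag_passages(known_samples, essay, threshold=0.5):
    """Return (index, similarity) for paragraphs whose style is unlike the profile.

    known_samples: texts already known to be the student's own writing.
    essay: the submission, with paragraphs separated by blank lines.
    """
    profile = style_vector(" ".join(known_samples))
    paragraphs = [p for p in essay.split("\n\n") if p.strip()]
    flagged = []
    for i, paragraph in enumerate(paragraphs):
        sim = cosine(profile, style_vector(paragraph))
        if sim < threshold:
            flagged.append((i, sim))
    return flagged
```

A flagged paragraph is only a candidate for a closer look. Short paragraphs give noisy vectors, and a student's style can quite legitimately shift between one kind of writing and another.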

A combination of search and match with the style-based approach above would lead to a more robust and assumption-free manner of dealing not only with plagiarism, but also with intellectual property.

 
Copyright© 2003-2004 K-Praxis. All rights reserved.