Find a good Plagiarism Checking Script?

automation

I am writing a dissertation for my final-year project, and I know that regardless of what I write there are going to be similarities to material already on the web and in academic journals and books. Thankfully, Google can root out most of these with ease, so finding a free plagiarism-checking script shouldn't be too hard.

After a bit of Googling I came across a couple of really basic checkers, but most of these are either "basic versions" that require you to pay for a script that actually works, or are just generally useless. I've found one that seems to work well for small chunks of text but fails miserably when handed a large file.

All I want is to be able to upload a Word document (or just paste in the text if no upload option is available) and have the tool tell me what appears to be copied and where from. Does anyone know of any good plagiarism checkers that can be downloaded for free or run online?

EDIT: Just to clarify my needs, I want a tool that can take a Word document and search it in its entirety so that it can label any instance where the document matches another document on Google.

Best Answer

I don't think that you are going to find a real solution in the form of a script or a free application. Think of what you want the program to do: read a document and check for what, exactly? Other people's published work? That requires a big honking database of published works. Stylistic variance beyond some statistical norm? That requires a statistical norm for style, probably based on vocabulary and sentence length computed in relation to significant words (i.e. filter out 'and', 'or', 'to', 'for' etc.). These are not trivial requirements, and not something you can just put together in a couple of hundred lines of |insert-your-scripting-language-here|.
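To give a feel for the stylistic side of this, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not a working plagiarism detector: the tiny stop-word list, the function names `style_features` and `flag_outliers`, and the 1.5 standard deviation threshold are all made up for the example. It measures average sentence length and vocabulary richness (after filtering out words like 'and', 'or', 'to', 'for'), then flags paragraphs whose sentence length is far from the document's own average.

```python
import re
import statistics

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "or", "to", "for", "the", "of", "in", "is", "it", "that"}


def style_features(text):
    """Return (average sentence length in words, content-word vocabulary richness)."""
    # Crude sentence split on ., ! and ?; abbreviations will confuse it.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    # Keep only the "significant" words by dropping stop words.
    content = [w for w in words if w not in STOP_WORDS]
    avg_len = len(words) / len(sentences) if sentences else 0.0
    richness = len(set(content)) / len(content) if content else 0.0
    return avg_len, richness


def flag_outliers(paragraphs, threshold=1.5):
    """Return indices of paragraphs whose average sentence length lies more
    than `threshold` sample standard deviations from the document mean."""
    lengths = [style_features(p)[0] for p in paragraphs]
    if len(lengths) < 2:
        return []
    mean = statistics.mean(lengths)
    stdev = statistics.stdev(lengths)
    if stdev == 0:
        return []
    return [i for i, n in enumerate(lengths) if abs(n - mean) / stdev > threshold]
```

Even this toy version shows the problem: a flagged paragraph only says "this reads differently from the rest", not "this was copied", and a meaningful norm needs far more text and far better features than a short script provides.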

Many schools use Turnitin, and many others simply use Google to search for published works. Searching Google is hit or miss, for obvious reasons. Turnitin isn't free, and Google is a very imperfect solution, but those are the two solutions I see most often.
