1. How to Find Plagiarism

Plagiarists rely upon the anonymity and the vastness of the Internet to hide their activities. Almost always, they know what they’re doing is wrong (at least morally) and though they seem very bold about their activities, they are betting that you won’t learn about their misuse of your work.

What plagiarists don’t realize is that the same tools that make it easy for them to find works to steal also make it easy for you, the copyright holder, to retrace their steps and catch them. Even though the Internet is vast, it’s so well indexed that finding plagiarism is often an easy task.

Non-Blogging Writers

If you’re a writer looking for copycats, Google is your best friend. Google’s huge database of sites, combined with its ability to search other types of documents (including Adobe PDF and Word DOC), makes it a very powerful tool for tracking down infringement.

The first step to a successful Google search is to NOT use the title of your work. Many plagiarists, to hide their activity, will change the title of your work while keeping the body intact.

The best thing to do is to find a statistically improbable phrase (SIP) in your work and search for it. A good SIP is usually between 6 and 12 words long and is unique to your work.

On your first try, place your SIP in quotes. If that doesn’t return any matches, search again without the quotes to broaden your search. If neither search turns up anything and your work is posted online, it means that Google has not indexed your site and it is best to wait a few days and try again.
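
As a rough sketch of the two-step search above, the snippet below picks a candidate phrase and builds the quoted and unquoted queries. The function names, the tiny stop-word list and the use of a plain Google search URL are all assumptions made for illustration. Counting common words is only a crude stand-in for real word-frequency data, so treat the phrase it suggests as a starting point and check it by eye.

```python
import re
from urllib.parse import quote_plus

# Tiny stop-word list; a stand-in for real word-frequency data (assumption).
COMMON = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
          "that", "for", "on", "with", "as", "was", "at", "by", "be"}


def pick_phrase(text, length=8):
    """Return the run of `length` consecutive words with the fewest common words."""
    words = re.findall(r"[\w'’-]+", text)
    if len(words) <= length:
        return " ".join(words)
    windows = [words[i:i + length] for i in range(len(words) - length + 1)]
    # min() keeps the earliest window when several tie.
    best = min(windows, key=lambda w: sum(x.lower() in COMMON for x in w))
    return " ".join(best)


def search_urls(phrase):
    """Build the exact-match (quoted) query first, then the broader unquoted one."""
    base = "https://www.google.com/search?q="
    return [base + quote_plus(f'"{phrase}"'), base + quote_plus(phrase)]
```

Open the first URL to run the quoted search, and fall back to the second only if the first comes up empty.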

However, if results do show up, the next step is to look closely at each link that is not from your site or somewhere else you posted the work. Check each link to first make sure that it is your work and then to see if the use is within your guidelines.
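
If you have many links to check, a quick comparison can help you decide which pages deserve a close read. The sketch below uses Python’s standard difflib module to score how closely a suspect passage matches your own. The function name is hypothetical and the score is only a rough guide, not a verdict; you still need to look at the page yourself.

```python
from difflib import SequenceMatcher


def similarity(original, suspect):
    """Return a 0..1 score of how closely two texts match, word by word.

    Case and spacing are ignored so that reformatting alone does not
    hide a copy.
    """
    a = original.lower().split()
    b = suspect.lower().split()
    # autojunk=False keeps frequent words from being ignored in long texts.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()
```

A score near 1 means the passage is almost word for word your own; a low score suggests the match was a coincidence of phrasing.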

It’s important to note that, when performing these searches, Google may give you a message similar to this one:

“In order to show you the most relevant results, we have omitted some entries very similar to the 1 already displayed. If you like, you can repeat the search with the omitted results included.”

If you get this message, click the link and repeat the search. This means, quite literally, that other pages on the Web have content almost identical to your own and you definitely need to follow up. While it could be another page on your site, it could also be a plagiarist.

However, as powerful as Google is, it has limitations. For one, it tends not to index some of the most rampant places for plagiarism such as private message boards and social networking sites.

To speed up this process, you can use a tool called Copyscape, which makes searching through Google much more efficient. To use Copyscape, you simply enter the URL of the work you want to check and click submit. Copyscape will do the rest.

Copyscape uses Google as its backend so it will have the same limitations as the search engine itself in terms of what it picks up. Also, the free version may be too limited for some users since it only returns ten results. However, their paid service costs 5 cents per search and offers unlimited results.

If you are interested in a free, unlimited service, you may wish to take a look at Plagium. It is very similar to Copyscape but also allows you to copy and paste text and get unlimited results for free. The service uses the Yahoo! search engine as its backend, meaning it will produce different results than Copyscape or Google.

Finally, if you wish to automate your searching, use Google Alerts to detect plagiarism on your behalf. Simply set up search queries as described above and instruct Google Alerts to email you with new results. The service is completely free and is ideal for sites where the content remains relatively static.

Bloggers

Bloggers have special challenges when it comes to detecting plagiarism. The dynamic nature of their work makes it impractical to check every single work for infringement. Worse still, with RSS scraping and spam blogging on the rise, their content is often lifted as soon as new works are posted.

One way to detect RSS scraping specifically is to add a digital fingerprint to your feed. A digital fingerprint is basically a string of unique characters that do not appear anywhere else on the Web. If your feed content is scraped, your fingerprint will be scraped with it and you can simply search for that phrase to find suspicious sites. Even better, you can create a Google Alert, discussed above, to automate the process and email you when new matches are found.
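
As a minimal sketch, here is one way to generate such a fingerprint and attach it to a piece of feed content. The function names and the "fp" prefix are made up for illustration. A random token this long is very unlikely to appear anywhere else, but it is still worth searching for it once before you start using it.

```python
import secrets


def make_fingerprint(prefix="fp"):
    """Return a random single-word token, e.g. 'fp' plus 16 hex characters."""
    return f"{prefix}{secrets.token_hex(8)}"


def stamp(content, fingerprint):
    """Append the fingerprint to a piece of feed content on its own line."""
    return f"{content}\n{fingerprint}"
```

Generate the token once, save it, and reuse the same value in every feed item so that a single search or alert covers all of your posts.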

To add a fingerprint to your feed, you can either manipulate your template files directly, use a plugin or use a service like FeedBurner.

Though these tools will enable you to detect most RSS and automated scraping, they will not protect you against the more traditional “copy and paste” plagiarist.

Artists/Photographers

Artists and photographers have a much more difficult time finding plagiarists than writers due to the nature of their work. There is no way to search for unique phrases or just “Google” for copies of a particular work.

Currently, the best available free tool for detecting image misuse is the visual search engine Tineye. You either upload the image or give a URL for it, and Tineye then scours its database looking for images that are either exact duplicates or very close.

It is important to note that Tineye’s database is somewhat limited in size, encompassing only a small fraction of the Web’s images, but it is targeted well to find the worst infringers and those that you need to worry about.

In addition to Tineye, you may be able to use Google Image Search to find copies of your work. You can do this by clicking the “image” tab at the top of the search area before entering your terms.

However, since you can’t search for a distinctive line, the best place to start is by searching for the title. Even though a lot of copycats will change that, many others will not and you can find those results.

Also, consider doing an image search for the file name. If you take the time to give each of your images a very original file name (and certainly not just a number), there’s a good chance that plagiarists will leave that untouched when posting the work. This works even if they edit the file, for example to remove a watermark or shrink the image. It is an easy mistake for even the most careful plagiarist to make, however simple it is to correct.

But where searching may be less effective, referrer logs are far more powerful. A lot of people, especially those posting on message boards, won’t take the time to save and repost your image. Instead, they’ll simply link to the image file hosted on your server. If you have access to server logs, you can check the referrers for each image and catch people who are linking to it illegally, not only taking credit for your work but also stealing your bandwidth.
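
If your host gives you raw access logs, a short script can do the referrer check for you. The sketch below assumes the common "combined" log format used by default on many Apache and Nginx setups, where the referrer is the quoted field after the status code and response size. The function name, the list of image extensions and the exact-match host comparison are simplifications for illustration; adjust them to your own setup.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Matches: "GET /path HTTP/1.1" 200 1234 "referrer"
LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<ref>[^"]*)"'
)
IMAGE_EXT = (".jpg", ".jpeg", ".png", ".gif")


def external_image_referrers(lines, own_host):
    """Count (image path, referring page) pairs whose referrer is not own_host."""
    hits = Counter()
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue
        path = m["path"].split("?")[0]  # ignore any query string
        if not path.lower().endswith(IMAGE_EXT):
            continue
        ref = m["ref"]
        host = urlparse(ref).hostname  # None for "-" or an empty referrer
        if host and host != own_host:
            hits[(path, ref)] += 1
    return hits
```

Pass it the lines of your log file and your own host name, then sort the result by count to see which outside pages load your images most often.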

Of course, if you control your own server, this is probably an issue you’re going to want to nip in the bud, to save the cost of bandwidth if nothing else. As such, there are several methods to prevent image hotlinking, both with scripts and by editing server files.

The main thing to remember is that, as an artist, it is much more difficult to track down thieves than it is for a writer. However, you do have significantly more proactive measures that you can take to stop thieves, including watermarking images with a visual indicator. I strongly urge artists to focus more on those proactive steps than detection.

Musicians and Movie Makers

There are many tools for detecting and locating copies of musical works. However, those tools are targeted at a corporate audience and are not available for free.

That said, in my experience, though video and audio works are shared widely, plagiarism is relatively rare, largely due to the time and effort it takes to modify these works.

As with visual arts, it is typically better to focus on proactive measures to discourage misuse such as including watermarks on videos and ID3 data in MP3 files. Though such methods will not prevent your works from being copied, they will help ensure that attribution is carried with them as they are passed around, at least giving you credit.
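
To give a feel for what ID3 data looks like, the sketch below builds the older and simpler ID3v1 tag, a fixed 128-byte block placed at the very end of an MP3 file. The function names are hypothetical, and this is only an illustration of the layout. Most current software reads and writes the newer ID3v2 format, so for real files a dedicated tagging tool is the safer choice.

```python
def id3v1_tag(title, artist, album="", year="", comment=""):
    """Build a 128-byte ID3v1 tag: 'TAG', then fixed-width text fields."""
    def field(text, size):
        # Truncate to the field width and pad with NUL bytes.
        return text.encode("latin-1", "replace")[:size].ljust(size, b"\x00")

    return (b"TAG" + field(title, 30) + field(artist, 30) + field(album, 30)
            + field(year, 4) + field(comment, 30)
            + bytes([255]))  # genre byte; 255 means "not set"


def tag_mp3_bytes(data, tag):
    """Return the MP3 bytes with the tag at the end, replacing any old ID3v1 tag."""
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        data = data[:-128]
    return data + tag
```

Putting your name in the artist field and your site address in the comment field means the credit travels with the file wherever it is copied.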

Help With Detecting Plagiarism

Here is a recap of the key links in this article:

  • Copyscape: A great tool for quick plagiarism searches.
  • Plagium: A Copyscape alternative that is free and based on Yahoo!
  • Google Alerts: A free service that can automate basic plagiarism checks and email you results.
  • FairShare: A tool to detect misuse of content in an RSS feed.
  • Digital Fingerprint Plugin: A WordPress plugin to detect RSS scraping.
  • Tineye: A visual search engine that looks for copies of an image.
  • FeedBurner: Offers feed modification and analysis tools that can make detecting RSS scraping much easier.

43 Responses to “1. How to Find Plagiarism”

  1. Timothy says:

    I just started reading this site so maybe I’m missing something. How does one determine if a work is truly plagiarized? Copyright infringement is of course easy to detect – if a substantial piece of an original work has been copied without permission.

    But plagiarism is defined as taking another’s works, ideas and processes and claiming as your own.

    If two people share the same thought, share a common culture, and express ideas in similar language, and a sentence or two in one work sounds similar to that of another, does that constitute plagiarism? Who’s the plagiarizer?

    Or perhaps we’re talking about obvious blatant rip-offs.

    I guess I’m curious of what the yardstick is for legally identifying plagiarism.

  2. Thanks Jonathan for so many great articles about finding and combatting text theft. I’d like to second the importance of having access to your list of Referring Sites. It is truly vital that authors (or their webmasters) have access to and review this data.

    I write fiction and essays and found an unusual referrer on my log. Using Firefox (I too am a fan of this tool) I went to the site and found it was a clearing house directory of related essays. There was one of my essays (on medieval cosmetics) listed and described, with the link to my site. Just beneath it was another essay on the same subject listed and described, with the note by the compiler “There are a lot of similarities between these two essays”. This comment certainly set off bells, and I went to the second site at once and found my own essay almost verbatim embedded into the other essay. I started a flurry of “Cease and Desist” letters which by their second posting earned me apologies and the complete removal of the other site’s essay containing my text.

    I’ve learnt how to pick my battles however. For some reason almost 200 young women have taken my photograph and used it on myspace.com. It’s sort of funny, really, and I certainly am not going to waste time and energy tracking them down. Many other of my graphic images have been absconded with, and again, I usually don’t bother with these. It’s my fiction and essays I try to protect.

    Thanks again Jonathan for such a useful and meaningful addition to the WWW.

    wes thu hal (be whole and hearty, in Old English)
    Octavia

  3. Marios says:

    First off, great info! I recently found a couple of other sites that had plagiarized my content. I used Google to find these sites by picking out a phrase from a page that I figured sounded unique enough not to return too many results.

    Copyscape is a service that can help with people that are willing to pay for the service. They use Google’s database apparently.

  4. [...] First, take a look at my “How to Find Plagiarism” guide. It covers most of the basic information that writers, artists and musicians need to find plagiarized copies of their works. [...]

  5. charly says:

    Yeah, I think that Copyscape rules but sometimes it’s not efficient.

    Regards

  6. JPG says:

    1. Subscribe to Google Alerts
    2. Pick up some keywords from your text or post, for criteria search/detection purposes.
    3. Test the search routine (blogs or comprehensive); narrow it, if necessary, by adding some other keywords.
    4. Choose your report(s) options: “type”, and “how often”.
    5. Relax. Forget it. Do something else. If something similar to your work is found, you will be notified: where, when, how exactly (choose html reports).

    You can create one (or more) alerts for each text or post, via different criteria and/or type/frequency, and also manage, test or modify any, some, or all of them.

  7. linda lee says:

    I speak to writers about getting started blogging and turning their writing into articles and ebooks. One of the biggest questions that always comes up is what if someone steals my work?
    Unfortunately that happens with any form of writing, but the internet has made it so easy.
    In the end you have to decide if you want to take the “risk” and put your writing product out there, or keep it to yourself and possibly a limited small audience. I’m going to add your blog to my handout sheet to help those who are really concerned about this. Thanks for all your hard work and great info on this blog!

  8. [...] to make sure that this presence isn’t being scattered around carelessly by another person. Plagiarism Today is a wonderful resource that provides in-depth information about this topic including how to find [...]

  9. I wonder what to do when a student steals an essay and synonymizes a few words to elude detection. Does Google Alert detect the changes? Is there any other tool?

  10. JB says:

    Will: For that kind of situation, you might want to use something like Copyscape or a professional plagiarism checking service such as Turnitin or MyDropBox. They can better detect synonymized plagiarism and examine the entire work rather than just a snippet.

  11. JPG says:

    Well, the idea (probably very arrogantly presented) was to point out some simple, totally free, efficient method. The alternatives – besides expensive, complex, and extremely fallible programs – can include, for instance, continuing to (personally and regularly) check every single page on the Internet for possible plagiarism. By all means, leave no stone unturned, for all I care. Good luck.

  12. [...] first section to get an overhaul was the first chapter itself, How to Find Plagiarism. To that chapter, I’ve added two sections, one targeted at bloggers and RSS scraping as well [...]

  13. San says:

    Can anybody help me out with a list of software available over the Internet (downloadable or for purchase) to check pictures and documents for plagiarism?

  14. JB says:

    San:

    It depends on the type of software you are looking for. Are you looking to compare two or more documents against one another on a local machine or check something for plagiarism against the Web or other sources?

    The latter is not a program that you are going to be able to download, but rather, will be a Web-based service you pay to use.

    I can try to give you some links but you have to be more specific about your goal.

  15. San says:

    I need to check materials and pictures on my local system for plagiarism against the Web or other sources.

  16. JB says:

    San:

    Sorry for the delay. If you’re comparing anything to the Web, you’re going to need a Web-based solution.

    iThenticate (http://www.ithenticate.com) is probably the best-known solution in this area. They, along with MyDropBox (http://www.mydropbox.com), provide services to academics and businesses and search through the largest amount of content, including the Web, journals, magazines, newspapers and other sources.

    For quick checks against the Web, you can use Copyscape (http://www.copyscape.com) pretty trivially to check for content. You might have to register for a premium account but that is only five dollars.

    A company in testing right now, Bitscan (http://www.bitscan.com/), also has some promise in this area and offers the ability to paste text for free.

    Another alternative, of course, is to just take key phrases and run them through Google as you would any other query. It can sometimes tell you all that you need.

    There are more, but one of those five should give you what you need.

    Hope that helps!

  17. Ujjwal Dey says:

    Hi,

    What can I do when there is no contact information to inform the plagiarist that he is a creep?

    My Post at Blogger:
    http://hotaircoldlove.blogspot.com/2008/01/melancholic-musings.html

    THE COPY CAT JERK:
    http://ashlypradhan.blogspot.com/2008/01/depression-blues.html

    Nothing can be done with blog posts; there are too many scoundrels out there and no one chasing them.

    Regards,
    UD

  18. JB says:

    UD:

    Take a look at the fourth part of this series here:

    http://www.plagiarismtoday.com/stopping-internet-plagiarism/4-contacting-the-host/

    The host in this case is Blogspot, which is owned by Google. The email address is amac at google.com. However, you’re going to want to read this column about how to submit a notice to Google first.

    http://www.plagiarismtoday.com/2006/06/02/google-the-dmca-and-you/

    It should be a pretty easy submission once you know that. I’ve got the DMCA stock letter here:

    http://www.plagiarismtoday.com/stock-letters/

    With those items, you should have no trouble resolving it. Email me if you have any questions!

  19. Ujjwal Dey says:

    Hi,

    I will try and see if that helps.

    Thanks.

  20. [...] Use content theft search sites and services such as FairShare, Copyscape, Google Alerts, Digital Fingerprint WordPress Plugin, and other techniques described by Jonathan Bailey in “How to Find Plagiarism”. [...]

  21. [...] How to Find Plagiarism – Great page on PlagiarismToday.com about how you can find unauthorized copies of your work online. Includes links to tools. [...]

  23. [...] great articles and information on blog plagiarism from Plagiarism Today including a post about how to find plagiarism. -Justin Germino [...]

  24. [...] interesting information, like Don’t Plagiarize Us: Twitter Plagiarism Checking and his popular 1. How to Find Plagiarism. I’m not clear on what the number means in the title, but it’s full of useful [...]

  25. [...] creator is affected by this quest to defend their copyright. The site Plagiarismtoday offers an overview of solutions that are simple to use and designed for [...]

  27. [...] learn more about how this all works and what you can do about it, see How to Find Plagiarism, What Do You Do When Someone Steals Your Content and The 6 Steps to Stop Content [...]

  30. [...] Plagiarism Today and the El Tiempo article sent to me by Juan David Zambrano, our network engineer [...]

  31. [...] defensive tools that you can use to keep track of how your images are being used. This article in Plagiarism Today tells a few little things that can make tracking even easier. In decorative concrete, as in [...]

  32. [...] for both title of article and a few snippets from the article itself. If you want to read more: 1. How to Find Plagiarism | Plagiarism Today [...]