Plagiarists rely upon the anonymity and the vastness of the Internet to hide their activities. Almost always, they know what they’re doing is wrong (at least morally) and though they seem very bold about their activities, they are betting that you won’t learn about their misuse of your work.
What plagiarists don’t realize is that the same tools that make it easy for them to find works to steal also make it easy for you, the copyright holder, to retrace their steps and catch them. Because, even though the Internet is vast, it’s so well indexed that finding plagiarism is a very easy task.
If you’re a writer looking for copycats, Google is your best friend. Google’s huge database of sites, combined with its ability to search other types of documents (including Adobe PDF and Word DOC), makes it a very powerful tool for tracking down infringement.
The first step to a successful Google search is to NOT use the title of your work. Many plagiarists, to hide their activity, will change the title of your work while keeping the body intact.
The best thing to do is to find a statistically improbable phrase (SIP) in your work and search for it. A good SIP is usually between 6 and 12 words long and is completely unique to your work.
On your first try, place your SIP in quotes. If that doesn’t return any matches, search again without the quotes to broaden your search. If neither search turns up anything and your work is posted online, it means that Google has not indexed your site and it is best to wait a few days and try again.
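The quoted-then-unquoted search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are hypothetical, the search URL is assumed to be Google's usual query endpoint, and "longest words" is a crude stand-in for a truly improbable phrase, so you should still pick the final SIP by eye.

```python
from urllib.parse import urlencode

# Assumption: the standard Google search endpoint with a "q" query parameter.
SEARCH_URL = "https://www.google.com/search"


def candidate_phrase(text, size=8):
    """Pick a run of `size` consecutive words as a rough SIP candidate.

    Heuristic only (an assumption, not a real improbability measure):
    the window whose words are longest in total is assumed to be the
    most distinctive.
    """
    words = text.split()
    if len(words) <= size:
        return " ".join(words)
    windows = (words[i:i + size] for i in range(len(words) - size + 1))
    return " ".join(max(windows, key=lambda w: sum(len(x) for x in w)))


def search_urls(phrase):
    """Return (exact, broad) search URLs: with quotes first, then without."""
    exact = SEARCH_URL + "?" + urlencode({"q": f'"{phrase}"'})
    broad = SEARCH_URL + "?" + urlencode({"q": phrase})
    return exact, broad


if __name__ == "__main__":
    sample = "a b c extraordinarily improbable phrase d e"
    phrase = candidate_phrase(sample, size=3)
    for url in search_urls(phrase):
        print(url)
```

Open the first URL in your browser and fall back to the second only if the exact-match search comes up empty.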
However, if results do show up, the next step is to look closely at each link that is not from your site or somewhere else you posted the work. Check each link to first make sure that it is your work and then to see if the use is within your guidelines.
It’s important to note that, when performing these searches, Google may give you a message similar to this one:
“In order to show you the most relevant results, we have omitted some entries very similar to the 1 already displayed. If you like, you can repeat the search with the omitted results included.”
If you get this message, click the link and repeat the search. The message means, quite literally, that other pages on the Web have content almost identical to your own, and you definitely need to follow up. While it could be another page on your site, it could also be a plagiarist.
However, as powerful as Google is, it has limitations. For one, it tends not to index some of the most rampant places for plagiarism such as private message boards and social networking sites.
To greatly speed up this process, you can use a tool called Copyscape.
Copyscape uses Google as its backend, so it will have the same limitations as the search engine itself in terms of what it picks up. Also, the free version may be too limited for some users since it only returns ten results. However, their paid service costs 5 cents per search and offers unlimited results.
If you are interested in a free, unlimited service, you may wish to take a look at Plagium. It is very similar to Copyscape but also allows you to copy and paste text and get unlimited results for free. The service uses the Yahoo! search engine as its backend, meaning it will produce different results than Copyscape or Google.
Finally, if you wish to automate your searching, use Google Alerts to detect plagiarism on your behalf. Simply set up search queries as described above and instruct Google Alerts to email you with new results. The service is completely free and is ideal for sites where the content remains relatively static.
Bloggers have special challenges when it comes to detecting plagiarism. The dynamic nature of their work makes it impractical to check every single work for infringement. Worse still, with RSS scraping and spam blogging on the rise, their content is often lifted as soon as new works are posted.
One way to detect RSS scraping specifically is to add a digital fingerprint to your feed. A digital fingerprint is basically a string of unique characters that do not appear anywhere else on the Web. If your feed content is scraped, your fingerprint will be scraped with it and you can simply search for that phrase to find suspicious sites. Even better, you can create a Google Alert, discussed above, to automate the process and email you when new matches are found.
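If you would rather generate a fingerprint yourself than rely on a plugin, a minimal sketch might look like the following. The function names, the `fp` prefix, and the choice to append the fingerprint as a visible paragraph are all assumptions made for illustration; the only requirement from the technique itself is that the string be unique and travel with the feed content.

```python
import secrets


def make_fingerprint(prefix="fp", nbytes=12):
    """Return a random string that is very unlikely to exist elsewhere on the Web.

    secrets.token_hex(nbytes) yields 2 * nbytes hexadecimal characters.
    The "fp" prefix is an arbitrary, hypothetical choice.
    """
    return f"{prefix}{secrets.token_hex(nbytes)}"


def add_fingerprint(item_html, fingerprint):
    """Append the fingerprint to a feed item's body as plain, indexable text."""
    return f"{item_html}\n<p>{fingerprint}</p>"


if __name__ == "__main__":
    fingerprint = make_fingerprint()
    print(add_fingerprint("<p>Post body goes here.</p>", fingerprint))
```

Generate the fingerprint once, keep it on file, and reuse it in every feed item so that a single search or Google Alert covers your whole feed.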
Though these tools will enable you to detect most RSS and automated scraping, they will not protect you against the more traditional “copy and paste” plagiarist.
Artists and photographers have a much more difficult time finding plagiarists than writers due to the nature of their work. There is no way to search for unique phrases or just “Google” for copies of a particular work.
Currently, the best available free tool for detecting image misuse is the visual search engine Tineye. Tineye works by having you either upload the image or give a URL for it. It then scours its database looking for images that are either exact duplicates or very close matches.
It is important to note that Tineye’s database is somewhat limited in size, encompassing only a small fraction of the Web’s images, but it is targeted well to find the worst infringers and those that you need to worry about.
In addition to Tineye, you may be able to use Google Image Search to find copies of your work. You can do this by clicking the “image” tab at the top of the search area before entering your terms.
However, since you can’t search for a distinctive line, the best place to start is by searching for the title. Even though a lot of copycats will change that, many others will not and you can find those results.
Also, consider doing an image search for the file name. If you take the time to give each of your images a very original file name (and certainly not just a number), there’s a good chance that plagiarists will leave that untouched when posting the work. This works even if they edit the file, for example to remove a watermark or shrink the image. It is an easy mistake for even the most careful plagiarist to make, regardless of how easy it is to correct.
But, where searching may be less effective, referral links are far more powerful. A lot of people, especially those posting on message boards, won’t take the time to save and repost your image. Instead, they’ll simply link to the image file hosted on your server. If you have access to server logs, you can check referrers for each image and catch people who are linking to it illegally, not only taking credit for your work, but also stealing your bandwidth.
Of course, if you control your own server, this is probably an issue you’re going to want to nip in the bud, to save the cost of bandwidth if nothing else. As such, there are several methods to stop image hotlinking, both by using scripts and by editing server files.
The main thing to remember is that, as an artist, it is much more difficult to track down thieves than it is for a writer. However, you do have significantly more proactive measures that you can take to stop thieves, including watermarking images with a visual indicator. I strongly urge artists to focus more on those proactive steps than detection.
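As a rough illustration of the referrer check, the sketch below scans access log lines for image requests that arrived from other sites. It assumes your server writes the common "combined" log format (request line, status, size, then the quoted referrer); the function name, the list of image extensions, and the example hosts are hypothetical, so adjust them to match your own setup.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumption: "combined" log format, e.g.
# 203.0.113.5 - - [date] "GET /art/sunset.jpg HTTP/1.1" 200 5120 "http://referrer/" "agent"
LINE_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif")


def external_image_referrers(log_lines, own_hosts):
    """Count (referring host, image path) pairs for requests from other sites."""
    counts = Counter()
    for line in log_lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not a line in the assumed format
        path = match.group("path").split("?", 1)[0]
        if not path.lower().endswith(IMAGE_EXTS):
            continue  # only image files are of interest here
        host = urlparse(match.group("referrer")).hostname
        # A referrer of "-" (none sent) parses to no hostname and is skipped.
        if host and host not in own_hosts:
            counts[(host, path)] += 1
    return counts


if __name__ == "__main__":
    # "access.log" and the host names are placeholders for your own values.
    with open("access.log", encoding="utf-8", errors="replace") as log:
        hits = external_image_referrers(log, {"example.com", "www.example.com"})
    for (host, path), n in hits.most_common(20):
        print(f"{n:5d}  {host}  {path}")
```

Each host in the output is a page worth visiting to see how your image is being used.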
Musicians and Movie Makers
There are many tools for detecting and locating copies of musical works. However, those tools are targeted at a corporate audience and are not available for free.
In my experience, though video and audio works are shared widely, plagiarism is relatively rare, largely due to the time and effort it takes to modify these works.
As with visual arts, it is typically better to focus on proactive measures to discourage misuse such as including watermarks on videos and ID3 data in MP3 files. Though such methods will not prevent your works from being copied, they will help ensure that attribution is carried with them as they are passed around, at least giving you credit.
Help With Detecting Plagiarism
Here is a recap of the key links in this article:
- Copyscape: A great tool for quick plagiarism searches.
- Plagium: A Copyscape alternative that is free and based on Yahoo!
- Google Alerts: A free service that can automate basic plagiarism checks and email you results.
- FairShare: A tool to detect misuse of content in an RSS feed.
- Digital Fingerprint Plugin: A WordPress plugin to detect RSS scraping.
- Tineye: A visual search engine that looks for copies of an image.
- FeedBurner: Offers feed modification and analysis tools that can make detecting RSS scraping much easier.