Stopping Self Content Theft

Photo credit: "Office Space" by WallTea, licensed under Creative Commons.

Feeding Google’s insatiable appetite for content is one of the main reasons infringers scrape and plagiarize content, and it is also one of the biggest reasons it is important to monitor for such misuse and, in many cases, defend against it.

The logic is simple enough: the more copies of a work that appear, especially without proper attribution, the less likely the search engines are to give credit to the original source. This can erode search engine performance, especially for smaller, less-established sites or those in highly competitive fields.

However, duplicate content doesn’t just come from plagiarists and spammers; it also comes from our own actions when dealing with our own content. Some of it stems from errors within our sites, and some from how we approach social networking and social news.

So even as we enforce our rights elsewhere, we have to be careful about how we use our own works. Though it might not be infringement, it can certainly have a very negative impact on you and your site and is worth dealing with all the same.

Starting at Home

The first steps in dealing with duplicate content have to start on your own site or blog. Many people don’t realize how many opportunities there are to create duplicate content on their own sites, even by pure accident.

Consider the following examples from a simple blog:

  1. Tag Pages: Tag pages have much of the same content as individual post pages and are generated by most blogging applications.
  2. Archive Pages: Monthly, yearly and other archive pages, similar to tag pages, have the same content, or significant portions of it, repeated.
  3. Category Pages: As with Archives and Tags, category pages repeat content.
  4. Printable Pages: Many themes include printable versions of content pages that can be indexed as duplicates.
  5. Comment Pages: Finally, depending on how comments are set up, a separate, duplicate page may be created for the version of the post that displays comments.

Depending on how your blog is set up, it is entirely possible that a single article appears six or more times on your site. Google and the other search engines have to decide which of those pages is the best one and link to it. However, they don’t always make the right decision and, in extreme cases, can even decide that the site is spamming and either lower its ranking or remove it.

Thus, it is important to make sure that you keep this duplicate content to a minimum and do your part to let the search engines know what you want them to link to. Here are a few tips:

  1. Show Summaries: When possible, display only article summaries and link to the full article. There is no reason for your tag, archive or category pages to show the full text of every entry.
  2. Use Robots.txt: Use your robots.txt file to prevent search engines from indexing unneeded pages, such as your printable pages. However, use caution with this method (a sample file follows this list).
  3. Use the Canonical Tag: Google, Yahoo and Bing all support the canonical tag, which tells search engines which version of a page is the best one to include in the index (see the example below).

In short, be very clear about which versions of your content are ideal and keep the duplicates to an absolute minimum. Doing so will greatly help the search engines tell which page to link to, helping both you and the search engines provide a better service.
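
To illustrate the robots.txt approach, here is a minimal sketch. The /print/ and /tag/ paths are hypothetical; substitute whatever paths your blogging software actually uses for its duplicate-prone pages:

    # Keep crawlers out of duplicate-prone sections
    # (these paths are examples only; match them to your own site's structure)
    User-agent: *
    Disallow: /print/
    Disallow: /tag/

Remember that a page blocked this way cannot be crawled at all, which means it also cannot pass along any link credit. That is one reason to use this method with caution.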
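
The canonical tag, by contrast, is a single line placed in the head of each duplicate version of a page, pointing at the preferred URL. The address below is only a placeholder:

    <link rel="canonical" href="http://www.example.com/original-article/" />

With that tag on your tag, archive and printable pages, the search engines know to treat the full article page as the one to index and link to.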

Away from Home

The other problem with self-defeating content use lies away from the home site. Where once a person’s entire presence was on their home page, it can now be scattered all over the Web, including other sites they run and the social networking sites they integrate with and use.

While it may seem like a great idea to post your content on every site you take part in, doing so can confuse the search engines. You want your efforts in social media to support your search engine strategy, not replace your original site. However, many people unwittingly do exactly that.

Here are a few ways to avoid that:

  1. Unique Content for Each Site: If you run multiple sites, you need unique content for each. You can use snippets of content to cross-promote, and certainly link between them, but don’t repost everything. Doing so confuses search engines and readers alike.
  2. Use Snippets: When posting your content on other sites, use snippets and link to the original works. The likelihood of a snippet replacing your content, in human or search engine eyes, is slim to none.
  3. Require Links: Whenever any of your content appears on another site, even in snippet form, request a link back to the original, specifically a search-engine-friendly one (see the example after this list).
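
For instance, an attribution link on a reposting site might look something like the sketch below; the URL and wording are placeholders. The important part is that it is an ordinary link, without a nofollow attribute, so search engines can follow it back to the original:

    <a href="http://www.example.com/original-article/">Read the original article at Example Blog</a>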

In short, be careful how you use your content. Though linked use isn’t likely to hurt you with the search engines, if you aren’t careful you can undercut your own site by spreading your work too thin, too carelessly.

Bottom Line

When we think of content reuse, we think of what others do with our work. The fact is, however, that we are each the biggest reuser of our own work and, perhaps, the most important one.

Though we can and should track how others use our content, and prevent uses that are against our wishes, it is also important to keep an eye on ourselves and make sure that our own actions are working for us and with our strategy.

As with anything else in life, the best place to start your content strategy is by looking at yourself and your own actions. After all, you are your own biggest customer.

COMMENTS

  1. Well, I think this has a lot to do with your other post "How to Correctly Use Creative Commons Works". What you say here is plain sane, but sadly many webmasters manage their content in a way that seeks only to please Google's policies (Google brings the most visits). That makes me think: besides the fact that monopolies aren't good at all, what if those policies keep changing, becoming more restrictive every time?

    On the other hand, one point was not very clear to me: if an article under CC links directly to the original source article, it doesn't count as duplicate content, right?

    • The answer to your second question is that it does count as duplicate content. However, the theory is that by linking to the original source, the repost should be treated as the duplicate and the original as, well, the original.

      As the reply below points out, that doesn’t always work, and different sites seem to be vulnerable to it in different ways. Some seem to be immune to it; others find themselves hurt by it, literally by accident.

      The comment below is right in that you should monitor where your content appears, stop uses that are against your terms and track closely how it affects you in Google. If you’re big enough and established enough, duplicate content from others might not hurt at all. If you aren’t, it can cripple you.

      Just keep on top of it and see what the situation is for you and your site.

  2. I got into a debate with someone the other day regarding showing summaries. They said it doesn't prevent your content from being scraped. I told them they were right, but it does prevent my entire post from showing up on the scraper's site. Great post!

    • The problem with showing summaries in your main RSS feed is that, while it limits the impact of scraping, it also limits the usefulness of your feed and the number of people willing to use it.

      It has to be weighed carefully, but you are right that it can be a valuable tool when appropriate.

  3. @David – Yes, it does count as duplicate content. It is a *huge* misconception that, if there is a link, you will get credit.

    Any time your content is taken and posted somewhere else it *is* duplicate content. It doesn't matter whether you authorized the duplication or not. A link can help, but it is *not* a guarantee that the original source will get credited properly.

    Yahoo and Bing do a decent job of sorting out the owner and providing proper credit and search engine placement. Google, on the other hand, is horrible at it.

    Older sites can withstand duplication better than newer ones. But the real factor is the size of the site. Very big new sites created with lots of stolen content can rank very well in Google.

    On the other hand, the smaller the site, the more likely it is to have problems with Google — even if a link back to the original source is included with the copy.

    Here's the bigger issue: If enough content is taken from your site, especially in a short amount of time, Google can decide:

    1) not to rank any content that lies along the same path, or within the same section, of the site (it bases this decision on your site structure and directory folders). This means Google will also throw out content that remains unique to your site just because enough other pages in the same section were duplicated; and

    2) not to rank any new content you add after it decides your site is worthless.

    You also might lose your page rank for the duplicated pages, and you might not gain page rank for the new pages you add. This can put your site at a serious disadvantage, making it even easier for people to take your content and undermine your search rankings.

    If you are concerned about Google placement, monitor your content. Don't let others just take it from you without a fight, and make sure you are not duplicating things yourself, either internally or via postings on Facebook or any other social media site.

    I do not understand why Google behaves this way. But it does. So be vigilant.

    On the other side of things, if you are reposting other people's content, make sure you ask first. Please also don't get all angry about it when the response you get is "no" or "you can take a short excerpt, but not the whole document."

    Sorry, I know this is kind of a rant, but this touched a nerve, given the damage someone did to my site over the last two months.

    • To be clear, I agree with most of this, but would say that Yahoo! and Bing are both prone to making mistakes as well when judging original vs. duplicate content. No search engine is perfect, and I don't really think Google is any better or worse in this area. I'm sorry that you got bit by it, though.

      I would like to emphasize that the impact is different for every site. Some sites are not affected at all, others are crippled. It's just a matter of the specific case and there doesn't seem to be much rhyme or reason to it.

      Thank you for your comment and we don't mind if it's a rant. Let me know if there is anything I can do to help you with your site.
