When people talk about content detection, they are usually putting it into some copyright-related context. Whether it is YouTube’s Content ID system, image matching for tracking plagiarists/preventing orphans or simple duplicate text searching to track violators of their license, most people think of content detection as a means to track and stop copyright violations.
Sadly, this site too is guilty of that. However, I want to take time today to highlight one of the more important uses of content detection: audience analytics.
Most bloggers put some kind of analytics on their site to track visitors, referrals, etc. But if your revenue doesn’t come from online ads, analytics is more about understanding your audience than it is about tracking actual page views. The statistics themselves are just a means to an end.
However, there is a very good chance that a significant portion of your audience is actually on other sites. But unless you track your content, you may never be aware of it.
Every Site is Different
Last year content tracking service Attributor announced the results of its TrueAudience study. It found, for the publishers that it checked, that the off-site audience was one and a half times the size of the audience on the site itself. This meant that, for every two people reading the content on the publisher’s site, three were seeing it elsewhere.
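To make that ratio concrete, here is a minimal Python sketch of the arithmetic. The function name `audience_split` and its parameters are purely illustrative; the 1.5 default is the figure reported for the publishers in the study and may well differ for other sites.

```python
def audience_split(on_site_readers: float, off_site_ratio: float = 1.5) -> dict:
    """Estimate the off-site audience from an on-site count.

    off_site_ratio is an assumption: 1.5 matches the study figure quoted
    above, but any given site would need its own measured value.
    """
    off_site = on_site_readers * off_site_ratio
    total = on_site_readers + off_site
    return {
        "on_site": on_site_readers,
        "off_site": off_site,
        "total": total,
        # Fraction of the whole audience that never shows up in on-site stats
        "off_site_share": off_site / total,
    }


split = audience_split(2)
print(split["off_site"])        # 3.0 readers elsewhere for every 2 on the site
print(split["off_site_share"])  # 0.6, i.e. 60% of the total audience
```

In other words, at that ratio on-site statistics alone would account for only two out of every five readers.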
Since the Attributor study focused more on larger publishers, who will likely have higher levels of copying, the results will obviously not be that dramatic for smaller bloggers. However, virtually everyone who publishes to the Web will see some copying and, through that, will have some of their audience on other sites.
Ignoring this is like telling your statistics program to skip every Xth visitor without having any idea what X is. If you want to know your audience, you have to go where they are.
Given the wide range of sites and the different situations they are in, it is impossible to even offer good estimates without at least getting some facts.
Referrals and Linkbacks
To be certain, you can track some of this with your existing tools. Referrals will alert you to when someone visits your site from another page and trackbacks/pingbacks will alert you instantly when someone has linked to your content.
However, there are several problems with these. Referrals only register when people actually click links. This requires both the site to link to you and a user to click that link. Given that the vast majority of visitors don’t click such links, a referral may never show up, even when the use is attributed. Referrals also track sites that simply link without using any content, making it a challenge to find actual audience members on other sites.
Though trackbacks and pingbacks don’t require anyone to click the link, they also focus mostly on sites that simply link to your blog. Furthermore, there is a huge issue with spam and many sites that duplicate your content may be filtered out, correctly or incorrectly, as such.
These tools are powerful, but they are not actual substitutes for following your content on the Web.
If you’re new to the idea of tracking your content, meaning you probably aren’t a regular reader of this site, here are a few suggestions to get you started.
- FairShare: A free service provided by content tracking company Attributor, FairShare subscribes to your RSS feed and publishes a private one for you that tracks where it finds your content. Very useful for sites with a low-to-moderate level of copying.
- TinEye: Though somewhat limited, TinEye is the best visual search engine available and definitely the best free search. Great for visual artists who want to find out how their work is being used.
- Plagium: If you have static content and can’t use FairShare, Plagium is a good alternative. It sends weekly alerts of new matches for free and works like a hybrid between Google Alerts and free Copyscape, two other tools well worth looking into.
These are all great, free services that you can use to track your content and get a slightly better understanding of your audience.
When I was running a personal literature site, I was proud of my traffic stats but was stunned to find out that my audience off the site was many times larger than it was on my site. Much of the use was legitimate, including use in compilations and online magazines, but much of it was plagiarism. Using this information, I reached out and encouraged legitimate use, even participating in sites and discussions that properly used my content, and decided to tackle the plagiarists.
The system worked very well for me. It let me meet many people and reach out to a whole new group of people. It also let me convert some cases of mistaken identity into legitimate uses and stop plagiarists by the hundred.
Though I eventually abandoned my site, there is no doubt that tracking my content helped me expand my audience and my understanding of it. For many bloggers and smaller publishers, these two things are at least as valuable as the copyright uses, if not more so.