A recent study by content tracking company Attributor attempted to determine the true audience of a Web publisher by analyzing both the viewership a site's content gets on its own pages and the viewership it gets on other sites where it is copied, usually without a license.
The results were stunning. According to their report, on average the sites they studied had 140% more views of their content on other sites than on the original. This means that well over half of all views of the content took place on sites other than the creator's and were unavailable for either monetization or, in many cases, attribution.
Though the results are interesting, they likely are not at all surprising to many who deal with copyright and plagiarism issues on the Web. With human copying, RSS aggregation (both good and bad) and other republication as common as it is, many had already suspected that the audience for a piece of content was much larger off the site than on it. Attributor is simply one of the first to conduct a study that shows it.
However, there are several elements of the study that are interesting beyond the initial findings and may offer clues as to what Webmasters are most at risk of having their content misused.
How it Was Done
According to the report (PDF), they analyzed 100 publishers from the Compete Top 1000, first discarding their existing customers and sites with partial feeds, then adding some publishers on top of that to ensure a good mix of different topics.
They then ran the sites through their content matching service and analyzed all of the copies that had greater than a 50% match and more than 125 words in common. After removing known licensees, they looked at the matching sites for which traffic data was available and used traffic estimates from Compete to get an approximate idea of how large the viewership was on those sites.
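The report does not describe how Attributor's matching technology actually works, but the filtering step can be sketched with a stand-in matcher. The sketch below uses Python's `difflib` purely as a hypothetical example of applying the study's two thresholds (more than a 50% match and more than 125 shared words):

```python
import difflib

MATCH_THRESHOLD = 0.50    # more than a 50% match with the original
MIN_WORDS_MATCHED = 125   # more than 125 words the same

def is_significant_copy(original: str, candidate: str) -> bool:
    """Hypothetical filter mirroring the study's stated criteria.

    difflib is only a stand-in here; the study's real matching
    service is not described in the report.
    """
    orig_words = original.split()
    cand_words = candidate.split()
    matcher = difflib.SequenceMatcher(None, orig_words, cand_words)
    # Count words appearing in matching blocks shared by both texts
    matched_words = sum(block.size for block in matcher.get_matching_blocks())
    ratio = matched_words / max(len(orig_words), 1)
    return ratio > MATCH_THRESHOLD and matched_words > MIN_WORDS_MATCHED
```

A page reproducing most of a 200-word article would pass this filter, while a page quoting only a short excerpt would be discarded, which matches the study's goal of counting substantial copies rather than brief quotations.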
They then broke the results down by broad category to see which kinds of publishers had the largest "Audience Multiplier", meaning the viewership of their content on other sites relative to their own.
The result was that, on average, the publishers tested had nearly 60% of their content's viewership on other sites. This represents missed opportunities for both link building and monetization, as well as possible grounds for removal requests.
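The arithmetic connecting the two figures above is simple to check. If other sites get 140% more views than the original, the off-site audience is 1.4 times the on-site audience, so the off-site share of total views is 1.4 / 2.4, or just over 58% ("nearly 60%"). A quick sketch:

```python
def offsite_share(multiplier: float) -> float:
    """Fraction of total views occurring off the original site,
    given an "Audience Multiplier" (off-site views / on-site views)."""
    return multiplier / (1 + multiplier)

# Report's average: 140% more views elsewhere -> multiplier of 1.4
print(round(offsite_share(1.4), 3))  # ~0.583, i.e. "nearly 60%"

# A site with seven times the audience elsewhere
print(round(offsite_share(7.0), 3))  # 0.875
```

The same formula shows why high-multiplier categories are so striking: at a multiplier of seven, seven out of every eight views happen somewhere other than the publisher's own site.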
This obviously will be of great interest to many Web publishers, who are looking for ways to maximize their audience in the face of an economic slowdown, but it may not come as a surprise to those who have studied these issues.
However, other findings of the study are potentially even more useful to Webmasters, especially those in high-risk fields.
The study, in addition to looking at publishers in general, broke down their results by content category and the results there were staggering.
Of the sites listed, automotive sites fared the worst: they had nearly seven times the audience on other sites that they did on their own. Travel sites also had a high multiplier, at over five, and movie review sites came in at just under five.
In each of the cases above, the sites have audiences on other parts of the Web that easily dwarf their own traffic, meaning they are experiencing the highest levels of unlicensed copying and the sites copying them have the highest traffic levels.
The topics that fared best were politics and health, both of which had multipliers barely over one. Even in these cases the audience is still larger on the rest of the Web, but only by a very small margin.
Why there is such a wide divide between the different topics is very difficult to say. Without the full statistics, which were not available in the report, it could be due to a variety of reasons. Though it seems unlikely that one site would be scraped and republished significantly less than another, especially since nearly all of the topics have a strong spammer following, it could be a sign that copying and pasting has had a higher degree of success in certain categories.
For example, with political sites, most people visit the one or two sites they trust rather than performing blind searches. When automotive problems arise, however, people tend to put queries into Google and trust the search results. Even though both kinds of sites likely have many copies of their content floating around, the political site can rely on brand loyalty to keep much of its audience close by.
This is only a guess, but it is clear that it is time to think about the sites in danger slightly differently. Technology and health were two categories with very low multipliers, even though both have a very high tendency to attract spammers.
It is worth noting that the study, while useful, should not be considered scientifically rigorous. The sample size is small and the traffic statistics, though likely about as good as possible, leave room for error.
It is very difficult to imagine a more thorough study being performed without the backing of a major university, but any study in this area is likely to face similar challenges and limitations.
The other limitation is that this study focuses on large publishers, not regular bloggers. Whether bloggers would have a much higher audience multiplier due to their smaller initial audience, or a smaller one due to less copying and scraping, remains to be seen.
The results on the different categories of content and their relative risk may well transfer to smaller publishers, but a separate study would likely be needed to see how bloggers' audiences compare.
Still, the purpose of the study is not necessarily to answer those questions, but to illustrate the possibility of a much larger audience outside of the original site. This is something many have suspected but, to my knowledge, this is the first study to attempt to address the issue directly.
The bottom line is simple: most likely, the audience for your content is much larger off your site than it is on it. What you do about that is up to you. Whether you attempt to monetize it, turn it into promotion or request its removal (or some combination of the three) depends on the individual author and the course they want to take.
No matter what, though, it is clear that this audience and its potential (both for harm and for good) is too big to ignore, and it is important to start tracking and understanding what is going on. Whether through a professional service such as Attributor, one targeted at individuals such as Copyscape, or even simple Google searches, the time to understand these uses is now.
What the Attributor study illustrates, more than anything, is the need for an even deeper understanding of how this copying takes place, what it means for publishers and what strategies they could use to maximize their benefit from it.
This is an area ripe for exploration moving forward and one that will require a great deal of creativity and work.
Disclosure: I work as a consultant for Attributor.