Knol Spam: Two Weeks Later

A little over two weeks ago I, along with many others, expressed concern that Google Knol would become a haven for Web spam.

At that time, these predictions were mere speculation. Though a video quickly came online showing that spammers could already post automatically to Knol, there was little reason to suspect a spam attack beyond the nature of the service and Google’s history in this area.

This raises the question: now that the headlines have faded, has Google Knol become a haven for spam? Though it is hard to tell how serious things are given the short time that has passed, it seems clear that at least some of the predictions were eerily correct.

Methodology

To get a rough picture of how much spam existed on Google Knol and how it was affecting the rankings within the service, I decided to perform a very small study.

I searched for six keywords, three that are spammy in nature, three that are not. I then looked at the first page of the results and tried to find the highest-ranked case of duplicate content I could find. I did this using only string matching, so article synonymizers might have escaped detection.
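For readers curious what a string-matching check of this kind might look like, here is a minimal sketch. It is an illustration under my own assumptions, not the exact procedure used for this post: the function names, the five-word window and the sample sentences are all hypothetical. It lowercases the text, splits it into overlapping runs of words, and reports what share of the candidate's runs appear verbatim in the suspected source.

```python
import re

def shingles(text, size=5):
    """Return the set of overlapping word runs of length `size` (hypothetical helper)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one window: treat the whole text as a single run.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}

def overlap_ratio(candidate, source, size=5):
    """Share of the candidate's word runs that also appear verbatim in the source."""
    cand = shingles(candidate, size)
    if not cand:
        return 0.0
    return len(cand & shingles(source, size)) / len(cand)

# Made-up sample text, for illustration only.
original = "Texas hold 'em is a variation of the card game of poker played with community cards"
pasted = "TEXAS HOLD 'EM is a variation of   the card game of poker played with community cards"
reworded = "Texas hold 'em is a version of the card contest of poker performed with shared cards"

print(overlap_ratio(pasted, original))    # straight copy, differing only in case and spacing
print(overlap_ratio(reworded, original))  # synonym-swapped copy scores far lower
```

The second call shows the limitation mentioned above: swapping a handful of words for synonyms breaks nearly every word run, so a reworded copy can slip past a check that only looks for exact matches.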

This was just a cursory check and was designed to only locate articles that were clearly duplicate content. Also, this check was ONLY designed to catch duplicate content, not necessarily spam and, as you can see in many cases, the duplicate content was uploaded by the author.

Finally, if a search term did not produce at least one full page of results, ten items or more, I did not use it because such searches likely favored spammers.

However, despite my attempts to give Knol the benefit of the doubt, I had little trouble finding junk content.

Note: All links to Google Knol have been nofollowed just in case.

Keyword: Poker

First Questionable Result

Rank: 6
Number of Stars: 1
Comments: 0
Revisions: 2
Original Article: British Casino
Commentary: Appears to be a manual copy/paste job taking only a relevant portion of an article about different types of poker. The content also appears on many spam blogs. In short, this appears to be the result of a spam attack on Knol, at least at this time.

Keyword: Viagra

First Questionable Result

Rank: 2
Number of Stars: 5
Comments: 0
Revisions: 2
Original Article: Viagra Web Site
Commentary: This is a tricky one as it appears to have been passed around the Web, especially via spam blogs, for a very long time. Most of the content is from the Viagra site with changes made to eliminate formatting. Once again, this appears to be the likely result of a spam attack, probably from an article database.

Keyword: Mortgage

First Questionable Result

Rank: 2
Number of Stars: 5
Comments: 0
Revisions: 2
Original Article: Mortgage Site?
Commentary: With this one, it is almost impossible to tell who has the original article. However, lengthier versions of this piece have been passed around the Web for a long time by all accounts. One of the most likely candidates is the mortgage site linked above. However, this Knol appears to be a likely spam attack as well.

Keyword: Apple

First Questionable Result

Rank: 1
Number of Stars: 5
Comments: 0
Revisions: 2
Original Article: Freebase & Wikipedia
Commentary: The first non-spam keyword is the first case of clearly duplicate content being number one. Worst of all, this content was lifted from the Wikipedia entry on Apple, which has since been changed.

Keyword: Movie

First Questionable Result

Rank: 4
Number of Stars: 5
Comments: 0
Revisions: 3
Original Article: Company Web Site
Commentary: This isn’t a case of copyright infringement or scraping. The company itself is repurposing its work for Google Knol in a bid to get links to its site up. The entire article has been put up on Knol, complete with a link back to the site’s home page. Not necessarily unethical, but still somewhat against the spirit of Knol.

Keyword: Dog

First Questionable Result

Rank: 1
Number of Stars: 5
Comments: 0
Revisions: 3
Original Article: Blog
Commentary: Another case of an author reposting their own work. Though the original blog appears to be drowning in ads, it still seems to be legitimate. Since the content could not be found anywhere else and the names match, I am forced to assume that the original author uploaded it to both places.

Results

When it was all said and done, none of the searches went more than six results without duplicate content and four of the six had duplicate material in the first two items.
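Those two figures can be checked with a quick tally of the ranks recorded for each keyword above. This is only a small sketch for verifying the arithmetic; the variable names are my own.

```python
# Rank of the first questionable result for each keyword, as listed above.
first_duplicate_rank = {
    "Poker": 6,
    "Viagra": 2,
    "Mortgage": 2,
    "Apple": 1,
    "Movie": 4,
    "Dog": 1,
}

# Deepest position any search reached before hitting duplicate content.
worst_rank = max(first_duplicate_rank.values())

# Number of searches with duplicate content in the first two results.
in_top_two = sum(1 for rank in first_duplicate_rank.values() if rank <= 2)

print(worst_rank)   # 6
print(in_top_two)   # 4
```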

This paints a very grim picture of Knol, but it appears that much of the problem is not spammers, but authors seeking extra exposure for older works.

While this is clearly not a copyright violation and also not against Google’s policy on Knol, it does little to help the site. If Knol is to succeed, it needs a large volume of high-quality original content.

However, at this point, Google has done a very poor job keeping duplicate content out of Knol and/or reducing its ranking. Also interesting is that the spam content and other duplications almost always have high ratings, indicating the possibility that spammers have started to game the voting system as well.

Things don’t look good for Knol. Considering it has barely been two weeks and things already look this bad, it seems likely that they are only going to get worse.

Google Hits Back

One thing I noticed as I was performing this “study” was that Google has added an element to the sidebar of every Knol entry entitled “Similar Content on the Web”.

This section provides links to other copies of the work that Google has detected on the Web along with percentages to indicate how much of the content has been duplicated.

It is unclear at this time if this section updates to point out possible cases of plagiarism for authors posting original content to Knol or if it is simply designed to indicate where the content may have come from and is a one-time check.

It is also interesting that Google’s search engine often detected many other results not found in the sidebar, including, at least a few times, the original site.

I will have more on this later.

Conclusions

The bottom line right now is that Google Knol is already a haven for spam and duplicate content. Whether it is authors republishing their work in hopes of getting a little extra AdSense revenue or spammers pushing out junk content, the junk results rank high in the search and only get uglier the lower down you go.

Though Wikipedia has had its issues with spam in the past, I cannot recall it ever looking quite like this.

If Google Knol is to ever have a chance to become anything other than a spam haven, it needs to hit back now and do its part to keep duplicate content out. Google clearly has the tools already, but has been timid about using them, even to adjust ranking.

Knol needs original content to thrive. If it does not start encouraging and rewarding that content now, it may create a situation where the few original authors are buried under a pile of duplicate content and searchers have no motivation to look to Knol for new information, as almost all of it will be available elsewhere.

Hopefully, this can be the beginning of Google taking these issues seriously and using its tools to hit spammers hard.

Related Links

Mashable on Knol Spam
Problogger on Knol
Demerzel on Begging of Knol Spam
Nate Nead on Knol Dofollow/Nofollow
Usefularts on Knol Spam
