Blekko is a new search engine that is aiming at the search leaders, including Google, by offering a more open and more spam-free search experience. With the tagline “Slash the Web,” Blekko has laid down an Internet searcher’s bill of rights and encourages users to create “slashes” that customize what appears in their results.
For example, if you search for “Phones” and add the /android slash you’ll only get results related to the Android operating system. Likewise, you can use slashes to manipulate the results in various ways, including selecting a date range, a political slant or only certain kinds of sites (forums, blogs, etc.).
Two of the more interesting slashes are /duptext and /domainduptext, which supposedly will check either a page or a domain’s content to find where it is being duplicated and how it is being misused. For webmasters, this could mean a powerful new tool for tracking duplicate content on the Web and tracking down those who are misusing their work.
So, as with other systems, I put it to the test and was, in a word, disappointed with the results. Though I think Blekko has a lot of potential in other areas, it doesn’t seem that duplicate content detection is one of its better uses, at least not at this time.
How Blekko’s Plagiarism Checker Works
Using Blekko’s duplicate content detection system is actually fairly easy. All you have to do is search for the URL whose content you want to check and then add the /duptext slash to the end of the URL.
You can do this with any page on the Web and the results are usually presented in a few seconds.
The results page breaks the information out by hosts and URLs and then splits each into on-site and off-site matches. Below that chart is a list of links where the duplicate content is present.
You can also check an entire domain for duplicate content by searching for just the domain and adding “/domainduptext” to the end.
However, with this slash you get significantly less information, basically just a list of domains where your duplicate content is suspected of appearing and links to their SEO pages.
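For anyone who wants to script these checks, here is a minimal sketch of how the two queries described above could be put together. The function name, the example URL and domain, and the single space before the slash are all my own assumptions for illustration; this only builds the text you would type into the search box and does not talk to Blekko at all.

```python
def build_slash_query(target: str, slash: str) -> str:
    """Append a Blekko-style slash to a URL or domain.

    Assumption: the slash is separated from the target by one space.
    """
    target = target.strip()
    slash = slash.strip().lstrip("/")
    if not target or not slash:
        raise ValueError("both a target and a slash name are required")
    return f"{target} /{slash}"


# Hypothetical page and domain, used only as placeholders.
print(build_slash_query("example.com/poems/old-poem.html", "duptext"))
print(build_slash_query("example.com", "/domainduptext"))
```

The helper accepts the slash with or without its leading “/” so that either spelling produces the same query.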
The question, however, is “How well does it work?” Unfortunately, after a few searches, the answer appears to be a disappointing one.
Testing it Out
As is typical with my tests, I decided to have Blekko do a duplicate content check on several works with a relatively known amount of plagiarism: two poems, one short story and one post on Plagiarism Today.
Here are the results of those tests:
Test 1: Poem 1
I tried out Blekko on an old poem of mine that I knew had seen widespread copying, both plagiarized and attributed. However, after performing the search, Blekko failed to find a single copy of the poem on any other site, even though a simple Google search finds about 40 results, though many are admittedly duplicates.
Blekko Results: 0 Google Results: 40
Test 2: Poem 2
Testing with another poem produced very similar results. This time, though, Blekko didn’t even find duplicates on my site and instead simply indicated that there were no duplicates at all. Once again, a simple Google search turned up about forty results though, as before, many were duplicates or copies on my domain.
Blekko Results: 0 Google Results: 39
Test 3: Story
Following the lack of luck with the two poems, I then tried an old short story of mine that had seen a small amount of copying. However, once again, Blekko failed to find any results that were not on my domain, while a quick Google search turned up a duplicate of the story on a DeviantArt account.
Blekko Results: 0 Google Results: 1
Test 4: PT Post
Finally, I tried an old, popular post from Plagiarism Today to see how well its content was detected. However, once again, Blekko failed to return any results and Google found a duplicate version of the piece on what appears to be a BlogSpot spam blog (one I was previously unaware of too).
Blekko Results: 0 Google Results: 1
Test 5: Whole Domain
Last, I ran my entire old literature domain through Blekko using the /domainduptext slash to see what would happen. It found only 6 offsite domains and 11 offsite URLs, even though many individual pieces see more reuse than that. It was missing many domains with widespread reuse of my work (legitimate and plagiarized) including blogspot.com, myspace.com and deviantart.com, to name just a few.
Worse still, I couldn’t examine any of the individual links as clicking the link provided by Blekko just took me to the SEO page for that domain, not to a list of suspect URLs on the site or even to the domain itself.
In short, if I wanted to find out exactly how my content was used on these sites, it was up to me to find it.
Blekko Results: 11 Google Results: N/A
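To put the four single-page tests side by side, here is a small tally of the numbers reported above. It is only arithmetic on the figures already listed; the variable and function names are mine, and the whole-domain test is left out because it has no Google figure to compare against.

```python
# (test, Blekko results, Google results) as reported in Tests 1 to 4.
results = [
    ("Poem 1", 0, 40),
    ("Poem 2", 0, 39),
    ("Story", 0, 1),
    ("PT Post", 0, 1),
]


def totals(rows):
    """Return (Blekko total, Google total) across all rows."""
    blekko = sum(row[1] for row in rows)
    google = sum(row[2] for row in rows)
    return blekko, google


blekko_total, google_total = totals(results)
print(f"Blekko: {blekko_total}  Google: {google_total}")  # Blekko: 0  Google: 81
```

Keep in mind that the Google counts for the two poems include duplicates and copies on my own domain, so 81 is an upper bound on the copies Blekko missed rather than an exact figure.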
It became pretty clear that Blekko was missing a lot of duplicate content with its searches. My suspicion is that it’s because it tries to home in only on what it considers the best sites and cuts out spam blogs and other sites it deems to be of low value.
While this may be great for searchers, it creates a real problem when checking for duplicate content as these are often the exact sites you need to find.
However, that can’t be the only cause of the problem. If you use Blekko to search for quotes from the relevant pieces, you get much more respectable results. Though the results aren’t nearly as good as Google in this area, they are definitely much more useful than via either of the slashes.
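If you want to try the quote approach yourself, here is a rough sketch of one way to pull a few phrases out of a piece and wrap them in quotation marks for an exact-phrase search. The function name, the eight-word phrase length and the even spacing are my own choices for illustration, not anything Blekko or Google prescribes.

```python
from typing import List


def pick_quotes(text: str, words_per_quote: int = 8, count: int = 3) -> List[str]:
    """Pick evenly spaced runs of words and wrap each in double quotes."""
    words = text.split()
    if not words:
        return []
    if len(words) <= words_per_quote:
        # The piece is too short to split, so use it whole.
        return ['"' + " ".join(words) + '"']
    last_start = len(words) - words_per_quote
    if count <= 1:
        starts = [0]
    else:
        # A set removes repeated start positions in short texts.
        starts = sorted({round(i * last_start / (count - 1)) for i in range(count)})
    return ['"' + " ".join(words[s:s + words_per_quote]) + '"' for s in starts]


sample = "one two three four five six seven eight nine ten"
print(pick_quotes(sample, words_per_quote=4, count=2))
# ['"one two three four"', '"seven eight nine ten"']
```

Taking phrases from the start, middle and end of a work helps catch copies that trimmed or rewrote one part of it.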
But the biggest problem is what one does after finding content reuse via Blekko. With the /domainduptext slash you can’t even access the individual URLs to investigate further. The /duptext slash is a much more robust tool, taking you to a page where the duplicate content is highlighted, but in the pages I did check the results were hit and miss: as many as half of the pages linked had no duplicate content at all.
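Since so many of the linked pages turned out to have no duplicate content, it can help to check each one yourself before following up. Below is a minimal sketch of one common way to do that: comparing overlapping runs of words (often called shingles) between your original and the text of a suspect page. This is my own illustration and not how Blekko works internally; the function names and the five-word shingle size are assumptions, and it expects you to have already copied out the plain text of both pages.

```python
import re
from typing import Set, Tuple


def shingles(text: str, size: int = 5) -> Set[Tuple[str, ...]]:
    """Return the set of overlapping word runs of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def containment(original: str, candidate: str, size: int = 5) -> float:
    """Fraction of the original's shingles that also appear in the candidate."""
    source = shingles(original, size)
    if not source:
        return 0.0
    return len(source & shingles(candidate, size)) / len(source)


original = "the quick brown fox jumps over the lazy dog"
copied = "Intro text. The quick brown fox jumps over the lazy dog. Outro."
unrelated = "nothing on this page matches the original piece at all"

print(containment(original, copied))     # 1.0
print(containment(original, unrelated))  # 0.0
```

A score near 1.0 suggests the page carries most of the original, while a score near 0.0 suggests a false hit. Where to draw the line in between is a judgment call that depends on the length of the work.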
All in all, as useful as Blekko is for other kinds of searches, or at least as useful as it might be, it doesn’t handle duplicate content searches very well, certainly no better than Copyscape or even regular Google.
None of this is meant to be a slight against Blekko in any other regard. The other searches I did with it were actually pretty useful and, though I wasn’t swayed enough to change my default search engine, I did enjoy a lot of what Blekko had to offer and can see myself making some slashes for my use.
In the end though, it just isn’t a good tool for detecting plagiarism, copyright infringement or other kinds of duplicate content. Though the idea is solid and its integration with other SEO functions very appealing to some, it just isn’t accurate or complete enough at this time.
Still, as with other tools I’ve reviewed, there is hope for the future. But it remains to be seen if this will be a priority for Blekko, which is clearly targeting a more generic search audience. Duplicate content detection is a highly specialized skill and the tools to find a keyword on the Web aren’t the same as the ones to find an article on every site where it appears.
As such, this will most likely remain a nice idea by a decent search engine that just isn’t practical.