A few weeks back, a reader of this site noticed a spam blogger not only scraping his posts, but backdating the entries before re-posting them. The resulting site made it appear as if all of the scraped entries had appeared well before the original ones, possibly tricking both search engines and human readers.
However, in this case, the backdating was unlikely to fool anyone. The date shifting was so severe, usually spanning several weeks, that many of the entries on the spam blog were listed as posted before the events they described. Most likely, they also claimed dates well before the search engine spiders had last visited the site and found no such posts.
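The two giveaways described above can be written as a simple sanity check. The following is only a rough sketch: the function name and its parameters are made up for illustration, and nothing here is meant to describe how any search engine actually evaluates dates.

```python
from datetime import datetime

def claimed_date_is_plausible(claimed, event_time=None, last_crawl_without_post=None):
    """Return False when a claimed publish date contradicts known facts.

    All three parameters are hypothetical inputs for this sketch:
    - claimed: the timestamp shown on the post
    - event_time: when the event the post describes actually happened
    - last_crawl_without_post: the last time a crawler visited the site
      and did not find the post
    """
    # A post cannot describe an event before the event took place.
    if event_time is not None and claimed < event_time:
        return False
    # A post cannot predate a crawl during which it did not yet exist.
    if last_crawl_without_post is not None and claimed < last_crawl_without_post:
        return False
    return True

# A post dated February 1 about an event that happened on March 1.
print(claimed_date_is_plausible(datetime(2024, 2, 1), event_time=datetime(2024, 3, 1)))
```

A shift of several weeks fails both checks easily, which is why such heavy backdating fools no one. A shift of a few hours would pass them, which is why the date alone settles nothing.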
Still, it is not uncommon to see spam bloggers backdate their scraped posts more conservatively. From a shift of a few hours to account for time zone differences to a day or two to try to appear more legitimate, there are many reasons why a spammer’s post may appear to go up before your own.
Fortunately though, this is not a major worry for Webmasters. The timestamps we look at are all lies and both search engines and users know that to be the case.
Why Timestamps Lie
The problem with the timestamps provided by most major blogging platforms is that they are easily changed by users. There are many legitimate reasons why a blogger or Webmaster would want to alter a timestamp. You can forward date a post so that it publishes in your absence, pre-date the post so that it fits into a natural series with related items or set the date to an outlandish time so that it remains at the top of the page.
Even if there is no intentional manipulation of the timestamp by the author, it can still be wrong due to problems with the server, disagreements in time zone and other completely natural issues that can change the date a post or page is listed as going up.
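To see how a time zone disagreement alone can shift a listed date, consider this small sketch. The publish instant and the UTC offsets are arbitrary examples, and `local_stamp` is a hypothetical helper, not part of any blogging platform.

```python
from datetime import datetime, timedelta, timezone

def local_stamp(instant, utc_offset_hours):
    """Format one instant as it would be displayed at a given UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return instant.astimezone(tz).strftime("%Y-%m-%d %H:%M")

# One and the same moment of publication, stored in UTC (example value).
published = datetime(2024, 3, 1, 2, 30, tzinfo=timezone.utc)

print(local_stamp(published, -8))  # 2024-02-29 18:30
print(local_stamp(published, 9))   # 2024-03-01 11:30
```

The same post appears to go up on two different calendar days depending on which offset the server or platform applies, with no manipulation by anyone.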
For these reasons, search engines place very little faith in the timestamp of a post when determining which is the original. As such, spammers are unable to simply backdate their scraped posts and claim the top spot in Google.
Fortunately, it is a bit more difficult than that.
It’s About Trust
If spammers could steal search engine thunder by simply backdating their posts, every spammer would be doing it. However, search engines place much more stock in how much trust the sites involved have, and that is something much more difficult to obtain.
This is something that Andy Beard points out on his site. In a recent post on his blog, he responded to a previous post by David Naylor, using many of the same keywords. Though Beard’s post both came later and linked to Naylor’s post in the first paragraph, Beard’s site was able to claim the top spot in Google for a related search term due solely to its search engine authority.
Though the story is anecdotal in nature, it illustrates how Google and other search engines award rankings. Ranking is not based merely upon who is first, but rather on which site the search engine trusts more and which site it feels the reader would rather land on.
This makes backdating posts an ineffective tactic for gaining search engine ranking. If Google does not trust your site, it does not matter whether your post appears to have come first or even whether it truly did; you will not rank well for terms related to it.
While this is good news for many bloggers who are heavily scraped, there are other bloggers that have a great deal to worry about.
On the upside, if your site is well-established and is generally trusted by the search engines, it has a natural shield against scraping. Search engines are not likely to give a new site more authority than you on a topic, regardless of how they date their posts.
However, statistically speaking, most active blogs are fairly new and have not yet earned that level of authority. As such, they may be very vulnerable to scraping, especially considering that spam bloggers often leverage their networks to build up artificial authority. In the early months of a blog’s life, it is entirely likely that the spammers scraping its posts have more authority and trust than the original site, making it very hard for that site to find its footing.
In short, the problem with authority is that all sites start out with none and that makes them vulnerable to abuse from sites that have any, no matter how little.
Bloggers have very little to worry about from “clever” spammers that backdate their posts. The search engines place little to no faith in those timestamps and, most likely, human readers don’t either.
The issue is not who came first, but who carries more trust. The Associated Press, for example, will always carry more trust than a one-month-old blog, and the fact that the blog backdates its posts is irrelevant.
In short, it is more important to cultivate and nurture this kind of trust than it is to simply be first. This is not just a large part of what prevents spam bloggers from simply taking over the Web, but also part of the reason why new bloggers often struggle with scraping so much more severely than established ones.
It is very important to track and stop blog scraping, especially in the early months of a blog’s life, to further that trust and ensure that the spammers cannot build an artificial reputation.
After all, the sword cuts both ways. If being first will not help the spammers, it will not help you either. Building and maintaining your authority level is the first and best step to protecting yourself against scraping, but it is one that requires both hard work in building your content and vigilance in keeping the spammers at bay.