Spam Bloggers Who Backdate

By Jonathan Bailey • May 27th, 2008 • Category: Articles, Prevention

A few weeks back, a reader of this site noticed a spam blogger not only scraping his posts, but backdating the entries before re-posting them. The resulting site made it appear as if all of the scraped entries had appeared well before the original ones, possibly tricking both search engines and human readers.

However, in this case, the backdating was unlikely to fool anyone. The date shifting was so severe, usually spanning several weeks, that many of the entries on the spam blog were listed as posted before the events they described and, most likely, claimed dates well before the search engine spiders’ last visits to the site.
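The two giveaways described above can be written down as a simple plausibility check. This is only a sketch of the reasoning, not a description of how any search engine actually works; the function name, its parameters and the dates are all hypothetical.

```python
from datetime import date

def implausible_backdate(claimed, event_date, last_crawl_without_post):
    """Return the reasons a claimed post date cannot be right (empty list if none).

    claimed: the date shown on the post.
    event_date: the date of the event the post describes.
    last_crawl_without_post: the most recent crawl of the site that did not find the post.
    """
    reasons = []
    if claimed < event_date:
        reasons.append("dated before the event it describes")
    if claimed < last_crawl_without_post:
        reasons.append("dated before a crawl that did not find it")
    return reasons

# Hypothetical scraped post, backdated by several weeks.
severe = implausible_backdate(
    claimed=date(2008, 4, 20),
    event_date=date(2008, 5, 10),
    last_crawl_without_post=date(2008, 5, 5),
)

# Hypothetical post whose date is consistent with both checks.
consistent = implausible_backdate(
    claimed=date(2008, 5, 12),
    event_date=date(2008, 5, 10),
    last_crawl_without_post=date(2008, 5, 5),
)
```

A shift of several weeks trips both checks, while a date that is consistent with the event and the crawl history passes untouched, which is why the timestamp alone settles nothing.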

Still, it is not uncommon to see spam bloggers backdate their scraped posts more conservatively. From a shift of a few hours to account for time zone differences to a day or two to try to appear more legitimate, there are many reasons why a spammer’s post may appear to go up before your own.

Fortunately though, this is not a major worry for Webmasters. The timestamps we look at are all lies and both search engines and users know that to be the case.


Why Timestamps Lie

The problem with the timestamps provided by most major blogging platforms is that they are easily changed by users. There are many legitimate reasons why a blogger or Webmaster would want to alter a timestamp. You can forward date a post so that it publishes in your absence, pre-date the post so that it fits into a natural series with related items or set the date to an outlandish time so that it remains at the top of the page.

Even if there is no intentional manipulation of the timestamp by the author, it can still be wrong due to problems with the server, disagreements in time zone and other completely natural issues that can change the date a post or page is listed as going up.
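The time zone point is easy to see in a short sketch. Using only Python's standard `datetime` module, the same instant lands on two different calendar dates depending on the zone a server reports in. The publication time and the two fixed offsets below are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical post: published at 02:30 UTC on May 27, 2008.
published_utc = datetime(2008, 5, 27, 2, 30, tzinfo=timezone.utc)

# Two hypothetical servers that display the same instant in different fixed offsets.
utc_minus_7 = timezone(timedelta(hours=-7), "UTC-7")
utc_plus_9 = timezone(timedelta(hours=9), "UTC+9")

def displayed_date(instant, zone):
    """Return the calendar date a blog would show for this instant in this zone."""
    return instant.astimezone(zone).date().isoformat()

date_on_first_server = displayed_date(published_utc, utc_plus_9)
date_on_second_server = displayed_date(published_utc, utc_minus_7)

print(date_on_first_server)   # 2008-05-27
print(date_on_second_server)  # 2008-05-26
```

Nobody altered anything here, yet the copy on the second server appears to have gone up a day before the one on the first.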

For these reasons, search engines place very little faith in the timestamp of a post when determining which is the original. As such, spammers are unable to simply backdate their scraped posts and claim the top spot in Google.

Fortunately, it is a bit more difficult than that.

It’s About Trust

If spammers could steal search engine thunder by simply backdating their posts, every spammer would be doing it. However, search engines place much more stock in how much trust the sites involved have, and that is something much more difficult to obtain.

This is something that Andy Beard points out on his site. In a recent post on his blog, he responded to a previous post by David Naylor, using many of the same keywords. Though Beard’s post both came later and linked to Naylor’s post in the first paragraph, Beard’s site was able to claim the top spot in Google for a related search term due solely to its search engine authority.

Though the story is anecdotal in nature, it illustrates how Google, and other search engines, award rankings. They are not based merely upon who is first, but rather upon which site the search engine trusts more and which site it feels the reader would rather land on.

This makes backdating posts an ineffective tactic for gaining search engine ranking. If Google does not trust your site, it does not matter if your post appears to have come first, or even if it truly did; you will not rank well for terms related to it.

While this is good news for many bloggers who are heavily scraped, there are other bloggers that have a great deal to worry about.

Spammer Trust

On the upside, if your site is well-established and is generally trusted by the search engines, it has a natural shield against scraping. Search engines are not likely to give a new site more authority than you on a topic, regardless of how they date their posts.

However, statistically speaking, most active blogs are fairly new and have not yet earned that level of authority. As such, they may be very vulnerable to scraping, especially considering that spam bloggers often leverage their networks to build up artificial authority. In the early months of a blog’s life, it is entirely likely that the spammers scraping its posts will have more authority and trust than the original site, making it very hard for the site to find its footing.

In short, the problem with authority is that all sites start out with none and that makes them vulnerable to abuse from sites that have any, no matter how little.

Conclusions

Bloggers have very little to worry about from “clever” spammers that backdate their posts. The search engines place little to no faith in those timestamps and, most likely, human readers don’t either.

The issue is not who came first, but who carries more trust. The Associated Press, for example, will always carry more trust than a one-month-old blog, and the fact that the blog backdates its posts is irrelevant.

In short, it is more important to cultivate and nurture this kind of trust than it is to simply be first. This is not just a large part of what prevents spam bloggers from simply taking over the Web, but also part of the reason why new bloggers often struggle with scraping so much more severely than established ones.

It is very important to track and stop blog scraping, especially in the early months of a blog’s life, to further that trust and ensure that the spammers cannot build an artificial reputation.

After all, the sword cuts both ways. If being first will not help the spammers, it will not help you either. Building and maintaining your authority level is the first and best step to protecting yourself against scraping, but it is one that requires both hard work in building your content and vigilance in keeping the spammers at bay.

Jonathan Bailey is The Webmaster and author of Plagiarism Today, which he founded in 2005 as a way to help Webmasters going through content theft problems get accurate information and stay up to date on the rapidly-changing field. He is also a consultant to Webmasters and companies to help them devise practical content protection strategies and develop good copyright policies.

Trackbacks

  • [...] One of the more difficult challenges on the Web is determining when a page was created. We simply can not trust the date and time stamps provided with the content we read as both good guys and bad guys alike change the date of their posts as necessary. [...]

    Finding the Age of a Page - PlagiarismToday — June 6, 2008 @ 9:13 am

Viewing 4 Comments

    At the end of the day, most SEO bloggers I know either ignore scrapers or try to take advantage of them more. If they rank for your content, most likely you wouldn't have ranked for the term anyway, and the scrapers are siloing.

    In the early days of a blogs life, I encourage blog scraping, syndication etc and continue to do so throughout its life.

    p.s. it is Dave or David Naylor
    Andy: Thanks for the correction. I have no idea why I typed Doug but it has been fixed. That's what I get for being sleep deprived.

    It is strange that you mention that the SEO bloggers you know ignore scrapers. SEO bloggers are the most common group to approach me about stopping the problem. It is clear that there are a lot of varied opinions on this topic but it is safe to say I likely wouldn't have a business model without SEO bloggers.

    To each their own I say, I am here to help those that want to put a stop to it or at least reduce the instances.
    Hi Jonathan - I manage a large number of blogs for clients, and the biggest frustration I run into is that many of the scrapers I've had trouble with have no contact link or info on their sites (many of them are clearly just "made for adsense" sites), and even checking the domain via a Whois search often yields no results with the prevalence of private domain registration information - sometimes a registrar will respond, often not - so it's difficult to get the offending duplicate content removed when there's no one I can contact......clients don't like to hear "there's nothing we can do"..... I do report them to Google when I find them, but it would still be nice if there were other (easy) avenues or tools.

    I'm really glad we have a resource like you to turn to for advice and help.
    Trisha: If you have any tough cases, feel free to send me an email and have me look at them. I might be able to help. There are things that you can do, specifically filing a DMCA notice with the host, that can remove these sites even if you can't contact them.

    Just send me the URLs of a couple of cases and I'll give it a quick look for you.

    Hope that this helps!
