The GateHouse Settlement & RSS Scraping


Up until literally a few hours ago, the New York Times Co. and GateHouse Media seemed destined for an epic courtroom clash in which many of the Web’s linking practices hung in the balance.

The dispute was over a “hyper local” site created under the banner, which is part of The Boston Globe, which in turn is owned by the NYT. The site featured links aggregated via RSS from a GateHouse-owned local blog published under the “Wicked Local” banner.

To be clear, this was NOT a case of RSS scraping. The site was merely using the headlines and linking out to the GateHouse stories. However, GateHouse filed suit last month, and the case seemed to be heading to trial this week (one of the fastest turnarounds I have seen).

However, the matter was abruptly settled, as was announced yesterday, much to the chagrin of legal scholars and Web technologists hoping to see a solid ruling in this area.

The details of the settlement have been slow to come to light, but a PDF on the New York Times corporate site lays out the actual terms.

What does the settlement say and what does that mean for the rest of us? That’s what we’re about to look at.

The Settlement

The settlement itself is a pretty basic document. However, like most legal documents, it spends a lot of words saying something that can be summed up pretty quickly:

  1. NYT has agreed to stop aggregating all GateHouse content.
  2. NYT has agreed to remove all existing links the best it can by March 1.
  3. GateHouse will take some undefined steps to prevent future aggregation.
  4. Provided the other terms are met, nothing in the settlement prevents deep linking of content (meaning this settlement only pertains to automated aggregation).
  5. No money changed hands and both sides are covering their own legal fees.

In short, regarding the use of its content on NYT-owned sites, GateHouse got pretty much everything it wanted, including the removal of all existing aggregated links and a promise not to aggregate in the future. NYT, on the other hand, avoided a legal conflict and paying any money.

It was a compromise settlement, as most are, but one cannot help but feel that GateHouse just managed to bully one of the largest and most prestigious news organizations in the world.

What Does it Mean for Bloggers?

The frustrating thing about settlements such as this one is that they do not become case law and have no bearing on future cases. If and when this kind of dispute arises again, we will be starting over from square one.

What is interesting is that the NYT is still aggregating headlines from other sources, just not from GateHouse sites. This kind of duality is hard for many bystanders to justify.

However, it doesn’t bode well for RSS aggregation that GateHouse was able to push the New York Times into stopping the systematic linking to their headlines.

On the positive side, this means that RSS scraping and spam blogging are even less likely to be viewed as legal and ethical. On the negative side, it makes the future of more acceptable aggregation, such as headline widgets, more tentative.

Where I think this is more likely heading is in a direction similar to search engine indexing, which will likely mean one, or a combination, of these two things:

  1. A system similar to meta tags and/or robots.txt that will allow RSS publishers to identify how they want their feeds to be used. Disobeying the tags may be grounds for legal action.
  2. A notice and takedown system by which an RSS publisher can request that their work be removed. Failure to comply could also be looked at as an infringement.
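To make those two ideas a little more concrete, here is a rough sketch of how an aggregator might honor both kinds of opt-out before pulling a feed. No such standard exists for RSS today, so I am borrowing robots.txt as a stand-in for the publisher's stated wishes. The bot name, the sample rules, the URLs and the takedown list are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent name for the aggregator (an assumption).
AGGREGATOR_AGENT = "ExampleHeadlineBot"

# Hypothetical rules a publisher might post to opt its feeds out.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleHeadlineBot
Disallow: /feeds/

User-agent: *
Disallow:
"""


def build_policy(robots_txt):
    """Parse robots.txt-style rules into a policy object."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_aggregate(policy, feed_url, takedown_list):
    """Return True only if neither opt-out mechanism forbids the feed."""
    # Idea 2: the publisher already sent a removal request for this feed.
    if feed_url in takedown_list:
        return False
    # Idea 1: the publisher's posted rules exclude this aggregator.
    return policy.can_fetch(AGGREGATOR_AGENT, feed_url)
```

With the sample rules above, a feed under /feeds/ would be skipped, while a feed elsewhere on the site would be allowed until the publisher adds it to the takedown list. Whether ignoring either signal would actually count as infringement is exactly the question a court has yet to answer.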

In short, I feel strongly that RSS headline aggregation, since it is already a commonplace activity, will remain, for the most part, an accepted practice. However, there will be systems put into place, either using technology or the legal system, that will give RSS publishers who don’t want to allow such aggregation a means to opt out.

Wholesale, unlicensed RSS scraping, on the other hand, will remain legally dubious.
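For those building tools around RSS, the difference between the two practices is easy to see in code. The sketch below is my own illustration, not anything taken from the case: it assumes a plain RSS 2.0 feed and keeps only each item's headline and link, deliberately ignoring the description and any full-text body. The sample feed and its URLs are invented.

```python
import xml.etree.ElementTree as ET

# Invented sample feed; a real aggregator would download this instead.
SAMPLE_FEED = """\
<rss version="2.0">
  <channel>
    <title>Example Local News</title>
    <item>
      <title>Town meeting set for Tuesday</title>
      <link>https://example.com/stories/town-meeting</link>
      <description>Full story text that a headline widget never copies.</description>
    </item>
    <item>
      <title>Library hours extended</title>
      <link>https://example.com/stories/library-hours</link>
      <description>More full story text.</description>
    </item>
  </channel>
</rss>
"""


def headline_links(rss_xml, limit=10):
    """Return up to `limit` (headline, link) pairs and nothing else."""
    root = ET.fromstring(rss_xml)
    results = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        # Skip malformed items rather than guessing at missing fields.
        if title and link:
            results.append((title, link))
        if len(results) >= limit:
            break
    return results
```

A scraper, by contrast, would copy the description or full content and republish it, which is the practice I expect to stay on the wrong side of the line.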


Obviously, those who are outside the case and building technologies around RSS would have preferred an actual ruling. Still, the settlement can provide clues as to the direction we are heading in with RSS aggregation, especially when you consider its potential use as a negotiation tool in future disputes.

These may not be the firm answers we were all hoping for, but they may be the beginning of the answers that we need.

Let us hope that when the ruling does come down, it manages to draw the line in such a way that rights and aggregation are both protected.

Copyright law is always a tricky balance and every ruling threatens to upset it. This dispute is no different.
