The Gatehouse Settlement & RSS Scraping


Up until literally a few hours ago, the New York Times Co. and GateHouse Media seemed to be destined for an epic courtroom clash, one on which much of the Web’s linking practices hung in the balance.

The dispute was over a “hyper local” site created under the banner, which is part of The Boston Globe, which in turn is owned by the NYT. The site featured links aggregated via RSS from a GateHouse-owned local blog, published under the “Wicked Local” banner.

To be clear, this was NOT a case of RSS scraping. The site was merely using the headlines and linking out to the GateHouse stories. Nevertheless, GateHouse filed suit last month and the case seemed to be heading to trial this week (one of the fastest turnarounds I have seen).

However, the matter was abruptly settled, as was announced yesterday, much to the chagrin of legal scholars and Web technologists hoping to see a solid ruling in this area.

The details of the settlement have been slow to come to light, but a PDF on the New York Times corporate site lays out the actual terms.

What does the settlement say and what does that mean for the rest of us? That’s what we’re about to look at.

The Settlement

The settlement itself is a pretty basic document. However, like most legal documents, it spends a lot of words saying something that can be summed up pretty quickly:

  1. NYT has agreed to stop aggregating all GateHouse content.
  2. NYT has agreed to remove all existing links the best it can by March 1.
  3. GateHouse will take some undefined steps to prevent future aggregation.
  4. Provided the other terms are met, nothing in the settlement prevents deep linking of content (meaning this settlement only pertains to automated aggregation).
  5. No money changed hands and both sides are covering their own legal fees.

In short, regarding the use of its content on NYT-owned sites, GateHouse got pretty much everything that it wanted, including a removal of all existing aggregated links and a promise to not aggregate in the future. NYT, on the other hand, avoided a legal conflict and paying any money.

It was a compromise settlement, as most are, but one cannot help but feel that GateHouse just managed to bully one of the largest and most prestigious news organizations in the world.

What Does it Mean for Bloggers

The frustrating thing about settlements, such as this one, is that they do not become case law and have no bearing on future cases. If and when this kind of dispute arises again, we will be starting over from square one.

What is interesting is that the NYT is still continuing to aggregate headlines from other sources, just not from GateHouse sites. This kind of duality is hard for many bystanders to justify.

However, it doesn’t bode well for RSS aggregation that GateHouse was able to push the New York Times into stopping the systematic linking to their headlines.

On the positive side, this means that RSS scraping and spam blogging are even less likely to be viewed as legal and ethical. On the negative, it makes the future of more acceptable aggregation, such as headline widgets, more tentative.

I think this is more likely heading in a direction similar to search engine indexing, which will likely mean one, or a combination, of these two things:

  1. A system similar to meta tags and/or robots.txt that will allow RSS publishers to identify how they want their feeds to be used. Disobeying the tags may be grounds for legal action.
  2. A notice and takedown system by which an RSS publisher can request that their work be removed. Failure to do so could also be looked at as an infringement.
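To make the first idea a little more concrete, here is a minimal sketch of how an aggregator might honor a robots.txt-style opt-out before pulling a feed. It uses Python's standard urllib.robotparser module. Everything else is an assumption made for illustration: no such convention exists for RSS aggregation today, and the user-agent name "ExampleHeadlineBot", the publisher.example domain and the /feeds/ path are all invented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules a publisher might serve. The bot name and paths are
# invented for this example; they do not come from the settlement.
PUBLISHER_RULES = """\
User-agent: ExampleHeadlineBot
Disallow: /feeds/

User-agent: *
Allow: /
"""


def may_aggregate(rules_text: str, user_agent: str, feed_url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the feed URL."""
    parser = RobotFileParser()
    # parse() takes the rules as a list of lines, so no network access is needed.
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, feed_url)


if __name__ == "__main__":
    feed = "https://publisher.example/feeds/news.rss"
    for agent in ("ExampleHeadlineBot", "OtherBot"):
        verdict = "may aggregate" if may_aggregate(PUBLISHER_RULES, agent, feed) else "should skip"
        print(f"{agent}: {verdict} {feed}")
```

An aggregator that ran a check like this before every fetch would give publishers a simple way to opt out, much as robots.txt does for search engines. Whether ignoring the rules would carry any legal weight is, of course, exactly the question the settlement left open.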

In short, I feel strongly that RSS headline aggregation, since it is already a commonplace activity, will remain, for the most part, an accepted practice. However, there will be systems put into place, either using technology or the legal system, that will give RSS publishers who don’t want to allow such aggregation a means to opt out.

Wholesale, unlicensed RSS scraping, on the other hand, will remain legally dubious.
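For readers who want to see the difference between the two practices, the following is a rough sketch of headline-only aggregation: it reads a feed, keeps the title and link of each item and ignores the body entirely. It is not the code used by any of the sites in the dispute. The sample feed, the publisher.example URLs and the function name headline_links are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample feed; a real aggregator would download this from the publisher.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Local News</title>
    <item>
      <title>Town council approves budget</title>
      <link>https://publisher.example/news/budget</link>
      <description>Full article text that a headline widget does not copy.</description>
    </item>
    <item>
      <title>Library extends weekend hours</title>
      <link>https://publisher.example/news/library</link>
      <description>Full article text that a headline widget does not copy.</description>
    </item>
  </channel>
</rss>
"""


def headline_links(feed_xml: str, limit: int = 5) -> list[tuple[str, str]]:
    """Return up to `limit` (headline, link) pairs, ignoring item bodies."""
    root = ET.fromstring(feed_xml)
    results: list[tuple[str, str]] = []
    for item in root.iter("item"):
        if len(results) >= limit:
            break
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        # Skip malformed items; the description element is never read.
        if title and link:
            results.append((title, link))
    return results


if __name__ == "__main__":
    for title, link in headline_links(SAMPLE_FEED):
        print(f"{title} -> {link}")
```

A scraper, by contrast, would copy the description or full content of each item and republish it. The sketch above never touches that text, which is the distinction at the heart of this dispute.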


Obviously, those who are outside the case and building technologies around RSS would have preferred an actual ruling. Still, the settlement can provide clues as to the direction we are heading in with RSS aggregation, especially when you consider its potential use as a negotiation tool in future disputes.

These may not be the firm answers we were all hoping for, but they may be the beginning of the answers that we need.

Let us hope that when the ruling does come down, it manages to draw the line in such a way that rights and aggregation are both protected.

Copyright law is always a tricky balance and every ruling threatens to upset it. This dispute is no different.


If you have an rss enabled site... it's not plagiarism for someone to get and post your feed.

If you don't like a TV Show, turn the channel. If you don't want to be syndicated... don't allow real simple syndication.





Jonathan Bailey

It is worth noting that the GateHouse case did NOT deal with RSS scraping. All the NYT sites were doing was looking at the RSS feed, pulling the headline and linking it to the original article in a widget on the side. The company was not taking any actual content from GateHouse, save perhaps a 12-word snippet in some cases.

I agree wholesale that you should not allow spammers, scrapers or others to use your content without attribution; that is what the whole site is about. Furthermore, I don't think others should do it even with attribution unless the content has been licensed correctly.

This wasn't a case of content theft in the traditional sense, just aggregated linking. Though GateHouse was likely within their rights, it is questionable that they would sue someone who is driving them traffic...


I've gone round and round with some sites that like to republish a digest feed of my blog (especially with photos). I get there's a fair use argument for commentary ... but without editorial oversight it's completely lost.

I mention this because I found this site not long ago: which seems to be "digesting" my site with photos. Sure it's attributed, no it's not the whole post and it does link back. But it also gets tagged up with lots of shopping links and basically, it's not theirs.

Jonathan Bailey

It's a real interesting problem. With text an aggregator can limit the number of words they display and, at the very least, make a strong fair use argument. Sure, they aren't adding any serious commentary, but they can argue that their use of the content is so slight and the harm so small that they should be allowed.

With photos, it is different. They can't take the first fifty pixels of an image and get anything useful (do I smell a new art project?). So, by taking the whole of the image they really are throwing any potential fair use argument out the window and, if the image is hotlinked, they're adding some computer misuse issues in as well.

Honestly, if an aggregator is repurposing your images, I'd ask them nicely to stop and, if they fail, start firing DMCA notices. It's a bit crude but it is both your bandwidth and your work.

Just my two cents!

Patrick Goff

We generate most of our own content. We take our own photographs and video, do our own interviews, and encourage our subscribers to contact us to create news items for our audience. We do training days, research and travel the world to get our content.

Why then should we let others steal it? I know this is a web tradition but so is spamming and hacking. None of them should be legal.
If someone asks me and comes to a suitable arrangement with me I am happy to let them use my content, often with only an acknowledgement or a link. From its theft for commercial use by another news outlet I would expect to get some of my costs back as a fee.

Is this unreasonable? Shame GateHouse did not get the judgement they would probably have received.


  1. [...] PlagiarismToday sums up the case: It was a compromise settlement, as most are, but one can not help but feel that GateHouse just managed to bully one of the largest and most prestigious new organizations in the world. [...]

  2. [...] A case in point. Last December Gatehouse Media, a small newspaper publisher, sued The New York Times Co., a big one, over its use of headlines and lead sentences to Gatehouse content on The Boston Globe site. After a month the Times caved. [...]

  3. [...] RSS came with its own set of problems. For content creators, it enabled scraping and other forms of content theft, kept visitors off the site and discouraged discussion on posts. For readers, though it enabled [...]