RSS Brief: Another Scraping/Spam Threat

Yesterday, the makers of the controversial Pay Per Post service launched RSS Brief, a new tool designed to make blog reading faster.

The idea is that the service takes long posts, like what you might expect here on Plagiarism Today, and condenses them down into a few short sentences.

Though the service sounds convenient and useful, it also raises significant copyright and spam issues that the company has not addressed as of yet.

Though the service is only in alpha, the time to consider these issues is now, before the service is completed and becomes an active part in many people’s blogging lives and it is too late to change course.

How it Works… In Brief

The idea behind RSS Brief is pretty simple. You punch in the URL of your favorite blog, and RSS Brief reads the entries in the feed and uses what its creators refer to as “natural language technology” to parse the text down to a few sentences.

The idea is that, unlike traditional truncating that simply cuts off everything but the first few sentences, you will receive an effective summary of the post. This should, in theory, allow you to get the basic idea of the post and move on.
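To make the distinction concrete, here is a minimal sketch of the two approaches. RSS Brief has not published how its “natural language technology” works, so the word-frequency scoring below is purely an illustrative assumption, and the names `truncate` and `summarize` are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    # Lowercased words longer than three letters, a crude stand-in
    # for stop-word filtering.
    return [w for w in re.findall(r"[a-z']+", text.lower()) if len(w) > 3]


def truncate(text, n=2):
    """Traditional truncation: keep only the first n sentences."""
    return " ".join(split_sentences(text)[:n])


def summarize(text, n=2):
    """Toy extractive summary (an assumption, not RSS Brief's method):
    keep the n sentences whose words recur most across the whole post,
    returned in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / max(len(tokens), 1)

    best = sorted(range(len(sentences)),
                  key=lambda i: score(sentences[i]),
                  reverse=True)[:n]
    return " ".join(sentences[i] for i in sorted(best))
```

On a post that opens with small talk and gets to the point later, `truncate` returns the small talk while `summarize` returns the sentences that share the post's recurring vocabulary. Either way, both functions copy sentences verbatim from the original.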

The technology, however, is questionable at this point. Plagiarism Today’s RSS Brief page shows some of the weaknesses. Though PT is the type of site targeted by this service, it utterly fails to give a meaningful summary of any of the stories in the RSS feed. Instead, on most stories, it seems to simply do the kind of truncating it claimed to avoid.

However, finding glitches in alpha-stage technology is not as disturbing as the copyright and spam issues that this service raises. It seems that, in the rush to create this service, the programmers completely avoided the copyright questions it might raise and the ways their technology might be abused.

Copyright Issues

What RSS Brief does, fundamentally, is take a lengthy post and make a derivative work of it. Under copyright law, the creation of derivative works is the sole right of the copyright holder.

Though there is a decent fair use argument for RSS Brief in that the use is largely transformative and only takes a small portion of the original, there is a strong argument against it as well. Its use of the work, by its own design, takes the heart of the original material, it does so for a commercial purpose, and RSS Brief is designed to replace the original work, thus damaging the market for the author’s work, especially if the author has ads in the feed.

Worse still, the service continues to “summarize” even shorter works, some as short as sixty words. This sharply raises the proportion of the original work used and lowers the likelihood that the use will be found fair.

However, most damning of all is the 1841 case Folsom v. Marsh (PDF) that found the following when dealing with the issue of “transformative” use:

(if a user) cites the most important parts of the work, with a view, not to criticise, but to supersede the use of the original work, and substitute the review for it, such a use will be deemed in law a piracy.

Though it is impossible to predict whether or not a use will be deemed “fair” until it goes before a judge and/or jury, there seems to be a lot of reason to doubt whether or not RSS Brief will pass muster in that situation.

Particularly troubling are its stated attempt to replace the original work and the lack of any opt-out mechanism, such as the one Google uses to ensure its cache is fair use.
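For a sense of how little an opt-out would require, here is a rough sketch built on the `noarchive` robots meta tag, which is how Webmasters keep pages out of Google’s cache. Nothing indicates that RSS Brief checks this tag or any other; reusing `noarchive` for a summarizing service is an assumption made for illustration, and `has_opted_out` is a hypothetical helper.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_opted_out(html, token="noarchive"):
    # Hypothetical policy: treat the given robots directive as a request
    # not to have the page copied or summarized.
    parser = RobotsMetaParser()
    parser.feed(html)
    return token in parser.directives
```

A service could run a check like this on each source page before producing a brief and skip any site that has opted out.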

The Spam Issue

Though many readers would love an “important parts only” feed, so would spammers. Fortunately for them, RSS Brief offers up just such a feed on their service, one that essentially scrapes, processes and rebroadcasts the original feed in their “brief” format.

Spammers will, most likely, grow to love these feeds. Not only are they keyword rich and to the point, but they can easily be combined with other feeds from the same service to create rapid-fire blogs with short posts, something search engines seem to love.

Already spammers take advantage of Technorati, Icerocket and Google Blog Search feeds for much the same purpose. They enjoy the keyword density those feeds provide and the fact that they raise fewer copyright issues than scraping full feeds.

Though an RSS Brief feed might be less keyword rich, it would also be much more modified from the original, making it harder for search engines and Webmasters to spot. Depending on the nature of the spammer, they might find these RSS Brief feeds preferable to the existing alternatives.

Also, much like the search feeds, RSS Brief strips out any and all digital fingerprints as well as copyright information contained in the feed. Its rush to get to “just the facts” causes it to leave out some elements that are critical to bloggers. This also makes the use of RSS Brief feeds impossible to track, unless they report usage to FeedBurner, and leaves Webmasters in the dark about how many are subscribing to the feed and how they are using it.

Finally, since Pay Per Post is not a search company, it’s not in a position to punish people who do scrape its feeds. Technorati and Google can blacklist sites that scrape their search results; Pay Per Post has no such card to play.

If spammers aren’t already looking at RSS Brief as a new tool, they likely will be soon. They seem to seize on new technology as fast as they can and I doubt this service will be any exception.

Conclusions

As interesting as the idea of RSS Brief is, it is poorly executed. As of this writing, there is no means for Webmasters to opt out, no clear safeguards against spam blogging and no consideration for Webmasters.

Though Pay Per Post has always been a controversial company, they have always been a company that seemed to value bloggers and the role they play, albeit in a somewhat backhanded way. That is why it seems so odd to me that they created this service with so little consideration for them.

One day they are paying bloggers for reviews, the next they are taking their feeds, without permission or an opt out mechanism, and creating derivative works to be redistributed over RSS.

Hopefully they can get these issues as well as their technical glitches straightened out. The idea is interesting, but pursuing it the way they are is very dangerous to them, to bloggers and to the Internet at large.

It borders on irresponsible and if Pay Per Post is going to change their image, they need to put the good of the Web and of bloggers first. They made that mistake when they first launched their primary service and it seems that history is, in a strange way, repeating itself.

Hopefully that won’t be the case.

Note: If there is an interest in an excerpt-only “just the facts” feed for this site, I will create one. WordPress has the tools to do that and I’ll simply create the second feed this weekend. If interested, please post a comment below or send me an email.

24 Responses to RSS Brief: Another Scraping/Spam Threat

  2. Andy Beard says:

    The same argument would be true of Technorati who serve full text of an article on their site, with formatting removed and images removed. That is a derivative.

    I don’t think the negative argument really has a context. A summary linking through to a full version seems to me an ideal option.

    If it is used by splogs, it would actually be a very legitimate option in my opinion, though I realise our views often differ on many things ;)

  4. JB says:

    The problem with the Technorati comparison is simple. Nothing in Technorati is designed to actually replace the original feed. All links on Technorati that I have seen point back to the original site or original feed.

    You can subscribe to Technorati watchlists, but those only display the beginning snippets, not what is supposed to be the heart of the work, as RSS Brief does by their own description. Looking at fair use and transformative use decisions, I see bad things for RSS Brief.

    It’s a separation of degrees, I grant, but search engines notoriously flirt with the line on fair use and RSS Brief seems to take that line and push it a few more steps into the really dark grey area.

    Strangely, the thing that might save RSS Brief is that it doesn’t actually work. If it did and successfully replaced the original work, it’d have a much greater problem in my eyes.

    Once again though, if anyone wants an excerpt feed of PT, I’ll offer it.

    (Note: You say that you’ve seen the full text of articles on the site, I haven’t seen that. Here’s PT’s site link there:

    http://www.technorati.com/blogs/www.plagiarismtoday.com

    There are only intros to the articles and links to the page, admittedly the intros are a bit longer than on, say, Google, but nothing too outrageous. Is there a page I don’t know about?)

  6. Andy Beard says:

    When logged in, click home so it is showing your Technorati favorites.

    Then click “Show Details” on one of the stories

    A window pops up with full content

    That has been there for as long as I remember (2 years maybe)

    The only thing I would be worried about with feeding the RSSBrief feeds to some kind of aggregated blog would be breaking the copyright of…. PayPerPost as they in theory are the copyright holder of the briefs.

  8. Will says:

    The concept of this service really bothers me. I don’t mind an excerpt of something I write with a link back to the article. But… I do not want a machine or human summarizing what I write with the intent that there is then no need to read the full article. I already try to write posts as succinctly as possible, who are they to decide how to cut it down even further. There definitely needs to be an opt out for this, but then most people would probably opt out and RSS Brief would fail.

    I (think) Andy is right about the full content. I seem to remember that from a while ago. I tried to verify it just now, but the Technorati site is not loading the home page or favorites for me tonight.

    Glad you track this stuff Jonathan. Sometimes when I read one of your posts my head starts to spin!

  10. JB says:

    Andy: Ah, I see what you’re saying. That would make it basically an RSS reader, like Bloglines or Google Reader. At that point, you aren’t replacing the feed, you’re simply subscribing to it and using Technorati to do it. Technorati doesn’t appear to be creating a new feed that you are supposed to subscribe to instead of the original, like RSS Brief does.

    Will: The concept bothers me too. There’s a lot wrong with it from a legal and ethical standpoint. Hopefully this is just the alpha and we’re going to see these issues addressed sooner rather than later.

  12. Steve says:

    As a lawyer and as a blogger, I see this new service as a problem. I am concerned about the automated summaries it will produce of content from blogs and RSS or Atom feeds. If someone chooses to read the summary rather than my entire blog entry, they may finish their reading with a significantly different idea than if they had read the entire article.

    Depending on the accuracy and reliability of the algorithm that generates the summary, by omitting various words and phrases essential to the context of my blog entry, the RSS Brief service could, theoretically, defame me by attributing to me something I did not say or leaving out something important that I did say.

    This is a significantly different situation than, for example, my RSS aggregator which gives me the title and a hyperlink to a blog post, along with the first 15 or 20 words of that post so I can see how it begins. My RSS aggregator (FeedDemon) performs a very simple function, akin to presenting to me the first 2 lines of a page so I can read them quickly and decide if I want to read that item or skip to the next one.

    By contrast, it appears that RSS Brief might summarize the contents of RSS feeds, or the full text of the blog entries listed in the feeds, and present them to me instead of the entire blog entries themselves. However, I am somewhat confused about how it will work. I tested it with one of my blogs and I can’t see the summaries it’s currently producing as a reasonable substitute for the full blog entries.

    Of course, since we are seeing an alpha version, we can expect more sophistication to appear later. At the moment, we ought to monitor it to see what develops.

  14. [...] Grrrrr. Someone beat up the PayPerPost guys please. (tags: blogging payperpost copyright plagiarism) [...]

  15. JB says:

    Steve,

    I’m glad to see that I’m not the only one that sees some serious problems with this service. Of course, I didn’t even think about the defamation issue. That could be a very serious problem.

    If it summarized the sentence “I am not a plagiarist” into “I am a plagiarist” or something to that effect, it could be a major problem, I completely agree.

    I have to wonder if the makers of this service really thought the legal implications through.

  16. Andy Beard says:

    Well, I find more of a problem with someone like the Associated Press taking the content from a site, thinning it down and not providing a link, then syndicating it globally as a top 10 instead of a top 100

    http://andybeard.eu/2007/03/are-yahoo-guilty-of-unethical-plagiarism-with-syndicated-content.html

    Even worse the Yahoo article was the one that gained massive traffic from Digg, and 100s of links.

    That is long term financial damage

    If someone subscribes to a summary, they know it is a summary and can follow the link to the original article to clarify facts.

    I honestly have more problems with sharing with Google Reader than someone creating summaries of my content for easier consumption.

  19. Will says:

    I read Andy’s article and that really seems out of line. I think a site like Yahoo should NEVER republish content of any amount without a link back to the ORIGINAL author. Even if the site Yahoo gets their story from neglected the link, Yahoo has an obligation, moral and ethical, if even maybe legal, to link to the original source. Yahoo can not claim innocence as they know only too well the value of something like this to the original author, and have the resources to get a link to the original writing.

    I would think an attorney could have a field day with a suit claiming damages to a writer when their content, in any form, is published on Yahoo without attribution and a link?

  21. JB says:

    Andy and Will: For some reason my previous comment didn’t take. I’ll just say that I agree completely that what the AP and Yahoo did here is entirely wrong. If I had been the content owner, I would have sought to register the works with the USCO and then file against both of them considering what happened. That is especially true for the AP.

    I agree that these are worse sins than what RSS Brief is doing and that we have to prioritize our efforts. However, we need to at least look at all types of content misuse. We can’t ignore one type because it’s not the worst possible. That’s like ignoring assault because it’s not murder.

    Still, I agree that the AP and Yahoo need to pay for this. This was a tremendous faux pas.

  22. Apart from all that has been said in the various comments, I would take serious objection to the whole process on principle. The blogger has taken pains to be as detailed as he thinks is necessary. How can a precis convey what the writer wants to convey? If at all any one should be responsible for this, surely, this should be the original writer?

  24. JB says:

    Ramana: I wonder that myself. You simply can not summarize 1000 words of content in 50 words of summary. I noticed that no one, in email or comment, asked for a summary feed of this site. I’m going to guess that there is no interest in it.

    Therefore, I believe others here feel the same way.
