<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Spam-Blogging | Plagiarism Today</title>
	<atom:link href="http://www.plagiarismtoday.com/tag/spam-blogging/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.plagiarismtoday.com</link>
	<description>Content Theft, Plagiarism, Copyright Infringement</description>
	<lastBuildDate>Mon, 13 Feb 2012 17:55:20 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>FAQs: The Basics of RSS Scraping</title>
		<link>http://www.plagiarismtoday.com/2011/05/09/faqs-the-basics-of-rss-scraping/</link>
		<comments>http://www.plagiarismtoday.com/2011/05/09/faqs-the-basics-of-rss-scraping/#comments</comments>
		<pubDate>Mon, 09 May 2011 18:21:39 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[RSS scraping]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[Spam-Blogging]]></category>
		<category><![CDATA[Spamming]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=9659</guid>
		<description><![CDATA[RSS scraping is a problem nearly every webmaster will have to face at some point. Here are the basics on what it is and what to do about it.]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2011/05/rss-big-icon1-250x250.png" alt="" title="rss-big-icon" width="250" height="250" class="alignleft size-medium wp-image-9664" />RSS scraping is one of the most common and most frustrating types of content theft that bloggers, forum admins and other site owners will face as they grow their presence online. Not only does it often let the scraper easily grab all of the content from the original site, it is also a tactic favored by spammers, who exploit the content for search engine gains and are among the most despised infringers online.</p>
<p>As such, it&#8217;s important for all webmasters and content creators to be aware of what RSS scraping is, how it works and where it&#8217;s going in the future. Even though <a href="http://www.staynalive.com/2011/05/twitter-and-facebook-both-quietly-kill.html">RSS as a protocol may be on the ropes</a>, RSS scraping is not a problem that&#8217;s going away and, in fact, may be getting a lot worse in the coming years.</p>
<p>With that in mind, here is a quick FAQ on some of the more common questions asked about RSS scraping and what can be done about it.<span id="more-9659"></span></p>
<h4>What is RSS?</h4>
<p>RSS, sometimes referred to as Really Simple Syndication or <a href="http://www.whatisrss.com/">Rich Site Summary</a>, is a protocol that makes it easy for other sites and tools to access the content in your site by formatting your content in a consistent, easy-to-parse way.</p>
<p>Contrary to an HTML document, which could have the content be anywhere on the page, RSS indicates clearly what is the headline, body and other elements of the content. This makes it easy to grab the content and display it elsewhere without the surrounding formatting and HTML code.</p>
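<p>To make the point above concrete, here is a minimal sketch in Python. It assumes a tiny, made-up feed built in memory (the titles and bodies are placeholders, not a real feed) and uses only the standard library. It is an illustration of why the format is easy to read by machine, not a description of how any particular reader or scraper is written.</p>

```python
import xml.etree.ElementTree as ET

# Build a tiny, made-up feed in memory (placeholder values, not a real feed).
rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "Example Blog"

for headline, body in [("First post", "Body of the first post."),
                       ("Second post", "Body of the second post.")]:
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = headline
    ET.SubElement(item, "description").text = body

# Serialise and re-parse, as any program consuming the feed would.
parsed = ET.fromstring(ET.tostring(rss))

# Every entry exposes its headline and body under predictable element names.
entries = [(item.findtext("title"), item.findtext("description"))
           for item in parsed.iter("item")]
print(entries)
```

<p>Because the headline and body always sit under the same element names, a few lines are enough to lift every post out of a feed, which is the same property that makes feeds convenient for readers and for scrapers.</p>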
<h4>How is RSS Normally Used?</h4>
<p>Traditionally, RSS has been used to enable readers to subscribe to a site using various RSS readers such as <a href="http://www.google.com/reader">Google Reader</a>, <a href="http://www.feeddemon.com/">Feed Demon</a> and even many mail clients. </p>
<p>However, RSS has also been used to power other services, such as <a href="http://www.mailchimp.com/features/rss-to-email/">email newsletters</a> and even <a href="http://www.facebook.com/RSS.Graffiti">Facebook integration</a>.</p>
<h4>What is RSS Scraping?</h4>
<p>RSS scraping is when a third party, usually a spammer, grabs the content in an RSS and republishes it wholesale on another site. </p>
<p>In this regard, RSS scrapers work a great deal like Google Reader, grabbing your site&#8217;s content and displaying it elsewhere. However, where Google Reader places the content behind a password-protected wall that can only be accessed by the subscriber (or those with whom an individual story is shared), scrapers place the content on a public site for anyone to view, including search engines.</p>
<h4>Why do People Scrape RSS Feeds?</h4>
<p>Spammers seek high rankings in search engines so they can attract traffic to show ads to or sell products to. To do this, they need content, but creating content by hand is time-consuming and difficult, especially when much of it is going to make no difference in the search engines.</p>
<p>RSS scraping is an easy way for spammers, and other sites, to quickly fill their pages with content, even if the content comes solely from other sites.</p>
<h4>How Can RSS Scraping Hurt Me?</h4>
<p>In most cases, RSS scraping doesn&#8217;t hurt. Google and other search engines have become savvy enough about spam that most of the time, they don&#8217;t give much credence to spam sites, keeping them from getting a lot of traffic or harming you in the rankings. </p>
<p>However, the system is far from perfect and there are many times spammers outrank the sites they scrape from for relevant terms. This is especially true with new sites or those that don&#8217;t have a strong search engine presence.</p>
<p>Less likely is that others may mistake the spam site for the original site or for one endorsed by you, thus actively taking traffic from you. Few people, however, make this mistake with spam sites as the distinction is usually very clear.</p>
<p>All in all, the risk from an individual case of RSS scraping is actually fairly low, but the problem is that there is rarely just one or two such scrapers working at any given time.</p>
<h4>What Can I Do About RSS Scraping?</h4>
<p>Dealing with RSS scraping starts with good SEO practices. If you link between your posts, get good inbound mentions and earn social networking shares, odds are that RSS scraping won&#8217;t greatly impact you.</p>
<p>If it does, you can always seek to have the content removed by either <a href="http://www.plagiarismtoday.com/stopping-internet-plagiarism/4-contacting-the-host/">filing a DMCA notice with the spammer&#8217;s host</a> or, if that fails, <a href="http://www.plagiarismtoday.com/stopping-internet-plagiarism/6-when-all-else-fails/">sending one to Google</a>. </p>
<p>If RSS scraping becomes a more serious and more recurring problem, you may want to consider truncating your feeds or eliminating them, <a href="http://www.plagiarismtoday.com/2007/01/04/the-six-worst-ways-to-protect-content/">though that would be an extreme last resort</a>.</p>
<h4>Is RSS Scraping Illegal?</h4>
<p>Some have made arguments that distributing your content via an RSS feed, even if you didn&#8217;t realize you were doing it, creates an implied license to use it in this manner. However, <a href="http://www.plagiarismtoday.com/2006/08/29/why-rss-scraping-isnt-ok/">there are many problems with that and other related arguments on RSS scraping</a>. </p>
<p>Generally, RSS scraping is considered to be copyright infringement, though there are <a href="http://www.plagiarismtoday.com/2006/08/24/linkworthy-scraping-as-a-legal-minefield/">other legal arguments against RSS scraping</a> as well. </p>
<h4>What if I Want to Encourage RSS Scraping and Reuse?</h4>
<p>If you want others to scrape your RSS feed, you can actually give blanket permission to do that by <a href="http://wiki.creativecommons.org/Syndication">inserting a Creative Commons license into your feed</a>. This will let bots that do scraping know your intentions, and those that are complying with the law should be able to follow your wishes.</p>
<h4>How Can I Track RSS Scraping?</h4>
<p>Many people will find RSS scrapers by accident when they search for keywords relevant to their blog or site. However, you can keep track of your content using automated tools like <a href="https://fairshare.attributor.com/fairshare/">Fairshare</a> that are designed for tracking dynamic content.</p>
<p>In the end, though, it&#8217;s best to keep an eye on the search engines for terms that others commonly find your site through, as scrapers will often show up for those same results though, initially, they will likely rank lower than your site.</p>
<h4>What is the Future of RSS Scraping?</h4>
<p>Though it&#8217;s difficult to predict what spam tactics will be popular in the coming years, RSS scraping has been a problem for at least six years and is continuing today.</p>
<p>That being said, it has fallen out of favor with many spammers, who prefer content generation or scraping excerpts from feeds to avoid duplicate content penalties in the search engines. Still, many active spammers use the method, though spammers have clearly diversified their tactics.</p>
<h4>Bottom Line</h4>
<p>There&#8217;s no doubt that RSS scraping can be and often is very annoying and very problematic. That being said, there&#8217;s no reason that it should be a major headache or that it should become a reason to walk away from your site. Most cases of RSS scraping don&#8217;t have a major impact on a blog and those that do can usually be dealt with.</p>
<p>Still, if you are having a serious problem with RSS scraping, please <a href="http://www.plagiarismtoday.com/contact-pt/">feel free to drop me a line</a> or, if you think you may need outside help, <a href="http://copybyte.com">see if I can help via my consulting services</a>. </p>
<p>All in all, RSS scraping is a reality most bloggers and webmasters will have to deal with, but it&#8217;s not one that should sink your site if you&#8217;re savvy about how to handle it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2011/05/09/faqs-the-basics-of-rss-scraping/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>The Changing Face of RSS</title>
		<link>http://www.plagiarismtoday.com/2010/09/13/the-changing-face-of-rss/</link>
		<comments>http://www.plagiarismtoday.com/2010/09/13/the-changing-face-of-rss/#comments</comments>
		<pubDate>Mon, 13 Sep 2010 16:24:21 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[bloglines]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[RSS scraping]]></category>
		<category><![CDATA[Spam-Blogging]]></category>
		<category><![CDATA[Splogs]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=7774</guid>
		<description><![CDATA[As the closure of Bloglines illustrates, RSS may be shifting away from being a destination and transitioning into a very different role.]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2010/09/bloglines-logo.jpg" alt="" title="bloglines-logo" width="277" height="62" class="alignleft size-full wp-image-7777" /></p>
<p>This weekend saw dueling announcements that painted a sharp contrast about the future of RSS. First, the big news from a slow weekend was that <a href="http://mashable.com/2010/09/11/bloglines-discontinued/">Ask.com is shuttering its Bloglines Reader</a>. Once one of the most popular RSS readers, Bloglines has maintained a strong following but has largely taken a backseat to Google Reader as a Web-based RSS reader.</p>
<p>Still, its closure comes as a surprise and is <a href="http://www.blogher.com/rss-dead-bloglines-close-october-1?wrap=blogher-topics/blogging-social-media-0&#038;crumb=10">being viewed as a sign that RSS is about to meet its maker</a>.</p>
<p>However, a separate announcement came from Automattic, <a href="http://mashable.com/2010/09/10/wordpress-subscriptions/">which has added a new RSS-based subscription feature to its WordPress.com offering</a>. The idea is to make RSS usable and approachable to people who aren&#8217;t as tech savvy as your average RSS user and compile RSS feeds into a Facebook-like news stream.</p>
<p>What does this mean for RSS as one Internet company enters the field and another makes an exit? There are no clear answers, but one thing is certain: content creators need to pay close attention.<span id="more-7774"></span></p>
<h4>A Changing Relationship</h4>
<p>Bloggers and other content creators have always had a love/hate relationship with RSS. On one hand, it has been a powerful and compelling way to engage and stay in contact with readers. Everything from RSS readers to RSS-to-email services have kept readers connected and reading more content. </p>
<p>On the other hand, RSS has been accused by some of robbing sites of page views, <a href="http://www.plagiarismtoday.com/2006/09/26/why-my-feeds-are-long/">a notion FeedBurner disputes</a>, and has <a href="http://www.plagiarismtoday.com/2006/08/29/why-rss-scraping-isnt-ok/">opened the door to RSS scraping by spam bloggers and other garbage sites</a>.</p>
<p>However, RSS adoption never really took off. In 2008, <a href="http://www.micropersuasion.com/2008/10/rss-adoption-at.html">RSS adoption was at 11%</a> and it largely peaked there. The years since have seen widespread growth in social networking, including Facebook and Twitter, but limited growth in RSS use. </p>
<p>This is a story I&#8217;ve seen first hand here at Plagiarism Today. My RSS subscribers, according to FeedBurner, have been flat (though FeedBurner stats have not been very reliable) even though I&#8217;ve seen growth in pageviews, <a href="http://www.facebook.com/plagiarismtodayfans">Facebook</a> and <a href="http://twitter.com/plagiarismtoday">Twitter</a> subscriptions. Though RSS is still a major way to access this site&#8217;s content, now more people view the site on Twitter and Facebook than RSS.</p>
<p>Simply put, mainstream users never really understood or made use of RSS, finding it too complicated and hating having a &#8220;second inbox&#8221;. RSS may never be widely used for its intended function, but that also doesn&#8217;t mean that it&#8217;s dead, <a href="http://www.techcrunchit.com/2009/05/05/rest-in-peace-rss/">contrary to what TechCrunch may say</a>, just an indication that the function is changing.</p>
<h4>Changing Faces</h4>
<p>RSS may not be a destination much longer but it is still an important tool and an important means. Feeding Facebook, Twitter and other social news sites requires a standard to distribute the content, something RSS provides.</p>
<p>But as RSS&#8217;s relationship with readers changes, so will its relationship with content creators. Slowly, it&#8217;s possible content creators will abandon public RSS feeds and favor private ones they feed to their various channels. The idea of a partial feed will become less egregious as RSS&#8217;s function is pushed to fit into 140 characters and other short status updates.</p>
<p>Where once RSS was viewed as the ideal way to follow a site, it is slowly becoming the way to feed the new ideal ways to follow a site. In short, it&#8217;s the tool that enables realtime, not the destination for it.</p>
<h4>Looking Forward</h4>
<p>Many creators, myself included, have been downplaying RSS on their sites for some time but the question is what will you be doing? Will you be looking at making your RSS private, switching to a truncated feed or staying the same? Will you be shifting focus to other methods for readers to connect or continue with RSS?</p>
<p>I don&#8217;t have any firm answers for myself, only what I&#8217;ve done in the past (almost without realizing what I was doing), but I can&#8217;t see myself ditching RSS completely or truncating my feed. There are still several thousand who read this site via RSS and I have no desire to cut them off or impede their reading.</p>
<p>But it is clear RSS won&#8217;t be the way of the future for me, or likely many other bloggers, at least not as a destination.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2010/09/13/the-changing-face-of-rss/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Akismet and Spam Blogs</title>
		<link>http://www.plagiarismtoday.com/2007/11/29/akismet-and-spam-blogs/</link>
		<comments>http://www.plagiarismtoday.com/2007/11/29/akismet-and-spam-blogs/#comments</comments>
		<pubDate>Thu, 29 Nov 2007 18:56:05 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Personal Experiences]]></category>
		<category><![CDATA[Products]]></category>
		<category><![CDATA[Akismet]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[defensio]]></category>
		<category><![CDATA[pingbacks]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[Spam-Blogging]]></category>
		<category><![CDATA[Splogging]]></category>
		<category><![CDATA[trackback spam]]></category>
		<category><![CDATA[trackbacks]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/11/29/akismet-and-spam-blogs/</guid>
		<description><![CDATA[]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.akismet.com"><img src="http://i60.photobucket.com/albums/h30/plagiarismtoday/PT%20Images/akismetlogo.png" border="0" alt="Akismet Logo" align="left" hspace="10" /></a>Over the past few weeks, especially since the <a href="http://www.plagiarismtoday.com/2007/11/20/massive-trackbackcomment-spam-attack/">recent trackback spam attack</a>, I&#8217;ve had some time to ponder anti-comment spam technology and how it relates to fighting content theft. </p>
<p>A <a href="http://blog.akismet.com/2007/11/27/it-really-is-spam/">recent post on the Akismet Blog</a> particularly caught my eye. The post, entitled &#8220;Is It Spam?&#8221;, details the shift in spammer tactics and how some users have fallen for it, marking spam comments &#8220;not spam&#8221; despite the better judgment of Akismet.</p>
<p>However, it was one of the last paragraphs that really got me thinking: </p>
<blockquote><p>In the case of the pingbacks (the ones that start [...]) the spammers are actually stealing your work&#8230;</p></blockquote>
<p>Though that fact is clear to anyone who reads this site regularly, it occurred to me that our aggressive blocking of trackback and pingback spam may be preventing us from identifying spam bloggers and preventing us from shutting them down.</p>
<p>Curious, I decided to check my own spam files and was more than a little surprised at what I found.</p>
<p><span id="more-738"></span></p>
<p><strong>The Problem</strong></p>
<p>As wonderful as <a href="http://www.plagiarismtoday.com/2007/11/28/making-the-switch-going-from-partial-to-full-feeds/">Digital Fingerprints and other anti-splog features</a> are, trackbacks and pingbacks remain one of the most common ways for a spam blog to be identified. </p>
<p>The reason is simple: spammers, usually in a bid to either appear legitimate or obtain some additional incoming links, often provide links to the original post. Those links, due to the nature of the software they use, cause trackbacks to be sent to the original site and those trackbacks can lead Webmasters right to the people misusing their content.</p>
<p>Another alternative is that, in many cases, bloggers will link to other articles on their site, as I&#8217;ve done with this one, and those links often get picked up when the article is scraped. Those links, in turn, produce trackbacks that can be easily followed up on. </p>
<p>The problem is that anti-spam solutions such as Akismet and <a href="http://www.defensio.com">Defensio</a> aggressively filter out and stop trackback spam. Most Webmasters, just happy they aren&#8217;t being inundated with junk comments, never check their spam folders. This means that those trackbacks, which can be very useful in detecting scraping and plagiarism, are often filtered out before anyone, including the blogger, sees them.</p>
<p><strong>Down the Rabbit Hole</strong></p>
<p>Curious to see if this was a real problem or simply an academic issue, I delved into my Akismet spam folder to see what was there.</p>
<p>The sample was relatively small, approximately 1,600 spam comments. This is mostly due to my switching back and forth between Akismet and Defensio over the past week and to my blog automatically discarding spam comments on posts older than one month.</p>
<p>However, when I did a filter search for &#8220;[...]&#8221; I found thirteen trackbacks, all of them containing various amounts of scraped content (Note: I found another thirteen in my Defensio folder, which had approximately 900 spam messages). </p>
<p><img src="http://i60.photobucket.com/albums/h30/plagiarismtoday/PT%20Images/akismetspam2.png" border="0" alt="Akismet Spam 2"/></p>
<p>In every case the site was hosted on a &#8220;.info&#8221; domain, had scraped an excerpt of one of my stories and introduced it with a generic statement such as &#8220;While looking through the blogosphere we stumbled on an interesting post today. Here’s a quick excerpt.&#8221; None carried my digital fingerprint.</p>
<p>In most cases, the excerpts were fairly short though, in a few cases, they spilled over into a few paragraphs. In all cases, they were surrounded with advertising from a variety of sources. </p>
<p><img src="http://i60.photobucket.com/albums/h30/plagiarismtoday/PT%20Images/2007-11-29_1141.png" border="0" alt="Spam Blog Sample"/></p>
<p>Of the thirteen sites in my Akismet folder, about half were down and the remainder appeared to belong to the same spam blog network. However, these are thirteen scrapers I would never have known about if I hadn&#8217;t manually reached in and filtered through my trackback spam.</p>
<p>It&#8217;s a scary thought, but it makes one wonder how many I have missed up until now and, even worse, how many do I never get the chance to see?</p>
<p><strong>The Good News</strong></p>
<p>On the upside, these particular sites, though definitely scrapers, are not what I would call &#8220;high priority&#8221;. Since they only reposted an excerpt of the feed and did offer a link back, the damage they can cause is somewhat minimized. However, it is still annoying that this has been going on right underneath my nose and I never would have found out about it had it not been for manual intervention.</p>
<p>The fear isn&#8217;t so much that spam blogs like these will stay hidden, but that Akismet might bury someone scraping the full post or even the full feed. That seems to be somewhat rare, possibly an indication that Akismet considers the amount of reuse when analyzing whether a comment is spam or not, but without more Webmasters looking through their Akismet spam, there is no real way to know.</p>
<p>However, none of this is to say that Akismet, or any other spam plugin, is helping spammers out by assisting them in escaping detection. I would imagine the net effect of the plugin is still very bad for the spammers as they depend on these trackbacks and comment spams to build their networks.</p>
<p>Furthermore, any spammer caught up in your Akismet spam folder is likely an ineffective one to start with. If Akismet has already pegged the site as garbage, you can say with little doubt that Google, Technorati and others probably have as well.</p>
<p>Still, it may be worth a few moments to check your Akismet spam folder and see what you find.</p>
<p><strong>Checking For Scraping</strong></p>
<p>The process for checking your Akismet folder for potential scrapes is actually fairly simple. Visit your Akismet folder in your WordPress panel (it can be found under the &#8220;comments&#8221; section) and then, using the search box up top, type in &#8220;[...]&#8221;. It should take you to a list with the suspicious trackback posts.</p>
<p>This system is far from perfect as not all trackback spam seems to include that intro, despite it being something of a standard. Unfortunately, neither Akismet nor Defensio offers a means to simply filter spam based upon spam type, which makes the above search, though somewhat simplistic, the best alternative at the moment.</p>
<p>If you are using Defensio and you perform the check, be certain to tick the box that includes &#8220;obvious&#8221; spam as the majority of trackbacks in my folder, ten total, were labeled as such. </p>
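<p>For anyone who can get their spam folder out as data, the same &#8220;[...]&#8221; search can be sketched in a few lines of Python. This is only an illustration: the list of entries below is made up, and the &#8220;type&#8221; and &#8220;body&#8221; field names are assumptions of mine, not a format that Akismet or Defensio actually provides.</p>

```python
# Hypothetical spam-folder export: each entry is a dict with "type" and "body".
# The field names and sample entries are invented for illustration.
spam_folder = [
    {"type": "comment", "body": "Cheap pills, click here"},
    {"type": "trackback", "body": "[...] While looking through the blogosphere "
                                  "we stumbled on an interesting post today. [...]"},
    {"type": "pingback", "body": "[...] an excerpt copied from one of your posts [...]"},
    {"type": "trackback", "body": "Great site, visit mine"},
]

MARKER = "[...]"  # the excerpt marker searched for in the steps above


def suspicious_pings(entries, marker=MARKER):
    """Return the entries whose body contains the excerpt marker."""
    return [entry for entry in entries if marker in entry["body"]]


hits = suspicious_pings(spam_folder)
for hit in hits:
    print(hit["type"], "-", hit["body"][:60])
```

<p>Like the manual search, this only catches pings that include the marker, so it shares the same blind spot described above.</p>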
<p><strong>Conclusions</strong></p>
<p>This situation is very frustrating. We, as bloggers, are forced to make a choice between being inundated with comment spam and being able to effectively follow up on scrapers and spam bloggers. Of the two, comment spam certainly seems to be the more annoying, especially considering the ratio of scraping to comment spam, and the more time-consuming to fight.</p>
<p>In short, the time it would take to deal with comment spam without Akismet far outstrips the time it takes to reach into the spam folder once every few weeks and search for suspicious pings.</p>
<p>Still, it is frustrating that Akismet does not offer an easier way to track these cases, either by enabling filtering on spam type or offering a special folder for suspicious blogs. </p>
<p>Though I definitely would rather have the protection than not, it would be nice if these plugins could help us stop other kinds of spam than just comments to our blogs.</p>
<p>For me, I&#8217;m going to debate what action to take against these scrapers. Though I certainly can and probably will notify their advertisers, I am unsure about taking additional action due to the nature of the reuse.</p>
<p>In the meantime, I&#8217;m encouraging everyone to delve into their spam folders and see what they find. Be sure to let me know if you find anything exceptionally interesting in there. I&#8217;ll be eager to hear about what turns up.</p>
<p>Leave a comment below if you want to share. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/11/29/akismet-and-spam-blogs/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>RSS Brief: Another Scraping/Spam Threat</title>
		<link>http://www.plagiarismtoday.com/2007/09/14/rss-brief-another-scrapingspam-threat/</link>
		<comments>http://www.plagiarismtoday.com/2007/09/14/rss-brief-another-scrapingspam-threat/#comments</comments>
		<pubDate>Fri, 14 Sep 2007 16:08:42 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Legal Issues]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[icerocket]]></category>
		<category><![CDATA[Pay-Per-Post]]></category>
		<category><![CDATA[PPP]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[RSS-Brief]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[Spam-Blogging]]></category>
		<category><![CDATA[Splogging]]></category>
		<category><![CDATA[Splogs]]></category>
		<category><![CDATA[technorati]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/09/14/rss-brief-another-scrapingspam-threat/</guid>
		<description><![CDATA[Yesterday, the makers of the controversial Pay Per Post service launched a new tool designed to make blog reading faster, RSS Brief. The idea is that the service takes long posts, like what you might expect here on Plagiarism Today, and condenses them down into a few short sentences. Though the service sounds convenient and...]]></description>
			<content:encoded><![CDATA[<p>Yesterday, the makers of the controversial <a href="http://payperpost.com/">Pay Per Post service</a> <a href="http://www.blogherald.com/2007/09/13/payperpost-launches-rss-brief-alpha/">launched a new tool</a> designed to make blog reading faster, <a href="http://www.rssbrief.com/">RSS Brief</a>. </p>
<p>The idea is that the service takes long posts, like what you might expect here on Plagiarism Today, and condenses them down into a few short sentences. </p>
<p>Though the service sounds convenient and useful, it also raises significant copyright and spam issues that the company has not addressed as of yet. </p>
<p>Though the service is only in alpha, the time to consider these issues is now, before the service is completed and becomes an active part in many people&#8217;s blogging lives and it is too late to change course.</p>
<p><span id="more-653"></span><strong>How it Works&#8230; In Brief</strong></p>
<p><a href="http://www.rssbrief.com/about">The idea behind RSS Brief</a> is pretty simple. You punch in the URL of your favorite blog, RSS Brief will read the entries in the feed and use what its creators refer to as &#8220;natural language technology&#8221; to parse the text down to a few sentences.</p>
<p>The idea is that, unlike traditional truncating that simply cuts off everything but the first few sentences, you will receive an effective summary of the post. This should, in theory, allow you to get the basic idea of the post and move on.</p>
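<p>For contrast, the &#8220;traditional truncating&#8221; mentioned above can be sketched in a few lines of Python. This is a naive illustration of cutting a post down to its first sentences. It is not RSS Brief&#8217;s method, which is not public, and the function name and sample text are my own.</p>

```python
import re


def truncate_to_sentences(text, limit=2):
    """Keep only the first `limit` sentences; everything after is dropped."""
    # Naive splitter: a sentence is a run of text ending in ., ! or ?
    sentences = re.findall(r"[^.!?]+[.!?]+", text)
    return " ".join(s.strip() for s in sentences[:limit])


post = ("RSS scraping is common. It is also frustrating. "
        "The key point of this post comes last.")
print(truncate_to_sentences(post))
```

<p>Whatever comes after the cutoff is simply lost, however important it is, which is the weakness a true summary is supposed to avoid.</p>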
<p>The technology, however, is questionable at this point. Plagiarism Today&#8217;s RSS Brief page shows some of the weaknesses. Though PT is the type of site targeted by this service, it utterly fails to give a meaningful summary of any of the stories in the RSS feed. Instead, on most stories, it seems to simply do the kind of truncating it claimed to avoid. </p>
<p>However, finding glitches in alpha-stage technology is not as disturbing as the copyright and spam issues that this service raises. It seems that, in the rush to create this service, the programmers completely sidestepped the copyright questions it might raise and how their technology might be abused.</p>
<p><strong>Copyright Issues</strong></p>
<p>What RSS Brief does, fundamentally, is take a lengthy post and make a derivative work of it. Under copyright law, the creation of derivative works is the sole right of the copyright holder. </p>
<p>Though there is a decent fair use argument for RSS Brief in that the use is largely transformative and only takes a small portion of the original, there is a strong argument against them as well. Their use of the work, by their own design, takes the heart of the original material, it does so for a commercial purpose, and RSS Brief is designed to replace the original work, thus damaging the market for the author&#8217;s work, especially if the author has ads in the feed.</p>
<p>Worse still, the service continues to &#8220;summarize&#8221; even shorter works, some as short as sixty words. This severely raises the amount of the original work used and lowers the likelihood that the use will be found fair.</p>
<p>However, most damning of all is the 1841 case <a href="http://www.faculty.piercelaw.edu/redfield/library/Pdf/case-folsom.marsh.pdf">Folsom v. Marsh</a> (PDF) that found the following when dealing with the issue of &#8220;transformative&#8221; use:</p>
<blockquote><p>(if a user) cites the most important parts of the work, with a view, not to criticise, but to supersede the use of the original work, and substitute the review for it, such a use will be deemed in law a piracy.</p></blockquote>
<p>Though it is impossible to predict whether or not a use will be deemed &#8220;fair&#8221; until it goes before a judge and/or jury, there seems to be a lot of reason to doubt whether or not RSS Brief will pass muster in that situation. </p>
<p>Particularly damning are its stated attempt to replace the original work and the lack of any opt-out mechanism, such as the one Google uses to ensure its cache is fair use. </p>
<p><strong>The Spam Issue</strong></p>
<p>Though many readers would love an &#8220;important parts only&#8221; feed, so would spammers. Fortunately for them, RSS Brief offers up just such a feed on their service, one that essentially scrapes, processes and rebroadcasts the original feed in their &#8220;brief&#8221; format.</p>
<p>Spammers will, most likely, grow to love these feeds. Not only are they keyword rich and to the point, but they can also easily be combined with other feeds from the same service to create rapid-fire blogs with short posts, something search engines seem to love.</p>
<p>Already spammers take advantage of Technorati, Icerocket and Google Blog Search feeds for much the same purpose. They enjoy the keyword density those feeds provide and the fact that they raise fewer copyright issues than scraping full feeds.</p>
<p>Though an RSS Brief feed might be less keyword rich, it would also be much more modified from the original, making it harder for search engines and Webmasters to spot. Depending on the nature of the spammer, they might find RSS Brief feeds preferable to the existing alternatives. </p>
<p>Also, much like the search feeds, RSS Brief strips out any and all digital fingerprints as well as copyright information contained in the feed. Its rush to get to &#8220;just the facts&#8221; causes it to leave out some elements that are critical to bloggers. This also makes the use of RSS Brief feeds impossible to track, unless they report usage to FeedBurner, and leaves Webmasters in the dark about how many are subscribing to the feed and how they are using it. </p>
<p>Finally, since Pay Per Post is not a search company, it&#8217;s not in a position to punish people who do scrape its feeds. Technorati and Google can blacklist sites that scrape their search results; Pay Per Post has no such card to play. </p>
<p>If spammers aren&#8217;t already looking at RSS Brief as a new tool, they likely will be soon. They seem to seize on new technology as fast as they can and I doubt this service will be any exception.</p>
<p><strong>Conclusions</strong></p>
<p>As interesting as the idea of RSS Brief is, it is poorly executed. As of this writing, there is no means for Webmasters to opt out, no clear safeguards against spam blogging and no consideration for Webmasters.</p>
<p>Though Pay Per Post has always been a controversial company, they have always been a company that seemed to value bloggers and the role they play, albeit in a somewhat backhanded way. That is why it seems so odd to me that they created this service with so little consideration for them.</p>
<p>One day they are paying bloggers for reviews, the next they are taking their feeds, without permission or an opt out mechanism, and creating derivative works to be redistributed over RSS.</p>
<p>Hopefully they can get these issues as well as their technical glitches straightened out. The idea is interesting but pursuing it the way they are doing is very dangerous to them, to bloggers and to the Internet at large.</p>
<p>It borders on irresponsible and if Pay Per Post is going to change their image, they need to put the good of the Web and of bloggers first. They made that mistake when they first launched their primary service and it seems that history is, in a strange way, repeating itself.</p>
<p>Hopefully that won&#8217;t be the case.</p>
<p><strong>Note:</strong> If there is an interest in an excerpt-only &#8220;just the facts&#8221; feed for this site, I will create one. WordPress has the tools to do that and I&#8217;ll simply create the second feed this weekend. If interested, please post a comment below or send me an email.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/09/14/rss-brief-another-scrapingspam-threat/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
		</item>
		<item>
		<title>A Scrape of a Scrape</title>
		<link>http://www.plagiarismtoday.com/2007/08/07/a-scrape-of-a-scrape/</link>
		<comments>http://www.plagiarismtoday.com/2007/08/07/a-scrape-of-a-scrape/#comments</comments>
		<pubDate>Tue, 07 Aug 2007 15:07:58 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Personal Experiences]]></category>
		<category><![CDATA[Prevention]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[ie7]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[Spam-Blogging]]></category>
		<category><![CDATA[Spam-Blogs]]></category>
		<category><![CDATA[Splogs]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/08/07/a-scrape-of-a-scrape/</guid>
		<description><![CDATA[I often get asked by reporters and bloggers alike exactly how bad scraping is on the Web. I discuss my past experiments on the topic and how, depending on your keywords, suspicious traffic starts showing up with the first post. However, as I was searching for information on IE7 security flaws for another site I&#8217;m...]]></description>
			<content:encoded><![CDATA[<p>I often get asked by reporters and bloggers alike exactly how bad scraping is on the Web. I discuss my <a href="http://www.plagiarismtoday.com/2007/05/22/scraping-starts-from-the-very-first-post/">past experiments</a> on the topic and how, depending on your keywords, suspicious traffic starts showing up with the first post. </p>
<p>However, as I was searching for information on IE7 security flaws for another site I&#8217;m working on, I ran across something that was truly mind-blowing. </p>
<p>On <a href="http://www.google.com/blogsearch">Google Blogsearch</a>, this result (nofollowed) was one of the first to pop up. One look at it and you can clearly tell that it is a scrape of another post. However, kindly enough, the scraper left information about their source. I followed through on that and was <a href="http://www.feedsfarm.com/article/48d710004d785851cf997003863381fe55195210.html">taken to this entry</a> (nofollowed), yet another scraped page. </p>
<p>It was only after following the results link there that I was taken to the <a href="http://blogs.msdn.com/ie/archive/2006/11/14/how-i-ll-judge-ie7-security.aspx">original post</a> on the IEBlog. </p>
<p><span id="more-570"></span>It is stunning, though not surprising, to think that scraping is so common that scrapers are picking up each other&#8217;s blogs. What makes this situation somewhat unique is that we were able to follow the trail since both scraper sites link to their original source. However, it shows the potential for a post to get scraped again and again as its copies get picked up by other spambots.</p>
<p>In short, every feed your work appears on can, and most likely will, be scraped, even if the appearance is unwanted. It may even be possible to piece together much longer chains of scraping, where you end up with a fifth or sixth generation scrape.</p>
<p>In this case, the first feed was most likely a scrape of a search engine feed such as Google Blogsearch or Technorati. The second one is a news site that, it appears, is reposting and redistributing the entire content of feeds in certain places, though stripping formatting in the process. </p>
<p>This gives us yet another reason to <a href="http://www.plagiarismtoday.com/2006/10/09/five-essential-wordpress-content-protection-plugins/">get a handle on our RSS feeds</a> and make sure that they don&#8217;t fall into the wrong hands to begin with. Though these sites attributed their use, most are not so generous and even attributed scraping can cause problems. </p>
<p>All in all, it is best to be mindful of this problem and respond accordingly. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/08/07/a-scrape-of-a-scrape/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>WordPress Plugin: Copyright Feed</title>
		<link>http://www.plagiarismtoday.com/2007/05/04/wordpress-plugin-copyright-feed/</link>
		<comments>http://www.plagiarismtoday.com/2007/05/04/wordpress-plugin-copyright-feed/#comments</comments>
		<pubDate>Fri, 04 May 2007 21:07:59 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Prevention]]></category>
		<category><![CDATA[Products]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[Extension]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[Plugin]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[Spam-Blogging]]></category>
		<category><![CDATA[Spamming]]></category>
		<category><![CDATA[Wordpress]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/05/04/wordpress-plugin-copyright-feed/</guid>
		<description><![CDATA[A new WordPress plugin by Frank Bueltge (in German) entitled Copyfeed is attempting to revolutionize the way WordPress users protect their feed&#8217;s content. The goal is to not simply help bloggers discover if and where their content is being scraped, but also give them information to help them track down the scraper and, when used...]]></description>
			<content:encoded><![CDATA[<p>A new WordPress plugin by <a href="http://bueltge.de">Frank Bueltge</a> (in German) entitled <a href="http://wordpress.org/extend/plugins/copyfeed/">Copyfeed</a> is attempting to revolutionize the way WordPress users protect their feed&#8217;s content.</p>
<p>The goal is to not simply help bloggers discover if and where their content is being scraped, but also give them information to help them track down the scraper and, when used in conjunction with plugins such as <a href="http://redalt.com/Resources/Plugins/AntiLeech">Antileech</a>, stop the infringement outright.</p>
<p>It is a potentially exciting plugin that could replace other extensions that are currently available, effectively combining the functionality of at least two <a href="http://www.plagiarismtoday.com/2006/10/09/five-essential-wordpress-content-protection-plugins/">indispensable plugins</a> into one, and holds a lot of potential to help WordPress bloggers stop infringement and prevent spam bloggers from taking their content.</p>
<p><span id="more-484"></span><strong>How it Works</strong></p>
<p>Copyfeed is something of a Swiss Army knife when it comes to extending your feed. It has several different functions, many of which are already available in other plugins.</p>
<p>First, the plugin can add a digital fingerprint to the feed and provides easy links to search for copies of your work. In that regard, it works a great deal like <a href="http://www.maxpower.ca/wordpress-plugin-digital-fingerprint-detecting-content-theft/2006/09/25/">MaxPower&#8217;s Digital Fingerprint Plugin</a> and can serve much the same purpose. </p>
<p>Second, the plugin adds a customized feed copyright notice, similar to <a href="http://blog.taragana.com/index.php/archive/wordpress-plugin-to-automatically-add-copyright-message-to-your-rss-atom-feeds/">Angsuman’s Feed Copyrighter Plugin</a>. This plugin, however, takes it a step further, making it easier to add your custom copyright information and even making it possible to add HTML to the footer.</p>
<p>However, where the plugin goes above and beyond the currently available tools is by adding the ability to include the IP address of the feed reader in the feed itself. This means that, if the feed is scraped, the scraper will also be publishing the IP address of the computer they used to scrape the feed, making it very easy to either block the address, or input it into Antileech for redirection.</p>
<p>Finally, Copyfeed also offers the ability to whitelist certain domains from seeing the above information. This means that it would be possible to protect your regular feed, but allow <a href="http://www.blogburst.com/">BlogBurst</a> or another desired syndication service to bypass that content without having to point them to a separate feed.</p>
<p>It&#8217;s a very powerful set of features that, when combined with Antileech, can provide almost total protection of the feed from scrapers and spammers. And this says nothing of the other attributes that Copyfeed can add, including comments and related posts, that have nothing to do with content theft prevention. </p>
<p><strong>Some Problems</strong></p>
<p>However, this isn&#8217;t to say that the plugin is perfect; there are several issues with it that may stop it, in some cases at least, from being ready for full use. </p>
<p>First off, both the plugin and the site are currently only available in German. Though you can use tools to <a href="http://translate.google.com/translate?u=http%3A%2F%2Fbueltge.de%2Fwp-feed-plugin%2F204%2F&#038;langpair=de%7Cen&#038;hl=en&#038;ie=UTF8">translate the site</a>, the plugin administration panel rests in the admin area of the WordPress install and can&#8217;t be easily viewed by such tools.</p>
<p>Fortunately, Bueltge has promised to include a translated version of the plugin with his next release, along with &#8220;more features&#8221; but it has not been released as of this writing.</p>
<p>Second, it is uncertain how legitimate readers will feel about having their IP address printed on their feed&#8217;s entries. Some, especially those who are unaware of how the Internet works and what an IP address really is, might cite privacy concerns. Others, especially those who use Web-based readers, might be confused as the IP address would not be theirs, but that of the site they read the feeds on.</p>
<p>It would be up to the user of the feed to explain these issues if they chose to use that feature. </p>
<p>Finally, aspects of the plugin, obviously, will not work well with <a href="http://www.feedburner.com">FeedBurner</a>. Though the copyright notice and the digital fingerprint will both work correctly, the IP feature will not. The plugin would just report the IP address of FeedBurner and not the person reusing the feed, making it useless in that case.</p>
<p>However, there are many people that will still likely find the plugin very useful and many more will do so once the plugin has been translated into other languages.</p>
<p><strong>Conclusions</strong></p>
<p>All in all, this plugin is an impressive effort to stop feed scraping and will almost certainly become an essential tool once translations of it are offered. In fact, it is possible with just a few extra features that it could become something of a one-stop shop for battling feed scrapers in WordPress, becoming almost the only plugin needed.</p>
<p>However, until the translated versions are released, all most of us can do is sit and wait in anticipation for the tool to become available.</p>
<p>It seems almost certain that this plugin, for many WordPress users, will be a nice step forward in protecting their feed. Hopefully, once I am able to use it, it will live up to those expectations.</p>
<p>In the meantime, I&#8217;d be very interested in hearing comments from those who are using Copyfeed on their sites. I&#8217;d love to hear how it is working out for you.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/05/04/wordpress-plugin-copyright-feed/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Update: Six Apart Working on Copyright Issues</title>
		<link>http://www.plagiarismtoday.com/2007/04/05/update-six-apart-working-on-copyright-issues/</link>
		<comments>http://www.plagiarismtoday.com/2007/04/05/update-six-apart-working-on-copyright-issues/#comments</comments>
		<pubDate>Thu, 05 Apr 2007 20:27:00 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Legal Issues]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[Six-Apart]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[Spam-Blogging]]></category>
		<category><![CDATA[Splogging]]></category>
		<category><![CDATA[Splogs]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/04/05/update-six-apart-working-on-copyright-issues/</guid>
		<description><![CDATA[To update my previous story on Six Apart. I received a call this afternoon from Jane Anderson of Six Apart. They are working on addressing the copyright issues and are discussing what action to take at this time. They&#8217;ve promised to be in touch with me over the coming days and weeks to keep me...]]></description>
			<content:encoded><![CDATA[<p>To update my <a href="http://www.plagiarismtoday.com/2007/04/03/six-apartrojo-now-spam-bloggers/">previous story</a> on <a href="http://www.sixapart.com">Six Apart</a>. I received a call this afternoon from Jane Anderson of Six Apart. They are working on addressing the copyright issues and are discussing what action to take at this time. They&#8217;ve promised to be in touch with me over the coming days and weeks to keep me up to date on how things develop.</p>
<p>Needless to say, I will post updates as they come in.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/04/05/update-six-apart-working-on-copyright-issues/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Six Apart/Rojo: Now Spam Bloggers?</title>
		<link>http://www.plagiarismtoday.com/2007/04/03/six-apartrojo-now-spam-bloggers/</link>
		<comments>http://www.plagiarismtoday.com/2007/04/03/six-apartrojo-now-spam-bloggers/#comments</comments>
		<pubDate>Tue, 03 Apr 2007 17:09:51 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Legal Issues]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[Digg]]></category>
		<category><![CDATA[Livejournal]]></category>
		<category><![CDATA[Nooz]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[Reddit]]></category>
		<category><![CDATA[Rojo]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[Six-Apart]]></category>
		<category><![CDATA[Spam-Blogging]]></category>
		<category><![CDATA[Splogging]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/04/03/six-apartrojo-now-spam-bloggers/</guid>
		<description><![CDATA[- Article Updated &#8211; See Below - Six Apart was one of the first rock stars of the blogging world. Propelled to fame on the back of its Movable Type blogging platform, it quickly became one of the most recognized names in the blogging world. Though Movable Type has largely been replaced by newer blogging...]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.plagiarismtoday.com/2007/04/03/six-apartrojo-now-spam-bloggers/sixapart-logo/" rel="attachment wp-att-461" title="SixApart Logo"><img src="http://www.plagiarismtoday.com/wp-content/uploads/2007/04/sixapart_small.png" title="SixApart Logo" alt="SixApart Logo" align="left" hspace="5" vspace="5" /></a><strong>- Article Updated &#8211; See Below -<br />
</strong></p>
<p>Six Apart was one of the first rock stars of the blogging world. Propelled to fame on the back of its <a href="http://www.movabletype.com/">Movable Type</a> blogging platform, it quickly became one of the most recognized names in the blogging world.</p>
<p>Though Movable Type has largely been replaced by newer blogging applications, including <a href="http://www.wordpress.org">WordPress</a>, Six Apart has remained very active in the blogging world, not only offering <a href="http://www.sixapart.com/typepad/">Typepad</a>, a popular blogging service, but also <a href="http://gigaom.com/2005/01/04/six-apart-to-buy-live-journal/">purchasing several other blogging companies</a>, including <a href="http://www.sixapart.com/livejournal/">LiveJournal</a> and <a href="http://www.rojo.com/" rel="nofollow">Rojo</a>.</p>
<p>However, some of these subsidiaries have begun engaging in practices that many bloggers consider unethical. One of the sites under Six Apart&#8217;s control even engages in <a href="http://www.plagiarismtoday.com/2006/09/25/bitacle-debacle/">behavior akin to Bitacle</a>.</p>
<p>This has left some to wonder why Six Apart, a company largely respected in the blogging world, has begun to play fast and loose with RSS feeds and copyrighted content. Worse still, why have they begun using tactics largely reserved for spam bloggers?</p>
<p>Sadly, the answers are not very clear.</p>
<p><span id="more-462"></span><strong> LiveJournal Syndication</strong></p>
<p>The least worrisome of Six Apart&#8217;s scraping activities revolves around their LiveJournal service.  There, paid members can take advantage of their &#8220;Syndication&#8221; feature. It allows users to select an RSS feed and LiveJournal then creates a specialized page for the feed. The feed can then be added as a &#8220;friend&#8221;, the same as if it were an actual LiveJournal member, and can appear in friend lists.</p>
<p>The Syndication feature is worrisome because it creates an &#8220;account&#8221; with duplicate content from the feed. The site displays the entire contents of the feed (<a href="http://syndicated.livejournal.com/officialgaiman/" rel="nofollow">see sample</a> using <a href="http://www.neilgaiman.com/journal/index.html">Neil Gaiman&#8217;s Journal</a>) and allows users to post comments without returning to the original site.</p>
<p>However, with the LiveJournal Syndication service, attribution is very clear and all syndicated accounts are on a separate subdomain (syndicated.livejournal.com). Also, the LiveJournal team has, historically, been very responsive about removing feeds that their owners don&#8217;t want to be scraped. Furthermore, results from the Syndication service <a href="http://www.google.com/search?hl=en&amp;safe=off&amp;client=firefox-a&amp;rls=org.mozilla%3Aen-US%3Aofficial&amp;hs=uBk&amp;q=site%3Asyndicated.livejounal.com&amp;btnG=Search">do not appear in Google</a>, eliminating most of the major concerns one has with scraping.</p>
<p>Still, many bloggers are likely to be concerned that a duplicate of their blog exists, that users can and do comment to it and that LiveJournal users no longer need to subscribe to the feed directly or visit their site.</p>
<p><strong>Rojo Front Page</strong></p>
<p><a href="http://www.plagiarismtoday.com/wp-content/uploads/2007/04/rojo.png" title="Rojo Screenshot"><img src="http://www.plagiarismtoday.com/wp-content/uploads/2007/04/rojo.thumbnail.png" title="Rojo Screenshot" alt="Rojo Screenshot" align="left" hspace="5" vspace="5" /></a>When Six Apart <a href="http://www.sixapart.com/about/press/2006/09/six_apart_acqui_1.html">acquired RSS reader Rojo in September 2006</a>, it also acquired some of Rojo&#8217;s bad habits.</p>
<p>Rojo&#8217;s home page functions almost exactly like a rapidly-updating spam blog. It features the full content of the most popular feed items of the day, all next to Google Adsense ads (see screenshot above). The site is then further sub-divided into new categories, including &#8220;politics&#8221; and &#8220;Web 2.0&#8221;. It is also possible to view the original feed on Rojo without visiting the original site (<a href="http://www.rojo.com/feed/c0ft_NCxBCFNH03_">see PT&#8217;s feed on Rojo</a>) and those feeds are also surrounded by ads.</p>
<p>Attribution on Rojo is prominent and the headlines do link back to the original story. However, a &#8220;Rojolink&#8221; feature encourages others to use the Rojo permalink for the article rather than link to the original site.</p>
<p>At the very least, <a href="http://calacanis.com/">Jason Calacanis</a> will likely be upset by this. He has repeatedly stated that he will not allow his full feeds to be placed next to ads, <a href="http://www.rojo.com/feed/V1WuPMhWMOkiZv9A">something that Rojo does</a>.</p>
<p>Though most people expect RSS readers to make money off of other people&#8217;s content, generally it is also expected that they will add value to the feed by making it easier for people to subscribe. Instead, Rojo has just created a valueless duplicate of the feeds, and surrounded the content with ads.</p>
<p><strong>All The Nooz</strong></p>
<p>Worst of all of Six Apart&#8217;s properties, though, is the Rojo-owned site <a href="http://www.nooz.com" rel="nofollow">Nooz.com</a>. Nooz is designed to function like Digg for Myspace. Nooz users pick articles from the Web, vote on them and add them to their special Nooz widgets that they place on their Myspace profiles.</p>
<p>The problem with Nooz, however, is not the widgets but the way the content is obtained. Rather than letting users select their own articles from the Web, like Digg or Reddit, Nooz forces users to select from versions of the blog that it has scraped and reposted on its own site (<a href="http://www.nooz.com/feed/c0ft_NCxBCFNH03_" rel="nofollow">see Plagiarism Today on Nooz</a>). Once again, as with Rojo itself, Nooz offers &#8220;Noozlinks&#8221; to encourage people to link to Nooz&#8217;s scraped copy, rather than the original.</p>
<p>Though no ads appear on Nooz at this time, Nooz.com is accessible by the search engines and <a href="http://www.google.com/search?q=site%3Anooz.com&amp;ie=utf-8&amp;oe=utf-8&amp;aq=t&amp;rls=org.mozilla:en-US:official&amp;client=firefox-a">Google estimates</a> that about 150,000 pages have been indexed already. Even worse, all of the contact addresses for Nooz, <a href="http://www.nooz.com/about/policies/copyright-policy/" rel="nofollow">including the copyright agent</a>, bounced back.</p>
<p>Nooz is not only scraping and reposting feeds without permission, but it is being irresponsible in doing so. There is no means to ask Nooz to stop reusing the content.</p>
<p>If you don&#8217;t like the way Nooz uses your content, quite frankly, you are out of luck at the moment.</p>
<p style="font-weight: bold">A Murmured Outcry</p>
<p>Six Apart is no stranger to blogging; as discussed above, they helped ignite the blogging movement with their software. They are not unfamiliar with the etiquette of blogging and should realize, at least on some level, that some bloggers will not be happy to see their feeds scraped and republished on someone else&#8217;s site, all the while surrounded by ads.</p>
<p>The reasons Six Apart allows this to continue are dubious at best. Legal scholars have already agreed that <a href="http://www.plagiarismtoday.com/2007/01/29/twil-discusses-implied-licenses-on-rss-feeds/">there is no implied license with RSS feeds</a>, so this use, as long as it is executed without permission, is basically copyright infringement. Unless a CC license or a direct agreement permits the use, what Six Apart is doing in all three cases is, most likely, illegal.</p>
<p>To my knowledge, no one has complained about these three uses. Why is a mystery, but the reasons may include the following:</p>
<ol>
<li>Very few people seem to be affected by the LiveJournal Syndication feature. Since only paid members can take advantage of it, severely limiting the pool, only very large blogs are scraped. Also, LiveJournal has been very cooperative in removing people that don&#8217;t want to participate. Furthermore, since the Syndicated blogs are not picked up by search engines, it&#8217;s unlikely most bloggers know that they exist.</li>
<li>Few bloggers want to upset Rojo since many readers use the feed reader service to subscribe to blogs. Currently, about 5% of all Plagiarism Today subscribers use Rojo.</li>
<li>Nooz seems to have flown under the radar, targeted mostly at Myspace users, generally a separate group from bloggers, and still a relatively new creation (its current incarnation starting some time this year).</li>
</ol>
<p>No matter the reasons though, these issues are not going away. RSS scraping and reuse issues will likely be around for a very long time, that is, until a licensing scheme emerges that resolves the issue once and for all.</p>
<p><strong>Conclusions </strong></p>
<p>What Six Apart is doing is wrong. Though I have no major issues with their use of my content, save perhaps on Rojo where the use is more commercial (and thus a violation of <a href="http://creativecommons.org/licenses/by-nc-sa/2.5/">my Creative Commons License</a>), Six Apart is taking content from thousands of blogs, without permission, and reposting them on various sites. That is copyright infringement and there is little way around that.</p>
<p>Some might argue that Six Apart&#8217;s scraping would qualify for the DMCA (section 512(b)) protection for caching services. However, <a href="http://www.plagiarismtoday.com/2007/01/16/debunking-the-dmca-caching-loophole/">as discussed earlier</a>, that is not likely the case.</p>
<p>All of Six Apart&#8217;s sites modify the content and create permanent files, both violations of the caching provision. It also does not follow accepted practices (as there are no accepted practices for scraping and republishing RSS feeds) and it is not automated, seemingly relying at every step on users to submit the original feed.</p>
<p>It is unlikely, at best, that Six Apart would obtain the same kind of protection that was <a href="http://www.searchenginejournal.com/google-cache-is-ruled-legal-fair-use/2837/">afforded the Google Cache</a>, especially considering both the commercial nature of the use and the apparent intent of setting up the copy as a substitute for the original. The latter is shown by the new permalinks and location of cached material (placed before the link to the original).</p>
<p>Six Apart desperately needs to look at its policy for reusing others&#8217; content. In that regard, it should look toward sites such as <a href="http://www.digg.com">Digg</a> and <a href="http://www.reddit.com">Reddit</a> that have built great communities without infringing on copyright.</p>
<p>In short, there&#8217;s no reason for a social news site to scrape and repost content like Rojo and Nooz currently do.  Links and snippets are perfectly adequate.</p>
<p>When it&#8217;s all said and done, Six Apart seems to have nothing to gain by scraping and reposting content as it does. Successful news sites have, for a very long time, worked well with content creators and there seems to be no reason for Six Apart to try and change that, especially in a way that is both legally dubious and likely to cause outrage.</p>
<p>Hopefully they will reevaluate their policies soon and come up with a fairer approach for their sites. In the meantime, they are treading on very thin legal ice and dealing with a very wary public.</p>
<p><em><strong>Hat tip:</strong> Thanks to <a href="http://www.typetive.com/">Cybele of Typetive</a> for the heads up about Nooz.com </em></p>
<p><em>Note: During the course of writing this article, which started Thursday, I made several attempts to contact Six Apart by both email and phone. I was able to get in touch with Jane Anderson, Six Apart&#8217;s <a href="http://www.sixapart.com/about/press/">press contact</a>. We scheduled a time for an interview on Monday but, when I called in there was no answer. Subsequent attempts to contact Six Apart via both office phone and cell phone have produced no answer. I will update this article when and if I get further information from them.</em></p>
<p><strong>Update:</strong> I&#8217;ve gotten back in touch with Jane Anderson; she is speaking with her counterparts at Six Apart and will be back in touch with me soon. They have scheduled a meeting for tomorrow to discuss these issues. I will report back after I hear from them.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/04/03/six-apartrojo-now-spam-bloggers/feed/</wfw:commentRss>
		<slash:comments>41</slash:comments>
		</item>
	</channel>
</rss>

