<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Spamming | Plagiarism Today</title>
	<atom:link href="http://www.plagiarismtoday.com/tag/spamming/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.plagiarismtoday.com</link>
	<description>Content Theft, Plagiarism, Copyright Infringement</description>
	<lastBuildDate>Mon, 13 Feb 2012 06:51:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>FAQs: The Basics of RSS Scraping</title>
		<link>http://www.plagiarismtoday.com/2011/05/09/faqs-the-basics-of-rss-scraping/</link>
		<comments>http://www.plagiarismtoday.com/2011/05/09/faqs-the-basics-of-rss-scraping/#comments</comments>
		<pubDate>Mon, 09 May 2011 18:21:39 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[RSS scraping]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[Spam-Blogging]]></category>
		<category><![CDATA[Spamming]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=9659</guid>
		<description><![CDATA[RSS scraping is a problem nearly every webmaster is going to have to face at some point. Here are the basics on what it is and what to do about it.]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2011/05/rss-big-icon1-250x250.png" alt="" title="rss-big-icon" width="250" height="250" class="alignleft size-medium wp-image-9664" />RSS scraping is one of the most common and most frustrating types of content theft that bloggers, forum admins and other site owners will face as they grow their presence online. It often lets the scraper grab all of the content from the original site with little effort, and it is a favorite tactic of spammers, who exploit the content for search engine gains and are among the most despised infringers online.</p>
<p>As such, it&#8217;s important for all webmasters and content creators to be aware of what RSS scraping is, how it works and where it&#8217;s going in the future. Even though <a href="http://www.staynalive.com/2011/05/twitter-and-facebook-both-quietly-kill.html">RSS as a protocol may be on the ropes</a>, RSS scraping is not a problem that&#8217;s going away and, in fact, may be getting a lot worse in the coming years.</p>
<p>With that in mind, here is a quick FAQ on some of the more common questions asked about RSS scraping and what can be done about it.<span id="more-9659"></span></p>
<h4>What is RSS?</h4>
<p>RSS, sometimes referred to as Really Simple Syndication or <a href="http://www.whatisrss.com/">Rich Site Summary</a>, is a format that makes it easy for other sites and tools to access the content on your site by presenting it in a consistent, easy-to-parse way.</p>
<p>Unlike an HTML document, where the content could be anywhere on the page, RSS clearly marks the headline, body and other elements of the content. This makes it easy to grab the content and display it elsewhere without the surrounding formatting and HTML code.</p>
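<p>To show how little work that parsing takes, here is a minimal sketch in Python using only the standard library. The sample feed, its example.com URLs and the function name <code>read_items</code> are made up for illustration; real feeds carry more elements and often use namespaces.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item feed, used only for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First Post</title>
      <link>http://example.com/first-post/</link>
      <description>Summary of the first post.</description>
    </item>
    <item>
      <title>Second Post</title>
      <link>http://example.com/second-post/</link>
      <description>Summary of the second post.</description>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return one dict per <item> with its title, link and description."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


if __name__ == "__main__":
    for entry in read_items(SAMPLE_FEED):
        print(entry["title"], "->", entry["link"])
```

<p>The same few lines serve a feed reader and a scraper alike, which is why the format is so convenient for both.</p>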
<h4>How is RSS Normally Used?</h4>
<p>Traditionally, RSS has been used to enable readers to subscribe to a site using various RSS readers such as <a href="http://www.google.com/reader">Google Reader</a>, <a href="http://www.feeddemon.com/">Feed Demon</a> and even many mail clients. </p>
<p>However, RSS has also been used to power other services, such as <a href="http://www.mailchimp.com/features/rss-to-email/">email newsletters</a> and even <a href="http://www.facebook.com/RSS.Graffiti">Facebook integration</a>.</p>
<h4>What is RSS Scraping?</h4>
<p>RSS scraping is when a third party, usually a spammer, grabs the content in an RSS feed and republishes it wholesale on another site. </p>
<p>In this regard, RSS scrapers work a great deal like Google Reader, grabbing your site&#8217;s content and displaying it elsewhere. However, where Google Reader places the content behind a password-protected wall that can only be accessed by the subscriber (or those with whom an individual story is shared), scrapers place the content on a public site for anyone to view, including search engines.</p>
<h4>Why do People Scrape RSS Feeds?</h4>
<p>Spammers seek high rankings in search engines so they can get traffic to show ads to or sell products to. To do this, they need content, but creating content by hand is time-consuming and difficult, especially when much of it is going to make no difference in the search engines.</p>
<p>RSS scraping is an easy way for spammers, and other sites, to quickly fill their pages with content, even if the content comes solely from other sites.</p>
<h4>How Can RSS Scraping Hurt Me?</h4>
<p>In most cases, RSS scraping doesn&#8217;t hurt. Google and other search engines have become savvy enough about spam that most of the time, they don&#8217;t give much credence to spam sites, keeping them from getting a lot of traffic or harming you in the rankings. </p>
<p>However, the system is far from perfect and there are many times spammers outrank the sites they scrape from for relevant terms. This is especially true with new sites or those that don&#8217;t have a strong search engine presence.</p>
<p>Less likely is that others may mistake the spam site for either the original site or one endorsed by you, thus actively taking traffic from you. Few people, however, make this mistake with spam sites as the distinction is usually very clear.</p>
<p>All in all, the risk from an individual case of RSS scraping is actually fairly low, but the problem is that there is rarely just one or two such scrapers working at any given time.</p>
<h4>What Can I Do About RSS Scraping?</h4>
<p>Dealing with RSS scraping starts with good SEO practices. If you link between your posts, get good inbound mentions and earn social networking shares, odds are that RSS scraping won&#8217;t greatly impact you.</p>
<p>If it does, you can always seek to have the content removed by either <a href="http://www.plagiarismtoday.com/stopping-internet-plagiarism/4-contacting-the-host/">filing a DMCA notice with the spammer&#8217;s host</a> or, if that fails, <a href="http://www.plagiarismtoday.com/stopping-internet-plagiarism/6-when-all-else-fails/">sending one to Google</a>. </p>
<p>If RSS scraping becomes a more serious and recurring problem, you may want to consider truncating your feeds or eliminating them, <a href="http://www.plagiarismtoday.com/2007/01/04/the-six-worst-ways-to-protect-content/">though that would be an extreme last resort</a>.</p>
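<p>As a rough illustration of what truncating a feed involves, the Python sketch below strips the markup from a post body and keeps only the first few dozen words. The function name <code>truncate_to_excerpt</code> and the 55-word default are assumptions for illustration; in practice, many blogging platforms, WordPress included, offer a summary-only feed setting that does the same job.</p>

```python
import re


def truncate_to_excerpt(html, max_words=55):
    """Strip tags from a post body and keep at most max_words words."""
    # Crude tag removal; good enough for a sketch, not for hostile input.
    text = re.sub(r"<[^>]+>", " ", html)
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    # Mark the cut so readers know there is more on the original site.
    return " ".join(words[:max_words]) + " [...]"


if __name__ == "__main__":
    body = "<p>RSS scraping is one of the most common types of content theft.</p>"
    print(truncate_to_excerpt(body, max_words=5))
```

<p>A scraper that takes a truncated feed only gets the excerpt, but so do your legitimate subscribers, which is the trade-off to weigh.</p>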
<h4>Is RSS Scraping Illegal?</h4>
<p>Some have made arguments that distributing your content via an RSS feed, even if you didn&#8217;t realize you were doing it, creates an implied license to use it in this manner. However, <a href="http://www.plagiarismtoday.com/2006/08/29/why-rss-scraping-isnt-ok/">there are many problems with that and other related arguments on RSS scraping</a>. </p>
<p>Generally, RSS scraping is considered to be copyright infringement, though there are <a href="http://www.plagiarismtoday.com/2006/08/24/linkworthy-scraping-as-a-legal-minefield/">other legal arguments against RSS scraping</a> as well. </p>
<h4>What if I Want to Encourage RSS Scraping and Reuse?</h4>
<p>If you want others to scrape your RSS feed, you can actually give blanket permission to do that by <a href="http://wiki.creativecommons.org/Syndication">inserting a Creative Commons license into your feed</a>. This will let bots that do scraping know your intentions, and those that are complying with the law should be able to follow your wishes.</p>
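<p>For the curious, here is a minimal Python sketch of what inserting a license can look like. It assumes the creativeCommons RSS module namespace and a channel-level <code>license</code> element; the function name <code>add_license</code> and the sample feed are hypothetical, so check the page linked above for the markup currently recommended for your feed format.</p>

```python
import xml.etree.ElementTree as ET

# Namespace of the creativeCommons RSS module (assumed here; verify
# against the Creative Commons syndication documentation).
CC_NS = "http://backend.userland.com/creativeCommonsRssModule"
ET.register_namespace("creativeCommons", CC_NS)


def add_license(feed_xml, license_url):
    """Return the feed with a channel-level creativeCommons:license element."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("feed has no <channel> element")
    license_el = ET.SubElement(channel, "{%s}license" % CC_NS)
    license_el.text = license_url
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    feed = '<rss version="2.0"><channel><title>Example Blog</title></channel></rss>'
    print(add_license(feed, "http://creativecommons.org/licenses/by/3.0/"))
```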
<h4>How Can I Track RSS Scraping?</h4>
<p>Many people will find RSS scrapers by accident when they search for keywords relevant to their blog or site. However, you can keep track of your content using automated tools like <a href="https://fairshare.attributor.com/fairshare/">Fairshare</a> that are designed for tracking dynamic content.</p>
<p>In the end, though, it&#8217;s best to keep an eye on the search engines for the terms that others commonly use to find your site, as scrapers will often show up in those same results, though initially they will likely rank lower than your site.</p>
<h4>What is the Future of RSS Scraping?</h4>
<p>Though it&#8217;s difficult to predict what spam tactics will be popular in the coming years, RSS scraping has been a problem for at least six years and is continuing today.</p>
<p>That being said, it has fallen out of favor with many spammers, who prefer content generation or scraping excerpts from feeds to avoid duplicate content penalties in the search engines. Still, many active spammers use the method though spammers have clearly become more diversified in this area.</p>
<h4>Bottom Line</h4>
<p>There&#8217;s no doubt that RSS scraping can be and often is very annoying and very problematic. That being said, there&#8217;s no reason that it should be a major headache or that it should become a reason to walk away from your site. Most cases of RSS scraping don&#8217;t have a major impact on a blog and those that do can usually be dealt with.</p>
<p>Still, if you are having a serious problem with RSS scraping, please <a href="http://www.plagiarismtoday.com/contact-pt/">feel free to drop me a line</a> or, if you think you may need outside help, feel free to <a href="http://copybyte.com">see if I can help via my consulting services</a>. </p>
<p>All in all, RSS scraping is a reality most bloggers and webmasters will have to deal with, but it&#8217;s not one that should sink your site if you&#8217;re savvy about how to handle it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2011/05/09/faqs-the-basics-of-rss-scraping/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>WordPressDirect Addresses Spam Issue</title>
		<link>http://www.plagiarismtoday.com/2008/12/02/wordpressdirect-addresses-spam-issue/</link>
		<comments>http://www.plagiarismtoday.com/2008/12/02/wordpressdirect-addresses-spam-issue/#comments</comments>
		<pubDate>Tue, 02 Dec 2008 15:00:19 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[Spam-Blogs]]></category>
		<category><![CDATA[Spamming]]></category>
		<category><![CDATA[splog]]></category>
		<category><![CDATA[Wordpress]]></category>
		<category><![CDATA[wordpressdirect]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=2197</guid>
		<description><![CDATA[WordPressDirect, in a move that it hopes will placate the concerns many have expressed about the service, is removing auto-posting from free members. But is it enough to calm the angry mob?]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2008/12/wordpressdirect-logo-300x52.png" alt="wordpressdirect-logo" title="wordpressdirect-logo" width="300" height="52" class="alignleft size-medium wp-image-2198" />WordPressDirect, the controversial WordPress setup and management service that was <a href="http://mashable.com/2008/11/23/wordpressdirect/">covered on Mashable</a> and <a href="http://www.blogherald.com/2008/11/24/wordpressdirect-blogging-tool-or-spam-engine/">by myself on the Blog Herald</a>, has announced a change in its policy that it hopes will alleviate many of the spam concerns.</p>
<p>The policy change will remove all of the automated content posting features from free user accounts, which make up the &#8220;vast majority&#8221; of WPD members, according to Marty Rozmanith, the creator of WPD.</p>
<p>The tools, however, will remain available for all paid members of the service, regardless of the level they choose. </p>
<p>Previously, unpaid members had limited access to some of the content posting tools, including the Yahoo! Answers, article database and RSS posting tool, enabling free members, who were limited to only three blogs, to automatically post content from a variety of sources, typically without permission.</p>
<p>Whether this does anything to stem the vitriol that has been directed at the service remains to be seen, but I can&#8217;t see how many will be convinced, especially when there are so many difficult questions to be answered.<span id="more-2197"></span></p>
<h4>WordPressDirect Recap</h4>
<p>For those who did not read the previous articles about WPD, the service promotes itself as a &#8220;WordPress deployment and maintenance service that helps people (especially those with very little technical experience) create a search-optimized WordPress blog.&#8221;</p>
<p>In short, it is a one-click install program that not only sets up the software, but also adds a theme, optimizes the permalinks and makes a handful of other SEO-oriented changes. In that regard, it is much like <a href="http://www.netenberg.com/fantastico.php">Fantastico</a>, but with added features to help get the blog started.</p>
<p>However, WordPressDirect stepped into controversy with its add-on tools, which allow users to automatically update the blogs they create using content from a variety of sources including RSS feeds, online article databases and more.</p>
<p>This caused many, especially on the Mashable article, to accuse the site of being a spam service. In that regard, it does share many traits, especially when you look at how the tools work and where they pull their content from.</p>
<p>WordPressDirect attempted to defend itself against the accusations, blaming much of the problem on their marketing, but the attempts to make peace fell on deaf ears for the most part. This, in turn, led to the recent changes they just announced.</p>
<h4>Fixing the Problem?</h4>
<p>Most likely, these changes are going to do little to nothing to placate the mob that has formed around WordPressDirect. Though the changes mean that the 9000+ free members of the site will no longer have the ability to automatically scrape and repost content, it says nothing about the paid members. The limitations on free accounts, including just three blogs per user, effectively meant that no one could actually be a master spammer with a free account (unless they spammed WPD and set up thousands of accounts).</p>
<p>To many, including myself, this sounds like a very shrewd maneuver. Though it takes away most users&#8217; ability to do spam-like things, it does not affect the paid ones, and the email contained several pitches for the paid packages. It seems less like an attempt to shed the spam-related reputation than one to profit from it.</p>
<p>This move does not remove these tools from the power users nor does it impact their bottom line in any meaningful way, other than perhaps adding a few new paid members.</p>
<p>WPD, as a service, is walking a very thin line. It is trying to proclaim itself to not be a spam tool while offering many of the exact same features that are found in spam applications. Though, as I said in my Blog Herald article, it would make a very poor spamming program, it is completely foreseeable and almost certain that users, likely even most users, would use it for that purpose.</p>
<p>Furthermore, issues such as the trademark concerns over the use of the WordPress name, the lack of attribution of copied works, etc. remain unaddressed. Though it is a good step, it seems to be one either too small or in an unrelated direction.</p>
<h4>Conclusions</h4>
<p>Shortly after my Blog Herald article was released, WPD sent out an email to all members saying that it was the &#8220;most balanced article&#8221; they could find.</p>
<p>Though I try to be balanced in all of my coverage, I cannot hide the fact that WordPressDirect has me very uneasy and nervous. The service has far too much potential for evil and, even though I don&#8217;t know if its creators built the service with such intentions, that is the use that instantly springs to mind for myself and many others.</p>
<p>The problem is that a service such as what WPD proclaims to be, a WordPress installation aid that auto-optimizes the blog, could be very useful. I could even see someone such as myself using it rather than keeping a WordPress checklist for every new blog I install (I routinely get recruited to help with WP installations). </p>
<p>But as useful as that could be, the service is too hot to touch right now and I seriously doubt that is going to change with these recent revisions to their policies. Though I am going to keep an eye on the marketing changes they mentioned, I don&#8217;t see WPD becoming any less of a tainted name anytime soon.</p>
<p>To repair its name, WPD is going to have to make sacrifices that may hurt its business. Sadly, that doesn&#8217;t seem to be what they are doing right now. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/12/02/wordpressdirect-addresses-spam-issue/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Attributor Analyzes TrueAudience</title>
		<link>http://www.plagiarismtoday.com/2008/11/19/attributor-analyzes-trueaudience/</link>
		<comments>http://www.plagiarismtoday.com/2008/11/19/attributor-analyzes-trueaudience/#comments</comments>
		<pubDate>Wed, 19 Nov 2008 18:08:15 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Attributor]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[Spam-Blogs]]></category>
		<category><![CDATA[Spamming]]></category>
		<category><![CDATA[Splogs]]></category>
		<category><![CDATA[study]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=2123</guid>
		<description><![CDATA[A recent study by content tracking service Attributor has found that, for many publishers, their audience off their site completely dwarfs the pageviews they can count.]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2008/04/attributor-logo.jpg" alt="" title="attributor-logo" width="206" height="77" class="alignleft size-full wp-image-1000" />A recent study by content tracking company Attributor attempted to determine the true audience of a Web publisher by analyzing both the viewership the site&#8217;s content gets on its own site and what it gets on other sites to which it is copied, usually without a license.</p>
<p>The results were stunning. <a href="http://www.attributor.com/blog/trueaudience/">According to their report</a>, on average the sites that they studied had 140% more views of their content on other sites than the original. This meant that well over half of all views of the content took place on sites other than the creator&#8217;s and were unavailable for either monetization or, in many cases, attribution.</p>
<p>Though the results are interesting, they likely are not at all surprising to many who deal with copyright and plagiarism issues on the Web. With human copying, RSS aggregation (both good and bad) and other republication as common as it is, many had already suspected that the audience for a piece of content was much larger off the site than on it. Attributor is simply one of the first to conduct a study that shows it.</p>
<p>However, there are several elements of the study that are interesting beyond the initial findings and may offer clues as to which Webmasters are most at risk of having their content misused.<span id="more-2123"></span></p>
<h4>How it Was Done</h4>
<p><a href="http://www.attributor.com/docs/TrueAudience.pdf">According to the report</a> (PDF), they analyzed 100 publishers from the <a href="http://lists.compete.com/">Compete Top 1000</a>, first discarding their existing customers and sites with partial feeds, and added some publishers into the list on top of that to ensure a good mix of different topics.</p>
<p>They then ran the sites through the content matching service and analyzed all of the copies that had higher than a 50% match and more than 125 words in common. After removing known licensees, they looked at the sites they had information on and used traffic estimates from Compete to get an approximate idea of how large the viewership was on these matching sites.</p>
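<p>To make the two thresholds concrete, here is a minimal Python sketch of how such a filter might work. This is not Attributor&#8217;s actual matching method, which the report does not detail; the function name <code>is_probable_copy</code>, the word-level comparison using the standard library <code>difflib</code> module and the way the match percentage is computed are all assumptions for illustration.</p>

```python
from difflib import SequenceMatcher


def is_probable_copy(original, candidate, min_ratio=0.5, min_words=125):
    """Flag candidate as a copy when it shares more than min_ratio of the
    original's words and more than min_words words in total."""
    orig_words = original.lower().split()
    cand_words = candidate.lower().split()
    if not orig_words:
        return False
    # Compare word sequences rather than characters; autojunk is disabled
    # so frequent words are not silently ignored on longer texts.
    matcher = SequenceMatcher(None, orig_words, cand_words, autojunk=False)
    shared = sum(block.size for block in matcher.get_matching_blocks())
    ratio = shared / len(orig_words)
    return ratio > min_ratio and shared > min_words


if __name__ == "__main__":
    article = " ".join("word%d" % i for i in range(200))
    scraped = "Sponsored links " + article + " more ads"
    print(is_probable_copy(article, scraped))
    print(is_probable_copy(article, "an unrelated page"))
```

<p>Requiring both conditions keeps short quotes and passing mentions out of the results, which is presumably the reason for pairing a percentage with a word count.</p>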
<p>After that, they then broke down the results by broad category to see which kinds of publishers had the largest &#8220;Audience Multiplier&#8221;, meaning viewership of their content on other sites. </p>
<p>The results were that, on average, the publishers tested had nearly 60% of their content viewership on other sites. This leads to missed opportunities both for linkbuilding and for monetization as well as possible causes for removal requests.</p>
<p>This obviously will be of great interest to many Web publishers, who are looking for ways to maximize their audience in the face of an economic slowdown, but may not come as a surprise to those that have studied these issues.</p>
<p>However, other findings of the study are potentially even more useful to Webmasters, especially those in high-risk fields.</p>
<h4>High-Risk Topics</h4>
<p>The study, in addition to looking at publishers in general, broke down their results by content category and the results there were staggering.</p>
<p>Of the sites listed, automotive sites fared the worst, with nearly seven times the audience on other sites as on their own. Travel sites also had a high multiplier, over five times, and movie reviews had just under five. </p>
<p>In each of the cases above, the sites have audiences on other parts of the Web that easily dwarf their own traffic, meaning they are experiencing the highest level of unlicensed copying and the sites that are copying them have the highest traffic levels.</p>
<p>The topics that fared best were politics and health, both of which had barely over one. However, in both cases, the audience is still larger on the rest of the Web, only in these cases it is by a very small margin.</p>
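<p>For a rough sense of what these multipliers mean, an audience multiplier can be converted into a share of total viewership. The short Python sketch below assumes the multiplier is simply off-site views divided by on-site views, which is one reading of the report rather than its stated formula; the function name <code>off_site_share</code> is hypothetical.</p>

```python
def off_site_share(multiplier):
    """Fraction of all views that happen away from the original site,
    assuming multiplier = off-site views / on-site views."""
    if multiplier < 0:
        raise ValueError("multiplier must not be negative")
    return multiplier / (1.0 + multiplier)


if __name__ == "__main__":
    for label, m in [("automotive", 7.0), ("travel", 5.0), ("barely over one", 1.0)]:
        print("%s: %.1f%% of views off-site" % (label, 100 * off_site_share(m)))
```

<p>Under that assumption, a multiplier of 7 means 87.5% of views happen elsewhere, while a multiplier of about 1.4 works out to roughly 58%, in line with the &#8220;nearly 60%&#8221; average mentioned earlier.</p>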
<p>Why there is such a wide divide between the different topics is very difficult to say. Without the full statistics, which were not available in the report, it could be due to a variety of reasons. Though it seems unlikely that one site would be scraped and republished significantly less than another, especially since nearly all of the topics have a strong spammer following, it could be a sign that copying and pasting has had a higher degree of success in certain categories.</p>
<p>For example, with political sites, most people visit the one or two sites that they trust rather than performing blind searches. However, when automotive problems arise, people tend to put queries into Google and trust the search results. Even though both sites likely have many copies of their content, one is able to rely on their brand loyalty to keep much of their audience close by.</p>
<p>This is only a guess, but it is clear that it is time to think about the sites in danger slightly differently. Technology and health were two categories with very low multipliers, though they both have a very high tendency to attract spammers.</p>
<h4>Some Caveats</h4>
<p>It is worth noting that the study, while useful, should not be considered scientifically valid. The sample size is too small and the traffic statistics, though likely about as good as possible, leave room for error.</p>
<p>It is very difficult to imagine a more thorough study being performed without the backing of a major university, but any study in this area is likely to face similar challenges and limitations. </p>
<p>The other element is that this study focuses on large publishers and not regular bloggers. Whether this means that bloggers would have a much higher audience multiplier due to their smaller initial audience or a smaller one due to less copying and scraping remains to be seen.</p>
<p>Though, likely, the results on the different categories of content and their relative risk may transfer well to smaller publishers, a separate study would likely be needed for smaller bloggers to see how their audience compares.</p>
<p>Still, the purpose of the study is not to necessarily achieve these goals, but to illustrate the possibility of a much larger audience outside of the original site. This is something many have suspected but, to my knowledge, this is the first study to attempt to discuss the issue.</p>
<h4>Conclusions</h4>
<p>The bottom line is simple: most likely, the audience for your content is much larger off your site than it is on it. What you do about that is completely up to you. Whether you attempt to monetize it, turn it into promotion or request removal of it (or a combination of all three) is up to the individual author and the course they want to take.</p>
<p>No matter what though, it is clear that this audience and its potential (both for harm and for good) is too big to ignore and it is important to start tracking and understanding what is going on. Whether it is through a professional service such as Attributor, one targeted at individuals such as <a href="http://www.copyscape.com">Copyscape</a> or even simple Google searches, the time to understand these uses is now.</p>
<p>What the Attributor study illustrates, more than anything, is the need for an even deeper understanding of how this copying takes place, what it means for publishers and what strategies they could use to maximize their benefit from it.</p>
<p>This is an area ripe for exploration moving forward and one that will require a great deal of creativity and work.</p>
<p><em>Disclosure: I work as a consultant for Attributor.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/11/19/attributor-analyzes-trueaudience/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>2008: Looking Ahead</title>
		<link>http://www.plagiarismtoday.com/2008/01/02/2008-looking-ahead/</link>
		<comments>http://www.plagiarismtoday.com/2008/01/02/2008-looking-ahead/#comments</comments>
		<pubDate>Wed, 02 Jan 2008 17:16:43 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[DMCA]]></category>
		<category><![CDATA[Legal Issues]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[MPAA]]></category>
		<category><![CDATA[perez-hilton]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[RIAA]]></category>
		<category><![CDATA[Spam-Blogs]]></category>
		<category><![CDATA[Spamming]]></category>
		<category><![CDATA[Splogs]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2008/01/02/2008-looking-ahead/</guid>
		<description><![CDATA[Considering that my predictions for 2007 fell somewhere between &#8220;mixed&#8221; and &#8220;unmitigated disaster&#8221;, I am almost loath to try again. However, in the spirit of the season, I almost cannot control myself. So, with that in mind, I&#8217;m taking a look at the year ahead with seven new predictions about what is in store...]]></description>
			<content:encoded><![CDATA[<p>Considering that <a href="http://www.plagiarismtoday.com/2008/01/01/2007-a-year-in-content-theft/">my predictions for 2007</a> fell somewhere between &#8220;mixed&#8221; and &#8220;unmitigated disaster&#8221;, I am almost loath to try again. However, in the spirit of the season, I almost cannot control myself.</p>
<p>So, with that in mind, I&#8217;m taking a look at the year ahead with seven new predictions about what is in store for us in 2008 when it comes to content theft and copyright issues. </p>
<p>Hopefully these predictions will strike a good balance between the dead obvious and the completely insane as I try to figure out what is coming up in this clearly unpredictable field.<br />
<span id="more-772"></span><br />
<strong>Spinning Spam Increases</strong></p>
<p>I <a href="http://www.blogherald.com/2007/12/31/2008-the-year-ahead-for-spam-blogs/">made this prediction a few days ago</a> on the Blog Herald but it is worth repeating here.</p>
<p>Spammers are facing increased competition for search engine results from both legitimate sites and other spammers. Couple that with duplicate content penalties, copyright issues and smarter search engines, and innovation is essential for their survival.</p>
<p>One of the forms that innovation will likely take is the form of spinning content, through a combination of synonymizing and translating. This type of content theft is harder to detect and will require new techniques to detect and stop.</p>
<p>Expect more on this here in the coming weeks. </p>
<p><strong>New Technologies Change the Game</strong></p>
<p><a href="http://www.attributor.com"><img SRC="http://img.skitch.com/20080102-xpds4gkf1qjw9uwj3u3jtm9415.png" align="right" hspace="10" vspace="10"/></a>Another repeat from the aforementioned Blog Herald article, but yet another important one. Companies such as <a href="http://www.attributor.com">Attributor</a> are poised to really change the game by providing powerful, easy-to-use tools that not only improve the effectiveness of content theft fighting, but also make it accessible to a broader audience. </p>
<p>However, even if Attributor doesn&#8217;t work out, other companies, some of which are still in stealth mode, will likely be coming onto the scene this year and could have similar results.</p>
<p>In short, this is going to be the year in which companies realize the potential market in helping bloggers protect their content and start to exploit it.</p>
<p><strong>False DMCA Notices</strong></p>
<p>Hopefully I am allowed to repeat a prediction from last year as this one is almost a given. With thousands and thousands of DMCA notices being hurled each day, it is a certainty that some will be off target.</p>
<p>Expect several more DMCA notice controversies over the next year. The perfect storm of rabid copyright holders, remix culture and lawsuit-frightened hosts is just too much to ignore.</p>
<p><strong>RIAA Loses Traction</strong></p>
<p><a href="http://www.riaa.com"><img SRC="http://img.skitch.com/20080102-1c88sccwydwi52ifppmniirpi6.png" align="left" hspace="10" vspace="10"/></a>Though 2007 was a bad year for both the RIAA and the music industry, 2008 will likely be even worse. This may well be the year that the record industry starts to distance itself from its boogeyman and the RIAA starts to lose support even among its members.</p>
<p>Some of this has already happened with <a href="http://mashable.com/2007/11/28/emi-to-cut-riaa-funding-death-of-riaa-near/">EMI planning on slashing funding</a>, but with the record labels turning against DRM, I doubt they will be the only ones.</p>
<p>No matter the wins or losses they receive in the courtrooms, expect the RIAA to leave 2008 even weaker than it went in.</p>
<p><strong>Perez Hilton Loses</strong></p>
<p>I honestly didn&#8217;t expect <a href="http://www.plagiarismtoday.com/2007/02/23/perez-hilton-gets-sued-again/">these lawsuits</a> to reach 2008 but, since they have, it seems likely that they will come to some kind of a conclusion this year. If they do, expect Perez Hilton to suffer a few grave setbacks.</p>
<p>Granted, the case will probably not be over in its entirety in the next 365 days, appeals and so forth can take many years, but the writing should be on the wall for Hilton well before the ball drops on 2008.</p>
<p>Anything else would be the product of either a stunning upset, or an unbearable delay.</p>
<p><strong>Old Media Joins the Fun</strong></p>
<p>Though the record and movie studios have been fighting to protect their content on the Web for years now, old media, namely magazines and newspapers, have not been extremely active in this field.</p>
<p>Expect that to change in the coming year but, unlike RIAA, expect them to better understand how to leverage the Web. I suspect that their delay in entering the fray has taught them a great deal and, looking at the <a href="http://www.downloadsquad.com/2007/09/18/the-new-york-times-pay-for-content-service-bites-the-dust/">New York Times shift away from paid content</a>, it seems likely that newspapers &#8220;get it&#8221; at least a little bit more.</p>
<p>Perhaps this is a case of wishful thinking, but it isn&#8217;t without at least some grounding in reality.</p>
<p><strong>Image Search Grows Up</strong></p>
<p>Image search, especially when dealing with content theft, is an ineffective black art. I am hoping, perhaps against my better judgment, that 2008 will be the year that image search grows up and we can actually detect duplicate images on the Web even if the file has been renamed, has had the EXIF data changed or has been altered.</p>
<p>This is easily my farthest step out on the limb as I have little reason to believe this, but the market here is just too big to ignore and, with more and more photography winding up on the Web and so much content theft/plagiarism in this area, image protection is easily the next big market for protection.</p>
<p>That is, if it isn&#8217;t already the current big market.</p>
<p><strong>Conclusions</strong></p>
<p>As far as copyright issues go, 2007 was a wild year and 2008 is poised to be even more crazy. </p>
<p>However, no matter what happens in 2008, or how wrong these predictions turn out, stay tuned to this site as I will do my best to keep you up to date on what is going on and, hopefully, share some useful information along the way.</p>
<p>Thank you all for making 2007 such a great year for Plagiarism Today and I am looking forward to a very exciting 2008. </p>
<p><em><strong>Disclosure:</strong> I am a consultant for Attributor. </em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/01/02/2008-looking-ahead/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Perhaps I Spoke Too Soon</title>
		<link>http://www.plagiarismtoday.com/2007/07/03/perhaps-i-spoke-too-soon/</link>
		<comments>http://www.plagiarismtoday.com/2007/07/03/perhaps-i-spoke-too-soon/#comments</comments>
		<pubDate>Tue, 03 Jul 2007 22:18:21 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Legal Issues]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Adsense]]></category>
		<category><![CDATA[Blogger]]></category>
		<category><![CDATA[blogspst]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[Spam-Blogs]]></category>
		<category><![CDATA[Spamming]]></category>
		<category><![CDATA[Splogs]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/07/03/perhaps-i-spoke-too-soon/</guid>
		<description><![CDATA[In a pair of previous posts I lauded Google&#8217;s progress in the war on Spam on its Blogspot service. Though my intuition was confirmed, at least somewhat, by Google itself, it appears I might have spoken too soon. I recently ran across this link on the social news site Reddit. It is a submissions listing...]]></description>
			<content:encoded><![CDATA[<p>In a pair of previous posts I lauded Google&#8217;s <a href="http://www.plagiarismtoday.com/2007/06/26/is-blogger-on-the-offensive-against-spam/">progress in the war on Spam on its Blogspot service</a>. <a href="http://www.plagiarismtoday.com/2007/06/28/update-google-responds-regarding-blogspot-spam/">Though my intuition was confirmed</a>, at least somewhat, by Google itself, it appears I might have spoken too soon.</p>
<p>I recently ran across this link on the social news site Reddit. It is a submissions listing for the user &#8220;lecoq&#8221;, who has submitted hundreds of entries from various Blogspot Blogs. All of the entries that I checked <a href="http://weirdoddities.blogspot.com/2007/06/zebra-donkey-zonkey.html">contained content</a> <a href="http://www.firstcoastnews.com/news/strange/news-article.aspx?storyid=36339">found first on other sites</a> and all were surrounded by different ad units, usually from several different ad networks.</p>
<p>The end result is that these blogs appear to be nothing more than the traditional definition of spam. However, the list, which appeared on the front page of Reddit a few days ago, has not resulted in any removals by Google.</p>
<p>It does appear though, at this moment, that this is the work of a human and not of an automated scraper. There seems to only be about six spam blogs involved and they pull from a variety of sources.</p>
<p><span id="more-529"></span>Though this definitely does not follow the typical format for a spam blogger, it is worrisome that, even after all of the attention this list received, almost none of the blogs have been shut down. It is obvious, given the submission pattern, that the intent here is to spam social news sites with copied content, and spam is a violation of <a href="http://www.blogger.com/content.g">Google&#8217;s Blogger Content TOS</a>.</p>
<p>Hopefully, the offensive is ongoing and this is just a hiccup in the system, probably caused by the human element in the posting, and that Google will address it soon. In the meantime, I&#8217;m going to keep an eye on this list and see about contacting some of the content owners to alert them of the misuse.</p>
<p>I encourage others to do the same.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/07/03/perhaps-i-spoke-too-soon/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Using .htaccess to Stop Content Theft</title>
		<link>http://www.plagiarismtoday.com/2007/07/02/using-htaccess-to-stop-content-theft/</link>
		<comments>http://www.plagiarismtoday.com/2007/07/02/using-htaccess-to-stop-content-theft/#comments</comments>
		<pubDate>Mon, 02 Jul 2007 17:22:38 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Prevention]]></category>
		<category><![CDATA[.htaccess]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[hotlinking]]></category>
		<category><![CDATA[images]]></category>
		<category><![CDATA[Photos]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[Spam-Blogs]]></category>
		<category><![CDATA[Spamming]]></category>
		<category><![CDATA[Splogs]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/07/02/using-htaccess-to-stop-content-theft/</guid>
		<description><![CDATA[Having control over your own server can be a very powerful thing. It enables you to control who can access your site, how they visit it and what they can see. Generally, however, that power is best left unused. For the most part, restricting people&#8217;s access to your site is a bad move. Though you...]]></description>
			<content:encoded><![CDATA[<p>Having control over your own server can be a very powerful thing. It enables you to control who can access your site, how they visit it and what they can see.</p>
<p>Generally, however, that power is best left unused. For the most part, restricting people&#8217;s access to your site is a bad move. Though you can use your powers to carve out a members-only area or prevent others from accessing administrative areas of the site, turning people away from the door is usually unwise.</p>
<p>Still, there are some people that you want to keep out. RSS scrapers and image hotlinkers, for example, offer nothing to your site but instead only steal your content, your bandwidth and your other resources.  If you can prevent them from accessing your site in the first place, without impacting other users, it is probably in your best interest to do so.</p>
<p>Fortunately, with Apache&#8217;s .htaccess file, it is possible to do all of those things and more. All one has to do is understand a few basics and get the code that they need.</p>
<p><span id="more-526"></span><strong>A Quick Primer</strong></p>
<p><a href="http://httpd.apache.org/docs/1.3/howto/htaccess.html">According to the Apache Software Foundation</a>, .htaccess is a distributed configuration file that provides &#8220;a way to make configuration changes on a per-directory basis&#8221;. It is most commonly used when a Webmaster has access to the server, but not the core configuration files for that server. This is typical of most shared hosting environments.</p>
<p>When editing an .htaccess file, there are three important things to remember:</p>
<ol>
<li><strong>.htaccess is the name of the file:</strong> In short, htaccess is the extension and there is no file name. This can make editing the file difficult on some computers, but it is important that the convention be followed. If needed, name the file something else and rename it after uploading it to your server.</li>
<li><strong>It is an ASCII file:</strong> .htaccess is a plain text file and should only be edited in a text editor such as Notepad.</li>
<li><strong>It only works with Apache: </strong>Though other servers, <a href="http://support.microsoft.com/kb/324064">such as Microsoft&#8217;s IIS Server</a>, offer similar features, .htaccess itself is only for Apache-based servers. If you are unsure of what kind of server you have, check with your hosting provider.</li>
</ol>
<p>Finally, it is important, when working with .htaccess, to back up well and be careful with your edits. A poorly-constructed .htaccess file can render your site useless.</p>
<p>But despite these warnings, .htaccess files are, generally, very easy to edit and manipulate. Furthermore, there is a lot of very good free code ready for you to copy, paste and manipulate to fit your needs.</p>
<p><strong>Stop Image/File Hotlinking</strong></p>
<p>One of the easiest and most basic tasks that can be performed with .htaccess is stopping image/file hotlinking. This is the process by which other sites link directly to your files, either having them display or download directly from their site. This not only amounts to content theft, and often plagiarism as well, but also bandwidth theft, as your server spends the resources to serve the file every time someone on their site calls for it.</p>
<p><a href="http://www.zann-marketing.com/developer/20050713/stop-image-hotlinking-using-htaccess.html">According to Zann Marketing</a>, the process is very simple. All one has to do is navigate to their images folder and either create a new .htaccess file or add the following code to their existing one:</p>
<blockquote><p>RewriteEngine on<br />
RewriteCond %{HTTP_REFERER} !^$<br />
RewriteCond %{HTTP_REFERER} !^http://yoursite.com.*$ [NC]<br />
RewriteCond %{HTTP_REFERER} !^http://www.yoursite.com.*$ [NC]<br />
RewriteRule .*\.(gif|jpg|png)$ - [F]</p></blockquote>
<p>The first line tells the server to turn the Rewrite engine on, the second line lets requests with a blank referrer through, the third and fourth lines let requests from your own site through and the fifth line instructs the server to refuse the request for the selected file types whenever the referrer is neither blank nor your own site.</p>
<p>You can easily modify this code in several different ways, including:</p>
<ol>
<li><strong>Add New Domains:</strong> You can add new domains and sites to allow hotlinking from. The original example from Zann Marketing includes the IP address for Google Images, for example. You can include other search engines as well.</li>
<li><strong>Add New File Types:</strong> By editing the last line, you can modify your rules to include any kind of file necessary including movie files, documents and anything else you wish to have protected from hotlinking.</li>
<li><strong>Disable Access to Blank Referrers:</strong> By removing the second line, you can prevent access from browsers and tools that return a blank referrer. Though some scrapers and black hat spiders do this, so do many visitors in a bid to protect their privacy.</li>
</ol>
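<p>As a rough sketch of those modifications, the block below allows one extra domain and protects a few additional file types. The domain partner-example.com and the added extensions are placeholders of my own and do not come from the Zann Marketing example, so swap in whatever fits your site:</p>
<blockquote><p>RewriteEngine on<br />
# Let through requests that send no referrer at all<br />
RewriteCond %{HTTP_REFERER} !^$<br />
# Let through your own site, with or without www<br />
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?yoursite\.com [NC]<br />
# Let through a hypothetical partner site that may hotlink<br />
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?partner-example\.com [NC]<br />
# Refuse the listed file types for every other referrer<br />
RewriteRule \.(gif|jpe?g|png|pdf|mp4)$ - [F,NC]</p></blockquote>
<p>Deleting the first RewriteCond line would also turn away visitors with blank referrers, with the trade-off described in the list above.</p>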
<p>Though this method will not stop people from saving your images to their hard drive and uploading them where they please, it can prevent people from stealing both your image and your bandwidth at the same time.</p>
<p>Also, on the original Zann Marketing page, there are examples for blocking just one hotlinker and to redirect hotlinkers to another image, thus pulling the famous &#8220;switcheroo&#8221;.</p>
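<p>I have not reproduced the Zann Marketing code here, but a minimal sketch of the &#8220;switcheroo&#8221; idea might look like the following. It assumes a replacement image named hotlink-notice.png sits in the root of your site; that file name is hypothetical.</p>
<blockquote><p>RewriteEngine on<br />
RewriteCond %{HTTP_REFERER} !^$<br />
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?yoursite\.com [NC]<br />
# Skip the replacement image itself so the redirect cannot loop<br />
RewriteCond %{REQUEST_URI} !/hotlink-notice\.png$<br />
# Send every other hotlinked image request to the replacement<br />
RewriteRule \.(gif|jpg|png)$ /hotlink-notice.png [R,L,NC]</p></blockquote>
<p>The third condition matters because the hotlinking page is still the referrer when the browser follows the redirect.</p>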
<p>Finally, if the process of editing the code seems too daunting, you can also use <a href="http://www.htmlbasix.com/disablehotlinking.shtml">HTML Basix&#8217;s .htaccess code generator</a> to create an .htaccess code set for you to copy and paste into your file.</p>
<p><strong>Blocking RSS Scraping</strong></p>
<p>Blocking RSS scrapers is as easy as blocking hotlinking and, in some cases, even easier. All you need to do is determine the IP address of the RSS scraper. (Note: You can use domains if you wish. However, since not all scraping software is located on the domain itself, IP addresses are more reliable).</p>
<p>The easiest way to determine the IP address of a scraper is to <a href="http://www.plagiarismtoday.com/2007/05/24/copyfeed-plugin-now-available-in-english/">use the Copyfeed plugin</a> and have it place the IP address in the scraped content. This not only eliminates the need to translate domain names into IP addresses, but also works in cases where the scraping software is located on another server or computer.</p>
<p>However, if that fails or is not an option and the scraped site is hosted on its own domain, you can simply use the IP address for the server itself. To determine the IP address for a domain, simply enter it into a site like <a href="http://www.domaintools.com/">Domain Tools</a> and let it get the IP for you. It only takes a few seconds.</p>
<p>If the scraper is using a free service such as Blogspot, you will likely have to look into your server logs and attempt to find traffic on the feed that lines up with when the posts go up on the scraper site. It is a risky task to undertake as you can accidentally block legitimate users and it can be very time-consuming on larger sites, but it is the only option in some cases if you wish to use blocking techniques.</p>
<p>No matter what method you use, once you have the IP address, all that is required, <a href="http://www.javascriptkit.com/howto/htaccess5.shtml">according to JavascriptKit</a>, is the following code in the .htaccess file of your feed&#8217;s directory:</p>
<blockquote><p>order allow,deny<br />
deny from xxx.xxx.xxx.xxx<br />
allow from all</p></blockquote>
<p>Editing the code is easy. All you have to do is replace the Xs with the IP address of the scraper. You can add more lines as new scrapers emerge and you can also use wildcards by leaving off numbers. For example, 123.123.123. would block all IP addresses that start with 123.123.123. This can be useful if a scraper has an IP that changes, but only within a certain range.</p>
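<p>To illustrate, here is a sketch that blocks one single address and one partial address at the same time. Both addresses are placeholders taken from ranges reserved for documentation, not real scrapers:</p>
<blockquote><p>order allow,deny<br />
# One specific scraper address (placeholder)<br />
deny from 192.0.2.15<br />
# Partial address: every IP that begins with 198.51.100<br />
deny from 198.51.100<br />
allow from all</p></blockquote>
<p>Newer Apache releases (2.4 and later) favor the Require directive for this job and keep the older order/deny syntax only through a compatibility module, so check with your host if the block seems to be ignored.</p>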
<p>It is important to note that this code will block ALL access to your site for that IP address. However, there is very little reason to allow a scraper onto your site as, most likely, they are only accessing the feed anyway.</p>
<p>Also, if you want to redirect scrapers to a fake feed, you can use the method <a href="http://www.hung-truong.com/blog/2006/06/22/how-to-stop-rss-scrapers-from-stealing-your-content-plus-revenge/">discussed on Hung Truong</a>, which often generates very humorous results.</p>
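<p>I won&#8217;t repeat that method here and the following is not necessarily how it works, but one possible sketch of the idea uses mod_rewrite to quietly serve a different file to a known scraper. It assumes your feed lives at /feed/, that a decoy file named fake-feed.xml sits in your site root and that the scraper&#8217;s IP is the placeholder shown. All three are assumptions to replace with your own values:</p>
<blockquote><p>RewriteEngine on<br />
# Apply only to the scraper address (placeholder)<br />
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.15$<br />
# Serve the decoy file in place of the real feed<br />
RewriteRule ^feed/?$ /fake-feed.xml [L]</p></blockquote>
<p>On a WordPress site, these lines would likely need to go above the rewrite rules WordPress adds to the same file, or they may never be reached.</p>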
<p>Finally, once again, if you are uneasy with editing the files yourself, <a href="http://www.htmlbasix.com/blockusers.shtml">HTML Basix offers another .htaccess generator</a>, this one to block users. It is also useful to stop RSS scraping.</p>
<p>The bottom line is that, once you obtain the IP address of the scraper, it is trivial to block them using .htaccess. All you need is a little bit of understanding and some freely-available code.</p>
<p><strong>Conclusions</strong></p>
<p>Though .htaccess editing can seem very intimidating to a novice, it is actually very easy to do. With the proper tools and a few fundamentals, anyone can manipulate their .htaccess files and use them to their advantage.</p>
<p>Though these manipulations won&#8217;t do anything to stop human plagiarism, they can stop some of the more common types of plagiarism before they happen, all without impacting legitimate users at all. It makes sense, if possible, to use these methods to your advantage.</p>
<p>However, it is important to note that you are unlikely to find a free host that allows manipulation of .htaccess files. This is, predominantly, the feature of paid hosting companies. Also, it will not work if your images and RSS feeds are on another server, such as Flickr or FeedBurner.</p>
<p>But if you&#8217;ve paid for your hosting, it makes sense to use the tools that come with that kind of an upgrade. One of those is the ability to protect your content at the server level.</p>
<p>It is a great power and one that is sorely underused on the Web.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/07/02/using-htaccess-to-stop-content-theft/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
		</item>
		<item>
		<title>Update: Google Responds Regarding Blogspot Spam</title>
		<link>http://www.plagiarismtoday.com/2007/06/28/update-google-responds-regarding-blogspot-spam/</link>
		<comments>http://www.plagiarismtoday.com/2007/06/28/update-google-responds-regarding-blogspot-spam/#comments</comments>
		<pubDate>Thu, 28 Jun 2007 16:30:53 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[Spam-Blogs]]></category>
		<category><![CDATA[Spamming]]></category>
		<category><![CDATA[Splogs]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/06/28/update-google-responds-regarding-blogspot-spam/</guid>
		<description><![CDATA[Yesterday I received an email from a representative at Google Blogspot. In my inquiry to him, I had asked whether or not Google was on a recent offensive against spam blogs, as I had speculated earlier this week. As with most replies from Google about such matters, it was very vague and short on details....]]></description>
			<content:encoded><![CDATA[<p>Yesterday I received an email from a representative at Google Blogspot. In my inquiry to him, I had asked whether or not Google was on a recent offensive against spam blogs, <a href="http://www.plagiarismtoday.com/2007/06/26/is-blogger-on-the-offensive-against-spam/">as I had speculated earlier this week</a>. </p>
<p>As with most replies from Google about such matters, it was very vague and short on details. However, it did state the following: </p>
<blockquote><p>&#8220;I can&#8217;t really comment on anything further than what you&#8217;ve noticed, but you&#8217;re right: 1) some spammy blogs are showing the Terms of Service warning; and 2) this is a good thing.&#8221;</p></blockquote>
<p>He went on to say that many spam blogs, possibly most, are currently being shut down even before the search engines have a chance to crawl them.</p>
<p>Though I will have to continue to dig for more information, it is nice to confirm that Google is taking strong action and that, perhaps for the first time, it seems to be fairly effective. We will have to see how and if spam bloggers change their game and, if they do, how Google will respond.</p>
<p>The cat and mouse game, most likely, has just begun. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/06/28/update-google-responds-regarding-blogspot-spam/feed/</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
		<item>
		<title>302 Hijacking: An Old Danger Made New Again</title>
		<link>http://www.plagiarismtoday.com/2007/06/14/302-hijacking-an-old-danger-made-new-again/</link>
		<comments>http://www.plagiarismtoday.com/2007/06/14/302-hijacking-an-old-danger-made-new-again/#comments</comments>
		<pubDate>Thu, 14 Jun 2007 15:30:54 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Prevention]]></category>
		<category><![CDATA[302]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[hijacking]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[redirects]]></category>
		<category><![CDATA[seo]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[Spamming]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/06/14/302-hijacking-an-old-danger-made-new-again/</guid>
		<description><![CDATA[Ralph Rocks is a fragrance by the Ralph Lauren company. Now Smell This is a popular blog about perfumes that wrote the top-ranked page for a search on the fragrance. However, looking at the Google results, you&#8217;d never know that. The top search result belongs not to Now Smell This, but a fashion site called...]]></description>
			<content:encoded><![CDATA[<p><a href="http://nowsmellthis.blogharbor.com/blog/_archives/2006/8/25/2263700.html">Ralph Rocks</a> is a fragrance by the Ralph Lauren company. <a href="http://nowsmellthis.blogharbor.com/">Now Smell This</a> is a popular blog about perfumes that wrote the top-ranked page for a search on the fragrance.  </p>
<p>However, looking at the Google results, <a href="http://www.google.com/search?client=safari&#038;rls=en&#038;q=%22Ralph+Rocks%22&#038;ie=UTF-8&#038;oe=UTF-8">you&#8217;d never know that</a>. The top search result belongs not to Now Smell This, but a fashion site called <a href="http://www.stylefeeder.com/">Stylefeeder</a> (nofollowed). </p>
<p>Now Smell This is at the very bottom of the front page, tenth over all.</p>
<p>But if you click the Stylefeeder link, something strange happens: you get taken not to Stylefeeder&#8217;s site but, rather, to Now Smell This. Though the domain in Google clearly reads Stylefeeder.com, you land on Now Smell This&#8217; Ralph Rocks page.</p>
<p>A simple look at the source code of the Stylefeeder page reveals the problem: it&#8217;s not a page at all. It&#8217;s a redirect. What happened is that Now Smell This has fallen victim to an almost ancient form of search engine spam, the 302 referrer hijack. </p>
<p>It&#8217;s an old threat, but as this case proves it is still around and it is a way for a spammer to steal your content and your ranking without ever copying a single word.</p>
<p><span id="more-514"></span><strong>Hijacking 101</strong></p>
<p>The 302 hijack is actually pretty straightforward. As Claus Schmidt explains <a href="http://clsc.net/research/google-302-page-hijack.htm">in his paper on the subject</a>, the <a href="http://www.internetofficer.com/seo/302-redirect/">302 redirect</a> is supposed to be used to temporarily redirect users and search engines to a new site.</p>
<p>To a search engine or a browser, it is a way of saying that the content you seek is no longer here, but that it is, for the moment, at this new link. However, since the redirect is temporary, search engines hang on to the original link as it may change. Google, generally, continues to spider and index the 302 page in case it changes or redirects elsewhere.</p>
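<p>For comparison, this is roughly what the legitimate use looks like in an Apache .htaccess file, next to its permanent cousin, the 301. The paths and target addresses are made-up examples:</p>
<blockquote><p># Temporary move (302): the old address stays on record<br />
Redirect 302 /summer-sale http://www.yoursite.com/sale-2007<br />
# Permanent move (301): the old address should be replaced<br />
Redirect 301 /old-page http://www.yoursite.com/new-page</p></blockquote>
<p>The hijack abuses the first kind: the spammer issues a temporary redirect from their own address to someone else&#8217;s page.</p>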
<p>The problem is that, since no page actually exists for the 302 referral (it is not a page but simply a script), search engines, in some cases at least, index the content from the new site and attribute it to the non-existent 302 page. This appears to be what happened with the &#8220;Ralph Rocks&#8221; case above. </p>
<p>In short, what happens is this:</p>
<ol>
<li>Google stumbles across the hijacker&#8217;s 302 redirect and interprets it as saying &#8220;content over there for now&#8221;. </li>
<li>Since Google sees that it is a temporary redirect, it indexes the redirect page as if it exists and uses the content from the page it points to as the content to index.</li>
<li>The original page, which is now the target of a 302 redirect, gets less weight with Google since it could, theoretically, change at any time and is supposedly just a temporary home.</li>
<li>The redirect page, even though it doesn&#8217;t exist, can get moved up in the results for all of the keywords present in the original site. </li>
</ol>
<p>When it is all said and done, the spammer has tricked the search engine into thinking that the original page is just a temporary site, a stop gap of sorts, and that its non-existent page is the original work. </p>
<p>It&#8217;s a devious trick and it gives spammers a means to scrape content without ever copying a single word. In the eyes of the search engines, they completely replace the original work.</p>
<p><strong>Taking the Plane to Cuba</strong></p>
<p>Once the spammer has control of the site&#8217;s search engine presence, he or she can do what they want with it.</p>
<p>Where traditional hijacking differs from what is going on with Stylefeeder is that, often times, the spammer will attempt to cloak their real intentions, offering the 302 redirect to the search engines, but sending human visitors to another site altogether. </p>
<p>However, the dangers of this redirect go well beyond mere spam. As Schmidt explains, it can also be used to redirect visitors to adult sites, set up false bank/credit card sites or create false storefronts.</p>
<p>It is a very dangerous exploit and it is one that, theoretically, was closed off years ago. However, as cases such as this one and sites such as <a href="http://www.googlejacking.org/">Google Jacking</a> prove, this problem is still very real and very present. </p>
<p>It&#8217;s a scary problem for Webmasters. Unlike scraping, which can be blocked, or plagiarism, which can be detected, 302 referral spam cannot be blocked effectively and cannot be detected until the spammer has already achieved their goal. According to Schmidt, once it has taken place, the damage is done and there is no easy way to claw back out.</p>
<p>That, in turn, makes taking precautions against such attacks very important. Something that is easier said than done.</p>
<p><strong>Precautions</strong></p>
<p>Schmidt goes on to recommend a series of steps that Webmasters can take to guard themselves against this kind of hijacking. They include the following:</p>
<ol>
<li>Redirect non-www pages
</li>
<li>Use absolute internal linking on your site (full links with domain names)
</li>
<li>Have random, updated content on each page
</li>
<li>Use the &#8220;base&#8221; meta tag
</li>
</ol>
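<p>The first item on that list can be handled in .htaccess. This is a common sketch rather than code taken from Schmidt&#8217;s paper, and it assumes your preferred address is www.yoursite.com and that the file sits in your site&#8217;s root directory:</p>
<blockquote><p>RewriteEngine on<br />
# Match requests that arrive without the www prefix<br />
RewriteCond %{HTTP_HOST} ^yoursite\.com$ [NC]<br />
# Permanently redirect them to the same path on the www address<br />
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [R=301,L]</p></blockquote>
<p>Note the 301 status: this tells search engines the move is permanent, which is the opposite of the temporary signal the hijack relies upon.</p>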
<p>Most of these precautions are simple to do, especially if you control your own server, and have no real impact on the end user. Thus, it makes sense, even if the potential reward is very small, to take the steps.</p>
<p><strong>Some Good News</strong></p>
<p>The good news is that, though it is clear 302 redirect spam is still a problem, it is also definitely on its way out. Most of the problem was dealt with by the search engines years ago and, though some are still able to exploit it, those who do so successfully seem to be few and far between.</p>
<p>In the case of Now Smell This, it is unlikely that the site in question would have gotten away with it if they had attempted to cloak their intentions. Google, by all accounts, has gotten a great deal better about detecting cloaking and that, in turn, has made this kind of spam much more difficult to execute.</p>
<p>Clearly, the heyday for this kind of spam is over. As devastating as it can be for content owners, it is less of a concern now than it was a year ago and much less than it was three or four years ago.</p>
<p>This is one area where the search engines have truly gotten smarter, just not smart enough to stamp out the problem completely as of yet.</p>
<p><strong>Conclusions</strong></p>
<p>Page hijacking, at least via 302 redirects, is not the problem it used to be. It&#8217;s harder than ever to get away with and it seems the major search engines have all done a decent job keeping it from overrunning their results. </p>
<p>However, the problem is not gone completely. 302 redirects can, and still do, affect search engine results and allow people to use your content against you, even if they never copy a single word.</p>
<p>Though it is not the concern it once was, it is still worthwhile to take a few simple precautions to prevent it from becoming a much larger issue. After all, only one site has to be successful with it to severely impact your own search engine ranking for a keyword and only a few have to be successful before your entire domain suffers.</p>
<p>But the real onus to stop this problem, as Schmidt points out, lies not with Webmasters but search engines. Though we can take precautions to help search engines tell the difference between a real redirect and a spam one, if the search engines don&#8217;t pick up on that, there is nothing that can be done.</p>
<p>Smarter search engines, ones not easily fooled by simple tricks, will be the real solution. Then again, that&#8217;s the same thing people have been saying about spam blogs for years and that problem has not gone away either. </p>
<p><em>Note: I attempted to contact Stylefeeder for this article and did not hear back in time to go to press. If I do hear from them, I will update this page.</em> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/06/14/302-hijacking-an-old-danger-made-new-again/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>WordPress Plugin: Copyright Feed</title>
		<link>http://www.plagiarismtoday.com/2007/05/04/wordpress-plugin-copyright-feed/</link>
		<comments>http://www.plagiarismtoday.com/2007/05/04/wordpress-plugin-copyright-feed/#comments</comments>
		<pubDate>Fri, 04 May 2007 21:07:59 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Prevention]]></category>
		<category><![CDATA[Products]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[Extension]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[Plugin]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[Spam-Blogging]]></category>
		<category><![CDATA[Spamming]]></category>
		<category><![CDATA[Wordpress]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/05/04/wordpress-plugin-copyright-feed/</guid>
		<description><![CDATA[A new WordPress plugin by Frank Bueltge (in German) entitled Copyfeed is attempting to revolutionize the way WordPress users protect their feed&#8217;s content. The goal is to not simply help bloggers discover if and where their content is being scraped, but also give them information to help them track down the scraper and, when used...]]></description>
			<content:encoded><![CDATA[<p>A new WordPress plugin by <a href="http://bueltge.de">Frank Bueltge</a> (in German) entitled <a href="http://wordpress.org/extend/plugins/copyfeed/">Copyfeed</a> is attempting to revolutionize the way WordPress users protect their feed&#8217;s content.</p>
<p>The goal is to not simply help bloggers discover if and where their content is being scraped, but also give them information to help them track down the scraper and, when used in conjunction with plugins such as <a href="http://redalt.com/Resources/Plugins/AntiLeech">Antileech</a>, stop the infringement outright.</p>
<p>It is a potentially exciting plugin that could replace several other extensions that are currently available, effectively combining the functionality of at least two <a href="http://www.plagiarismtoday.com/2006/10/09/five-essential-wordpress-content-protection-plugins/">indispensable plugins</a> into one, and it could go a long way toward helping WordPress bloggers stop infringement and prevent spam bloggers from taking their content.</p>
<p><span id="more-484"></span><strong>How it Works</strong></p>
<p>Copyfeed is something of a Swiss Army knife when it comes to extending your feed. It has several different functions, many of which are already available in other plugins.</p>
<p>First, the plugin can add a digital fingerprint to the feed and provides easy links to search for copies of your work. In that regard, it works a great deal like <a href="http://www.maxpower.ca/wordpress-plugin-digital-fingerprint-detecting-content-theft/2006/09/25/">MaxPower&#8217;s Digital Fingerprint Plugin</a> and can serve much the same purpose. </p>
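<p>To make the fingerprint idea concrete, here is a minimal Python sketch. It is not Copyfeed&#8217;s code (the plugin itself is written for WordPress), and the secret, function names and markup are all hypothetical. It simply derives a unique token for a site, appends it to each feed item and builds a search link that looks for that token elsewhere on the Web.</p>

```python
import hashlib
import hmac
from urllib.parse import quote_plus

# Hypothetical secret known only to the site owner; not a Copyfeed setting.
SITE_SECRET = b"change-me"


def make_fingerprint(secret: bytes = SITE_SECRET) -> str:
    """Derive a short, hard-to-guess token that is unlikely to appear by chance."""
    digest = hmac.new(secret, b"feed-fingerprint", hashlib.sha256).hexdigest()
    return digest[:16]


def add_fingerprint(item_html: str, fingerprint: str) -> str:
    """Append the token to a feed item so scraped copies carry it along."""
    return f'{item_html}<p class="fingerprint">{fingerprint}</p>'


def search_link(fingerprint: str) -> str:
    """Build a search URL for the exact token (quoted so it matches as a phrase)."""
    return "https://www.google.com/search?q=" + quote_plus(f'"{fingerprint}"')


if __name__ == "__main__":
    token = make_fingerprint()
    print(add_fingerprint("<p>Post body</p>", token))
    print(search_link(token))
```

<p>Any page that turns up in the search, other than your own, is a likely copy of the feed.</p>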
<p>Second, the plugin adds a customized feed copyright notice, similar to <a href="http://blog.taragana.com/index.php/archive/wordpress-plugin-to-automatically-add-copyright-message-to-your-rss-atom-feeds/">Angsuman’s Feed Copyrighter Plugin</a>. This plugin, however, takes it a step further, making it easier to add your custom copyright information and even making it possible to add HTML to the footer.</p>
<p>However, where the plugin goes above and beyond the currently available tools is by adding the ability to include the IP address of the feed reader in the feed itself. This means that, if the feed is scraped, the scraper will also be publishing the IP address of the computer they used to scrape the feed, making it very easy to either block the address, or input it into Antileech for redirection.</p>
<p>Finally, Copyfeed also offers the ability to whitelist certain domains from seeing the above information. This means that it would be possible to protect your regular feed, but allow <a href="http://www.blogburst.com/">BlogBurst</a> or another desired syndication service to bypass that content without having to point them to a separate feed.</p>
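<p>The IP stamping and whitelisting described above can be sketched roughly as follows. This is an illustration in Python, not the plugin&#8217;s actual implementation. The function name, the notice wording and the whitelist entry are assumptions, and the whitelist here holds IP addresses rather than domains to keep the example short.</p>

```python
from html import escape

# Hypothetical whitelist of trusted syndication services, keyed by IP address.
# 192.0.2.10 is a documentation-only address used here as a stand-in.
WHITELIST = {"192.0.2.10"}


def protect_item(item_html: str, requester_ip: str, whitelist=WHITELIST) -> str:
    """Return the feed item, stamped with the requester's IP unless whitelisted."""
    if requester_ip in whitelist:
        # Trusted services receive the clean item from the same feed URL.
        return item_html
    # Everyone else gets a notice naming the address that fetched the feed,
    # so a scraper republishing the item also republishes its own IP.
    notice = f"<p>This feed was requested by {escape(requester_ip)}.</p>"
    return item_html + notice


if __name__ == "__main__":
    print(protect_item("<p>Post body</p>", "203.0.113.7"))
    print(protect_item("<p>Post body</p>", "192.0.2.10"))
```

<p>Because the stamp is generated for each request, the address that appears on a scraped copy is the one that fetched the feed, which is the address you would then block or redirect.</p>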
<p>It&#8217;s a very powerful set of features that, when combined with Antileech, can provide almost total protection of the feed from scrapers and spammers. All of this also says nothing about the other attributes that Copyfeed can add, including comments and related posts, that have nothing to do with content theft prevention. </p>
<p><strong>Some Problems</strong></p>
<p>However, this isn&#8217;t to say that the plugin is perfect. There are several issues with it that may stop it, in some cases at least, from being ready for full use. </p>
<p>First off, both the plugin and the site are currently only available in German. Though you can use tools to <a href="http://translate.google.com/translate?u=http%3A%2F%2Fbueltge.de%2Fwp-feed-plugin%2F204%2F&#038;langpair=de%7Cen&#038;hl=en&#038;ie=UTF8">translate the site</a>, the plugin administration panel rests in the admin area of the WordPress install and can&#8217;t be easily viewed by such tools.</p>
<p>Fortunately, Bueltge has promised to include a translated version of the plugin with his next release, along with &#8220;more features&#8221; but it has not been released as of this writing.</p>
<p>Second, it is uncertain how legitimate readers will feel about having their IP address printed on their feed&#8217;s entries. Some, especially those who are unaware of how the Internet works and what an IP address really is, might cite privacy concerns. Others, especially those who use Web-based readers, might be confused as the IP address would not be theirs, but that of the site they read the feeds on.</p>
<p>It would be up to the blogger publishing the feed to explain these issues to readers if they choose to use that feature. </p>
<p>Finally, aspects of the plugin, obviously, will not work well with <a href="http://www.feedburner.com">FeedBurner</a>. Though the copyright notice and the digital fingerprint will both work correctly, the IP feature will not. The plugin would just report the IP address of FeedBurner and not the person reusing the feed, making it useless in that case.</p>
<p>However, there are many people who will still likely find the plugin very useful and many more will do so once the plugin has been translated into other languages.</p>
<p><strong>Conclusions</strong></p>
<p>All in all, this plugin is an impressive effort to stop feed scraping and will almost certainly become an essential tool once translations of it are offered. In fact, with just a few extra features, it could become something of a one-stop shop for battling feed scrapers in WordPress, becoming almost the only plugin needed.</p>
<p>However, until the translated versions are released, all most of us can do is sit and wait in anticipation for the tool to become available.</p>
<p>It seems almost certain that this plugin, for many WordPress users, will be a nice step forward in protecting their feed. Hopefully, once I am able to use it, it will live up to those expectations.</p>
<p>In the meantime, I&#8217;d be very interested in hearing comments from those who are using Copyfeed on their sites. I&#8217;d love to hear how it is working out for you.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/05/04/wordpress-plugin-copyright-feed/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
	</channel>
</rss>

