<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>search spam | Plagiarism Today</title>
	<atom:link href="http://www.plagiarismtoday.com/tag/search-spam/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.plagiarismtoday.com</link>
	<description>Content Theft, Plagiarism, Copyright Infringement</description>
	<lastBuildDate>Mon, 13 Feb 2012 06:51:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Spotting Spam Blogs</title>
		<link>http://www.plagiarismtoday.com/2008/07/15/spotting-spam-blogs/</link>
		<comments>http://www.plagiarismtoday.com/2008/07/15/spotting-spam-blogs/#comments</comments>
		<pubDate>Tue, 15 Jul 2008 15:57:02 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Personal Experiences]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[search spam]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[Spam-Blogs]]></category>
		<category><![CDATA[spinning]]></category>
		<category><![CDATA[Splogs]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=1302</guid>
		<description><![CDATA[Spammers are making it harder and harder to separate their creations from those of amateur bloggers. However, by understanding various ways to spot spam blogs and how spammers try to beat those methods, you can better detect junk sites yourself. ]]></description>
			<content:encoded><![CDATA[<p><img class="picleft size-full wp-image-1305" title="splogspot_logo" align="left" src="http://www.plagiarismtoday.com/wp-content/uploads/2008/07/splogspot_logo.gif" alt="SplogSpot.com Logo" width="258" height="133" /></p>
<p>When people find out that their content is being copied without permission, how they seek to handle it is often determined, in part, by whether or not the site is a spam blog.</p>
<p>Where many might be willing to forgive copying by a novice blogger, especially with the promise of a link back, most are not prepared to have their content used so a spammer can trick the search engines and sell questionable items.</p>
<p>This means that, very often, I am forced to make snap judgments about whether a site is a spam blog or not, something that is becoming increasingly difficult as spammers have improved their techniques.</p>
<p>So how does one tell if a blog is a spam blog? The answer is not as simple as it once was but there are still ways one can detect a spammy site.</p>
<p><span id="more-1302"></span></p>
<h4>The Spammer Dilemma</h4>
<p>Spammers, over the years, have gotten better and better at making their blogs look human-edited. Though they still cannot make their sites appear to be &#8220;good&#8221; blogs, they can, in many cases, pass as the efforts of novice bloggers or of non-native English speakers.</p>
<p>This can create quite a problem when approaching a suspected spam blog. Is it a spammer using the default Blogspot template or is it someone new to blogging that doesn&#8217;t know how to change the template? Is the strange word choice the result of <a title="Spinning Spammers" href="http://www.plagiarismtoday.com/2007/11/08/modified-scraping-on-the-rise/">automated spinning</a> or someone learning English? If the spam blog did its job, it can be difficult to say.</p>
<p>However, most would agree that being heavy-handed with humans who copy, especially those who make some attempt to provide attribution, is counter-productive. Especially when you consider that the person struggling with English may either grow into an important blogger or, worse yet, already be a major figure in their part of the world, it becomes clear why telling humans from machines is important.</p>
<p>But how to do it? There are several different ways, but unfortunately none of them seems to work 100% of the time. So it is important to take all of the methods below into account, look at how spammers beat them, and develop an informed opinion.</p>
<h4>PageRank Check</h4>
<p><img style=' float: right; padding: 4px; margin: 0 0 2px 7px;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2008/07/pagerank.png" alt="" title="pagerank" width="118" height="97" align="right" class="picright alignright size-full wp-image-1310" />One of my sneakier tricks was to check the site&#8217;s PageRank and see if Google had given it either an n/a or a 0. Either would indicate that the site was very new or had been deemed spam by Google. Either way, it certainly warranted suspicion.</p>
<p><strong>How Spammers Beat It:</strong> Tricking Google. This method has become less effective as Google seems to be assigning PageRank to more and more obvious spam blogs. That is a subject for another article.</p>
<p><strong>Turning the Tide:</strong> PageRank is still a decent indicator of spamminess, but it is no longer as reliable as it was. It is best to ignore PageRank if you have other reasons to be suspicious of a blog.</p>
<h4>&#8220;About&#8221; Page</h4>
<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2008/07/aboutpage.png" alt="" title="aboutpage" width="144" height="163" align="left" class="picleft alignleft size-full wp-image-1312" />Since spammers that use WordPress installs typically spend as little time as possible setting up their blogs, they routinely leave the &#8220;About&#8221; page, which is created as part of the install, with its default text. Very few human-generated sites have this problem.</p>
<p><strong>How Spammers Beat It:</strong> Spammers have started either deleting or filling in the about page. However, those that fill in the page often use it as an opportunity to keyword stuff, often further tipping their hand as a spam blog.</p>
<p><strong>Turning the Tide:</strong> If an about page does not have actual information about the site or the owner, it is very likely spam. Some spammers are starting to include fake information, but few seem to be able to resist the opportunity to keyword stuff and link.</p>
<h4>Posting Rate</h4>
<p>The goal of a spam blog is to get as much junk content into it as possible. As such, spammers routinely have extremely high posting frequencies, often well over 100 posts per day. It would be physically impossible for a human to post so much content without the aid of a machine, making this a dead giveaway that the site is spam.</p>
<p><strong>How Spammers Beat It:</strong> Some spammers have begun to show restraint, only having their blogs update a few times per day and at irregular intervals, to more closely mimic a human blogger.</p>
<p><strong>Turning the Tide:</strong> The content is more telling than the frequency, unless the posting frequency is outrageous. Consider an extremely high posting volume to be a dead spam giveaway but don&#8217;t write off a site because it has a reasonable rate.</p>
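<p>For anyone who wants to automate this check, here is a minimal Python sketch. It assumes you already have the timestamps of a blog&#8217;s posts (pulled from its feed, for example). The function names and the 100-post cutoff are illustrative choices of mine, taken from the rough figure above, and not part of any existing tool.</p>

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed cutoff, based on the "well over 100 posts per day" figure above.
OUTRAGEOUS_POSTS_PER_DAY = 100

def busiest_day_count(timestamps):
    """Return the highest number of posts published on any single calendar day."""
    per_day = Counter(ts.date() for ts in timestamps)
    return max(per_day.values(), default=0)

def posting_rate_signal(timestamps):
    """True only when the rate is outrageous; a modest rate proves nothing."""
    return busiest_day_count(timestamps) > OUTRAGEOUS_POSTS_PER_DAY

# Example: 150 posts, five minutes apart, all on the same day.
start = datetime(2008, 7, 15)
burst = [start + timedelta(minutes=5 * i) for i in range(150)]
print(posting_rate_signal(burst))
```

<p>Note that the function only ever raises a flag. A reasonable posting rate returns False, which should be read as &#8220;no information&#8221; and not as a clean bill of health.</p>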
<h4>Formulaic Posting</h4>
<p>We&#8217;ve all seen the spam blogs whose posts start out with something like &#8220;I saw an interesting post today about&#8230;&#8221; and then proceed to inject a few keywords and quote from the scraped article. By themselves, these posts may appear semi-legitimate, especially with trackbacks, but they are clearly spam when you look at them as a group.</p>
<p><strong>How Spammers Beat It:</strong> Spammers have started to use multiple post templates in the same blog. However, the limited set means that, if this method is chosen, it is still easily detected over the course of about ten posts.</p>
<p><strong>Turning the Tide:</strong> Check and see if the posts have the same pattern, are roughly the same length or all contain quoted material. These are all signs of a spam blog.</p>
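<p>One rough way to test for a shared pattern is to compare how posts open. The Python sketch below is only an illustration under my own assumptions: it takes the plain text of a handful of posts and reports what fraction of them begin with the same few words. The function name and the five-word window are hypothetical choices.</p>

```python
from collections import Counter

def shared_opening_ratio(posts, words=5):
    """Fraction of posts whose first `words` words match the most common opening."""
    openings = [" ".join(p.lower().split()[:words]) for p in posts if p.strip()]
    if not openings:
        return 0.0
    most_common_count = Counter(openings).most_common(1)[0][1]
    return most_common_count / len(openings)

sample = [
    "I saw an interesting post today about widgets and more",
    "I saw an interesting post today about gadgets and more",
    "My cat did a funny thing this morning",
]
print(shared_opening_ratio(sample))
```

<p>A ratio near 1.0 across ten or so posts suggests a template at work. A spammer rotating several templates will score lower, so a low ratio does not clear a site on its own.</p>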
<h4>Ugly Templates</h4>
<p><img style=' float: right; padding: 4px; margin: 0 0 2px 7px;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2008/07/uglylayout.png" alt="" title="uglylayout" width="162" height="123" align="right" class="picright alignright size-full wp-image-1316" />Sometimes the first sign a blog is spam is the template that it is in. If the template is the default WordPress theme or a stock BlogSpot theme without modifications, it&#8217;s a likely tip off of spam content.</p>
<p><strong>How Spammers Beat It:</strong> Spammers have been getting better about mixing up their themes. Most spam software applications come with a variety of themes that are rotated and, given the ease with which most blogs can be skinned, spam blogs can be amazingly varied.</p>
<p><strong>Turning the Tide:</strong> Fortunately, spammer themes still don&#8217;t have any elements of hand-crafting. They very rarely include custom images (or contain only very crude ones), the CSS often looks off, the color scheme is often jarring and the elements often do not fit together correctly. If you see a glaring mistake that would be caught by anyone looking at the site, it is likely spam.</p>
<h4>Domain Names</h4>
<p>Spam blogs are typically restricted to three types of domains: 1) .us, .info and other strange extensions; 2) domains stuffed with keywords (and often hyphens); and 3) free blog hosts (still primarily Blogspot).</p>
<p><strong>How Spammers Beat It:</strong> Spammers are participating in the domain aftermarket, snatching up expired domains that have had sites on them previously. This helps them both carry over the PageRank of the old site, in some cases, and obtain a more &#8220;honest&#8221; name. Spammers are also spreading to other free blog services, including little-known ones, as well as social networking sites.</p>
<p><strong>Turning the Tide:</strong> If you are unsure about a domain, use <a title="Domain Tools" href="http://www.domaintools.com">Domain Tools</a> to investigate it. Look specifically for false whois information or other irregularities. Still, most spam blogs are hosted on spam domains. Better ones are too expensive for spammers to buy in bulk and are more profitable at auction than as spam tools.</p>
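<p>The three domain types above can be turned into a quick first-pass check. This Python sketch is a rough illustration only: the extension list, the free-host list and the hyphen cutoff are assumptions of mine and nowhere near a complete blacklist, and it does nothing about whois irregularities.</p>

```python
from urllib.parse import urlparse

# Illustrative lists only (assumptions, not an authoritative blacklist).
ODD_EXTENSIONS = (".us", ".info")
FREE_BLOG_HOSTS = ("blogspot.com",)
MAX_HYPHENS = 1  # assumed cutoff for a "keyword-stuffed" name

def domain_warnings(url):
    """Return the list of domain-based warning signs found for a URL."""
    host = (urlparse(url).hostname or "").lower()
    warnings = []
    if host.endswith(ODD_EXTENSIONS):
        warnings.append("odd extension")
    if host.count("-") > MAX_HYPHENS:
        warnings.append("hyphen-stuffed name")
    if any(host == h or host.endswith("." + h) for h in FREE_BLOG_HOSTS):
        warnings.append("free blog host")
    return warnings

print(domain_warnings("http://cheap-pills-online-now.info/"))
print(domain_warnings("http://someblog.blogspot.com/2008/07/post.html"))
```

<p>An expired domain bought on the aftermarket will pass all three checks, which is exactly why the whois history is still worth a manual look.</p>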
<h4>Ad Excess/Spam Blogroll</h4>
<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2008/07/longblogroll.png" alt="" title="longblogroll" width="194" height="189" align="left" class="picleft alignleft size-full wp-image-1314" />Many spam blogs earn their money by framing the content in a slew of ads, generally from one of the public advertising networks. If not, then they often times use the blogroll to put out obviously spammy links in hopes of building PageRank and search engine position for those domains.</p>
<p><strong>How Spammers Beat It:</strong> The formula is simple: fewer ads, fewer links, more spam blogs. Spammers have begun to show restraint with both their ads and their outbound links but are creating larger and larger spam farms to compensate. Spammers are also turning to alternate sources of revenue, such as Amazon affiliate IDs, to better hide their activities. Others will mix &#8220;good&#8221; links with &#8220;spam&#8221; ones in their blogroll to further hide the nature of the site.</p>
<p><strong>Turning the Tide:</strong> One spam link is too many. Hover over the URLs in the Blogroll and check for any that are suspicious or out of place. When checking for ads, look not so much at quantity as for the appearance that they were simply &#8220;stuck in&#8221;. Spammers usually don&#8217;t have time to integrate ads with their site.</p>
<h4>Conclusions</h4>
<p>When looking through these elements, any one of these would make me suspicious of a site&#8217;s origin, save perhaps if the site were hosted on a free blog host. Two, in turn, would make it a likely spam blog and three or above would make it a virtual lock.</p>
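<p>That rule of thumb is simple enough to write down as code. The Python sketch below is just one way to express it; the signal names and verdict labels are made up for illustration, and the only special case is the one mentioned above, where a free blog host by itself is not enough to raise suspicion.</p>

```python
def verdict(signals):
    """Map the set of warning signs a site shows to a rough verdict."""
    count = len(signals)
    if count >= 3:
        return "virtual lock"
    if count == 2:
        return "likely spam"
    # A lone free-blog-host signal is excused; any other single signal is not.
    if count == 1 and signals != {"free blog host"}:
        return "suspicious"
    return "no call"

print(verdict({"default about page", "free blog host"}))
```

<p>The counts are a guide for human judgment and not a score to be trusted blindly, since each signal can be beaten in the ways described earlier.</p>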
<p>The bottom line is that, while spammers are not making it any easier to spot their handiwork, it can still be detected by a careful eye (or a not-so-careful eye in many cases).</p>
<p>Though the spammer&#8217;s survival depends on staying under the radar and fooling humans and search engines alike, the nature of creating tens of thousands of junk blogs means that sacrifices have to be made and the results will have limitations.</p>
<p>By exploiting those weaknesses, we can continue to detect and stop spam and separate the spammers from those who are just getting started.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/07/15/spotting-spam-blogs/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Workfriendly Goes Offline</title>
		<link>http://www.plagiarismtoday.com/2008/07/10/workfriendly-goes-offline/</link>
		<comments>http://www.plagiarismtoday.com/2008/07/10/workfriendly-goes-offline/#comments</comments>
		<pubDate>Thu, 10 Jul 2008 14:36:24 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[duplicate-content]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[plagiarim]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[search spam]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[Spam-Blogs]]></category>
		<category><![CDATA[workfriendly]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=1295</guid>
		<description><![CDATA[Safe-surfing site and "accidental scraper" Workfriendly is now offline after more than two years of pushing duplicate content into Google. ]]></description>
			<content:encoded><![CDATA[<p><IMG SRC="http://www.plagiarismtoday.com/images/workfriendlylogo1-20080710-093524.png" alt="Workfriendly Logo" align="left" class="picleft">Workfriendly, a site previously reported on Plagiarism Today <a href="http://www.plagiarismtoday.com/2007/11/09/workfriendly/" title="Workfriendly an Accidental Scraper">back in November 2007</a> and again in <a href="http://www.plagiarismtoday.com/2008/04/08/workfriendly-yet-another-issue/" title="Another Workfriendly Issue">April of this year</a>, stopped functioning sometime within the past few days, bringing an end to the problems it created for many Webmasters.</p>
<p>The site currently is just a &#8220;parked&#8221; domain page running ads for the domain&#8217;s registrar, GoDaddy. According to the <a href="http://whois.domaintools.com/workfriendly.net" title="Workfriendly Whois">whois information for the site</a>, the domain was &#8220;updated&#8221; on the eighth, indicating that it possibly expired and was transferred to another owner. </p>
<p>Workfriendly attempted to disguise Web surfing as a Microsoft Word document by formatting Web pages to appear as text in a Word file while bordering the site content with a fake border designed to look like the application. This was supposed to make it &#8220;safer&#8221; to surf at work as it would raise less suspicion should anyone see your monitor.</p>
<p>The site created problems, however, when it allowed search engines to index its modified pages, injecting many thousands of pages worth of duplicate content into Google. It also created headaches by not obeying certain meta tags, causing links to break on some sites and Google to report those errors as broken links on the original domain.</p>
<p>It is unclear at this time if the outage is temporary or permanent. However, the site has been down for at least two days, making a temporary outage increasingly unlikely. </p>
<p><strong>Hat tip:</strong> Special thanks to <a href="http://www.sciencebase.com/">David Bradley of Sciencebase</a> (stupid typos, thanks for the catch!) for letting me know that Workfriendly is not working.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/07/10/workfriendly-goes-offline/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>The Popularity of Plagiarism</title>
		<link>http://www.plagiarismtoday.com/2008/07/02/the-popularity-of-plagiarism/</link>
		<comments>http://www.plagiarismtoday.com/2008/07/02/the-popularity-of-plagiarism/#comments</comments>
		<pubDate>Wed, 02 Jul 2008 15:44:15 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Personal Experiences]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[google trends]]></category>
		<category><![CDATA[MPAA]]></category>
		<category><![CDATA[plagiarim]]></category>
		<category><![CDATA[RIAA]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[search spam]]></category>
		<category><![CDATA[Search-Engines]]></category>
		<category><![CDATA[Splogging]]></category>
		<category><![CDATA[Splogs]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=1290</guid>
		<description><![CDATA[Inspired by recent posts, I decided to take a look at Google Trends and see how search terms relative to content theft were doing. ]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2008/07/skitched-20080702-110241.png" alt="Google Trends Logo" align="left" class="picleft"/>A pair of recent articles, <a href="http://www.louisgray.com/live/2008/06/on-web-if-youre-not-growing-youre-dying.html" title="If You're Not Growing You're Dying">one by Louis Gray</a> and <a href="http://codingexperiments.com/archives/149" title="">another by possible248</a> (who co-authors the blog along with, among others, Voyagerfan5761; both are regulars here) showcased public interest in relevant search terms, namely company names and Linux distributions respectively, using <a href="http://trends.google.com/trends?hl=en" title="Google Trends">Google Trends</a>.</p>
<p>This, in turn, inspired me to do my own keyword analysis to gauge if and how public interest in topics relevant to this site has changed over the years. </p>
<p>What I found was surprising and seemed to run counter to what I was seeing with my own traffic but was interesting nonetheless.<br />
<span id="more-1290"></span></p>
<h4>Plagiarism</h4>
<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2008/07/skitched-20080702-105214.png" alt="Google Trends for Plagiarism"></p>
<p>Perhaps the most obvious keyword and definitely the most common one that leads visitors to this site, this keyword has <a href="http://trends.google.com/trends?q=plagiarism&#038;ctab=0&#038;hl=en&#038;geo=all&#038;date=all&#038;sort=0" title="Google Trends Plagiarism">seen surprisingly little change over the past few years</a>. </p>
<p>Overall, the graph for it is flat with a few &#8220;ticks&#8221; upward when news stories, such as the Obama controversy and the Kaavya Viswanathan scandal, broke. There are also seasonal downward ticks at the end of every year, likely due to the holidays.</p>
<p>In general, it appears that the overall interest in plagiarism, both academically and artistically, has remained consistent and unchanged.</p>
<h4>Content Theft</h4>
<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2008/07/content-theft-google-trends-20080702-103956.png" alt="Google Trends for Content Theft"></p>
<p>Probably the most unusual graph, <a href="http://trends.google.com/trends?q=content+theft&#038;ctab=0&#038;hl=en&#038;geo=all&#038;date=all&#038;sort=0" title="Content Theft on Google Trends">content theft as a search term</a> spiked in mid-2005, around the time this site was founded, and then leveled off, only to become a regular search term again in recent months.</p>
<p>It is unclear to me what has caused these specific spikes but the latest one seems to be holding and showing some sustainable interest in the topic, something that could indicate greater public interest in the issue and in the term itself.</p>
<h4>Copyright</h4>
<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2008/07/skitched-20080702-105332.png" alt="Google Trends for Copyright"></p>
<p>Copyright, on the other hand, <a href="http://trends.google.com/trends?q=Copyright&#038;ctab=0&#038;hl=en&#038;geo=all&#038;date=all&#038;sort=0" title="Google Trends Copyright">has seen a marked decrease over the past few years</a>, at least as a search term.</p>
<p>While this seems counter-intuitive, considering that stories about copyright, especially as it pertains to the RIAA/MPAA, seem to dominate social news sites, people are clearly not searching for copyright information as much as they used to.</p>
<p>This is reflected even more strongly in the <a href="http://trends.google.com/trends?q=RIAA&#038;ctab=0&#038;hl=en&#038;geo=all&#038;date=all&#038;sort=0" title="Google Trends RIAA">related graph for the RIAA</a> and <a href="http://trends.google.com/trends?q=DMCA&#038;ctab=0&#038;hl=en&#038;geo=all&#038;date=all&#038;sort=0">the DMCA</a>, where the downward slope is even more pronounced and, in the case of the RIAA, search interest seems to almost disappear completely.</p>
<p>Though it doesn&#8217;t appear that people have lost interest in copyright issues, it is clear that they are not searching for them as much as they once were.</p>
<h4>Duplicate Content</h4>
<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2008/07/skitched-20080702-105447.png" alt="Google Trends for Duplicate Content"></p>
<p>One of the greater concerns people have about plagiarism is the issue of duplicate content. As we can see on the graph above, the term <a href="http://trends.google.com/trends?q=duplicate+content&#038;ctab=0&#038;hl=en&#038;geo=all&#038;date=all" title="Google Trends Duplicate Content">rocketed onto the chart in early 2007</a>, stabilized and seems to be slowly marching upward. </p>
<p>Duplicate content, of course, covers not just plagiarism and scraping but a wide variety of SEO concerns. However, it is clear that this is a topic being talked about more and more, even if it is unclear in what capacity the term is being searched for. </p>
<h4>Plagiarism Detection Tools</h4>
<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2008/07/skitched-20080702-100727.png" alt="Google Trends for Copyscape"></p>
<p>The chart for <a href="http://www.copyscape.com">Copyscape</a> (shown above) shows a steady increase in the number of searches over the past year and a half. This seems to mesh with my own experience, which has shown a great increase in content protection over the past 18 months. </p>
<p>Other plagiarism detection tools, such as <a href="http://www.bitscan.com">Bitscan</a> and <a href="http://www.attributor.com">Attributor</a>, did not have enough information for Google Trends to draw any conclusions. Academic plagiarism detection tools, such as Turnitin, <a href="http://trends.google.com/trends?q=Turnitin&#038;ctab=0&#038;hl=en&#038;geo=all&#038;date=all&#038;sort=0" title="Turnitin on Google Trends">have shown a steady increase with seasonal dips as school lets out</a>. </p>
<h4>Long Tail Keywords</h4>
<p>Unfortunately, a lot of the keywords most specific to this site such as &#8220;spam blogs&#8221;, &#8220;splogs&#8221;, &#8220;RSS scraping&#8221;, etc. did not have enough data to produce results. Many of these terms are fairly new, created since I started Plagiarism Today, and are not widely used. </p>
<p>It will be interesting to see in a year or two if these keywords start to register then.</p>
<h4>Caveats</h4>
<p>In doing this &#8220;study&#8221; I realize that Google Trends is both a limited and a largely invalid source of data. Not only is the data proprietary, meaning it cannot be vetted, but the information is relative and contains little hard data. </p>
<p>Also, many of the keywords looked at are not keywords that are searched for by typical searchers and instead would only be searched for by bloggers. Others, however, were likely searched by both. This means that we may not have an accurate picture of how content creators alone feel about these issues.</p>
<p>The goal of this check was just to get a quick idea of what was going on and what the potential attitudes were.</p>
<h4>Conclusions</h4>
<p>When I personally look at these charts, I draw three conclusions.</p>
<p>First, I see that there is a sharp decrease in the interest of searchers in the legal aspects of copyright. This could be due to greater understanding about copyright, and thus less need to search about it, or just that users have moved on from the early copyright controversies of the late nineties.</p>
<p>Second, there is a clear, if slow, increase in interest in tracking one&#8217;s own content and the non-legal penalties that come from infringing or being infringed. This could be a sign that creators are not thinking about these issues in the light of a legal paradigm, but rather, in a more practical framework.</p>
<p>Finally, it is clear that the interest in plagiarism, both academically and artistically, remains fairly steady and that it remains an issue of interest even after the scandals fade from the headlines.</p>
<p>Personally, this site has seen explosive growth over the past year, both doubling in traffic and enabling me to leave my day job to work full-time as a consultant. Clearly, things are changing in this area. </p>
<p>I look forward to following these changes closely over the coming years.</p>
<p><strong>Note:</strong> All of the graphs in this post are <a href="http://www.google.com/intl/en/trends/about.html#18" title="Google Trends Terms of Use">used with permission from Google</a>. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/07/02/the-popularity-of-plagiarism/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Finding the Age of a Page</title>
		<link>http://www.plagiarismtoday.com/2008/06/06/finding-the-age-of-a-page/</link>
		<comments>http://www.plagiarismtoday.com/2008/06/06/finding-the-age-of-a-page/#comments</comments>
		<pubDate>Fri, 06 Jun 2008 15:52:16 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Products]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[google blog search]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[search spam]]></category>
		<category><![CDATA[Search-Engines]]></category>
		<category><![CDATA[seo]]></category>
		<category><![CDATA[Spam-Blogs]]></category>
		<category><![CDATA[Splogs]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=1254</guid>
		<description><![CDATA[If you need a quick and easy way to get an idea of when a post went live, there is a Firefox plugin that uses Google to put that information just a click away.]]></description>
			<content:encoded><![CDATA[<p><IMG SRC="http://www.plagiarismtoday.com/images/linkdiagnosis-logo-20080606-104242.png" alt="Link Diagnosis Logo" align="left" class="picleft">One of the more difficult challenges on the Web is determining when a page was created. We simply cannot trust the date and time stamps provided with the content we read, as good guys and bad guys alike <a href="http://www.plagiarismtoday.com/2008/05/27/spam-bloggers-who-backdate/" title="Spam Bloggers who Backdate">change the date of their posts as necessary</a>.</p>
<p>Search engines, however, can provide a much better set of statistics than a site&#8217;s own timestamps. The only issue is that gleaning the needed information can be difficult. Fortunately, a relatively new Firefox plugin entitled <a href="http://www.linkdiagnosis.com" title="Link Diagnosis">Link Diagnosis</a> helps with that by taking the dirty work out of determining when a page was indexed by Google.</p>
<p>The tool, while not perfect, can be a valuable asset when trying to determine approximately when a page appeared on the Web.<br />
<span id="more-1254"></span></p>
<h4>How it Works</h4>
<p><IMG SRC="http://www.plagiarismtoday.com/images/get-page-age-20080606-104402.png" alt="Get Page Age Screenshot" align="right" class="picright">Link Diagnosis is actually a robust plugin designed to analyze incoming links to a URL for SEO purposes. However, as one of its &#8220;hidden features&#8221; it is able to determine, approximately, <a href="http://blog.linkdiagnosis.com/?p=19" title="http://blog.linkdiagnosis.com/?p=19">the day the URL appeared in Google</a>.</p>
<p>It works simply by having the user right-click the page they want to check and select the &#8220;Get Page Age&#8221; option. After a few seconds, they are greeted with a JavaScript popup containing the date on which the script detected the site first appeared.</p>
<p>It works by using <a href="http://www.googletutor.com/2006/08/22/more-google-hacking-using-the-inurl-operator/" title="Google INURL">Google&#8217;s INURL command</a> which, when used in conjunction with a date filter, causes Google to display a date by each resulting URL. What the plugin does is take the URL you wish to check, create the search query and then automatically extract the applicable date, thus turning a multi-step process into a one-click solution.</p>
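<p>To make the first step concrete, here is a minimal Python sketch of how such a search query might be assembled by hand. This is not the plugin&#8217;s actual code. The function name is hypothetical, the search URL layout is an assumption, and the date filter is deliberately left as a caller-supplied placeholder because I do not want to guess at the exact parameter the plugin uses.</p>

```python
from urllib.parse import urlencode, urlparse

def page_age_query(page_url, date_filter_params=None):
    """Build a search URL that asks for pages whose URL matches page_url.

    date_filter_params is a placeholder for whatever date-filter parameters
    the search engine accepts; none are assumed here.
    """
    parts = urlparse(page_url)
    # Drop the scheme so that only the host and path go into the inurl: term.
    target = parts.netloc + parts.path
    params = {"q": "inurl:" + target}
    params.update(date_filter_params or {})
    return "https://www.google.com/search?" + urlencode(params)

print(page_age_query("http://www.plagiarismtoday.com/2008/06/06/finding-the-age-of-a-page/"))
```

<p>Extracting the date from the results page is the harder half of the job and depends entirely on how the results are laid out, so it is left out of this sketch.</p>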
<p>For anyone seeking to find out the date of a site, this could prove to be both a powerful tool and a good time saver as well.</p>
<h4>Why to Use It</h4>
<p>There are many reasons why you might want to check out the age of a particular page. </p>
<p>For one, you can use it to check if a spam blog or a plagiarist was indexed by Google before or after your original post (provided it was indexed at all). This can help determine what action you should take against the site. </p>
<p>However, many will also find its non-repudiation services to be very useful. If there ever is a dispute about who posted an article or an image first, this tool can help resolve it by providing an independent view on which went up first.</p>
<p>Though certainly not as accurate as <a href="http://www.numly.com">Numly</a> or <a href="http://www.myfreecopyright.com">MyFreeCopyright</a>, using Google is far more accurate than looking at the <a href="http://www.archive.org">Web Archive</a>, especially considering that the latter can take over six months to display any information about a URL.</p>
<p>Still, Link Diagnosis is far from perfect in this area. There are many issues one will have if one tries to rely upon it for non-repudiation.</p>
<h4>Limitations</h4>
<p><IMG SRC="http://www.plagiarismtoday.com/images/page-age-capture-20080606-104544.png" alt="Get Page Age Error" align="left" class="picleft">Before you begin to make heavy use of this service, bear in mind the following caveats:</p>
<p><OL><LI><strong>Google&#8217;s Limitations:</strong> The biggest issue with using the INURL method is that Google does not always index a site or a page immediately after it goes up. There are often delays. Also, the service can only work with pages already in the Google database; anything that has been blacklisted, either by the creator or by Google, will return no results.</LI><br />
<LI><strong>URLs and Not Content:</strong> The function will tell you when the URL appeared in Google, not the content on the page. For permalinks that may be acceptable, but for dynamic pages, such as the front page of Plagiarism Today, it can create a problem.</LI><br />
<LI><strong>Different Owners:</strong> Also, the system detects when a URL was first indexed by Google, not who owned it at the time. If a site changes ownership, even if it is taken out of Google during the transition, the date shown for the home page will belong to the original owner. </LI></OL></p>
<p>In short, the tool is subject to the exact same gaming and manipulation that Google and the other search engines are. As such, it can provide some quick and dirty information, especially on permalinks, but should never be taken as the ultimate gospel on the age of a page.</p>
<p>Link Diagnosis is no substitute for a true non-repudiation service and it does not claim to be.</p>
<h4>Conclusions</h4>
<p>Personally, I find the other features of Link Diagnosis much more compelling than its &#8220;page age&#8221; feature. Though it is great for a quick analysis, especially of a spam blog permalink, it may not always tell the complete truth or have the information you are seeking.</p>
<p>It is a great analysis tool but it should not be assumed to be the plain truth. There are plenty of ways that it could be wrong.</p>
<p>So, as with every tool, be sure to use it in conjunction with common sense and logic. Have it available, use it if needed, but don&#8217;t use it as a replacement for your own judgment.</p>
<p>No tool is that powerful.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/06/06/finding-the-age-of-a-page/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Rise of Made-For-Amazon Spam</title>
		<link>http://www.plagiarismtoday.com/2008/06/04/the-rise-of-made-for-amazon-spam/</link>
		<comments>http://www.plagiarismtoday.com/2008/06/04/the-rise-of-made-for-amazon-spam/#comments</comments>
		<pubDate>Wed, 04 Jun 2008 14:52:42 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Products]]></category>
		<category><![CDATA[affiliate]]></category>
		<category><![CDATA[affiliates]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[referrals]]></category>
		<category><![CDATA[search spam]]></category>
		<category><![CDATA[seo]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[Spam-Blogs]]></category>
		<category><![CDATA[Splogging]]></category>
		<category><![CDATA[Splogs]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=1228</guid>
		<description><![CDATA[As spam techniques evolve, it is inevitable that they begin to turn to newer and more reliable services to publish and profit from their junk content. In just such a push, many spammers are turning to Amazon as a means to make a quick, reliable dollar. ]]></description>
			<content:encoded><![CDATA[<p><IMG SRC="http://www.plagiarismtoday.com/images/Amazon.com__Help%C2%A0%3E%C2%A0Privacy___Security%C2%A0%3E%C2%A0Conditions_of_Use-20080603-231004.png" alt="Amazon Logo" align="left" class="picleft">Spammers are always looking for new ways to profit from their spam blogs and other junk content. To date, most have favored Adsense and other pay-per-click (PPC) schemes due to their ease of set up and high profitability, while others have used spam blogs as a stepping stone to help improve the search engine ranking of other sites.</p>
<p>However, a growing number of spammers have started taking a different approach to the problem. Instead of bathing a site with Adsense ads or spam links, they&#8217;ve begun using Amazon&#8217;s affiliate program to make money from their spam blogs.</p>
<p>Though not a traditional pay-per-click system, Amazon&#8217;s affiliate service makes it very easy for spammers to profit. By inserting a few links, they are able to get referral fees for items sold, in many cases days after the visitor was at the spam blog.</p>
<p>Given the current trend toward targeting technology and other high-ticket items, that amount could equal hundreds of dollars per sale and there is precious little that users can do to prevent spammers from earning the cash.<br />
<span id="more-1228"></span></p>
<h4>The Benefits of Amazon</h4>
<p>For spammers, especially those targeting search terms for big ticket items, there are many reasons why Amazon would be a good &#8220;partner&#8221; for their sites.</p>
<p><OL><LI><strong>Higher Rates:</strong> Though there is no promise that a click is likely or that such a click will result in a sale, one sale is likely worth many dozens of clicks in terms of dollar value. With Adsense and other PPC rates in constant fluctuation, Amazon&#8217;s referral service offers a very stable revenue stream.</LI><br />
<LI><strong>Less Spammy:</strong> Where a slathering of Adsense ads may tip off even the most unaware visitor that something is wrong with the blog, nothing about a few Amazon links appears spammy to either humans or search engines. </LI><br />
<LI><strong>More Trust:</strong> Amazon is a major brand name on the Web and well-trusted. People are much less likely to believe that Amazon would partner with purveyors of junk even though the major PPC systems are backed by companies like Google, Yahoo! and Microsoft.</LI><br />
<LI><strong>Stickier:</strong> If you exit a spam blog by clicking an Amazon link you may think you are not giving them any money so long as you don&#8217;t buy anything. But it is possible, in some situations, that they could get the referral fee for items bought days or weeks later. There is no simple way to know if a purchase resulted in a referral fee being paid.</LI><br />
<LI><strong>More Difficult Removal:</strong> Where the <a href="http://www.google.com/adsense_dmca.html" title="Adsense DMCA">process for reporting spam bloggers</a> to Adsense is well-known, the process for reporting to Amazon is less clear. </LI></OL></p>
<p>Given the nature of spammers to constantly seek out new methods and techniques, it was only a matter of time before they began to reach out to Amazon in a more meaningful way. The benefits are just too great to ignore.</p>
<h4>Challenges to Webmasters</h4>
<p><IMG SRC="http://www.plagiarismtoday.com/images/Element_Properties-20080604-084807.png" alt="Amazon Spam" align="right" class="picright">When many content creators discover their work being misused by a spam blog, they almost immediately seek to attack the revenue side of the site. The theory is that it does more damage to the spammer than simply getting the content pulled down.</p>
<p>However, working with Amazon can be tricky. For one, it is not always obvious that a spam blog is using Amazon links as they are often disguised as other kinds of links. </p>
<p>But even if one does spot the Amazon links, it can be very difficult to report the matter to Amazon. Though Google&#8217;s Adsense DMCA policy is <a href="http://www.plagiarismtoday.com/2006/08/31/adsense-and-the-dmca/" title="Adsense and the DMCA">the subject of great controversy</a>, it is at least an established protocol. Amazon has no such system in place. </p>
<p><a href="http://www.amazon.com/gp/help/customer/display.html/002-4945302-5169629?ie=UTF8&#038;nodeId=508088#copyright" title="Amazon DMCA">Amazon does have a DMCA process</a>, but it is unclear how it would apply in this case. Since Amazon isn&#8217;t actually hosting any material on the infringing site, as is the case with Adsense, there is no clear role for them to play. Furthermore, despite heavy searching of the Amazon site, I found no link for reporting an infringing affiliate.</p>
<p>Amazon, from an abuse standpoint, seems to be caught completely off-guard by this problem and has no real technique for resolving these issues.</p>
<p>This gives spammers at least a temporary edge if they use Amazon to turn their profits.</p>
<p>In the meantime though, the best that Webmasters can do is report infringing sites via the DMCA process and make it clear that they are reporting an affiliate, not a direct infringement of their content.</p>
<p>Hopefully, the complaint will make it to whoever is needed in order to act upon it. </p>
<h4>Conclusions</h4>
<p>Earlier this week, I wrote on the Blog Herald about <a href="http://www.blogherald.com/2008/06/02/assembling-the-spam-puzzle/" title="Assembling the Spam Puzzle">assembling the spam puzzle</a> and the need for cooperation to really make any progress on spam. </p>
<p>I was speaking exactly of this type of cooperation.</p>
<p>Amazon is a potential weak spot in the fight against spam. The service already has <a href="http://www.askdavetaylor.com/how_do_i_avoid_affiliate_link_hijacking.html" title="Amazon Affiliate Hijacking">known weaknesses against affiliate hijacking</a> and that has helped give rise to the spyware and malware problems we face today. Weaknesses in dealing with junk content could be fueling the spam blogs we see tomorrow.</p>
<p>When a company gets as large and as powerful as Amazon or Google, it has the potential to do either incredible good or incredible harm. But even if a company tries not to be evil, failing to think like an evildoer means that its systems can be exploited.</p>
<p>Currently, we&#8217;re fortunate in that the problem seems to be limited mostly to technology blogs, especially those in gaming and computers, but it is very likely to spread. </p>
<p>This is especially true as Adsense either starts to push back more or rates begin to drop. </p>
<p>The question is, how will Amazon respond and will it be enough?</p>
<p><em><strong>Note:</strong> I contacted Amazon prior to this article and did not hear back from them before publication. I will update this article should they respond. </em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/06/04/the-rise-of-made-for-amazon-spam/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Putting Your Feed on Probation</title>
		<link>http://www.plagiarismtoday.com/2008/04/29/putting-your-feed-on-probation/</link>
		<comments>http://www.plagiarismtoday.com/2008/04/29/putting-your-feed-on-probation/#comments</comments>
		<pubDate>Tue, 29 Apr 2008 16:10:17 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Prevention]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[RSS scraping]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[search spam]]></category>
		<category><![CDATA[Spam-Blogs]]></category>
		<category><![CDATA[Splogging]]></category>
		<category><![CDATA[Splogs]]></category>
		<category><![CDATA[web spam]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=993</guid>
		<description><![CDATA[Darren Rowse of Problogger fame suggested that new bloggers should start their sites out with a partial feed and switch to a full one once they have enough "trust" from Google. But could the system work?]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  class="picleft alignleft size-medium wp-image-995" title="problogger-logo" src="http://www.plagiarismtoday.com/wp-content/uploads/2008/04/problogger-logo.jpg" alt="" width="250" height="43" />In a &#8220;speed blogging&#8221; post from earlier this week, Darren Rowse of ProBlogger <a title="Problogger question" href="http://www.problogger.net/archives/2008/04/26/full-or-partial-rss-feeds/">answered a question from a reader</a> wondering if their blog should use a partial or a full RSS feed. The reader pointed out that, on the ProBlogger site, the feed was set to full but on <a title="Digital Photography School" href="http://digital-photography-school.com/blog/">Digital Photography School</a>, another site he operates, the feed was partial.</p>
<p>Rowse&#8217;s answer, however, was not a typical one to the question and, instead, offered a completely different piece of advice. According to Rowse, he prefers to keep feeds partial until the site has been around for some time and has obtained &#8220;authority in the eyes of Google&#8221;, creating what is effectively a probation period for the feed, only letting it loose when it is ready to be released.</p>
<p>This raises an interesting question in the full vs. partial feed debate: are the two mutually exclusive, or can a site &#8220;grow&#8221; from one into the other?</p>
<p>It is a topic worth debating as more and more bloggers run into this issue and try to determine what is best for their content.</p>
<h4><span id="more-993"></span>The Basic Idea</h4>
<p>The problem with scraping is that many sites have their content republished heavily and never see any ill effects from it. Large, well-known blogs such as <a href="http://www.techcrunch.com">TechCrunch</a> and <a href="http://www.mashable.com">Mashable</a> are routinely republished wholesale and still maintain both their search engine ranking and their brand.</p>
<p>The reason is that these are well-known and trusted sites. Google and the other search engines give them priority over new sites, such as the ones spammers create, and trust that the content on them is original.</p>
<p>Typically, scraping most strongly affects newer and/or smaller sites. If the search engines and/or readers don&#8217;t know who you are, they could very well give the top slot to a spam blogger. However, it takes time to build up a reputation and earn a trusted position with Google.</p>
<p>This means that, if you start out with a full feed from day one, you are vulnerable to scrapers hijacking your position on the search engines. With their cross linking and frequent use of expired domains, it is possible a spammer could actually have more trust than a new legitimate blog, making their scraping especially damaging.</p>
<p>With that in mind, Rowse&#8217;s solution could actually be an answer to the problem. There are many reasons one would expect this system to work and why bloggers, especially those starting out, should at least consider it.</p>
<h4>Why It Could Work</h4>
<p>The logic behind using a feed probation period, such as what Rowse proposes, makes a lot of sense. Consider the following points:</p>
<ol>
<li><strong>More Original Content:</strong> Using truncated feeds in the early stages of a site guarantees more original content on your site and that can help you build search engine trust more quickly, protecting you when you make the transition to full feeds in a drive to obtain more subscribers.</li>
<li><strong>Easier Transition:</strong> It is much easier on your readers to go from partial to full feeds rather than the other way around. If you start out with full feeds and change your mind, it could upset many readers.</li>
<li><strong>Better for Servers:</strong> New sites typically start out with very small hosts. As such, the feed can be a heavy burden until the site warrants a move to a larger &#8220;house&#8221;. A partial feed can mitigate that.</li>
<li><strong>Focus on Linkbuilding:</strong> Most new blogs, in the early days, are focused more on linkbuilding than subscribers. As such, a truncated feed is not a major obstacle in those cases.</li>
<li><strong>Promotional Event:</strong> Many blogs turn the activation of full feeds into a promotional event and use it as a tool to lure new subscribers. If you start out with full feeds, you miss this opportunity.</li>
</ol>
<p>But while the idea seems very sound, there are still problems with it. Putting your feed on probation might be a wonderful idea, but only if a few problems can first be overcome.</p>
<h4>Issues with the System</h4>
<p>Though a feed probation period seems to work well with what we know about search engines and scraping, it also creates more than a few problems. Those include the following:</p>
<ol>
<li><strong>Weakened Readership:</strong> Many people refuse to subscribe to partial feeds and, if they see that your feed is truncated, they will not care that it is just a probation period. They will simply skip over it and likely not return.</li>
<li><strong>Missed Opportunities for Pinging:</strong> One of the great things about feeds is that you can ping the various search engines to let them know something is new. If you have a partial feed, this can lessen the impact of such pings and actually work to hurt your search engine trust, possibly negating much of the benefit you would hope to gain.</li>
<li><strong>Slower Growth:</strong> The two items above could make it more difficult to obtain inbound links, since it is mostly subscribers who link to your site. That, when combined with the lack of pinging, could actually cause it to take longer to build search engine trust and grow your readership.</li>
</ol>
<p>However, the biggest problem is that many, if not most, sites will never reach a point where they can safely extend their feed.</p>
<p>For example, after three years of operation, Plagiarism Today has well over 1000 subscribers and a PageRank of 5. A solid site in both categories, but not a huge one either. While it might seem, when combined with the site&#8217;s longevity, that it would be relatively safe from scrapers, I&#8217;ve seen many scraping sites with a PageRank between 3 and 5, high enough to cause concern, and my other sites, including one that has run for over ten years, still have an issue with copied results ranking higher than the original.</p>
<p>While there may come a time in which it is safer for a site to change to a full feed, for most there is never a time where it is perfectly safe. One is still going to have to deal with scrapers, especially those that rank well in search engines, and work to protect one&#8217;s content from unattributed and spammy use.</p>
<p>As such, the feed probation period may be something of a waste since you have to take the same action regardless of whether you took advantage of it or not.</p>
<h4>Conclusions</h4>
<p>The idea of a feed probation period has some merit. However, for most blogs, it will not provide much benefit.</p>
<p>No matter what you do initially, when you have a full feed, scrapers are going to grab your content and, unless you are a mega-blogger, some of them are going to have the capacity to hurt you. You need to be aware of those and be ready to take action against them.</p>
<p>If you decide to use this method, bear in mind that, in exchange for the greater protection during the fragile early months of the site, you will likely experience slower growth. Also realize that it will not cure your scraping ills.</p>
<p>Because even though Rowse never called this technique a cure for scraping and mentioned that he still deals with as many cases as he can, I know well from experience that many people are looking for a &#8220;magic bullet&#8221; to make this matter go away.</p>
<p>Sadly, that bullet doesn&#8217;t exist and this certainly is not it. However, it might still be a useful tool, especially for those that plan to discuss spam-friendly topics.</p>
<p>Just use it with caution and keep in mind that you&#8217;ll still need to take other steps to protect your work.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/04/29/putting-your-feed-on-probation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Blogger CAPTCHA Cracked</title>
		<link>http://www.plagiarismtoday.com/2008/04/28/googles-blogspot-captcha-cracked/</link>
		<comments>http://www.plagiarismtoday.com/2008/04/28/googles-blogspot-captcha-cracked/#comments</comments>
		<pubDate>Mon, 28 Apr 2008 16:46:17 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Prevention]]></category>
		<category><![CDATA[Blogger]]></category>
		<category><![CDATA[Blogspot]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[search spam]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[Spam-Blogs]]></category>
		<category><![CDATA[spammers]]></category>
		<category><![CDATA[Splogging]]></category>
		<category><![CDATA[Splogs]]></category>
		<category><![CDATA[web spam]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=989</guid>
		<description><![CDATA[Though it seemed as if Google was starting to make some headway into the spam blog problem on its Blogger service, the spammers seem to have turned the tide by cracking the CAPTCHA system and creating more accounts than ever before. ]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  class="picleft alignleft size-medium wp-image-990" title="blogger-logo" src="http://www.plagiarismtoday.com/wp-content/uploads/2008/04/blogger-logo.jpg" alt="" width="250" height="81" />Google&#8217;s Blogger service, already one of the <a href="http://www.plagiarismtoday.com/2007/03/22/when-will-google-stand-up-to-spam/">largest sources of spam blogs on the Web</a>, is now being inundated with another wave of spammers following the <a title="Google CAPTCHA broken" href="http://www.thestandard.com/news/2008/04/25/spammers-ramp-siege-googles-blogger-bots">cracking of the Google CAPTCHA system</a>. This means that spammers can now fully automate the process of creating and setting up new Blogger spam blogs, making the process even faster and enabling the creation of more spam blogs than ever before.</p>
<p>Though these spam blogs will take many different approaches, many of them will inevitably use scraping as a means to fill their pages and appear more authentic to both Google the search engine and Google the host administrator.</p>
<p>Bloggers, especially those that frequently have spam-friendly keywords in their sites, should be aware of the likelihood of increased scraping on the Blogger service. Now would be an excellent time for sites that offer email subscriptions to <a title="Scraping Via Email" href="http://www.plagiarismtoday.com/2008/04/15/new-trend-scraping-via-email/">check for any @blogger.com accounts</a> and for everyone to consider taking feed protection steps such as installing <a title="Antileech" href="http://redalt.com/Resources/Plugins/AntiLeech">Antileech</a>, creating a <a title="Feed Header and Feed Footer" href="http://www.plagiarismtoday.com/2008/01/16/two-new-anti-scrpaing-wordpress-plugins/">feed header/footer</a> or using a <a title="Digital Fingerprint" href="http://www.plagiarismtoday.com/2006/10/05/update-digital-fingerprint-plugin-beta-2/">digital fingerprint</a>. </p>
<p>Sadly, though it recently seemed as if Google was on the <a title="Google Attacks Spam" href="http://www.plagiarismtoday.com/2007/06/26/is-blogger-on-the-offensive-against-spam/">offensive against spam</a>, it now appears as if the tables have turned.</p>
<p>While the new spam wave is still ramping up, now is the best chance for bloggers to be aware of the issue and be prepared to <a title="Blogger DMCA" href="http://www.google.com/blogger_dmca.html">take action as needed</a>. Hopefully, Google will fix this issue soon and the impact of the problem will be limited.</p>
<p>If not, then Blogspot could easily become even more of a spam wasteland than before, making it even more difficult for legitimate bloggers to get noticed on the service and for Webmasters everywhere to keep their content out of spammers&#8217; hands.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/04/28/googles-blogspot-captcha-cracked/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The Best Way to Report Spam to Google</title>
		<link>http://www.plagiarismtoday.com/2008/03/21/the-best-way-to-report-spam-to-google/</link>
		<comments>http://www.plagiarismtoday.com/2008/03/21/the-best-way-to-report-spam-to-google/#comments</comments>
		<pubDate>Fri, 21 Mar 2008 14:51:44 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Videos]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[google blog search]]></category>
		<category><![CDATA[google-video]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[search spam]]></category>
		<category><![CDATA[Search-Engines]]></category>
		<category><![CDATA[Spam-Blogs]]></category>
		<category><![CDATA[Splogging]]></category>
		<category><![CDATA[Splogs]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2008/03/21/the-best-way-to-report-spam-to-google/</guid>
		<description><![CDATA[Many complain that it is very difficult to get Google to take action on reported spam blogs. However, a simple trick may make it easier to get the search engine's attention when reporting junk content. ]]></description>
			<content:encoded><![CDATA[<p><img SRC="http://aycu34.webshots.com/image/48033/2000709108004570234_rs.jpg" alt="Google Webmaster Tools Image" align="left" class="picleft"/>I was going through videos of past WordCamp presentations to <a href="http://dallas.wordcamp.org/schedule/">prepare for my own next week</a> and found myself <a href="http://onemansblog.com/2007/08/04/matt-cutts-lecture-whitehat-seo-tips-for-bloggers/">re-watching a presentation</a> by Google&#8217;s <a href="http://www.mattcutts.com/blog/">Matt Cutts</a> that he gave at WordCamp San Francisco in 2007.</p>
<p>At the forty minute mark in the presentation, Cutts said something that was interesting to those of us who deal with spam blogs but has been largely overlooked. When discussing <a href="http://www.google.com/webmasters/">Google&#8217;s Webmaster Center</a>, he mentioned that you can report spam through their Webmaster Tools feature and that they &#8220;give more weight&#8221; to those reports than the ones made through <a href="http://www.google.com/contact/spamreport.html">their public form</a>.</p>
<p>In short, if you have access to Google&#8217;s Webmaster Tools, which is free and easy to register for, you can use the form in there to file a more meaningful spam report. Best of all, the form is identical to the public one and should not seem foreign to anyone used to filing spam reports.</p>
<p><img SRC="http://aycu30.webshots.com/image/47549/2006358667072020216_rs.jpg" alt="How to report Google Spam" align="right" class="picright"/>This is presumably because the spam form in the Webmaster Tools is not anonymous, unlike the public one. Google, understandably, gives more significance to reports where they know the party providing the information.<br />
<span id="more-855"></span><br />
To file the report, simply log into the Webmaster Tools dashboard and click the &#8220;Report spam in our index&#8221; link on the right-hand side. You can also report paid links.</p>
<p>This may resolve many of the <a href="http://www.webmasterworld.com/forum30/32931.htm">claims that</a> Google <a href="http://www.quickonlinetips.com/archives/2006/07/how-to-complain-and-report-spam-blogger-blogs/">does not respond</a> (see comments) to spam reports.</p>
<p>All in all, while this is a very simple trick, it might help with the reporting of spam in cases where a DMCA notice is simply not practical.</p>
<p><strong>Note:</strong> In a strange coincidence, I found the video on <a href="http://onemansblog.com/">John Pozadzides&#8217; blog</a>, who will be speaking directly before me at WordCamp Dallas. </p>
<p><object classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" width="437" height="370" id="viddler"><param name="movie" value="http://www.viddler.com/player/34fc548d/" /><param name="allowScriptAccess" value="always" /><param name="allowFullScreen" value="true" /><embed src="http://www.viddler.com/player/34fc548d/" width="437" height="370" type="application/x-shockwave-flash" allowScriptAccess="always" allowFullScreen="true" name="viddler" ></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/03/21/the-best-way-to-report-spam-to-google/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
	</channel>
</rss>

