Rise of the Twitter Scrapers
By Jonathan Bailey • Feb 13th, 2009 • Category: Articles, News
It was an inevitability. As Twitter has grown in popularity, both as a networking and as a promotion tool, it has become an increasingly enticing target for spammers.
To date, most of the Twitter spam has been of the auto-follow variety. A spammer sets up an account, links it with a site they want to promote and then proceeds to follow hundreds, if not thousands, of strangers. Those strangers not only get the follow notification, thus turning it into a form of email spam, but also are forced to click the link to the Twitter account to determine if it is one they want to follow back, thus exposing them to the advertisements.
As frustrating as these accounts can be, for the most part, these spammers have had little interest in creating a legitimate-looking Twitter presence. They typically post only a few tweets, usually filled with links to the destination site, and they attract almost no followers.
However, a new breed of Twitter users seems to be changing that. These users are creating Twitter accounts that aren’t spammers in the traditional sense, but are actually Twitter scrapers. These accounts grab results from Twitter search feeds and republish them.
The question, however, is whether these new bots are legitimate forms of Twitter expression or a new form of spam that needs to be stopped. Also, if it does need to be stopped, how can it be done?
From Haikus to Shut Ups

If you mention the word “haiku” in your tweet, it is almost certainly going to wind up on the @haikutwaiku account. It doesn’t matter if you’re posting your latest haiku creation, discussing haikus or just using a hashtag with “haiku” in it. The account picks it up and, currently, neither attributes the tweet to its author nor indicates that it is a retweet.
Every tweet in the account is, originally, from another user. For example, this tweet on the @haikutwaiku account is actually from @jennar. Likewise, this @haikutwaiku tweet is from @CobWebsStir.
The @haikutwaiku account is both very active, with nearly 200 tweets per day, and relatively popular, with over 700 followers as of this writing.
Twitter users, for the most part, seem to either tolerate or be oblivious to the copying of the @haikutwaiku account. Most of the discussion with the account has been positive. However, a few Twitter users, such as @timtfj, have expressed displeasure.
This isn’t to say that all Twitter scrapers are plagiarizing their tweets. Another scraper, @shutupmeg, targets tweets with the keyword “shut up” and gives attribution to the tweets, though it uses “(@username)” rather than the “RT @username” format.
However, the response to @shutupmeg has been much more hostile. This may be because the attribution informs more Twitter users that their tweets are being copied or the keyword in question may attract a more hostile kind of Twitter user.
Either way, these are just two of the wide variety of Twitter bots that are scraping search results and republishing them in a new account. It seems likely that the controversy has just begun.
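The two attribution styles mentioned above can be sketched in a few lines of Python. This is only an illustration, not the code either bot actually runs: the function names, the trimming rule and the example username are all hypothetical, and the 140-character limit is assumed to be the one Twitter enforced at the time.

```python
# Hypothetical helpers for illustration; neither bot's real code is known.
TWEET_LIMIT = 140  # assumed Twitter character limit at the time of writing


def _trim(text: str, room: int) -> str:
    """Cut text to at most `room` characters, marking any cut with '...'."""
    if len(text) <= room:
        return text
    if room <= 3:
        return text[:max(room, 0)]
    return text[: room - 3].rstrip() + "..."


def attribute_rt(username: str, text: str, limit: int = TWEET_LIMIT) -> str:
    """Prefix style: 'RT @username text', trimming the text to fit."""
    prefix = f"RT @{username} "
    return prefix + _trim(text, limit - len(prefix))


def attribute_suffix(username: str, text: str, limit: int = TWEET_LIMIT) -> str:
    """Suffix style: 'text (@username)', trimming the text to fit."""
    suffix = f" (@{username})"
    return _trim(text, limit - len(suffix)) + suffix


if __name__ == "__main__":
    print(attribute_rt("example_user", "old pond, a frog jumps in"))
    print(attribute_suffix("example_user", "old pond, a frog jumps in"))
```

Either style costs the republished tweet a dozen or so characters, so a long original has to be shortened to keep the credit.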
Copyright, Plagiarism and More
The next obvious question is whether any of these scrapers can be accused of copyright infringement, as many spam blogs can. As I pointed out during the Tweetbacks controversy, most tweets don’t rise to the requisite level of creativity needed for copyright protection. As a result, it is likely that these services don’t raise any direct copyright issues.
However, the @haikutwaiku service may be an exception. Since it targets haiku poetry, a format of literature that is both tweetable and has been ruled protected in the past, it is easy to see how one could reach the conclusion that its activity is an infringement, even though there may still be fair use issues.
Beyond the copyright issues, it is unclear what could be done to stop Twitter scrapers if it were so desired. The current terms of use at Twitter make no mention of auto-posting bots. A rule against them would likely also have outlawed the WordPress plugins and other tools that bloggers use to get posts into their Twitter stream.
The end result is that these scraper bots may be here to stay and, unless Twitter users are able to motivate Twitter itself to take some kind of special action, it doesn’t seem likely to change.
Conclusions
Though Twitter scraping is likely annoying, especially when it is plagiarized, the nature of Twitter works against resolving these issues through traditional means. Copyright claims on tweets will be dubious and any Twitter rules that would target these bots would likely ensnare other, more accepted uses of the service.
The real question is how Twitter users will react as these bots become more common. Right now the response is rather mixed: some users are expressing outrage and blocking the bots in question, while others are tolerating or even enjoying their presence.
The real test will be how these bots are accepted after the novelty has worn off and after spammers begin to use them for more devious purposes. Right now the bots are fairly benign, linking only back to themselves or to nothing at all. Once they are used for promotion of sites or products, attitudes will likely change.
In short, we’ve only seen the very beginning of both the Twitter scrapers and the battle over them. Over the next few months, this will likely be a space where things get very interesting, very quickly.
Jonathan Bailey is The Webmaster and author of Plagiarism Today, which he founded in 2005 as a way to help Webmasters going through content theft problems get accurate information and stay up to date on the rapidly-changing field. He is also a consultant to Webmasters and companies to help them devise practical content protection strategies and develop good copyright policies.

Nice post. Interesting insight. I'm curious to find out more about this in the future.
Glad you liked it. I'm going to be following this closely.
When a Twitter user blocks another Twitter user, as I did when I was followed by someone I thought would steal my poems, is that user, when logged in, completely forbidden from seeing my page or any part of my feed?
That's a tough question. The simplest answer is that the block feature will keep them from seeing your tweets in their timeline. However, it won't prevent them from being able to see your page (logged in or not I don't think it matters) and it won't prevent them from seeing your tweets via search.
If you want to prevent that, you need to set your feed to private. Sad, but true.
I feel that the safe and proper thing to do is always give credit. Just let your followers know this article may be of interest and use a link shrinker. My followers get nice information, I get credit for pointing them to it, and the author gets credit for writing a great article.
I agree that it would be the easiest approach, and most bots do that, but with a hard 140 character limit, it is getting harder and harder to keep attribution and the full tweet in some cases, a serious problem for these bots.
Doesn't make much sense to set your account to private b/c of this. I see the problem less as a one of content ownership (once you put something out on the Web, expect it to be replicated in some way, and to never be able to be completely taken back), and more as one of increasing sneakiness:
If the spammers, who now have crappy accounts, use scraping to create accounts that are somewhat real in appearance, they will be harder to detect at first glance (it's still pretty easy, though time-wasting, to do now). What if they have “normal” accounts made up almost entirely of repurposed content, which is really only there as a filler to embed their sales stuff into?
(I am having even some additional ideas of what they could do which I won't discuss here so's to not encourage them.)
Good thing is, most of their spam will never work (it's annoying/time-wasting though). They don't get the social in Social Media…
I agree that it doesn't make sense to set your account to private. It would be an extreme step, which is why I labeled it “sad but true”, as there is no happy medium here to keep a few bad guys away without nuking the effectiveness of the account.
I also agree that, for the most part it is about sneakiness and not content ownership, especially with Twitter. But there are those who do invest a lot of creativity into Twitter and at least have an interest in name recognition.
Hopefully though, you're right about the spam never working, though some of these spam accounts do have a lot of followers…
I created an account on twitter which covers email spam. What I do is get an RSS feed of everyone who has mentioned the words “email spam” in a tweet and go through and RT the ones I like. I do it by hand and I answer anyone who talks back to me. I am trying to let people have fun with emailspam without being a spammer myself. Would love any input on how I am doing and anything I might need to change.
I seriously doubt that you would have much to worry about since it is done by hand and you are following RT conventions. Honestly, this is a gray area that needs to be settled some. Have you had anyone ask you to not RT their posts?
I try to keep it real and answer people fast. It is a gray area which obviously I don't want to be on the wrong side of. I haven't had a single person complain yet, gotten several thanks though.
this is so helpful. you are a jewel. I'm beginning to think the situation is hopeless, tho. Copyright law and enforcement just can't keep up.
I am being copied and I tweet for animal rescues. It's just annoying bc they are going back to tweets I posted months ago! And they are under the name @anarchists794 – I don't want my animal rescue associated with anarchists. But there is nothing I can do except keep reporting them as spam.
You may be able to also report copyright violations to Twitter separately:
http://help.twitter.com/forums/26257/entries/15795