Rise of the Twitter Scrapers

It was an inevitability. As Twitter has grown in popularity, both as a networking and as a promotion tool, it has become an increasingly enticing target for spammers.

To date, most Twitter spam has been of the auto-follow variety. A spammer sets up an account, links it to a site they want to promote and then proceeds to follow hundreds, if not thousands, of strangers. Those strangers not only get the follow notification, thus turning it into a form of email spam, but are also forced to click the link to the Twitter account to determine if it is one they want to follow back, thus exposing them to the advertisements.

As frustrating as these accounts can be, for the most part, these spammers have had little interest in creating a legitimate-looking Twitter presence. They typically post only a few tweets, usually filled with links to the destination site, and they attract almost no followers.

However, a new breed of Twitter users seems to be changing that. These users are creating Twitter accounts that aren’t spammers in the traditional sense, but are actually Twitter scrapers. These accounts grab results from Twitter search feeds and republish them.

The question, however, is whether these new bots are legitimate forms of Twitter expression or a new form of spam that needs to be stopped. Also, if it does need to be stopped, how can it be done?

From Haikus to Shut Ups

If you mention the word “Haiku” in your tweet, it is almost certainly going to wind up on the @haikutwaiku account. It doesn’t matter if you’re posting your latest haiku creation, discussing haikus or just using a hashtag with Haiku in it; the account picks it up and, currently, neither attributes the tweet to its author nor indicates that it is a retweet.

Every tweet in the account is, originally, from another user. For example, this tweet on the @haikutwaiku account is actually from @jennar. Likewise, this @haikutwaiku tweet is from @CobWebsStir.

The @haikutwaiku account is both very active, with nearly 200 tweets per day, and relatively popular, with over 700 followers as of this writing.

Twitter users, for the most part, seem to either tolerate or be oblivious to the copying of the @haikutwaiku account. Most of the discussion with the account has been positive. However, a few Twitter users, such as @timtfj, have expressed displeasure.

This isn’t to say that all Twitter scrapers are plagiarizing their tweets. Another scraper, @shutupmeg, targets tweets containing the phrase “shut up” and gives attribution to the original authors, though it uses “(@username)” rather than the “RT @username” format.

However, the response to @shutupmeg has been much more hostile. This may be because the attribution informs more Twitter users that their tweets are being copied or the keyword in question may attract a more hostile kind of Twitter user.

Either way, these are just two of the wide variety of Twitter bots that are scraping search results and republishing them in a new account. It seems likely that the controversy has just begun.
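
For illustration, here is a minimal Python sketch of the two attribution styles mentioned above, “RT @username” and “(@username)”, applied within Twitter's 140-character limit. It is not the code behind either bot; the function name `attribute`, its `style` parameter and the example username are all assumptions made for this sketch.

```python
MAX_TWEET_LENGTH = 140  # Twitter's character limit at the time of writing


def attribute(text: str, username: str, style: str = "rt") -> str:
    """Return `text` credited to `username`, trimmed to fit the limit.

    Hypothetical helper for illustration only:
    style="rt"     -> "RT @username text"
    style="suffix" -> "text (@username)"
    """
    if style == "rt":
        prefix, suffix = f"RT @{username} ", ""
    elif style == "suffix":
        prefix, suffix = "", f" (@{username})"
    else:
        raise ValueError(f"unknown style: {style!r}")

    # Characters left for the quoted tweet once the credit is reserved.
    room = MAX_TWEET_LENGTH - len(prefix) - len(suffix)
    if room <= 3:
        raise ValueError("attribution leaves no room for the tweet text")
    if len(text) > room:
        # Trim the quoted text, never the attribution, and mark the cut.
        text = text[: room - 3].rstrip() + "..."
    return prefix + text + suffix


if __name__ == "__main__":
    print(attribute("oh shut up already", "example_user"))
    print(attribute("oh shut up already", "example_user", style="suffix"))
```

The sketch trims the quoted text rather than the credit, so the attribution always survives the length limit even when the full tweet does not.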

Copyright, Plagiarism and More

The next obvious question is whether any of these scrapers can be accused of copyright infringement, as many spam blogs can. As I pointed out during the Tweetbacks controversy, most tweets don’t rise to the requisite level of creativity needed for copyright protection. As a result, it is likely that these services don’t raise any direct copyright issues.

However, the @haikutwaiku service may be an exception. Since it targets haiku poetry, a format of literature that is both tweetable and has been ruled protected in the past, it is easy to see how one could reach the conclusion that its activity is an infringement, even though there may still be fair use issues.

Beyond the copyright issues, it is unclear what could be done to stop Twitter scrapers if it were so desired. The current terms of use at Twitter make no mention of auto-posting bots; a rule against them would likely have also outlawed the WordPress plugins and other tools bloggers use to get posts into their Twitter streams.

The end result is that these scraper bots may be here to stay and, unless Twitter users are able to motivate Twitter itself to take some kind of special action, it doesn’t seem likely to change.

Conclusions

Though Twitter scraping is likely annoying, especially when it is plagiarized, the nature of Twitter works against resolving these issues through traditional means. Copyright claims on tweets will be dubious and any Twitter rules that would target these bots would likely ensnare other, more accepted uses of the service.

The real question is how Twitter users will react as these bots become more common. Right now the response is rather mixed: some users are expressing outrage and blocking the bots in question, while others are tolerating or even enjoying their presence.

The real test will be how these bots are accepted after the novelty has worn off and after spammers begin to use them for more devious purposes. Right now the bots are fairly benign, linking only back to themselves or to nothing at all. Once they are used for promotion of sites or products, attitudes will likely change.

In short, we’ve only seen the very beginning of both the Twitter scrapers and the battle over them. Over the next few months, this will likely be a space where things get very interesting, very quickly.

39 Responses to Rise of the Twitter Scrapers
  1. Michael P.
    February 13, 2009 | 8:48 am

    Nice post. Interesting insight. I'm curious to find out more about this in the future.

  2. Jonathan Bailey
    February 13, 2009 | 8:52 am

    Glad you liked it. I'm going to be following this closely.

  3. Michael
    February 13, 2009 | 9:15 am

    When a Twitter user blocks another Twitter user, as I did when I was followed by someone I thought would steal my poems, is that user, when logged in, completely forbidden from seeing my page or any part of my feed?

  4. Jonathan Bailey
    February 13, 2009 | 12:25 pm

    That's a tough question. The simplest answer is that the block feature will keep them from seeing your tweets in their timeline. However, it won't prevent them from being able to see your page (logged in or not, I don't think it matters) and it won't prevent them from seeing your tweets via search. If you want to prevent that, you need to set your feed to private. Sad, but true.

  5. mlanger (Maria Lange
    February 13, 2009 | 1:47 pm

    @plagiarismtoday I reported a Twitter scraper to Twitter about two weeks ago. Account was suspended. Good post; RT: http://is.gd/jrLX

  6. shaunjamison (shaunj
    February 13, 2009 | 1:55 pm

    RT @plagiarismtoday The Rise of the Twitter Scrapers: http://is.gd/jrLX copyright issues spammers

  9. susannadee (Susanna
    February 14, 2009 | 8:01 am

    RT : @pchere @CleverClogs Plagiarism Today. Rise of the Twitter Scrapers http://tinyurl.com/cpgwym

  10. natefanaro (Nate Fan
    February 14, 2009 | 9:32 am

    My bot @shutupmeg was brought up in an article. http://is.gd/jrLX What should I do? http://ow.ly/gVS

  11. shutupmeg (shutupmeg
    February 14, 2009 | 9:34 am

    RT @natefanaro – My bot @shutupmeg was brought up in an article. http://is.gd/jrLX What should I do? http://ow.ly/gVS

  12. JeremyKendall (Jerem
    February 14, 2009 | 9:49 am

    Rise of the Twitter Scrapers http://is.gd/jrLX

  17. Terry Breedlove
    February 14, 2009 | 4:12 pm

    I feel that the safe and proper thing to do is always give credit. Just let your followers know this article may be of interest and use a link shrinker. My followers get nice information, I get credit for pointing them to it, and the author gets credit for writing a great article.

  18. Nylons (Nancy Lyons)
    February 14, 2009 | 10:05 pm

    is intrigued by the plagiarism debate around tweets. Steal my tweet. Go on. I DARE YA. http://tinyurl.com/cpgwym

  20. professorclock (Dami
    February 15, 2009 | 11:21 am

    Reading: Rise of the Twitter Scrapers | PlagiarismToday http://bit.ly/c6DTs

  21. daveElf (David Elfan
    February 15, 2009 | 2:04 pm

    PLAGIARISM WATCH: Are tweets protected by copyright? http://tinyurl.com/cpgwym #feedly Beats me. ©2009 daveelf :)

  24. Jonathan Bailey
    February 16, 2009 | 10:52 am

    I agree that it would be the easiest, and most bots do that, but with a hard 140 character limit, it is getting harder and harder to keep both the attribution and the full tweet in some cases, a serious problem for these bots.

  25. caminick (caminick)
    February 16, 2009 | 3:30 pm

    Twitter scraping and fair use http://tinyurl.com/cpgwym

  27. [...] As has been discussed many times before on this site and elsewhere, most tweets don’t rise to the threshold of copyrightability. There simply [...]

  28. Alex Schleber
    February 24, 2009 | 9:17 am

    Doesn't make much sense to set your account to private b/c of this. I see the problem less as one of content ownership (once you put something out on the Web, expect it to be replicated in some way, and to never be able to be completely taken back), and more as one of increasing sneakiness: If the spammers, who now have crappy accounts, use scraping to create accounts that are somewhat real in appearance, they will be harder to detect at first glance (it's still pretty easy, though time wasting, to do now). What if they have "normal" accounts made up almost entirely of repurposed content, which is really only there as a filler to embed their sales stuff into? (I am having even some additional ideas of what they could do which I won't discuss here so's to not encourage them.) Good thing is, most of their spam will never work (it's annoying/time-wasting though). They don't get the social in Social Media…

  29. Jonathan Bailey
    February 24, 2009 | 4:04 pm

    I agree that it doesn't make sense to set your account to private. That would be an extreme step, which is why I labeled it "sad but true", as there is no happy medium here to keep a few bad guys away without nuking the effectiveness of the account. I also agree that, for the most part, it is about sneakiness and not content ownership, especially with Twitter. But there are those who do invest a lot of creativity into Twitter and at least have an interest in name recognition. Hopefully, though, you're right about the spam never working, though some of these spam accounts do have a lot of followers…

  30. Andromeda
    February 25, 2009 | 7:56 am

    I created an account on twitter which covers email spam. What I do is get an RSS feed of everyone who has mentioned the words "email spam" in a tweet and go through and RT the ones I like. I do it by hand and I answer anyone who talks back to me. I am trying to let people have fun with emailspam without being a spammer myself. Would love any input on how I am doing and anything I might need to change.

  31. Andromeda
    February 27, 2009 | 10:43 am

    I try to keep it real and answer people fast. It is a gray area which obviously I don't want to be on the wrong side of. I haven't had a single person complain yet, gotten several thanks though.

  32. Jonathan Bailey
    February 27, 2009 | 4:20 pm

    I seriously doubt that you would have much to worry about since it is done by hand and you are following RT conventions. Honestly, this is a gray area that needs to be settled some. Have you had anyone ask you to not RT their posts?

  33. AdamPieniazek (Adam
    June 1, 2009 | 4:33 pm

    RT @plagiarismtoday Rise of the Twitter Scrapers | PlagiarismToday http://tinyurl.com/cpgwym (via @tweetmeme)

  35. bookflame
    October 27, 2009 | 8:16 pm

    this is so helpful. you are a jewel. I'm beginning to think the situation is hopeless, tho. Copyright law and enforcement just can't keep up.

  36. barbie
    October 30, 2009 | 5:39 am

    I am being copied and I tweet for animal rescues. It's just annoying because they are going back to tweets I posted months ago, and they are under the name @anarchists794 – I don't want my animal rescue associated with anarchists. But there is nothing I can do except keep reporting them as spam.

  37. Jonathan Bailey
    November 2, 2009 | 11:56 am

    You may also be able to report copyright violations to Twitter separately: http://help.twitter.com/forums/26257/entries/1579…

  39. February 2009 Links | Maria's Guides
    October 30, 2011 | 3:57 pm

    [...] Rise of the Twitter Scrapers – There's a new kind of spammer in Twitter town. Learn more on PlagiarismToday. [...]

Trackback URL http://www.plagiarismtoday.com/2009/02/13/rise-of-the-twitter-scrapers/trackback/