Don’t Plagiarize Us: Twitter Plagiarism Checking

If you want to check your blog for plagiarism, there’s Fairshare (previous coverage). If you want to check your static text for plagiarism, there’s Copyscape (previous coverage) and Plagium (previous coverage). If you want to check your images for plagiarism, there’s Tineye (previous coverage).

But what about your Twitter stream? Up until now there has been precious little.

However, Robby Grossman has developed a new service called Don’t Plagiarize Us that he hopes will fill that gap.

The service promises to check your Twitter stream (or anyone else’s) for duplicate tweets and report back on what it finds. However, after putting it through a few paces, I found the results to be interesting, but not altogether useful, especially considering the nature of Twitter and how it works.

Still, it is likely worth a try, especially considering it only takes a second and is completely free.

How it Works

The idea behind Don’t Plagiarize Us (DPU) is pretty basic. You put in your Twitter username, or whatever username you wish to check, and it scans that account’s most recent tweets, usually about ten at a time. It then looks for tweets that are similar but aren’t marked as retweets or replies.

In short, if the tweet is the same or close and it doesn’t contain attribution, the service marks it as suspicious.

One interesting thing DPU does is mark cases that it considers possible “indirect” or “fuzzy” plagiarism. These are cases where a tweet is similar, but not identical, to the source. The idea is to catch people who may have copied a tweet but altered a few words or rearranged it in some way.
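To make the exact-versus-fuzzy distinction concrete, here is a minimal sketch of how such a classifier could work. It is not DPU's actual code, which I have not seen. The similarity measure (Python's `difflib.SequenceMatcher`), the 0.6 threshold, the attribution checks and the names `classify` and `is_attributed` are all my own assumptions for illustration.

```python
import re
from difflib import SequenceMatcher

# Assumed cutoff for a "fuzzy" match; DPU's real threshold is unknown.
FUZZY_THRESHOLD = 0.6


def normalize(text):
    """Lowercase and collapse whitespace so trivial edits do not hide a copy."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def is_attributed(text):
    """Rough guess at attribution: retweets ("RT @user", "via @user") and replies ("@user ...")."""
    lowered = text.strip().lower()
    return (
        lowered.startswith("rt ")
        or lowered.startswith("@")
        or " rt @" in lowered
        or "via @" in lowered
    )


def classify(source, candidate, threshold=FUZZY_THRESHOLD):
    """Return "exact", "fuzzy", or None for a candidate tweet compared to a source tweet."""
    if is_attributed(candidate):
        return None  # retweets and replies are not treated as suspicious
    a, b = normalize(source), normalize(candidate)
    if a == b:
        return "exact"
    if SequenceMatcher(None, a, b).ratio() >= threshold:
        return "fuzzy"
    return None


if __name__ == "__main__":
    mine = "Long day. Time for some Wendy's."
    for other in (
        "long day.  time for some wendy's.",
        "Long day. Time for some Wendys!",
        "RT @someone: Long day. Time for some Wendy's.",
    ):
        print(repr(other), "->", classify(mine, other))
```

A character-level ratio like this is cheap, but it is also the kind of measure that will flag short, mundane tweets that merely share common phrases.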

With that in mind, I gave DPU a few tests and my results are below.

My Tests

For my first test of the service, I decided to run my Twitter account through it. It searched back through over 3000 tweets on the first click and only found three instances of suspected plagiarism, all of them of the “fuzzy” kind.

However, none of the incidents seemed to really be relevant.

For example, it matched my Tweet:

Long day. Time for some Wendy’s.

With this one by @thequestion77:

No baseball practice today. Which is good, cause I’m beat & I have a long day tomorrow. Time for Wendy’s & then some serious lying around

I seriously doubt either was plagiarized and it is almost certain that the similarities are purely coincidental. It was a pattern that fit with the other “fuzzy” matches that DPU handed me.

I then tried the account of my friend and Copyright 2.0 Show co-host Patrick O’Keefe, AKA @ifroggy. However, this time, DPU only went back ten tweets and failed to find anything. Even after clicking the “Go Back Farther” button a few times, getting back 37 tweets, still nothing showed up.

I then tried a few Twitter celebrities including Ashton Kutcher (@aplusk) and Diddy (@iamdiddy).

Kutcher’s account produced a lot of matches, though almost all were “fuzzy” ones that had little significance. There were also false negatives, exact quotes listed as fuzzy ones, and several cases where tweets appeared in the list twice.

Diddy’s account was less interesting: DPU found only one duplicate tweet after going back just about 20 tweets. This one was an exact match that appears to be from a Twitter bot that took Diddy’s tweet and reused it without attribution. This is, most likely, the only true plagiarism I found.

Finally, in the name of fairness, I also ran the creator’s account through DPU. After going back almost 1800 tweets, the system found a slew of similar tweets. Once again, though, the majority were simply “fuzzy” matches that were almost certainly not plagiarism, and the exact matches, at fewer than six words, were far too short to be useful.

Of all the copies of his tweets, only one was interesting, but it seems to be a retweet that was missing the attribution, not an attempt to plagiarize as it was just a promo for a livestream.

Thoughts on the Service

After all of this testing, the impression I walked away with was that it was a neat idea for a service, but one that was extremely flawed.

First, the matching did not seem to work very well. As I mentioned above, the fuzzy matches were almost completely useless and even the verbatim matches were often too short to mean much. A service like this would clearly have to focus on the quality of its matches rather than the quantity, given the short length of tweets and the sheer volume of them.
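One way to favor quality over quantity would be to discard any match whose longest shared run of words is too short. The sketch below illustrates that idea only. It is not something DPU does as far as I know, and the six-word cutoff and the function names are assumptions of mine.

```python
from difflib import SequenceMatcher

# Assumed cutoff: ignore matches that share fewer than six consecutive words.
MIN_SHARED_WORDS = 6


def longest_shared_run(tweet_a, tweet_b):
    """Count the words in the longest run of consecutive words both tweets share."""
    words_a = tweet_a.lower().split()
    words_b = tweet_b.lower().split()
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    match = matcher.find_longest_match(0, len(words_a), 0, len(words_b))
    return match.size


def is_meaningful_match(tweet_a, tweet_b, min_words=MIN_SHARED_WORDS):
    """Keep a match only if the shared run is long enough to be unlikely by chance."""
    return longest_shared_run(tweet_a, tweet_b) >= min_words


if __name__ == "__main__":
    mine = "Long day. Time for some Wendy's."
    theirs = (
        "No baseball practice today. Which is good, cause I'm beat & I have "
        "a long day tomorrow. Time for Wendy's & then some serious lying around"
    )
    print(longest_shared_run(mine, theirs), is_meaningful_match(mine, theirs))
```

Under this filter, the Wendy’s pair from my own test would be dropped, since the two tweets share only a two-word run ("Time for"). The trade-off is that genuinely copied short tweets would slip through as well.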

Also, the service itself seemed unreliable. When I initially tested it I received error messages but, at the encouragement of the creator, I gave it a second try and did indeed find it working for about three quarters of my tests. Far too often, though, I was greeted with an error notice instead of results.

The errors were fairly random and I could often just go back and retry my query to get past them, but they were still frequent enough to be extremely annoying.

However, the bigger problem is with Twitter itself. Given the nature of the service, with tweets limited to 140 characters, most of them mundane, and billions of tweets being passed around, it is almost inevitable that there will be similar tweets without any malicious intent.

The service does a decent job of ignoring retweets and replies, two elements that could have greatly skewed the results. Even so, there is so much coincidental matching that there is little hope of separating the actual plagiarism from the coincidences. Even if you can, there is little you can do about it since, in most cases, tweets likely aren’t protected by copyright.

Still, this isn’t to say you shouldn’t give this service a try. It is a fairly cool idea and it is a great way to see who is tweeting about similar things to you. It might not help you catch a plagiarist, but you might find a few cases where a retweet went sour or people were talking about the same topic as you.

It could, if nothing else, be a new way to meet people on Twitter.

Bottom Line

The system is a neat idea but I don’t think it has much practical use now. With some tweaking it could become more useful for Twitter plagiarism detection, though it is unclear what one would do about a detected copy in the majority of cases.

The site does mention that they are working on “real time monitoring” of Twitter plagiarism, a feature that could be very useful for those whose tweets are often copied without attribution, but I suspect that the usefulness of that service will be limited to only certain kinds of Twitter users.

All in all though, definitely give this service a try. Even if it isn’t the most useful, it’s free and can still provide some interesting information.