The other night I was pondering the problem of blog plagiarism and wondering what can be done about it. With RSS scraping becoming more popular, the continuing rise of splogs and the deluge of good old-fashioned copy and paste plagiarism taking place, blogs are rapidly becoming a haven for plagiarists of all varieties.
However, it can be difficult to tell the original posts from the copies. With the immediacy of the blogosphere and the ability to roll back timestamps on most entries, a plagiarized copy can appear to be posted at the same time or even before the original, especially if RSS plagiarism is involved.
Clearly, one of the first steps in defeating plagiarism in the blogging world is some kind of content verification system. Fortunately, the tools we need for such a system are already in place. All that’s needed is for someone to take advantage of them…
The blogosphere is an environment of immediacy. The time difference between a new post and a plagiarized one can be just a few minutes or even a few seconds. Worse still, most blogging packages allow you to change the date and time of the post, enabling bloggers to make their posts seem earlier or later than they actually were.
There are many legitimate reasons to change the timestamp on a post. You might want to forward-date one so that it stays at the top (like a bulletin) or backdate it if you wrote it earlier but didn’t post it at that time. Nonetheless, the ability to change the time and date makes separating the copies from the originals a serious logistical challenge.
This is further exacerbated by blog search engines, such as Technorati, that organize their results by the date and time stamp provided by the blogger, not by the date and time the post was actually received. By the time it’s all done, there’s no reliable date and time information to be found anywhere in the blogosphere, just a general hope that everyone is acting in good faith.
However, as the rise in splogging has shown us, not everyone is playing nice…
The Copyright Office
One of the (few) nice things about registering at the U.S. Copyright Office is that it gives you a timestamp of when the work was submitted. Thus, you can show that your work existed before any copy that appears after it. Theoretically, if you submit every work right as it is created, you’ll have strong proof of ownership of it.
The problem, though, is that registration is much too slow for bloggers. The three days it takes a package to get to the Copyright Office is enough time for a post to be plagiarized dozens of times, grow cold and fall off the metaphorical radar. In the two months it can take to get confirmation back, the post is long forgotten. Finally, the $30 registration fee is more than a little prohibitive for bloggers, many of whom write several posts a day.
What is needed is some kind of immediate verification system: something that is done automatically with every post, takes place instantly and has credibility and reliability.
Fortunately, the infrastructure is in place. It just has to be applied to this end.
Pings as the Answer
Right now, most bloggers have a feature where their software alerts (or pings) various services on the Web. Most ping search engines such as Technorati and Ice Rocket to let them know about the new post. Others use pinging services such as Pingoat to send out blasts of pings to various search engines, directories and listings.
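For readers who have never seen a ping, here is a rough sketch of what one can look like on the wire. It assumes the common XML-RPC convention of a `weblogUpdates.ping` call that carries the blog's name and URL, which is not something this post specifies. The blog name and URL are made up for illustration, and nothing is actually sent.

```python
import xmlrpc.client


def build_ping_request(blog_name: str, blog_url: str) -> str:
    """Return the XML-RPC request body for a weblogUpdates.ping call.

    Assumption: the receiving service follows the widely used
    weblogUpdates.ping(name, url) convention.
    """
    return xmlrpc.client.dumps(
        (blog_name, blog_url), methodname="weblogUpdates.ping"
    )


# Hypothetical blog details, for illustration only.
payload = build_ping_request("Example Blog", "http://blog.example.com/")
print(payload)
```

Note that the ping carries no timestamp of its own. The receiving service decides when it arrived, which is exactly what makes it useful for verification.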
There’s no reason that the same technology cannot be reapplied to create certified timestamps. All that’s required is an unbiased third-party service that can receive pings, retrieve the article and then hang on to it, either as a public archive (like archive.org) or a private repository. If a dispute were to arise later, either in a courtroom or publicly, the timestamp could be verified and used as proof.
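To make the idea concrete, here is a minimal sketch of the record-keeping half of such a service. It assumes the article text has already been retrieved, and it keeps everything in memory. The names `Receipt` and `TimestampArchive` are invented for this example, and a real service would need durable storage and some way to prove its records were not altered.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Receipt:
    url: str
    received_at: datetime  # set by the service, never by the blogger
    sha256: str            # fingerprint of the content as retrieved


class TimestampArchive:
    """Hypothetical in-memory archive of ping receipts."""

    def __init__(self) -> None:
        self._receipts: List[Receipt] = []

    def record(self, url: str, content: str,
               now: Optional[datetime] = None) -> Receipt:
        # The service stamps the receipt with its own UTC clock.
        received_at = now or datetime.now(timezone.utc)
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        receipt = Receipt(url, received_at, digest)
        self._receipts.append(receipt)
        return receipt

    def earliest(self, content: str) -> Optional[Receipt]:
        """Return the first receipt whose content matches exactly."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        matches = [r for r in self._receipts if r.sha256 == digest]
        return min(matches, key=lambda r: r.received_at, default=None)


# Made-up URLs and times: the copy is pinged 90 seconds after the original.
archive = TimestampArchive()
t0 = datetime(2006, 1, 10, 12, 0, tzinfo=timezone.utc)
post = "The other night I was pondering the problem of blog plagiarism."
archive.record("http://original.example.com/post", post, now=t0)
archive.record("http://splog.example.net/copy", post,
               now=t0 + timedelta(seconds=90))
first = archive.earliest(post)
print(first.url, first.received_at.isoformat())
```

An exact hash only matches word-for-word copies, so this settles "who was first" for verbatim duplicates and nothing more.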
This system, ideally, would use Coordinated Universal Time (UTC), which for practical purposes matches Greenwich Mean Time (GMT), to avoid conflicts between time zones, and would be synchronized with an atomic clock. This would help to ensure that everyone was on the same clock and that there was no confusion as to who posted what, when.
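A quick sketch shows why a single clock matters. The dates, times and offsets below are invented, and `to_utc` is a helper written just for this example. Going by the local clocks, the first post looks like it went up a day earlier, but once both are converted to UTC the second post turns out to be three minutes ahead.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_time: datetime, utc_offset_hours: float) -> datetime:
    """Attach a fixed UTC offset to a naive local time and convert to UTC."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return local_time.replace(tzinfo=tz).astimezone(timezone.utc)


# Hypothetical posts: one stamped in a UTC-6 zone, one in a UTC+0 zone.
post_a = to_utc(datetime(2006, 1, 10, 21, 5), -6)  # local clock: Jan 10, 9:05 PM
post_b = to_utc(datetime(2006, 1, 11, 3, 2), 0)    # local clock: Jan 11, 3:02 AM

print(post_a.isoformat())
print(post_b.isoformat())
print("post_b came first" if post_b < post_a else "post_a came first")
```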
This system would be useful for problems other than copyright disputes. It could be used to, finally, determine which blogger broke a story first, tell what was changed in a story since it was first posted and open up a lot of eyes as to how blogging really works.
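As a rough illustration of the "what was changed" use, the sketch below compares an archived copy of a post with its current text using Python's standard `difflib` module. The sample text and the `changed_lines` helper are made up for this example, and it assumes the service kept the full text of the first version.

```python
import difflib
from typing import List


def changed_lines(archived: str, current: str) -> List[str]:
    """List lines removed from ("- ") or added to ("+ ") the archived copy."""
    diff = difflib.ndiff(archived.splitlines(), current.splitlines())
    # ndiff also emits unchanged ("  ") and hint ("? ") lines; drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Invented example: a date quietly edited after the post first went up.
archived = "The launch is set for Friday.\nSources say it may slip."
current = "The launch is set for Monday.\nSources say it may slip."

for line in changed_lines(archived, current):
    print(line)
```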
After all, most people who read blogs without writing one have little clue what goes on behind the scenes. This could be a very eye-opening revelation for many people.
However, the system does have problems that would need to be addressed.
First and foremost, most pings pull from the RSS feed of the blog, so those using truncated feeds will find only limited use in this unless, of course, the service is set up to look at the HTML page and not the feed.
Also, any time you have a centralized server, you are prone to network outages and other computing problems. Though we would love to have an around-the-clock service, the problems Technorati has been having make it pretty clear that running a popular Web site that receives a large volume of pings is a serious challenge.
It’s also worth noting that plagiarists might not voluntarily ping a service that could easily disprove them.
The hope is that this service would either A) be so ubiquitous that one wouldn’t think about not pinging it or B) become revered enough that sites carrying a “Verified by X” label would have much more credibility than those lacking it.
Finally, this isn’t a way to stop plagiarism online, just a way to provide proof and verify blog posts and other things posted using blogging software. It would still be up to Webmasters to check for plagiarized work and to stop it. Though this service could, conceivably, be upgraded with algorithms that check for plagiarism, doing so would be a huge burden on an already taxed service.
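For a sense of what such a check might involve, here is one simple approach, chosen purely for illustration: break each post into overlapping three-word "shingles" and measure how many the two posts share (Jaccard similarity). Nothing here is a method any existing service is known to use, the shingle size is an arbitrary assumption, and running this across every pair of archived posts is where the burden would come from.

```python
import re
from typing import Set, Tuple


def shingles(text: str, size: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "Blogs are rapidly becoming a haven for plagiarists of all varieties"
light_edit = "Sadly, blogs are rapidly becoming a haven for plagiarists of all kinds"
unrelated = "The weather in New Orleans was warm today"

print(round(similarity(original, light_edit), 2))
print(similarity(original, unrelated))
```

A lightly edited copy scores high while unrelated text scores zero, but where to draw the line between "copied" and "similar" would still be a human judgment.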
The Ideal Solution
Ideally, a search engine like Technorati would step up and add real timestamps to each submission rather than blindly trusting what the blog sends them. The two are not mutually exclusive: the Technorati timestamp could be called “received” and the blog’s “posted”. It could even be a new sort feature for the search engine and would only have to be displayed in cases where the two were different.
This would eliminate the need for a separate service to be created and, since pretty much everyone wants to be in Technorati, it would be much less likely to be avoided by plagiarists, even if it could prove their misdeeds.
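Here is a small sketch of how that display rule could work. The function name, the five-minute tolerance and the sample times are all assumptions made for illustration. Some slack is needed because a ping will never arrive at the exact second a post is stamped.

```python
from datetime import datetime, timedelta, timezone


def describe_timestamps(posted: datetime, received: datetime,
                        tolerance: timedelta = timedelta(minutes=5)) -> str:
    """Show 'received' only when it disagrees with the blogger's 'posted'.

    Both datetimes must be timezone-aware; they are shown in UTC.
    """
    posted = posted.astimezone(timezone.utc)
    received = received.astimezone(timezone.utc)
    label = f"Posted {posted:%Y-%m-%d %H:%M} UTC"
    if abs(received - posted) > tolerance:
        label += f" (received {received:%Y-%m-%d %H:%M} UTC)"
    return label


# Hypothetical entries: one honest, one backdated by more than a day.
honest = describe_timestamps(
    datetime(2006, 1, 10, 14, 28, tzinfo=timezone.utc),
    datetime(2006, 1, 10, 14, 30, tzinfo=timezone.utc),
)
backdated = describe_timestamps(
    datetime(2006, 1, 9, 8, 0, tzinfo=timezone.utc),
    datetime(2006, 1, 10, 14, 30, tzinfo=timezone.utc),
)
print(honest)
print(backdated)
```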
Hopefully this is something that will come about sooner or later. It’s a natural feature for the search engine, would be easy to add and would aid bloggers greatly.
It’s really a win-win.
In the end, this is just a solution to one problem and is not a “big picture” answer. Still, it’s a solution that can be implemented tomorrow with minimal resources and effort. It’s something that’s so logical and easy that, frankly, it’s amazing it hasn’t been done before.
It would be a huge service to bloggers and readers alike and could go a long way, not only toward helping with copyright disputes, but also toward improving the quality of, and trust in, the blogosphere.
Very noble causes to say the very least.
[tags]Plagiarism, RSS, Content Theft, Pings, Blogging, Blogs[/tags]