If I told you that I had a plagiarism solution that could prevent people from copying and pasting your work, let you watermark it to certify ownership, stop RSS plagiarism, keep sploggers from stealing your content and practically end scraping as we know it, you’d think the technique was a godsend.
That is, until I told you that the fix also slows down your site considerably, makes your text unsearchable and prevents large numbers of people from viewing your site.
Then you’d probably think twice about it.
However, that’s exactly the nature of Hidetext.net (HT), a free service that converts text, in both small and large chunks, into simple images that can’t be copied and can’t be Googled.
The Reverse OCR
The idea, on its face, is pretty effective. By converting text into an image, you prevent simple copy and paste of your work. Once the text is an image, you can take other steps to prevent theft, including watermarking it, restricting image hotlinking and so forth.
Thus, the ability to convert large quantities of text, up to 100,000 characters (roughly 20,000 words), into an image is very appealing to some, especially those most frightened by the idea of content theft. The pot is sweetened by the fact that the images can be hosted on HT’s server, thus eliminating the need for you to use your own bandwidth and resources to house them.
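To make the hotlinking point concrete, here is a minimal sketch of the kind of Referer check a server could run before handing out a generated image. It is not something HT is known to provide; the `ALLOWED_HOSTS` set and the `is_hotlink` function are hypothetical names used only for illustration.

```python
from urllib.parse import urlparse

# Hypothetical allow list: the hosts that may embed the image.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def is_hotlink(referer, allowed_hosts=ALLOWED_HOSTS):
    """Return True when the Referer header names a host outside the allow list."""
    if not referer:
        # Many browsers and privacy tools omit the header, so an empty
        # value is treated as a normal visit rather than a hotlink.
        return False
    host = urlparse(referer).hostname
    if host is None:
        # A header that does not parse as a URL is treated as suspicious.
        return True
    return host.lower() not in allowed_hosts
```

A server would call `is_hotlink(request_referer)` and return a placeholder or an error instead of the image when it is true. The Referer header is easy to forge or strip, so this discourages casual embedding rather than stopping a determined scraper.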
However, the system is far from perfect. Currently, there are only two fonts to choose from and no way to change the formatting of the image. The image you get is set at a specific width with black text on a white background. If your site calls for another look, you’re out of luck (right now at least).
Also, images are always less efficient than text in terms of file size. Transmitting an image made up of copy always takes more effort than transmitting just the text. This can cause increased loading times, higher bandwidth usage and problems for end users, especially those on slower connections.
Finally, any text displayed via this means will be hidden from the search engines. Though HT touts this as a feature, as it can be for instances where you don’t want the world knowing you said something, most webmasters live for search-engine friendly content and don’t want to hide their information from the search engines. Thus, as a full-fledged plagiarism solution, HT is overkill.
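A rough back-of-the-envelope sketch shows the scale of the difference. Every number here is an assumption made for illustration (80 characters per line, an 8x16 pixel cell per character, one bit per pixel, no compression); none of it describes HT's actual output format.

```python
def text_bytes(text):
    """Size of the text itself when sent as UTF-8."""
    return len(text.encode("utf-8"))


def raw_bitmap_bytes(text, chars_per_line=80, glyph_w=8, glyph_h=16):
    """Size of an uncompressed 1-bit image of the same text.

    The layout values are assumptions, not measurements of any real service.
    """
    lines = -(-len(text) // chars_per_line)  # ceiling division
    width = chars_per_line * glyph_w         # image width in pixels
    height = lines * glyph_h                 # image height in pixels
    return width * height // 8               # 1 bit per pixel, 8 pixels per byte


sample = "a" * 100_000                       # the 100,000-character upper limit
print(text_bytes(sample), raw_bitmap_bytes(sample))
```

Under these assumptions each character costs 16 bytes as a bitmap against 1 byte as plain text. Real image formats compress text-on-white images well, so the actual gap would be smaller, but the text version also compresses well in transit.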
However, HT was never designed to serve that purpose in the first place.
Even though the service isn’t very good as a general plagiarism solution, it can be useful for other things.
First off, it is great for hiding small amounts of text that you want people to have access to, but don’t necessarily want everyone searching and indexing. Email addresses, phone numbers, street addresses and so forth can be easily hidden using this method.
In fact, HT offers a very capable email hiding program just for that purpose.
Also, sections of text that have a high monetary value but do not need to be readily searchable can be hidden through this means. If you wanted to release a sample chapter of a book to the public, but didn’t want it to be copied and pasted randomly, this technique, combined with watermarking, could protect such static and important works.
Finally, it could also be useful for protecting important works that you don’t want distributed or changed. For example, the U.S. Copyright Office does something similar to this with its list of designated agents. The site itself is regular HTML, but the individual files are image PDFs. Though this was probably brought on by either necessity or convenience, it also adds a great level of protection against content theft. It prevents someone else from easily making a database of all the designated agents and redistributing it, while still allowing for some element of searchability.
Still though, as a general purpose anti-plagiarism tool, it’s overkill. Some static sites with shorter works, such as poetry ones, might get some use out of this, but blogs and other content-intense sites simply lose too much in too many ways to make this useful.
This doesn’t mean that it’s useless, far from it, just that it’s not the answer to the overarching questions.
It is important to note that converting a text work to an image does make it vulnerable to image plagiarism instead of regular text plagiarism. Sadly, image plagiarism is still the most common kind on the Internet today and, unless you take the time to protect your generated image, it’s as easy to steal as any regular copy.
Also, though there are ways to cloak text inside of a Web page, this clearly is going to produce a search engine penalty. With less content easily searchable, you’re less likely to get the traffic you want and, thus, your work is less likely to be read.
Finally, this method will thwart the text readers that blind users rely on to navigate the Web. Unless the entire copy is placed within the “ALT” attribute of the “IMG” tag, anyone using a text reader will not be able to read your text. This will effectively make your content inaccessible to anyone who is blind or otherwise unable to use a computer monitor.
In the end though, I do agree with many who say that, if you’re so worried about your text being stolen that you would seriously consider this, then maybe you shouldn’t be posting it on the Web. After all, the goal of the Web is to make the content available to a lot of people very quickly. This technique, however, attracts fewer people and slows them down when they arrive.
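As a minimal sketch of that workaround, the function below builds an “IMG” tag that carries the original copy in its “ALT” attribute. The function name and the file name in the usage line are hypothetical; escaping is handled with `html.escape` from the Python standard library.

```python
from html import escape


def img_with_alt(src, text):
    """Build an IMG tag whose ALT attribute carries the original text."""
    # quote=True also escapes double and single quotes, so the text
    # cannot break out of the attribute value.
    return '<img src="{}" alt="{}">'.format(
        escape(src, quote=True), escape(text, quote=True)
    )


print(img_with_alt("poem.png", 'Roses are "red" & violets'))
```

Keep in mind that any text placed in the attribute sits in the page source as ordinary text, so it can be copied again, which gives back some of the protection the image was meant to provide.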
Simply put, there is no magic bullet to plagiarism and it comes down to a decision. Do you post your works online knowing that some people will take them or keep them to yourself knowing that they’ll never be read?
Though we must fight plagiarism lest it become such an epidemic that legitimate artists and writers have no motivation to share their work, at least a small amount of it is, sadly, inevitable.
Looking for magic bullets only makes things worse and, though HT is a great tool, especially for free, it can’t do what it was never designed to do, much less what is physically impossible.
So, while it has its uses and certainly is a worthwhile tool for many things, it is not the answer.
That, sadly, will likely never come.