Using CSS to Thwart Content Theft

CSS Vinyl (Creative Commons License photo credit: tiagonicastro)

Mikey, a contributor at Rusty Lime, recently posted a very interesting idea for deterring content theft or at least frustrating those who would lift your articles.

The basic premise is to use CSS trickery to ensure that would-be plagiarists pick up an image or a block of text, one that most likely denounces the theft or provides a link back to the original site, even though readers of the original site see nothing at all.

It’s a simple idea that could deter or mitigate content theft and may help some Webmasters add an extra layer of protection against such misuse.

However, as simple as the system is, it does have a critical flaw that greatly limits its usefulness, especially for bloggers.

The Premise

The actual idea behind the technique is strikingly simple.

Cascading Style Sheets (CSS) instruct Web browsers how to display items on a page. Whether it is an image, a block of text or something else altogether, CSS can determine an element’s position, size and other display properties.

However, CSS can also be used to completely hide an element. Simply add the following line to your site’s CSS file, changing the class name to whatever you desire.

.hiddenclass { display:none; }

From then on, to hide anything from your visitors, you simply add the class name to the appropriate tag. For example, to hide an image, you might use this code.

<img class="hiddenclass" src="http://www.yoursite.com/hiddenimage.jpg" alt="" />

This will keep visitors to your site from seeing the content but, should anyone scrape the HTML code, the matching CSS rule will not exist on their site, causing the image or text to appear.
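The same trick can carry a visible warning or attribution. As a rough sketch (the class name, wording and URL below are placeholders, not the author’s actual markup):

```html
<style>
  /* On the original site, this rule hides the notice entirely */
  .hiddenclass { display: none; }
</style>

<!-- Invisible to your own visitors; rendered wherever the CSS is missing -->
<p class="hiddenclass">
  This article was originally published at
  <a href="http://www.yoursite.com/">www.yoursite.com</a>.
  If you are reading it anywhere else, it was copied without permission.
</p>
```

Anyone who scrapes only the raw HTML will not copy the stylesheet, so the paragraph renders normally on the copying site.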

Theoretically, this can be used to provide attribution to your own site only on pages that misuse the content. It is a potentially great way to punish spammers without putting any burden on legitimate users.

However, there is an issue with the technique that could make it impractical for many sites.

Fly in the Ointment

As exciting as the idea is, the problem lies with publishing via RSS feed. RSS readers do not load the site’s stylesheet and, as such, anyone viewing the content over the feed would see the “hidden” content as well.

One could remove the hidden content from the RSS feed, but that would make the technique useless against RSS scraping, which is the most common form of unwanted republication.

This means that anything you place in the hidden content needs to be something that can comfortably be displayed in the RSS feed as well, since feed subscribers will continue to see whatever you are trying to hide.

This prohibits you from using strong content theft warnings and other devices that might be tempting to use.

Illustrating the Problem

To help illustrate this point, I’ve added a special class to my site’s CSS file that will hide certain images. Below, I’m going to display the Plagiarism Today logo twice, first without and then with the CSS class.

Begin Visible

End Visible

Begin “Hidden”

End “Hidden”

If you are viewing this article on the site itself, the second image will be hidden and nothing will appear between the two lines. However, if you are reading it in the RSS feed, you should see the image twice.

This is a recurring problem for me on this site, as I use CSS rules to position the inline images but have to keep adding other code to ensure that they display correctly when viewed in RSS readers.
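One workaround along these lines, sketched below, is to duplicate the important presentation inline, since the inline style travels with the HTML even when a feed reader strips the stylesheet (the class name, file name and values here are placeholders):

```html
<!-- The class styles the image on the site; the inline style rides
     along in the feed, so most readers still float the image right -->
<img class="alignright"
     style="float: right; margin: 0 0 10px 10px;"
     src="http://www.yoursite.com/images/photo.jpg"
     alt="Inline photo" />
```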

Either way, please leave a comment on your experience with this test, what RSS reader you are using and what the outcome was.

Conclusions

There are potential uses for this system. It could be especially useful in environments where you can edit CSS files but not add plugins or otherwise manipulate the RSS feed. It may also help with situations where HTML scraping is a bigger concern than RSS scraping.

For most, this technique will not be very useful, but it is still a clever idea that might help some Webmasters better protect their content.

Even though it won’t do anything to actually stop plagiarists or other rip-off artists from using the work, it can mitigate the damage that they do and add a little frustration to their lives.

Of course, until RSS feeds get better support for CSS, this solution will remain an incomplete one. However, it is still a trick worth keeping in mind, if nothing else in case it becomes useful some day down the road.

30 Responses to Using CSS to Thwart Content Theft

  3. Will says:

    Hi Jonathan! Maybe I am not reading the post correctly, but I am on the site and I see both the visible and hidden images?

    This would be a great technique, but the flaw you point out makes it useless for the kind of message I would want to appear on a scraper’s site! hehe.

    But if you use it only for attribution, I would think there could be a way to come up with an image that would be unobtrusive to feed readers and provide a visual attribution link or two in the content on the scraper’s site.

    And if you could take it a step further and make the “hidden” content text and images long enough, it might get around duplicate content to a small extent? In other words, don’t just put in an image, but actually add some content that only appears in the feed, not on the original post. I don’t know enough about how duplicate content is judged to know if this would make a difference, maybe you do?

    -Will

    on edit: I don’t see a way to subscribe to a comment thread to get email notification of follow up comments. Did you remove it, or am I blind?

    • @Will
      No, you're not crazy or reading anything wrong. The server suffered a VERY nasty crash last night and I have not updated everything. The CSS sheet on the site is the backup from before when the article was written, should be fixing it soon, and the plugin that does the comment subscription is not activated yet. That is one of my next things to do.

      So it's all part of the damage to the crash. Fixing it as fast as I can.

      Thanks for your thoughts!

  6. jardel says:

    it’s a nice idea!

    other option should be this:
    use the hidden class even for rss readers and do a list based on your feed subscribers of where they read the blog, then go to htaccess and disable hotlinking for these readers. I don’t know, but might work, also i don’t know if there will appear a broken image or it will not appear at all.

    then you could do a text like “this site scrapped our content bla bla bla, if you are in a feed reader please go to http://url and ask for removal from your rss client” etc etc.

    Maybe desktop readers suffer seeing the image too.

    The best technique i’ve seen so far is the one that you put “blog by author (C) year – year” in the top, all with links and a related articles in the bottom. If someone scraps via rss, there will be a link for the blog, the author, the copyright disclaimer and more 3 or 5 links in the bottom for other articles in the same blog.

    • @jardel
      I like the idea of using htaccess but it would become a second job trying to keep up with where the readers were grabbing the feed. Some would be obvious, such as Google Reader, but every new news reader would have to be added.

      In the end, I think your idea at the bottom is best, just add the copyright notice and make it as "yours" as possible. It's not perfect, but it's something…

  10. Mikey says:

    Hi Jonathan. I’m glad you found my idea interesting. I might even implement it soon.

    Regards,

    Mikey.
    http://www.rustylime.com

  12. @Mikey
    Let me know how it works for you!

  14. Mike Sharp says:

    The problem with a CSS-based approach is that it affects accessibility, and isn’t very reliable. It also requires you to implement this on every page, and doesn’t protect images and non-html property.

    A better approach is to handle this server-side with an HTTP module (or the equivalent for your platform). This works by checking the HTTP referer header in the request, which tells your server where the user was when they made the request. If the referer header isn’t your domain, then they got the image (or other stolen content) from somewhere else. (btw, “referer” is misspelled, but that’s the way the HTTP spec shows it.)

    One way to handle hotlinked images is to replace the requested image with one that informs the viewer that they are seeing a stolen image.

    For example, here’s how to do it on Apache with mod_rewrite:

    http://www.jibble.org/myspace-hotlinking/

    Other platforms have similar approaches, and there are commercial products that can help with this as well.

    Hotlinking images is a real problem, since (copyright issues aside) it costs the victim money in terms of bandwidth. Thomas Scott wrote about this back in 2004 on AListApart:

    http://www.alistapart.com/articles/hotlinking/

    Regards,
    Mike Sharp

  16. Mike,

    I agree that the CSS solution has serious flaws but, then again, so does any DRM technique. The solution you present, for example, won't work for those who turn off referrers (which includes many of my privacy-buff friends) and will not work in all environments, as many bloggers don't have access to their server config files.

    That being said, I generally think technology solutions are a waste for many of the reasons you list; still, I discuss them for those who are interested. If you think that a technology-based approach is best, you need to decide what works for you in your situation with your needs.

    You definitely present a good idea for some and I agree it is superior in many ways, but I think every method will have its limitations.

    That's fair to say…

  18. Susan says:

    Dear Jonathan, I hesitated to put my website back up because people were stealing the content from my uncle’s book (used with permission). The copyright of the book was updated again during the 1990s. It would be wonderful for so many people to have the information concerning genealogy, with fun journal stories and pioneer courage. I hope I can keep studying this page and figure this out. Unfortunately, when I was uploading my pages, I avoided Java and CSS, thinking it was just too much to learn. Now, I need to learn to build the pages in the correct resolution and need to learn Java and CSS anyway. Thank you so much for your help! It will keep my family’s hard work safe and not used to make someone else money, by selling it as their own. I was just sharing out of love of my family and gave complete credit to those that worked so hard for our family to have. (((hugs))) Susan Lazenby Santa Paula, California

  19. I am sorry to hear about your problems in this area; if I can help in any way, please let me know.

    You’re very welcome for the help and please let me know if there is more that I can do!

  20. [...] August 20, 2008 Three articles of interest Posted by sharonb under Tips, Typography, Webdesign | Tags: copyright, CSS, hosting, plagerism, Typography |   Plagiarism Today has published an interesting CSS technique to counter or at least frustrate anyone who is scraping your site. Check out the article Using CSS to Thwart Content Theft. [...]

  21. [...] The idea for this comes from Plagiarism Today. [...]

  23. Eszter says:

    Well, it definitely doesn’t work for me, tried it with four different posts.

  24. Sorry to hear that it didn't work for you, but I'm not wholly surprised; I knew it was a very limited approach to the issue. If you need any help, send me an email and I'll see what I can suggest.

  26. Trevor says:

    There are other issues as well. Some content gets copied for innocent reasons like emailing to a friend or for research. Tynt’s Tracer reveals what content is being copied from your website and automatically adds an attribution link back to your original content if it is lifted from your site and pasted into an email, blog or website. This way you get the traffic and the credit. Visit http://www.tynt.com to sign up now.

    Trevor

    Do you know what is being copied from your site?

  27. genealogy says:

    Well, I want to guard my content, but it seems nearly impossible to prevent someone from stealing it.
