Distil: The Anti-Scraping Content Protection Network

I’ve talked a lot on Plagiarism Today about the dangers of scraping, including both RSS scraping, where someone copies the content in your RSS feed and, usually, republishes it elsewhere, and site scraping, where search-engine-like crawlers grab your site’s content for various purposes.

Defending against scraping, however, is incredibly difficult. Though some plugins and tools, like Bad Behavior for WordPress, and simple blocking of bots can help, they aren’t perfect or complete solutions and, in some cases, can deeply drain both your time and your site’s resources.

However, the team over at Distil thinks they have found a better way. By acting as an intermediary between the Web and your site, they claim to not only be able to filter out most scrapers and infringers, but also to speed up your site and improve its performance.

It works by combining their anti-scraping and bad-bot technology with a robust content delivery network. This enables them to not only filter out threats to your site, but also serve much of your static content quickly and from the servers located nearest to your visitors.

But is Distil worth the time and money? I decided to give it a trial and see what I found.

What is Distil?

The closest comparison one can make to Distil is Cloudflare, as both use DNS changes to better protect and speed up your site.

With Distil (or Cloudflare) you edit your DNS settings, which can usually be found at your domain registrar or in your site’s control panel, to direct visitors not to your server, but to a custom address from Distil. Visitors will then query Distil for your site, which first filters out any malicious users and then delivers any content it can from its own servers, which are spread all across the world. Anything it can’t deliver, it requests from your server and then passes along to the user.

The end result, if all goes well, is that most of the content of your site is delivered directly from Distil’s servers, which should be faster than coming from your own, and most malicious users, including scrapers, are filtered out before they ever reach your site or your content. Best of all, the process is completely invisible to end users (other than the potential speed increase).
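To make that flow concrete, here is a minimal sketch of the filter, cache, then origin sequence described above. It is a toy model and not Distil’s actual implementation: the `EdgeProxy` class, the `fetch_from_origin` and `is_malicious` callbacks, and the sample user agent string are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class EdgeProxy:
    """Toy model of a filtering CDN edge (an assumption, not Distil's code)."""

    fetch_from_origin: Callable[[str], str]  # hypothetical: asks your server for a path
    is_malicious: Callable[[str], bool]      # hypothetical: judges a User-Agent string
    cache: Dict[str, str] = field(default_factory=dict)

    def handle(self, path: str, user_agent: str) -> Tuple[int, str]:
        # Step 1: filter out unwanted clients before they reach the origin.
        if self.is_malicious(user_agent):
            return 403, "blocked"
        # Step 2: serve from the edge cache when a copy is available.
        if path in self.cache:
            return 200, self.cache[path]
        # Step 3: otherwise ask the origin server and keep a copy for next time.
        body = self.fetch_from_origin(path)
        self.cache[path] = body
        return 200, body


origin_hits = []


def demo_origin(path: str) -> str:
    # Stand-in for your own web server; records each request it receives.
    origin_hits.append(path)
    return f"content of {path}"


proxy = EdgeProxy(
    fetch_from_origin=demo_origin,
    is_malicious=lambda ua: "examplescraper" in ua.lower(),
)
proxy.handle("/about", "Mozilla/5.0")          # first visit goes to the origin
proxy.handle("/about", "Mozilla/5.0")          # second visit is served from cache
proxy.handle("/about", "ExampleScraper/1.0")   # blocked before touching the origin
```

In this sketch the origin only sees the first request, which is the point of the design: repeat visitors are served from the edge and blocked clients never reach your server at all.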

To find out if it works as advertised, I switched Plagiarism Today over to Distil last weekend and, as of this writing, have been using it for the better part of a week.

Setting Up and Using Distil

To start using Distil, you first have to sign up for an account and have it activated. Once that’s done, you’ll be given an address to which, using your DNS settings, you will point both www.domain.com and domain.com (as well as any other subdomains you want to redirect).

Then, after the DNS changes propagate, you should be using Distil’s service. From there, you can log into the Distil dashboard, which lets you configure a variety of options including:

  • Site Acceleration Settings (if available)
  • Rate Limiting
  • Blocking Known Violators
  • Blocking Bad User Agents
  • Browser Integrity Checks
  • Filter By Country
  • Block Bad Referrers
  • Whitelist/Blacklist
  • WWW/Non-WWW Routing
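To give a feel for what two of these options do, here is a rough sketch of rate limiting and bad user agent blocking. How Distil implements either is not something the dashboard reveals, so this uses a common sliding-window approach as an assumption; the class name, the limits and the blocklist entries are all hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock                  # injectable so it can be tested
        self._hits = defaultdict(deque)     # client IP -> timestamps of recent hits

    def allow(self, client_ip):
        now = self.clock()
        hits = self._hits[client_ip]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False                    # over the limit: reject this request
        hits.append(now)
        return True


# Hypothetical blocklist; a real service would maintain a far larger one.
BAD_AGENT_MARKERS = ("examplescraper", "contentgrabber")


def is_bad_user_agent(user_agent):
    """Return True if the User-Agent contains a blocklisted marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BAD_AGENT_MARKERS)
```

A human reading a few pages a minute will never trip a limit like this, while a crawler pulling hundreds of pages from one address will, which is why turning the setting up gradually is a reasonable way to find a threshold that doesn’t bother real visitors.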

You also get a bevy of statistical data including information about the number of unique sessions, the total number of requests, total human requests and the total bot requests. Bot requests are then further broken down by the number of search engine requests (which are always allowed) and the number of blocked requests (as well as the reasons for being blocked). The blocked bots are then further broken down by bot type, IP address and more.
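The breakdown described above amounts to a nested tally. As a small sketch of the idea, the code below groups requests the same way; the record format, the category labels and the sample data are invented for illustration and do not come from Distil’s dashboard.

```python
from collections import Counter


def summarize(requests):
    """Tally (kind, block_reason) records into the categories described above.

    kind is one of "human", "search_engine" or "blocked_bot" (hypothetical
    labels); block_reason is only meaningful for blocked bots.
    """
    totals = Counter()
    block_reasons = Counter()
    for kind, reason in requests:
        totals["total"] += 1
        if kind == "human":
            totals["human"] += 1
            continue
        totals["bot"] += 1
        if kind == "search_engine":
            totals["search_engine"] += 1   # search engines are always allowed
        else:
            totals["blocked"] += 1
            block_reasons[reason] += 1
    return totals, block_reasons


# Made-up sample data, not real traffic.
sample = [
    ("human", None),
    ("human", None),
    ("search_engine", None),
    ("blocked_bot", "bad user agent"),
    ("blocked_bot", "rate limit"),
    ("blocked_bot", "bad user agent"),
]
totals, block_reasons = summarize(sample)
```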

The result is that you get an overall perspective of what’s going on with your site, both in terms of human traffic and, more directly, the security threats you’re facing.

But does that make Distil worth trying? A lot of it depends on your needs and what you’re looking to get out of it.

The Good of Distil

The one thing that immediately struck me about Distil is the granular level of control it gives you over security issues. Though Cloudflare offers a good deal of site security, it’s focused on spammers and attackers and only lets you set a broad level of security (low, medium, high or basically off). With Distil, you can set individual options to your liking both to target the threats most relevant to your site and, more importantly, make sure you don’t interfere with legitimate users.

[Image: Distil settings]

Over the past few days I’ve had no reports of legitimate visitors being hassled by Distil, something that was an occasional problem with Cloudflare, especially for visitors from outside the U.S. and Europe.

So, even though Distil did not block as many bots as Cloudflare (likely because I have the security settings for most features turned down or off), it did a better job staying out of the way and still seemed to stop the most egregious offenders. Over time, I plan on slowly increasing the settings to see if they block more and continue to be non-intrusive.

Beyond security, my first concern after switching to Distil was that my site might take a performance hit. Having been a Cloudflare user for many months, I was used to the power of a robust CDN. However, I did a series of tests both before and after the change and found that Distil was usually slightly faster than Cloudflare, often shaving off 30% of the site’s loading time.

Compare these two example results, first before:

And then after:

[Image: PT Distil test]

(Note: While this example isn’t an apples-to-apples test due to differing endpoints, the results were consistent regardless of endpoint. Also, obviously there were other changes made in the four days between the tests, though no major alterations, frontend or back, were made.)

Finally, the support team at Distil is, simply put, the best of any company I’ve worked with. They answered every question I had very promptly, usually within 15 minutes and it didn’t seem to matter what time of the day I was asking it. This enabled me both to get my site set up quickly with Distil despite some confusion and questions and deal with an issue with Google Analytics (that turned out to be my own fault).

All in all, Distil did a good job in providing granular security control, a site performance boost and great support.

The Problems with Distil

The biggest initial problem with Distil is that, in its current form, it is not very simple to use. Not only do you have to wait for your account to be activated by a human, but the process of switching over your DNS is not as straightforward as it is with Cloudflare.

If you aren’t comfortable working with DNS and aren’t familiar with how to edit CNAME and A records, the process is going to be intimidating. Sadly, unlike Cloudflare, there isn’t a great deal of hand holding unless you contact support. While I agree with Distil that it’s better not to hand over total DNS control to a third party, as you have to do with Cloudflare, it’s also the much more difficult route for the user.

Another issue I have with Distil is the current pricing structure. The free account, which does not have content acceleration, offers only 5 GB of traffic per month, an amount even a modest blogger will likely blow through quickly. A site of Plagiarism Today’s size fits (barely) under the cap for the small account, which offers 50 GB of transfer for $29 per month. However, Cloudflare’s free plan allows for unlimited traffic and its pro account, which offers additional statistics and monitoring, is only $20 per month. Other CDNs, such as MaxCDN, charge only $50 for 1 TB (1000 GB) of data.
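To put those figures on a common footing, the arithmetic below works out the price per gigabyte for the two plans that quote both a price and a transfer amount. It uses only the numbers given above; the plan labels and the `cost_per_gb` helper are just names chosen for this sketch, and Cloudflare is left out because its plans are not priced by transfer.

```python
# (price in USD, included transfer in GB), taken from the figures quoted above.
plans = {
    "Distil small": (29.0, 50),
    "MaxCDN": (50.0, 1000),
}


def cost_per_gb(price_usd, transfer_gb):
    """Price divided by included transfer."""
    return price_usd / transfer_gb


for name, (price, gb) in plans.items():
    print(f"{name}: ${cost_per_gb(price, gb):.2f} per GB")
# Distil small: $0.58 per GB
# MaxCDN: $0.05 per GB
```

On transfer alone, that makes Distil’s small account more than ten times as expensive per gigabyte, though the comparison ignores the security features that MaxCDN’s price doesn’t include.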

Distil told me that they are considering restructuring their pricing in the coming weeks, a move that, most likely, will help with this problem.

For now at least, Distil is a terrible deal as a CDN, though its security features may help to make it more compelling to webmasters concerned about scraping and content misuse.

Finally, Distil, obviously, won’t be able to help with at least some kinds of scraping. RSS scraping likely won’t be blocked unless the bot doing it is already in the system and it is unclear just how many are. However, if you know the bot you can add it yourself in your control panel. Also, any human copying won’t be blocked because the system is designed precisely to allow humans to access your site.

Despite these limitations, there are still a lot of webmasters who would likely benefit from Distil, even if that number could be a great deal larger down the road.

Bottom Line

Distil isn’t perfect. It’s a new company and its product certainly has its share of flaws. Right now, it’s aimed at a fairly niche market of webmasters who are technically savvy, want a great deal of granular control over their site’s security and are willing to pay extra to make it happen.

However, with some changes to its setup procedure, pricing and control panel, it could become a compelling option for many more sites.

In short, Distil is going to be a company to watch in the coming months and years. As it refines its tools and pricing, it could become a major force for helping content creators protect their work.

In the meantime, though, webmasters who just want a CDN to improve their site’s performance will, most likely, want to look at other solutions, such as Cloudflare and MaxCDN, as they are significantly cheaper and, in the case of Cloudflare, provide better analytics, easier setup and some decent, if simplified, security features.

Still, if you’re in Distil’s niche, which is likely to grow, I can see why it would be a very powerful solution to a complex problem.

2 Responses to Distil: The Anti-Scraping Content Protection Network

  1. wpguidance says:

    Oh, they restructured their pricing alright: no more free or $29/month plans. Pricing now starts at $200/month for a max of 5 sites. How can they ethically jack their prices up like that in only a matter of months?

    • Distil says:

      Hi wpguidance – Our goal is to deliver insanely great solutions that help websites prevent content theft and web scraping. You’ll have to forgive the variances between the time of this article (January) and now (nearly August).

      Since the time this article was written, our development team has been working around the clock to produce a level of content protection that has not been available in the market. Our network has undergone significant upgrades and expansions, which has improved the performance, behavior analysis, threat detection, and several other features.

      So we’re really proud to offer this level of service and protection, while keeping the cost low. That being said, we do realize the challenges facing smaller blogging and personal websites. And we are currently working on a partnership to make this enterprise-ready solution available to those websites. If that’s of interest to you, let us know.

      Or you can contact me directly: Sean [at] Distil.it

      Best of luck on your site,
      Sean Harmer
