An Inside Look at iCopyright Discovery

By Jonathan Bailey • Sep 30th, 2008 • Category: Articles, Products

Earlier this month, I reported on iCopyright’s new content tracking tool Discovery. At that point, I only had the information provided in the press release for the service.

However, last week, Mike O’Donnell, the President and CEO of iCopyright, was kind enough to give me a guided tour of the backend. I wasn’t able to access anything hands on or experiment with the technology on my own content (that will have to wait until the service is available for iCopyright for Creators users), but I was able to see what the service does, how it works and what it can do.

So here is a brief look at what the iCopyright Discovery system can do and how it will likely look when it is available for Creators users shortly. Please bear in mind that this is not a review, just a tour of the key features of the service.

The Basic Premise

The big idea of Discovery is this: Discovery parses your content as you put it up on the Web, accessing either a created XML file or your RSS feed, and then searches for copies of it on the Web.

The service then highlights the matches it determines to be the most important and gives you options for remedying the situation. Among the actions it can perform are removal requests, which are fundamentally DMCA notices, license requests, which go through iCopyright’s existing licensing system, and forwarding to legal counsel.

This idea is fundamentally very similar to Attributor and Blogwerx, both of which are still in private testing. However, the execution of the system is going to be what is important. On that front, iCopyright has devised an interesting workflow system that seems to string the process together very well.

Setting Up Discovery

When a user first signs in to Discovery, the first page they’re likely going to head to is, oddly enough, the “Settings” page. The reason for this is that, without visiting the settings page, you have little control over the matches you see and you can’t use several of the remedy options.

From this page, you can set your enforcement agency, useful if you are part of a group that handles your copyright enforcement, and the email address of your legal counsel. This will let you enable additional redress steps down the road. However, the most important settings are the search sensitivity and risk assessment as they determine the matches you see down the road.

The search sensitivity feature allows users to tell Discovery how many matches they want. They can set it so that only the worst matches appear in the system or so that they see almost everything. This is done by tweaking the minimum match ratio, meaning how much of the original work must appear in the copy, the minimum risk factor, discussed below, the minimum site activity and the minimum number of copied words that must appear in the match, useful for sites with short posts.
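To make the threshold idea concrete, here is a minimal sketch in Python of how four minimums could filter a list of matches. It is not iCopyright's code: the `Match` and `Sensitivity` classes, the field names and the 0-to-1 scales are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Match:
    url: str
    match_ratio: float   # share of the original work found in the copy (assumed 0.0-1.0)
    risk: float          # risk factor for the site (assumed 0.0-1.0)
    site_activity: int   # estimated visitors (hypothetical unit)
    copied_words: int    # number of copied words found in the match


@dataclass
class Sensitivity:
    min_match_ratio: float
    min_risk: float
    min_site_activity: int
    min_copied_words: int


def passes(match: Match, s: Sensitivity) -> bool:
    """A match is shown only if it meets every minimum."""
    return (
        match.match_ratio >= s.min_match_ratio
        and match.risk >= s.min_risk
        and match.site_activity >= s.min_site_activity
        and match.copied_words >= s.min_copied_words
    )


def visible_matches(matches: list[Match], s: Sensitivity) -> list[Match]:
    """Return the matches that survive the sensitivity settings, in order."""
    return [m for m in matches if passes(m, s)]
```

Raising any one minimum shrinks the list, and setting all of them to zero shows almost everything, which matches the two extremes described above.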

The Risk Assessment tool is easily one of the most interesting features in iCopyright Discovery. It lets users set the criteria for determining how much of a risk a match site is. You do that by setting sliders for Unique Visitors, which looks at the estimated traffic of the site, the number of inbound links, whether the site displays ads and how much of the content it copies.

These sliders are intended to be abstract in nature and are used to indicate which attributes are more important than others. For example, if you set all to 10, they would be weighted equally. However, if you put one at 5 and the others at 10, the first one would be weighted much less.

These attributes, when combined with the site’s actual use of the content, are used to determine the risk level of the site itself. This, in turn, plays a major role in determining the priority the site is given when analyzing suspect pages.
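One simple way to read the sliders is as weights in a weighted average. The sketch below assumes that interpretation; iCopyright did not share the actual formula, and the signal names and the 0-to-1 normalization are hypothetical.

```python
def risk_score(signals: dict[str, float], sliders: dict[str, float]) -> float:
    """Weighted average of normalized signals (each assumed to be 0.0-1.0).

    `sliders` holds the relative weight chosen for each signal. Only the
    ratios between sliders matter, so all-10 and all-5 give the same score.
    """
    total_weight = sum(sliders[name] for name in signals)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(signals[name] * sliders[name] for name in signals)
    return weighted_sum / total_weight


# Hypothetical site: heavy traffic and ads, few inbound links, small copy.
signals = {
    "unique_visitors": 1.0,
    "inbound_links": 0.0,
    "displays_ads": 1.0,
    "copied_share": 0.0,
}
equal = {name: 10 for name in signals}
traffic_matters_less = dict(equal, unique_visitors=5)

print(risk_score(signals, equal))
print(risk_score(signals, traffic_matters_less))
```

With every slider at 10 the four signals count equally. Dropping the Unique Visitors slider to 5 halves that signal's pull on the result, so a site whose main risk is its traffic scores lower.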

Sorting Matches

Once you are done telling Discovery what matches you want to see, the system then does a refresh, which takes about an hour according to O’Donnell, and you can then view your matches or “suspects”.

The match sort is organized by a combination of variables, focusing heavily on suspect pages with the highest risk. For each suspect, the system displays the URL of the work, whether it displays ads, whether it links back to your site, roughly how many visitors it gets, the number of inbound links to the site, the match percentage and the risk.

From this page, you can go through the matches and either archive the match, which functions similarly to Gmail’s archive function and takes no action, move it to the Whitelist, either pending or approved, or send it to the redress list.

If a site is moved to the whitelist, that means that the use is licensed and future matches from the site will be ignored. You have the option of telling the system to either ignore matches on the URL, the subdomain or the entire domain.
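The three whitelist scopes can be sketched as three levels of URL comparison. This is an illustration only: the function names and scope labels are hypothetical, and the domain check naively takes the last two labels of the hostname, which is wrong for suffixes such as "co.uk". A real system would need a public suffix list.

```python
from urllib.parse import urlsplit


def registered_domain(host: str) -> str:
    # Naive assumption: the domain is the last two labels of the hostname.
    return ".".join(host.split(".")[-2:])


def is_whitelisted(url: str, entry_url: str, scope: str) -> bool:
    """Check a new match against one whitelist entry at the chosen scope."""
    candidate = urlsplit(url)
    entry = urlsplit(entry_url)
    c_host = (candidate.hostname or "").lower()
    e_host = (entry.hostname or "").lower()

    if scope == "url":
        # Only this exact page is ignored.
        return (c_host, candidate.path) == (e_host, entry.path)
    if scope == "subdomain":
        # Every page on the same hostname is ignored.
        return c_host == e_host
    if scope == "domain":
        # Every hostname under the same domain is ignored.
        return registered_domain(c_host) == registered_domain(e_host)
    raise ValueError(f"unknown scope: {scope}")
```

For example, whitelisting `http://blog.example.com/post-1` at the subdomain level would also cover `http://blog.example.com/post-2`, but not `http://news.example.com/`, which only the domain level would catch.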

If you move it to the redress list, you can then take further action on the match, including licensing the work or filing a removal demand.

Taking Action

The redress list, as you see below, looks very similar to the suspect list and contains much of the same information. However, the options for what one can do with a suspect are different on this page.

From this page, you can then either offer the site a license, which will send out an email encouraging the site admin to go through the existing iCopyright system, file a link request or send a removal notice.

Removal notices, fundamentally, are DMCA notices though they are written so that, at this stage, they can be sent to Webmasters directly. Link requests are more like informal license offers, but ones where the only stipulation is a link back.

All of the letter types are fully customizable and Discovery offers a templating system that lets you build your own letter that automatically inserts the necessary information.
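I did not see how Discovery's templates are written, but the general idea of a letter with automatically filled fields can be sketched with Python's standard `string.Template`. The placeholder names and the wording of the letter are my own assumptions, not Discovery's.

```python
from string import Template

# Hypothetical link request letter; $names are filled in per match.
LINK_REQUEST = Template(
    "Dear $site_admin,\n\n"
    'Your page at $copy_url reproduces my article "$title", '
    "originally published at $original_url.\n"
    "I am happy for it to stay up if you add a link back to the original.\n\n"
    "Regards,\n$author\n"
)


def build_letter(template: Template, fields: dict[str, str]) -> str:
    """Fill the template; raises KeyError if a required field is missing."""
    return template.substitute(fields)


fields = {
    "site_admin": "Site Administrator",
    "copy_url": "http://copy.example.net/stolen",
    "title": "An Example Article",
    "original_url": "http://original.example.com/article",
    "author": "A. Writer",
}
print(build_letter(LINK_REQUEST, fields))
```

Using `substitute` rather than `safe_substitute` is deliberate here: a letter with an unfilled placeholder should fail loudly instead of being sent.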

Once you file a redress, you can then track the status of it in the Redress Offers Status page. From there, it will let you know if the redress has been completed and, if it hasn’t, makes it available to be escalated.

If a suspect match is moved to the escalation list, then the user has a whole new series of options for how to deal with the site.

The options include the ability to forward the situation to your legal counsel (if set up), notify the ISP, which sends a more traditional DMCA notice, notify the enforcement agency (if set up), send a notice to the ad network or demand removal from the search engines.

All in all, the initial Redress List can be looked at as the cease and desist/licensing phase while the Escalation List deals more with the DMCA/lawyer phase.

However, no matter what redress steps you take, Discovery offers a powerful means to track and monitor the progress of the steps that you took.

Tracking and Monitoring

Once you’ve taken a redress action against a suspect site, you can then track and monitor everything that has to do with that particular match.

It provides much more than just a brief history of what has taken place, giving a detailed history of every email sent, comments left in the system, both automatic ones and ones left by the user, as well as other information about the site.

The idea is to maintain a record of every action, including emails, phone calls and other steps, for the purpose of aiding in any potential legal case.

Once the matter is resolved, escalated outside of the system or the match is whitelisted, the case can be archived and thus removed from the suspect pool, allowing you to move on to other matches.
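Taken together, the suspect list, redress list, escalation list and archive behave like stages a case moves through, with a running log attached. The sketch below models that as a small state machine. The stage names and allowed moves are my reading of the tour, not a documented part of Discovery, and the `Case` class is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    SUSPECT = "suspect"
    WHITELISTED = "whitelisted"
    REDRESS = "redress"
    ESCALATED = "escalated"
    ARCHIVED = "archived"


# Assumed moves, based on the workflow described in this article.
ALLOWED = {
    Stage.SUSPECT: {Stage.ARCHIVED, Stage.WHITELISTED, Stage.REDRESS},
    Stage.REDRESS: {Stage.ESCALATED, Stage.WHITELISTED, Stage.ARCHIVED},
    Stage.ESCALATED: {Stage.WHITELISTED, Stage.ARCHIVED},
    Stage.WHITELISTED: {Stage.ARCHIVED},
    Stage.ARCHIVED: set(),
}


@dataclass
class Case:
    url: str
    stage: Stage = Stage.SUSPECT
    history: list[tuple[str, str]] = field(default_factory=list)

    def log(self, note: str) -> None:
        """Record an email, phone call or comment against the current stage."""
        self.history.append((self.stage.value, note))

    def move_to(self, target: Stage, note: str) -> None:
        """Move the case to a new stage and record why."""
        if target not in ALLOWED[self.stage]:
            raise ValueError(f"cannot move from {self.stage.value} to {target.value}")
        self.stage = target
        self.log(note)
```

Keeping every note in `history`, including ones that do not change the stage, is what would make the record useful later in a legal case.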

Some personal thoughts

It is very hard for me to offer any real review of the service. Without actually being hands on with the service and using it against my own content, there is not much that I can do.

Right now there are many unknowns for me, including the following:

  1. Match Detection: O’Donnell has said they are partnering with a major search provider to perform the detection but it remains to be seen how effective it is. Match detection is not easy, even with a big search partner, as Copyscape showed. The system will not be of much use if its match detection is not the best in its class.
  2. Resolution Assistance: The hardest part about stopping a plagiarist is not composing the letter, but finding who to send it to. It is easily the biggest time sink in most of my cases and is the number one reason people approach me for help. It remains to be seen how effectively Discovery helps with this process.
  3. Speed/Usability: Obviously, without actually using the system, I can’t tell how fast it moves and how much time it will save you. If the system is sluggish or error-prone, it could greatly hurt its usefulness.

This is not to say that these things are wrong with the current system, just that I don’t know right now and won’t until I can do a full review, likely later this year.

However, judging from what I can see, the system is very impressive. It looks very good, has a solid workflow built into it, though I somewhat disagree with having the ISP step be only available in the escalation section, and seems to be built with the user in mind.

What I like best about Discovery is how the user customizes the system to fit their needs, with their own definitions of what matches to worry about, their own letters and their own general strategy. Any such system should focus on automating what can be automated, but leaving the big decisions to the copyright holder.

What does worry me some is that the system is clearly geared toward larger clients. Discovery is designed to allow for multiple users to access an account and to work with attorneys as well as other rights enforcers. While those are great features for those that need them, it remains to be seen how the system will strip down for smaller copyright holders.

The other downside is that, according to O’Donnell, the version of Discovery for Creators will come with some kind of fee. Though pricing structure has not been discussed, he seemed confident that it would not be available for free.

Still, as these screenshots show, there is a lot to like in the Discovery system and the solution it promises.

It has a great deal of potential and Webmasters who are worried about tracking how their content is used should definitely take a serious look at what iCopyright has to offer.

Conclusions

There’s a lot of reason for me to be excited about the upcoming Discovery system. However, I have to restrain that excitement until I can use the system first hand and see both how effective it is and how smooth the process is.

No matter what though, I am happy to see that people are thinking about these issues and coming up with solutions. This has been a booming industry over the past few years, and a lot of very smart companies are already involved. I am happy to be working in this field.

No matter what Discovery itself brings, it can only signal great things for copyright holders and Webmasters. Hopefully, this will help content creators not just enforce their rights, but understand how their work is being reused and encourage the kind of sharing that helps all involved.

Knowledge and tools can only help improve things, so long as those who use them do so wisely.


Jonathan Bailey is The Webmaster and author of Plagiarism Today, which he founded in 2005 as a way to help Webmasters going through content theft problems get accurate information and stay up to date on the rapidly-changing field. He is also a consultant to Webmasters and companies to help them devise practical content protection strategies and develop good copyright policies.

  • Jonathan, a thorough and balanced write-up as always. We can set you up in Conductor, the iCopyright system for Publishers. That would allow you to use Discovery on your content. We do hope to port Discovery to the Creators system in the near future. You're right, for now it is limited to publishers who supply us with an XML feed. A couple of follow up points:

    Match Detection -- we do our own "fingerprinting" of the content. We do use a major search engine to find matches. No need to reinvent the wheel. The big search engines have indexed more pages and have better spiders than we could build.

    Resolution Assistance -- I think Discovery really shines here. It captures various points of contact for the site and allows notices to be sent to some or all of these contacts. Discovery will find the right people. At a minimum, it will find the host ISP and serve them.

    Speed/Usability -- the speed of identifying matches and sending redresses and following up to see if the site took the required action is very good. Where Discovery could use some improvement is doing this automatically so that the publisher does not have to review and act on each suspect individually. We are working on letting the publisher pre-define rules and policies for letting Discovery ID the sites, send redresses and take escalation action when appropriate, without human intervention.

    The objective of Discovery is to verify legitimate users and to identify non-legit users so that they become legitimate users. It's not as much about getting sites to stop using content -- although Discovery can do that. It's about enabling sites to use content in a way that compensates the publisher, gives them credit and brings them new traffic. A license or a link action is more valuable than a take-down action!
  • I would definitely be interested in using Discovery on my content. I really like what I see so far but until I use it first hand it is hard to tell. Thank you for answering my questions. I'll be in touch about setting up a Discovery account for myself to do a hands-on review.

    Regarding match detection, I agree that there is not much point in reinventing the wheel but, at the same time, I'm not ready to call search a solved problem. Any time you partner with a third party search, as I found out using other products, you share the limitations they have. There's good and bad to that approach, though usually the good does outweigh the bad.

    Resolution Assistance is a tough art in general. This is one thing I'll be looking at closely. I have a pretty big virtual Rolodex of DMCA agents that I've compiled over the years. If this can be worked out and automated, it will be worth almost any price.

    As far as speed goes, I think the main goal right now is to be faster than doing it by hand and, barring any major server issues, I think you will be that. However, I get nervous when I hear about people automating resolution efforts. That is how you get problems such as the YouTube debacles and the recent AP Drudge Retort controversy. I guess I'm just asking that you move with caution into that area.

    Finally, I agree that links and licenses are more valuable. The only issue right now is that there is no legal system. With my personal resolution efforts, my link request efforts have averaged about 50% resolution, DMCA about 95%.

    Hope that helps!