Linkrot Killing Blogger Citation?

When Liz from I Speak of Dreams posted a comment to my Citation Culture Clash article, she made a very interesting point that I had overlooked.

As the entry pointed out, hyperlinks have become the standard for citation in the online world, especially for bloggers. This format of citation has become so entrenched that Creative Commons even integrated a Uniform Resource Identifier (URI) requirement into the legal text of its licenses, codifying the practice.

Traditionalists, on the other hand, have often balked at the simple hyperlink as a means of attribution, looking instead to requirements put down by groups such as the MLA, despite the obvious advantages of linking cited pages.

But even though hyperlinks are much easier for both author and user, they do present a difficult challenge: Linkrot.

For, while nothing is ever truly deleted from the Internet, it doesn’t necessarily stay in the same place. A link that provides perfect attribution today might, without anyone knowing, point to nothing at all tomorrow.

The Problem of Linkrot

The term "permalink" is something of a misnomer. While it’s true that the link will most likely remain valid long after the entry or item has slid off of the main page, it’s a far cry from permanent.

Sites close down, move to new locations, change their directory structure and remove content every day. While a permalink will likely be around for weeks or months, whether or not it will still be there years later remains debatable. While that might be fine for articles and entries only likely to be relevant for a few months, research papers and static sites that could be around for years need to take note.

The simple fact is that the older a piece gets, the fewer working links it will have. This severely reduces the effectiveness of hyperlink citation over time.

In short, authors get no attribution, save what is in the article itself, and reap no benefit from having their work reused. Furthermore, users are frustrated and authors using cited works lose their supporting evidence.

It’s a losing situation all around.

Beating Linkrot

The most obvious solution for beating linkrot is for Webmasters to simply not let links go bad. If Webmasters never closed their sites, forwarded traffic when they changed URLs and generally never let links die, there would be no linkrot problem.

On the other hand, if authors citing works meticulously maintained their links and followed up swiftly on bad ones, the problem would be minimal at worst.

Of course, neither solution is practical. Sites will always go down and there will always be too many links for anyone to patrol effectively. Even with link checking software to automate detection, updating and maintaining hundreds or thousands of links can be prohibitively time-consuming.
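To give a sense of what the detection half of that job involves, here is a minimal sketch of a link checker in Python. It is an illustration only, not any particular tool: the names `extract_links`, `http_status` and `find_dead_links` are hypothetical, and it assumes that any response outside the 200-399 range (or no response at all) counts as a dead link.

```python
from html.parser import HTMLParser
import urllib.error
import urllib.request


class LinkCollector(HTMLParser):
    """Collects the absolute http(s) targets of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip in-page anchors, mailto: links and relative paths.
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def extract_links(html):
    """Return the outbound links found in a piece of HTML, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def http_status(url, timeout=10.0):
    """Return the HTTP status for url, or 0 if no response was received."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, e.g. with 404 or 410
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout and so on


def find_dead_links(html, fetch_status=http_status):
    """Return every link whose status falls outside the 200-399 range."""
    return [url for url in extract_links(html)
            if not 200 <= fetch_status(url) < 400]
```

Calling `find_dead_links(post_html)` on the HTML of an entry would return the URLs that need attention. Note that this only finds the bad links; deciding what to replace each one with is still manual work, which is exactly the part that does not scale.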

Sadly, there are no 100% effective ways to prevent linkrot, at least not right now. However, there are at least two ways to reduce the problem and, potentially, ease the frustrations.

Two Potential Solutions

The first and most obvious way to deal with the problem of linkrot is to use the Internet Archive (or, to a lesser degree, the Google cache) to help you maintain a cache of your cited pages.

Since the Internet Archive uses a standard format for its links, the process could be done automatically. For example, if I wanted to link to my own "Citation Culture Clash" article, I could do it in the following format: Citation Culture Clash (a)

In the example above, the link remains as it would normally with the addition of an "(a)" to the side for the archived link.

However, the example above also illustrates a critical flaw in using the Internet Archive for this purpose: it doesn’t grab everything. That is especially true for new content and some dynamic content. In fact, as of right now, nothing from Plagiarism Today is in the Internet Archive at all (though all of my other sites have been indexed fine).
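Because the archive's link format is predictable, generating the "(a)" link could be automated. The following Python sketch shows one way it might be done. The function names and the `example.com` address are hypothetical, and it assumes the Wayback Machine's `web.archive.org/web/<timestamp>/<url>` addressing, where a `*` in place of the timestamp lists all available captures.

```python
from html import escape

# Assumed base address for Wayback Machine captures.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url, timestamp="*"):
    """Build an archive link for original_url.

    timestamp is a YYYYMMDDhhmmss string; "*" asks for the list of captures.
    """
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


def citation_html(title, original_url, timestamp="*"):
    """Render 'Title (a)': a normal link followed by a small archive link."""
    archived = wayback_url(original_url, timestamp)
    return (
        f'<a href="{escape(original_url, quote=True)}">{escape(title)}</a> '
        f'(<a href="{escape(archived, quote=True)}" title="Archived copy">a</a>)'
    )


print(citation_html("Citation Culture Clash", "http://example.com/post"))
```

Building the link does not guarantee that a capture exists, so the caveat above still applies: if the archive never crawled the page, the "(a)" link leads nowhere useful.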

A more refined solution would be to use WebCite to create cached versions of pages that you wish to reference. In that case, the link would look something like this: Citation Culture Clash (a)

In that example, the archive link works fine, pointing to a custom-made cached version of the original work. The frame around the archived version even contains critical information including a link pointing to the original, the date the cache was created and information about WebCite itself.

Unfortunately though, WebCite requires the author to manually input links he or she wishes to cache. This can be a very time-consuming process and, since the returned links are mostly gibberish, can also be an organizational nightmare. Though their bookmarklet reduces much of the burden, automation, such as through a WordPress Plugin, would be needed to make the process efficient enough for bloggers to seriously consider.

Of course, neither of these solutions addresses what happens if either the Internet Archive or WebCite closes down. Though both have been very stable long-term establishments with no signs of going anywhere, the danger is always there.

But while neither of these methods can eliminate linkrot, they can certainly reduce it and its impact on readers, authors and researchers alike. A link format like the one described above would continue to drive almost all of the traffic to the original author, but also provide users with an alternative link in the event that the original one is down for some reason.

It’s not a perfect solution, but it’s at least a start.

Conclusions

While linkrot certainly is a major concern when citing sources via hyperlink, it should not be the end of the practice. There are already very simple ways to reduce and nearly eliminate the problem. While new software and new conventions may need to be drafted in order to encourage widespread use, the solutions and formats outlined earlier provide at least a foundation to start the dialog.

But even if we aren’t able to solve the problem of linkrot, there’s no reason to believe that switching to a more traditional style would offer any improvement. As almost any teacher knows, even with a complete research library and proper MLA citations in hand, there’s no guarantee that a work cited will be found. Books, magazines and journals get misfiled, removed and destroyed the same as links.

Sadly, the problem with linkrot isn’t so much a sign of a lack of permanence on the Web, but rather, a lack of permanence in information itself.

No matter what format you use, there’s a decent chance that, some day, your work will outlive some or all of the work cited in it. It’s a sad possibility every author should be ready for and at least try to prevent.



I dealt with the same problem in a pair of posts on my blog, Weblog Etiquette vs. Link Rot and Linkrot, Part Deux. The solution I've come up with basically boils down to: Don't just link to your source, quote it and identify it. Even when I'm just tossing in a link, I try to either put the full title in the TITLE attribute (as Sergey suggested) or put a representative sentence in an HTML comment. If the site's moved, and I have a verbatim quote, I can often search for that quote and find the new location or a mirror. Unfortunately, I still have a bunch of links to dead stories from Associated Press, Reuters, CNN, etc. Ironically, links to personal blog entries seem to live longer on average than links to news articles, since news sites want you to pay for archive access and therefore remove their old articles.

You could also use the link's "title" attribute to specify the work/author. Those will stay even after the link is rotten.

Did you know that Numly.com offers updatable permalinks in addition to copyright disclaimers? It's easy to use; just follow the steps below:
1) Simply register your work with Numly.com.
2) Receive a Numly Number which you can optionally add to your work with a verification link for copyright protection.
3) Give out the following permalink to your article: http://go.numly.com/yournumlynumber
4) If at any time in the future your URL changes, simply update your Numly Number's reference URL on file with Numly and your content can always be tracked back to the source - even if it has had several URLs over time.
Sign up today at Numly.com!

Thanks for the mention! In some of my posts, I've used material from small-town newspapers. I quote heavily--sometimes the whole article--from those sources. Why? They are less likely to have robust on-line archives. I didn't start my blog as a scholarly enterprise. I have learned as I've gone on. I'm aiming to get more precise about citations as I write. For example, including the reporter's name and the date a given article was published. The headline is not always stable, so it's an unreliable element in citation.

Michael, I couldn't agree more. As I said in the conclusion, MLA style doesn't guarantee that a work can be found; physical items disappear the same as electronic ones. It would be interesting though to conduct a study and see if linkrot is faster or slower than the general loss of books and other physical sources. Logic would seem to dictate that linkrot would cause a faster deterioration, but logic is hardly scientific.

Linkrot may be prevalent, but it is not much different from real-world problems facing researchers. Research papers cite books not in all libraries, articles on online databases not accessible to the reader, and primary source data that cannot be seen by the general public. Primary sources are destroyed. Books are thrown out or sold at libraries. Articles are deleted from databases. I have a book in my bookcase in which the author heavily cited another author who heavily cited yet a third source. But I could not find the third source. I think the third source came from the 1800s, and I could not find any evidence online that the source existed any longer.
