Epic Fail…

My favorite error
Creative Commons License photo credit: xaminmo

Warning: This is not a traditional post on Plagiarism Today. This post recounts the events during an extended outage of the site including why the site went down and how I got it back online. If you aren’t interested in this kind of geeky/nerdy stuff, please feel free to skip. If you are, well, I hope it helps you in some way.

Before I begin, I want to make it clear that Plagiarism Today was NOT hacked. A few people wrote me during the downtime to ask if I had fallen victim to one of the much-talked-about WordPress exploits. The answer is no. Though I am sure it is a possibility if someone were dedicated enough, it is not what happened in this case.

Second, I want to thank all of the readers who reached out to me via email or IM during this outage. Your support and encouragement were a great help and I greatly appreciate all of your kind offers of assistance. It meant more than I can say.

With that in mind, here is the full story of what happened over the course of the past 24 hours or so and where the site sits right now.

First Signs of Trouble

At about ten o’clock on June 12 (central time), I pulled up the site and discovered that Plagiarism Today was responding with an error 500. Having just recently moved the site to a VPS host named VPSLink, I tried to log in to my site’s control panel and restart the server.

However, when I tried to load the control panel, which was on the same server, I received another error 500. Realizing it was more serious than previously thought, I jumped onto my hosting control panel, where I managed the entire server, and tried to do the reboot there. That did not work either: the reboot process froze halfway and never completed.

At this point, I assumed that the problem was still software related and put in a tech support ticket to have the techs do a hard reboot of the server. However, over 40 minutes passed without a response and nothing changed on the server.

It was only after nearly an hour that the techs answered my support request and they did so by deleting the one I had filed without offering any comment or explanation.

Worried, and upset, I filed an “Emergency” server outage report and risked paying extra in the event that the fault for the outage turned out to be mine. I waited for a response and, after about thirty minutes, one of the representatives informed me that there had been an error in a “hardware node” of my server and that they were looking into it.

Realizing that the error was not my fault and there was nothing I could do right then, I took a break to watch some television and spend some time with my wife. When I returned an hour or two later, my host had put up a post in the forums, several hours after the incident had happened, offering more details about the problem.

Apparently, one of the hard drives had gone bad. The system was being rebooted and they were running fsck on the drive to repair the damage.

Unfortunately, several hours later, they updated the post to inform us that the entire RAID array had gone bad and the entire node was crippled. They had a complete backup from 6/10 and an “incremental” one from the night before. They set about replacing the part and restoring from the backup, a process that would take many hours.

Unable to do anything other than transfer my email to a new host (see below), I went to bed and trusted that restoring from the backup would save the site. Unfortunately, that wasn’t the case.

A Bad Morning

When I woke up at a little bit after eight local time, after getting just three or four hours of sleep, I was stunned to find that the site was still down. I posted a reply to the forums asking for an update and they quickly responded by saying that the backups had been restored and that they were rebooting the servers.

Unfortunately, Plagiarism Today was not restored. The main site was returning a “Forbidden” error and the control panel was not loading anything. I attempted several reboots using the hosting manager but to no avail.

The only service available to PT was SSH, so I could log in and grab files, but nothing else was available.

I wrote tech support to ask what to do and they told me to use SSH to back up important files and then reinitialize the entire server, starting with the operating system install.

This would have meant completely wiping out everything, including the DNS information, the databases and the WordPress installs, and rebuilding it all from the ground up using backups or fresh installations.

As I saw it then, there were three problems with that idea.

  1. The backups I had were questionable at best. If the backups had been perfect, the site would be up, at least theoretically.

  2. The process had many complicated steps that had to be done in order. Easy when you’re working on a second server with a functioning site, much harder when the clock is ticking.

  3. An advisory from the host said the main control panel, the one needed to initialize the server, would not work for a bit and gave no indication of when it would be available.

I decided it was time to do something else…

Saving the Day

The first move I made was actually taken the night before. While I was waiting for word back from tech support, I decided to stop the email outage. The DNS servers were still working so I created a Google Apps account for the site and directed all of my email there.

With that done, Google was effectively hosting my email, not just receiving the forwards from my own server and, by eliminating the middle man, I could send and receive email again.

However, after the server came back but the site did not, I realized I had to do something fast to prevent things from getting worse.

Since the move to VPSLink was recent, I had a near-perfect duplicate copy of the site on Media Temple. I was also fortunate to have an automatic database backup in my email box from just a few hours before the downtime began.

So, using the backups, I updated the Media Temple database on my Mac while using my Windows PC to SSH to the current site and download the plugins, theme changes and images that were newer than about two months.
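The "newer than about two months" selection described above can be sketched in Python. This is only a minimal illustration, not the procedure actually used during the recovery: the function name `files_newer_than`, the 60-day default and the `wp-content` path in the usage note are all assumptions made for the example.

```python
import os
import time


def files_newer_than(root, max_age_days=60, now=None):
    """Return paths under `root` modified within the last `max_age_days` days.

    Paths are relative to `root` and sorted, so the result can be handed to
    a copy or download step. `now` can be injected to make the cutoff testable.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * 86400  # 86400 seconds in a day

    recent = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            # Keep only files whose modification time is at or after the cutoff
            if os.path.getmtime(full_path) >= cutoff:
                recent.append(os.path.relpath(full_path, root))
    return sorted(recent)
```

Calling something like `files_newer_than("wp-content", 60)` on the server would list the plugin, theme and image files worth pulling down first, assuming the file modification times survived the restore.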

It was at this point that I noticed how incomplete the backups were. Though they had most of the images, the dates on the most recent images were from the seventh. Even after uploading all of the downloaded files, the front page was still a mess of broken images.

However, I was able to work around that. Since I had used Skitch to upload the screenshots and it keeps a history of the files it puts up, I was able to rebuild the images folder and get everything back online.
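A gap like the missing images above is easier to spot with a direct comparison of the two file trees. The following is a hedged sketch rather than anything that was run during the outage; the function names `list_files` and `missing_from_backup` are hypothetical, and it assumes both the live copy and the backup copy are available as local folders.

```python
import os


def list_files(root):
    """Return the set of file paths under `root`, relative to `root`."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def missing_from_backup(live_root, backup_root):
    """Return sorted relative paths present in the live tree but absent from the backup."""
    return sorted(list_files(live_root) - list_files(backup_root))
```

This only checks that a file with the same relative path exists. It does not compare contents or dates, so a stale copy of a file would still count as present.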

At this point, the site was pretty much back together so I changed the DNS servers to point back to Media Temple and started cleaning up a few odds and ends. After uploading a few additional folders, updating a few plugins and activating a few others, PT seems to be back mostly in working order and, bit by bit, the DNS is propagating out and the site is coming back to life.

Aftermath

Looking back at what happened, I am still trying to assess the damage. A few things I know are broken/lost.

  • Lost Comments: Three or four comments, including one of mine, were made between the time the database backed up and the server crashed. Those comments are lost. I may be able to bring them back from the emails I get, but I’ll look into that later.

  • Contact Form: The contact form is currently borked. I’m re-installing the plugin now and should have it working within an hour or two. (Update: Should be working now.)

  • Lost Email: Though I made the switch to Google Apps pretty quickly, at least some mail was lost in the process because it was sent after the outage but before the transfer. Depending on how the DNS servers played out, it could have been many hours after the outage began.

  • Sixteen Hour Downtime: The big one for me is that there was a whopping sixteen hours of downtime, possibly more or less depending on how fast the DNS changes made their way to your area.

All told, the losses were not major but could have been catastrophic.

Lessons Learned

Looking back at it, this was probably the worst of all possible “natural” disasters that can happen to a Web site. It is hard to think of anything that is more catastrophic than a complete storage failure followed by a bad backup.

If there are any Webmasters wanting some advice on these situations, well, here is what I would say.

  1. Backup, Backup, Backup: My backups saved me. If it hadn’t been for my database backups emailed to me every day, I would have been in much bigger trouble. At best, I would have lost several days of posts and waited several more hours to come online. The only things not adequately backed up were the images used in the stories and the front page, something I am fixing now.

  2. Exit Strategy: Having a near-perfect mirror of the site offline saved a lot of time and work. It wasn’t really a planned exit strategy, since the account was due to be shut down in a few weeks, but it worked as one. If nothing else, having a backup account and host set up can save a great deal of time.

  3. Know How Your Software Works: I was also fortunate that I knew how to install and set up WordPress without assistance. Though I didn’t need to set up a whole new installation, I did have to update this one. With so many hosts offering “one click” installs, I wonder how many Webmasters know how to set up their software should something go wrong.
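As one way to act on the first lesson, here is a minimal Python sketch of a file backup that archives a folder (for example, an images folder) under a timestamped name and keeps only the most recent copies. It is an illustration under stated assumptions, not the backup setup described in this post: the function name `make_backup`, the `backup-` file prefix and the default of seven retained archives are all invented for the example, and it does nothing about the database, which here was backed up separately by email.

```python
import os
import tarfile
import time


def make_backup(src_dir, dest_dir, keep=7, now=None):
    """Archive `src_dir` into `dest_dir` as a timestamped .tar.gz, then prune old ones.

    Returns the path of the new archive. Only the newest `keep` archives
    matching this function's naming pattern are retained.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")

    os.makedirs(dest_dir, exist_ok=True)
    # now=None means "current time"; a fixed value makes the name predictable
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(now))
    archive_path = os.path.join(dest_dir, f"backup-{stamp}.tar.gz")

    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the source folder
        tar.add(src_dir, arcname=os.path.basename(os.path.normpath(src_dir)))

    # Timestamped names sort chronologically, so a plain sort orders them by age
    archives = sorted(
        name for name in os.listdir(dest_dir)
        if name.startswith("backup-") and name.endswith(".tar.gz")
    )
    for old_name in archives[:-keep]:
        os.remove(os.path.join(dest_dir, old_name))
    return archive_path
```

A sketch like this only helps if `dest_dir` ends up somewhere other than the server being backed up. An archive sitting on the same failed RAID array would have been lost along with everything else.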

Some Personal Thoughts

Right now, after having had some time to ponder what happened, I am very upset with VPSLink and the way the outage was handled. Though I am going to write a letter to them later that addresses my grievances more clearly (this post is more about explaining the outage to you than venting my frustrations), I have to say that I feel the situation was handled poorly.

I am not upset about the outage itself. I have been around computers long enough to know that things break. However, the outage could have been handled a great deal better.

First, the company deleted my initial support request without offering any explanation as to why they wouldn’t reboot the server (I know now they couldn’t). Second, they waited several hours to post anything about the outage on their forums. Third, they only posted two updates on the outage over the course of almost 24 hours. Finally, when the backup failed to restore the site, their best advice was to start from scratch.

I realized when I started using an unmanaged VPS that I was going to be responsible for my own actions and mistakes. There were plenty of ways I could destroy my own site and not get any support. However, I did not realize that support would falter so badly when the error was on their end as well.

In the end, I am going to be much more choosy about the hosts that I use from now on. I am actively seeking hosting recommendations though I may just stick with Media Temple for a while.

Conclusions

In the end, the reason I write this is that I know well there are a lot of Webmasters and bloggers who read this site. Though you come here for information about dealing with content theft and copyright issues, my hope is that maybe this experience can help in other ways.

I feel comfortable saying that Plagiarism Today will survive and overcome this problem. However, that is because of the wonderful readers and community that has developed around this site over the years.

Thank you all for your understanding and support. It has meant more to me than you probably realize.

I hope you have a great weekend.

41 Responses to Epic Fail…

  1. Hi Jonathan,

    Sorry that you had to go through all that. I am glad that I am not in the same league. I have now put up my blog finally and have started to post some articles too. They are not too important, and though I hope that nothing will go wrong, I hope to be able to handle it in a systematic manner if it happens. I have copy-pasted your post into a Word document and saved it for posterity!

  3. @Ramana Rajgopaul
    Ramana: Probably a good move. I don't think I'm going to have this kind of issue again but one never knows. On that note, if my SQL databases ever completely screw up, I know where to turn…

  5. Sue says:

    Hello Jonathan. So sorry to hear of your recent troubles. Been there, done that, but without a recent backup. Over a month of posts had to be recreated. :(

    If you want to check out a company that gives such great, fast, polite and efficient service, I would highly recommend Known Host. There is not one single bad review at Webhostingtalk.com for these guys, and based on that, and the fact all the reviews were excellent in regards to their service, I went with them after problems with my previous shared hosting.

    Right now, they are running a great special (http://knownhost.com/specials.html) similar to the one I took, and I can tell you right now these guys are great. If you order cPanel or Plesk or Direct Admin ($5 more per month for the license), you get a fully managed VPS. When I say fully managed, I mean it. They go out of their way to help you. There is no reason for them to help with WordPress, but when they transferred my site over, there were issues with permissions and ownerships. They fixed it all.

    If I open a ticket, I swear these guys have it fixed before I send it. You cannot go wrong with the knowledgeable service these guys (and gals) provide. I asked about installing something, and the reply that it was done came in the same mail as the autoreply to the ticket, in a five minute check.

    If I sound like I'm a salesperson for them, it's because I'm so happy with them. Uptime since I moved there is 100%, and their forum has a place for updates and issues on each datacenter (they use three…you choose which you'd like). And it's kept very much up to date if an issue arises.

    Hope things work out for you, and do definitely check out knownhost.

    • @Sue
      Ouch. I can't imagine if I had had to recreate a whole month of posts. I'll just be glad that things went as smoothly as they did, count my blessings and move on. Things clearly could have been a LOT worse.

      Thank you for the hosting suggestion. I am considering a great many hosts right now. Known Host is definitely on the list. Right now I'm still weighing options. One of the big problems is that I want a host that can host all of the sites in my family, including my wife's and mine as well as some of my clients' sites.

      So, right now I'm weighing my decision on a lot of variables but I should have an idea of where I am going in a week or two.

      Thank you very much for the suggestion!

  10. Girl says:

    Ugh, what a nightmare! So sorry to hear this Jonathan, I can’t imagine how stressful this must have been. Good to see the site back up and running.

    • @Girl
      It was a nightmare but I learned a great deal from it and I am back up now. All in all, it has made me much stronger and a better Webmaster, in my opinion. However, I hope that I never have to showcase what I've learned…

  14. Jonathan, sorry to hear that you had such issues. I am glad you were able to restore the site. Just shows the importance of keeping up to date on backups.

  15. Why backups are extremely important…

    Jonathan Bailey’s site, Plagiarism Today, appears to have had a few problems with his previous host.  Jonathan’s story just shows the extreme importance of keeping a complete backup of one’s online data and website.  He also points us to a…

  17. @Dr. Mike Wendell
    These things happen sadly. It was my first and only major blow up in over 12 years of hosting Web sites, but it was bound to happen. Hopefully I'll get another 12 before it happens again.

  20. @Jonathan Bailey – It's lots of fun from the other end as well. Best money I ever spent was for three Apple XServe Raid backup boxes. Saved my butt many times over.

    As an aside, what's the plugin here that does the "reply – quote" links? It's very handy.

  22. @Dr. Mike Wendell -
    I’ll have to remember that. I just have a good old-fashioned external HD for that right now. Probably should look at upgrading.

    Regarding the plugin, it’s WP-Comment Remix. You can find it here: http://wordpress.org/extend/plugins/wp-comment-

    I might move the comments over to Disqus though. Do you have any thoughts on that?

  23. Jonathan,
    Wow, that sounds pretty sucktastic — glad you were able to come out of it relatively unscathed and with some new skills. I experienced my own bout of epic faildom (though it was completely, totally my fault — http://www.christinawarren.com/2008/04/05/when-your-blog-goes-down/) back in April and understand the frustration and importance of backups all too well.

    As for hosts, I’m with Media Temple and I’m quite happy (I’m on the (gs) and I manage my dad’s business site on a (dv)) with the performance (after some issues with the Grid right after I joined, they appear to have almost everything worked out and are transitioning to a new system) and the support (supremely helpful and nice). Having said that, I have heard nothing but wonderful, fantastic, like mind-blowing things about Slicehost (http://www.slicehost.com) if you want to manage a VPS by yourself. They are just going to spit you at a command line (you choose the Linux distro, you get the vanilla minimal install and then install whatever server or stack you want to use), but if you are comfortable with that, everyone I know who uses them is supremely happy.

  25. @Christina Warren

    Somehow, I feel much better knowing that others have experienced this kind of epic fail before. Thank you very much for posting about your experience as well.

    I'm on MT right now, using the GS, the same as you. I'm relatively happy with the performance but I know that, if I combine all of the sites I have responsibility for, including at least one shopping cart site, I may start to go over the GPU limits. VPS will actually be a cost saver for me.

    I took a quick look at Slicehost and really liked what I saw. I'm going to do more looking up on them when I get back to the States (In England now, the reason your comments took so long to go up) and see if everyone feels as you do. However, the price is very fair and the TOS looks quite good.

    Thank you for the suggestion!

  27. Jonathan,

    Enjoy your trip! I was partially inspired by your post to put together a guide for making nightly backups of files on the Grid to Amazon S3. The post (which includes a generic backup script) is here http://www.christinawarren.com/2008/06/24/s3-backup-media-temple-gs/ — if you decide to stay with Media Temple (though it would work with any host).

    I don’t know what kind of solution you are using now, but I know that the $1.50 a month I spend for S3 (and I don’t just use it for my website, my photos are what take up most of that space) is worth the peace of mind, knowing I have a backup every day.

    • @Christina Warren
      This is a great idea! Thank you very much for sharing this. I don't think I'm going to stay with MT because my databases seem to be constantly in "burst" mode due to the volume of this site (even with Super Cache on in many cases) so a VPS will also be less expensive. However, I definitely could see how this script could be modified for any host, as you mentioned. I'm going to give it a closer look this weekend. Great work on this.

  30. [...] Since then, I have worked with over 2 dozen hosts (not counting copyright issues) and have set up a variety of sites and blogs for me, my friends and my clients. Most of my experiences have been good though I have, on a few occasions, been severely burned. [...]

  32. @slicehost just out of curiosity, what does Slicehost have in place to prevent horror stories like this one? http://tinyurl.com/6lcfwd

  33. [...] on June 12, disaster struck. My home page would not load and every attempt to access it resulted in a 500 error. I attempted to [...]

  34. [...] despite these space concerns, SAB is a good deal for my sites and, given my occasionally rough history with hosting, will help me sleep a lot better tonight. [...]

  35. [...] This happened to me once. I visited my Web site only to be greeted with a nasty server error. After nearly a day of tortured waiting, I found out that my server was completely hosed and the host’s backups had failed. My only option was to rebuild the server from the ground up. [...]

  36. [...] promises to provide nightly backups of your site and, for the most part, they do. Though sometimes host backups fail, they do actually exist. The problem is that they probably aren’t ones you can [...]
