> Outages and oopses?
Posted by prox, from North Brunswick, on May 24, 2008 at 19:56 local (server) time

This past week has been a bad one for dax, the server that hosts this website (and a number of other things).

Short summary (warning: full disclosure)

On Saturday (5/17), dax rebooted three times because the primary (boot) SATA disk detached from the OS.  It's not the first time this has happened, and I've mentioned it several times in the past.  Unfortunately, the third reboot came in the middle of a ports upgrade, while the pkgdb was being rewritten.  Almost everything in /var/db/pkg was destroyed, leaving me to either reinstall all my ports at the risk of this happening again, or finally ask for a hardware replacement (whole server or just the disk, it didn't matter to me).  I submitted a ticket to the Voxel folks (where dax is hosted) on Saturday evening, and they scheduled a boot disk swap (and an install of FreeBSD 7.0, amd64) for late Sunday evening.

Just to be on the safe side, on Sunday I picked up a 250GB Western Digital USB hard disk, and managed to back up (with the help of HPN-SSH) all of the home directories on dax from /usr/home, which amounted to about 181GiB.  In retrospect, I'm thinking that this drive might have been the best thing I bought all year.  Normally I just back up code for my website, MySQL databases, /etc, and some things in /usr/local.

On Sunday evening (in a session that ran until almost 0300 on Monday), Voxel attempted an install of FreeBSD 7.0 onto a new 320GB Western Digital disk.  Apparently there were some technical difficulties: not only did AHCI have to be disabled in the BIOS for the installation to succeed, but the PXE boot image was labeled incorrectly and I got the i386 version instead of the requested amd64 one.  Unfortunately, I only realized this on Tuesday evening, when I saw i386 scroll across the screen way too many times during a make buildkernel.  I submitted another ticket, and turned in for the night.

Fast forward to Thursday, when I received a message indicating dax (amd64) was ready to go, but unfortunately my data disk (/usr/home) had been accidentally newfs'ed.  Since I was sitting in the airport, I didn't scream too loudly, but asked if the AHCI setting could be checked again.  It was indeed still set to IDE compatibility mode, and was switched to AHCI mode after some changes to /etc/fstab.  Strangely, FreeBSD detects the disks as /dev/ad0 and /dev/ad1 in IDE mode, but as /dev/ad8 and /dev/ad12 in AHCI mode.  I suppose that's due to the fact that the AHCI controller is logically separate, but I'm not sure.  Either way, dax was ready to go.
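
A rough sketch, in Python, of the kind of /etc/fstab change involved when the device names shift between IDE and AHCI mode. The mapping (ad0 to ad8, ad1 to ad12) is an assumption based on the order the names are listed above, and the slice/partition suffixes in the sample (s1a, s1b, s1d) are made up for illustration; the actual fstab on dax isn't shown here.

```python
import re

# Assumed mapping from IDE-mode names to AHCI-mode names (not confirmed
# by the post; it only lists the names in this order).
DEVICE_MAP = {"ad0": "ad8", "ad1": "ad12"}


def remap_fstab(text, device_map=DEVICE_MAP):
    """Return fstab text with /dev/adN device names rewritten per device_map."""

    def swap(match):
        disk, suffix = match.group(1), match.group(2)
        # Disks not in the map are left alone.
        return "/dev/" + device_map.get(disk, disk) + suffix

    out = []
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            out.append(line)  # keep comment lines untouched
            continue
        # Only rewrite the first field (the device), e.g. /dev/ad0s1a
        out.append(re.sub(r"^/dev/(ad\d+)(\S*)", swap, line, count=1))
    return "\n".join(out) + "\n"


# Hypothetical fstab contents, for illustration only.
sample = (
    "# Device\tMountpoint\tFStype\tOptions\tDump\tPass#\n"
    "/dev/ad0s1b\tnone\tswap\tsw\t0\t0\n"
    "/dev/ad0s1a\t/\tufs\trw\t1\t1\n"
    "/dev/ad1s1d\t/usr/home\tufs\trw\t2\t2\n"
)
print(remap_fstab(sample), end="")
```

The edit has to be in place before the BIOS is flipped to AHCI; otherwise the kernel comes up looking for device names that no longer exist.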

As compensation for the loss of data, Voxel is automatically giving me two full months of service, free (I pay $120/mo).  Not bad; I would have had to pry a deal like this out of other hosting providers.  I will still recommend them any day.

I spent part of Friday and today restoring all the data (rsync is still going for a couple things…), but dax is back up with FreeBSD 7.0 amd64, and all services have been restored. I enabled the ULE scheduler in the kernel configuration, since it's supposed to help with SMP systems, and is similar to Ingo's O(1) scheduler on GNU/Linux.  I doubt I will notice a difference, though.
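
For reference, enabling ULE comes down to one line in the kernel configuration file: `options SCHED_4BSD` becomes `options SCHED_ULE`, followed by a kernel rebuild. A minimal Python sketch of that swap follows; it assumes the starting config selects SCHED_4BSD, and the sample config text (including the ident name) is hypothetical, not dax's real config.

```python
import re


def select_ule(kernel_config):
    """Swap 'options SCHED_4BSD' for 'options SCHED_ULE' in kernel config text."""
    pattern = re.compile(r"^(options\s+)SCHED_4BSD\b", re.MULTILINE)
    return pattern.sub(r"\1SCHED_ULE", kernel_config)


# Hypothetical excerpt of a custom kernel config, for illustration only.
sample_config = (
    "cpu\t\tHAMMER\n"
    "ident\t\tDAX\n"
    "options \tSCHED_4BSD\t\t# 4BSD scheduler\n"
    "options \tSMP\n"
)
print(select_ule(sample_config), end="")
```

Only one scheduler option can be compiled in at a time, which is why this replaces the line and doesn't add a second one.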
