
> JBOD death
Posted by prox, from Charlotte, on March 30, 2008 at 22:53 local (server) time

For the past four years, I've kept all of my media files (music, etc.) on a single filesystem spread across a varying number of physical volumes, utilizing LVM 2.0 on Linux.  This JBOD array resided on atlantis, my old SuperMicro system with dual P3-800 CPUs.

The progression went something like the following, if I remember correctly:

Adding and removing drives was simple: a couple of LVM commands and a lengthy resize2fs.  It worked well, although there was obviously no redundancy to speak of.  I kept backing up a good portion (80%) of the content to CD-Rs, then DVD-Rs, then DVD-R DLs, just in case.
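
For reference, growing the array generally looked something like this (the device, volume group, and mount point names here are hypothetical, not my exact setup):

    # Turn the new disk into a physical volume and add it to the volume group
    pvcreate /dev/hde1
    vgextend vg_media /dev/hde1

    # Grow the logical volume into the new space, then grow ext3 to match
    lvextend -l +100%FREE /dev/vg_media/media
    umount /mnt/media
    e2fsck -f /dev/vg_media/media
    resize2fs /dev/vg_media/media
    mount /dev/vg_media/media /mnt/media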

Yesterday, I set off to replace the old array with 2x Western Digital 1TB green disks that were much quieter and supposedly didn't chew up as much power as 4x EIDE disks.  I had to recreate the filesystem, since ext3 doesn't support a 2TiB filesystem with a 1KiB block size.  I created the new logical volume and ext3 filesystem, and started to copy the data.
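
Roughly what that looked like, with illustrative device and volume names (and a 4KiB block size this time, to get around the old 1KiB limit):

    # Build a fresh volume group across the two 1TB disks
    pvcreate /dev/sda1 /dev/sdb1
    vgcreate vg_new /dev/sda1 /dev/sdb1
    lvcreate -l 100%FREE -n media vg_new

    # ext3 with 4KiB blocks, then mount it and start copying
    mke2fs -j -b 4096 /dev/vg_new/media
    mount /dev/vg_new/media /mnt/media_new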

Unfortunately, both the 400GB (Seagate) and 250GB (Western Digital) disks decided to die 20% into the copy process (I used rsync).  Great!  I started & stopped the rsync, remounted, rebooted, etc. for an hour or two, and realized that things weren't looking good.  Each time ext3 encountered an I/O error, it would invalidate the inode and cause it to be inaccessible, so rsync would log an error and skip it, but not before creating a destination file filled with NULLs.
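
The copy itself was nothing exotic, just a plain rsync along these lines (paths are illustrative):

    # One-way copy from the old JBOD filesystem to the new one
    rsync -av --progress /mnt/media/ /mnt/media_new/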

I tried running fsck.ext3 on the filesystem with -c, enabling the badblocks program.  Unfortunately, that process would have taken a good two weeks, judging from the rate of completion.
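
That was just e2fsck with the badblocks pass enabled, roughly (with the filesystem unmounted):

    # -f forces a check, -c has e2fsck run badblocks (read-only) and
    # record anything it finds in the bad block inode
    fsck.ext3 -f -c /dev/vg_media/media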

I started up rsync again, this time in only one directory (should have been a couple hundred GiB) and let it go all night.  It finished halfway through today, but by that time, the superblock(s) were corrupt, and the filesystem was toast.  I tried freezing the drives, but that just made them click louder.
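
The usual recourse when the primary superblock is damaged is to point e2fsck at one of the backup copies; mke2fs -n, run with the same options the filesystem was originally created with, prints where those live.  Something like the following, although by this point the damage was done:

    # Show where the backup superblocks live (writes nothing); pass the
    # same block size the filesystem was originally made with
    mke2fs -n -b 1024 /dev/vg_media/media

    # Then attempt a repair against one of the backups
    e2fsck -b 8193 /dev/vg_media/media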

I picked up _another_ Western Digital 1TB disk today, and decided to run LVM on top of RAID-5, just in case anything like this happens again (well, hoping I'd only lose one drive at a time).  I am still in the process of copying what made it onto the new filesystem over to the 750GB disk and validating it, so I can pull all three 1TB disks into a RAID-5 device and put LVM on top.  I just hope the 750GB holds up for the process …
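
The rough plan for the new layout, once the 1TB disks are free again (device names hypothetical, and the old volume group torn down first):

    # Three 1TB disks into a single RAID-5 md device
    mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1

    # LVM on top of the array, then ext3 with 4KiB blocks
    pvcreate /dev/md0
    vgcreate vg_raid /dev/md0
    lvcreate -l 100%FREE -n media vg_raid
    mke2fs -j -b 4096 /dev/vg_raid/media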

The moral of the story: JBOD is bad.  Overall, I think I lost about 40% of the files, according to a periodic ls -lR listing I run during system (boot disk) backups.
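
That listing is nothing fancier than something along the lines of:

    # Snapshot the media tree's file listing alongside the boot-disk backups
    ls -lR /mnt/media > /backup/media-listing-$(date +%F).txt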

Comment by Alex on March 31, 2008 at 21:42 local (server) time

Have you considered ddrescue?  I have (luckily, perhaps) recovered data from clicking disks, inoperable iPods, scratched CD-ROMs, all kinds of media.  It is relatively simple to use; the only tricky thing would be your target "dump" directory, because ddrescue will output an image at whatever size your JBOD array is.  Oh, and it takes forever :p
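
Basic usage is something along these lines (device, image, and log file names are of course whatever fits your setup):

    # First pass: grab everything that reads cleanly, logging the bad areas
    ddrescue -n /dev/sdb /mnt/dump/disk.img /mnt/dump/disk.log

    # Second pass: go back and retry the problem areas a few times
    ddrescue -r 3 /dev/sdb /mnt/dump/disk.img /mnt/dump/disk.log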

Comment by Mark Kamichoff [Website] on April 01, 2008 at 02:24 local (server) time

Thanks for the suggestion, I wasn't aware of ddrescue, and I'll be sure to use it next time!

