Mirroring RAID doesn’t help if you can’t boot!
We are down to the last 5G on the RAID server in the house. So I went to Fry’s today and picked up two spankin’ new SATA 160G drives and a new SATA controller.
I get home and diligently start work on the upgrade. We currently have two drives in there running RAID-1. This has worked quite well for us, and I plan to just add these other drives and bring up another raid set also running RAID-1. I install the card and then realize there is no way I can get four drives into this case. What was I thinking?
No big deal. I get another excuse to run to Fry’s and buy something. I need a new case. They have already closed for the night so I button up the case with the SATA controller installed, put the machine back in the server room, and flip it back on. I will add the drives when I have a proper case tomorrow.
Just to make sure it’s up I start pinging it from my mac. It doesn’t answer.
Kind of a PITA since I don’t have a monitor in the server room. I pick it up and move it in next to walter (my Windows box). I steal walter’s keyboard, mouse, and monitor and boot up the server.
Hurm. Getting stuck at boot with a corrupted GRUB. Probably a problem with grub getting confused due to the new SATA controller.
I pop the SATA controller back out of the box. I will reinstall it and figure this all out when I get the new case.
Boot without the SATA controller. No dice. Still hung at GRUB.
Uhm… This is the RAID-1 server with everything important on it. Why will it not boot?
I boot up a knoppix cd. Everything looks fine. I mount up the master drive in non-raid and poke around. Then I find that the raid configuration file has been changed and the drives are not correct. hda became hde and hdc became hdg.
I guess when the new SATA controller was installed, something in the Redhat/Fedora "is magic and knows more than you so I am going to go and screw with your raid config" department must have tried to automatically fix the raid set. The bad news is that GRUB got confused. That drive is hda as far as BIOS calls are concerned, but as soon as linux starts to boot, it is hde.
I figured this out and felt quite a bit better. I fixed a copy of the config file on a local ramdisk. Started the raid set. Ran a fsck on the whole 115G and everything was fixed.
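For anyone who would rather script the config fix than hand-edit it, here is a minimal sketch of the idea in Python. It assumes a raidtab-style file with "device /dev/hdX1" lines, which may not match your setup; the function name remap_devices and the sample text are made up for illustration, and the mapping direction (hde back to hda, hdg back to hdc) is just what applied in my case.

```python
import re

def remap_devices(config_text, mapping):
    """Return config_text with disk names swapped per mapping, e.g. hde -> hda.

    mapping is a dict of whole-disk names (hypothetical example:
    {"hde": "hda", "hdg": "hdc"}). Partition numbers are preserved.
    """
    # Match /dev/<disk> followed by an optional partition number.
    names = "|".join(re.escape(name) for name in mapping)
    pattern = re.compile(r"/dev/(" + names + r")(\d*)\b")
    # A single pass avoids chained replacements (a -> b -> c).
    return pattern.sub(
        lambda m: "/dev/" + mapping[m.group(1)] + m.group(2), config_text
    )

# Illustrative raidtab-style sample, not my real config.
SAMPLE = """raiddev /dev/md0
    raid-level 1
    nr-raid-disks 2
    device /dev/hde1
    raid-disk 0
    device /dev/hdg1
    raid-disk 1
"""

fixed = remap_devices(SAMPLE, {"hde": "hda", "hdg": "hdc"})
print(fixed)
```

Work on a copy of the file, as I did on the ramdisk, and eyeball the result before starting the raid set from it.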
Now to fix grub. This part was easy, but I did something I forgot to do the first time: I configured both of the drives as bootable drives with a good grub in the MBR. If I had done this sooner I would have been in good shape.
Now that I understand the problem, I add the controller back. This time it doesn’t boot all the way, but grub is fine. Turns out that linux prefers the offboard controllers as the primary controller. Adding a quick "ide=reverse" to the kernel boot options fixed that problem.
