Crash Bang Boom

When a vendor releases system updates, we as end users generally expect them to work. I have been a Linux system administrator for many years, and I can count on one hand the number of times an update has not worked properly.

Last week, I had a strange patching experience. I patched the virtual machine that runs my web server and mail server. The updates installed as expected, I rebooted, and the server came back up. A day later, I noticed that the mail server had stopped receiving mail. I tried to log into the server over ssh and could not.

I created a new virtual machine and temporarily attached the old disks to it. From there, I added the boot parameter that enables the serial console, then booted the original VM again. With the serial console up and running, I could see that I was never asked for a password when logging in: the server asked for my user id and never showed the password prompt. Now that I understood what was going on, I tried logging in with an ssh key instead. Once again, I used the new VM, this time to copy my public key from another system onto the old VM's disks. I was still unable to log in, which points to an issue with PAM. Normally, I would roll back the updates. Unfortunately, since I could not log in, I could not do that either.

I would almost consider this a comedy of errors. Normally, the first thing you should do before patching a virtual machine is take a snapshot. A snapshot is a point-in-time copy of the server. If the patch fails, you can roll back to a working state. This time, I did not think a snapshot was necessary.
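
As a minimal sketch of the snapshot-then-patch habit, assuming a libvirt-managed VM with the `virsh` command available (other hypervisors have their own equivalents), a small Python helper could build the snapshot and revert commands. The domain name `webmail` and the `pre-patch-` naming scheme are hypothetical.

```python
import subprocess
from datetime import datetime
from typing import List


def snapshot_command(domain: str, when: datetime) -> List[str]:
    """Build the virsh command that snapshots `domain` before patching.

    Assumes libvirt; the snapshot name is a hypothetical convention.
    """
    name = f"pre-patch-{when:%Y%m%d-%H%M%S}"
    return ["virsh", "snapshot-create-as", domain, name]


def revert_command(domain: str, snapshot_name: str) -> List[str]:
    """Build the virsh command that rolls `domain` back to a snapshot."""
    return ["virsh", "snapshot-revert", domain, snapshot_name]


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command by default; execute it only when dry_run is False."""
    if dry_run:
        print(" ".join(cmd))
        return
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "webmail" is a placeholder domain name, not the real server.
    run(snapshot_command("webmail", datetime.now()))
```

The helper defaults to a dry run so the command can be reviewed first. The revert command is issued from the hypervisor host, so it would still work when logins on the guest are broken.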

So now I have no choice but to rebuild the server. There is some good news: there was no data loss, and the rebuild is an opportunity to improve on how the previous server was configured.

So, to quote Bullwinkle the Moose: "This time, for sure!"