The "server is down" checklist

My server was down earlier today for mysterious reasons. A switch at the data center was cycled and my box never came back on (the only one that didn't).

  • Log in via serial console to verify the network device is actually alive - check
  • Cycle the server for good measure - check
  • Cycle the switch again - check
  • Ping setup-knowledgeable person on IM (idle) - check
  • Try several cell calls to various people with no good result - check
  • Get a hold of a person who might help - they're too busy right now - check
  • Watch the only person left who knows more than you about the setup (I know nothing) go to bed - check
  • Start pinging every IP in your range in desperation - bingo!
The damn gateway IP magically changed. Goddamnit.

Bang-head-on-wall *check*
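
For the record, the last step in that checklist can be scripted instead of typed by hand. This is only a rough sketch: it assumes a small IPv4 range, a Linux-style `ping` that accepts `-c` and `-W`, and it uses the documentation range 192.0.2.0/28 as a stand-in for the real addresses. The function names (`candidate_hosts`, `ping`, `sweep`) are made up for illustration.

```python
import ipaddress
import subprocess


def candidate_hosts(cidr):
    """Return every usable host address in the subnet, as strings."""
    network = ipaddress.ip_network(cidr, strict=False)
    return [str(host) for host in network.hosts()]


def ping(host, timeout_s=1):
    """Send one ICMP echo using the system ping (Linux-style flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def sweep(cidr, probe=ping):
    """Return the hosts in the subnet that answer the probe."""
    return [host for host in candidate_hosts(cidr) if probe(host)]
```

Calling `sweep("192.0.2.0/28")` from the serial console would list whichever addresses reply, and anything that answers but isn't in your config is a gateway candidate. The `probe` argument is there so the sweep can be tried without touching the network.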

Comments

These things happen; however, as far as I know, gateway IP changes are normally announced via RIP?
Fine place. Beautiful blue things. There's at least one girl in the geek world, one and a half with Elisa. Jogging is a good idea when we're sitting for most of the day. ++

You mean people still use RIP these days?

Also, aren't RIP/BGP/et al. normally only used for infrastructure-level routers and so on? If the router itself changes address, then the individual client systems on the network probably wouldn't know, since at that level things are usually announced via DHCP/bootp/etc. if they're even announced at all (and by the description, I'm guessing that Kasia's box is the only one that isn't configured by DHCP).

This turned out to be some wigged-out routers.

Kasia has a static /28 subnet, and two big Juniper routers provide a VRRP (Virtual Router Redundancy Protocol) group for her "virtual" gateway. Both the hard router gateways worked fine (one of which she found), but the virtual gateway was dorked. We rebuilt the VRRP group to fix it.
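
One way to tell the virtual gateway apart from the two physical router addresses is the MAC address that answers ARP. Standard VRRP uses the virtual MAC 00-00-5E-00-01-XX for IPv4, where XX is the group ID in hex. The snippet below is a small sketch of that check; the helper name `vrrp_vrid` is hypothetical, and it assumes you have already copied a MAC address out of the ARP table by hand.

```python
# IPv4 VRRP virtual routers use the MAC 00-00-5E-00-01-{VRID}.
VRRP_IPV4_PREFIX = "00:00:5e:00:01:"


def vrrp_vrid(mac):
    """Return the VRRP group ID if mac is an IPv4 VRRP virtual MAC, else None."""
    normalized = mac.strip().lower().replace("-", ":")
    if len(normalized) != 17 or not normalized.startswith(VRRP_IPV4_PREFIX):
        return None
    try:
        # The last octet of the virtual MAC is the group ID in hex.
        return int(normalized[-2:], 16)
    except ValueError:
        return None
```

If the address that responds carries an ordinary vendor MAC rather than this prefix, it is most likely one of the physical router interfaces and not the virtual gateway.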