Nagios
I just sent this email off to a friend. I decided to save it here for future reference.
On my way out you commented that Nagios may be in place shortly. If that was in jest, no need to read further.
Otherwise, here are a few lessons learned.
Upon install, the default config files have the different bits of the configuration bundled together. That eventually becomes management hell.
Avoid it from the very beginning.
Here’s what I did in the situations that wound up working best for me.
(This is from memory, so don’t expect the syntax to be correct)
1) nagios.cfg
All that file contains is a few static variables relating to the install. Paths to various bits and the like. It then contains several include lines:
cfg_file=checks.cfg
cfg_file=notifications.cfg
cfg_dir=servers
2) checks.cfg
Contains configuration data on the various health checks that you have created.
3) notifications.cfg
Contains data on how to execute each of the notifications that you reference in the other bits.
4) servers/*
This is the neat thing. You can tell Nagios to include all .cfg files in a specific directory. In that dir, you have ‘www.domain.com.cfg’, ‘smtp.domain.com.cfg’ and the like. In those files you have lines that reference health checks that are defined in ‘checks.cfg’. You can add new hosts by cp’ing the file for a similar host and editing it. You can remove a host by just deleting a single file. This is the real key toward simple management of host monitoring via Nagios.
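Here is a rough Python sketch of the one-file-per-host idea, in case it helps. It is not taken from a real install: the helper names (add_host, remove_host) are made up for illustration, the 'generic-host' and 'generic-service' templates are assumed to exist in your config, and each check name is assumed to be a command defined in checks.cfg.

```python
from pathlib import Path

# Nagios object definitions; text after ';' is treated as a comment.
HOST_TEMPLATE = """\
define host {{
    use        generic-host   ; assumed template name
    host_name  {name}
    address    {name}
}}
"""

SERVICE_TEMPLATE = """\
define service {{
    use                  generic-service   ; assumed template name
    host_name            {name}
    service_description  {check}
    check_command        {check}
}}
"""


def render_host_cfg(name, checks):
    """Build the text of one host file: a host block plus one service per check."""
    parts = [HOST_TEMPLATE.format(name=name)]
    parts += [SERVICE_TEMPLATE.format(name=name, check=check) for check in checks]
    return "\n".join(parts)


def add_host(servers_dir, name, checks):
    """Write servers/<name>.cfg and return its path."""
    path = Path(servers_dir) / f"{name}.cfg"
    path.write_text(render_host_cfg(name, checks))
    return path


def remove_host(servers_dir, name):
    """Stop monitoring a host by deleting its single file."""
    (Path(servers_dir) / f"{name}.cfg").unlink()
```

Something like add_host("servers", "www.domain.com", ["check_http"]) would then create the file, and Nagios would pick it up on its next reload.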
The one-file-per-host layout also allows for easier automated addition of hosts. If you were really hardcore you could even write up a script that does a daily ping scan of the network. Any new hosts it finds would then be scanned with nmap to gather some data on them, and a .cfg file would be created for each host based on the data gathered during that scan. That’d be neat.
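A minimal sketch of that discovery script might look like the following. Treat it as a starting point under a few assumptions: the ping flags are the Linux iputils ones, hosts are matched against files named by IP address (so a host already monitored under its DNS name would look new), the 'generic-host' template is assumed to exist, and the nmap step is left out entirely. All function names here are hypothetical.

```python
import ipaddress
import subprocess
from pathlib import Path


def ping(address):
    """Return True if a single echo request gets a reply (Linux ping flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def known_hosts(servers_dir):
    """Names that already have a file, e.g. '10.0.0.5.cfg' gives '10.0.0.5'."""
    return {path.stem for path in Path(servers_dir).glob("*.cfg")}


def find_new_hosts(network, servers_dir, is_alive=ping):
    """Return addresses in the network that answer but have no .cfg file yet."""
    known = known_hosts(servers_dir)
    new = []
    for ip in ipaddress.ip_network(network).hosts():
        address = str(ip)
        if address not in known and is_alive(address):
            new.append(address)
    return new


def write_stub(servers_dir, address):
    """Create a bare host definition for a newly found address."""
    path = Path(servers_dir) / f"{address}.cfg"
    path.write_text(
        "define host {\n"
        "    use        generic-host   ; assumed template name\n"
        f"    host_name  {address}\n"
        f"    address    {address}\n"
        "}\n"
    )
    return path
```

Run daily from cron, it would call find_new_hosts on your network range, write a stub for each result, and leave the service checks to be filled in from whatever the nmap scan turns up.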
That’s all I’ve got for right now. Feel free to contact me with any questions you may have.