UsMan's WoRkSpAce

Wednesday, November 29, 2006

MONIT, unix system management utility

Open source UNIX system management and monitoring utility (MONIT)

* Monit is a handy utility for managing and monitoring processes, files, directories and devices on a UNIX system. It performs automatic maintenance and repair, can take pro-active steps when something goes wrong, and generates alerts. Configuration can be performed via a text file or a web interface. Automatic actions include alert, restart, start, stop, exec and unmonitor. Monit can be integrated with the open-source heartbeat package for monitoring service failures.

* monitrc is the configuration file; it contains a global section and per-service sections. Processes and daemons can be monitored through their PID files and restarted, or the administrator alerted, when they fail. Remote services can be monitored over a TCP or UDP connection. Monit also understands the most common internet protocols, which lets it check actual responses rather than only whether a port is open. Local resource usage can also be tracked and action taken in response. Files, directories and devices can be monitored for checksums, timestamps, sizes, permissions and ownership. External programs can be executed when a check fails.
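
* A sketch of what the global section of monitrc might look like is as follows. The poll interval, mail server, alert address and web interface credentials are placeholder assumptions, not values from a real setup:

```
# Global section (all values below are illustrative assumptions)
# poll services every 120 seconds
set daemon 120
# log through syslog
set logfile syslog facility log_daemon
# mail server used to deliver alerts
set mailserver localhost
# default recipient for all alerts
set alert sysadm@example.com
# built-in web interface, reachable from localhost only
set httpd port 2812 and
use address localhost
allow localhost
allow admin:monit
```

The per-service sections (the 'check ...' entries) then follow in the same file.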

* A sample process check is as follows:
check process cron with pidfile /var/run/cron.pid
group system
start program = "/etc/init.d/cron start"
stop program = "/etc/init.d/cron stop"
if 5 restarts within 5 cycles then timeout
depends on cron_rc
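
* A sample resource usage check might look as follows. This is only a sketch: the service name, pidfile, init script paths and thresholds are assumptions chosen for illustration:

```
# Hypothetical service; the paths and limits are assumptions
check process apache with pidfile /var/run/httpd.pid
group www
start program = "/etc/init.d/httpd start"
stop program = "/etc/init.d/httpd stop"
# alert first, restart only if the load persists
if cpu > 60% for 2 cycles then alert
if cpu > 80% for 5 cycles then restart
if totalmem > 200.0 MB for 5 cycles then restart
if children > 250 then restart
if loadavg(5min) greater than 10 for 8 cycles then stop
```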

* A sample file check is as follows:
check file cron_rc with path /etc/init.d/cron
group system
if failed checksum then unmonitor
if failed permission 755 then unmonitor
if failed uid root then unmonitor
if failed gid root then unmonitor
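
* Directory, timestamp and size checks follow the same pattern. A rough sketch, where the log file path and the rotate script are hypothetical:

```
# Directory check: permissions and ownership
check directory bin with path /bin
if failed permission 755 then unmonitor
if failed uid 0 then unmonitor
if failed gid 0 then unmonitor

# File check: size and timestamp (path and script are assumptions)
check file access_log with path /var/log/apache/access.log
# run an external program when the file grows too large
if size > 1 GB then exec "/usr/local/bin/rotate-logs.sh"
# alert if the file has not changed for 15 minutes
if timestamp > 15 minutes then alert
```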

* A sample device check is as follows:
check device disk1 with path /dev/hda1
start = "/bin/mount /dev/hda1"
stop = "/bin/umount /dev/hda1"
if space usage > 90% then alert
if space usage > 99% then stop

* A sample protocol check is as follows:
check host www.test.com with address www.test.com
if failed port 80 protocol http and request "/help/about.txt"
then alert with mail-format { subject: test.com is down }
alert test@test.com

* A sample protocol test with send/expect strings is as follows:

check host tildeslash.com with address tildeslash.com
if failed port 80
send "GET / HTTP/1.0\r\nHost: tildeslash.com\r\n"
send "Connection: close\r\n"
expect "HTTP/[0-9\.]{3} 200 .*"
then ...

* It has a neat concept of service dependency, where an action such as unmonitor can be escalated to a dependent service. File and service checks are normally linked in this way. The dependency is specified with the 'depends on [service check]' statement.
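
* A minimal sketch of such a linked pair is as follows. The service names and paths are assumptions; the idea is that a failed checksum on the binary unmonitors the file check, and the process check that depends on it is taken along:

```
# Hypothetical file check on the daemon binary
check file sshd_bin with path /usr/sbin/sshd
if failed checksum then unmonitor

# Process check linked to the file check above
check process sshd with pidfile /var/run/sshd.pid
start program = "/etc/init.d/sshd start"
stop program = "/etc/init.d/sshd stop"
depends on sshd_bin
```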
