Debian / Ubuntu - Monitor your server and avoid server crashes with Monit

  • Published on : 20 December 2014 at 13:11 UTC
  • By Lionel Eppe

Monit is a monitoring program for Linux that allows you to :
- Monitor the desired services every x minutes
- Receive alerts when a service begins to consume too much CPU or RAM
- Restart a service when it becomes unstable. This way, your services will not stay inaccessible for long, because they are restarted automatically when a problem occurs
- Monitor the partitions of your hard drive and receive an alert when one of them is running out of free space.

Tutorial tested on Ubuntu 12.04 and Debian 7.7.0.

To begin, install monit by typing this :

Code : Bash

apt-get install monit

To configure monit, you must create a file in the "/etc/monit/conf.d" folder.
Indeed, the "/etc/monit/monitrc" file, which also serves as documentation, ends with the line "include /etc/monit/conf.d/*".
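
To confirm that your own monitrc really loads the "conf.d" folder, a quick check such as the one below may help. This is only a minimal sketch : the has_confd_include function is a name made up for this example, and it assumes that the include line is not commented out.

Code : Bash

```shell
# Hypothetical helper : succeeds if the given file contains an active
# (uncommented) "include /etc/monit/conf.d/..." line.
has_confd_include() {
    grep -Eq '^[[:space:]]*include[[:space:]]+/etc/monit/conf\.d/' "$1"
}

# Usage on the default Debian / Ubuntu path (run as root, because monitrc
# is usually readable by root only).
if [ -r /etc/monit/monitrc ]; then
    if has_confd_include /etc/monit/monitrc; then
        echo "conf.d is included"
    else
        echo "conf.d is NOT included"
    fi
fi
```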

Create a configuration file like this :

Code : Bash

vi /etc/monit/conf.d/monit.cfg

Note : The script below is based on the documentation found in the "/etc/monit/monitrc" file.

Code : Bash

set daemon 120            # Check the services every 2 minutes (2*60s = 120s = 2min)
  with start delay 240    # Wait 4 minutes after monit starts before the first check of the services.

# Log files
# Using Linux log files
# set logfile syslog facility log_daemon
# or a specific log file for monit.
set logfile /var/log/monit.log

# SMTP server to use for sending alerts by mails.
# Note : You can specify multiple SMTP servers separated by commas.
set mailserver localhost

# If the SMTP servers are not working, monit can store the alerts on the hard drive.
# "slots 100" indicates that monit stores at most 100 alerts.
set eventqueue
    basedir /var/monit              # storage directory used while the mail server is down
    slots 100                       # maximum number of stored alerts

# You can define the format of mails that monit will send.
set mail-format {
	from: monit@$HOST
	subject: monit alert --  $EVENT $SERVICE
	message: $EVENT Service $SERVICE
		Date:        $DATE
		Action:      $ACTION
		Host:        $HOST
		Description: $DESCRIPTION
		
		Email sent by Monit.
}

# E-mail address of the server administrator
# Use an address that is not hosted on this server. Otherwise, you will not be warned of problems with its mail components (mail server, POP3 and IMAP protocols, ...)
set alert sysadmin@gmail.com

# To receive only alerts of type "timeout", use this line instead of the previous one.
# (uncomment the line by removing the # at the beginning of the line).
#set alert sysadmin@gmail.com only on { timeout }

# Settings of the web interface of monit.
# By default, the web interface is available on the port "2812" with the credentials : admin / monit.
set httpd port 2812 and
	allow admin:monit


# Checking services
# In general, verification of a service will need its pid in order to stop or restart the service
# To find the pid file of a process, look for the file like this : find / -name [service name].pid
# Example : "find / -name apache2.pid" shows this under Debian and Ubuntu : /run/apache2.pid

# Apache
check process apache with pidfile /run/apache2.pid
    # Commands to start and stop the desired service
    # with a delay of 60 seconds (1min) to start the service.
    start program = "/etc/init.d/apache2 start" with timeout 60 seconds
    stop  program = "/etc/init.d/apache2 stop"
    # CPU Usage
    # If the service uses more than 60% of CPU for 4 minutes (2 cycles of 120s), we send an alert by email.
    if cpu > 60% for 2 cycles then alert
    # If the service uses more than 80% of CPU for 10 minutes (5 cycles), we restart the service.
    if cpu > 80% for 5 cycles then restart
    # If the service uses more than 200 MB of memory for 10 minutes, we restart the service.
    if totalmem > 200.0 MB for 5 cycles then restart
    # If the service has over 250 children, we restart the service.
    if children > 250 then restart
    # If the server load stays above 10 for 16 minutes (8 cycles), we stop the service.
    if loadavg(5min) greater than 10 for 8 cycles then stop
    # Test if it is possible to connect to port 80 (http protocol)
    if failed host 127.0.0.1 port 80 protocol http
        #and request "/index.php"   # try to access a specific page (remove the first # to uncomment the line)
        then restart
    # The same for port 443 (SSL / https)
    if failed port 443 type tcpssl protocol http
        with timeout 15 seconds
        then restart
    # If the service has restarted 3 times in 10 minutes, there is probably a problem with the service.
    # So, monit stops monitoring the service in question ("timeout").
    if 3 restarts within 5 cycles then timeout
    group server

# MySQL
check process mysqld with pidfile /var/run/mysqld/mysqld.pid
    group database
    start program = "/etc/init.d/mysql start"
    stop  program = "/etc/init.d/mysql stop"
    # For MySQL, test if it is possible to connect to the local MySQL server (127.0.0.1) on port 3306
    if failed host 127.0.0.1 port 3306 then restart
    if 5 restarts within 5 cycles then timeout

# SSH
check process sshd with pidfile /var/run/sshd.pid
    group ssh
    start program = "/etc/init.d/ssh start"
    stop  program = "/etc/init.d/ssh stop"
    # For SSH, port 22 (if you have changed it for safety, change it here too).
    if failed host 127.0.0.1 port 22 protocol ssh then restart
    if 5 restarts within 5 cycles then timeout

# Postfix
check process postfix with pidfile /var/spool/postfix/pid/master.pid
    group mail
    start program = "/etc/init.d/postfix start"
    stop  program = "/etc/init.d/postfix stop"
    if failed port 25 protocol smtp then restart
    if 5 restarts within 5 cycles then timeout

# FTP
check process proftpd with pidfile /var/run/proftpd.pid
    start program = "/etc/init.d/proftpd start"
    stop  program = "/etc/init.d/proftpd stop"
    if failed port 21 protocol ftp then restart
    if 5 restarts within 5 cycles then timeout

# BIND
check process bind9 with pidfile /var/run/named/named.pid
    group bind
    start program = "/etc/init.d/bind9 start"
    stop  program = "/etc/init.d/bind9 stop"
    if failed port 53 then restart
    if 5 restarts within 5 cycles then timeout

# POP3
check process pop3 with pidfile /var/run/courier/pop3d.pid
    group mail
    start program = "/etc/init.d/courier-pop start"
    stop  program = "/etc/init.d/courier-pop stop"
    if failed port 110 then restart
    if 5 restarts within 5 cycles then timeout

# IMAP
check process imap with pidfile /var/run/courier/imapd.pid
    group mail
    start program = "/etc/init.d/courier-imap start"
    stop  program = "/etc/init.d/courier-imap stop"
    if failed port 143 then restart
    if 5 restarts within 5 cycles then timeout

# POP3S
check process pop3s with pidfile /var/run/courier/pop3d-ssl.pid
    group mail
    start program = "/etc/init.d/courier-pop-ssl start"
    stop  program = "/etc/init.d/courier-pop-ssl stop"
    if failed port 995 type TCPSSL protocol pop for 5 cycles then restart
    if 3 restarts within 5 cycles then timeout
# IMAPS
check process imaps with pidfile /var/run/courier/imapd-ssl.pid
    group mail
    start program = "/etc/init.d/courier-imap-ssl start"
    stop  program = "/etc/init.d/courier-imap-ssl stop"
    if failed port 993 type TCPSSL protocol imap for 5 cycles then restart
    if 3 restarts within 5 cycles then timeout
# Courier (IMAP/POP3) Auth Daemon
check process courier-auth with pidfile /var/run/courier/authdaemon/pid
    group mail
    start program = "/etc/init.d/courier-authdaemon start"
    stop  program = "/etc/init.d/courier-authdaemon stop"
    if 3 restarts within 5 cycles then timeout

# Webmin
# Script found in the monit documentation : http://mmonit.com/wiki/Monit/ConfigurationExamples#webmin
check process webmin with pidfile /var/webmin/miniserv.pid
  group webmin
  start program = "/etc/init.d/webmin start"
  stop  program = "/etc/init.d/webmin stop"
  if failed host 127.0.0.1 port 10000 then restart
  if 5 restarts within 5 cycles then timeout
# Webmin (dependency)
check file webmin_rc with path /etc/init.d/webmin
  group webmin
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

# Hard drive
# Note : To list the partitions of your Linux server, type the command : lsblk
# The first server partition will be "/dev/hda1" (IDE HDD) or "/dev/sda1" (SATA HDD).
#
# SDA1
check device sda1 with path /dev/sda1
    group system
    if space usage > 85% then alert
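
The "for N cycles" conditions used in this configuration depend on the interval set by "set daemon". As a rough aid, the small shell function below converts a number of cycles into minutes for a given interval. It is not part of monit : cycles_to_minutes is a name made up for this example, and the result is rounded down to whole minutes.

Code : Bash

```shell
# Convert a number of monit cycles into minutes.
# $1 = "set daemon" interval in seconds, $2 = number of cycles
cycles_to_minutes() {
    echo $(( $1 * $2 / 60 ))
}

cycles_to_minutes 120 2   # "for 2 cycles"  -> prints 4
cycles_to_minutes 120 5   # "for 5 cycles"  -> prints 10
cycles_to_minutes 120 8   # "for 8 cycles"  -> prints 16
```

If you change the "set daemon" value, the durations mentioned in the comments of the configuration change accordingly.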

Once you have created your configuration file, check the monit configuration :

Code : Bash

service monit syntax

If nothing appears, your configuration is good. Otherwise, errors will be displayed.
If your setup is good, restart apache and monit.

Code : Bash

service monit restart
service apache2 restart

Once you have restarted the services, you have to wait for 4 minutes (the time set by the option : with start delay 240).
After these 4 minutes, you can access the web interface.

If you change the monit configuration later, use the following command to apply the changes without fully restarting monit :

Code : Bash

service monit reload
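
The syntax check and the reload can be chained, so that a broken configuration is never loaded. This is only a sketch : it calls the monit binary directly ("monit -t" for the syntax check, then "monit reload") instead of the "service" commands used in this tutorial, and check_then_reload and MONIT_BIN are names invented for this example.

Code : Bash

```shell
# MONIT_BIN can be overridden (for a dry run, for example) ; defaults to "monit".
MONIT_BIN="${MONIT_BIN:-monit}"

# Reload monit only if the syntax check of the configuration succeeds.
check_then_reload() {
    if "$MONIT_BIN" -t; then
        "$MONIT_BIN" reload
    else
        echo "Syntax check failed, configuration not reloaded." >&2
        return 1
    fi
}
```

To use it, run "check_then_reload" as root after editing your configuration file.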

Oddly, the apache monitoring doesn't start automatically (even if you wait 10 minutes).
To solve the problem, connect to the monit web interface. By default : http://your-domain.com:2812

You will see that the apache status is : Not monitored.
Click "apache" in the list.

Then, click the "Enable monitoring" button at the bottom of the page.

You will receive a mail : monit alert -- Action done apache

And the status will become : Not monitored - monitor pending

Wait 5 minutes and its status will become "Running" unless apache has a problem.
In that case, you will receive an email from monit.
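
If you prefer the terminal to the web interface, monitoring can also be enabled with the "monitor" command of monit. The sketch below assumes that the service is named "apache" as in the configuration of this tutorial, and that the "set httpd" section is active (the monit command line talks to the daemon through it). enable_monitoring and MONIT_BIN are illustrative names.

Code : Bash

```shell
# MONIT_BIN can be overridden (for a dry run, for example) ; defaults to "monit".
MONIT_BIN="${MONIT_BIN:-monit}"

# Enable monitoring for each service name given as an argument.
# Stops at the first service for which the command fails.
enable_monitoring() {
    for svc in "$@"; do
        "$MONIT_BIN" monitor "$svc" || return 1
    done
}
```

For example, as root : "enable_monitoring apache", then "monit summary" to display the status of the services.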

To test monit, we stopped the "apache2" service from the Linux terminal. A few moments later, we received a monit alert email saying that the service was not running and that monit had automatically restarted it.
Note : On the monit page, the service status was "does not exist" and, after the restart of the service, the status was "Running" again. The status isn't updated in real time but at each cycle (the interval defined by "set daemon 120" at the top of the configuration file).

If you have secured your server with iptables, allow the monit web interface port by adding this to your firewall configuration script :

Code : Bash

# monit
iptables -t filter -A INPUT -p tcp --dport 2812 -j ACCEPT

Once you change the configuration script of your firewall, don't forget to re-run this script.
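
The rule above opens port 2812 to every source address. If you always connect from the same address, you may prefer to accept only that one. A sketch under that assumption : 203.0.113.10 is a placeholder address to replace with your own, and allow_monit_from and IPTABLES are names chosen for this example.

Code : Bash

```shell
# IPTABLES can be overridden (for a dry run, for example) ; defaults to "iptables".
IPTABLES="${IPTABLES:-iptables}"

# Accept connections to the monit web interface from a single source address.
# $1 = allowed source IP, $2 = port (defaults to 2812)
allow_monit_from() {
    "$IPTABLES" -t filter -A INPUT -p tcp -s "$1" --dport "${2:-2812}" -j ACCEPT
}
```

For example, as root : "allow_monit_from 203.0.113.10". Setting IPTABLES=echo beforehand only prints the arguments that would be passed to iptables, without changing the firewall.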

Note : To learn more about the Linux firewall configuration, refer to our tutorial : Debian / Ubuntu - Securing your VPS or dedicated server by iptables