
Escaping timeouts with atd

From Zabbix.org

What problem is this trying to solve?

All item evaluations within Zabbix are limited by certain timeouts. Depending on the item type, one or more timeouts apply. If an evaluation takes longer than you can afford to keep workers busy, you need to pre-process the data. zabbix_sender lends itself to submitting the pre-processed metrics in bulk afterwards. But what triggers the pre-processing? An agent item is bound to timeouts, and a cron job loses the flexibility of changing the configuration from the frontend.

One way around this problem is to use the at daemon (atd), which manages jobs to be executed at a specific time. A wrapper script, used as an external script or UserParameter, adds a job to the at queue and returns immediately. Because atd, not the Zabbix worker, forks the process, the job escapes the timeout. Jobs normally run as the user who added them.

This approach consequently allows you to:

  • Run commands safely as the agent user without timeout limitations
  • Enable and disable the checks from the frontend
  • Control the interval from the frontend
  • Control the parameters from the frontend, if the wrapper script allows it

Concept

For the sake of security, create a wrapper script that uses a case statement so that only pre-specified code can run. For each defined case, a command (job) is piped into the at command. Every new capability requires a new case in the wrapper script.

This is a simple implementation for your reference: File:Atjob.sh
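The concept can be sketched as follows. This is a minimal illustration and not the contents of Atjob.sh: the helper path /usr/local/bin/apacheperf.sh, the function names, and the character whitelist are assumptions made for the example. The wrapper prints 1 when a job was queued and 0 otherwise, so the item itself returns quickly.

```shell
#!/bin/sh
# Sketch of an atjob wrapper. The helper script path is hypothetical.

# Succeed only if the argument is non-empty and consists of whitelisted
# characters, so nothing unexpected ends up in the queued command line.
is_safe() {
    case "$1" in
        ''|*[!A-Za-z0-9._:/?=-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Print the command line for a known job name; fail for anything else.
build_job() {
    name=$1; host=$2; url=$3
    is_safe "$host" && is_safe "$url" || return 1
    case "$name" in
        apacheperf)
            printf "%s '%s' '%s'\n" /usr/local/bin/apacheperf.sh "$host" "$url" ;;
        *)
            return 1 ;;
    esac
}

# Pipe the job into at so that atd, not the Zabbix worker, runs it.
schedule_job() {
    job=$(build_job "$@") || { echo 0; return 1; }
    if echo "$job" | at now >/dev/null 2>&1; then echo 1; else echo 0; fi
}

# Only schedule when arguments were passed, so the file can also be sourced.
[ "$#" -eq 0 ] || schedule_job "$@"
```

Called as atjob apacheperf web01 'http://web01/server-status?auto', this queues the helper for immediate execution. Restricting arguments to a whitelist before quoting them is one possible design choice; it keeps item parameters from injecting shell syntax into the job.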

External check

  • Make sure atd is running and will start on boot
  • Place a wrapper script in the externalscripts directory of your Zabbix server
  • Add an external check item like atjob[apacheperf, {HOST.HOST}, http://{HOST.DNS}/server-status?auto]
  • Add your receiving trapper items

For a detailed example, see https://github.com/q1x/zabbix-templates/tree/master/scheduler
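As a rough sketch of what a scheduled apacheperf job might do, the functions below fetch the status page, turn its "Name: value" lines into zabbix_sender input, and submit everything in one call. The trapper key names (apache[...]), the function names, and the server address zabbix.example.com are assumptions for illustration; they are not taken from the linked template.

```shell
#!/bin/sh
# Sketch of a job payload for the apacheperf case. Trapper key names
# (apache[...]) and the server address are hypothetical.

# Convert "Name: value" lines from server-status?auto into zabbix_sender
# input lines of the form "<host> <key> <value>". Non-numeric values
# such as the Scoreboard are skipped.
status_to_sender() {
    awk -F': ' -v host="$1" '
        NF == 2 && $2 ~ /^[0-9.]+$/ {
            gsub(/ /, "", $1)    # "Total Accesses" -> "TotalAccesses"
            printf "%s apache[%s] %s\n", host, $1, $2
        }'
}

# Fetch the status page and submit all values in one zabbix_sender call.
collect_apacheperf() {
    host=$1; url=$2; server=${3:-zabbix.example.com}
    curl -s --max-time 30 "$url" | status_to_sender "$host" |
        zabbix_sender -z "$server" -i - >/dev/null
}
```

A job would then run something like collect_apacheperf web01 'http://web01/server-status?auto'. The host name in each line has to match the host that owns the trapper items.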

Agent

  • Make sure atd is running and will start on boot
  • Place a wrapper script in the agent user's home directory
  • Define a single UserParameter=atjob[*],/var/lib/zabbix/atjob $1 $2 $3 (the agent substitutes the item key parameters for the positional references $1 to $9)
  • Restart the agent
  • Add an agent item like atjob[apacheperf, {HOST.HOST}, http://{HOST.DNS}/server-status?auto]
  • Add your receiving trapper items

Things to be aware of

Make sure the scheduled processes terminate at some point; otherwise they accumulate with every item interval and can eventually exhaust system resources.
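One way to enforce termination, assuming the coreutils timeout command is available on the host, is to run every job under a hard limit. The function name run_limited and the 10 second grace period are choices made for this sketch.

```shell
#!/bin/sh
# Run a command, but stop it once it exceeds the given number of seconds.
# timeout sends TERM at the limit and KILL 10 seconds later if the
# process is still alive.
run_limited() {
    limit=$1; shift
    timeout -k 10 "$limit" "$@"
}
```

For example, run_limited 300 /path/to/job would end a job that is still running after five minutes. timeout exits with status 124 when the limit was hit, which the job can log or report.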

The atd shipped with RHEL is faulty in at least versions 5.8 and 6.6 and may crash without leaving a trace. It should work fine in RHEL 7. The issue was fixed upstream quite a while ago. You can work around it by rebuilding RHEL's atd package with the patch attached to the public version of the ticket. It is a good idea to monitor whether atd is still running.
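Monitoring atd can be done with the built-in agent item proc.num[atd]. Where a script is preferred, a small check along these lines could be used; it assumes pgrep is installed, and the function name is made up for the example.

```shell
#!/bin/sh
# Print 1 if a process with exactly this name is running, 0 otherwise.
proc_running() {
    if pgrep -x "$1" >/dev/null 2>&1; then echo 1; else echo 0; fi
}
```

Calling proc_running atd yields a value that a trigger can compare against 1.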