Trigger examples

From Zabbix.org

This page provides real-life examples of what is possible with triggers. Every example should show a solution to a specific problem by applying a particular principle in a trigger expression. The focus should be on the solution and not on the problem it solves.

Size dependent file system monitoring

Thanks to low-level discovery (LLD) it is easy to monitor any number of individual file systems. But file systems of different sizes may need different treatment.

For instance, while falling below 20% free space on a 10 GByte file system may be worth a warning, warning at 20% free space on a 1 TByte file system may make little sense. The idea here is to monitor each file system either for a specific percentage or for a specific amount of free space, depending on a configurable file system size threshold.

This is achieved with two trigger prototypes. One prototype checks the free percentage when the total file system size is below the threshold. The other prototype checks the free bytes when the total file system size is at or above the threshold.

User macros used

{$FS_SIZE_FREE}=5G
{$FS_SIZE_PFREE}=20
{$FS_SIZE_THRESHOLD}=25G

Trigger examples

Name: Free disk space is less than {$FS_SIZE_PFREE}% on volume {#FSNAME}
Expression: {host:vfs.fs.size[{#FSNAME},total].last()}<{$FS_SIZE_THRESHOLD} AND {host:vfs.fs.size[{#FSNAME},pfree].last()}<{$FS_SIZE_PFREE}

Name: Free disk space is less than {$FS_SIZE_FREE}Byte on volume {#FSNAME}
Expression: {host:vfs.fs.size[{#FSNAME},total].last()}>={$FS_SIZE_THRESHOLD} AND {host:vfs.fs.size[{#FSNAME},free].last()}<{$FS_SIZE_FREE}
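The decision logic of the two prototypes can be sketched in Python. This is only an illustration of the conditions, not something Zabbix runs: the function names are made up for this sketch, the macro values are the ones listed above, and pfree is approximated as free/total, which may differ slightly from what the agent reports.

```python
# Zabbix size suffixes are powers of 1024.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# User macros from the example above.
MACROS = {
    "{$FS_SIZE_FREE}": "5G",
    "{$FS_SIZE_PFREE}": "20",
    "{$FS_SIZE_THRESHOLD}": "25G",
}


def to_bytes(value):
    """Convert a macro value such as '5G' or '20' to a plain number."""
    text = str(value).strip()
    if text and text[-1].upper() in UNITS:
        return float(text[:-1]) * UNITS[text[-1].upper()]
    return float(text)


def percent_trigger(total, free):
    """First prototype: small file systems are judged by free percentage."""
    pfree = 100.0 * free / total  # assumption: approximates the pfree item
    return (total < to_bytes(MACROS["{$FS_SIZE_THRESHOLD}"])
            and pfree < to_bytes(MACROS["{$FS_SIZE_PFREE}"]))


def bytes_trigger(total, free):
    """Second prototype: large file systems are judged by free bytes."""
    return (total >= to_bytes(MACROS["{$FS_SIZE_THRESHOLD}"])
            and free < to_bytes(MACROS["{$FS_SIZE_FREE}"]))
```

Because the two conditions on the total size are mutually exclusive, at most one of the two triggers can be in PROBLEM status for a given file system.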

Multiple triggers, one item, keep PROBLEM status until timeout

Sometimes an item receives different values, each of which may indicate a different issue, but never a value that indicates that an issue is resolved. Every trigger on that item should stay in PROBLEM status for a reasonable time even when other values arrive, but not forever.

This is often the case for log-based items, for instance.

The first expression makes sure each trigger falls back to OK status after a timeout once the corresponding string no longer occurs, regardless of whether other values come in.
The second expression behaves slightly differently, but allows the use of regular expressions. The difference is that a trigger only falls back to OK status after the timeout once no values come in at all.

User macros used

{$TIMEOUT}=15m

Trigger examples

Note: item is equivalent to host:log[/log/file,"2[0-9]{2} RC"]

Expression1: {item.count({$TIMEOUT},207 RC)}>0 AND {item.nodata({$TIMEOUT})}=0

Expression2: ({item.regexp("2[0-9]{2} RC")}=1 OR {TRIGGER.VALUE}=1) AND {item.nodata({$TIMEOUT})}=0
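The difference between the two expressions can be illustrated with a small Python simulation. It is a simplified model, not the way Zabbix evaluates triggers: the history list, the evaluation times, and all function names are invented for this sketch, both expressions look for "207 RC" so they can be compared, and the window boundaries only approximate the behavior of count() and nodata().

```python
import re

TIMEOUT = 15 * 60  # {$TIMEOUT}=15m, in seconds


def recent_values(history, now):
    """Values received within the last TIMEOUT seconds."""
    return [value for ts, value in history if now - TIMEOUT < ts <= now]


def expression1(history, now, needle="207 RC"):
    """count({$TIMEOUT},207 RC)>0 AND nodata({$TIMEOUT})=0"""
    recent = recent_values(history, now)
    count = sum(1 for value in recent if needle in value)
    nodata = 0 if recent else 1
    return count > 0 and nodata == 0


def expression2(history, now, trigger_value, pattern=r"207 RC"):
    """(regexp(pattern)=1 OR {TRIGGER.VALUE}=1) AND nodata({$TIMEOUT})=0"""
    seen = [value for ts, value in history if ts <= now]
    last_matches = bool(seen) and re.search(pattern, seen[-1]) is not None
    nodata = 0 if recent_values(history, now) else 1
    return (last_matches or trigger_value == 1) and nodata == 0


def replay(history, times):
    """Evaluate both expressions at each time; history must be sorted by time."""
    states1, states2, trigger_value = [], [], 0
    for now in times:
        states1.append(expression1(history, now))
        problem = expression2(history, now, trigger_value)
        trigger_value = 1 if problem else 0  # feeds {TRIGGER.VALUE} next round
        states2.append(problem)
    return states1, states2


# One "207 RC" line followed by unrelated lines, then silence.
HISTORY = [(0, "207 RC"), (600, "208 RC"), (1200, "208 RC")]
STATES1, STATES2 = replay(HISTORY, [0, 600, 1000, 1200, 2000, 2200])
```

In this model the first expression returns to OK at 1000 seconds, because the matching line has left the 15 minute window even though other lines keep arriving. The second expression stays in PROBLEM until 2200 seconds, when no value at all has arrived for 15 minutes.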

Multiple triggers, one item, keep PROBLEM status until timeout, but differently

This example also has trapper or log data in mind, but the goal is to have no OK events at all, so we do not have to care about meaningless recovery events and notifications. What we want is a problem event on every arrival of data for a particular item, and to have it disappear after a while.

I found this impossible to do within one trigger expression. Instead, I created two triggers and a dependency:

Expression1: {item.regexp(your_pattern)}=1 -- multiple PROBLEM event generation enabled, depends on the trigger with expression #2

Expression2: {item.nodata(your_time)}=1 -- severity "Not classified"

The drawback is that you have to maintain two expressions per item. On the other hand, the necessary expressions are simpler, and you can have many triggers like #1 that all depend on #2. Maintenance could be simplified by using the API or LLD. It is also suggested to set up a dashboard filter to hide the "Not classified" events.
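As a rough sketch of the API route, the following Python helpers build the parameter objects for two trigger.create calls. The helper names are hypothetical, and the field names (description, expression, priority, type, dependencies) reflect the Zabbix API trigger object as commonly documented, so they should be checked against the API reference for your version. The expressions keep the placeholders from above and must be replaced with full host:key expressions before use.

```python
def nodata_trigger_params(name, expression):
    """Parameters for trigger #2, the nodata trigger the others depend on."""
    return {
        "description": name,
        "expression": expression,
        "priority": 0,  # assumption: 0 is the "Not classified" severity
    }


def pattern_trigger_params(name, expression, nodata_triggerid):
    """Parameters for a trigger like #1, depending on trigger #2."""
    return {
        "description": name,
        "expression": expression,
        "type": 1,  # assumption: 1 enables multiple PROBLEM event generation
        "dependencies": [{"triggerid": nodata_triggerid}],
    }


# Placeholders from the text; the trigger ID is a made-up example value that
# would normally come from the response to the first trigger.create call.
NODATA = nodata_trigger_params("No data", "{item.nodata(your_time)}=1")
PATTERN = pattern_trigger_params(
    "Pattern seen", "{item.regexp(your_pattern)}=1", "12345"
)
```

The nodata trigger has to be created first, since every pattern trigger needs its ID for the dependency. Several pattern triggers can reuse the same ID.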