See www.zabbix.com for the official Zabbix site.

Docs/action simulator

From Zabbix.org
Zabbix action simulator

The action simulator is a draft extension to the Zabbix frontend that only works for Zabbix Super Administrators. It's a community effort and is not supported by the Zabbix company. The goal is to clearly answer who gets notified in the case of a specific event. Notifications often don't turn out as expected, for a variety of reasons, as you will see. The action simulator therefore offers additional information to debug such cases.

You can easily test this patch in a copy of the frontend in any environment. It will not change anything in your configuration, and it will not send actual notifications or create events. There are versions for 2.0, 2.2 and 3.0. The 3.0 version does not apply to 2.4. It would need at least some changes to work there, but this hasn't been tried, since 2.4 is no longer supported upstream.

Please consider voting for ZBXNEXT-97!

Please don't blame the author for the quality of the code, as he is not a developer. Any help is warmly welcome! Feel free to contact the author on Freenode IRC (volter) or via e-mail (volker27<at>gmx.at).

It's not expected that this patch will ever be accepted upstream. Rather, it is designed to fill the gap until there is a proper solution.

How does it integrate?

The only visible modification is an additional column in the trigger configuration list. No database changes or recompilation are necessary. You can just copy your frontend directory, apply the patch to the copy and configure httpd to serve the patched frontend in parallel. Click the link in the new column to get a list of all notifications and operations that an event would cause! There's actually more useful information, but carry on!

List of triggers for a certain host, showing the additional column on the far right

Use cases

The action simulator covers the wish to "Test an action", but goes beyond that. It's designed around the following use cases:

"Why on earth was I not notified?"

{house:on_fire.last() = 1} -- Zabbix knew the house was on fire, but sadly nobody else got to know before the fire brigade arrived and took care of the smoking remains. That's a somewhat unpleasant thought.

Did you ever set up an action that didn't execute? Did you try to refine the action conditions and still weren't sure whether it would work? Debugging such a case is time-consuming. You've got to be careful not to notify people with your debugging and careful not to break other things. If you fear you might have broken something, you'll probably have to test that on top!

Why did it fail? There are numerous possibilities:

  • Action is disabled
  • Action conditions don't match event for one reason or another
  • User is member of a disabled user group
  • User has no permissions to host
  • Desired user media not defined
  • Desired user media disabled
  • Severity settings for user media don't match event
  • Temporal settings for user media don't match event
  • Media type disabled system-wide
  • Operations not configured as thought

Looking up all of these details for a single trigger is tedious and prone to error.
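To make the checklist above more tangible, here is a minimal sketch of how such a check could be walked through in code. It is not the simulator's actual code: the `Media` and `User` classes, their fields and the `why_not_notified` function are hypothetical simplifications (the active period is reduced to a range of hours, for instance), and "operations not configured as thought" is left out, because that one needs a human.

```python
from dataclasses import dataclass


@dataclass
class Media:
    """Hypothetical, simplified user media entry (not the Zabbix schema)."""
    media_type: str
    enabled: bool = True
    type_enabled: bool = True                    # media type enabled system-wide
    severities: frozenset = frozenset(range(6))  # accepted severities, 0..5
    active_hours: range = range(0, 24)           # simplified "when active" period


@dataclass
class User:
    """Hypothetical, simplified user with the flags relevant for notifications."""
    name: str
    group_enabled: bool = True   # False if the user is in a disabled user group
    can_read_host: bool = True   # False if the user has no permissions to the host
    media: tuple = ()


def why_not_notified(action_enabled, conditions_match, user, media_type,
                     severity, hour):
    """Collect every reason that blocks a notification; [] means it goes out."""
    reasons = []
    if not action_enabled:
        reasons.append("action is disabled")
    if not conditions_match:
        reasons.append("action conditions don't match event")
    if not user.group_enabled:
        reasons.append("user is member of a disabled user group")
    if not user.can_read_host:
        reasons.append("user has no permissions to host")

    # Only media entries of the requested type are of interest.
    wanted = [m for m in user.media if m.media_type == media_type]
    if not wanted:
        reasons.append("desired user media not defined")
    for m in wanted:
        if not m.enabled:
            reasons.append("desired user media disabled")
        if severity not in m.severities:
            reasons.append("severity settings for user media don't match event")
        if hour not in m.active_hours:
            reasons.append("temporal settings for user media don't match event")
        if not m.type_enabled:
            reasons.append("media type disabled system-wide")
    return reasons
```

The function deliberately collects all reasons instead of stopping at the first one, since a single missing notification can have several causes at once.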

The action simulator provides a complete view of actions, shedding light on all of the points mentioned above. After clicking the "Actions?" button, a pop-up opens that consists of a form on top and three collapsible tables below. The form allows you to specify some timings and a simple maintenance flag. It also provides some information about the trigger you chose.

The first two tables cover notifications and remote commands caused by the matching actions. They cover all the possible problems associated with the execution of these operations. If an operation cannot be carried out, the table indicates why. The bottom table gives an overview of all available actions and the process of condition matching. It's easy to spot the culprit that caused the action not to match. If the action conditions match, but an operation cannot be executed, you'll find the details in the second table.

That should give you all the information to set up everything properly. The author was able to find three different problems with the notification of a single trigger in his production environment and to correct them in minutes.

Creating a list of notifications

Did your boss ever ask for a list of who gets notified about certain events and how? With the standard interface, that might end up being a couple of hours' work. Better make no mistake!

With the action simulator, you'd just click through the triggers in question and copy out the results. That's quick and easy. The action simulation is available via the API, so you should even be able to use it in your scripts. This is untried and undocumented, though. Imagine having a complete list of all notifications you're sending. Now clean up and simplify your action conditions. Are you sure you didn't break something? Just create an up-to-date notification list and compare it to the previous one!
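Comparing two such lists is simple once they are in a machine-readable form. The following is a minimal sketch, assuming each list entry is a hypothetical (trigger, recipient, media type) tuple that you copied out or scripted yourself; neither the tuple layout nor the function name comes from the simulator.

```python
def compare_notification_lists(before, after):
    """Return the notifications that disappeared and the ones that are new.

    Both arguments are iterables of (trigger, recipient, media_type) tuples.
    This layout is an assumption made for illustration only.
    """
    before, after = set(before), set(after)
    return {
        "lost": sorted(before - after),    # were sent before, not any more
        "gained": sorted(after - before),  # were not sent before, are now
    }


if __name__ == "__main__":
    old = [("House on fire", "alice", "Email"), ("House on fire", "bob", "SMS")]
    new = [("House on fire", "alice", "Email"), ("House on fire", "carol", "Email")]
    print(compare_notification_lists(old, new))
```

An empty "lost" list after a clean-up of your action conditions is a good sign that nobody silently dropped off the notification list.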

Avoiding duplicate notifications

Are you sending out duplicate notifications and looking for a way to avoid that in advance? The action audit log lets you find out which action caused a notification, but only in hindsight. The action simulator tells you which action would cause a notification, and also whether it would be sent because of membership in some user group or explicitly to a user.
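If you collect the simulated notifications for one trigger, spotting duplicates boils down to grouping them by recipient and media. Here is a minimal sketch; the (action, recipient, media type, via) tuple layout is a hypothetical format chosen for illustration, where "via" is either a user group name or the string "user" for an explicit recipient.

```python
from collections import defaultdict


def find_duplicates(notifications):
    """Group simulated notifications by (recipient, media type).

    Each notification is an (action, recipient, media_type, via) tuple.
    Returns only the targets that would be notified more than once, mapped
    to the list of (action, via) pairs responsible for it.
    """
    by_target = defaultdict(list)
    for action, recipient, media_type, via in notifications:
        by_target[(recipient, media_type)].append((action, via))
    return {target: sources
            for target, sources in by_target.items()
            if len(sources) > 1}
```

The result tells you not only who gets the same message twice, but also which actions are to blame and whether a group membership or an explicit recipient entry caused it.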

How does it work?

Most of the code is transcribed from actions.c and escalator.c, using the API instead of raw SQL wherever possible. Some parts are not needed for the action simulator and were therefore not carried over.
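To give an idea of what the condition matching part has to do, here is a minimal sketch of combining individual condition results into one verdict. It is not a transcript of actions.c. It assumes the three classic calculation types of Zabbix actions ("And", "Or" and "And/Or", where conditions of the same type are ORed and the resulting groups are ANDed) and that an action without conditions matches every event. The function name and the input format are hypothetical.

```python
def conditions_match(results, mode="and/or"):
    """Combine per-condition results into a single match decision.

    results: list of (condition_type, matched) pairs, e.g.
             [("host group", True), ("severity", False)]
    mode:    "and", "or" or "and/or"
    """
    if not results:
        return True  # assumption: no conditions means the action always matches
    if mode == "and":
        return all(matched for _, matched in results)
    if mode == "or":
        return any(matched for _, matched in results)
    if mode == "and/or":
        # OR within one condition type, AND across different types.
        groups = {}
        for condition_type, matched in results:
            groups[condition_type] = groups.get(condition_type, False) or matched
        return all(groups.values())
    raise ValueError(f"unknown calculation mode: {mode!r}")
```

Keeping the per-condition results around, instead of only the final verdict, is what makes it possible to point at the condition that caused an action not to match.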

The disadvantage of this design is that the code must be kept in sync between server and frontend. Ideally, this should be covered by comprehensive automated testing.

On the other hand, this approach requires very little change in Zabbix -- almost none. If the frontend tried to utilize the server's code instead, that would have to happen via the database. Virtual events and virtual notifications would be necessary, and the debugging information would have to be included as well.

Pop-up window of the action simulator; The form has a Re-evaluate button to account for changes.

Known limitations and issues

Permanent limitations

  • Only works for Zabbix Super Admins

Current limitations

  • Sorting still sucks, but not as much
  • Maintenance is only considered in a "Yes or no" way, which doesn't reflect the actual inner workings

Bugs

Please check the release notes for known issues.

In question

  • Untested for aggregated items

Outlook and discussion

Various aspects are not covered yet. Among them:

Dependency

The trigger table shows a list of all triggers the selected trigger depends on. Having any of these in a problem state will prevent the selected trigger from firing. That should be plain to see.
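The dependency check itself can be sketched in a few lines. The following is an illustration only, not the simulator's code: the dictionary of dependencies, the set of triggers in a problem state and the function name are hypothetical, and following dependencies of dependencies is an assumption about how the check should behave.

```python
def blocking_dependencies(trigger, depends_on, in_problem):
    """Return the triggers that keep `trigger` from firing, sorted by name.

    depends_on: dict mapping a trigger name to the triggers it depends on
    in_problem: set of trigger names currently in a problem state
    """
    blocking = set()
    seen = set()
    stack = list(depends_on.get(trigger, ()))
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue  # guards against loops in the dependency data
        seen.add(dep)
        if dep in in_problem:
            blocking.add(dep)
        # Assumption: dependencies of dependencies count as well.
        stack.extend(depends_on.get(dep, ()))
    return sorted(blocking)
```

An empty result means that no dependency stands in the way of the selected trigger.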

Recovery

Recovery messages are expected to have the same recipients as the original notifications, so there's no separate table. This assumption is most likely wrong for cases of escalation.