Docs/template guidelines

Zabbix Template Guidelines

Note: Please use the discussion page for questions, comments, suggestions.

Specification

General
  • Templates should be modular. Anything that can be monitored separately in an environment should have its own template. For example, ICMP ping always belongs in its own template. SSH, Apache and any other service monitoring also belong in separate templates.
  • Templates should be as universal as reasonably possible. For example, an Apache template should check all popular daemon names in items, and verify all of them in the trigger.
  • Low-level discovery (LLD) should be used for all supported objects (network interfaces, mounted filesystems, SNMP trees...).
    • The LLD rule should have a large interval. One hour is a common choice.
    • Be extra careful with rules that could potentially find lots of entities - in that case, item prototypes should have even higher intervals.
  • Consider adding custom graphs for items that could be correlated.
    • Use colours that are easy to distinguish.
  • Consider adding host/templated screens.
  • All items, triggers, LLD rules and other configuration entities should be enabled by default to make the template useful out of the box. If an entity is very specific, an exception can be made and it could be disabled by default.
  • Consider using user macros for values that users might want to fine-tune often (item parameters, trigger thresholds etc).
    • Consider using a template-specific prefix in macro names to avoid potential conflicts with other templates.
    • If user macros are used, define them in the template itself instead of using global macros - that way users get either the default values, or an example of what the macro names are. If global macros are used, they are not exported along with the template.
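The macro prefix recommendation can be checked mechanically. The following is a minimal sketch, not an official tool: the function name, the sample expression and the "MYSQL_" prefix are all hypothetical, and the pattern only covers plain user macros of the form {$NAME}.

```python
import re

# Plain user macros look like {$NAME}; names use A-Z, 0-9, "_" and ".".
# Macros with a context part are deliberately not handled in this sketch.
USER_MACRO = re.compile(r"\{\$([A-Z0-9_.]+)\}")


def unprefixed_macros(text, prefix):
    """Return user macro names found in `text` that do not start with `prefix`."""
    return [name for name in USER_MACRO.findall(text) if not name.startswith(prefix)]


# Hypothetical trigger expression from a MySQL template.
expression = (
    "{Template DB MySQL:mysql.connections.avg(5m)}>{$MYSQL_MAX_CONN} or "
    "{Template DB MySQL:mysql.slow.avg(5m)}>{$SLOW_LIMIT}"
)

if __name__ == "__main__":
    # Lists the macros that would be candidates for renaming.
    print(unprefixed_macros(expression, "MYSQL_"))
```

Here {$SLOW_LIMIT} would be reported, since a generic name like that could easily clash with a macro of the same name in another template.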
Naming
  • Template names start with "Template ", then comes the category, then the template name itself (the specific part) - for example, "Template OS Linux", "Template OS AIX" or "Template DB MySQL".
    • Optionally, underscores could be used instead of spaces.
  • All names (group, template, item, trigger, graph, application, screen, discovery) use normal case inside the specific part - for example, "Template App Zabbix server".
  • All templates are added in a group called "Templates".
    • Optionally, additional groups may be used.
  • Where possible, "$1" references must be used in item and trigger names.
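The template naming rule above is regular enough to validate with a short script. This is a sketch under the assumption that the category is a single word; the function name and the pattern are illustrative and not part of Zabbix.

```python
import re

# "Template", then a one-word category, then the specific part.
# Either spaces or underscores are accepted as separators.
TEMPLATE_NAME = re.compile(
    r"^Template[ _](?P<category>[A-Za-z0-9]+)[ _](?P<specific>.+)$"
)


def parse_template_name(name):
    """Return (category, specific part), or None if the name breaks the rule."""
    match = TEMPLATE_NAME.match(name)
    if match is None:
        return None
    return match.group("category"), match.group("specific")


if __name__ == "__main__":
    for candidate in ("Template OS Linux", "Template_DB_MySQL", "Linux servers"):
        print(candidate, "->", parse_template_name(candidate))
```

The check does not verify the case used inside the specific part; that would need a list of proper nouns ("Zabbix", "MySQL") that this sketch does not attempt to maintain.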
Items
  • Item polling intervals should be large. Most novice users are not aware of the performance implications of low intervals. Intervals below one minute should be avoided. If a template has many items, consider using intervals of 5 minutes and higher.
  • Item keys should be as short as possible - they should not specify default parameters. For example, use net.if.in[eth0] instead of net.if.in[eth0,bytes].
  • If an item returns numeric codes or shorthand text, set up value mapping for the item. Also provide the meaning of the values as a listing and as SQL queries, like in this example.
  • If you use scripts to gather values, use active items for userparameters.
    • If a script returns a large number of values or takes longer than a second to run, use Zabbix sender with a cron daemon or an at daemon.
  • If your template monitors a specific service, consider adding all items to a single application. If there are lots of items in your template, consider splitting them into multiple applications.
  • In a single template, use items of the same type unless there's a good reason not to (like using one item type to start a script that uses Zabbix sender).
  • For trapper items, always use the item description to explain how the values are supposed to be populated - is it a userparameter, a cron job using Zabbix sender or something else.
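For slow scripts, the cron plus Zabbix sender approach mentioned above could look roughly like this. It is a sketch only: the server name, host name and item key are hypothetical, and it assumes the zabbix_sender binary is on the PATH and that a matching trapper item exists on the host.

```python
import subprocess


def sender_command(server, host, key, value):
    """Build the zabbix_sender argument list for a single value.

    -z is the Zabbix server, -s the host name as configured in Zabbix,
    -k the trapper item key and -o the value to send.
    """
    return ["zabbix_sender", "-z", server, "-s", host, "-k", key, "-o", str(value)]


def send_value(server, host, key, value):
    """Run zabbix_sender and return its exit code (0 means success)."""
    return subprocess.run(sender_command(server, host, key, value)).returncode


if __name__ == "__main__":
    # Hypothetical slow measurement, run from cron rather than polled by the agent.
    duration_seconds = 42
    raise SystemExit(
        send_value("zabbix.example.com", "web01", "backup.duration", duration_seconds)
    )
```

A crontab entry such as "*/10 * * * * /usr/local/bin/report_backup.py" (path hypothetical) would then push a value every ten minutes. The trapper item description should say so, as recommended above.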
Triggers
  • As many items as possible should have triggers attached to them.
  • Where it is useful, triggers should have a description as well: explaining a more complicated expression, giving the reason for choosing a threshold, informing the user of possible causes or suggesting possible remedies.
  • Trigger expressions should be reasonably flap-resistant - that is, not relying on the last value only but checking the last 5 or 10 minutes instead. On the other hand, do not make the expressions overly complex - for example, do not use trigger hysteresis unless it really adds significant value.
    • If a trigger for a single spiking value is still needed, consider adding it with a very high threshold.
  • Trigger thresholds should be aimed at monitoring a mid-sized environment.
  • Trigger severities should be set to reasonable values.
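To illustrate why flap resistance matters, the following sketch simulates two trigger styles on the same series of values: one that compares only the last value with the threshold, and one that fires only when every value in a recent window exceeds it (comparable to using min() over a time period in a trigger expression). The sample values, threshold and window size are made up for illustration.

```python
def state_changes(states):
    """Count how often a trigger would switch between OK and PROBLEM."""
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


def last_value_states(values, threshold):
    """PROBLEM whenever the most recent value alone exceeds the threshold."""
    return [value > threshold for value in values]


def min_window_states(values, threshold, window):
    """PROBLEM only when all of the last `window` values exceed the threshold."""
    return [
        min(values[max(0, i - window + 1): i + 1]) > threshold
        for i in range(len(values))
    ]


# Hypothetical one-minute samples of a load-like metric.
samples = [3, 6, 4, 7, 3, 6, 6, 6, 6, 6, 4, 6]

if __name__ == "__main__":
    print("last value only:", state_changes(last_value_states(samples, 5)))
    print("5-sample window:", state_changes(min_window_states(samples, 5, 5)))
```

On this series the windowed check changes state far less often than the last-value check, at the cost of reacting a few samples later. That delay is why a separate trigger with a very high threshold can still make sense for single spikes.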