Docs/specs/ZBXNEXT-1649

Better housekeeper

ZBXNEXT-1649

Status: v1.6

Owner: Alexei, Sasha

Summary

The existing implementation of the housekeeper process has a number of inefficiencies:

  • if the housekeeper is disabled, it does not delete session-related information from the database
  • there is no fine-grained control over which tables should be excluded from housekeeping; such control would be useful for table partitioning
  • the housekeeper generates too many SQL statements
  • it groups operations per itemid instead of processing one table after another (history->history_uint->etc); processing table by table would use the DB cache much more efficiently, greatly reducing disk IO and seek operations
  • event-related information is removed inefficiently, record by record

The proposed functionality should resolve all these problems.

Interface changes

The configuration parameter DisableHousekeeping will no longer be supported. Housekeeper-related options are moved to the Zabbix interface.

Administration->General->Housekeeper

  • The form will be redesigned to contain additional options. Related fields will be grouped in frames similar to the host interface tables.
  • A "Reset" button must be added to reset all form values to defaults.

New design:

 Events and alerts
   Enable housekeeping [X]
   Keep trigger data for (in days) [NNNNN]   # this and the next three parameters are enabled if the previous parameter is checked; min/max = 1/99999
   Keep internal data for (in days) [NNNNN]
   Keep network discovery data for (in days) [NNNNN]
   Keep auto-registration data for (in days) [NNNNN]
      
 IT services
   Enable housekeeping [X]
   Keep data for (in days) [NNNNN]   # enabled if previous parameter is checked; min/max = 1/99999
      
 Audit
   Enable housekeeping [X]
   Keep data for (in days) [NNNNN]   # enabled if previous parameter is checked; min/max = 1/99999
      
 User sessions
   Enable housekeeping [X]
   Keep data for (in days) [NNNNN]   # enabled if previous parameter is checked; min/max = 1/99999
      
 History
   Enable housekeeping [X]
   Override item history period [X]    # enabled if previous parameter is checked
   Keep data for (in days) [NNNNN]     # enabled if previous two parameters are checked and enabled; min/max = 0/99999
      
 Trends
   Enable housekeeping [X]
   Override item trends period [X]    # enabled if previous parameter is checked
   Keep data for (in days) [NNNNN]    # enabled if previous two parameters are checked and enabled; min/max = 0/99999

The following field labels should be used in validation and audit messages:

  • Keep trigger event and alert data for (in days)
  • Keep internal event and alert data for (in days)
  • Keep network discovery event and alert data for (in days)
  • Keep auto-registration event and alert data for (in days)
  • Keep IT service data for (in days)
  • Keep audit data for (in days)
  • Keep user session data for (in days)
  • Keep history data for (in days)
  • Keep trend data for (in days)
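
The enablement and range rules of the form, together with the validation labels above, can be sketched as follows. This is a minimal Python illustration rather than the frontend implementation: the function name `validate_housekeeper_form`, the message wording and the choice to skip validation of disabled inputs are assumptions, while the dictionary keys reuse the 'config' field names proposed in this specification.

```python
# Hypothetical sketch: validate housekeeper form values before saving.
# Keys follow the proposed 'config' fields; everything else is illustrative.

MAX_DAYS = 99999

# (period field, mode field, override field or None, minimum, validation label)
RULES = [
    ("hk_events_trigger", "hk_events_mode", None, 1,
     "Keep trigger event and alert data for (in days)"),
    ("hk_events_internal", "hk_events_mode", None, 1,
     "Keep internal event and alert data for (in days)"),
    ("hk_events_discovery", "hk_events_mode", None, 1,
     "Keep network discovery event and alert data for (in days)"),
    ("hk_events_autoreg", "hk_events_mode", None, 1,
     "Keep auto-registration event and alert data for (in days)"),
    ("hk_services", "hk_services_mode", None, 1,
     "Keep IT service data for (in days)"),
    ("hk_audit", "hk_audit_mode", None, 1,
     "Keep audit data for (in days)"),
    ("hk_sessions", "hk_sessions_mode", None, 1,
     "Keep user session data for (in days)"),
    ("hk_history", "hk_history_mode", "hk_history_global", 0,
     "Keep history data for (in days)"),
    ("hk_trends", "hk_trends_mode", "hk_trends_global", 0,
     "Keep trend data for (in days)"),
]


def validate_housekeeper_form(values):
    """Return a list of validation messages; an empty list means valid."""
    errors = []
    for field, mode, override, minimum, label in RULES:
        enabled = values.get(mode) == 1 and (
            override is None or values.get(override) == 1
        )
        if not enabled:
            continue  # assumption: disabled inputs are not validated
        days = values.get(field)
        if not isinstance(days, int) or not minimum <= days <= MAX_DAYS:
            errors.append(
                f'Incorrect value for field "{label}": '
                f"must be between {minimum} and {MAX_DAYS}."
            )
    return errors
```

Note that history and trends accept 0 as a minimum, whereas all other periods start at 1.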

Configuration -> Hosts -> Item configuration form

If history or trends are overridden or disabled at the system level, a warning message should be displayed:

 Keep history (in days) Overridden by global housekeeper settings (#### day/days). # in normal text color
 Keep trends (in days) Overridden by global housekeeper settings (#### day/days).  # in normal text color

The text "global housekeeper settings" is a link to Administration -> General -> Housekeeper.

The warning is only displayed for items on hosts.

Graphs

The logic around the existing ZBX_HISTORY_DATA_UPKEEP constant should remain as is.

Server side changes

Configuration parameters

Support for the configuration parameter DisableHousekeeping will be removed.

Performance improvements

When the housekeeper process is executed for the first time, the following or a similar structure will be created in memory:

 itemid, valuetype, minclock
 where valuetype is one of ITEM_VALUE_TYPE_FLOAT, ITEM_VALUE_TYPE_STR, ITEM_VALUE_TYPE_LOG,
                           ITEM_VALUE_TYPE_UINT64, ITEM_VALUE_TYPE_TEXT
 minclock is the minimum timestamp stored in the corresponding historical table (history_<valuetype>, trends_*)

It should probably be sorted by valuetype+itemid.

If a table with a different valuetype contains historical information for an item, the structure will contain several records with the same itemid.

If the valuetype is changed in table 'items', Zabbix will update the structure accordingly.

On the second and subsequent runs, the housekeeper will not execute SQL 'select' statements to get minclock; it will use the information from the structure instead.

Minclock will be updated to the timestamp of the removed time period - 1, regardless of whether the historical table actually contained data.

The housekeeper will remove information from one historical table for all item IDs (sorted by itemid for better performance on the DB side), then from the next one, and so on. This should greatly improve the cache efficiency of the DB server.
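
The caching and table-by-table processing described above can be sketched as follows. This is a minimal Python illustration using an in-memory SQLite database, not the server implementation: the names `HousekeeperCache` and `run_housekeeper` are hypothetical, the structure is keyed by table name instead of valuetype, only the history tables are covered, and the sketch simply stores the deletion boundary as the new minclock without settling the exact "- 1" convention mentioned above.

```python
import sqlite3

SEC_PER_DAY = 86400
# Historical tables are processed one after another, not item by item.
HISTORY_TABLES = ["history", "history_uint", "history_str",
                  "history_text", "history_log"]


class HousekeeperCache:
    """Hypothetical in-memory structure: (table, itemid) -> minclock."""

    def __init__(self):
        self.minclock = {}
        self.loaded = False

    def load(self, db):
        # First run only: read the minimum clock of every item per table.
        for table in HISTORY_TABLES:
            rows = db.execute(
                f"SELECT itemid, MIN(clock) FROM {table} GROUP BY itemid"
            )
            for itemid, clock in rows:
                self.minclock[(table, itemid)] = clock
        self.loaded = True


def run_housekeeper(db, cache, keep_days, now):
    """Delete outdated rows; keep_days maps itemid -> storage period in days."""
    if not cache.loaded:
        cache.load(db)
    deleted = 0
    for table in HISTORY_TABLES:
        # Within one table, item IDs are processed in ascending order.
        itemids = sorted(i for (t, i) in cache.minclock if t == table)
        for itemid in itemids:
            cutoff = now - keep_days[itemid] * SEC_PER_DAY
            if cache.minclock[(table, itemid)] >= cutoff:
                continue  # nothing old enough, so no SQL is issued
            cur = db.execute(
                f"DELETE FROM {table} WHERE itemid=? AND clock<?",
                (itemid, cutoff),
            )
            deleted += cur.rowcount
            # Updated regardless of whether any rows were actually found.
            cache.minclock[(table, itemid)] = cutoff
    return deleted
```

On later runs no 'select' statements are executed, and items whose cached minclock is already within the storage period generate no 'delete' statements either.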

Removal of events and alerts

Events should be removed before alerts, as is done now.

Events and alerts should be removed by executing a single SQL statement for each table, with the same limit as for historical tables (see function delete_history()). Acknowledges will be removed automatically, since they have a DELETE CASCADE constraint.
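
A set-based removal of events could look like the sketch below. It is a hedged Python illustration against SQLite, not the server code: the function name `remove_events` is hypothetical, the mapping of events.source values to the proposed 'config' fields is an assumption, one statement is issued per event source because each source has its own period, and the limit applied by delete_history() is omitted.

```python
import sqlite3

SEC_PER_DAY = 86400
# Assumed mapping of events.source values to the proposed 'config' fields.
EVENT_SOURCES = {
    0: "hk_events_trigger",
    1: "hk_events_discovery",
    2: "hk_events_autoreg",
    3: "hk_events_internal",
}


def remove_events(db, config, now):
    """Issue one DELETE per event source; return the number of events removed."""
    if config["hk_events_mode"] != 1:
        return 0  # event housekeeping is disabled
    removed = 0
    for source, field in EVENT_SOURCES.items():
        cutoff = now - config[field] * SEC_PER_DAY
        cur = db.execute(
            "DELETE FROM events WHERE source=? AND clock<?",
            (source, cutoff),
        )
        removed += cur.rowcount
    return removed
```

With a DELETE CASCADE constraint on the acknowledges table, rows that reference a removed event disappear without any additional statement.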

Processing of table 'housekeeper'

If Remove 'table' data is not set and the corresponding table contains data, the data will stay intact and the corresponding record in table 'housekeeper' will not be removed.
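
The rule above can be illustrated with a short sketch. It is Python written under stated assumptions: the function `process_housekeeper_tasks`, the `TABLE_MODE` mapping and the `delete_rows` callback are hypothetical, and the case of a disabled table that holds no data is not modelled.

```python
# Hypothetical mapping from a table name to the 'config' switch controlling it.
TABLE_MODE = {
    "history": "hk_history_mode",
    "history_uint": "hk_history_mode",
    "trends": "hk_trends_mode",
    "trends_uint": "hk_trends_mode",
    "events": "hk_events_mode",
}


def process_housekeeper_tasks(tasks, config, delete_rows):
    """Process pending tasks; return those that stay in table 'housekeeper'.

    tasks: dicts with 'tablename', 'field' and 'value' keys.
    delete_rows: callable that performs the deletion for one task.
    """
    remaining = []
    for task in tasks:
        mode_field = TABLE_MODE.get(task["tablename"])
        if mode_field is not None and config.get(mode_field) != 1:
            # Housekeeping is disabled for this table:
            # keep both the data and the task record.
            remaining.append(task)
            continue
        delete_rows(task["tablename"], task["field"], task["value"])
    return remaining
```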

Processing of zero-history

It is allowed to set history and trends to 0 days. In this case Zabbix server should not store historical information, regardless of whether the per-item or the global storage period is in effect.
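
The zero-history rule can be sketched as below. This is a minimal Python illustration: the function names are hypothetical, and it assumes that 'hk_history_global' alone decides whether the global period replaces the per-item one (whether 'hk_history_mode' also gates the override is not settled here). The same logic would apply to trends with the 'hk_trends*' fields.

```python
def effective_history_days(config, item_history_days):
    """Return the history storage period (in days) that applies to an item."""
    if config.get("hk_history_global") == 1:
        return config["hk_history"]  # global setting overrides the item
    return item_history_days


def should_store_history(config, item_history_days):
    """A period of 0 days means no history is stored at all."""
    return effective_history_days(config, item_history_days) > 0
```

For example, with the global override enabled and 'hk_history' set to 0, no history is stored even for an item configured with 90 days.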

Configuration files

The Zabbix server configuration file must be updated to remove DisableHousekeeping.

Database changes

New fields for table 'config':

 FIELD        |hk_events_mode             |t_integer |'1'  |NOT NULL    |ZBX_SYNC # 0 - do not remove, 1 - remove
 FIELD        |hk_events_trigger          |t_integer |'365'|NOT NULL    |ZBX_SYNC
 FIELD        |hk_events_internal         |t_integer |'365'|NOT NULL    |ZBX_SYNC
 FIELD        |hk_events_discovery        |t_integer |'365'|NOT NULL    |ZBX_SYNC
 FIELD        |hk_events_autoreg          |t_integer |'365'|NOT NULL    |ZBX_SYNC
 FIELD        |hk_services_mode           |t_integer |'1'  |NOT NULL    |ZBX_SYNC
 FIELD        |hk_services                |t_integer |'365'|NOT NULL    |ZBX_SYNC
 FIELD        |hk_audit_mode              |t_integer |'1'  |NOT NULL    |ZBX_SYNC
 FIELD        |hk_audit                   |t_integer |'365'|NOT NULL    |ZBX_SYNC
 FIELD        |hk_sessions_mode           |t_integer |'1'  |NOT NULL    |ZBX_SYNC
 FIELD        |hk_sessions                |t_integer |'365'|NOT NULL    |ZBX_SYNC
 FIELD        |hk_history_mode            |t_integer |'1'  |NOT NULL    |ZBX_SYNC
 FIELD        |hk_history_global          |t_integer |'0'  |NOT NULL    |ZBX_SYNC # 0 - per item, 1 - global settings
 FIELD        |hk_history                 |t_integer |'90' |NOT NULL    |ZBX_SYNC
 FIELD        |hk_trends_mode             |t_integer |'1'  |NOT NULL    |ZBX_SYNC
 FIELD        |hk_trends_global           |t_integer |'0'  |NOT NULL    |ZBX_SYNC # 0 - per item, 1 - global settings
 FIELD        |hk_trends                  |t_integer |'365'|NOT NULL    |ZBX_SYNC

Remove existing fields 'config.alert_history' and 'config.event_history'.

The values of 'config.hk_events_trigger', 'config.hk_events_internal', 'config.hk_events_discovery' and 'config.hk_events_autoreg' must be set to the maximum of 'config.alert_history' and 'config.event_history'.

The upgrade patch must set the values of all 'config.*_mode' fields to '0'.
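
The upgrade rules can be summarised in a short sketch. It is Python operating on a plain dictionary that stands in for the single 'config' row; the function name `upgrade_config` is hypothetical, and the remaining values are the defaults listed in the field table above.

```python
EVENT_FIELDS = ("hk_events_trigger", "hk_events_internal",
                "hk_events_discovery", "hk_events_autoreg")
MODE_FIELDS = ("hk_events_mode", "hk_services_mode", "hk_audit_mode",
               "hk_sessions_mode", "hk_history_mode", "hk_trends_mode")
# Defaults of the remaining new fields, as listed in the field table.
OTHER_DEFAULTS = {
    "hk_services": 365, "hk_audit": 365, "hk_sessions": 365,
    "hk_history_global": 0, "hk_history": 90,
    "hk_trends_global": 0, "hk_trends": 365,
}


def upgrade_config(old):
    """Return the upgraded 'config' row as a new dictionary."""
    keep = max(old["alert_history"], old["event_history"])
    # Drop the two removed fields and keep everything else.
    new = {k: v for k, v in old.items()
           if k not in ("alert_history", "event_history")}
    new.update(OTHER_DEFAULTS)
    for field in EVENT_FIELDS:
        new[field] = keep  # maximum of the two old periods
    for field in MODE_FIELDS:
        new[field] = 0  # the upgrade patch sets all *_mode fields to 0
    return new
```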

New translation strings

  • Overridden by global housekeeper settings
  • Overridden by
  • global housekeeper settings
  • Enable housekeeping
  • Keep data for (in days)
  • Events and alerts
  • User sessions
  • Keep trigger data for (in days)
  • Keep internal data for (in days)
  • Keep network discovery data for (in days)
  • Keep auto-registration data for (in days)
  • Override item history period
  • Override item trends period
  • Keep trigger event and alert data for (in days)
  • Keep internal event and alert data for (in days)
  • Keep network discovery event and alert data for (in days)
  • Keep auto-registration event and alert data for (in days)
  • Keep IT service data for (in days)
  • Keep audit data for (in days)
  • Keep user session data for (in days)
  • Keep history data for (in days)
  • Keep trend data for (in days)

Documentation

  • What's new
  • Upgrade notes
    • Pay attention to the removed DisableHousekeeping parameter
  • Zabbix Manual

Q/A

 Q. Does it affect Proxy housekeeper process?
 A. No.

Questions

  • Currently the secondary index of the events table consists only of the clock field. If event housekeeping is separated by event source, should the source field be added to the secondary index?

Discussed topics

  • HousekeepingFrequency will stay in the configuration file, since it is used by the proxy
  • Housekeeper parameters are not disabled in the item configuration form, so that they can be changed before the housekeeper is enabled on the global level. Otherwise the housekeeper could remove data before the configuration is changed. The parameters must be controlled even if housekeeper is disabled for items state.

ChangeLog

1.1

  • Added more granularity for events: separate options for internal, discovery, trigger and autoregistration events
  • Better warning messages in item configuration form

1.2

  • Added more details about ZBX_HISTORY_DATA_UPKEEP
  • Updated translation strings

1.3

  • Added 'Overridden by global housekeeper settings.' to translation strings

1.4

  • Renamed field "hk_events_autoregistration" => "hk_events_autoreg" for consistency with tables "autoreg_host" and "proxy_autoreg_host"

1.5

  • Added hk_history_external
  • Added new translation string History data available for graphs (in days, 0 - always use trends)
  • Added new form field: History data available for graphs (in days, 0 - always use trends) [NNNNN]

1.6

  • Changed form design and labels
  • Removed hk_history_external
  • Updated the graph logic to remain without changes