Limit number of update operations for table 'items'

ZBXNEXT-1689

Status: v1.3

Owner: Alexei, Sasha

Summary

Currently, the large number of SQL update operations on the 'items' table is the major performance bottleneck. For each new value, Zabbix updates several fields in the 'items' table: 'lastvalue', 'lastclock', 'lastns', 'prevvalue', 'prevorgvalue', 'lastlogsize', 'mtime'.

The proposed functionality will eliminate this bottleneck.

Server changes

Existing fields 'items.lastvalue', 'items.lastclock', 'items.lastns', 'items.prevvalue' and 'items.prevorgvalue' will be dropped.

The value cache will be used to store the lastvalue and prevvalue fields. The last/prev value fields must be dropped from internal structures, and the value cache must be used instead.

The configuration cache will be used to store the lastclock, lastns (stored as lastts) and prevorgvalue (stored as lastrawvalue) fields. As those fields are used only for delta item processing, they must be stored in a separate hashset instead of being added to the existing items cache.

Processing of each field:

  • lastvalue, prevvalue: added to shared memory and updated on receipt of a new value; the value cache already handles this.
  • prevorgvalue, lastclock, lastns: added to shared memory and updated on receipt of a new value; same logic as if we added a new item for processing.
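
The handling of the delta fields can be illustrated with a minimal Python sketch. The names DeltaState, DeltaCache and process are hypothetical and are not part of the Zabbix code base; a plain dictionary stands in for the separate hashset, and only a "speed per second" delta is shown.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DeltaState:
    lastts: float        # timestamp of the last raw value (lastclock + lastns)
    lastrawvalue: float  # last raw value before delta calculation (prevorgvalue)


class DeltaCache:
    """Hypothetical stand-in for the separate hashset in the configuration cache."""

    def __init__(self) -> None:
        self._states: Dict[int, DeltaState] = {}

    def process(self, itemid: int, ts: float, raw: float) -> Optional[float]:
        """Store the new raw value and return the change per second.

        Returns None when there is no earlier value to compare against,
        or when the timestamp did not advance.
        """
        prev = self._states.get(itemid)
        # The state is always overwritten with the newest raw value.
        self._states[itemid] = DeltaState(lastts=ts, lastrawvalue=raw)
        if prev is None or ts <= prev.lastts:
            return None
        return (raw - prev.lastrawvalue) / (ts - prev.lastts)
```

Because the state lives only in memory in this sketch, the first value received for an item yields no result, and the second value is the first one that produces a delta.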

Queue related internal checks

The queue related internal checks will use information from the shared memory instead of accessing the database.

Interface changes

The functionality affects four major components of the interface: Monitoring -> Overview, Monitoring -> Latest Data, Monitoring -> Web and Administration -> Queue.

Monitoring -> Overview, Monitoring -> Web and Monitoring -> Latest data

The information for these pages and also for history related macros (last and previous values, last timestamp) will be retrieved from the historical tables using Zabbix API instead of calls to item.get.

Macros

History related macros ({ITEM.VALUE}, {ITEM.LASTVALUE}, etc) should use values retrieved from the historical tables.

Administration->Queue

The queue will be retrieved directly from Zabbix server using JSON protocol. The information will be available only if Zabbix server is running.

If a connection to the server cannot be established, a standard message box "Unable to connect to Zabbix server" will be displayed and no data tables will be shown.

JSON request implementation details

A new ZBX_PROTO_TAG_REQUEST value, ZBX_PROTO_VALUE_GET_QUEUE ('queue.get'), will be defined. The value of ZBX_PROTO_TAG_TYPE will specify the type of returned information:

 'overview': number of delayed items per item type per delay period
 'overview by proxy': number of delayed items per proxy per delay period
 'details': list of delayed items

The server will validate permissions using the session id ("sid") received in the request. The server will not return the queue if the session is not valid, the user is not logged in, or the user is not a super admin.
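
The permission rule can be sketched in Python as follows. This is only an illustration under assumptions: the Session class, the sessions dictionary, the function name check_queue_access and the numeric value of USER_TYPE_SUPER_ADMIN are hypothetical, and the real server would look the session up in its own storage.

```python
from dataclasses import dataclass
from typing import Dict, Optional

USER_TYPE_SUPER_ADMIN = 3  # assumed numeric value, used for illustration only


@dataclass
class Session:
    userid: int
    active: bool    # False once the user has logged out
    user_type: int


def check_queue_access(sid: str, sessions: Dict[str, Session]) -> Optional[dict]:
    """Return a failure response, or None when the queue may be returned."""
    session = sessions.get(sid)
    if (
        session is None                                  # unknown session id
        or not session.active                            # user is not logged in
        or session.user_type != USER_TYPE_SUPER_ADMIN    # not a super admin
    ):
        return {"response": "failed", "info": "Permission denied."}
    return None
```

All three rejection cases produce the same reply, so the caller cannot tell an unknown session id from insufficient rights.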

Request-response for the 'overview' call:

REQUEST

  {
     "request":"queue.get",
     "sid":"02c412e863883395f4f52fc5e9f2efc9",
     "type":"overview"
  }

RESPONSE

  {
      "response":"success",
      "data":[
      {
          "itemtype":0,
          "delay5":5,
          "delay10":10,
          "delay30":30,
          "delay60":30,
          "delay300":30,
          "delay600":600
      },
      {
          "itemtype":1,
          "delay5":5,
          "delay10":10,
          "delay30":30,
          "delay60":30,
          "delay300":30,
          "delay600":600
      },
   # same for all item types that have any queue - if there is no queue for some item type, it is omitted from the response
   # ...
      ]
  }
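
Since item types without a queue are omitted from the response, a consumer has to fill in zeros itself. The following Python sketch shows one way to do that; the function name overview_table and the item_types argument are hypothetical, and the rows are read from the "data" property as in the example above.

```python
import json
from typing import Dict, Iterable

DELAY_KEYS = ("delay5", "delay10", "delay30", "delay60", "delay300", "delay600")


def overview_table(raw: str, item_types: Iterable[int]) -> Dict[int, Dict[str, int]]:
    """Build a full table of delayed-item counts, one row per item type."""
    reply = json.loads(raw)
    # Start with zeros for every known item type.
    table = {t: dict.fromkeys(DELAY_KEYS, 0) for t in item_types}
    for row in reply["data"]:
        # Item types not listed by the caller are still added to the table.
        counts = table.setdefault(row["itemtype"], dict.fromkeys(DELAY_KEYS, 0))
        for key in DELAY_KEYS:
            counts[key] = int(row.get(key, 0))
    return table
```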

Request-response for the 'overview by proxy' call:

REQUEST

  {
     "request":"queue.get",
     "sid":"02c412e863883395f4f52fc5e9f2efc9",
     "type":"overview by proxy"
  }

RESPONSE

  {
      "response":"success",
      "data":[
      {
          "proxyid":0,  # 0 for server stats
          "delay5":5,
          "delay10":10,
          "delay30":30,
          "delay60":30,
          "delay300":30,
          "delay600":600
      },
  # For all proxies having queue. Proxies (and server) without queue aren't included in this list
      {
          "proxyid":12345,
          "delay5":5,
          "delay10":10,
          "delay30":30,
          "delay60":30,
          "delay300":30,
          "delay600":600
      }
      ]
  }

Request-response for the 'details' call:

REQUEST

  {
     "request":"queue.get",
     "sid":"02c412e863883395f4f52fc5e9f2efc9",
     "type":"details"
  }

RESPONSE

  {
      "response":"success",
      "data":[
  # For all delayed items sorted by 'nextcheck'. Maximum 501 records.
      {
          "itemid":123,
          "nextcheck":23423434554
      },
  # ...
      ]
  }
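
The ordering and the record limit of the 'details' response can be sketched in Python as follows. The function name queue_details and the shape of the input list are assumptions made for illustration; the server would read delayed items from shared memory rather than from a Python list.

```python
import heapq
from typing import Dict, List

MAX_DETAILS = 501  # record limit stated in the example above


def queue_details(delayed_items: List[Dict[str, int]]) -> dict:
    """Return at most MAX_DETAILS delayed items, ordered by 'nextcheck'."""
    # nsmallest returns the selected rows in ascending 'nextcheck' order.
    rows = heapq.nsmallest(MAX_DETAILS, delayed_items, key=lambda item: item["nextcheck"])
    return {
        "response": "success",
        "data": [{"itemid": r["itemid"], "nextcheck": r["nextcheck"]} for r in rows],
    }
```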

Request-response for an invalid session id:

REQUEST

  {
     "request":"queue.get",
     "sid":"xxxxxxxxxxxxxxxx",
     "type":"overview"
  }

RESPONSE

  {
      "response":"failed",
      "info":"Permission denied."
  }

Request-response for an invalid request type:

REQUEST

  {
     "request":"queue.get",
     "sid":"02c412e863883395f4f52fc5e9f2efc9",
     "type":"xxxxxxx"
  }

RESPONSE

  {
      "response":"failed",
      "info":"Unsupported request type."
  }
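
On the requesting side, the examples above suggest a simple pattern: build the request from a session id and a type, and treat any reply that is not "success" as an error. A minimal Python sketch, in which QueueRequestError, build_queue_request and parse_queue_response are hypothetical names and the transport to the server is deliberately left out:

```python
import json
from typing import List

QUEUE_TYPES = ("overview", "overview by proxy", "details")


class QueueRequestError(Exception):
    """Raised when the server does not answer with "success"."""


def build_queue_request(sid: str, queue_type: str) -> str:
    """Serialize a queue.get request for one of the documented types."""
    if queue_type not in QUEUE_TYPES:
        raise ValueError(f"unknown queue type: {queue_type!r}")
    return json.dumps({"request": "queue.get", "sid": sid, "type": queue_type})


def parse_queue_response(raw: str) -> List[dict]:
    """Return the rows of a successful reply or raise with the server's message."""
    reply = json.loads(raw)
    if reply.get("response") != "success":
        raise QueueRequestError(reply.get("info", "Unknown error"))
    return reply.get("data", [])
```

Checking for "success" rather than for a specific failure string keeps the caller independent of the exact wording of the failure value.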

API changes

Item related API calls should be modified. Field 'items.prevorgvalue' will no longer be supported. Retrieval of 'items.lastvalue', 'items.lastclock', 'items.lastns' and 'items.prevvalue' will be reimplemented to get data from history tables.

API must have a single entry point for getting historical information so that future developments (like support of Cassandra) can be easily integrated.
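
One way to picture such a single entry point is a small interface that hides the storage backend. The sketch below is written in Python purely for illustration; HistoryBackend, InMemoryBackend, HistoryManager and their methods are hypothetical names, not part of the Zabbix API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, str]  # (clock, value)


class HistoryBackend(ABC):
    """Storage-specific part; an SQL or other backend would implement this."""

    @abstractmethod
    def last_values(self, itemid: int, limit: int) -> List[Row]:
        """Return up to `limit` rows for the item, newest first."""


class InMemoryBackend(HistoryBackend):
    """Toy backend used only to make the sketch runnable."""

    def __init__(self, rows: Dict[int, List[Row]]) -> None:
        self._rows = rows

    def last_values(self, itemid: int, limit: int) -> List[Row]:
        return sorted(self._rows.get(itemid, []), reverse=True)[:limit]


class HistoryManager:
    """Single entry point: callers never talk to a backend directly."""

    def __init__(self, backend: HistoryBackend) -> None:
        self._backend = backend

    def last_and_prev(self, itemid: int) -> Dict[str, Optional[object]]:
        rows = self._backend.last_values(itemid, 2)
        last = rows[0] if rows else None
        prev = rows[1] if len(rows) > 1 else None
        return {
            "lastclock": last[0] if last else None,
            "lastvalue": last[1] if last else None,
            "prevvalue": prev[1] if prev else None,
        }
```

With this shape, supporting another storage engine means adding one more HistoryBackend implementation while the callers stay unchanged.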

'with_historical_items' parameter will be removed from host.get and hostgroup.get.

Actions when item housekeeper interval is set to 0

It will be possible to set the item history storage period to '0', as it is now. In this case:

  • prev, prevorg and current values will be stored in shared memory
  • information will not be saved in history
  • latest values will be missing in Overview and Latest data
  • macro {ITEM.LASTVALUE} will work on server side, but won't work on the front-end
  • a patch is required to update the history period to 1 if it is set to 0 (otherwise there will be regressions, since some macros, Latest data and Overview won't show any data)

Translation strings

  • Unable to connect to Zabbix Server
  • Scheduled check

Database changes

These fields will be dropped: items.lastvalue, items.lastclock, items.prevvalue, items.lastns and items.prevorgvalue.

We need a patch to update item history housekeeper interval from 0 to 1.
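
The effect of such a patch can be demonstrated with a short Python sketch against an in-memory SQLite database. It assumes that the history period is stored in a column named items.history; the function name patch_history_period and the demo table are hypothetical, and the real upgrade patch would be written for the supported database engines.

```python
import sqlite3


def patch_history_period(conn: sqlite3.Connection) -> int:
    """Set the history period to 1 wherever it is 0; return the number of items changed."""
    cur = conn.execute("UPDATE items SET history = 1 WHERE history = 0")
    conn.commit()
    return cur.rowcount


# Demo on a throwaway table that contains only the columns needed here.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE items (itemid INTEGER PRIMARY KEY, history INTEGER NOT NULL)")
demo.executemany(
    "INSERT INTO items (itemid, history) VALUES (?, ?)",
    [(1, 0), (2, 90), (3, 0)],
)
patched = patch_history_period(demo)
```

Items that already have a non-zero period are left untouched, so running the patch a second time changes nothing.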

Documentation

  • What's new
  • Upgrade notes
    • Queue information is accessible only while the server is running
    • Item history interval will be set to '1' if equal to '0'
  • API docs

Test cases

  • Performance related
    • Check performance degradation of Monitoring -> Latest data, Monitoring -> Overview and Monitoring -> Web
    • Set new record for NVPS :)
  • Value macros in map labels and graph titles still work

Other decisions

  • Processing of 'items.lastlogsize' and 'items.mtime' will not be changed.
  • Filtering of items and hosts that have history data will be based on queries from history. No flag items.has_history (or similar) will be introduced.

Topics discussed

  • We also lose the ability to tell when an unsupported item has last been updated. The information is available from internal events if really needed.
  • After a server restart, items using delta pre-processing will be calculated only on receipt of a second value, since we do not have items.prevorgvalue.

To be discussed

  • The message box "Unable to connect to Zabbix server" should also contain information about the configured server IP/DNS and port, as "Status of Zabbix" does (it will be available to super admins only).

ChangeLog

  • 1.1
    • 'with_historical_items' parameter will be removed from host.get and hostgroup.get
  • 1.2
    • changed the server queue.get response format to return the value in the 'value' property instead of 'data'
  • 1.3
    • added session id into the queue request