
Parallel sending of alerts

ZBXNEXT-2442

Status: v1.0

Owner: Sandis Neilands

Summary

Timely and reliable delivery of alerts is one of the key features of Zabbix. However, the current alert delivery design is neither scalable nor reliable. The purpose of this ZBXNEXT is to evaluate possible solutions to these problems.

Specification

Current alerter's design

Zabbix server starts a single alerter process (the count is hardcoded), which handles alerts sequentially.

The alerter process has a 3-stage main loop. Timeouts are hardcoded per media type: e-mail, SMS, Jabber, etc. each have a different timeout.

Alerter-current-mainloop.png
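
In pseudocode the loop looks roughly like the following self-contained C sketch (the function names are illustrative stand-ins, not the actual alerter source):

 /* Illustrative sketch of the current single-alerter main loop: fetch unsent
  * alerts, send them one by one (blocking on each media type's hardcoded
  * timeout), then sleep for a fixed 30 seconds. */
 #include <unistd.h>

 #define SLEEP_SECONDS 30   /* hardcoded sleep at the end of each iteration */

 struct alert { int alertid; int mediatype; };

 /* stand-ins for DB access and media delivery */
 static int  fetch_unsent_alerts(struct alert *alerts, int max) { (void)alerts; (void)max; return 0; }
 static int  send_alert(const struct alert *a) { (void)a; return 0; /* blocks until sent or timed out */ }
 static void update_alert_status(const struct alert *a, int rc) { (void)a; (void)rc; }

 int main(void)
 {
     struct alert alerts[1000];

     for (;;)
     {
         int n = fetch_unsent_alerts(alerts, 1000);      /* stage 1: read unsent alerts from the DB */

         for (int i = 0; i < n; i++)                     /* stage 2: send sequentially, update status */
             update_alert_status(&alerts[i], send_alert(&alerts[i]));

         sleep(SLEEP_SECONDS);                           /* stage 3: fixed sleep, even under load */
     }
 }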

Risks to be reduced

  • One "slow" media type halts alert delivery to other media types.
  • The alerter's throughput is poor due to sequential sending of alerts (most of the time is spent waiting for responses from servers).
  • A hang in a 3rd-party library (GnuTLS, iksemel), a user script, or the alerter itself halts alert sending indefinitely.
  • When the alerter is already loaded, it doesn't make sense to sleep for 30 seconds at the end of the loop.

Other requirements

  • Solution should be backwards compatible by default.
  • Some media types (SMS, some user scripts) do not support parallel alert delivery; this needs to be handled.
  • Users and receiving servers (e.g. Jira) expect alerts to arrive in order.
  • [OPTIONAL] Some servers deliberately throttle responses when there are too many new connections or the sending rate is too high. Explicit sending rate control might be needed.

The following are probably out of scope for this ZBXNEXT:

  • [OPTIONAL] The current hardcoded timeouts are not appropriate for all cases. User-configurable timeouts could be implemented.
  • [OPTIONAL] The alerter ignores script return codes. They could be used to mark an alert's delivery as failed if the script failed.
  • [OPTIONAL] The alerter does not track media state. If a media type failed, further attempts on it could be delayed.

Details

The goals of this ZBXNEXT are conflicting. On the alerter's side we want to send out alerts as fast as possible. In contrast, users want (and in some cases need) to receive alerts in order.

These goals can be bridged by splitting alert delivery based on the following tuple:

  • media (e-mail, Jabber, script, etc.);
  • media type (particular server/script);
  • recipient.

Example: at any point in time Zabbix should not send two alerts to the same e-mail address simultaneously; however, it can simultaneously send alerts to different e-mail addresses.
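
A minimal C sketch of this dispatch rule (the names and data layout are assumptions for illustration, not existing Zabbix code): before handing an alert to a worker, the queue manager checks whether the same tuple is already in flight.

 #include <string.h>

 /* key by which alert delivery is serialized */
 typedef struct
 {
     int  media;        /* e-mail, Jabber, SMS, script, ... */
     int  mediatypeid;  /* particular server/script configuration */
     char sendto[256];  /* recipient */
 }
 alert_key_t;

 /* tuples currently being handled by some worker */
 static alert_key_t busy[1024];
 static int         busy_count;

 /* return 1 if an alert with this key may be dispatched now, 0 if it must wait */
 static int can_dispatch(const alert_key_t *key)
 {
     for (int i = 0; i < busy_count; i++)
     {
         if (busy[i].media == key->media && busy[i].mediatypeid == key->mediatypeid &&
                 0 == strcmp(busy[i].sendto, key->sendto))
             return 0;   /* same tuple already in flight: preserve ordering, wait */
     }
     return 1;           /* different tuple: safe to send in parallel */
 }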

DB access

If we want to remain backwards compatible (e.g. still store the retry count and error message in the DB), then even if we make the alerter completely parallel, its throughput will be limited by the speed/throughput of DB access.

SELECT

Benchmarks didn't reveal any noticeable problems with the SELECT statement even with 100000+ unsent alerts (query time ~1-2 seconds).

UPDATE

Benchmarks show that updating each alert individually (the current design) is very slow. To speed this up we can concatenate multiple updates into a single DB request, as is done in other places in Zabbix (see uses of DBexecute_overflowed_sql()). The results improved somewhat but were not very consistent. In production the database would already be loaded and might be accessed over a (slow) network connection.
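
A self-contained sketch of the batching idea follows (it mirrors the DBexecute_overflowed_sql() usage pattern, but uses stand-in helpers so no exact Zabbix signatures are assumed; real code must also escape the error string and flush any remainder after the last alert):

 #include <stdio.h>

 #define MAX_SQL_SIZE (32 * 1024)   /* cf. ZBX_MAX_SQL_SIZE lowered to 32K in the benchmark */

 static char   sql[MAX_SQL_SIZE + 1024];
 static size_t sql_offset;

 static void db_execute(const char *stmt) { (void)stmt; /* stand-in for one DB round trip */ }

 static void flush_sql(void)
 {
     if (0 < sql_offset)
     {
         db_execute(sql);    /* many UPDATE statements, one request */
         sql_offset = 0;
         sql[0] = '\0';
     }
 }

 /* queue one per-alert status update; flush once the buffer grows past the threshold */
 static void update_alert(unsigned long alertid, int status, int retries, const char *error)
 {
     sql_offset += (size_t)snprintf(sql + sql_offset, sizeof(sql) - sql_offset,
             "update alerts set status=%d,retries=%d,error='%s' where alertid=%lu;\n",
             status, retries, error, alertid);

     if (MAX_SQL_SIZE < sql_offset)
         flush_sql();
 }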

System:

  • DB: MySQL;
  • OS: 64-bit Linux Mint;
  • CPU: 4-core desktop-class Intel Xeon;
  • RAM: 4 GB;
  • Connection to Zabbix: Unix domain socket.

Setup:

  • Number of unsent alerts: 100000+.

Results:

  • Separate update for each alert: ~8000 alerts/minute updated.
  • Multiple update statements in a single DB request: 9000-26000 alerts/minute updated.
    • Changed ZBX_MAX_SQL_SIZE to 32K to avoid massive 'slow query' logs.
  • During this test the system load was around 2. Slow queries still occurred occasionally.

Process model

To minimize the impact on the database and memory, we propose to split the alerter into two parts:

  • Alert Queue Manager - distributes work between workers.
  • StartAlertWorkers=n workers - send alerts.

IPC model between Queue Manager and Workers

We evaluated risks and performance characteristics of several IPC options:

  • a client-server protocol over TCP, stream Unix domain sockets, or FIFOs;
  • SysV shared memory (as is done in many other places in Zabbix);
  • a 3rd-party MQ architecture;
  • SysV and POSIX message queues.

By using Unix domain sockets we can avoid some risks associated with SysV shared memory. The performance is sufficient. We created a small proof-of-concept client-server echo protocol implementation. The server sends back the clients' requests verbatim.

File:Unixsock 7.zip
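
For reference, a stripped-down version of such an echo server could look like this (one connection at a time, minimal error handling; the attached proof-of-concept is the authoritative version):

 #include <string.h>
 #include <unistd.h>
 #include <sys/socket.h>
 #include <sys/un.h>

 #define SOCK_PATH "/tmp/ipc_echo.sock"   /* illustrative path */

 int main(void)
 {
     int                s, c;
     ssize_t            n;
     char               buf[1024];
     struct sockaddr_un addr;

     if (-1 == (s = socket(AF_UNIX, SOCK_STREAM, 0)))
         return 1;

     memset(&addr, 0, sizeof(addr));
     addr.sun_family = AF_UNIX;
     strncpy(addr.sun_path, SOCK_PATH, sizeof(addr.sun_path) - 1);
     unlink(SOCK_PATH);

     if (-1 == bind(s, (struct sockaddr *)&addr, sizeof(addr)) || -1 == listen(s, 128))
         return 1;

     for (;;)
     {
         if (-1 == (c = accept(s, NULL, NULL)))
             continue;

         while (0 < (n = read(c, buf, sizeof(buf))))
         {
             if (n != write(c, buf, (size_t)n))   /* send the request back verbatim */
                 break;
         }

         close(c);
     }
 }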

On a desktop-class 4-core Intel Xeon CPU running 64-bit Linux Mint we got the following results.

Test 1. Request size: 1 KB (plus a few bytes of header), interval between requests in each client: 1 second, number of clients: 100.

  • Server uses ~1% CPU as read from htop.
  • Server handles 6000 reqs/min.
  • No delays in communication.

Test 2. Same as test 1, but with 1000 clients.

  • Server uses ~10% CPU as read from htop.
  • Server handles 60000 reqs/min.
  • No delays in communication.

Test 3. Same as test 2, but with an interval between requests of 0.

  • Server uses 90% CPU.
  • Server handles ~17 000 000 reqs/min.
  • Clients use < 1% CPU.

Queue Manager's internal workings

If we go with Unix socket IPC instead of the shared memory model, then we can use whichever heap-allocated container data structures we want/have, as they won't be shared between processes.

We need to store (see the sketch after the list):

  • currently handled tuples (worker, media, media type, recipient);
  • media type statistics (for rate control);
  • unsent alerts read from the DB.
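
A possible shape of that state (field names are assumptions for illustration only):

 /* heap-allocated state kept by the queue manager (sketch) */
 typedef struct
 {
     /* currently handled tuples: which worker sends to which media/media type/recipient */
     struct { int worker; int media; int mediatypeid; char sendto[256]; } *busy;
     int busy_count;

     /* per-media-type statistics used for rate control */
     struct { int mediatypeid; int connections; double last_sent; } *stats;
     int stats_count;

     /* unsent alerts read from the DB, waiting to be assigned to workers */
     struct { unsigned long alertid; int mediatypeid; char sendto[256]; } *queue;
     int queue_count;
 }
 alert_queue_state_t;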

The queue manager would respond to requests from workers (get work, report result). The workers back off exponentially (up to a limit of 30 seconds) if no work is currently available.
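
The worker side of this protocol could be sketched as follows (the request/report helpers are assumptions standing in for the actual IPC messages):

 #include <unistd.h>

 #define MAX_BACKOFF 30   /* seconds, per the proposal above */

 struct work { unsigned long alertid; };

 /* stand-ins for IPC requests to the queue manager and for alert delivery */
 static int  request_work(struct work *w) { (void)w; return 0; }   /* 0 = no work available */
 static int  send_alert(const struct work *w) { (void)w; return 0; }
 static void report_result(const struct work *w, int rc) { (void)w; (void)rc; }

 static void worker_loop(void)
 {
     unsigned int backoff = 1;
     struct work  w;

     for (;;)
     {
         if (0 != request_work(&w))
         {
             report_result(&w, send_alert(&w));
             backoff = 1;                 /* work available: ask again immediately */
         }
         else
         {
             sleep(backoff);              /* no work: back off exponentially */
             if (MAX_BACKOFF < (backoff *= 2))
                 backoff = MAX_BACKOFF;
         }
     }
 }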

UI changes

Two additional fields for the media type configuration form:

  • Maximum connections (range: 0-1000, default: 1).
  • Minimum interval between messages (integer, default: 0).

Configuration changes

  • StartAlertWorkers=n (range: 0-1000, default: 1).
  • DBBulkUpdateSize (range: 1K-64K).
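
If accepted, the corresponding zabbix_server.conf fragment might look like this (both parameter names are proposals from this spec, not existing options):

 # Proposed: number of alert worker processes.
 StartAlertWorkers=5

 # Proposed: maximum size in bytes of a concatenated UPDATE request.
 DBBulkUpdateSize=32768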

API changes

Add max connections and min interval fields to the media type object and 3 of its methods:

  • media_type.create()
  • media_type.get()
  • media_type.update()

Translation strings

The max connections and min interval field labels must be translated. We have to agree on the names first.

Database changes

Add max connections and min interval fields to the 'media_type' table.

Documentation

  • Media types
  • API
  • Server configuration
  • Internal monitoring
  • General

ChangeLog

  • v1.0 initial revision
