- 1 Monitoring fast-growing log files
- 2 Summary
- 3 Approaches to handling fast-growing log files
- 4 Current rate limit mechanism
- 5 Proposed modification with "maxdelay" parameter
- 6 Changes in item parameters
- 7 New functionality: counting matching lines in log files
- 8 Front-end changes
- 9 API changes
- 10 Translation strings
- 11 Database changes
- 12 Documentation
- 13 ChangeLog
Monitoring fast-growing log files
Status: Work in progress
Owner: Andris Mednis
Summary
log and logrt items work well with slowly growing log files: every line matching the pattern is sent to the server.
This method is not suitable for fast-growing log files. Sending a massive number of log lines to Zabbix server overloads the server and its backend database.
New options should be added to log and logrt items for processing fast-growing log files without overload.
Also, new items for counting the number of matching lines should be added, because in some use cases the number of lines is preferred over the lines themselves.
Approaches to handling fast-growing log files
When a log file grows faster than a monitoring system can process it, there are two options:
- Continue processing at a rate the monitoring system can handle and accept the growing time lag, in the hope that the system will eventually catch up with the most recent lines in the log.
Advantage: no lines are ignored.
Drawback: important lines may be analyzed only after a long delay, so alerts are generated too late.
This is the current mode of operation in log and logrt items.
Overload protection is implemented as described in Current rate limit mechanism.
- Ignore some older lines to keep up with the most recent lines in the log.
Advantage: immediate attention to most recent lines.
Drawback: ignored lines may be important, but no alerts are generated for them.
In addition to overload protection, an "ignore" mechanism is proposed, as described in Proposed modification with "maxdelay" parameter.
Zabbix should be modified to allow both approaches.
Current rate limit mechanism
log and logrt items use two rate limits on the number of log file lines:
- To prevent overloading Zabbix server and network resources, Zabbix agent does not send more than "maxlines" lines of a log file per second.
The maximum number of lines allowed to be sent to the server in one update interval is:
s_count = maxlines * update_interval
maxlines value: min = 1, default = 20, max = 1000
If the maxlines parameter is not specified, the default value provided by the "MaxLinesPerSecond" parameter in the agent configuration file is used.
- To prevent overloading the monitored host's CPU and I/O when a log file grows too fast, Zabbix agent does not process more than
p_count = 4 * s_count
lines per update interval.
So the maximum configurable rate is 1000 lines/s sent to the server and 4000 lines/s analyzed by the agent.
These limits will not be changed.
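The two limits can be illustrated with a short Python sketch. The function name rate_limits and its argument names are hypothetical and used only for illustration; the agent itself does not necessarily expose such a function.

```python
def rate_limits(update_interval, maxlines=None, max_lines_per_second=20):
    """Return (s_count, p_count) for one update interval.

    update_interval      -- item update interval in seconds
    maxlines             -- item parameter; None means "not specified"
    max_lines_per_second -- stands in for MaxLinesPerSecond from the agent config
    """
    if maxlines is None:
        maxlines = max_lines_per_second
    if not 1 <= maxlines <= 1000:
        raise ValueError("maxlines must be between 1 and 1000")
    s_count = maxlines * update_interval   # lines that may be sent to the server
    p_count = 4 * s_count                  # lines that may be analyzed by the agent
    return s_count, p_count


print(rate_limits(update_interval=30))                  # (600, 2400)
print(rate_limits(update_interval=1, maxlines=1000))    # (1000, 4000)
```

With the default maxlines of 20 and a 30-second update interval, at most 600 lines are sent and at most 2400 lines are analyzed per check.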
Proposed modification with "maxdelay" parameter
- If maxdelay > 0 is specified, then in each check collect additional data:
- number_of_processed_bytes from log files,
- t_proc, "wall-clock" time spent on processing.
The t_proc should include all time between successive checks of the item.
One part of t_proc - time spent by agent on reading the log files and analyzing their lines - will become known during the current check.
The other part of t_proc - time spent on sending results and metadata to server and time used for checking other items - will become available only during the next successive check.
- At the beginning of the next check calculate:
- processing speed (bytes per second) as
v = number_of_processed_bytes / t_proc
- current delay (seconds) as
t_del = number_of_remaining_bytes / v
- If 0 < t_del <= maxdelay, the delay is acceptable; proceed with analyzing the log file from the current position.
- If t_del > maxdelay, ignore lines by "jumping" bytes_to_jump bytes ahead:
bytes_to_jump = number_of_remaining_bytes * (t_del - maxdelay) / t_del
After the jump, the remaining bytes correspond to an estimated delay of about maxdelay seconds.
Most likely we will "land" somewhere in the middle of a line. Search for the end of that line and do not analyze the partial line.
Note that ignored lines are not even read into a buffer; only the approximate position to jump to in the file is calculated.
The fact that log file lines were skipped should be logged in the agent log file at LOG_LEVEL_WARNING level.
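The calculation above can be sketched in Python as follows. This is a minimal illustration, not the agent's implementation: the function names are hypothetical, and returning 0 when no speed estimate is available (nothing processed yet) is an assumption that the specification does not cover.

```python
import io


def bytes_to_jump(processed_bytes, t_proc, remaining_bytes, maxdelay):
    """Return how many bytes to skip; 0 means no jump is needed."""
    if maxdelay <= 0 or processed_bytes <= 0 or t_proc <= 0:
        return 0  # assumption: feature disabled or no speed estimate yet
    v = processed_bytes / t_proc        # processing speed, bytes per second
    t_del = remaining_bytes / v         # estimated delay, seconds
    if t_del <= maxdelay:
        return 0                        # delay is acceptable
    return int(remaining_bytes * (t_del - maxdelay) / t_del)


def jump_and_realign(f, jump):
    """Seek `jump` bytes ahead, then discard the rest of the line landed in.

    Simplification: if the jump lands exactly at the start of a line,
    that whole line is discarded as well.
    """
    f.seek(jump, io.SEEK_CUR)
    f.readline()                        # partial line: skipped, not analyzed
    return f.tell()


# 1000 bytes processed in 10 s gives v = 100 B/s; 6000 remaining bytes
# mean a 60 s delay. With maxdelay = 30 s, half of the backlog is skipped.
print(bytes_to_jump(1000, 10, 6000, 30))   # 3000
```

The skipped lines are never read: only the file position changes, and a single readline call is used to find the start of the next complete line.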
For items dealing with log rotation (e.g. logrt), number_of_processed_bytes, number_of_remaining_bytes and t_proc should be calculated over all log files selected for the current check. "Jumping" over ignored lines may result in "landing" in another log file.
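A jump that spans several rotated files could be resolved as in the following sketch. The function locate_jump, the file names, and the oldest-first ordering of the list are assumptions made for illustration; the list is assumed to be non-empty.

```python
def locate_jump(remaining_per_file, jump):
    """Map a jump of `jump` bytes onto log files ordered oldest first.

    remaining_per_file -- non-empty list of (file_name, remaining_bytes)
    Returns (file_name, bytes_to_skip_within_that_file).
    """
    for name, remaining in remaining_per_file:
        if jump < remaining:
            return name, jump
        jump -= remaining               # the whole file is skipped
    # The jump covers everything: land at the end of the newest file.
    name, remaining = remaining_per_file[-1]
    return name, remaining


files = [("app.log.1", 2000), ("app.log", 4000)]   # hypothetical file names
print(locate_jump(files, 3000))                    # ('app.log', 1000)
```

Here a 3000-byte jump skips the 2000 unread bytes of the older file entirely and lands 1000 bytes into the unread part of the newer one.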
Changes in item parameters
Add a new parameter "maxdelay" to log and logrt items.
"maxdelay" parameter is the maximum acceptable delay in seconds.
- 0 - (default) standard behaviour, never ignore records.
- > 0 - older lines may be ignored if necessary to keep up with the most recent lines in a fast-growing log file.
The new parameter is optional.
New functionality: counting matching lines in log files
In some cases the user is interested not in every matching log file line being sent to Zabbix, but in the number of matching lines.
Add two new items, log.count and logrt.count, to implement "count" functionality for log files.
Both items take a maxproclines parameter: the maximum number of lines per second to be analyzed by the agent.
maxproclines value: min = 1, default = 80, max = 4000.
If the maxproclines parameter is not specified, the default value is set to 4 * MaxLinesPerSecond (parameter in the agent configuration file).
The value of log.count and logrt.count will be the number of matching lines for the configured update interval of the item.
While log and logrt send data only when matching lines have been detected, the new items log.count and logrt.count will send data on every update interval, even if the log file has not changed ("0" in this case).
Like log and logrt, the new items log.count and logrt.count will not advance the current position in the log file in case of an error when sending the result to the server, until communication is restored, unless maxdelay > 0 is specified. Note that this can affect log.count and logrt.count results: for example, one check finds 100 matching lines in a log file, but due to a communication problem this result cannot be sent to the server. In the next check the agent counts the same 100 matching lines and also 70 new matching lines, and communication is restored. The agent now sends count = 170, as if all of them were found in one check.
If maxdelay > 0 is specified, a "jump" over log file lines takes place, and the result of a log.count or logrt.count check cannot be sent to the server, the position after the "jump" is kept and the result is discarded.
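The accumulation effect without maxdelay can be reproduced with a toy model. The class CountItem below is hypothetical and only mimics the position handling described above; it does not model the maxdelay "jump" case.

```python
class CountItem:
    """Toy model of a count item without maxdelay (not the agent's code)."""

    def __init__(self):
        self.position = 0               # index of the first line not yet reported

    def check(self, matches, send_ok):
        """matches: one boolean per line currently in the log file.

        Returns the count delivered to the server, or None if sending failed.
        """
        count = sum(matches[self.position:])
        if not send_ok:
            return None                 # position is not advanced
        self.position = len(matches)
        return count


item = CountItem()
log = [True] * 100                      # 100 matching lines
print(item.check(log, send_ok=False))   # None: result could not be sent
log += [True] * 70                      # 70 new matching lines
print(item.check(log, send_ok=True))    # 170
```

Because the position is kept after the failed send, the second check recounts the first 100 lines and reports them together with the 70 new ones.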
Like log, the new item log.count will send NOTSUPPORTED if the log file does not exist or is not accessible.
Like logrt, the new item logrt.count will NOT send NOTSUPPORTED if the log file(s) do not exist or are not accessible, as this could be a result of rotation.
Modify items log and logrt by adding a new parameter "maxdelay".
Add two new items, log.count and logrt.count, as described above.
log.count: Number of matching lines since the last check of the item. Returns integer.
logrt.count: Number of matching lines since the last check of the item with log rotation support. Returns integer.
Documentation
- What's new section
- Item types, Zabbix agent - describe the new "maxdelay" parameter for log and logrt items, describe the new items log.count and logrt.count.
- Log file monitoring - describe how the "maxdelay" parameter and skipping of lines work.