
Docs/specs/ZBXNEXT-300


IPMI discrete sensors support

ZBXNEXT-300

Status: 1.0

Owner: Andris Mednis

Summary

Currently Zabbix IPMI sensor support is limited to threshold (analog) sensors. Support for IPMI discrete sensors should be added.

Specification

  • Zabbix server should be able to read states of discrete sensors and store them in a database.
  • Trigger expressions should be able to test bits of discrete sensors states.

Details

  • An IPMI discrete sensor state can consist of up to 15 bits. The meaning of the bits for a particular sensor is determined by its "Event/Reading Type Code" and (for some sensors) by its "Sensor Type Code". Some bits can be meaningless for a particular sensor. Zabbix server will read all meaningful bits, assign 0 to unreadable (meaningless) bits and store the sensor state in the database as an integer.
  • For testing bits in trigger expressions, a new function band() will be added:
 FUNCTION      Parameter(s)                Supported value types
 -----------------------------------------------------------------
 band          1st - the 1st parameter     int
                     of function last(),
               2nd - mask (mandatory),     64-bit unsigned int (0 - 18446744073709551615)
               3rd - the 2nd parameter
                     of function last().
 Returns the value of bitwise AND of item value and the mask.
  • Example of a trigger expression:

Sensor "Power Unit Stat" has "Event/Reading Type Code" 0x6f and "Sensor Type Code" 0x9. Table 42-1 of the IPMI 2.0 specification [IPMI Specs] shows that code 0x6f denotes a sensor-specific discrete sensor whose states are decoded according to Table 42-3. Table 42-3 shows that for sensor type 0x9, offset 00h means "PowerOff/Power Down". In other words, if the least significant bit is 1, the server is powered off, so mask 1 should be used to test this bit. The trigger expression could be

 {www.zabbix.com:Power Unit Stat.band(#1,1)}>0

The expression will be TRUE if the server is powered down.
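
The mask arithmetic behind this expression can be sketched in Python. This is only an illustration of the bitwise AND that band() is specified to return, not Zabbix code; the Python function name band, the constant name and the sample sensor states are assumptions made for the example.

```python
def band(value: int, mask: int) -> int:
    """Return the bitwise AND of an item value and a mask."""
    return value & mask


# Offset 00h of sensor type 0x9 ("PowerOff/Power Down") is the least
# significant bit, so the mask is 1.
POWER_OFF_MASK = 1

# Hypothetical stored sensor states (not taken from a real device).
for state in (0b0000, 0b0001, 0b0101):
    powered_down = band(state, POWER_OFF_MASK) > 0
    print(f"state={state:#06b} powered_down={powered_down}")
```

Only states with the least significant bit set satisfy the ">0" comparison, regardless of which other bits are set.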

  • Function count() will be enhanced by adding "band" to supported operators.

When used with the "band" operator, the second parameter of function count() is mandatory and can take one of two forms:

  1. Number_to_compare_with/Mask
    For example, to count values having "1011" (in binary) in the 4 least significant bits, use Number_to_compare_with = 11 (in decimal) and Mask = 15 (in decimal).
    Calculated item example: count("Power_Unit_Stat",600,11/15,band)
  2. Mask
    If Number_to_compare_with and Mask are equal, only Mask can be specified.
    For example, to count values having "11" (in binary) in the 2 least significant bits, use Mask = 3 (in decimal).
    Calculated item example: count("Power_Unit_Stat",600,3,band)

The Number_to_compare_with/Mask syntax works in count(); it is not necessary in trigger expressions, where the result of band() can be compared directly. For example, to activate a trigger when a value has "1011" (in binary) in the 4 least significant bits, use Mask = 15 (in decimal) and compare the result to 11: {test_db:Power_Unit_Stat.band(0,15)}=11
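
The two forms of the count() pattern can be illustrated with a small Python sketch. It assumes, based on the examples above, that a value matches when (value AND Mask) equals Number_to_compare_with; the function names and the sample values are hypothetical and do not come from Zabbix.

```python
from typing import Iterable, Tuple


def parse_band_pattern(pattern: str) -> Tuple[int, int]:
    """Split 'Number_to_compare_with/Mask' or 'Mask' into (number, mask)."""
    if "/" in pattern:
        number_text, mask_text = pattern.split("/", 1)
        return int(number_text), int(mask_text)
    mask = int(pattern)
    return mask, mask  # short form: number and mask are equal


def count_band(values: Iterable[int], pattern: str) -> int:
    """Count values whose masked bits equal the number to compare with."""
    number, mask = parse_band_pattern(pattern)
    return sum(1 for value in values if (value & mask) == number)


# Hypothetical history of sensor states.
history = [11, 27, 3, 15, 43, 8]

print(count_band(history, "11/15"))  # values with 1011 in the 4 lowest bits
print(count_band(history, "3"))      # values with 11 in the 2 lowest bits
```

With this sample history, 11, 27 and 43 match the pattern 11/15, and every value except 8 matches the pattern 3.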

  • It is up to users and template developers to correctly understand discrete sensor readings and write trigger expressions. Zabbix documentation will help with examples and explanations of how to work with discrete sensors.
  • To make sensors easier to understand, Zabbix server will write details about the sensors discovered on a host to the server log if DebugLevel=4 and at least one IPMI sensor item is configured for the host. For example:
$ grep "Added sensor" zabbix_server.log
8358:20130318:111122.170 Added sensor: host:'192.168.1.12:623' id_type:0 id_sz:7 id:'CATERR' reading_type:0x3 ('discrete_state') type:0x7 ('processor') full_name:'(r0.32.3.0).CATERR'
8358:20130318:111122.170 Added sensor: host:'192.168.1.12:623' id_type:0 id_sz:15 id:'CPU Therm Trip' reading_type:0x3 ('discrete_state') type:0x1 ('temperature') full_name:'(7.1).CPU Therm Trip'
8358:20130318:111122.171 Added sensor: host:'192.168.1.12:623' id_type:0 id_sz:17 id:'System Event Log' reading_type:0x6f ('sensor specific') type:0x10 ('event_logging_disabled') full_name:'(7.1).System Event Log'
8358:20130318:111122.171 Added sensor: host:'192.168.1.12:623' id_type:0 id_sz:17 id:'PhysicalSecurity' reading_type:0x6f ('sensor specific') type:0x5 ('physical_security') full_name:'(23.1).PhysicalSecurity'
8358:20130318:111122.171 Added sensor: host:'192.168.1.12:623' id_type:0 id_sz:14 id:'IPMI Watchdog' reading_type:0x6f ('sensor specific') type:0x23 ('watchdog_2') full_name:'(7.7).IPMI Watchdog'
8358:20130318:111122.171 Added sensor: host:'192.168.1.12:623' id_type:0 id_sz:16 id:'Power Unit Stat' reading_type:0x6f ('sensor specific') type:0x9 ('power_unit') full_name:'(21.1).Power Unit Stat'
8358:20130318:111122.171 Added sensor: host:'192.168.1.12:623' id_type:0 id_sz:16 id:'P1 Therm Ctrl %' reading_type:0x1 ('threshold') type:0x1 ('temperature') full_name:'(3.1).P1 Therm Ctrl %'
8358:20130318:111122.172 Added sensor: host:'192.168.1.12:623' id_type:0 id_sz:16 id:'P1 Therm Margin' reading_type:0x1 ('threshold') type:0x1 ('temperature') full_name:'(3.2).P1 Therm Margin'
8358:20130318:111122.172 Added sensor: host:'192.168.1.12:623' id_type:0 id_sz:13 id:'System Fan 2' reading_type:0x1 ('threshold') type:0x4 ('fan') full_name:'(29.1).System Fan 2'
8358:20130318:111122.172 Added sensor: host:'192.168.1.12:623' id_type:0 id_sz:13 id:'System Fan 3' reading_type:0x1 ('threshold') type:0x4 ('fan') full_name:'(29.1).System Fan 3'
8358:20130318:111122.172 Added sensor: host:'192.168.1.12:623' id_type:0 id_sz:14 id:'P1 Mem Margin' reading_type:0x1 ('threshold') type:0x1 ('temperature') full_name:'(7.6).P1 Mem Margin'
8358:20130318:111122.172 Added sensor: host:'192.168.1.12:623' id_type:0 id_sz:17 id:'Front Panel Temp' reading_type:0x1 ('threshold') type:0x1 ('temperature') full_name:'(7.6).Front Panel Temp'
8358:20130318:111122.173 Added sensor: host:'192.168.1.12:623' id_type:0 id_sz:15 id:'Baseboard Temp' reading_type:0x1 ('threshold') type:0x1 ('temperature') full_name:'(7.6).Baseboard Temp'
8358:20130318:111122.173 Added sensor: host:'192.168.1.12:623' id_type:0 id_sz:9 id:'BB +5.0V' reading_type:0x1 ('threshold') type:0x2 ('voltage') full_name:'(7.1).BB +5.0V'
8358:20130318:111122.173 Added sensor: host:'192.168.1.12:623' id_type:0 id_sz:14 id:'BB +3.3V STBY' reading_type:0x1 ('threshold') type:0x2 ('voltage') full_name:'(7.1).BB +3.3V STBY'
8358:20130318:111122.173 Added sensor: host:'192.168.1.12:623' id_type:0 id_sz:9 id:'BB +3.3V' reading_type:0x1 ('threshold') type:0x2 ('voltage') full_name:'(7.1).BB +3.3V'
8358:20130318:111122.173 Added sensor: host:'192.168.1.12:623' id_type:0 id_sz:17 id:'BB +1.5V P1 DDR3' reading_type:0x1 ('threshold') type:0x2 ('voltage') full_name:'(7.1).BB +1.5V P1 DDR3'
8358:20130318:111122.173 Added sensor: host:'192.168.1.12:623' id_type:0 id_sz:17 id:'BB +1.1V P1 Vccp' reading_type:0x1 ('threshold') type:0x2 ('voltage') full_name:'(7.1).BB +1.1V P1 Vccp'
8358:20130318:111122.174 Added sensor: host:'192.168.1.12:623' id_type:0 id_sz:14 id:'BB +1.05V PCH' reading_type:0x1 ('threshold') type:0x2 ('voltage') full_name:'(7.1).BB +1.05V PCH'
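
To pick the discrete sensors out of such a log, the lines could be parsed with a short Python script. This is a sketch that assumes the line layout shown in the sample above and treats every reading type other than 0x1 ('threshold') as discrete; the regular expression and the function names are illustrative and not part of Zabbix.

```python
import re
from typing import Dict, Optional

# Pattern written against the sample log lines; it is an assumption that
# other installations produce the same layout.
SENSOR_RE = re.compile(
    r"Added sensor: host:'(?P<host>[^']*)' .*?"
    r"id:'(?P<id>[^']*)' "
    r"reading_type:(?P<reading_type>0x[0-9a-fA-F]+) \('(?P<reading_name>[^']*)'\) "
    r"type:(?P<type>0x[0-9a-fA-F]+) \('(?P<type_name>[^']*)'\)"
)


def parse_sensor_line(line: str) -> Optional[Dict[str, str]]:
    """Return the sensor fields of an 'Added sensor' line, or None."""
    match = SENSOR_RE.search(line)
    return match.groupdict() if match else None


def is_discrete(sensor: Dict[str, str]) -> bool:
    """Reading type 0x1 is 'threshold'; anything else is treated as discrete."""
    return int(sensor["reading_type"], 16) != 0x1


SAMPLE_LINES = [
    "8358:20130318:111122.171 Added sensor: host:'192.168.1.12:623' id_type:0 "
    "id_sz:16 id:'Power Unit Stat' reading_type:0x6f ('sensor specific') "
    "type:0x9 ('power_unit') full_name:'(21.1).Power Unit Stat'",
    "8358:20130318:111122.172 Added sensor: host:'192.168.1.12:623' id_type:0 "
    "id_sz:13 id:'System Fan 2' reading_type:0x1 ('threshold') "
    "type:0x4 ('fan') full_name:'(29.1).System Fan 2'",
]

for line in SAMPLE_LINES:
    sensor = parse_sensor_line(line)
    if sensor and is_discrete(sensor):
        print(sensor["id"], sensor["reading_type"], sensor["type"])
```

The reading_type and type values reported for each discrete sensor are the codes needed to look up the bit meanings in the IPMI specification tables.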

Changes in frontend

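  • Frontend should accept function band() in trigger expressions. Currently it rejects an expression such as

 {test_db:Power_Unit_Stat.band(#1,4)}=1

complaining:

 "Cannot implode expression "{test_db:Power_Unit_Stat.band(#1,4)}=1". Incorrect trigger function "band" provided in expression. Unknown function."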
  • Modify frontend's trigger expression constructor to support function band() with menu items:
   'Bitwise AND of last (most recent) T value and mask is = N'
   'Bitwise AND of last (most recent) T value and mask is NOT N'

and items to enter:

    Last of (T)
    Mask         <--- Mandatory field for band.
    Time shift
    N
  • Modify frontend's trigger expression constructor to support operator band in count() function:

If one of the "Number of successfully retrieved values V (which fulfill operator O) for period T ...." menu items is selected, allow entering band as operator O. Parameter V in this case should accept values like:

    uint64/uint64 (e.g. 12345678901/9876554422) <--- Number_to_compare_with/Mask
    uint64 (e.g. 12345678901) <--- Mask.

It would be good to show some explanation for V: "Number_to_compare_with/Mask" or "Mask".
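
The accepted shapes of V could be checked along the following lines. This Python sketch only illustrates the rule stated above (one or two unsigned 64-bit decimal numbers, separated by a slash); it is not the frontend's actual validation code, and the names used are hypothetical.

```python
import re

UINT64_MAX = 2**64 - 1  # 18446744073709551615

# Either "Mask" or "Number_to_compare_with/Mask", decimal digits only.
V_PATTERN = re.compile(r"([0-9]+)(?:/([0-9]+))?")


def is_valid_band_value(text: str) -> bool:
    """Check that text is 'uint64' or 'uint64/uint64'."""
    match = V_PATTERN.fullmatch(text)
    if match is None:
        return False
    parts = [part for part in match.groups() if part is not None]
    return all(int(part) <= UINT64_MAX for part in parts)


for candidate in ("12345678901/9876554422", "12345678901", "1/", "-1"):
    print(candidate, is_valid_band_value(candidate))
```

The first two candidates are the examples given above and pass the check; the last two are malformed and are rejected.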

  • Frontend should allow using function band() in calculated items, e.g. band("Power_Unit_Stat",#1,1), and operator band in function count() in calculated items, e.g. count("Power_Unit_Stat",#10,1/7,band). Apparently this already works without modification.

Documentation

  • Add description of function band() to "Supported trigger functions".
  • Update description of function count() with band operator and bitmask.
  • Add examples and explanation of IPMI discrete sensors to "IPMI checks".

Open issues

  • There are also IPMI chassis status parameters, for example:
    • Cooling/fan fault detected,
    • Drive Fault,
    • Chassis intrusion active,
    • "Power-On Hours" counter.

(For a complete list of parameters see Table 28-3 in [IPMI Specs])

Chassis parameters support is not in scope of this work.

  • Support of event-only sensors is not in scope of this work.

References

[IPMI Specs] IPMI Intelligent Platform Management Interface Specification, v2.0 at http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/second-gen-interface-spec-v2.pdf

ChangeLog

  • N/A