
Zabbix IT services – an often underrated feature

From Zabbix.org

When I first looked at Zabbix IT services, I underestimated its potential.
(No, I hadn’t read the online documentation carefully at that time, which would certainly have enlightened me sooner.)


I considered Zabbix IT services to be just a hierarchical correlation of Zabbix trigger events. So I tried to map a couple of services and their applications according to their dependencies, considered potential redundancies and finally added all related Zabbix triggers for CPU, file systems, processes, logs, etc. I ended up with a nice hierarchy of monitored services and applications, but it was a lot of manual work. Even worse, new Zabbix triggers may appear later and have to be added to Zabbix IT services manually. In the end it didn’t even provide a significant informational benefit.


I quickly dismissed Zabbix IT services as useless for larger scopes and decided not to spend further time on it. Later on I realized that I had completely misunderstood its use case. To me it is in fact a key feature of Zabbix now.


Zabbix IT services is about service availability and not about everything that might affect a service.


Zabbix encourages you to identify all kinds of conditions. This leads to a lot of Zabbix triggers that tell you the status quo in many respects. There might be Zabbix triggers in PROBLEM state that do not impact a service, multiple Zabbix triggers that are the cause of a single impacted service, and Zabbix triggers that are not related to any service at all. Even though one can make Zabbix triggers depend on each other, they normally don’t let you identify quickly which customers or services are impacted. Sure, a Zabbix trigger name may indicate this, but there may well be many Zabbix triggers in PROBLEM state. In the end there’s no clear view, and one might need to do some filtering to get the complete picture, if that is possible at all.


That’s where Zabbix IT services kicks in.


When thinking of services provided to internal or external customers, it’s actually about business services. For business services one does not actually care about CPU, file systems, processes, log files or any application issues as long as the service operates correctly. The only thing that matters is: does the offered service work or not? To check a business service one just has to implement a few checks that perform typical transactions against the very same interface a customer accesses. In fact, as few checks as possible, as long as they are meaningful. By doing this one can create a Zabbix IT services hierarchy that instantly shows which business service is impacted and which customers are affected by it, irrespective of the root cause.
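
To illustrate what such a transaction check could look like, here is a minimal sketch of a script that could be hooked up as a Zabbix external check, whose printed output becomes the item value. It is only an assumption of how one might do it: the function name `check_transaction`, the URL and the expected text are hypothetical, and a real check would perform whatever transaction is typical for the service in question.

```python
import sys
import urllib.request


def _http_get(url: str, timeout: float) -> str:
    """Fetch the URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def check_transaction(url: str, expected: str, timeout: float = 10.0,
                      fetch=_http_get) -> int:
    """Return 1 if the customer-facing interface answers as expected, else 0.

    `fetch` is replaceable so the logic can be tried without a network.
    """
    try:
        body = fetch(url, timeout)
    except (OSError, ValueError):
        # Network errors, HTTP errors, timeouts and malformed URLs all
        # mean the same thing to the customer: the service does not work.
        return 0
    return 1 if expected in body else 0


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: foo_check.py <URL> <EXPECTED_TEXT>
    print(check_transaction(sys.argv[1], sys.argv[2]))
```

A Zabbix trigger on the resulting item would then fire when the value is 0. Note that the check deliberately says nothing about why the service fails. That is what all the other Zabbix triggers are for.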


The following schema is a very simplified representation of the model I’ve chosen for myself:

IT Service
|
|-Customer impact
| |
| |-External customers
| | |
| | |-ACME Corp.
| | | |
| | | |-Foo
| | | | |
| | | | |-*SOFT DEPENDENCY* Service impact :: Foo :: ACME Corp.
| | | |
| | | |-Bar
| | |   |
| | |   |-*SOFT DEPENDENCY* Service impact :: Bar :: ACME Corp.
| | |
| | |-Snakeoil Ltd.
| | | |
| | | |-Foo
| | | | |
| | | | |-*SOFT DEPENDENCY* Service impact :: Foo :: Snakeoil Ltd.
| | | |
| | | |-Baz
| | |   |
| | |   |-*SOFT DEPENDENCY* Service impact :: Baz :: Snakeoil Ltd.
| | |
| | |+Umbrella S.A.
| |
| |+In-house customers
|
|-Service impact
  |
  |-Foo
  | |
  | |-ACME Corp.
  | | |
  | | |-Transactions (logical AND)
  | | | |
  | | | |-*TRIGGER* Server 1 Foo Transactions for ACME Corp.
  | | | |
  | | | |-*TRIGGER* Server 2 Foo Transactions for ACME Corp.
  | | |
  | | |-User interface
  | |   |
  | |   |-*TRIGGER* Foo Web scenario for ACME Corp.
  | |
  | |+Snakeoil Ltd.
  | |
  | |+Umbrella S.A.
  |
  |+Bar
  |
  |-Baz
    |
    |-Snakeoil Ltd.
      |
      |-API
      | |
      | |-*TRIGGER* Baz external check for Snakeoil Ltd.
      |
      |-User interface
        |
        |-*TRIGGER* Baz Web scenario for Snakeoil Ltd.


This way one can quickly get an overview of business service problems, either in a service-oriented context with the option to drill down via customers to service components, or in a customer-oriented context with the option to drill down to the related services.
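
How the status propagates through such a hierarchy can be sketched in a few lines of Python. This is just a toy model of a part of the schema above and not the Zabbix API. The class `Service`, the labels "any" and "all" and the helper `impacted` are names I made up for illustration: "any" stands for "problem if at least one child has a problem", "all" for the "logical AND" case, and a soft dependency is modeled by linking the very same node under a second parent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Service:
    name: str
    # "any": problem if at least one child has a problem (default here)
    # "all": problem only if all children have problems ("logical AND")
    algorithm: str = "any"
    children: List["Service"] = field(default_factory=list)
    # Only set on leaves that stand for a trigger: True means PROBLEM.
    trigger_problem: Optional[bool] = None

    def has_problem(self) -> bool:
        if self.trigger_problem is not None:
            return self.trigger_problem
        states = [child.has_problem() for child in self.children]
        if not states:
            return False
        return all(states) if self.algorithm == "all" else any(states)


def impacted(node: Service, path: str = "") -> List[str]:
    """List the paths of all non-trigger nodes currently in problem state."""
    here = f"{path} > {node.name}" if path else node.name
    if node.trigger_problem is not None or not node.has_problem():
        return []
    found = [here]
    for child in node.children:
        found.extend(impacted(child, here))
    return found


# Leaves: one per trigger from the schema.
server1 = Service("Server 1 Foo Transactions for ACME Corp.", trigger_problem=False)
server2 = Service("Server 2 Foo Transactions for ACME Corp.", trigger_problem=False)
web = Service("Foo Web scenario for ACME Corp.", trigger_problem=False)

foo_acme = Service("ACME Corp.", children=[
    Service("Transactions", algorithm="all", children=[server1, server2]),
    Service("User interface", children=[web]),
])
service_impact = Service("Service impact", children=[Service("Foo", children=[foo_acme])])

# Soft dependency: the customer branch links to the very same node object.
customer_impact = Service("Customer impact", children=[
    Service("External customers", children=[
        Service("ACME Corp.", children=[Service("Foo", children=[foo_acme])]),
    ]),
])

server1.trigger_problem = True        # one of two redundant servers fails
print(customer_impact.has_problem())  # False: the other server still covers it
server2.trigger_problem = True        # now both servers fail
print(impacted(customer_impact))      # paths from the customer down to "Transactions"
```

Because both branches share one node, a problem shows up in the service-oriented and in the customer-oriented view at the same time, without any trigger being linked twice.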


Whenever I was asked “What do you need to set up proper monitoring for my service in Zabbix?”, I used to request only information about the involved hosts, application processes, logs etc. in order to create appropriate Zabbix triggers. Now that I understand Zabbix IT services better, I have changed my questions to a more business-process-oriented perspective:


  1. What are the IT services (business processes) offered to or used by customer(s)?
    This could be for instance a Web User Interface, an API or any kind of interface that is either used by a customer or within a customer’s business process. IT services are likely subject to a Service Level Agreement.
  2. What are the applications (or other IT services) each IT service is built from?
    This could be web applications, for instance based on Java Servlets or JavaServer Pages, Enterprise JavaBeans or any other application that sends/receives/uses business process related data.
  3. What are the relations or dependencies between such applications (or IT services)?
    This could mean the sequence of applications or services involved in (part of) a business process as well as any interconnection like data buses between applications.
  4. What are the physical or virtual platforms the applications are deployed on?
    This could be any kind of server or container (Tomcat, JBoss, JVM, OS, Hypervisors, …) as well as physical components, devices or appliances.
    Such platforms may also build on each other and may therefore look like applications too. But here an application always represents or implements business logic and processing.
  5. What are the potential relations or dependencies between platforms?
    This could for instance mean a JBoss AS running in a JVM running in an OpenVZ container hosted on an OpenVZ hardware node installed on a physical server.
  6. What are the auxiliary IT services, applications and platforms?
    This could be a database, remote storage, file transfer service, scheduling system, etc.
  7. What are the expected business process functions and data flows, considering the components and their relations identified previously?


As one can see, the order is top-down from a business process perspective instead of being bottom-up or inventory-driven in the sense of “take everything that is involved and see what one could monitor there”.


Doing it this way doesn’t necessarily produce a complete picture, cover all measurements and thresholds, or provide a cookbook. The questions also cannot always be considered completely separately. Still, this way one makes sure to focus first on the respective business service and all its related aspects. In the end one should be able to obtain all the information necessary to create proper monitoring, including Zabbix maps and Zabbix IT services. So in fact: what are we actually going to monitor, what do we have to focus on, and why?


Nevertheless, the bottom-up approach is still valid and should not be neglected. Only both approaches together make for complete monitoring.


Cheers!