See www.zabbix.com for the official Zabbix site.

Higher performant partitioning in PostgreSQL

From Zabbix.org
Jump to: navigation, search

Summary

This procedure of partitioning is aimed to be:

  • most efficient on table row inserts
  • as short and/but simple as possible
  • predictable

What this procedure is not aimed to be:

  • an ultimate partitioning procedure/framework (take a look at PG Partition Manager for that)
  • feature-rich
  • foolproof

There are lots of nice procedures out there that partition Zabbix history and trends tables in PostgreSQL [1] [2] [3] [4] [5] [6]. Most of them make use of EXTRACT likely in connection with date conversion within an insert trigger function. That way the integer based clock column is interpreted for assembling a proper insert statement.
Both tasks (interpret and assemble) are done for any row that is going to be inserted. This procedure is about avoiding this.

Note: Some procedures make use of partition maintenance/creation within the insert trigger function as well. That is a very bad practice as it could lead to a race condition and should be avoided [7] - especially in larger Zabbix setups.

The idea behind this procedure is basically to partition by the clock column as is. There will always be exactly three partitions considered for row inserts in a fixed order:

  1. Current partition
  2. Next partition
  3. Previous partition

Because of this fact partitioning by lower ranges than delayed data may be inserted should be prevented.
E.g, setting a partition range significant lower than ProxyOfflineBuffer or lower than the time data may reside in Zabbix history cache before being written to database.

The insert trigger functions is completely static. No interpretation or dynamic statement assembling.

Note: Rows that are not matched by conditions of the trigger function resp. don't belong into one of the three partitions, are inserted into the master table like it would be the case without having a trigger function.

Example of weekly partitioning:

BEGIN
  IF TG_OP = 'INSERT' THEN 
    IF NEW.clock >= 1398297600 AND NEW.clock < 1398902400 THEN 
      INSERT INTO history_p1398297600 VALUES (NEW.*); 
    ELSIF NEW.clock >= 1398902400 AND NEW.clock < 1399507200 THEN 
      INSERT INTO history_p1398902400 VALUES (NEW.*);
    ELSIF NEW.clock >= 1397692800 AND NEW.clock < 1398297600 THEN 
      INSERT INTO history_p1397692800 VALUES (NEW.*);
    ELSE
      RETURN NEW;
    END IF;
  END IF; 
  RETURN NULL; 
END

Beside taking care to have the next partition available in time, the insert trigger function needs to be updated regulary:

  • at the earliest after a partition switch to assure current data is matched by first condition statement
  • at the latest before the next partition switch to assure next partition range will be part of the condition statements

For this task there are two main functions: partition_maintenance() and zabbix_partition_maintenance()

The function zabbix_partition_maintenance() is actually only a wrapper function which invokes partition_maintenance() for every Zabbix history and trends table.

The function partition_maintenance() does three things:

  1. Create missing partitions
  2. Update the insert trigger function
  3. Optionally remove obsolete partitions

One aim was to be predictable. By that it was meant to allow the main functions being executed in a 'dry-run' mode.
Setting 'dry-run' to TRUE lets any DDL being printed/logged instead of being executed. By the way, every DDL is stored in a variable first and is then either executed or printed/logged. This assures that things are really done the way as it is stated when having 'dry-run' enabled.

Usage

Prerequisites

  • Most of all, being familiar with partitioning in general [8] and with partitioning in PostgreSQL in particular [9][10].
  • History and trends child tables are accessed by one database user only.
  • Empty history and trends tables, already partitioned tables or manual preparation/migration of data
  • and probably more...

Setup

  • Create partitions and trigger functions with desired partitioning range:
SELECT zabbix_partition_maintenance('1 week');
  • Create insert trigger for every history and trends table as follows:
Note: The name of the procedure to execute (trigger function) is made up of <master_table> + 'part_trig_func()'
CREATE TRIGGER history_part_trig      BEFORE INSERT ON history      FOR EACH ROW EXECUTE PROCEDURE history_part_trig_func();
CREATE TRIGGER history_log_part_trig  BEFORE INSERT ON history_log  FOR EACH ROW EXECUTE PROCEDURE history_log_part_trig_func();
CREATE TRIGGER history_str_part_trig  BEFORE INSERT ON history_str  FOR EACH ROW EXECUTE PROCEDURE history_str_part_trig_func();
CREATE TRIGGER history_text_part_trig BEFORE INSERT ON history_text FOR EACH ROW EXECUTE PROCEDURE history_text_part_trig_func();
CREATE TRIGGER history_uint_part_trig BEFORE INSERT ON history_uint FOR EACH ROW EXECUTE PROCEDURE history_uint_part_trig_func();
CREATE TRIGGER trends_part_trig       BEFORE INSERT ON trends       FOR EACH ROW EXECUTE PROCEDURE trends_part_trig_func();
CREATE TRIGGER trends_uint_part_trig  BEFORE INSERT ON trends_uint  FOR EACH ROW EXECUTE PROCEDURE trends_uint_part_trig_func();
  • Run zabbix_partition_maintenance() with proper arguments on regular basis after every partition switch.
This can be achieved easily by using pgAgent, Cron or whatever scheduling system is preferred.

Functions

Main functions

Main functions are intended to be invoked directly. Either manually or automatically.

zabbix_partition_maintenance()

Invokes partition_maintenance() for every Zabbix history and trends tables.

Function arguments
Mandatory arguments
partition_range Range used for partitioning.
Should probably not be significant lower than a day.
Examples: '604800', '1 week' or '7 days'.
See EXTRACT, date_part for syntax.
Optional arguments
history_number_to_keep Number of history partitions to keep.
When used should probably not be set below 1.
Defaults to NULL what skips removing of history partitions.
trends_number_to_keep Number of trends partitions to keep.
When used should probably not be set below 1.
Defaults to NULL what skips removing of trends partitions.
number_to_drop Number of partitions to remove.
Counting starts before number_to_keep partitions in chronological descending order.
Defaults to 0 what removes no partition disregarding number_to_keep
object_owner User or role that will own created partitions and the insert trigger function.
Defaults to 'zabbix'.
Additional grants on master table are not considered.
dry_run Enables 'dry-run' mode when set to TRUE.
Defaults to FALSE.
When enabled no DDL is executed but printed/logged.
Function code
CREATE OR REPLACE FUNCTION zabbix_partition_maintenance(VARCHAR, INTEGER DEFAULT NULL, INTEGER DEFAULT NULL, INTEGER DEFAULT 0, VARCHAR DEFAULT 'zabbix', BOOLEAN DEFAULT FALSE)
RETURNS void AS
$BODY$
DECLARE
  partition_range        VARCHAR := $1;
  history_number_to_keep INTEGER := $2;
  trends_number_to_keep  INTEGER := $3;
  number_to_drop         INTEGER := $4;
  object_owner           VARCHAR := $5;
  dry_run                BOOLEAN := $6;
BEGIN
  PERFORM partition_maintenance('history',      partition_range, object_owner, history_number_to_keep, number_to_drop, dry_run);
  PERFORM partition_maintenance('history_log',  partition_range, object_owner, history_number_to_keep, number_to_drop, dry_run);
  PERFORM partition_maintenance('history_str',  partition_range, object_owner, history_number_to_keep, number_to_drop, dry_run);
  PERFORM partition_maintenance('history_text', partition_range, object_owner, history_number_to_keep, number_to_drop, dry_run);
  PERFORM partition_maintenance('history_uint', partition_range, object_owner, history_number_to_keep, number_to_drop, dry_run);
  
  PERFORM partition_maintenance('trends',      partition_range, object_owner, trends_number_to_keep, number_to_drop, dry_run);
  PERFORM partition_maintenance('trends_uint', partition_range, object_owner, trends_number_to_keep, number_to_drop, dry_run);
END
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

ALTER FUNCTION zabbix_partition_maintenance(VARCHAR, INTEGER, INTEGER, INTEGER, VARCHAR, BOOLEAN)
OWNER TO zabbix;
Example

In the following example zabbix_partition_maintenance() is called to:

  • create missing partitions (previous, current, and next partition) according to weekly partitioning
  • update trigger functions according to do weekly partitioning
  • keep 10 history partitions
  • keep all trends partitions
  • remove the very next obsolete partitions only
  • finally do nothing at all but show which DDL statements would be executed
zabbix=# \d+ history
                                 Table "public.history"
 Column |     Type      |        Modifiers        | Storage | Stats target | Description
--------+---------------+-------------------------+---------+--------------+-------------
 itemid | bigint        | not null                | plain   |              |
 clock  | integer       | not null default 0      | plain   |              |
 value  | numeric(16,4) | not null default 0.0000 | main    |              |
 ns     | integer       | not null default 0      | plain   |              |
Indexes:
    "history_1" btree (itemid, clock)
Triggers:
    history_part_trig BEFORE INSERT ON history FOR EACH ROW EXECUTE PROCEDURE history_part_trig_func()
Child tables: history_p1393459200,
              history_p1394064000,
              history_p1394668800,
              history_p1395273600,
              history_p1395878400,
              history_p1396483200,
              history_p1397088000,
              history_p1397692800,
              history_p1398297600,
              history_p1398902400,
              history_p1399507200,
              history_p1400112000
Has OIDs: no

zabbix=# \set VERBOSITY 'terse'
zabbix=# SELECT zabbix_partition_maintenance('1 week', 10, NULL, 1, 'zabbix', TRUE);
NOTICE:  Child table 'history_p1398902400' already exists. Skipping.
NOTICE:  Child table 'history_p1399507200' already exists. Skipping.
NOTICE:  Child table 'history_p1400112000' already exists. Skipping.
INFO:  CREATE OR REPLACE FUNCTION history_part_trig_func()
       RETURNS trigger AS
       $$
       BEGIN
         IF TG_OP = 'INSERT' THEN
           IF NEW.clock >= 1399507200 AND NEW.clock < 1400112000 THEN
             INSERT INTO history_p1399507200 VALUES (NEW.*);
           ELSIF NEW.clock >= 1400112000 AND NEW.clock < 1400716800 THEN
             INSERT INTO history_p1400112000 VALUES (NEW.*);
           ELSIF NEW.clock >= 1398902400 AND NEW.clock < 1399507200 THEN
             INSERT INTO history_p1398902400 VALUES (NEW.*);
           ELSE
             RETURN NEW;
           END IF;
         END IF;
         RETURN NULL;
       END $$
       LANGUAGE plpgsql VOLATILE
       COST 100
INFO:  ALTER FUNCTION history_part_trig_func() OWNER TO zabbix;
INFO:  DROP TABLE history_p1393459200

--- SNIP --- SNAP --- OUTPUT CUT --- SNIP --- SNAP ---

partition_maintenance()

Creates partitions for previous, current and next partition range of a parent table and updates its insert trigger function.
Optionally removes obsolete partitions.

Function arguments
Mandatory arguments
master_table History or trends table to consider for partitioning.
partition_range Range used for partitioning.
Should probably not be significant lower than a day.
Examples: '604800', '1 week' or '7 days'.
See EXTRACT, date_part for syntax.
object_owner User or role that will own created partitions and the insert trigger function.
Additional grants on master table or trigger function are not considered.
Optional arguments
number_to_keep Number of partitions to keep.
When used should probably not be set below 1.
Defaults to NULL what skips removing of partitions.
number_to_drop Number of partitions to remove.
Counting starts before number_to_keep in chronological descending order.
Defaults to 1 what leads to remove the first obsolete partition only.
When set to 0 no partition is removed disregarding number_to_keep
dry_run Enables 'dry-run' mode when set to TRUE.
Defaults to FALSE.
When enabled no DDL is executed but printed/logged.
Function code
CREATE OR REPLACE FUNCTION partition_maintenance(VARCHAR, VARCHAR, VARCHAR, INTEGER DEFAULT NULL, INTEGER DEFAULT 1, BOOLEAN DEFAULT FALSE)
RETURNS void AS
$BODY$
DECLARE
  master_table    VARCHAR := $1;
  partition_range VARCHAR := $2;
  object_owner    VARCHAR := $3;
  number_to_keep  INTEGER := $4;
  number_to_drop  INTEGER := $5;
  dry_run         BOOLEAN := $6;
  range_interval  INTEGER := EXTRACT(EPOCH FROM partition_range::INTERVAL);
  now             INTEGER := EXTRACT(EPOCH FROM NOW())::INT;
 
  current_range     INTEGER := now / range_interval * range_interval;  
  current_partition VARCHAR := FORMAT('%s_p%s', master_table, current_range);
  current_check_min INTEGER := current_range;
  current_check_max INTEGER := current_check_min + range_interval;
 
  previous_range     INTEGER := current_range - range_interval;
  previous_partition VARCHAR := FORMAT('%s_p%s', master_table, previous_range);
  previous_check_min INTEGER := previous_range;
  previous_check_max INTEGER := previous_check_min + range_interval;
 
  next_range     INTEGER := current_range + range_interval;
  next_partition VARCHAR := FORMAT('%s_p%s', master_table, next_range);
  next_check_min INTEGER := next_range;
  next_check_max INTEGER := next_check_min + range_interval;
 
  function_name VARCHAR := FORMAT('%s_part_trig_func()', master_table);
 
  create_function VARCHAR := FORMAT(
    'CREATE OR REPLACE FUNCTION %s
         RETURNS trigger AS
       $$ 
       BEGIN
         IF TG_OP = ''INSERT'' THEN 
           IF NEW.clock >= %s AND NEW.clock < %s THEN 
             INSERT INTO %I VALUES (NEW.*); 
           ELSIF NEW.clock >= %s AND NEW.clock < %s THEN 
             INSERT INTO %I VALUES (NEW.*);
           ELSIF NEW.clock >= %s AND NEW.clock < %s THEN 
             INSERT INTO %I VALUES (NEW.*);
           ELSE
             RETURN NEW;
           END IF;
         END IF; 
         RETURN NULL; 
       END $$
       LANGUAGE plpgsql VOLATILE
       COST 100',
  function_name,
  current_check_min,
  current_check_max,
  current_partition,
  next_check_min,
  next_check_max,
  next_partition,
  previous_check_min,
  previous_check_max,
  previous_partition);
 
  alter_function VARCHAR := FORMAT('ALTER FUNCTION %s OWNER TO %s;', function_name, object_owner);
BEGIN
  IF NOT EXISTS (SELECT 1 FROM pg_user WHERE usename = object_owner) THEN
    RAISE EXCEPTION 'User ''%'' does not exist.', object_owner;
  ELSIF NOT EXISTS (SELECT 1 FROM pg_tables WHERE tablename = master_table) THEN
    RAISE EXCEPTION 'Master table ''%'' does not exist.', master_table;
  ELSE
    PERFORM add_partition(master_table, previous_partition, previous_check_min, previous_check_max, object_owner, dry_run);
    PERFORM add_partition(master_table, current_partition,  current_check_min,  current_check_max,  object_owner, dry_run);
    PERFORM add_partition(master_table, next_partition,     next_check_min,     next_check_max,     object_owner, dry_run);
 
    IF dry_run = FALSE THEN
      EXECUTE create_function;
      EXECUTE alter_function;
    ELSE
      RAISE INFO '%', create_function;
      RAISE INFO '%', alter_function;
    END IF;
 
    IF number_to_keep IS NOT NULL THEN
      PERFORM remove_partitions(master_table, current_range, range_interval, number_to_keep, number_to_drop, dry_run);
    END IF;
  END IF;
END
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

ALTER FUNCTION partition_maintenance(VARCHAR, VARCHAR, VARCHAR, INTEGER, INTEGER, BOOLEAN)
OWNER TO zabbix;

Helper functions

Helper functions are needed by main functions but could be invoked directly as well.

add_partition()

Adds a single child table (partition)

Function arguments
Mandatory arguments
master_table History or trends table to consider for partitioning.
child_table Table that inherits columns from master table
check_min clock values greater or equal than this are allowed to be inserted
check_max clock values lower than this are allowed to be inserted
object_owner User or role that will own the created partition.
Additional grants on master table are not considered.
Optional arguments
dry_run Enables 'dry-run' mode when set to TRUE.
Defaults to FALSE.
When enabled no DDL is executed but printed/logged.
Function code
CREATE OR REPLACE FUNCTION add_partition(VARCHAR, VARCHAR, INTEGER, INTEGER, VARCHAR, BOOLEAN DEFAULT FALSE)
RETURNS void AS
$BODY$
DECLARE
  master_table VARCHAR := $1;
  child_table  VARCHAR := $2;
  check_min    INTEGER := $3;
  check_max    INTEGER := $4;
  table_owner  VARCHAR := $5;
  dry_run      BOOLEAN := $6;
  
  create_table VARCHAR := FORMAT(
    'CREATE TABLE %I (
         CHECK (clock >= %s AND clock < %s),
         LIKE %I
         INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES INCLUDING STORAGE INCLUDING COMMENTS)',
    child_table,
    check_min,
    check_max,
    master_table);
    
  alter_table_owner   VARCHAR := FORMAT('ALTER TABLE %I OWNER TO %I', child_table, table_owner);
  alter_table_inherit VARCHAR := FORMAT('ALTER TABLE %I INHERIT %I', child_table, master_table);
BEGIN
  IF EXISTS (SELECT 1 FROM pg_tables WHERE tablename = child_table ) THEN
    RAISE NOTICE 'Child table ''%'' already exists. Skipping.', child_table;
  ELSE
    IF dry_run = false THEN
      EXECUTE create_table;
      EXECUTE alter_table_owner;
      EXECUTE alter_table_inherit;
    ELSE
      RAISE INFO '%', create_table;
      RAISE INFO '%', alter_table_owner;
      RAISE INFO '%', alter_table_inherit;
    END IF;
  END IF;
END
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

ALTER FUNCTION add_partition(VARCHAR, VARCHAR, INTEGER, INTEGER, VARCHAR, BOOLEAN)
OWNER TO zabbix;

remove_partitions()

Removes up to number_to_drop partitions before number_to_keep partitions.
Note: Removing partitions will stop early when a table does not exist.
Function arguments
Mandatory arguments
master_table History or trends table to remove partitions from.
current_interval Partition range to consider as current.
See evaluation of now variable in #partition_maintenance()
range_interval Range in seconds used for partitioning.
Should correspond to range_interval used for partition creation.
number_to_keep Number of partitions to keep.
When used should probably not be set below 1.
number_to_drop Number of partitions to remove.
Counting starts before number_to_keep partitions in chronological descending order.
When set to 0 no partition is removed disregarding number_to_keep
Optional arguments
dry_run Enables 'dry-run' mode when set to TRUE.
Defaults to FALSE.
When enabled no DDL is executed but printed/logged.
Function code
CREATE OR REPLACE FUNCTION remove_partitions(VARCHAR, INTEGER, INTEGER, INTEGER, INTEGER, BOOLEAN DEFAULT FALSE)
RETURNS void AS
$BODY$
DECLARE
  master_table   VARCHAR := $1;
  current_range  INTEGER := $2;
  range_interval INTEGER := $3;
  number_to_keep INTEGER := $4;
  number_to_drop INTEGER := $5;
  dry_run        BOOLEAN := $6;
  max_range      INTEGER := current_range - range_interval * number_to_keep;
  min_range      INTEGER := max_range - range_interval * (number_to_drop - 1);
  child_table    VARCHAR;
  drop_table     VARCHAR;
BEGIN
  FOR range_to_drop IN REVERSE max_range .. min_range BY range_interval LOOP
    child_table := FORMAT('%s_p%s', master_table, range_to_drop);

    IF EXISTS (SELECT 1 FROM pg_tables WHERE tablename = child_table) THEN
      drop_table := FORMAT('DROP TABLE %I', child_table);
      
      IF dry_run = false THEN
        EXECUTE drop_table;
      ELSE
        RAISE INFO '%', drop_table;
      END IF;
    ELSE
      RAISE INFO 'Child table ''%'' does not exist. Stopping.', child_table;
      EXIT;
    END IF;
  END LOOP;
END
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

ALTER FUNCTION remove_partitions(VARCHAR, INTEGER, INTEGER, INTEGER, INTEGER, BOOLEAN)
OWNER TO zabbix;

Benchmark


Note: Not yet confirmed to be correct

  • Single inserts consist of 10000 rows inserted by dedicated insert statements.
  • Bulk inserts consist of one insert statement for 10000 rows.
  • The table has the format of the history table in Zabbix.
  • Inserted data has sequential increasing clock values and a mix of values for the other columns.
  • Resulting values are the average from six runs in a row for every trigger<->insert variation.

Since runtime includes forking of psql and connecting to the database, values make only sense in relation to each other:

-bash-4.1$ time /usr/bin/psql -q -f history.sql mydb

real    0m1.475s
user    0m0.105s
sys     0m0.095s

Variation A: Without insert trigger but with check constraint on table

Variation B:

EXECUTE 'INSERT INTO public.' || TG_TABLE_NAME || '_w' || TO_CHAR(TO_TIMESTAMP(NEW.clock),'WW') || ' VALUES ' || NEW.*

Variation C:

EXECUTE 'INSERT INTO public.' || TG_TABLE_NAME || '_w' || TO_CHAR(TO_TIMESTAMP(NEW.clock),'WW') || ' SELECT ($1).*' USING NEW

Variation D:

IF NEW.clock >= 1399507200 AND NEW.clock < 1400112000 THEN INSERT INTO history_p1399507200 VALUES (NEW.*)

Variation E: Same as D but for 2nd conditional block resp. next partition.

Variation F: Same as D but for 3rd conditional block resp. for previous partition.

Results
Variation Single inserts Bulk insert
A ~1.160s ~0.200s
B ~2.218s ~0.950s
C ~2.345s ~1.0425
D ~1.489s ~0.532s
E ~1.522s ~0.519s
F ~1.508s ~0.530s

Appendix

Cron example

Simple example for Cron based maintenance.

Maintenance script maintainZbxDbPartitions.sh

FIXME: missing leading tabulators for here-document lines

#!/bin/bash

# . ~/lib/pgIsInRecovery.func
pgIsInRecovery () {
  local IN_RECOVERY=$(psql -t -c 'SELECT pg_is_in_recovery()')
  case ${IN_RECOVERY// /} in
  ( t ) return 0 ;;
  ( f ) return 1 ;;
  esac
}

# . ~/lib/printUsage.func
printUsage () {
  cat <<- EOT
        Usage: ${0} runme

        EOT
}

printSql () {
  cat <<- SQL
        \set VERBOSITY 'terse'

        SELECT zabbix_partition_maintenance('1 week', 11, NULL, 1);

        \quit
        SQL
}

main () {
  case $1 in
  ( runme )
    if ! pgIsInRecovery
    then
      printSql | /usr/bin/psql zabbix
    else
      echo "PostgreSQL cluster is in recovery. Exiting."
    fi
    ;;
  ( * )
    printUsage
    return 1
    ;;
  esac
}

main "$@"
exit $?

Crontab

Example crontab:

SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/var/lib/pgsql

# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * command to be executed

  7  3  *  *  4 ${HOME}/bin/maintainZbxDbPartitions.sh runme &> /var/log/pgsql/maintainZbxDbPartitions.log
# 7 10  *  *  4 ${HOME}/bin/backupZbxDbPartitions.sh runme   &> /var/log/pgsql/backupZbxDbPartitions.log
#23 18  *  *  * ${HOME}/bin/backupZbxDb.sh runme             &> /var/log/pgsql/backupZbxDb.log

References

  1. Docs/howto/postgresql_partitioning_by_range
  2. Docs/howto/zabbix2_postgresql_autopartitioning
  3. Docs/howto/zabbix2_postgresql_partitioning
  4. Partition tables by range, Zabbix Wiki
  5. Partitioning in PostgreSQL, ZABBIX Forums
  6. Another way for PostgreSQL partitioning, ZABBIX Forums
  7. Partition type 'id-static' without automatic partition creation #19, GitHub keithf4/pg_partman issues
  8. Partition (database), Wikipedia
  9. Partitioning, PostgreSQL Documentation
  10. Table partitioning, PostgreSQL Wiki