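Basics of zabbix events and triggers
Zabbix 3.4 is used as the example.
Each zabbix trigger is identified by a unique trigger id. When a trigger's condition is met, zabbix generates an event. For example, "CPU utilization above 90% for 5 consecutive minutes" is a condition from which a trigger can be defined. In zabbix terminology, the CPU utilization data is an item (a monitored metric). Zabbix monitors a large number of items, such as CPU, disk and network utilization, ping status, web service availability, and so on.
A trigger has exactly two states: "Ok" means normal, and "Problem" means something is wrong because the defined threshold has been exceeded. Whenever the trigger's state changes, an event occurs. A trigger that has entered the Problem state is a zabbix problem.
The response to an event is called an action. An action consists of an operation and its result, for example sending an email notification.
When a monitored item satisfies the trigger condition, a trigger event is generated. When the event later recovers, the value field in the events table is not updated; instead, a record is created in the event_recovery table. In event_recovery, each problem event is paired with its recovery event.
The zabbix API call problem.get returns the currently unresolved alerts, that is, the triggers that are in the Problem state and have not recovered.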
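The rule that an event is produced whenever a trigger changes state can be sketched in a few lines of Python. This is a simplified model, not zabbix code: the function name events_from_values and the sample data are made up for illustration, and the sketch assumes exactly one event per state change. The 0/1 encoding follows the value column of the events table (0 for Ok, 1 for Problem).

```python
OK, PROBLEM = 0, 1  # same encoding as the value column of the events table


def events_from_values(samples, initial=OK):
    """Yield (clock, value) each time the trigger state changes.

    samples: iterable of (clock, value) evaluations of a single trigger.
    """
    current = initial
    for clock, value in samples:
        if value != current:
            current = value
            yield clock, value


# Hypothetical evaluations: Ok, Problem, still Problem, back to Ok.
samples = [(100, OK), (160, PROBLEM), (220, PROBLEM), (280, OK)]
events = list(events_from_values(samples))
print(events)
```

Only the evaluations at clock 160 and 280 produce events; the repeated Problem value at 220 does not, because the state did not change.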
Structure of the problem table
MariaDB [zabbix]> describe problem;
+---------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+---------------------+------+-----+---------+-------+
| eventid | bigint(20) unsigned | NO | PRI | NULL | |
| source | int(11) | NO | MUL | 0 | |
| object | int(11) | NO | | 0 | |
| objectid | bigint(20) unsigned | NO | | 0 | |
| clock | int(11) | NO | | 0 | |
| ns | int(11) | NO | | 0 | |
| r_eventid | bigint(20) unsigned | YES | MUL | NULL | |
| r_clock | int(11) | NO | MUL | 0 | |
| r_ns | int(11) | NO | | 0 | |
| correlationid | bigint(20) unsigned | YES | | NULL | |
| userid | bigint(20) unsigned | YES | | NULL | |
+---------------+---------------------+------+-----+---------+-------+
Get the current problems.
# curl -s -S -X POST -d '{ "jsonrpc": "2.0", "method": "problem.get", "params": { "output": [ "extend", "clock", "source", "object" ], "selectAcknowledges": "extend", "selectTags": "extend", "recent": "true", "sortfield": ["eventid"], "sortorder": "DESC" }, "id": 27, "auth": "25311982e31f1b0a6815489e84d82a1c" }' -H 'Content-Type: application/json-rpc' http://10.10.144.21:80/zabbix/api_jsonrpc.php
{"jsonrpc":"2.0","result":[{"eventid":"90548","clock":"1508918765","source":"0","object":"0","acknowledges":[],"tags":[]},{"eventid":"15","clock":"1508217852","source":"0","object":"0","acknowledges":[],"tags":[]}],"id":27}
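The same problem.get request can be built from a script. The sketch below uses only the Python standard library; the helper names build_problem_get and call_api are made up for illustration, and the parameters, URL and auth token simply mirror the curl call above. Only the payload is built when the script runs; call_api needs a reachable zabbix server and is not invoked here.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "http://10.10.144.21:80/zabbix/api_jsonrpc.php"


def build_problem_get(auth, request_id=1):
    """Return the JSON-RPC 2.0 request body for problem.get as a dict."""
    return {
        "jsonrpc": "2.0",
        "method": "problem.get",
        "params": {
            "output": ["extend", "clock", "source", "object"],
            "selectAcknowledges": "extend",
            "selectTags": "extend",
            "recent": "true",
            "sortfield": ["eventid"],
            "sortorder": "DESC",
        },
        "id": request_id,
        "auth": auth,
    }


def call_api(payload, url=API_URL, timeout=10):
    """POST the payload and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),  # a data argument makes this a POST
        headers={"Content-Type": "application/json-rpc"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_problem_get("25311982e31f1b0a6815489e84d82a1c", request_id=27)
body = json.dumps(payload)
print(body)
```

Letting json.dumps serialize the payload avoids the shell quoting problems that a hand-written JSON string on the command line is prone to.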
View the problems in the database.
MariaDB [zabbix]> select * from problem;
+---------+--------+--------+----------+------------+-----------+-----------+------------+-----------+-----------+--------+
| eventid | source | object | objectid | clock | ns | r_eventid | r_clock | r_ns | correlationid | userid |
+---------+--------+--------+----------+------------+-----------+-----------+------------+-----------+-----------+--------+
| 15 | 0 | 0 | 13496 | 1508217852 | 654551752 | NULL | 0 | 0 | NULL | NULL |
| 90548 | 0 | 0 | 15353 | 1508918765 | 555842555 | NULL | 0 | 0 | NULL | NULL |
| 125948 | 0 | 0 | 13468 | 1509145833 | 731969958 | 125980 | 1509164493 | 757545190 | NULL | 0 |
| 125949 | 0 | 0 | 13491 | 1509145387 | 368586554 | 125963 | 1509164439 | 306719861 | NULL | 0 |
| 125950 | 0 | 0 | 15247 | 1509145397 | 402078177 | 125964 | 1509164439 | 412572051 | NULL | 0 |
| 125951 | 0 | 0 | 15264 | 1509145378 | 343791816 | 125965 | 1509164439 | 265666274 | NULL | 0 |
| 125952 | 0 | 0 | 15298 | 1509145400 | 435134528 | 125966 | 1509164439 | 458663016 | NULL | 0 |
| 125953 | 0 | 0 | 15327 | 1509145409 | 481315206 | 125967 | 1509164439 | 493946984 | NULL | 0 |
| 125954 | 0 | 0 | 15348 | 1509145376 | 340446199 | 125968 | 1509164439 | 262846921 | NULL | 0 |
| 125955 | 0 | 0 | 13470 | 1509145835 | 756466476 | 125998 | 1509164555 | 977155157 | NULL | 0 |
| 125956 | 0 | 0 | 13560 | 1509145824 | 655417514 | 125997 | 1509164544 | 934757217 | NULL | 0 |
| 125957 | 0 | 0 | 13472 | 1509145837 | 765698913 | 125982 | 1509164497 | 774275470 | NULL | 0 |
| 125958 | 0 | 0 | 13474 | 1509145839 | 783852885 | 125984 | 1509164499 | 776906869 | NULL | 0 |
| 125959 | 0 | 0 | 13483 | 1509145848 | 53121952 | 125985 | 1509164508 | 798965064 | NULL | 0 |
| 125960 | 0 | 0 | 13484 | 1509145849 | 60008025 | 125986 | 1509164509 | 801412685 | NULL | 0 |
| 125961 | 0 | 0 | 13471 | 1509145836 | 759186160 | 125981 | 1509164496 | 772208321 | NULL | 0 |
| 125962 | 0 | 0 | 13473 | 1509146738 | 270746156 | 125983 | 1509164498 | 775775544 | NULL | 0 |
| 125969 | 0 | 0 | 13479 | 1509164439 | 526244125 | 125999 | 1509164564 | 84841234 | NULL | 0 |
| 129000 | 0 | 0 | 13498 | 1509182541 | 322822853 | 129011 | 1509182601 | 538799948 | NULL | 0 |
+---------+--------+--------+----------+------------+-----------+-----------+------------+-----------+-----------+--------+
19 rows in set (0.01 sec)
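The clock and r_clock columns are Unix timestamps, so the start time and duration of a problem can be derived from a row directly. The following is a minimal sketch; the function name problem_times is made up for illustration, and it assumes that r_clock = 0 means no recovery has been recorded, as in the rows whose r_eventid is NULL.

```python
from datetime import datetime, timedelta, timezone


def problem_times(clock, r_clock):
    """Return (start, recovery, duration); the last two are None while unresolved."""
    start = datetime.fromtimestamp(clock, tz=timezone.utc)
    if r_clock == 0:  # assumed to mean that no recovery event exists yet
        return start, None, None
    end = datetime.fromtimestamp(r_clock, tz=timezone.utc)
    return start, end, end - start


# Row 125948: resolved, r_clock is set.
start, end, duration = problem_times(1509145833, 1509164493)
print(start.isoformat(), duration)

# Row 90548: still open, r_clock is 0.
open_start, open_end, open_duration = problem_times(1508918765, 0)
print(open_start.isoformat(), open_end)
```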
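Code logic behind problems in the zabbix web interface
/usr/share/zabbix/include/classes/mvc/CRouter.php is the overall routing map of the zabbix MVC architecture.
action: the request.
control: the controller that handles the action.
view: generates the page content, such as HTML, CSV or JSON.
layout: renders the page.
For example: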
action widget.problems.view
control CControllerWidgetProblemsView
layout layout.widget
view monitoring.widget.problems.view
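The mapping from an action to its controller, layout and view can be pictured as a lookup table. The Python sketch below is only a simplified model of that idea, not the contents of CRouter.php; the names ROUTES and resolve are made up for illustration, and the single entry is the one listed above.

```python
# Simplified model of one routing entry: action -> (controller, layout, view).
ROUTES = {
    "widget.problems.view": (
        "CControllerWidgetProblemsView",
        "layout.widget",
        "monitoring.widget.problems.view",
    ),
}


def resolve(action):
    """Return the controller, layout and view names registered for an action."""
    controller, layout, view = ROUTES[action]
    return {"controller": controller, "layout": layout, "view": view}


route = resolve("widget.problems.view")
print(route["controller"])
```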
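Following this route, we can see that the problem data is fetched in CControllerWidgetProblemsView.
/usr/share/zabbix/include/classes/api/services/CProblem.php
CProblem reads its data from the problem table of the zabbix database.
Trigger and event example
Look at the trigger whose trigger id is 15353.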
MariaDB [zabbix]> select * from triggers where triggerid='15353';
+-----------+-------------+-----------------------------------+-----+--------+-------+----------+------------+----------+-------+------------+------+-------+-------+---------------+---------------------+------------------+-----------------+--------------+
| triggerid | expression | description | url | status | value | priority | lastchange | comments | error | templateid | type | state | flags | recovery_mode | recovery_expression | correlation_mode | correlation_tag | manual_close |
+-----------+-------------+-----------------------------------+-----+--------+-------+----------+------------+----------+-------+------------+------+-------+-------+---------------+---------------------+------------------+-----------------+--------------+
| 15353 | {16631}>300 | Too many processes on {HOST.NAME} | | 0 | 1 | 2 | 1508918765 | | | 10190 | 0 | 0 | 0 | 0 | | 0 | | 0 |
+-----------+-------------+-----------------------------------+-----+--------+-------+----------+------------+----------+-------+------------+------+-------+-------+---------------+---------------------+------------------+-----------------+--------------+
The trigger expression above is {16631}>300, where 16631 is a functionid.
MariaDB [zabbix]> describe functions;
+------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------------+------+-----+---------+-------+
| functionid | bigint(20) unsigned | NO | PRI | NULL | |
| itemid | bigint(20) unsigned | NO | MUL | NULL | |
| triggerid | bigint(20) unsigned | NO | MUL | NULL | |
| function | varchar(12) | NO | | | |
| parameter | varchar(255) | NO | | 0 | |
+------------+---------------------+------+-----+---------+-------+
MariaDB [zabbix]> select * from functions where functionid='16631';
+------------+--------+-----------+----------+-----------+
| functionid | itemid | triggerid | function | parameter |
+------------+--------+-----------+----------+-----------+
| 16631 | 28621 | 15353 | avg | 5m |
+------------+--------+-----------+----------+-----------+
MariaDB [zabbix]> select itemid,hostid,name,key_,description from items where itemid='28621';
+--------+--------+---------------------+------------+-----------------------------------------+
| itemid | hostid | name | key_ | description |
+--------+--------+---------------------+------------+-----------------------------------------+
| 28621 | 10259 | Number of processes | proc.num[] | Total number of processes in any state. |
+--------+--------+---------------------+------------+-----------------------------------------+
hostid 10259 corresponds to host gb21.
A triggerid is unique; for example, trigger 15353 is "Too many processes on {HOST.NAME}". This trigger is applied to host gb21.
# ps -ef|wc -l
336
336 is greater than 300, so the trigger condition is met.
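The lookups above (expression, then functions, then items, then host) can be chained to turn the stored expression back into a readable form. The sketch below hard-codes the rows shown in the queries; the names FUNCTIONS, ITEMS and expand_expression are made up for illustration, and the output uses the {host:key.function(parameter)} notation of zabbix 3.4 trigger expressions.

```python
import re

# Rows copied from the functions and items queries above.
FUNCTIONS = {16631: {"itemid": 28621, "function": "avg", "parameter": "5m"}}
ITEMS = {28621: {"key_": "proc.num[]", "host": "gb21"}}


def expand_expression(expression):
    """Replace each {functionid} with {host:key.function(parameter)}."""

    def substitute(match):
        func = FUNCTIONS[int(match.group(1))]
        item = ITEMS[func["itemid"]]
        return "{%s:%s.%s(%s)}" % (
            item["host"], item["key_"], func["function"], func["parameter"])

    return re.sub(r"\{(\d+)\}", substitute, expression)


expanded = expand_expression("{16631}>300")
print(expanded)
```

The expanded form shows that the trigger compares the 5-minute average of proc.num[] on gb21 against 300, whereas the ps count above is a single sample.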
Below is a resolved alert.
MariaDB [zabbix]> describe events;
+--------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------------------+------+-----+---------+-------+
| eventid | bigint(20) unsigned | NO | PRI | NULL | |
| source | int(11) | NO | MUL | 0 | |
| object | int(11) | NO | | 0 | |
| objectid | bigint(20) unsigned | NO | | 0 | |
| clock | int(11) | NO | | 0 | |
| value | int(11) | NO | | 0 | |
| acknowledged | int(11) | NO | | 0 | |
| ns | int(11) | NO | | 0 | |
+--------------+---------------------+------+-----+---------+-------+
8 rows in set (0.00 sec)
MariaDB [zabbix]> select * from events where eventid='90504';
+---------+--------+--------+----------+------------+-------+--------------+-----------+
| eventid | source | object | objectid | clock | value | acknowledged | ns |
+---------+--------+--------+----------+------------+-------+--------------+-----------+
| 90504 | 0 | 0 | 15353 | 1508766001 | 1 | 0 | 989854339 |
+---------+--------+--------+----------+------------+-------+--------------+-----------+
Below is an unresolved alert; in other words, the trigger is still in the Problem state.
MariaDB [zabbix]> select * from events where eventid='90548';
+---------+--------+--------+----------+------------+-------+--------------+-----------+
| eventid | source | object | objectid | clock | value | acknowledged | ns |
+---------+--------+--------+----------+------------+-------+--------------+-----------+
| 90548 | 0 | 0 | 15353 | 1508918765 | 1 | 0 | 555842555 |
+---------+--------+--------+----------+------------+-------+--------------+-----------+
MariaDB [zabbix]> describe triggers;
+---------------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+---------------------+------+-----+---------+-------+
| triggerid | bigint(20) unsigned | NO | PRI | NULL | |
| expression | varchar(2048) | NO | | | |
| description | varchar(255) | NO | | | |
| url | varchar(255) | NO | | | |
| status | int(11) | NO | MUL | 0 | |
| value | int(11) | NO | MUL | 0 | |
| priority | int(11) | NO | | 0 | |
| lastchange | int(11) | NO | | 0 | |
| comments | text | NO | | NULL | |
| error | varchar(2048) | NO | | | |
| templateid | bigint(20) unsigned | YES | MUL | NULL | |
| type | int(11) | NO | | 0 | |
| state | int(11) | NO | | 0 | |
| flags | int(11) | NO | | 0 | |
| recovery_mode | int(11) | NO | | 0 | |
| recovery_expression | varchar(2048) | NO | | | |
| correlation_mode | int(11) | NO | | 0 | |
| correlation_tag | varchar(255) | NO | | | |
| manual_close | int(11) | NO | | 0 | |
+---------------------+---------------------+------+-----+---------+-------+
MariaDB [zabbix]> select count(*) from events where objectid='15353';
+----------+
| count(*) |
+----------+
| 847 |
+----------+
When the trigger condition is met, zabbix generates a trigger event.
MariaDB [zabbix]> select count(*) from events where object='0' and value='1' and objectid='15353';
+----------+
| count(*) |
+----------+
| 424 |
+----------+
Does that mean there are 424 unresolved alerts? Not at all; the event_recovery table must also be consulted.
MariaDB [zabbix]> describe event_recovery;
+---------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+---------------------+------+-----+---------+-------+
| eventid | bigint(20) unsigned | NO | PRI | NULL | |
| r_eventid | bigint(20) unsigned | NO | MUL | NULL | |
| c_eventid | bigint(20) unsigned | YES | MUL | NULL | |
| correlationid | bigint(20) unsigned | YES | | NULL | |
| userid | bigint(20) unsigned | YES | | NULL | |
+---------------+---------------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
MariaDB [zabbix]> select * from events where object='0' and value='1' and objectid='15353';
+---------+--------+--------+----------+------------+-------+--------------+-----------+
| eventid | source | object | objectid | clock | value | acknowledged | ns |
+---------+--------+--------+----------+------------+-------+--------------+-----------+
......
| 90504 | 0 | 0 | 15353 | 1508766001 | 1 | 0 | 989854339 |
| 90548 | 0 | 0 | 15353 | 1508918765 | 1 | 0 | 555842555 |
+---------+--------+--------+----------+------------+-------+--------------+-----------+
424 rows in set (0.01 sec)
Take events 90504 and 90548 as examples.
MariaDB [zabbix]> select * from event_recovery where eventid='90548';
Empty set (0.00 sec)
Event 90504 has been resolved.
MariaDB [zabbix]> select * from event_recovery where eventid='90504';
+---------+-----------+-----------+---------------+--------+
| eventid | r_eventid | c_eventid | correlationid | userid |
+---------+-----------+-----------+---------------+--------+
| 90504 | 90522 | NULL | NULL | NULL |
+---------+-----------+-----------+---------------+--------+
1 row in set (0.00 sec)
MariaDB [zabbix]> select * from events where eventid='90522';
+---------+--------+--------+----------+------------+-------+--------------+-----------+
| eventid | source | object | objectid | clock | value | acknowledged | ns |
+---------+--------+--------+----------+------------+-------+--------------+-----------+
| 90522 | 0 | 0 | 15353 | 1508766301 | 0 | 0 | 524849764 |
+---------+--------+--------+----------+------------+-------+--------------+-----------+
1 row in set (0.00 sec)
Event 90548 has not been resolved.
MariaDB [zabbix]> select * from events where eventid='90548';
+---------+--------+--------+----------+------------+-------+--------------+-----------+
| eventid | source | object | objectid | clock | value | acknowledged | ns |
+---------+--------+--------+----------+------------+-------+--------------+-----------+
| 90548 | 0 | 0 | 15353 | 1508918765 | 1 | 0 | 555842555 |
+---------+--------+--------+----------+------------+-------+--------------+-----------+
1 row in set (0.00 sec)
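The two manual lookups can be combined into one query: a problem event is unresolved when it has no matching row in event_recovery. The sketch below reproduces this with an in-memory SQLite database so that it runs without a zabbix server. The tables are reduced to the columns needed, the three events rows and one event_recovery row are the ones shown above, and the LEFT JOIN itself is an illustration rather than the query zabbix runs internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (eventid INTEGER PRIMARY KEY, source INTEGER, object INTEGER,
                     objectid INTEGER, clock INTEGER, value INTEGER);
CREATE TABLE event_recovery (eventid INTEGER PRIMARY KEY, r_eventid INTEGER);
INSERT INTO events VALUES (90504, 0, 0, 15353, 1508766001, 1);
INSERT INTO events VALUES (90522, 0, 0, 15353, 1508766301, 0);
INSERT INTO events VALUES (90548, 0, 0, 15353, 1508918765, 1);
INSERT INTO event_recovery VALUES (90504, 90522);
""")

# Problem events (value = 1) of one trigger that have no recovery record.
UNRESOLVED = """
SELECT e.eventid
FROM events e
LEFT JOIN event_recovery r ON r.eventid = e.eventid
WHERE e.source = 0 AND e.object = 0 AND e.value = 1
  AND e.objectid = ? AND r.eventid IS NULL
ORDER BY e.eventid
"""

unresolved = [row[0] for row in conn.execute(UNRESOLVED, (15353,))]
print(unresolved)
```

With the sample rows, only event 90548 is returned: 90504 is paired with recovery event 90522, and 90522 itself has value 0, so both are filtered out.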