Zabbix discovery template for monitoring disk RAID
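It is usually hard to know the current state of a hard disk, and the check is typically left to data-center staff on inspection rounds. Can it be done in software instead? MegaCli can do it: the "Media Error Count" and "Other Error Count" values it reports are generally used to judge whether a disk in the array has a problem.
Media Error Count indicates possible errors on the disk medium, such as bad sectors. Any non-zero value deserves attention, and the larger the value, the higher the risk. Other Error Count indicates that the disk may be loose and may need to be reseated.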
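As a rough illustration of how these two counters are read, here is a minimal sketch that scans `-PDList` style text and reports every slot with a non-zero counter. The function name `flag_bad_disks` is hypothetical, and the sketch assumes the field labels appear at the start of a line in the form "Slot Number: N", "Media Error Count: N" and "Other Error Count: N".

```shell
#!/bin/bash
# Sketch: list slots whose error counters are non-zero.
# Reads MegaCli -PDList style text on stdin. The label layout
# ("Label: value" at the start of a line) is an assumption.
flag_bad_disks() {
    awk -F': *' '
        /^Slot Number/       { slot = $2 }
        /^Media Error Count/ { if ($2 + 0 > 0) print "slot " slot ": media errors=" $2 }
        /^Other Error Count/ { if ($2 + 0 > 0) print "slot " slot ": other errors=" $2 }
    '
}

# Usage on a host with a controller (binary path taken from this article):
# sudo /usr/local/MegaCli/MegaCli64 -PDList -aALL -NoLog | flag_bad_disks
```

A slot that prints nothing has both counters at zero.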
For detailed MegaCLI documentation, see http://www.ttlsa.com/tools/megacli-tool-query-raid-status/
Discovery script
#!/bin/bash
# raid_id_discover.sh
# wuhf
# Prints Zabbix low-level discovery JSON with one {#RAID_ID} entry per disk slot.

RAID_stats() {
    # Slot numbers of all physical disks on all adapters
    DISK=($(sudo /usr/local/MegaCli/MegaCli64 -pdlist -aALL | grep "Slot Number" | awk -F":" '{print $2}'))
    local num=0
    local last=$((${#DISK[@]} - 1))

    printf '{\n\t"data":[\n'
    for key in "${DISK[@]}"; do
        if [[ "$num" -lt "$last" ]]; then
            # Every entry except the last one ends with a comma
            printf '\t\t{"{#RAID_ID}":"%s"},\n' "$key"
        else
            printf '\t\t{"{#RAID_ID}":"%s"}\n' "$key"
        fi
        num=$((num + 1))
    done
    printf '\t]\n}\n'
}

RAID_stats
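To check the JSON shape that the discovery rule expects without a RAID controller at hand, the same output can be produced from captured text. The following is only a sketch: `slots_to_lld` is a hypothetical helper, not part of the original script, and it assumes its input contains lines of the form "Slot Number: N".

```shell
#!/bin/bash
# Sketch: build the {#RAID_ID} discovery JSON from "Slot Number: N"
# lines on stdin, so the format can be inspected offline.
slots_to_lld() {
    awk -F': *' '
        BEGIN { n = 0 }
        /Slot Number/ { slots[n++] = $2 }
        END {
            printf "{\n\t\"data\":[\n"
            for (i = 0; i < n; i++)
                printf "\t\t{\"{#RAID_ID}\":\"%s\"}%s\n", slots[i], (i < n - 1 ? "," : "")
            printf "\t]\n}\n"
        }
    '
}

# Example: printf 'Slot Number: 0\nSlot Number: 1\n' | slots_to_lld
```

Each `{#RAID_ID}` value is what Zabbix substitutes into the item prototypes, for example as the parameter of `raid_MEC[*]`.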
Key settings
# raid.conf
UserParameter=raid_discover,bash /usr/local/zabbix/libexec/raid_id_discover.sh
UserParameter=raid_degraded,sudo /usr/local/MegaCli/MegaCli64 -AdpAllInfo -aALL -NoLog | grep "Degraded" | awk '{print $NF}'
UserParameter=raid_failed_disks,sudo /usr/local/MegaCli/MegaCli64 -AdpAllInfo -aALL -NoLog | grep "Failed Disks" | awk '{print $NF}'
# grep -w keeps slot 1 from also matching slots 10, 11, ...
UserParameter=raid_MEC[*],sudo /usr/local/MegaCli/MegaCli64 -PDList -aAll -NoLog | grep -w -A 8 "Slot Number: $1" | grep "Media Error Count" | awk '{print $NF}'
UserParameter=raid_OEC[*],sudo /usr/local/MegaCli/MegaCli64 -PDList -aAll -NoLog | grep -w -A 8 "Slot Number: $1" | grep "Other Error Count" | awk '{print $NF}'
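The per-slot keys depend on two details: the pattern `Slot Number: 1` can also match slots 10 to 19 unless it is matched as a whole word, and the counter has to fall within the 8 lines after the slot line. A possible alternative, shown here only as a sketch, is to compare whole fields in awk. `slot_counter` is a hypothetical helper, and the "Label: value" layout is an assumption about the `-PDList` output.

```shell
#!/bin/bash
# Sketch: print one counter for one slot from -PDList style text on stdin.
# $1 = slot number, $2 = counter label, e.g. "Media Error Count".
slot_counter() {
    awk -F': *' -v slot="$1" -v label="$2" '
        BEGIN { current = -1 }
        $1 == "Slot Number" { current = $2 + 0 }          # remember the slot we are in
        $1 == label && current == slot + 0 { print $2 + 0; exit }
    '
}

# Usage on a host with a controller (binary path taken from this article):
# sudo /usr/local/MegaCli/MegaCli64 -PDList -aAll -NoLog | slot_counter 1 "Media Error Count"
```

If a helper like this is used, it is easier to keep it in a script file and call that script from the UserParameter line, because inside a `[*]` key Zabbix replaces `$1` to `$9` itself and an awk field such as `$2` would have to be written as `$$2`.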
Permission settings
chmod 755 /usr/local/zabbix/libexec/raid_id_discover.sh
chown zabbix:zabbix /usr/local/zabbix/libexec/raid_id_discover.sh
chown zabbix:zabbix /usr/local/zabbix/etc/zabbix_agentd.conf.d/raid.conf
echo "zabbix ALL=(root) NOPASSWD:ALL" >> /etc/sudoers
sed -i 's/^Defaults.*.requiretty/#Defaults requiretty/' /etc/sudoers
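Granting `NOPASSWD:ALL` lets the zabbix user run any command as root. A narrower setup is possible, sketched below under some assumptions: that sudo on the host reads drop-in files from /etc/sudoers.d, and that only MegaCli64 needs root. The helper name `write_megacli_sudoers` and the drop-in file name are hypothetical; the MegaCli path is the one used in this article.

```shell
#!/bin/bash
# Sketch: write a sudoers drop-in that limits the zabbix user to MegaCli64
# and disables requiretty for that user only.
# $1 = target file (assumed to be a file under /etc/sudoers.d on a real host).
write_megacli_sudoers() {
    local target="$1"
    {
        echo 'Defaults:zabbix !requiretty'
        echo 'zabbix ALL=(root) NOPASSWD: /usr/local/MegaCli/MegaCli64'
    } > "$target"
    chmod 0440 "$target"
}

# As root, then validate the syntax before relying on it:
# write_megacli_sudoers /etc/sudoers.d/zabbix-megacli
# visudo -cf /etc/sudoers.d/zabbix-megacli
```

With this rule the `sudo /usr/local/MegaCli/MegaCli64 ...` calls in raid.conf and in the discovery script keep working, while other commands are not permitted through sudo.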
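Importing the template
Notes:
To understand the template, first get familiar with the MegaCLI commands; many tutorials can be found through Baidu.
The template I provide runs on zabbix-3.0 and may not be compatible with lower versions. Once you understand what the keys mean, you can build your own template.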
Reposted from: https://blog.51cto.com/wuhf2015/1737763