Using Python to detect whether another program has hung
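1. Scenario:
A large number of Windows machines run a Java program that performs some task, and the program hangs for no apparent reason. While it is hung, system resource usage looks normal and the program shows no error message at all; it simply stops. The hang is usually discovered only after the business has been affected and someone goes back to investigate. The alternative is to send people to check the machines on a regular schedule.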
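2. Idea:
Once the program hangs, its log also stops growing. So we can periodically compare the timestamp of the latest log entry with the system time, treat a gap larger than a configured threshold as a failure, and restart the program immediately. (Sigh. There are plenty of programs out there with all sorts of problems and nobody maintaining them, so a band-aid like this is all we can do.)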
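The core of the idea is a single comparison, sketched below in isolation. The function name `is_stalled` and the sample times are hypothetical and chosen only for illustration; the 300-second threshold matches the `timeout` used in the script in the next section.

```python
from datetime import datetime


def is_stalled(last_log_time, now, timeout_seconds):
    """Return True when the newest log entry is older than the threshold."""
    return (now - last_log_time).total_seconds() > timeout_seconds


# Hypothetical sample times, for illustration only.
now = datetime(2020, 1, 1, 12, 10, 0)
print(is_stalled(datetime(2020, 1, 1, 12, 0, 0), now, 300))  # 600 s of silence -> True
print(is_stalled(datetime(2020, 1, 1, 12, 8, 0), now, 300))  # 120 s of silence -> False
```

Passing `now` in as an argument, rather than reading the clock inside the function, keeps the check easy to test.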
3. Implementation:
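# -*- coding: utf-8 -*-
import os
import time
import datetime
import subprocess

from dateutil.parser import parse  # third-party package: python-dateutil


def error1(fun):
    # Decorator: print any exception instead of letting it end the monitoring loop.
    def error2(doc):
        try:
            fun(doc)
        except Exception as reason:
            print(reason)
    return error2


def read_log(log_file):
    # Return the timestamp of the newest log line that starts with one.
    with open(log_file, 'rb') as f:
        lines = f.readlines()
    # Walk backwards from the last line (tag = -1, -2, ...) and stop at the top
    # of the file, so a log without any timestamp cannot loop forever.
    for tag in range(-1, -len(lines) - 1, -1):
        try:
            last_line = str(lines[tag], encoding='gb2312')
            print('Line %s contains:\n%s' % (tag, last_line))
            log_time = last_line.split(",")[0]
            print('Timestamp extracted from the line: %s' % (log_time))
            time.strptime(log_time, "%Y-%m-%d %H:%M:%S")  # raises ValueError on a bad format
            print('Timestamp format is valid!')
            return log_time
        except Exception as reason:
            print(reason)
    raise ValueError('No timestamp found in %s' % (log_file))


def contrast_time(log_time):
    # Return the number of seconds between the log timestamp and the system clock.
    system_time = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    print('Current system time: %s' % (system_time))
    a = parse(log_time)
    b = parse(system_time)
    interval = (b - a).total_seconds()
    print("Time gap: %s seconds" % (interval))
    return interval


def restart_script(interval, timeout, proc_name, cmd_file):
    # Record the restart, kill the hung process, then launch it again.
    current_time = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    with open('log.txt', 'a+') as f:
        print('%s restart performed' % (current_time), file=f)
    # Note: /IM matches by image name, so every java.exe and every cmd.exe
    # on the machine is terminated, not only the monitored program.
    os.system('taskkill /IM %s /F' % (proc_name))
    os.system('taskkill /IM cmd.exe /F')
    time.sleep(1)
    subprocess.Popen("%s" % (cmd_file))


@error1
def main(intervals):
    # log_file, timeout, proc_name and cmd_file are module-level settings defined below.
    try:
        print('Checking file: %s' % (log_file))
        log_time = read_log(log_file)
        interval = contrast_time(log_time)
        if interval > timeout:
            print("Timeout detected, restarting the program....")
            restart_script(interval, timeout, proc_name, cmd_file)
        else:
            print("Time gap is within the allowed range")
    finally:
        # Sleep even when the check fails, so an error cannot turn into a busy loop.
        print('==============================================')
        time.sleep(intervals)


if __name__ == '__main__':
    intervals = 300   # seconds between checks
    timeout = 300     # maximum allowed log silence, in seconds
    proc_name = 'java.exe'
    log_file = 'C:\\mailboxcode\\log\\log_'
    cmd_file = 'C:\\check_log\\start_run.bat'
    while True:
        main(intervals)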
4. Demo run:
If the last line of the log does not contain a timestamp, the script moves up one line and tries again.
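This fallback matters because a Java log often ends with stack-trace lines that carry no timestamp. Below is a minimal sketch of the backward scan on its own. The helper name `last_timestamp` and the sample log lines are made up for illustration; the only assumption taken from the script above is that a valid line begins with `YYYY-MM-DD HH:MM:SS` followed by a comma.

```python
import time


def last_timestamp(lines, fmt="%Y-%m-%d %H:%M:%S"):
    """Scan from the newest line upwards and return the first valid timestamp."""
    for line in reversed(lines):
        candidate = line.split(",")[0].strip()
        try:
            time.strptime(candidate, fmt)
        except ValueError:
            continue  # no timestamp on this line, try the one above it
        return candidate
    return None  # no line in the log carried a timestamp


# Hypothetical log content: one normal entry followed by a stack trace.
sample = [
    "2020-01-01 12:00:00,123 INFO task started",
    "java.lang.IllegalStateException: something failed",
    "\tat com.example.Worker.run(Worker.java:42)",
]
print(last_timestamp(sample))  # 2020-01-01 12:00:00
```

Returning `None` when nothing matches lets the caller decide how to treat a log with no timestamps at all, for example by treating it as a stalled program.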