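Parsing and extracting the values between two timestamps in a log file
Problem description:
I am trying to write a custom log parser, and I'm stuck on parsing and extracting the values between two timestamps in the log file. The log file looks like this: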
09:57:25Host_Name Trace 00000
<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcd
some string ---
SQ->
09:57:25Host_Name Trace 00000
<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcd
some string ---
D-->
SQ->
09:57:28Host_Name Trace 00000
<MessageLogTraceRecord Time="2017-04-13T09:57:28.1393344+00:00" abcd
some string ---
D-->
SQ->
09:58:28Host_Name Trace 00000
<MessageLogTraceRecord Time="2017-04-13T09:58:28.1393344+00:00" abcd
some string ---
D-->
SQ->
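Every record in this sample starts with a header line made of HH:MM:SS, immediately followed by the host name, then Trace and an ID. Here is a minimal sketch of recognising such a line with a regular expression. It assumes the host name never contains whitespace. HEADER_RE and parse_header are illustrative names, not part of the original.

```python
import re

# Assumed header shape: "HH:MM:SS" glued to the host name, then " Trace <id>"
HEADER_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})(\S+)\s+Trace\s+(\d+)")


def parse_header(line):
    """Return (clock_time, host_name, trace_id) for a header line, else None."""
    m = HEADER_RE.match(line)
    return m.groups() if m else None


print(parse_header("09:57:25Host_Name Trace 00000"))
print(parse_header('<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcd'))
```

Body lines such as "some string ---" or "SQ->" do not match, so the same check can tell where one record ends and the next begins.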
The goal is to produce JSON output in the following format:
[{'host_name': host_name, 'time': '2017-04-13T09:58:28.1393344+00:00', 'msg': '<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcd
some string ---
D-->
SQ->'}, {'host_name': host_name, 'time': '2017-04-13T09:58:28.1393344+00:00', 'msg': '<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcd
some string ---
D-->
SQ->'}]
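The target format above uses Python-style single quotes, so strictly speaking it is a Python literal rather than JSON. If real JSON is the goal, you can build plain dicts and let json.dumps serialize them. It handles the quoting as well as the embedded double quotes and newlines in msg. Here is a minimal sketch with one record whose values are copied by hand from the sample log, for illustration only:

```python
import json

# One record in the target shape; values copied from the sample log
record = {
    "host_name": "Host_Name",
    "time": "2017-04-13T09:57:25.1393344+00:00",
    "msg": '<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcd\nsome string ---\nSQ->',
}

# json.dumps emits double-quoted keys and escapes the embedded quotes and newlines
print(json.dumps([record]))
```

The result round-trips through json.loads unchanged, which the single-quoted form would not.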
The problem is getting the two timestamps and the values between them.
Here is what I tried:
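import itertools
from dateutil import parser  # assumption: `parser` in the original refers to python-dateutil

filepath = 'my_file.log'  # hypothetical path to the log file

jsonlist = []
li = [i.strip().split() for i in open(filepath).readlines()]
start_index, end_index = 0, 0
msg = ''
with open(filepath, 'r') as f:
    for index, line in enumerate(f):
        if start_index != 0 and end_index != 0:
            # join the tokens of the lines between the two timestamps
            result = list(itertools.chain.from_iterable(li[start_index:end_index]))
            msg = ''.join(str(x) for x in result)
            jsonoutput = {}  # the original created `jsonout` but wrote to `jsonoutput`
            jsonoutput['message'] = msg.replace('"', '\\').strip()
            # jsonoutput['time'] = msg.   <- unfinished: this is where I can't get the time
            jsonlist.append(jsonoutput)
            start_index, end_index = 0, 0
        try:
            if start_index != 0:
                # a line that starts with a clock time ends the current block
                if parser.parse(line.split()[0].split('Host_Name')[0]):
                    end_index = index
            else:
                start_index = index
        except (ValueError, IndexError, OverflowError):
            pass  # not a timestamp line (or an empty line)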
I can't get the time value or the correct msg. Any suggestions for a better way to do this would be very helpful.
Answer
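Instead of tracking start and end indices, another option is to walk the lines once and open a new group whenever a header line appears. Everything up to the next header then belongs to that group. This sketch assumes a header is any line starting with HH:MM:SS. split_records is a hypothetical helper name.

```python
import re

HEADER_RE = re.compile(r"^\d{2}:\d{2}:\d{2}")


def split_records(lines):
    """Group lines into records; each record starts at a header line.

    Lines before the first header are ignored.
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if HEADER_RE.match(line):
            records.append([line])      # a header opens a new record
        elif records:
            records[-1].append(line)    # body line belongs to the current record
    return records


sample = [
    "09:57:25Host_Name Trace 00000",
    "some string ---",
    "SQ->",
    "09:57:28Host_Name Trace 00000",
    "D-->",
]
for rec in split_records(sample):
    print(rec[0], "->", rec[1:])
```

Because a record is closed implicitly by the next header, there is no need for sentinel values like start_index = 0, which collide with a real line index of 0.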
I wrote my own code, based on the data you provided:
import json
import re


def logs(file_path):
    """
    :param file_path: path to your log file, example: /home/user/my_file.log
    """
    msg = ''
    final = []
    with open(file_path, 'r') as our_log:
        log_lines = our_log.readlines()
    for line in log_lines:
        time = re.search(r"^\d+:\d+:\d+", line)
        if time:
            # a new "HH:MM:SS<host> Trace" header starts a record,
            # so store the message collected for the previous one first
            if msg:
                final[-1].update(msg=msg)
                msg = ''
            time = time.group(0)
            host_name = re.search(time + r'(.*) Trace', line).group(1)
            # If you need the time like "09:57:25" instead of "2017-04-13T09:57:25.1393344+00:00",
            # uncomment the line below
            # info = dict(time=time, host_name=host_name)
            # and comment out the one below
            info = dict(host_name=host_name)
            final.append(info)
        else:
            # and also comment out the next 3 lines
            if 'Time="' in line:
                time = re.search(r'Time="([^"]*)"', line).group(1)
                final[-1].update(time=time)
            msg += line.strip()
    if final and msg:
        final[-1].update(msg=msg)  # adds the message for the last time section
    json_out = json.dumps(final)
    return json_out
With your sample data, the variable final looks like this:
[{'msg': '<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcdsome string ---SQ->', 'time': '2017-04-13T09:57:25.1393344+00:00', 'host_name': 'Host_Name'}, {'msg': '<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcdsome string ---D-->SQ->', 'time': '2017-04-13T09:57:25.1393344+00:00', 'host_name': 'Host_Name'}, {'msg': '<MessageLogTraceRecord Time="2017-04-13T09:57:28.1393344+00:00" abcdsome string ---D-->SQ->', 'time': '2017-04-13T09:57:28.1393344+00:00', 'host_name': 'Host_Name '}, {'msg': '<MessageLogTraceRecord Time="2017-04-13T09:58:28.1393344+00:00" abcdsome string ---D-->SQ->', 'time': '2017-04-13T09:58:28.1393344+00:00', 'host_name': 'Host_Name '}]
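The code above concatenates the message lines without a separator ("abcdsome string ---SQ->"). The target format in the question keeps the line breaks, though. The variation below joins the body lines with newlines and works on an in-memory string, which is handy for quick tests without a file. The name parse_log and both regexes are assumptions about the log layout, not a fixed format.

```python
import json
import re

HEADER_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})(\S+)\s+Trace")
TIME_ATTR_RE = re.compile(r'Time="([^"]*)"')


def parse_log(text):
    """Turn raw log text into a list of {'host_name', 'time', 'msg'} dicts."""
    records = []
    body = []
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            if records:
                records[-1]["msg"] = "\n".join(body)  # close the previous record
            records.append({"host_name": header.group(2), "time": header.group(1)})
            body = []
        elif records:
            attr = TIME_ATTR_RE.search(line)
            if attr:
                records[-1]["time"] = attr.group(1)  # prefer the full ISO timestamp
            body.append(line.strip())
    if records:
        records[-1]["msg"] = "\n".join(body)  # close the last record
    return records


log_text = """09:57:25Host_Name Trace 00000
<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcd
some string ---
SQ->
09:57:28Host_Name Trace 00000
<MessageLogTraceRecord Time="2017-04-13T09:57:28.1393344+00:00" abcd
D-->
"""
print(json.dumps(parse_log(log_text), indent=2))
```

The header's HH:MM:SS serves as a fallback time and is replaced by the full Time attribute when the record contains one.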