如何解析特殊的日期格式在不同的列与大熊猫
问题描述:
与特殊的日期格式XLS数据,如:如何解析特殊的日期格式在不同的列与大熊猫
start day(utc) start time(utc)
20160401 100
20160401 200
20160401 300
20160401 400
20160401 500
我想分析它作为格式2016年4月1日凌晨1点, 我用熊猫读了表;
parse = lambda x: datetime.strptime(str(x), '%Y%m%d %H')
content=pd.read_excel(filepath,skiprows=1,
na_values=['nan',-9999.0,9999.0,
'-9999.0 -',-99,'-99.000 -',-999],
parse_cols=[1,2,3,4,5,6,7,8,9,10,11,12,14],
header=None, parse_dates = [0,1],
index_col = 0,
date_parser=parse)
但发生错误。它显示:
File "D:\Anaconda2\lib\_strptime.py", line 332, in _strptime
(data_string, format))
ValueError: time data '100' does not match format '%Y%m%d'
我该如何处理它?
答
您可以使用to_timedelta
,因为100必要的鸿沟:
content=pd.read_excel(filepath,skiprows=1,
na_values=['nan',-9999.0,9999.0,
'-9999.0 -',-99,'-99.000 -',-999],
parse_cols=[1,2,3,4,5,6,7,8,9,10,11,12,14],
header=None, parse_dates = [0],
index_col = 0)
df.index = df.index + pd.to_timedelta(df['start time(utc)']/100., unit='h')
df = df.drop('start time(utc)', axis=1)
如果没有必要的(时间是0,1,2..23
)变化parse_dates = [0,1]
到parse_dates = [[0,1]]
:
样品:
import pandas as pd
from pandas.compat import StringIO
temp=u"""start day(utc);start time(utc);a
20160401;1;1
20160401;2;7
20160401;3;7
20160401;4;5
20160401;5;3"""
#after testing replace 'StringIO(temp)' to 'filename.csv'
parse = lambda x: datetime.strptime(x, '%Y%m%d %H')
df = pd.read_csv(StringIO(temp), sep=";",
parse_dates = [[0,1]],
index_col = 0,
date_parser=parse)
print (df)
a
start day(utc)_start time(utc)
2016-04-01 01:00:00 1
2016-04-01 02:00:00 7
2016-04-01 03:00:00 7
2016-04-01 04:00:00 5
2016-04-01 05:00:00 3
print (df.index)
DatetimeIndex(['2016-04-01 01:00:00', '2016-04-01 02:00:00',
'2016-04-01 03:00:00', '2016-04-01 04:00:00',
'2016-04-01 05:00:00'],
dtype='datetime64[ns]', name='start day(utc)_start time(utc)', freq=None)
我用上面的方法解决了它。谢谢:) –
很高兴能帮助,祝你好运! – jezrael