Convert datetime format to Unix timestamp in Pandas
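I am trying to convert ordinary datetimes to Unix timestamps in pandas. While looking for samples I could find only one example (here), but I could not apply it in my context. The dataset has no header, and the last 2 columns need to be converted to Unix timestamps and written out together with the first 3 columns as new output.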
1466f7b93975983f6e292a8a4faaa4b2,1619b4d0d283c0dddb17d24a359a3b49,36db348cde68592a31d502366fc52932,2010-03-08 17:09:00.472544,2010-03-12 16:09:58.122987
367c13356a5d22158f0ae56977134e2c,eedb7d0714796b64767a8710ea3844a7,925476200929fd346ea312cbe9a046fe,2010-03-08 17:08:29.174236,2010-03-12 16:09:58.122987
edf6b1e4f67b0e8a5080d299c9f9aeb2,7cb7681b90388a7522d0f06578591567,ffde0649a72ded8e33522c503a4d5cbe,2010-03-08 17:08:22.030524,2010-03-12 16:09:58.122987
6bb2ad8bc78897e99072d4d76cf0f19c,b644947ac4db03bdb518cfa71765f8c8,eb25089d396c06255cbb5f1bad801cc4,2010-03-08 17:07:55.819137,2010-03-12 16:09:58.122987
The input file has millions of rows; I have posted only a few of them here. Any suggestions would be valuable. Thanks in advance.
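You can first use read_csv to parse the last two columns as datetimes, then cast them to int64 (nanoseconds since the epoch) and floor-divide by 10**9 to get seconds. To write the file, use to_csv: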
import pandas as pd
import numpy as np
import io
temp=u"""1466f7b93975983f6e292a8a4faaa4b2,1619b4d0d283c0dddb17d24a359a3b49,36db348cde68592a31d502366fc52932,2010-03-08 17:09:00.472544,2010-03-12 16:09:58.122987
367c13356a5d22158f0ae56977134e2c,eedb7d0714796b64767a8710ea3844a7,925476200929fd346ea312cbe9a046fe,2010-03-08 17:08:29.174236,2010-03-12 16:09:58.122987
edf6b1e4f67b0e8a5080d299c9f9aeb2,7cb7681b90388a7522d0f06578591567,ffde0649a72ded8e33522c503a4d5cbe,2010-03-08 17:08:22.030524,2010-03-12 16:09:58.122987
6bb2ad8bc78897e99072d4d76cf0f19c,b644947ac4db03bdb518cfa71765f8c8,eb25089d396c06255cbb5f1bad801cc4,2010-03-08 17:07:55.819137,2010-03-12 16:09:58.122987"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp),
                 header=None,                      # no header in the CSV
                 names=['a', 'b', 'c', 'd', 'e'],  # set custom column names
                 parse_dates=['d', 'e'])           # parse columns d and e as datetimes
print(df)
a b \
0 1466f7b93975983f6e292a8a4faaa4b2 1619b4d0d283c0dddb17d24a359a3b49
1 367c13356a5d22158f0ae56977134e2c eedb7d0714796b64767a8710ea3844a7
2 edf6b1e4f67b0e8a5080d299c9f9aeb2 7cb7681b90388a7522d0f06578591567
3 6bb2ad8bc78897e99072d4d76cf0f19c b644947ac4db03bdb518cfa71765f8c8
c d \
0 36db348cde68592a31d502366fc52932 2010-03-08 17:09:00.472544
1 925476200929fd346ea312cbe9a046fe 2010-03-08 17:08:29.174236
2 ffde0649a72ded8e33522c503a4d5cbe 2010-03-08 17:08:22.030524
3 eb25089d396c06255cbb5f1bad801cc4 2010-03-08 17:07:55.819137
e
0 2010-03-12 16:09:58.122987
1 2010-03-12 16:09:58.122987
2 2010-03-12 16:09:58.122987
3 2010-03-12 16:09:58.122987
df['d'] = df['d'].astype(np.int64) // 10**9
df['e'] = df['e'].astype(np.int64) // 10**9
print(df)
a b \
0 1466f7b93975983f6e292a8a4faaa4b2 1619b4d0d283c0dddb17d24a359a3b49
1 367c13356a5d22158f0ae56977134e2c eedb7d0714796b64767a8710ea3844a7
2 edf6b1e4f67b0e8a5080d299c9f9aeb2 7cb7681b90388a7522d0f06578591567
3 6bb2ad8bc78897e99072d4d76cf0f19c b644947ac4db03bdb518cfa71765f8c8
c d e
0 36db348cde68592a31d502366fc52932 1268068140 1268410198
1 925476200929fd346ea312cbe9a046fe 1268068109 1268410198
2 ffde0649a72ded8e33522c503a4d5cbe 1268068102 1268410198
3 eb25089d396c06255cbb5f1bad801cc4 1268068075 1268410198
df.to_csv('filename', header=None, index=False)
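Since the question mentions millions of rows, one option is to stream the file in chunks instead of loading it all at once. The following is only a sketch under some assumptions: the helper name `convert_in_chunks`, the chunk size, and the in-memory buffers are illustrative and not part of the original answer. It also uses explicit epoch subtraction rather than `astype(np.int64) // 10**9`, so the result does not depend on the internal resolution of the datetime column.

```python
import io
import pandas as pd

def convert_in_chunks(src, dst, chunksize=100_000):
    """Read a headerless 5-column CSV piece by piece and write it back
    with the last two columns converted to whole Unix seconds.
    (Hypothetical helper, shown for illustration only.)"""
    epoch = pd.Timestamp("1970-01-01")
    reader = pd.read_csv(src, header=None, names=list("abcde"),
                         parse_dates=["d", "e"], chunksize=chunksize)
    for chunk in reader:
        for col in ("d", "e"):
            # timedelta floor-divided by one second -> integer seconds
            chunk[col] = (chunk[col] - epoch) // pd.Timedelta("1s")
        # dst is an open handle, so each chunk is appended after the previous one
        chunk.to_csv(dst, header=False, index=False)

# Two rows from the question, used as a stand-in for the real file
sample = (
    "1466f7b93975983f6e292a8a4faaa4b2,1619b4d0d283c0dddb17d24a359a3b49,"
    "36db348cde68592a31d502366fc52932,2010-03-08 17:09:00.472544,2010-03-12 16:09:58.122987\n"
    "367c13356a5d22158f0ae56977134e2c,eedb7d0714796b64767a8710ea3844a7,"
    "925476200929fd346ea312cbe9a046fe,2010-03-08 17:08:29.174236,2010-03-12 16:09:58.122987\n"
)
out = io.StringIO()
convert_in_chunks(io.StringIO(sample), out, chunksize=1)
result = out.getvalue()
print(result)
```

With a real file you would pass the input path as `src` and an open output file (for example one opened with `open(..., "w", newline="")`) as `dst`; the chunk size is a tuning knob to adjust to the available memory.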
Unix time is simply the number of seconds elapsed since January 1, 1970 (UTC), correct?
So, to make the conversion explicit:
def dt2ut(dt):
    epoch = pd.to_datetime('1970-01-01')
    return (dt - epoch).total_seconds()
Then:
import pandas as pd
import numpy as np
import io
temp=u"""1466f7b93975983f6e292a8a4faaa4b2,1619b4d0d283c0dddb17d24a359a3b49,36db348cde68592a31d502366fc52932,2010-03-08 17:09:00.472544,2010-03-12 16:09:58.122987
367c13356a5d22158f0ae56977134e2c,eedb7d0714796b64767a8710ea3844a7,925476200929fd346ea312cbe9a046fe,2010-03-08 17:08:29.174236,2010-03-12 16:09:58.122987
edf6b1e4f67b0e8a5080d299c9f9aeb2,7cb7681b90388a7522d0f06578591567,ffde0649a72ded8e33522c503a4d5cbe,2010-03-08 17:08:22.030524,2010-03-12 16:09:58.122987
6bb2ad8bc78897e99072d4d76cf0f19c,b644947ac4db03bdb518cfa71765f8c8,eb25089d396c06255cbb5f1bad801cc4,2010-03-08 17:07:55.819137,2010-03-12 16:09:58.122987"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), header=None, names=['a','b','c','d', 'e'],
                 parse_dates=['d', 'e'])  # dt2ut needs Timestamps, not strings
df['d'] = df['d'].apply(dt2ut).astype(np.int64)
df['e'] = df['e'].apply(dt2ut).astype(np.int64)
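The same epoch subtraction can also be applied to a whole column at once, which avoids calling a Python function per row. This is a minimal sketch, not part of the original answer; the variable names `s`, `epoch`, and `unix_seconds` are illustrative.

```python
import pandas as pd

# Two timestamps taken from the sample data in the question
s = pd.to_datetime(pd.Series(["2010-03-08 17:09:00.472544",
                              "2010-03-12 16:09:58.122987"]))

epoch = pd.Timestamp("1970-01-01")

# Subtracting the epoch gives a timedelta Series; floor-dividing by
# one second turns it into whole seconds as integers.
unix_seconds = (s - epoch) // pd.Timedelta("1s")
print(unix_seconds.tolist())  # [1268068140, 1268410198]
```

If fractional seconds are wanted instead, `(s - epoch).dt.total_seconds()` returns them as floats.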
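I tried comparing the solutions, but our outputs are different... can you check your solution? – jezrael
My apologies, there was a typo in the conversion function. The attribute 'seconds' should have been the method 'total_seconds()'. – piRSquared
Thank you very much for your solution. :)
Thank you so much. Let me run it with the file read and get back to you.
Thanks again for the edit. I am a pandas beginner. By the way, reading the CSV file is not the problem, but writing the output file is ;)
No problem, I edited the solution. – jezrael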