How do I save and retrieve a Pandas DataFrame with a hierarchical index?

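Problem description:

I need to create and save a Pandas DataFrame with a hierarchical index. Below I create two DataFrames and then concatenate them to produce a new DataFrame with a hierarchical index. How can I save this DataFrame and read it back with the hierarchical index intact?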

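import numpy as np
import pandas as pd

data1 = np.random.rand(5, 5)
data2 = np.random.rand(5, 5)
df1 = pd.DataFrame(data1, columns=['a', 'b', 'c', 'd', 'e'], index=['i1', 'i2', 'i3', 'i4', 'i5'])
df2 = pd.DataFrame(data2, columns=['a', 'b', 'c', 'd', 'e'], index=['i1', 'i2', 'i3', 'i4', 'i5'])

# Concatenating with keys builds a two-level (hierarchical) index.
df = pd.concat([df1, df2], keys=['first', 'second'])

print("Original Data frame")
print(df)

# Save to file.
df.to_csv('test')

# Read from file. pd.DataFrame.from_csv was removed in pandas 1.0;
# read_csv with index_col=0 reproduces its behavior of using only the first column as the index.
df_new = pd.read_csv('test', index_col=0)

print("Saved Data frame")
print(df_new)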

Here is the output I get:

Original Data frame 
        a   b   c   d   e 
first i1 0.926553 0.180306 0.182887 0.783061 0.832914 
     i2 0.899054 0.130367 0.615534 0.965580 0.669495 
     i3 0.931004 0.425528 0.068938 0.166522 0.714399 
     i4 0.082365 0.587194 0.993864 0.187864 0.066035 
     i5 0.668671 0.294744 0.136317 0.358732 0.529674 
second i1 0.916310 0.361423 0.700380 0.386119 0.273667 
     i2 0.102542 0.454106 0.565760 0.259323 0.104743 
     i3 0.410280 0.379986 0.288921 0.177819 0.919343 
     i4 0.447279 0.113711 0.032273 0.335358 0.717824 
     i5 0.995781 0.356817 0.146785 0.972401 0.169360 

Saved Data frame 
     Unnamed: 1   a   b   c   d   e 
first   i1 0.926553 0.180306 0.182887 0.783061 0.832914 
first   i2 0.899054 0.130367 0.615534 0.965580 0.669495 
first   i3 0.931004 0.425528 0.068938 0.166522 0.714399 
first   i4 0.082365 0.587194 0.993864 0.187864 0.066035 
first   i5 0.668671 0.294744 0.136317 0.358732 0.529674 
second   i1 0.916310 0.361423 0.700380 0.386119 0.273667 
second   i2 0.102542 0.454106 0.565760 0.259323 0.104743 
second   i3 0.410280 0.379986 0.288921 0.177819 0.919343 
second   i4 0.447279 0.113711 0.032273 0.335358 0.717824 
second   i5 0.995781 0.356817 0.146785 0.972401 0.169360 

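When I save this new DataFrame to a CSV file ('test') and read it back, I lose the hierarchical index. Is there a way to save the data to a file so that the hierarchical index is preserved when I read it back?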

Save it in a format other than CSV, for example pickle:

df.to_pickle('dataframe.pickle') 

This preserves the hierarchical index. To read it back:

pd.read_pickle('dataframe.pickle') 
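
As a rough check that the pickle round trip keeps both index levels, here is a minimal sketch. It rebuilds a frame like the one in the question and writes to a temporary directory; the names `pickle_path` and `restored` are only illustrative.

```python
import os
import tempfile

import numpy as np
import pandas as pd

columns = ['a', 'b', 'c', 'd', 'e']
rows = ['i1', 'i2', 'i3', 'i4', 'i5']
df1 = pd.DataFrame(np.random.rand(5, 5), columns=columns, index=rows)
df2 = pd.DataFrame(np.random.rand(5, 5), columns=columns, index=rows)

# Two-level index: outer level from keys, inner level from the row labels.
df = pd.concat([df1, df2], keys=['first', 'second'])

with tempfile.TemporaryDirectory() as tmp_dir:
    # Illustrative file name; any path works.
    pickle_path = os.path.join(tmp_dir, 'dataframe.pickle')
    df.to_pickle(pickle_path)
    restored = pd.read_pickle(pickle_path)

# The restored frame still has a MultiIndex with two levels.
print(type(restored.index).__name__)
print(restored.index.nlevels)
```

You can then select by the outer level as before, e.g. `restored.loc['first']`.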

pandas has several IO methods; you can read about them in the documentation.

You can:

Reset the index and save the DataFrame to CSV, read it back from CSV, and then set the index back to the original columns (in place).

df 
Out[11]: 
        a   b   c   d   e 
first i1 0.935478 0.455757 0.607418 0.850291 0.704326 
     i2 0.675752 0.339017 0.999949 0.508480 0.888817 
     i3 0.463371 0.803389 0.048469 0.599697 0.423603 
     i4 0.935294 0.933699 0.843289 0.182535 0.255847 
     i5 0.321236 0.120010 0.647876 0.000517 0.032592 
second i1 0.172044 0.691660 0.799164 0.194785 0.302880 
     i2 0.432988 0.511229 0.451268 0.203145 0.560563 
     i3 0.442584 0.771483 0.839945 0.716374 0.533183 
     i4 0.167898 0.962646 0.152245 0.400280 0.210355 
     i5 0.736365 0.511057 0.256672 0.619250 0.790739 

df.reset_index() 
Out[12]: 
    level_0 level_1   a   b   c   d   e 
0 first  i1 0.935478 0.455757 0.607418 0.850291 0.704326 
1 first  i2 0.675752 0.339017 0.999949 0.508480 0.888817 
2 first  i3 0.463371 0.803389 0.048469 0.599697 0.423603 
3 first  i4 0.935294 0.933699 0.843289 0.182535 0.255847 
4 first  i5 0.321236 0.120010 0.647876 0.000517 0.032592 
5 second  i1 0.172044 0.691660 0.799164 0.194785 0.302880 
6 second  i2 0.432988 0.511229 0.451268 0.203145 0.560563 
7 second  i3 0.442584 0.771483 0.839945 0.716374 0.533183 
8 second  i4 0.167898 0.962646 0.152245 0.400280 0.210355 
9 second  i5 0.736365 0.511057 0.256672 0.619250 0.790739 

df.reset_index().to_csv('test.csv', index=False) 
df3 = pd.read_csv('test.csv') 
df3.set_index(['level_0', 'level_1'], inplace=True) 

df3 
Out[15]: 
         a   b   c   d   e 
level_0 level_1             
first i1  0.935478 0.455757 0.607418 0.850291 0.704326 
     i2  0.675752 0.339017 0.999949 0.508480 0.888817 
     i3  0.463371 0.803389 0.048469 0.599697 0.423603 
     i4  0.935294 0.933699 0.843289 0.182535 0.255847 
     i5  0.321236 0.120010 0.647876 0.000517 0.032592 
second i1  0.172044 0.691660 0.799164 0.194785 0.302880 
     i2  0.432988 0.511229 0.451268 0.203145 0.560563 
     i3  0.442584 0.771483 0.839945 0.716374 0.533183 
     i4  0.167898 0.962646 0.152245 0.400280 0.210355 
     i5  0.736365 0.511057 0.256672 0.619250 0.790739
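
If you want to stay with CSV, another option is to tell `read_csv` which columns make up the index when reading, so the index does not have to be reset and set again. A minimal sketch, assuming the file was written with a plain `df.to_csv(...)` so that the two index levels are the first two columns; `csv_path` and `df_roundtrip` are illustrative names:

```python
import os
import tempfile

import numpy as np
import pandas as pd

columns = ['a', 'b', 'c', 'd', 'e']
rows = ['i1', 'i2', 'i3', 'i4', 'i5']
df1 = pd.DataFrame(np.random.rand(5, 5), columns=columns, index=rows)
df2 = pd.DataFrame(np.random.rand(5, 5), columns=columns, index=rows)
df = pd.concat([df1, df2], keys=['first', 'second'])

with tempfile.TemporaryDirectory() as tmp_dir:
    # Illustrative file name; any path works.
    csv_path = os.path.join(tmp_dir, 'test.csv')
    # The index is written as the first two columns of the file.
    df.to_csv(csv_path)
    # index_col=[0, 1] rebuilds the two-level index from those columns.
    df_roundtrip = pd.read_csv(csv_path, index_col=[0, 1])

print(df_roundtrip.index.nlevels)
print(df_roundtrip.loc['second'])
```

Note that CSV stores everything as text, so dtypes are inferred again on read and float values may differ in the last digits; pickle avoids that.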