How do I loop over the list values in a specific pandas column?

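Problem description:

I have a pandas DataFrame whose first column holds lists. I want to iterate over every str value in each list and give each one its own row, carrying along the values from the other columns. How do I loop over the list values in a specific pandas column?

For example: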

import pandas as pd

tm = pd.DataFrame({
    'author': [['author_a1', 'author_a2', 'author_a3'],
               ['author_b1', 'author_b2'],
               ['author_c1', 'author_c2']],
    'journal': ['journal01', 'journal02', 'journal03'],
    'date': pd.date_range('2015-02-03', periods=3)})
tm

                              author       date    journal
0  [author_a1, author_a2, author_a3] 2015-02-03  journal01
1             [author_b1, author_b2] 2015-02-04  journal02
2             [author_c1, author_c2] 2015-02-05  journal03

I want a result like this:

      author       date    journal
0  author_a1 2015-02-03  journal01
1  author_a2 2015-02-03  journal01
2  author_a3 2015-02-03  journal01
3  author_b1 2015-02-04  journal02
4  author_b2 2015-02-04  journal02
5  author_c1 2015-02-05  journal03
6  author_c2 2015-02-05  journal03

I have already solved this in a complicated way, shown below. Is there a simpler and more efficient way to do it with pandas?

author_use = []
date_use = []
journal_use = []

# Walk every row, then every author in that row's list,
# repeating the row's date and journal once per author.
for i in range(len(tm['author'])):
    for m in range(len(tm['author'][i])):
        author_use.append(tm['author'][i][m])
        date_use.append(tm['date'][i])
        journal_use.append(tm['journal'][i])

df_author = pd.DataFrame({'author': author_use,
                          'date': date_use,
                          'journal': journal_use})

df_author

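I think you can get the length of each nested list with str.len, flatten the lists with chain.from_iterable, and use numpy.repeat to repeat the other columns' values by those lengths: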

from itertools import chain

import numpy as np
import pandas as pd

# Number of authors in each row's list
lens = tm.author.str.len()

df = pd.DataFrame({
    "date": np.repeat(tm.date.values, lens),
    "journal": np.repeat(tm.journal.values, lens),
    "author": list(chain.from_iterable(tm.author))})

print(df)

      author       date    journal
0  author_a1 2015-02-03  journal01
1  author_a2 2015-02-03  journal01
2  author_a3 2015-02-03  journal01
3  author_b1 2015-02-04  journal02
4  author_b2 2015-02-04  journal02
5  author_c1 2015-02-05  journal03
6  author_c2 2015-02-05  journal03
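As an aside that is not part of the original answer: newer pandas releases (0.25 and later) ship DataFrame.explode, which does this kind of reshaping directly. A minimal sketch, assuming such a version is installed and reusing the question's sample data (the name exploded is only for illustration):

```python
import pandas as pd

tm = pd.DataFrame({
    'author': [['author_a1', 'author_a2', 'author_a3'],
               ['author_b1', 'author_b2'],
               ['author_c1', 'author_c2']],
    'journal': ['journal01', 'journal02', 'journal03'],
    'date': pd.date_range('2015-02-03', periods=3),
})

# explode() gives every list element its own row and repeats the
# other columns; reset_index() replaces the duplicated row labels.
exploded = tm.explode('author').reset_index(drop=True)
print(exploded)
```

Unlike the numpy.repeat approach, this keeps the original column order and leaves the date column as a datetime column.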

Another numpy solution:

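# Repeat each (date, journal) row once per author, then append
# the flattened author values as a third column.
df = pd.DataFrame(
    np.column_stack((
        tm[['date', 'journal']].values.repeat(list(map(len, tm.author)), axis=0),
        np.hstack(tm.author))),
    columns=['date', 'journal', 'author'])

print(df)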
                  date    journal     author
0  2015-02-03 00:00:00  journal01  author_a1
1  2015-02-03 00:00:00  journal01  author_a2
2  2015-02-03 00:00:00  journal01  author_a3
3  2015-02-04 00:00:00  journal02  author_b1
4  2015-02-04 00:00:00  journal02  author_b2
5  2015-02-05 00:00:00  journal03  author_c1
6  2015-02-05 00:00:00  journal03  author_c2
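Note that the dates above print with a 00:00:00 time part. That suggests the stacked array is of object dtype, so depending on the pandas version the date column may no longer be a true datetime column. A small sketch of converting it back with pd.to_datetime, assuming the same sample data (np.concatenate is used here in place of np.hstack purely as an illustrative choice):

```python
import numpy as np
import pandas as pd

tm = pd.DataFrame({
    'author': [['author_a1', 'author_a2', 'author_a3'],
               ['author_b1', 'author_b2'],
               ['author_c1', 'author_c2']],
    'journal': ['journal01', 'journal02', 'journal03'],
    'date': pd.date_range('2015-02-03', periods=3),
})

# Same idea as the numpy solution: repeat the (date, journal) rows
# once per author and stack the flattened authors beside them.
counts = [len(a) for a in tm['author']]
df = pd.DataFrame(
    np.column_stack((
        tm[['date', 'journal']].values.repeat(counts, axis=0),
        np.concatenate(tm['author'].tolist()),
    )),
    columns=['date', 'journal', 'author'],
)

# The stacked array is object dtype, so make sure the dates end up
# as datetimes; this is harmless if pandas already inferred them.
df['date'] = pd.to_datetime(df['date'])
print(df.dtypes)
```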

I get "TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'". What is wrong? @jezrael –


Does this problem occur with the sample or with your real data? – jezrael


The problem occurs with the sample. –
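The thread does not say what caused that error. One plausible explanation, offered only as an assumption, is a platform where NumPy's index integer type is 32-bit (for example a 32-bit Python build), so the int64 lengths returned by str.len cannot be cast safely when used as repeat counts. A hedged sketch of a workaround is to cast the lengths to np.intp before calling np.repeat:

```python
from itertools import chain

import numpy as np
import pandas as pd

tm = pd.DataFrame({
    'author': [['author_a1', 'author_a2', 'author_a3'],
               ['author_b1', 'author_b2'],
               ['author_c1', 'author_c2']],
    'journal': ['journal01', 'journal02', 'journal03'],
    'date': pd.date_range('2015-02-03', periods=3),
})

# Assumption: the TypeError comes from int64 repeat counts on a
# platform whose index type is int32. np.intp is the platform's
# native index integer, so casting to it sidesteps the unsafe cast.
lens = tm['author'].str.len().to_numpy().astype(np.intp)

df = pd.DataFrame({
    'author': list(chain.from_iterable(tm['author'])),
    'date': np.repeat(tm['date'].to_numpy(), lens),
    'journal': np.repeat(tm['journal'].to_numpy(), lens),
})
print(df)
```

On a 64-bit platform the cast changes nothing, so it should be safe to leave in either way.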