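Multiplying data frames of different lengths

Problem description:

I have two data frames. Both have 5 columns, but the first has 100 rows and the second has only one row. I need to multiply each row of the first data frame by the single row of the second, then sum the column values within each row and store that value in a new sixth column, "sum of multiplications". I have seen the np.dot operation, but I am not sure whether it can be applied to data frames. I am also looking for a pythonic/pandas operation or method that could replace my rough from-scratch code. Thanks in advance.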

Providing example code and data helps us answer faster. – tfv

I think you can convert the DataFrames to numpy arrays with values, multiply them, and finally sum:

import pandas as pd 
import numpy as np 

np.random.seed(1) 
df1 = pd.DataFrame(np.random.randint(10, size=(1,5))) 
df1.columns = list('ABCDE') 
print(df1)
   A  B  C  D  E
0  5  8  9  5  0

np.random.seed(0) 
df2 = pd.DataFrame(np.random.randint(10,size=(10,5))) 
df2.columns = list('ABCDE') 
print(df2)
   A  B  C  D  E
0  5  0  3  3  7
1  9  3  5  2  4
2  7  6  8  8  1
3  6  7  7  8  1
4  5  9  8  9  4
5  3  0  3  5  0
6  2  3  8  1  3
7  3  3  7  0  1
8  9  9  0  4  7
9  3  2  7  2  0
print(df2.values * df1.values)
[[25  0 27 15  0]
 [45 24 45 10  0]
 [35 48 72 40  0]
 [30 56 63 40  0]
 [25 72 72 45  0]
 [15  0 27 25  0]
 [10 24 72  5  0]
 [15 24 63  0  0]
 [45 72  0 20  0]
 [15 16 63 10  0]]

df = pd.DataFrame(df2.values * df1.values) 
df['sum'] = df.sum(axis=1) 
print(df)
    0   1   2   3  4  sum
0  25   0  27  15  0   67
1  45  24  45  10  0  124
2  35  48  72  40  0  195
3  30  56  63  40  0  189
4  25  72  72  45  0  214
5  15   0  27  25  0   67
6  10  24  72   5  0  111
7  15  24  63   0  0  102
8  45  72   0  20  0  137
9  15  16  63  10  0  104
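
On the np.dot part of the question: multiplying each row elementwise and then summing across the columns is a dot product of that row with the single row of multipliers, so the new column can be computed in one step. A minimal sketch, with the printed values above typed in literally rather than regenerated from the seeds; the names weights, row_sums, row_sums_np and result are only illustrative:

```python
import numpy as np
import pandas as pd

cols = list('ABCDE')
# Values copied from the printed df1 and df2 above
df1 = pd.DataFrame([[5, 8, 9, 5, 0]], columns=cols)
df2 = pd.DataFrame([[5, 0, 3, 3, 7],
                    [9, 3, 5, 2, 4],
                    [7, 6, 8, 8, 1],
                    [6, 7, 7, 8, 1],
                    [5, 9, 8, 9, 4],
                    [3, 0, 3, 5, 0],
                    [2, 3, 8, 1, 3],
                    [3, 3, 7, 0, 1],
                    [9, 9, 0, 4, 7],
                    [3, 2, 7, 2, 0]], columns=cols)

# The single row of multipliers as a Series indexed by A..E
weights = df1.iloc[0]

# DataFrame.dot matches df2's columns to the Series index,
# then returns one dot product per row of df2
row_sums = df2.dot(weights)

# The same computation on the raw arrays (no label alignment)
row_sums_np = np.dot(df2.values, df1.values.ravel())

result = df2.copy()
result['sum'] = row_sums
print(result)
```

Unlike the values-based version, DataFrame.dot keeps the original index and requires the column labels of df2 to match the index of the Series, which can catch mismatched columns early.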

Timings

In [1185]: %timeit df2.mul(df1.ix[0], axis=1) 
The slowest run took 5.07 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 287 µs per loop 

In [1186]: %timeit pd.DataFrame(df2.values * df1.values) 
The slowest run took 6.31 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 98 µs per loop 
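
The .ix indexer used in the first timing was removed in pandas 1.0, so that line will not run on current versions. A rough sketch for repeating the comparison with the standard timeit module, assuming .iloc[0] as the replacement for .ix[0]; the helper names with_pandas and with_numpy and the loop count n are made up for illustration, and the absolute numbers will vary with machine and library version:

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(1)
df1 = pd.DataFrame(np.random.randint(10, size=(1, 5)), columns=list('ABCDE'))
np.random.seed(0)
df2 = pd.DataFrame(np.random.randint(10, size=(10, 5)), columns=list('ABCDE'))


def with_pandas():
    # Label-aligned multiplication of every row by the single row of df1
    return df2.mul(df1.iloc[0], axis=1)


def with_numpy():
    # Broadcasting on the raw arrays, then wrapping the result again
    return pd.DataFrame(df2.values * df1.values)


n = 200  # number of calls per measurement (arbitrary)
t_pandas = timeit.timeit(with_pandas, number=n) / n
t_numpy = timeit.timeit(with_numpy, number=n) / n
print("pandas mul: %.1f us per call" % (t_pandas * 1e6))
print("numpy mul:  %.1f us per call" % (t_numpy * 1e6))
```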

You may be looking for something like this:

import pandas as pd 
import numpy as np 

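df1 = pd.DataFrame({'A': [1.1, 2.7, 3.4],
                    'B': [-1., -2.5, -3.9]})

# Row-wise sum of the original (not yet multiplied) values
df1['sum of multiplications'] = df1.sum(axis=1)

# One-row frame of multipliers; the sum column is multiplied by 1, so it stays unchanged
df2 = pd.DataFrame({'A': [2.],
                    'B': [3.],
                    'sum of multiplications': [1.]})

print(df1)
print(df2)

# Multiply every row of df1 by the single row of df2, aligned on column names
row = df2.iloc[0]
df5 = df1.mul(row, axis=1)
# Append a 'Total' row holding the column sums
df5.loc['Total'] = df5.sum()
print(df5)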
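
If the goal is the per-row sum of the products, as the question describes, the sum has to be taken after the multiplication rather than before it. A minimal sketch with the same A and B values, assuming the multiplier frame holds only the A and B columns; the names weights and products are illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1.1, 2.7, 3.4],
                    'B': [-1.0, -2.5, -3.9]})
df2 = pd.DataFrame({'A': [2.0],
                    'B': [3.0]})

# The single row of multipliers as a Series indexed by column name
weights = df2.iloc[0]

# Multiply each row of df1 by the multipliers, aligned on column names
products = df1.mul(weights, axis=1)

# Sum the products across the columns of each row
products['sum of multiplications'] = products.sum(axis=1)
print(products)
```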