Python: create new columns/attributes from all possible combinations of variables

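Problem description:

If I have a data frame with the variables A, B, C, D and E, how can I write a for or while loop that creates new variables from every pair of existing variables combined with every possible arithmetic operator (+, -, /, *)?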

That is, to go from a data frame with the following variables:

A B C D E

to one like this:

A B C D E A+B A+C A+D A+E A*B ...

Solution

You want to use numpy:

import numpy as np 

vars = [1, 2, 3] 

np.concatenate([ 
     np.add.outer(vars, vars), 
     np.subtract.outer(vars, vars), 
     np.multiply.outer(vars, vars), 
     np.divide.outer(vars, vars)]).flatten() 

which looks like:

array([ 2, 3, 4, 3, 4, 5, 4, 5, 6, 0, -1, -2, 1, 0, -1, 2, 1, 
     0, 1, 2, 3, 2, 4, 6, 3, 6, 9, 1, 0, 0, 2, 1, 0, 3, 
     1, 1]) 

Explanation

# All possible additions
print(np.add.outer(vars, vars))

# All possible subtractions
print(np.subtract.outer(vars, vars))

# All possible multiplications
print(np.multiply.outer(vars, vars))

# All possible divisions (on Python 3, np.divide returns floats;
# np.floor_divide gives the integer results shown below)
print(np.divide.outer(vars, vars))

which gives:

[[2 3 4] 
[3 4 5] 
[4 5 6]] 

[[ 0 -1 -2] 
[ 1 0 -1] 
[ 2 1 0]] 

[[1 2 3] 
[2 4 6] 
[3 6 9]] 

[[1 0 0] 
[2 1 0] 
[3 1 1]] 
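
The division matrix above contains integers, which is consistent with the integer division that Python 2 performed on integer input. On Python 3, `np.divide` does true division and returns floats. A minimal sketch of the difference, assuming integer input; `values` is a hypothetical stand-in for `vars`:

```python
import numpy as np

values = np.array([1, 2, 3])  # hypothetical stand-in for `vars` above

# True division: every pairwise quotient as a float.
true_div = np.divide.outer(values, values)

# Floor division: reproduces the integer matrix shown above.
floor_div = np.floor_divide.outer(values, values)

print(true_div)
print(floor_div)
```

Which of the two is appropriate depends on whether fractional quotients matter for the new columns.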

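You can use product and eval to evaluate every possible combination. The results are stored using a dictionary comprehension and then concatenated with the original data.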

from itertools import product

import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]}) 

transformations = {"".join(p): eval("df.loc[:, '{0}'] {1} df.loc[:, '{2}']".format(*p)) 
        for p in product(df, list('+-/*'), df)} 
transformations = pd.concat([df, pd.DataFrame(transformations)], axis=1) 

>>> transformations 
    A B C A+A A+B A+C A-A A-B A-C A/A ... C+C C-A C-B C-C C/A C/B C/C C*A C*B C*C 
0 1 3 5 2 4 6 0 -2 -4 1 ... 10 4 2 0 5 2 1 5 15 25 
1 2 4 6 4 6 8 0 -2 -4 1 ... 12 4 2 0 3 2 1 12 24 36 

[2 rows x 39 columns] 

Using 'eval' is discouraged – bluesmonk
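
Following up on that comment, one possible way to avoid `eval` is to map each operator symbol to a function from the standard `operator` module. This is only a sketch that reuses the example frame from the answer above; the names `OPS`, `new_cols` and `result` are made up for illustration:

```python
import operator
from itertools import product

import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

# Map each operator symbol to the equivalent function, so no eval is needed.
OPS = {
    '+': operator.add,
    '-': operator.sub,
    '/': operator.truediv,
    '*': operator.mul,
}

# Build one new column per (left column, operator, right column) triple.
new_cols = {
    left + sym + right: func(df[left], df[right])
    for left, (sym, func), right in product(df.columns, OPS.items(), df.columns)
}

result = pd.concat([df, pd.DataFrame(new_cols)], axis=1)
print(result.shape)
```

The column names follow the same left + operator + right pattern as the `eval` version, but no strings are executed as code.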

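Loops are not the most efficient approach. In this case, though, I think using them is justified.

So, assuming you want to do the element-wise operations with nothing but iteration (loops) and no third-party libraries, we can do this efficiently (or as efficiently as possible) using generators. Here is how I would go about it: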

data = [1, 2, 3, 4, 5] 

operations = { 
    "+": lambda x,y: (value+y for value in x), 
    "-": lambda x,y: (value-y for value in x), 
    "*": lambda x,y: (value*y for value in x), 
    "/": lambda x,y: (value/y for value in x), 
} 

calculations = (
    (key, val, oper(data, val)) 
    for key, oper in sorted(operations.items()) 
    for val in data 
) 

Now let's display the results:

for op, val, res in calculations:
    # Centre each result in a 6-character field.
    prepped_res = ' | '.join("{:^6.2g}".format(r) for r in res)

    print(" {} {} {:.2g} = | {} |".format(data, op, val, prepped_res))

This will look like:

[1, 2, 3, 4, 5] * 1 = | 1 | 2 | 3 | 4 | 5 | 
[1, 2, 3, 4, 5] * 2 = | 2 | 4 | 6 | 8 | 10 | 
[1, 2, 3, 4, 5] * 3 = | 3 | 6 | 9 | 12 | 15 | 
[1, 2, 3, 4, 5] * 4 = | 4 | 8 | 12 | 16 | 20 | 
[1, 2, 3, 4, 5] * 5 = | 5 | 10 | 15 | 20 | 25 | 
[1, 2, 3, 4, 5] + 1 = | 2 | 3 | 4 | 5 | 6 | 
[1, 2, 3, 4, 5] + 2 = | 3 | 4 | 5 | 6 | 7 | 
[1, 2, 3, 4, 5] + 3 = | 4 | 5 | 6 | 7 | 8 | 
[1, 2, 3, 4, 5] + 4 = | 5 | 6 | 7 | 8 | 9 | 
[1, 2, 3, 4, 5] + 5 = | 6 | 7 | 8 | 9 | 10 | 
[1, 2, 3, 4, 5] - 1 = | 0 | 1 | 2 | 3 | 4 | 
[1, 2, 3, 4, 5] - 2 = | -1 | 0 | 1 | 2 | 3 | 
[1, 2, 3, 4, 5] - 3 = | -2 | -1 | 0 | 1 | 2 | 
[1, 2, 3, 4, 5] - 4 = | -3 | -2 | -1 | 0 | 1 | 
[1, 2, 3, 4, 5] - 5 = | -4 | -3 | -2 | -1 | 0 | 
[1, 2, 3, 4, 5]/1 = | 1 | 2 | 3 | 4 | 5 | 
[1, 2, 3, 4, 5]/2 = | 0.5 | 1 | 1.5 | 2 | 2.5 | 
[1, 2, 3, 4, 5]/3 = | 0.33 | 0.67 | 1 | 1.3 | 1.7 | 
[1, 2, 3, 4, 5]/4 = | 0.25 | 0.5 | 0.75 | 1 | 1.2 | 
[1, 2, 3, 4, 5]/5 = | 0.2 | 0.4 | 0.6 | 0.8 | 1 | 

I hope this answers your question. If there is anything I missed, feel free to let me know and we'll see what we can do.
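
One thing to keep in mind with the generator approach is that each generator can be consumed only once. If the results are needed again later, they can be materialised into lists. A minimal sketch, rebuilding the same `operations` and `calculations` objects; the `table` dictionary and its label format are assumptions made for illustration:

```python
data = [1, 2, 3, 4, 5]

operations = {
    "+": lambda x, y: (value + y for value in x),
    "-": lambda x, y: (value - y for value in x),
    "*": lambda x, y: (value * y for value in x),
    "/": lambda x, y: (value / y for value in x),
}

calculations = (
    (key, val, oper(data, val))
    for key, oper in sorted(operations.items())
    for val in data
)

# Turn each inner generator into a list, keyed by a readable label.
table = {
    "data {} {}".format(op, val): list(res)
    for op, val, res in calculations
}

print(table["data * 2"])
print(list(calculations))  # empty: the outer generator is already exhausted
```

Storing lists gives up the laziness of the generators in exchange for being able to read the results more than once.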

If I understand your question correctly, reindex and itertools are your friends:

In [21]: import pandas as pd 

In [22]: import numpy as np 

In [23]: df = pd.DataFrame({'a':np.arange(5), 'b':np.arange(5), 'c':np.arange(5) 
    ...: }) 

In [24]: df 
Out[24]: 
    a b c 
0 0 0 0 
1 1 1 1 
2 2 2 2 
3 3 3 3 
4 4 4 4 

In [26]: operations = ['*', '/', '-', '+'] 

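In [30]: # Requires `import itertools`. s1 and s2 come from cells not shown here:
    ...: # s1 presumably holds the column names (df.columns) and s2 the operator
    ...: # strings (apparently without '+', since the output below has no '+' names).
    ...: # Join each (column, operator, column) triple into a name, skipping triples
    ...: # whose two columns are the same (a*a, b-b, ...). Drop the `if` to keep them all.
    ...: new_columns = [''.join([a, b, c]) for a, b, c in itertools.product(s1, s2, s1) if a != c]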

In [31]: new_columns 
Out[31]: 
['a*b', 
'a*c', 
'a/b', 
'a/c', 
'a-b', 
'a-c', 
'b*a', 
'b*c', 
'b/a', 
'b/c', 
'b-a', 
'b-c', 
'c*a', 
'c*b', 
'c/a', 
'c/b', 
'c-a', 
'c-b'] 

In [33]: # Rebuild the frame from the existing columns plus the new ones;
    ...: # the new, empty columns are filled with NaN.
    ...: df.reindex(columns=[*df.columns, *new_columns], fill_value=np.nan)
Out[33]: 
    a b c a*b a*c a/b a/c a-b a-c b*a ... b/a b/c b-a b-c c*a \ 
0 0 0 0 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN 
1 1 1 1 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN 
2 2 2 2 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN 
3 3 3 3 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN 
4 4 4 4 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN 

    c*b c/a c/b c-a c-b 
0 NaN NaN NaN NaN NaN 
1 NaN NaN NaN NaN NaN 
2 NaN NaN NaN NaN NaN 
3 NaN NaN NaN NaN NaN 
4 NaN NaN NaN NaN NaN 

You can replace ''.join() with any function you like and get the same result.
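
The reindexed frame above only contains NaN placeholders. A possible next step, sketched here under some assumptions, is to compute the values while the column names are being built. The `label` helper is a hypothetical replacement for ''.join(), the `ops` mapping is not part of the answer, and the data starts at 1 rather than 0 to avoid dividing by zero:

```python
import itertools
import operator

import numpy as np
import pandas as pd

# Assumption: values start at 1 so that no division by zero occurs.
df = pd.DataFrame({'a': np.arange(1, 6), 'b': np.arange(1, 6), 'c': np.arange(1, 6)})

# Hypothetical mapping from operator symbol to function.
ops = {'*': operator.mul, '/': operator.truediv, '-': operator.sub, '+': operator.add}


def label(left, sym, right):
    """Hypothetical replacement for ''.join(): builds a spaced column name."""
    return "{} {} {}".format(left, sym, right)


base_cols = list(df.columns)
out = df.copy()
for left, sym, right in itertools.product(base_cols, ops, base_cols):
    if left != right:  # skip a*a, b-b, ...
        out[label(left, sym, right)] = ops[sym](df[left], df[right])

print(out.columns.tolist())
```

Swapping `label` for another naming function changes only the column names, not the computed values.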