Python: creating new columns/attributes from all possible combinations of variables
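Problem description:
If I have a dataframe containing the variables A, B, C, D, E, how can I write a for or while loop that creates new variables by combining all existing variables with all possible mathematical operators (+, -, /, *)?
To go from a dataframe with the following variables:
A B C D E
to one like this:
A B C D E A+B A+C A+D A+E A*B ...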
Answer
Solution
You want to use numpy:
import numpy as np
vars = [1, 2, 3]
np.concatenate([
    np.add.outer(vars, vars),
    np.subtract.outer(vars, vars),
    np.multiply.outer(vars, vars),
    np.divide.outer(vars, vars)]).flatten()
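which gives the following (this output comes from Python 2, where np.divide on integers performs integer division; under Python 3 the quotients are floats, so the whole array is float):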
array([ 2, 3, 4, 3, 4, 5, 4, 5, 6, 0, -1, -2, 1, 0, -1, 2, 1,
0, 1, 2, 3, 2, 4, 6, 3, 6, 9, 1, 0, 0, 2, 1, 0, 3,
1, 1])
Explanation
# All possible additions
print(np.add.outer(vars, vars))
# All possible subtractions
print(np.subtract.outer(vars, vars))
# All possible multiplications
print(np.multiply.outer(vars, vars))
# All possible divisions (integer division under Python 2, as in the output below; Python 3 returns floats)
print(np.divide.outer(vars, vars))
which prints:
[[2 3 4]
[3 4 5]
[4 5 6]]
[[ 0 -1 -2]
[ 1 0 -1]
[ 2 1 0]]
[[1 2 3]
[2 4 6]
[3 6 9]]
[[1 0 0]
[2 1 0]
[3 1 1]]
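The snippet above works on a plain list of scalars. As a rough sketch of how the same idea might carry over to DataFrame columns, broadcasting can compute every column pair in one call per operator. The frame `df` and the names `ufuncs`, `blocks`, and `wide` are made up for illustration and are not part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; column names are illustrative only.
df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 6.0]})

values = df.to_numpy()       # shape (n_rows, n_cols)
left = values[:, :, None]    # shape (n_rows, n_cols, 1)
right = values[:, None, :]   # shape (n_rows, 1, n_cols)

ufuncs = {"+": np.add, "-": np.subtract, "*": np.multiply, "/": np.divide}

blocks = []
for symbol, ufunc in ufuncs.items():
    # Broadcasting pairs every column with every column: (n_rows, n_cols, n_cols)
    result = ufunc(left, right)
    # Names follow the same (left column, right column) order as the reshape below
    names = [f"{a}{symbol}{b}" for a in df.columns for b in df.columns]
    blocks.append(
        pd.DataFrame(result.reshape(len(df), -1), columns=names, index=df.index)
    )

wide = pd.concat([df] + blocks, axis=1)
print(wide.shape)
```

Note that a zero in a divisor column would produce `inf` or `NaN` (with a NumPy runtime warning) rather than an exception.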
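Answer
You can use product and eval to evaluate every possible combination. The results are collected with a dict comprehension and then concatenated with the original data.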
from itertools import product
import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})
# Each p is a (left column, operator, right column) triple
transformations = {"".join(p): eval("df.loc[:, '{0}'] {1} df.loc[:, '{2}']".format(*p))
                   for p in product(df, list('+-/*'), df)}
transformations = pd.concat([df, pd.DataFrame(transformations)], axis=1)
>>> transformations
A B C A+A A+B A+C A-A A-B A-C A/A ... C+C C-A C-B C-C C/A C/B C/C C*A C*B C*C
0 1 3 5 2 4 6 0 -2 -4 1 ... 10 4 2 0 5 2 1 5 15 25
1 2 4 6 4 6 8 0 -2 -4 1 ... 12 4 2 0 3 2 1 12 24 36
[2 rows x 39 columns]
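If you would rather avoid eval, one possible variant (a sketch, not the answerer's code) maps each operator symbol to a function from the standard `operator` module. The names `ops`, `new_columns`, and `result` are assumptions introduced here.

```python
import operator
from itertools import product

import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

# Symbol -> function lookup replaces the eval'd expression string
ops = {'+': operator.add, '-': operator.sub, '/': operator.truediv, '*': operator.mul}

new_columns = {
    f"{left}{symbol}{right}": ops[symbol](df[left], df[right])
    for left, symbol, right in product(df.columns, ops, df.columns)
}
result = pd.concat([df, pd.DataFrame(new_columns)], axis=1)
print(result.shape)
```

The column names and count match the eval version; only the way each expression is evaluated differs.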
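Answer
Loops are not the most efficient approach. Still, I think using them is justified in this case.
So, assuming you want to do the element-wise operations with nothing but iteration (loops) and no third-party libraries, we can do this efficiently (or as efficiently as possible) with generators. Here is how I would go about it: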
data = [1, 2, 3, 4, 5]
operations = {
    "+": lambda x,y: (value+y for value in x),
    "-": lambda x,y: (value-y for value in x),
    "*": lambda x,y: (value*y for value in x),
    "/": lambda x,y: (value/y for value in x),
}
calculations = (
    (key, val, oper(data, val))
    for key, oper in sorted(operations.items())
    for val in data
)
Now let's display the results:
for item in calculations:
    op, val, res = item
    preped_res = str.join(
        ' | ', ["{:^6.2g}".format(v) for v in res]
    )
    print(" {} {} {:.2g} = | {} |".format(data, op, val, preped_res))
This produces:
[1, 2, 3, 4, 5] * 1 = | 1 | 2 | 3 | 4 | 5 |
[1, 2, 3, 4, 5] * 2 = | 2 | 4 | 6 | 8 | 10 |
[1, 2, 3, 4, 5] * 3 = | 3 | 6 | 9 | 12 | 15 |
[1, 2, 3, 4, 5] * 4 = | 4 | 8 | 12 | 16 | 20 |
[1, 2, 3, 4, 5] * 5 = | 5 | 10 | 15 | 20 | 25 |
[1, 2, 3, 4, 5] + 1 = | 2 | 3 | 4 | 5 | 6 |
[1, 2, 3, 4, 5] + 2 = | 3 | 4 | 5 | 6 | 7 |
[1, 2, 3, 4, 5] + 3 = | 4 | 5 | 6 | 7 | 8 |
[1, 2, 3, 4, 5] + 4 = | 5 | 6 | 7 | 8 | 9 |
[1, 2, 3, 4, 5] + 5 = | 6 | 7 | 8 | 9 | 10 |
[1, 2, 3, 4, 5] - 1 = | 0 | 1 | 2 | 3 | 4 |
[1, 2, 3, 4, 5] - 2 = | -1 | 0 | 1 | 2 | 3 |
[1, 2, 3, 4, 5] - 3 = | -2 | -1 | 0 | 1 | 2 |
[1, 2, 3, 4, 5] - 4 = | -3 | -2 | -1 | 0 | 1 |
[1, 2, 3, 4, 5] - 5 = | -4 | -3 | -2 | -1 | 0 |
[1, 2, 3, 4, 5] / 1 = | 1 | 2 | 3 | 4 | 5 |
[1, 2, 3, 4, 5] / 2 = | 0.5 | 1 | 1.5 | 2 | 2.5 |
[1, 2, 3, 4, 5] / 3 = | 0.33 | 0.67 | 1 | 1.3 | 1.7 |
[1, 2, 3, 4, 5] / 4 = | 0.25 | 0.5 | 0.75 | 1 | 1.2 |
[1, 2, 3, 4, 5] / 5 = | 0.2 | 0.4 | 0.6 | 0.8 | 1 |
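The example above combines one list with each of its own elements. If what you need is column against column, as in the question, a standard-library-only sketch could look like this. The `columns` dict, the `combine` generator, and `derived` are hypothetical names used purely for illustration.

```python
import operator
from itertools import product

# Hypothetical column data, stored as plain lists keyed by name.
columns = {"A": [1, 2], "B": [3, 4], "C": [5, 6]}
ops = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def combine(columns, ops):
    """Lazily yield (name, values) for every ordered column pair and every operator."""
    for (left, xs), (symbol, func), (right, ys) in product(
        columns.items(), ops.items(), columns.items()
    ):
        yield f"{left}{symbol}{right}", [func(x, y) for x, y in zip(xs, ys)]

derived = dict(combine(columns, ops))
print(len(derived))
```

Unlike pandas or numpy, plain Python raises ZeroDivisionError if a divisor column contains a zero, so real data may need a guard around the division.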
I hope this answers your question. If I missed anything, feel free to let me know and we'll see what we can do.
Answer
If I understand your question correctly, reindex and itertools are your friends:
In [21]: import pandas as pd
In [22]: import numpy as np
In [23]: df = pd.DataFrame({'a':np.arange(5), 'b':np.arange(5), 'c':np.arange(5)
...: })
In [24]: df
Out[24]:
a b c
0 0 0 0
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
In [26]: operations = ['*', '/', '-', '+']
In [30]: new_columns = list(''.join([a,b,c]) for a,b,c in itertools.product(s1,s2,s1) if a!=c) # s1 and s2 are defined in cells not shown here (presumably the column names and the operators). Joins each (column, operator, column) triple and keeps only those where the two columns differ, i.e. it skips a*a, b-b, etc. Remove the final `if` to get all of them.
In [31]: new_columns
Out[31]:
['a*b',
'a*c',
'a/b',
'a/c',
'a-b',
'a-c',
'b*a',
'b*c',
'b/a',
'b/c',
'b-a',
'b-c',
'c*a',
'c*b',
'c/a',
'c/b',
'c-a',
'c-b']
In [33]: df.reindex(columns=[*df.columns, *new_columns], fill_value=np.nan) # rebuilds the df by unpacking the existing columns plus the new ones; the new, empty cells are filled with `NaN`.
Out[33]:
a b c a*b a*c a/b a/c a-b a-c b*a ... b/a b/c b-a b-c c*a \
0 0 0 0 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN
1 1 1 1 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN
2 2 2 2 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN
3 3 3 3 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN
4 4 4 4 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN
c*b c/a c/b c-a c-b
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
You can replace ''.join() with any function you like and get the same result.
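The reindex call only creates empty NaN columns. One way to fill them in, sketched under the assumption that the triples are built from the column names and an operator lookup (`ops`, `triples`, and `wide` are names introduced here, not taken from the session above):

```python
import itertools
import operator

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(5), 'b': np.arange(5), 'c': np.arange(5)})
ops = {'*': operator.mul, '/': operator.truediv, '-': operator.sub, '+': operator.add}

# Keep the (column, operator, column) triples so the names need no parsing later
triples = [
    (x, s, y)
    for x, s, y in itertools.product(df.columns, ops, df.columns)
    if x != y
]

# Create the placeholder columns, then overwrite each with its computed values
wide = df.reindex(columns=[*df.columns, *(''.join(t) for t in triples)])
for x, s, y in triples:
    wide[x + s + y] = ops[s](df[x], df[y])

print(wide.shape)
```

With this data the first row of every division column is NaN, because pandas evaluates 0/0 as NaN instead of raising.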
Using 'eval' is discouraged – bluesmonk