Handling a large number of inputs in a for loop

Problem description:

I have been trying to improve my code (with numba and multiprocessing), but I can't quite figure it out, because my function takes a lot of arguments.

I have already simplified it by moving parts into other functions (see below)...

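Since each agent (an instance of a class) is independent of the others for these actions, I would like to replace the for loop with a Pool.

So I would end up with one big function, pooling(), that I would call and pass the agents to: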

from multiprocessing import Pool 

p = Pool(4) 
p.map(pooling, list(agents)) 
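For reference, a minimal runnable version of this pattern, assuming a hypothetical pooling() that takes exactly one agent (a plain dict stands in for the agent class here): Pool.map calls the function once per item of the iterable and collects the return values in input order. The if __name__ == "__main__": guard is needed on platforms that start workers with the spawn method (Windows, and macOS by default since Python 3.8).

```python
from multiprocessing import Pool


def pooling(agent):
    # Hypothetical per-agent work; each call receives exactly ONE item of the iterable.
    return agent["age"] + 1


if __name__ == "__main__":
    agents = [{"age": 30}, {"age": 41}, {"age": 7}]
    with Pool(4) as p:
        # map() distributes the items across the worker processes and
        # returns the results in the same order as the input.
        new_ages = p.map(pooling, list(agents))
    print(new_ages)  # [31, 42, 8]
```

Everything else pooling() needs has to reach it through that single argument, which is exactly the problem raised next.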

But where do I add the list of all the arguments the pooling function needs?

As it currently stands:

def check_demographics(month, my_agents, families, firms, year, mortality_men, mortality_women, fertility, state_id):

    dummy = list(my_agents)
    d = str(state_id.iloc[0])

    # Place where I would like to replace the LOOP. All below would be a function
    for agent in dummy:

        if agent.get_region_id()[:2] == d:

            # Birthday
            if month % 12 == agent.month - 1:
                agent.update_age()

            # Mortality probability
            if agent.get_gender() == 'Male':
                prob = mortality_men[mortality_men['age'] == agent.get_age()][year].iloc[0]

            # When gender is Female
            else:
                # Extract specific agent data to calculate mortality 'Female'
                prob = mortality_women[mortality_women['age'] == agent.get_age()][year].iloc[0]

                # Give birth decision
                age = agent.get_age()
                if 14 < age < 50:
                    pregnant(agent, fertility, year, families, my_agents)

            # Mortality procedures
            if fixed_seed.random() < prob:
                mortal(my_agents, my_graveyard, families, agent, firms)

This is the function that consumes most of my program's time, and @jit didn't help much.

Thanks a bunch


Note: the global variable or parameter 'my_graveyard' is missing.


Indeed, thanks.

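Yes, that's a lot of arguments! Consider using a class.

Well, since Pool.map calls the function with a single argument (one item of the iterable at a time), you need to bundle everything together. I suggest using the "Facade" pattern: an intermediate class that stores all the needed arguments and has a method (I call it check) that takes no arguments other than self.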

class Facade(object):
    def __init__(self, agent, d, families, fertility, firms, month,
                 mortality_men, mortality_women, my_agents, my_graveyard, year):
        self.agent = agent
        self.d = d
        self.families = families
        self.fertility = fertility
        self.firms = firms
        self.month = month
        self.mortality_men = mortality_men
        self.mortality_women = mortality_women
        self.my_agents = my_agents
        self.my_graveyard = my_graveyard
        self.year = year

    def check(self):
        (agent, d, families, fertility, firms,
         month, mortality_men, mortality_women,
         my_agents, my_graveyard, year) = (
            self.agent, self.d, self.families, self.fertility, self.firms,
            self.month, self.mortality_men, self.mortality_women,
            self.my_agents, self.my_graveyard, self.year)
        if agent.get_region_id()[:2] == d:

            # Birthday
            if month % 12 == agent.month - 1:
                agent.update_age()

            # Mortality probability
            if agent.get_gender() == 'Male':
                prob = mortality_men[mortality_men['age'] == agent.get_age()][year].iloc[0]

            # When gender is Female
            else:
                # Extract specific agent data to calculate mortality 'Female'
                prob = mortality_women[mortality_women['age'] == agent.get_age()][year].iloc[0]

                # Give birth decision
                age = agent.get_age()
                if 14 < age < 50:
                    pregnant(agent, fertility, year, families, my_agents)

            # Mortality procedures
            if fixed_seed.random() < prob:
                mortal(my_agents, my_graveyard, families, agent, firms)

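Note: my refactoring is really ugly, but I wanted to keep the variable names unchanged for clarity.

Then your loop could look something like this: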
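If the hand-written __init__ feels too verbose, the standard-library dataclasses module (Python 3.7+) can generate it. A minimal sketch, assuming a reduced, hypothetical set of fields and toy logic in check() (the real pregnant()/mortal() calls are left out); an instance of a module-level dataclass can still be handed to pool.map(Facade.check, facades).

```python
import random
from dataclasses import dataclass


@dataclass
class Facade:
    # Hypothetical, reduced field set; the real class would list all the
    # arguments of check_demographics here instead.
    agent_age: int
    agent_region: str
    d: str
    mortality: dict  # hypothetical lookup: age -> probability of dying
    seed: int

    def check(self):
        # Agents outside the selected state are skipped.
        if self.agent_region[:2] != self.d:
            return None
        prob = self.mortality.get(self.agent_age, 0.0)
        # A per-task RNG gives the same draw no matter which worker runs it.
        return random.Random(self.seed).random() < prob  # True: the agent dies
```

The generated __init__ takes the fields in declaration order, so Facade(80, "3512", "35", {80: 1.0}, 0) builds one task.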

def check_demographics(month, my_agents, families, firms,
                       year, mortality_men, mortality_women,
                       fertility, state_id, my_graveyard):
    d = str(state_id.iloc[0])
    facades = [Facade(agent, d, families, fertility, firms,
                      month, mortality_men, mortality_women,
                      my_agents, my_graveyard, year)
               for agent in my_agents]
    # The with-block shuts down the worker processes once map() has returned
    with Pool(4) as pool:
        pool.map(Facade.check, facades)
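Two standard-library alternatives avoid building one Facade per agent. A sketch with a hypothetical, shortened check_agent function: functools.partial fixes the shared arguments once and leaves only the agent free, and Pool.starmap (Python 3.3+) unpacks each tuple into positional arguments. A partial object is picklable as long as the wrapped function and the bound arguments are.

```python
from functools import partial
from multiprocessing import Pool


def check_agent(d, threshold, agent):
    # Hypothetical, shortened stand-in for the per-agent logic:
    # shared arguments first, the varying agent last.
    age, region = agent
    return region[:2] == d and age > threshold


if __name__ == "__main__":
    d, threshold = "35", 60
    agents = [(70, "3501"), (50, "3502"), (80, "4101")]

    with Pool(4) as pool:
        # 1) partial fixes d and threshold; map supplies one agent per call.
        via_partial = pool.map(partial(check_agent, d, threshold), agents)
        # 2) starmap unpacks each tuple into check_agent(d, threshold, agent).
        via_starmap = pool.starmap(check_agent, [(d, threshold, a) for a in agents])

    print(via_partial, via_starmap)  # [True, False, False] [True, False, False]
```

partial keeps the task list short, while starmap makes every argument explicit in each tuple; both send the shared arguments to the workers by pickling them.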

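You say each agent is independent of the others, but after analyzing the loop I see that you need the full list of agents (the my_agents argument); the Facade class makes this obvious. Therefore your list of agents must not change, and each agent's internal state must be frozen during the loop.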
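There is a further consequence: Pool.map pickles each task and sends it to a worker process, so update_age(), pregnant() or mortal() called inside a worker only change that worker's copy, and the parent's my_agents list is not updated. A common workaround, sketched with a hypothetical toy Agent class and hypothetical step()/apply_results() functions, is to let each worker compute and return its result and to apply the changes (ageing, removing the dead) in the parent after map() returns.

```python
import random
from multiprocessing import Pool


class Agent:
    # Hypothetical minimal agent: just an id, an age and a birth month.
    def __init__(self, agent_id, age, month):
        self.agent_id = agent_id
        self.age = age
        self.month = month


def step(args):
    # Runs in a worker on a COPY of the agent; it only computes and returns.
    agent, month, mortality, seed = args
    new_age = agent.age + 1 if month % 12 == agent.month - 1 else agent.age
    dies = random.Random(seed).random() < mortality.get(new_age, 0.0)
    return agent.agent_id, new_age, dies


def apply_results(my_agents, results):
    # Runs in the parent: apply ageing and drop dead agents in one place.
    by_id = {a.agent_id: a for a in my_agents}
    survivors = []
    for agent_id, new_age, dies in results:
        agent = by_id[agent_id]
        agent.age = new_age
        if not dies:
            survivors.append(agent)
    return survivors


if __name__ == "__main__":
    my_agents = [Agent(1, 30, 1), Agent(2, 99, 1)]
    mortality = {100: 1.0}  # hypothetical: everyone turning 100 dies
    tasks = [(a, 0, mortality, i) for i, a in enumerate(my_agents)]
    with Pool(2) as pool:
        results = pool.map(step, tasks)
    my_agents = apply_results(my_agents, results)
    print([(a.agent_id, a.age) for a in my_agents])  # [(1, 31)]
```

Because only the parent modifies my_agents, the list stays unchanged while the workers run, which is the "frozen state" requirement above.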


Very nice, thanks. But you are right: 'my_agents' does change. That's why I created a new 'list(agents)' and iterate over that instead. Would that work in this case?


At least apply 'map' to a copy of the agents: 'list(agents)'. Why does the list change?
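Why the copy matters even in the sequential version: removing items from a list while a for loop walks that same list shifts the remaining items, so some are skipped. A small sketch with integers standing in for agents and a hypothetical is_dead predicate:

```python
def remove_dead_buggy(my_agents, is_dead):
    # Iterates over the same list it removes from: the item right after
    # each removed one is skipped.
    for agent in my_agents:
        if is_dead(agent):
            my_agents.remove(agent)
    return my_agents


def remove_dead(my_agents, is_dead):
    # Iterates over a snapshot (list(...)), so removals from my_agents
    # do not affect which items the loop visits.
    for agent in list(my_agents):
        if is_dead(agent):
            my_agents.remove(agent)
    return my_agents


print(remove_dead_buggy([1, 2, 3, 4], lambda a: a > 1))  # [1, 3]
print(remove_dead([1, 2, 3, 4], lambda a: a > 1))        # [1]
```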


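I did. I also ran into this error: '_pickle.PicklingError: Can't pickle : attribute lookup check_demographics failed'. The list changes because I have to remove an agent when he/she dies. In your example, I don't have an agent until I start the 'for' loop.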
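One common cause of an "attribute lookup ... failed" PicklingError: pickle serializes functions and classes by reference (module plus qualified name), so it fails for objects that cannot be found under that name at the top level of their module, such as functions defined inside other functions, or lambdas. Whether that is the cause here cannot be told from the truncated message; a quick check with hypothetical names of what can be pickled (run as a script):

```python
import pickle


def module_level(x):
    # Defined at the top level of the module: pickled by reference to its name.
    return x + 1


def make_nested():
    def nested(x):
        # Defined inside another function: not reachable as a module attribute.
        return x + 1
    return nested


def can_pickle(obj):
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print(can_pickle(module_level))      # True
print(can_pickle(make_nested()))     # False
print(can_pickle(lambda x: x + 1))   # False
```

So the Facade class and any function passed to pool.map should be defined at module level, not inside check_demographics.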