Django database to PostgreSQL

Question:

There are about 5,000 companies, each with about 4,500 prices, for a total of roughly 22 million prices.

Some time ago, I wrote a model that stored this data in a format like this:

class Endday(models.Model): 
    company = models.TextField(null=True) 
    eop = models.CommaSeparatedIntegerField(blank=True, null=True, max_length=50000)

and the code that stored it was:

for i in range(1, len(contents)): 
    csline = contents[i].split(",") 
    prices = csline[1:len(csline)] 
    company = csline[0] 
    entry = Endday(company=company, eop=prices) 
    entry.save()

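Although the code was (obviously) very slow, it did work and stored the data in the database. One day I decided to delete everything in Endday and store it again. This time it did not work and gave me the error Database locked.

Anyway, I did some research and learned that MySql cannot handle this much data. So how did it get stored in the first place? My conclusion was that all those prices were being written at the very beginning of the database, so that much data was never really being stored.

After some more research, I learned that I should use PostgreSql, so I changed the database, ran the migrations, and tried the code again, but with no luck. I got an error saying: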

psycopg2.DataError: value too long for type character varying(50000)

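OK, so I thought I would try bulk_create and modified the code a bit, but I was greeted with the same error.

Next, I thought I might create two models, one holding the company names and the other holding the prices along with a key to the particular company. So, once again, I changed the code: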

class EnddayCompanies(models.Model): 
    company = models.TextField(max_length=500) 

class Endday(models.Model): 
    foundation = models.ForeignKey(EnddayCompanies, null=True) 
    eop = models.FloatField(null=True)

and the script:

to_be_saved = [] 
for i in range(1, len(contents)): 
    csline = contents[i].split(",") 
    prices = csline[1:len(csline)] 
    company = csline[0] 
    companies.append(csline[0]) 
    prices =[float(x) for x in prices] 
    before_save = [] 
    for j in range(len(prices)): 
        before_save.append(Endday(company=company, eop=prices[j])) 
    to_be_saved.append(before_save) 
Endday.objects.bulk_create(to_be_saved)

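To my surprise, however, this was so slow that partway through it simply stopped at one company. I tried to find which particular piece of code was slowing it down, and it was this: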

before_save = [] 
for j in range(len(prices)): 
    before_save.append(Endday(company=company, eop=prices[j])) 
to_be_saved.append(before_save)

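So now I am back to square one and cannot think of anything else, which is why I am ringing the SO bell. My questions now are:

How should I go about this?
Why did saving work with MySql?
Is there a better way to do this? (Surely there must be.)
If there is, what is it?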

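The data error you are getting appears to be because eop exceeds 50k characters, not because of a problem with the storage itself. The locked database may be related to how you deleted the contents (i.e., if you did it from a separate process). – Sayse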
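
To make the size limit in the comment above concrete, here is a rough sketch that measures how long a comma-separated price string becomes. The helper name `joined_length` and the sample prices are made up for illustration; the real length depends on how the prices are formatted in the input file.

```python
MAX_LENGTH = 50000  # max_length declared on the eop field in the question


def joined_length(prices):
    """Return the number of characters in the comma-joined price string."""
    return len(",".join(prices))


# Hypothetical data: 4,500 prices per company, as described in the question.
short_prices = ["12345.6789"] * 4500   # 10 characters each
long_prices = ["12345.67891"] * 4500   # 11 characters each

print(joined_length(short_prices) <= MAX_LENGTH)  # this string fits
print(joined_length(long_prices) <= MAX_LENGTH)   # this string does not fit
```

With 4,500 values per row, one extra character per price is enough to push the string past the limit, which would be consistent with the error appearing only for some companies.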

So changing it to 'TextField' would work, but it is slow and inefficient – ThatBird

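Although you have asked a very detailed question, it still is not a complete MCVE. For example, what does your input data look like? In another sense, it is too broad. There are 3 or 4 questions in there. Why not break it up into separate questions? – e4c5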

Answer

I think you can create separate Company and Price models like this:

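from django.db import models

class Company(models.Model): 
    name = models.CharField(max_length=20) 

class Price(models.Model): 
    # on_delete is required from Django 2.0 onward
    company = models.ForeignKey(Company, related_name='prices', on_delete=models.CASCADE) 
    price = models.FloatField()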

Here is how you can save the data:

# Assuming that contents is a list of strings with a format like this: 
# contents = [ 
#     'Company 1, 1, 2, 3, 4...', 
#     'Company 2, 1, 2, 3, 4...', 
#     .... 
# ] 

for content in contents: 
    tokens = content.split(',') 
    company = Company.objects.create(name=tokens[0]) 
    # One bulk insert per company instead of one save() per price 
    Price.objects.bulk_create( 
        Price(company=company, price=float(x.strip())) 
        for x in tokens[1:] 
    ) 
    # Then you can call prices now from company 
    company.prices.order_by('price')
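
The loop above assumes that every token after the first parses as a float. A slightly more defensive sketch of the parsing step, together with a helper that splits rows into fixed-size batches, could look like the following. `parse_line` and `chunked` are hypothetical helpers, not part of Django; the batches they produce could then be passed to `bulk_create`, which also accepts a `batch_size` argument of its own.

```python
from itertools import islice


def parse_line(line):
    """Split 'name, p1, p2, ...' into (name, [float, ...]).

    Empty tokens (for example from a trailing comma) are skipped.
    """
    tokens = [t.strip() for t in line.split(",")]
    name = tokens[0]
    prices = [float(t) for t in tokens[1:] if t]
    return name, prices


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Made-up input line in the format described in the answer.
name, prices = parse_line("Company 1, 1, 2.5, 3,")
print(name, prices)
for batch in chunked(prices, 2):
    print(batch)
```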

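Update: I just noticed that this is similar to your second implementation; the only difference is the way the data is saved. My implementation has fewer iterations.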
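
To show the two-table layout and batched inserts end to end without a Django project, here is a self-contained sketch that uses Python's built-in sqlite3 module as a stand-in for the ORM and PostgreSQL. The table and column names mirror the models above but are otherwise assumptions, the sample data is made up, and `executemany` plays roughly the role that `bulk_create` plays in Django.

```python
import sqlite3

# Made-up sample input in the 'name, price, price, ...' format.
contents = [
    "Company 1, 1, 2, 3, 4",
    "Company 2, 5, 6, 7, 8",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE price ("
    "id INTEGER PRIMARY KEY, "
    "company_id INTEGER REFERENCES company(id), "
    "price REAL)"
)

with conn:  # commit once for the whole load
    for line in contents:
        tokens = [t.strip() for t in line.split(",")]
        cur = conn.execute(
            "INSERT INTO company (name) VALUES (?)", (tokens[0],)
        )
        company_id = cur.lastrowid
        # One batched insert per company instead of one statement per price.
        conn.executemany(
            "INSERT INTO price (company_id, price) VALUES (?, ?)",
            [(company_id, float(t)) for t in tokens[1:]],
        )

total = conn.execute("SELECT COUNT(*) FROM price").fetchone()[0]
print(total)
```

The point of the sketch is the shape of the work: one row per company, one row per price, and one batched insert per company rather than one statement per price.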

I'll try this and let you know – ThatBird

@ThatBird I'd be glad to know the result. :) –
