How do I convert a BeautifulSoup document into a dict?

Question:

I have two tables whose only attribute is class; the tr and td elements carry no other attributes. How do I convert the soup into a dict?

<table class='content'> 
    <caption> 
    <em> table1 </em> 
    </caption> 
    <tbody> 
    <tr> 
     <th> A </th> 
     <th> B </th> 
     <th> C </th> 
    </tr> 
    <tr> 
     <td> a1 <td> 
     <td> b1 <td> 
     <td> c1 <td> 
    </tr> 
    <tr> 
     <td> a2 <td> 
     <td> b2 <td> 
     <td> c2 <td> 
    </tr> 
    </tbody> 
</table> 

<table class='content'> 
    <caption> 
    <em> table2 </em> 
    </caption> 
    <tbody> 
    <tr> 
     <th> A </th> 
     <th> B </th> 
     <th> C </th> 
    </tr> 
    <tr> 
     <td> a3 <td> 
     <td> b3 <td> 
     <td> c3 <td> 
    </tr> 
    <tr> 
     <td> a4 <td> 
     <td> b4 <td> 
     <td> c4 <td> 
    </tr> 
    </tbody> 
</table>

I would like to end up with a dictionary like this:

{table1:[ {A:[a1,a2]}, {B:[b1,b2]}, {C:[c1,c2]} ], table2:[ {A:[a3,a4]}, {B:[b3,b4]}, {C:[c3,c4]} ], }

Can anyone help me get this dict, or something similar?

Answer

Try this (also note that you have <td>...<td> instead of <td>...</td>):

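import bs4

# Paste the HTML here, with each cell closed as <td>...</td>
your_html = """..."""
# Name the parser explicitly so bs4 does not have to guess one
soup = bs4.BeautifulSoup(your_html, "html.parser")
big_dict = {}

for table in soup.find_all("table"):
    # The caption text inside <em> becomes the top-level key
    key = table.find("em").get_text().strip()
    big_dict[key] = []
    headers = []
    for th in table.find_all("th"):
        headers.append(th.get_text().strip())
        # One {header: []} dict per column, in header order
        big_dict[key].append({headers[-1]: []})
    for row in table.find_all("tr"):
        # The header row has no <td>, so it contributes nothing here
        for i, cell in enumerate(row.find_all("td")):
            big_dict[key][i][headers[i]].append(cell.get_text().strip())

print(big_dict)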

The above gives me:

{'table1': [{'A': ['a1', 'a2']}, {'B': ['b1', 'b2']}, {'C': ['c1', 'c2']}], 'table2': [{'A': ['a3', 'a4']}, {'B': ['b3', 'b4']}, {'C': ['c3', 'c4']}]}
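The note about <td>...<td> matters because the parser sees every unclosed cell as a new opening tag. As a rough sketch only, and assuming each cell holds plain text with no nested tags (as in the question), the markup could be repaired with a regular expression before it is handed to BeautifulSoup. The variable names broken and fixed are made up for this illustration.

```python
import re

# A fragment in the same shape as the question's rows: each cell is
# "closed" with <td> instead of </td>.
broken = """<tr>
 <td> a1 <td>
 <td> b1 <td>
</tr>"""

# Turn "<td>text<td>" into "<td>text</td>". [^<]* only matches plain
# text, so cells that are already closed with </td> are left alone.
fixed = re.sub(r"<td>([^<]*)<td>", r"<td>\1</td>", broken)

print(fixed)
```

This is a quick repair for this particular typo, not a general HTML fixer; cells that contain other tags would need a different approach.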

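Thanks a lot. I'm a bit confused, though, about why you use "big_dict[key].append({headers[-1]: []})" rather than {headers[0]: []}. – Stella

That's because we always need to add the _last_ element. Using 'headers[0]' would always use the same first element. – 2013-07-13 06:10:45

Hmm... I just find it hard to understand why the value is added for the last dictionary. – Stella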
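To make the headers[-1] point concrete, here is a small stand-alone sketch. It does not parse any HTML; the list ["A", "B", "C"] simply stands in for the <th> texts, and the names columns, wrong, and seen are invented for the illustration.

```python
# Inside the loop, headers[-1] is always the header appended on this
# iteration, so each new dict gets its own key.
headers = []
columns = []
for name in ["A", "B", "C"]:
    headers.append(name)
    columns.append({headers[-1]: []})

print(columns)  # [{'A': []}, {'B': []}, {'C': []}]

# With index 0, every dict would be keyed by the first header instead.
seen = []
wrong = []
for name in ["A", "B", "C"]:
    seen.append(name)
    wrong.append({seen[0]: []})

print(wrong)  # [{'A': []}, {'A': []}, {'A': []}]
```

So nothing is being added "for the last dictionary"; on each pass the last header seen so far is the current one.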

Answer

What you're asking for is to find the table row data and map it to the table headers as keys, grouped under the caption of each table.

{
    table[0].caption: {
        th[n]: [
            col[n][0],
            col[n][1],
            col[n][2]
        ]
    }
}

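So you need to break the task down into:

Get the caption for the table
Get the table headers
Loop over each row of the table, saving each td by its index under the corresponding column of the table.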
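The three steps above can be sketched as follows. This is one possible reading of them, not the answerer's own code: the function name tables_to_dict and the sample markup are made up for the illustration, and the sample assumes the cells are properly closed with </td>. It builds the nested caption -> header -> column shape outlined in this answer, which differs from the list-of-dicts shape in the question.

```python
from bs4 import BeautifulSoup

# Sample markup with properly closed cells (an assumption; the HTML in
# the question would need its <td>...<td> typo fixed first).
html = """
<table class='content'>
  <caption><em>table1</em></caption>
  <tbody>
    <tr><th>A</th><th>B</th><th>C</th></tr>
    <tr><td>a1</td><td>b1</td><td>c1</td></tr>
    <tr><td>a2</td><td>b2</td><td>c2</td></tr>
  </tbody>
</table>
"""

def tables_to_dict(markup):
    """Map each table caption to a {header: [column values]} dict."""
    soup = BeautifulSoup(markup, "html.parser")
    result = {}
    for table in soup.find_all("table"):
        # Step 1: the caption text identifies the table
        caption = table.find("caption").get_text(strip=True)
        # Step 2: the <th> texts become the column keys
        headers = [th.get_text(strip=True) for th in table.find_all("th")]
        columns = {header: [] for header in headers}
        # Step 3: file each <td> under the header at the same index
        for row in table.find_all("tr"):
            for i, td in enumerate(row.find_all("td")):
                columns[headers[i]].append(td.get_text(strip=True))
        result[caption] = columns
    return result

print(tables_to_dict(html))
```

A plain dict per table keeps lookups simple (result["table1"]["A"]); if the exact list-of-single-key-dicts layout from the question is required, the columns dict can be converted afterwards.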

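Without writing the code for you, I can point you toward the documentation on searching within an HTML document.

In the future, though, please ask a more specific question and we can give you a more direct answer.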

@TankorSmash, thanks for the logic. – Stella
