Parsing a valid JSON object or array from a string

Question:

I have a string that can take one of two forms:

name multi word description {...}

or

name multi word description [...]

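where {...} and [...] are any valid JSON. I'm interested in parsing out the JSON part of the string, but I'm not sure of the best way to do it (especially since I don't know which of the two forms the string will take). This is my current approach: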

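import json

string = 'bob1: The ceo of the company {"salary": 100000}'
o_ind = string.find('{')  # index of the first '{', or -1 if absent
a_ind = string.find('[')  # index of the first '[', or -1 if absent

if o_ind == -1 and a_ind == -1:
    print("Could not find JSON")
    exit(0)

# Start at whichever bracket comes first; if one is missing (-1), use the other.
index = min(o_ind, a_ind)
if index == -1:
    index = max(o_ind, a_ind)

# Parse everything from the first bracket onward; the result is stored in
# "data" so that the json module itself is not shadowed.
data = json.loads(string[index:])
print(data)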

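It works, but I can't help feeling it could be done better. I thought about using a regex, but I had trouble with it matching nested objects and arrays rather than the outermost JSON object or array. Any suggestions?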

I think this is simple and readable as it is, compared with a complex RegEx. – thefourtheye

You are importing json. Just use .parse() – Law

Answer

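You can find the start of the JSON by checking for the presence of { or [, and then capture everything up to the end of the string in a capturing group: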

>>> import re 
>>> string1 = 'bob1: The ceo of the company {"salary": 100000}' 
>>> string2 = 'bob1: The ceo of the company ["10001", "10002"]' 
>>> 
>>> re.search(r"\s([{\[].*?[}\]])$", string1).group(1) 
'{"salary": 100000}' 
>>> re.search(r"\s([{\[].*?[}\]])$", string2).group(1) 
'["10001", "10002"]'

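Here, \s([{\[].*?[}\]])$ breaks down as follows:

\s - a single whitespace character
the parentheses form a capturing group
[{\[] matches a single { or [ (the latter needs to be escaped with a backslash)
.*? is a non-greedy match of any character, any number of times
[}\]] matches a single } or ] (the latter needs to be escaped with a backslash)
$ means the end of the string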
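
The pattern broken down above could be wrapped in a small helper that also decodes the captured text. This is only a sketch: the names JSON_TAIL and parse_json_tail are made up for illustration, and it assumes the JSON is preceded by a whitespace character and runs to the end of the string.

```python
import json
import re

# Hypothetical name: the same pattern as above, compiled once for reuse.
JSON_TAIL = re.compile(r"\s([{\[].*?[}\]])$")


def parse_json_tail(s):
    """Return the decoded JSON found at the end of s, or None if no match."""
    match = JSON_TAIL.search(s)
    if match is None:
        return None
    # group(1) is the text from the opening bracket to the end of the string.
    return json.loads(match.group(1))


print(parse_json_tail('bob1: The ceo of the company {"salary": 100000}'))
print(parse_json_tail('bob1: The ceo of the company ["10001", "10002"]'))
print(parse_json_tail('bob1'))
```

Note that . does not match a newline by default, so JSON spread over several lines would need the re.DOTALL flag. A description that itself contains " [" or " {" would make the captured text start too early, in which case json.loads raises a ValueError.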

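Alternatively, you can use re.split() to split the string on a whitespace character that is followed by a { or [ (using a positive lookahead) and take the last item. It works for the sample input you provided, but I'm not sure whether it is reliable in general: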

>>> re.split(r"\s(?=[{\[])", string1)[-1] 
'{"salary": 100000}' 
>>> re.split(r"\s(?=[{\[])", string2)[-1] 
'["10001", "10002"]'
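
One case where taking the last item seems to go wrong is JSON that contains a nested object or array after a space, because the string is then split inside the JSON as well. The sketch below illustrates this and tries a variation that splits only once; split_json_tail and the sample string are hypothetical, and the variation still assumes the description contains no " {" or " [" of its own.

```python
import json
import re

nested = 'bob1: The ceo of the company {"pay": {"salary": 100000}}'

# Splitting at every match also cuts inside the JSON, so the last item is
# only the tail of the inner object.
print(re.split(r"\s(?=[{\[])", nested)[-1])


def split_json_tail(s):
    """Split once at the first whitespace followed by { or [, then decode."""
    parts = re.split(r"\s(?=[{\[])", s, maxsplit=1)
    if len(parts) < 2:
        return None  # no whitespace followed by an opening bracket
    try:
        return json.loads(parts[1])
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


print(split_json_tail(nested))
```

Splitting once keeps everything from the first opening bracket onward, which mirrors the find()-based approach in the question.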

Answer

You can use a simple | in the regex to match either of the two kinds of substring:

import re 
import json 

def json_from_s(s): 
    match = re.findall(r"{.+[:,].+}|\[.+[,:].+\]", s) 
    return json.loads(match[0]) if match else None

And some tests:

print(json_from_s('bob1: The ceo of the company {"salary": 100000}'))
print(json_from_s('bob1: The ceo of the company ["salary", 100000]'))
print(json_from_s('bob1'))
print(json_from_s('{1:}'))
print(json_from_s('[,1]'))

Output:

{'salary': 100000}
['salary', 100000]
None
None
None

Consider this case: 'bob1: The ceo of the company [{"salary": 100000}]'. The regex only matches the inner JSON object and not the outer JSON array – RPGillespie

I only focused on the OP's question and explanation – tinySandy

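I am the OP, and the explanation I gave is that the string can be 'name multi word description [...]'. The case I gave you above follows that pattern, but the regex does not capture it. – RPGillespie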
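
For nested cases like the one discussed in these comments, one option that avoids bracket matching with a regex altogether is to let the json module find the end of the value via json.JSONDecoder.raw_decode. This is a sketch of a different technique from the answers above; the function name first_json_value is hypothetical.

```python
import json


def first_json_value(s):
    """Return the first JSON object or array that decodes within s, else None."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(s):
        if ch in "{[":
            try:
                # raw_decode parses one JSON value starting at index i and
                # returns (value, end_index); text after the value is ignored.
                value, _end = decoder.raw_decode(s, i)
            except ValueError:
                continue  # this bracket did not start valid JSON; keep looking
            return value
    return None


print(first_json_value('bob1: The ceo of the company [{"salary": 100000}]'))
print(first_json_value('bob1 [draft] ceo {"salary": 100000}'))
print(first_json_value('bob1'))
```

Because decoding starts at the first bracket that yields valid JSON, a description containing something like [1] would be returned in place of the trailing value, so this still relies on the description not containing valid JSON of its own.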
