Parsing JSON in SQL Server 2017 (Clearbit API call)

Problem description:

I'm pulling some data into a database on my local server through API calls to the Clearbit provider. Everything about parsing the data with SQL Server 2017 was going fine until I got stuck.

I'll go straight to an example to make this easier to follow.

Here is an example of the JSON that an API call returns:

{ 
    "id": "384dfe0d-5bba-445e-a390-2d946dc84a12", 
    "name": "Honeywell", 
    "legalName": "Honeywell International Inc", 
    "domain": "honeywell.com", 
    "domainAliases": [ 
    "honeywell.at", 
    "honeywell.it", 
    "evohome.info", 
    "wifithermostat.com", 
    "emsaviation.com", 
    "mytotalconnect.com", 
    "honeywell.nl", 
    "honeywell.co.za", 
    "honeywell.com.au", 
    "honeywell.ca", 
    "alliedsignal.com", 
    "emsdss.com", 
    "primusepic.com", 
    "alarmnet-me.com", 
    "lebow.com", 
    "honeywell.ie", 
    "honeywell.jp", 
    "honeywell.com.br", 
    "trendcontrol.co.uk", 
    "honeywellforjaguar.co.uk", 
    "aviaso.com", 
    "skyforce.co.uk", 
    "newenglandinstruments.com", 
    "honeywell.fi", 
    "alarmnet.com", 
    "skyconnect.com", 
    "skyforceuk.com", 
    "securitex.com", 
    "missionready.com", 
    "honeywellaerospace.com", 
    "formation.com", 
    "aclon.com", 
    "electrocorp.com", 
    "ultrak.com", 
    "satcom1.com", 
    "hsmpats.com", 
    "myaerospace.com", 
    "emsglobaltracking.com", 
    "fascocontrols.com", 
    "honeywellnow.com", 
    "bendixbrakes.com", 
    "elmwoodsensors.com", 
    "ovationselect.com", 
    "honeywellbusinessaviation.com", 
    "iflyaspire.com", 
    "btrinc.com", 
    "honeywellspecialtymaterials.com", 
    "magneticsensors.com", 
    "activeye.com", 
    "egarrett.com", 
    "novar-eds.com", 
    "aviaso.co.uk", 
    "chadwick-helmuth.com", 
    "datainstruments.com", 
    "lebowproducts.com", 
    "honeywell-produktkatalog.de", 
    "honeywellforjaguar.com", 
    "hobbs-corp.com", 
    "emsgt.com", 
    "honeywellaes.com", 
    "honeywellbuildingsolutions.com", 
    "satcom1.aero", 
    "honeywell-building-solutions.de", 
    "lifesafetydistribution.com", 
    "godirect.com", 
    "garrettbulletin.com", 
    "yourhomeexpert.com", 
    "aerospacetrading.com", 
    "sensorsystems.com", 
    "wifithermostat.info", 
    "honeywell-fachseminare.de", 
    "hobbscorporation.com", 
    "kcl.hu", 
    "honeywell.sk", 
    "esser.info", 
    "inertialsensor.com", 
    "sensotec.com", 
    "notifier.com", 
    "honeywellgreer.com", 
    "smartact.de", 
    "honeywellfire.com", 
    "iris-systems.com", 
    "honeywell.ru", 
    "lxei.com", 
    "thermalswitch.com", 
    "hightempsolutions.com", 
    "aubetech.com", 
    "honeywell-haustechnik.de", 
    "careersathoneywell.com", 
    "garrettbyhoneywell.com", 
    "honeywell.in", 
    "honeywell.cn", 
    "honeywell.com.mx", 
    "kcp.com", 
    "satamatics.com", 
    "myflite.com" 
    ], 
    "site": { 
    "title": "Honeywell", 
    "h1": null, 
    "metaDescription": " We are blending products with software solutions to link people and businesses to the information they need to be more efficient, safer and connected. ", 
    "metaAuthor": null, 
    "phoneNumbers": [ 
     "+1 877-271-8620", 
     "+1 800-633-3991", 
     "+1 877-841-2840", 
     "+1 480-353-3020", 
     "+1 973-455-3388", 
     "+1 973-204-9621", 
     "+32 2 728 20 45", 
     "+32 476 20 90 19", 
     "+44 7794 007289", 
     "+86 21 2219 6509" 
    ], 
    "emailAddresses": [ 
     "[email protected]", 
     "[email protected]", 
     "[email protected]", 
     "[email protected]", 
     "[email protected]", 
     "[email protected]", 
     "[email protected]", 
     "[email protected]", 
     "[email protected]", 
     "[email protected]", 
     "[email protected]" 
    ] 
    }, 
    "category": { 
    "sector": "Consumer Discretionary", 
    "industryGroup": "Automobiles & Components", 
    "industry": "Automotive", 
    "subIndustry": "Automotive", 
    "sicCode": "3714", 
    "naicsCode": null 
    }, 
    "tags": [ 
    "Automotive", 
    "Enterprise", 
    "B2B", 
    "Electrical" 
    ], 
    "description": " We are blending products with software solutions to link people and businesses to the information they need to be more efficient, safer and connected. ", 
    "foundedYear": 1936, 
    "location": "115 Tabor Rd, Morris Plains, NJ 07950, USA", 
    "timeZone": "America/New_York", 
    "utcOffset": -4, 
    "geo": { 
    "streetNumber": "115", 
    "streetName": "Tabor Road", 
    "subPremise": null, 
    "city": "Morris Plains", 
    "postalCode": "07950", 
    "state": "New Jersey", 
    "stateCode": "NJ", 
    "country": "United States", 
    "countryCode": "US", 
    "lat": 40.8358456, 
    "lng": -74.4771042 
    }, 
    "logo": "https://logo.clearbit.com/honeywell.com", 
    "facebook": { 
    "handle": "293855263965203", 
    "likes": null 
    }, 
    "linkedin": { 
    "handle": "company/honeywell" 
    }, 
    "twitter": { 
    "handle": "HoneywellNow", 
    "id": "257492733", 
    "bio": "Please visit us over at @Honeywell.", 
    "followers": 2322, 
    "following": 1, 
    "location": "Morris Plains, NJ", 
    "site": "https:", 
    "avatar": 
    }, 
    "crunchbase": { 
    "handle": "organization/honeywell" 
    }, 
    "emailProvider": false, 
    "type": "public", 
    "ticker": "HON", 
    "phone": "+1 973-455-2000", 
    "metrics": { 
    "alexaUsRank": 6045, 
    "alexaGlobalRank": 18053, 
    "googleRank": null, 
    "employees": 51779, 
    "employeesRange": "1000+", 
    "marketCap": 102920000000, 
    "raised": null, 
    "annualRevenue": 39302000000, 
    "fiscalYearEnd": 12 
    }, 
    "indexedAt": "2017-07-11T23:00:41.115Z", 
    "tech": [ 
    "crazy_egg", 
    "google_analytics", 
    "google_tag_manager", 
    "asp_net", 
    "mouseflow", 
    "marketo", 
    "go_squared", 
    "microsoft_exchange_online", 
    "outlook", 
    "recaptcha" 
    ], 
    "parent": { 
    "domain": null 
    }, 
    "similarDomains": [ 
    "abb-livingspace.com", 
    "alerton.com", 
    "gereports.com", 
    "honeywellprocess.com", 
    "honeywelluk.com", 
    "johnsoncontrols.com", 
    "jpinstruments.com", 
    "lenel.com", 
    "maxitrol.com", 
    "nucalgon.com", 
    "schneider-electric.us", 
    "siemens.com" 
    ] 
} 

If you look at the example, you'll see "domainAliases": [...]. That is the part of the JSON I still need to parse.

This is the SQL parsing query I already have:

SELECT * 
    , JSON_VALUE(JSONData,'$.name') AS CompanyName 
    , JSON_VALUE(JSONData,'$.category.sector') AS CategorySector 
    , JSON_VALUE(JSONData, '$.category.industryGroup') AS CategoryIndustryGroup 
    , JSON_VALUE(JSONData, '$.category.industry') AS CategoryIndustry 
    , JSON_VALUE(JSONData, '$.category.subIndustry') AS CategorySubIndustry 
    , JSON_VALUE(JSONData, '$.category.sicCode') AS CategorySicCode 
    , JSON_VALUE(JSONData, '$.category.naicsCode') AS CategoryNaicsCode 
    , JSON_VALUE(JSONData, '$.metrics.employees') AS EmployeesNumber 
    , JSON_VALUE(JSONData, '$.metrics.employeesRange') AS EmployeesRange 
    , JSON_VALUE(JSONData, '$.metrics.marketCap') AS MarketCap 
    , JSON_VALUE(JSONData, '$.metrics.annualRevenue') AS AnnualRevenue 
    , JSON_QUERY(JSONData, '$.similarDomains') AS SimilarDomains -- JSON_QUERY, because JSON_VALUE returns NULL for an array in lax mode
FROM Domains; 

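I want this data ("domainAliases") stored in a different table from the data queried above. (I know the parsing query shown is only a SELECT, but I also have an UPDATE version of it.)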

Below is an example image of how the finished result should look in a new table in the same database. The left column is called Company Name, the 2nd column is called Domain Aliases.

Now, where is the JSON data stored? I keep it in a column named JSONData, in a table named Domains, inside a database named Domainbank. The data type of JSONData is nvarchar(max).

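I need the data grouped by company name, with the alias domains next to the company name, as in the image example. Keep in mind that I will run this query against 10k+ JSONData values, so the new table will be huge, but that is fine as long as everything is grouped by company name and all the alias domains are there. Some JSONData values did not come back from the API call in the correct format, either because no data was found or because some other error occurred. So if the query finds nothing under "domainAliases": [...], or does not find "domainAliases": [...] at all, I don't need that company to appear in the new table.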

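So, a short recap: create a new table (let's call it AliasDomains), find the data under "domainAliases": [...], also pull the company name via JSON_VALUE(JSONData,'$.name') AS CompanyName, store the data in the new table as in the image example, and then group by CompanyName.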

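So, from your post it isn't entirely clear to me what your question is, but I assume it is how to write some SQL statements to accomplish the above?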

First, I'd say you shouldn't do the GROUP BY when inserting; do it when you retrieve the data from the table.
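As a minimal sketch of grouping at retrieval time rather than at insert time, the following reads one row per alias and collapses them per company with STRING_AGG, which is available from SQL Server 2017. The table variable @AliasDomains and its sample rows are hypothetical stand-ins for the AliasDomains table described in the question; if you only need the rows for a company to sit next to each other, a plain ORDER BY CompanyName would also do.

```sql
-- Hypothetical stand-in for the AliasDomains table, so the sketch runs on its own.
DECLARE @AliasDomains TABLE (
    CompanyName   nvarchar(255),
    DomainAliases nvarchar(255)
);

INSERT INTO @AliasDomains (CompanyName, DomainAliases)
VALUES (N'Honeywell', N'honeywell.at'),
       (N'Honeywell', N'honeywell.it'),
       (N'Honeywell', N'evohome.info');

-- Group when reading: one row per company, aliases joined into a single string.
-- The CAST to nvarchar(max) avoids the 8000-byte result limit that STRING_AGG
-- applies to non-max input types.
SELECT CompanyName,
       STRING_AGG(CAST(DomainAliases AS nvarchar(max)), N', ')
           WITHIN GROUP (ORDER BY DomainAliases) AS DomainAliases
FROM @AliasDomains
GROUP BY CompanyName;
```

Storing one alias per row keeps the table easy to filter and join; the comma-separated form is then just a presentation choice made in the query.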

Having said that, you can quite easily get what you want from the Domains table with a SELECT combined with a CROSS APPLY OPENJSON, like so:

INSERT INTO AliasDomains(CompanyName, DomainAliases) 
SELECT JSON_VALUE(JSONData, '$.name'), value 
FROM Domains 
CROSS APPLY OPENJSON (JSONData, '$.domainAliases') 

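Edit: perhaps I should add that value in the statement above is a column returned by OPENJSON; it holds the value of each element found at the path you asked for (domainAliases in this case).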
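To make the role of value more concrete, and to cover the rows the question mentions that are not valid JSON, here is a small self-contained sketch. The table variable @Domains, its sample rows, and the SafeJson alias are assumptions made up for illustration; only OPENJSON, ISJSON, and JSON_VALUE are built-in functions. Note that OPENJSON needs database compatibility level 130 or higher.

```sql
-- Hypothetical stand-in for the Domains table, with four kinds of rows.
DECLARE @Domains TABLE (JSONData nvarchar(max));

INSERT INTO @Domains (JSONData)
VALUES (N'{"name":"Honeywell","domainAliases":["honeywell.at","honeywell.it"]}'),
       (N'{"name":"NoAliases","domainAliases":[]}'),   -- empty array
       (N'{"name":"MissingKey"}'),                     -- no domainAliases key
       (N'not json at all');                           -- failed API response

SELECT JSON_VALUE(s.SafeJson, '$.name') AS CompanyName,
       a.[key]   AS AliasIndex,   -- position of the element in the array
       a.[value] AS DomainAlias   -- the element itself
FROM @Domains AS d
-- Replace invalid JSON with NULL so the JSON functions never see malformed text.
CROSS APPLY (SELECT CASE WHEN ISJSON(d.JSONData) = 1
                         THEN d.JSONData END AS SafeJson) AS s
-- CROSS APPLY keeps only rows for which OPENJSON returns at least one element.
CROSS APPLY OPENJSON(s.SafeJson, '$.domainAliases') AS a;
```

With this sample data the query is expected to return only the two Honeywell aliases: an empty array, a missing domainAliases key (in the default lax path mode), and a NULL input should all yield no rows from OPENJSON, so those companies drop out of the result without any extra filter.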

Hope this helps!

Niels


Thanks for your answer, it works great :) – Vissow