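How to get data from HTML stored in a SQL Server column
Problem description:
I have some HTML content stored in a SQL Server column, and I want to read values out of that HTML. How can I extract data from HTML held in a SQL Server column?
For example: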
<ektdesignns_choices ektdesignns_nodetype="element" title="How many gigs do you play each month?" ektdesignns_caption="How many gigs do you play each month?" name="ektpoll1303074024421" ektdesignns_name="ektpoll1303074024421" id="ektpoll1303074024421">
<ol contenteditable="false" onkeypress="design_validate_choice(1, -1, this, 'Options are required.')" onclick="design_validate_choice(1, -1, this, 'Options are required.')" onblur="design_validate_choice(1, -1, this, 'Options are required.')" ektdesignns_validation="choice-req" ektdesignns_maxoccurs="1" ektdesignns_minoccurs="1" unselectable="on" title="How many gigs do you play each month?" class="design_list_vertical">
<li>
<input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="1 or fewer_1" title="1 or fewer" id="ID2504263" />
<label contenteditable="true" unselectable="off" for="ID2504263">1 or fewer</label>
</li>
<li>
<input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="2-4_2" title="2-4" id="ID5115606" />
<label contenteditable="true" unselectable="off" for="ID5115606">2-4</label>
</li>
<li>
<input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="5-7_3" title="5-7" id="ID477116" />
<label contenteditable="true" unselectable="off" for="ID477116">5-7</label>
</li>
<li>
<input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="8 or more_4" title="8 or more" id="ID5515606" />
<label contenteditable="true" unselectable="off" for="ID5515606">8 or more</label>
</li>
</ol>
</ektdesignns_choices><input type="submit" value="Vote" />
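I want to read all the labels (the <label> elements) from this HTML. Does anyone have an idea how I can do this?
Answer
If your HTML is truly XHTML-compliant, and if it is stored in an XML column of your SQL Server table, then you can use XQuery in T-SQL to retrieve your labels from it: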
DECLARE @HtmlTbl TABLE (ID INT IDENTITY, Html XML)
INSERT INTO @HtmlTbl(Html) VALUES('<ektdesignns_choices ektdesignns_nodetype="element" title="How many gigs do you play each month?" ektdesignns_caption="How many gigs do you play each month?" name="ektpoll1303074024421" ektdesignns_name="ektpoll1303074024421" id="ektpoll1303074024421">
<ol contenteditable="false" onkeypress="design_validate_choice(1, -1, this, ''Options are required.'')" onclick="design_validate_choice(1, -1, this, ''Options are required.'')" onblur="design_validate_choice(1, -1, this, ''Options are required.'')" ektdesignns_validation="choice-req" ektdesignns_maxoccurs="1" ektdesignns_minoccurs="1" unselectable="on" title="How many gigs do you play each month?" class="design_list_vertical">
<li>
<input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="1 or fewer_1" title="1 or fewer" id="ID2504263" />
<label contenteditable="true" unselectable="off" for="ID2504263">1 or fewer</label>
</li>
<li>
<input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="2-4_2" title="2-4" id="ID5115606" />
<label contenteditable="true" unselectable="off" for="ID5115606">2-4</label>
</li>
<li>
<input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="5-7_3" title="5-7" id="ID477116" />
<label contenteditable="true" unselectable="off" for="ID477116">5-7</label>
</li>
<li>
<input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="8 or more_4" title="8 or more" id="ID5515606" />
<label contenteditable="true" unselectable="off" for="ID5515606">8 or more</label>
</li>
</ol></ektdesignns_choices><input type="submit" value="Vote" />')
This retrieves all <label> elements from your (X)HTML as a single XML string:
SELECT
Html.query('//label')
FROM @HtmlTbl
WHERE ID = 1
Output:
<label contenteditable="true" unselectable="off" for="ID2504263">1 or fewer</label>
<label contenteditable="true" unselectable="off" for="ID5115606">2-4</label>
<label contenteditable="true" unselectable="off" for="ID477116">5-7</label>
<label contenteditable="true" unselectable="off" for="ID5515606">8 or more</label>
Or this selects the text content of every <label> tag, one per row:
SELECT
C.value('(.)[1]', 'varchar(1000)')
FROM @HtmlTbl
CROSS APPLY Html.nodes('//label') AS T(C)
WHERE ID = 1
Output:
1 or fewer
2-4
5-7
8 or more
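Answer
Pull the data out of the database, then use an HTML parser to extract the information you need. It will make your life a lot easier.
Whatever you do, please don't try to use regular expressions unless you are only looking for a single piece of data that one regex match can capture. (HTML is not a regular language, so regexes usually cause more problems than they solve.)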
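As a rough illustration of the parser approach, here is a minimal sketch in Python using only the standard library's html.parser module. It assumes the column value has already been fetched from SQL Server into a string by whatever database driver you use. The names LabelTextParser and extract_labels are made up for this example, and sample_html is a shortened copy of the markup from the question.

```python
from html.parser import HTMLParser


class LabelTextParser(HTMLParser):
    """Collects the text content of every <label> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.labels = []     # text of each completed <label>
        self._buffer = None  # text pieces of the <label> being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == "label":
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that appears while we are inside a <label>.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "label" and self._buffer is not None:
            self.labels.append("".join(self._buffer).strip())
            self._buffer = None


def extract_labels(html):
    """Return the text of all <label> elements found in an HTML string."""
    parser = LabelTextParser()
    parser.feed(html)
    parser.close()
    return parser.labels


# Shortened copy of the poll markup from the question (assumed to have been
# read from the database column already).
sample_html = """
<ol class="design_list_vertical">
<li><input type="radio" value="1 or fewer_1" id="ID2504263" />
<label contenteditable="true" for="ID2504263">1 or fewer</label></li>
<li><input type="radio" value="2-4_2" id="ID5115606" />
<label contenteditable="true" for="ID5115606">2-4</label></li>
<li><input type="radio" value="5-7_3" id="ID477116" />
<label contenteditable="true" for="ID477116">5-7</label></li>
<li><input type="radio" value="8 or more_4" id="ID5515606" />
<label contenteditable="true" for="ID5515606">8 or more</label></li>
</ol><input type="submit" value="Vote" />
"""

print(extract_labels(sample_html))
# prints ['1 or fewer', '2-4', '5-7', '8 or more']
```

Unlike the XQuery route, this does not require the markup to be well-formed XML, so it should also cope with HTML that SQL Server would refuse to store in an XML column.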
Answer
If all the HTML you have is in the same format as this, you can convert it to XML and use some XQuery to find the label nodes:
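-- [Table] and XMLCol are placeholders for your own table and column names;
-- TABLE is a reserved word in T-SQL, so the name has to be bracketed.
-- XMLCol is assumed to be of type xml (cast the column first if it is not).
select T.N.value('.', 'nvarchar(100)')
from [Table]
  cross apply XMLCol.nodes('//label') as T(N)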