Can someone please explain this bit of HtmlAgilityPack code?

Question:

I've tried my best to add comments throughout the code, but I'm a bit stuck on certain parts.

using System.Collections.Generic;
using System.IO;
using HtmlAgilityPack;

// create a new instance of the HtmlDocument class called doc
HtmlDocument doc = new HtmlDocument();

// the Load method is called here to load the variable result, which is HTML
// formatted into a string in a previous code snippet
doc.Load(new StringReader(result));

// a new variable called root with datatype HtmlNode is created here.
// I'm not sure what doc.DocumentNode refers to?
HtmlNode root = doc.DocumentNode;

// a list is getting constructed here. I haven't had much experience
// with constructing lists yet
List<string> anchorTags = new List<string>();

// a foreach loop is used to loop through the html document to
// extract html with 'a' attributes I think..
foreach (HtmlNode link in root.SelectNodes("//a"))
{
    // don't really know what's going on here
    string att = link.OuterHtml;
    // don't really know what's going on here either
    anchorTags.Add(att);
}

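I lifted this code sample from here. Credit to Farooq Kaiser.

I've never used this library, and I'm just taking a stab in the dark here, but I would assume doc.DocumentNode is the current node of the document, which after loading the document would be the root node. – 2010-12-21 15:23:57

Answer

In HTML Agility Pack terms, "//a" means "find all tags named 'a' or 'A' anywhere in the document". For more general help on XPath, see the XPath documentation (which is independent of the HTML Agility Pack). So, if your document looks like this: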

<div> 
    <A href="xxx">anchor 1</a> 
    <table ...> 
    <a href="zzz">anchor 2</A> 
    </table> 
</div>

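You will get the two anchor HTML elements. OuterHtml is the HTML of the node including the node's own tags, whereas InnerHtml is only the HTML content inside the node. So here the two OuterHtml values are: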

<A href="xxx">anchor 1</a>

and

<a href="zzz">anchor 2</A>

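Note that I wrote 'a' or 'A' because the HAP implementation takes care of HTML's case insensitivity. However, "//A" does not work by default: you need to specify tag names in lowercase in the XPath expression.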
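As a rough illustration of the points above, here is a minimal sketch that runs the same "//a" query against the sample markup (with the table's elided attributes left out). It assumes the HtmlAgilityPack NuGet package is referenced; the class name AnchorDemo and the html variable are made up for this example. No expected output is shown, since the exact serialization may depend on the library version.

```csharp
using System;
using HtmlAgilityPack;

class AnchorDemo
{
    static void Main()
    {
        // Sample markup from the answer above; the table attributes are omitted.
        string html =
            "<div>" +
            "<A href=\"xxx\">anchor 1</a>" +
            "<table>" +
            "<a href=\"zzz\">anchor 2</A>" +
            "</table>" +
            "</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Lowercase tag name in the XPath, as noted above.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (links == null)
        {
            Console.WriteLine("No anchors found.");
            return;
        }

        foreach (HtmlNode link in links)
        {
            // OuterHtml includes the <a> tags; InnerHtml is only what sits between them.
            Console.WriteLine("OuterHtml: " + link.OuterHtml);
            Console.WriteLine("InnerHtml: " + link.InnerHtml);
        }
    }
}
```

The null check is worth keeping in real code: the foreach in the question would throw if the page contained no anchors at all.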

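Answer

The key is the SelectNodes method. This part uses XPath to grab a list of nodes out of the HTML that match your query.

This is where I learned my XPath: http://www.w3schools.com/xpath/default.asp

Then it just walks through the matches and gets the OuterHtml of those nodes (the full HTML, including the tags) and adds each one to the list of strings. A List is basically just an array, but more flexible. If you only wanted the contents, and not the enclosing tags, you'd use HtmlNode.InnerHtml or HtmlNode.InnerText.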
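Since the question mentions little experience with lists, here is a small sketch of what "an array, but more flexible" means in practice. It uses only the standard library; the class name ListDemo and the sample strings (including the "yyy" link) are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

class ListDemo
{
    static void Main()
    {
        // An array has a fixed length chosen up front.
        string[] fixedTags = new string[2];
        fixedTags[0] = "<a href=\"xxx\">anchor 1</a>";
        fixedTags[1] = "<a href=\"zzz\">anchor 2</a>";

        // A List<string> starts empty and grows as items are added.
        List<string> anchorTags = new List<string>();
        foreach (string tag in fixedTags)
        {
            anchorTags.Add(tag);
        }

        // Adding a third item needs no resizing code (hypothetical extra link).
        anchorTags.Add("<a href=\"yyy\">anchor 3</a>");

        Console.WriteLine(fixedTags.Length);  // prints 2
        Console.WriteLine(anchorTags.Count);  // prints 3
    }
}
```

That is why the code in the question uses a List<string>: the number of anchors on the page is not known before the loop runs.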

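+1, and for those who find XPath inscrutable, you can use `Elements()` / `Descendants()` and then use the standard LinqToXml `XElement`-style syntax to query everything. – 2010-12-21 15:37:45

@Kirk Woll I did not know that. I'd much rather use Linq. Every time I use XPath I have to look at the cheat sheet again. – LoveMeSomeCode 2010-12-21 20:19:10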
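For anyone who wants to try the LINQ route suggested in the comments, the following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced. The class name LinqDemo and the inline markup are made up for this example; it collects the same OuterHtml strings as the code in the question, without any XPath.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

class LinqDemo
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<div><a href=\"xxx\">anchor 1</a><p>no link here</p></div>");

        // Descendants("a") yields an IEnumerable<HtmlNode>, so LINQ applies directly.
        List<string> anchorTags = doc.DocumentNode
            .Descendants("a")
            .Select(link => link.OuterHtml)
            .ToList();

        // With no matches this is simply an empty list, so no null check is needed.
        foreach (string tag in anchorTags)
        {
            Console.WriteLine(tag);
        }
    }
}
```

One practical difference from SelectNodes is that a query with no matches gives back an empty sequence rather than null, which makes the loop safe on pages without links.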
