HTML Agility Pack: parsing an href attribute

Question:

How would I efficiently parse the href attribute value from this:

<tr> 
<td rowspan="1" colspan="1">7</td> 
<td rowspan="1" colspan="1"> 
<a class="undMe" href="/ice/player.htm?id=8475179" rel="skaterLinkData" shape="rect">D. Kulikov</a> 
</td> 
<td rowspan="1" colspan="1">D</td> 
<td rowspan="1" colspan="1">0</td> 
<td rowspan="1" colspan="1">0</td> 
<td rowspan="1" colspan="1">0</td> 
[...] 

I'm interested in getting the player ID, which here is 8475179. This is my code so far:

// Iterate all rows (players)
for (int i = 1; i < rows.Count; ++i)
{
    HtmlNodeCollection cols = rows[i].SelectNodes(".//td");

    // new player
    Dim_Player player = new Dim_Player();

    // Iterate all columns in this row
    for (int j = 1; j < 6; ++j)
    {
        switch (j) {
            case 1: player.Name = cols[j].InnerText;
                    player.Player_id = Int32.Parse(/* this is where I want to parse the href value */);
                    break;
            case 2: player.Position = cols[j].InnerText; break;
            case 3: stats.Goals = Int32.Parse(cols[j].InnerText); break;
            case 4: stats.Assists = Int32.Parse(cols[j].InnerText); break;
            case 5: stats.Points = Int32.Parse(cols[j].InnerText); break;
        }
    }
}

If you're already hard-coding the indices in the switch, why use a `for` loop at all? Why not just `player.Position = cols[2].InnerText;`? – 2011-12-13 23:34:43


Good point. I'm recycling some old code I wrote, so I hadn't thought of that. – 2011-12-13 23:41:27

This works for me:

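using System;
using System.Linq;
using HtmlAgilityPack;

HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.Load("test.html");

// Find the first <a> element whose class attribute is exactly "undMe"
var link = htmlDoc.DocumentNode
        .Descendants("a")
        .First(x => x.Attributes["class"] != null
          && x.Attributes["class"].Value == "undMe");

// href looks like "/ice/player.htm?id=8475179"; the ID is the text after '='
string hrefValue = link.Attributes["href"].Value;
long playerId = Convert.ToInt64(hrefValue.Split('=')[1]);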

For real use you'll need to add error checking and so on.
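As a rough idea of what that error checking might look like for the href-parsing step, here is a minimal sketch that uses only the .NET base class library. `PlayerLinks` and `TryGetPlayerId` are hypothetical names made up for this illustration, and the sketch assumes the ID always arrives in a query parameter called `id`, as in the sample markup.

```csharp
using System;

static class PlayerLinks
{
    // Hypothetical helper: pulls the numeric "id" query parameter out of an
    // href such as "/ice/player.htm?id=8475179". Returns false instead of
    // throwing when the href is empty, has no "id" parameter, or the value
    // is not a number.
    public static bool TryGetPlayerId(string href, out long playerId)
    {
        playerId = 0;
        if (string.IsNullOrEmpty(href)) return false;

        int queryStart = href.IndexOf('?');
        if (queryStart < 0) return false;

        // Drop any "#fragment", then walk the key=value pairs of the query.
        string query = href.Substring(queryStart + 1).Split('#')[0];
        foreach (string pair in query.Split('&'))
        {
            string[] parts = pair.Split(new[] { '=' }, 2);
            if (parts.Length == 2 && parts[0] == "id")
                return long.TryParse(parts[1], out playerId);
        }
        return false;
    }
}
```

With the code above, the href could be passed in as `link.Attributes["href"].Value`. `First` would also have to become `FirstOrDefault` plus a null check, since `First` throws when no matching link exists.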

Use an XPath expression to find it:

foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@class='undMe']")) 
{ 
     HtmlAttribute att = link.Attributes["href"]; 
     Console.WriteLine(new Regex(@"(?<=[\?&]id=)\d+(?=\&|\#|$)").Match(att.Value).Value); 
}
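To make the regular expression above easier to follow, here is a small sketch that isolates it so it can be tried on plain strings without loading a document. `IdPattern` and `Extract` are hypothetical names used only for this illustration. The lookbehind requires `?id=` or `&id=` directly before the digits, and the lookahead requires `&`, `#`, or the end of the string directly after them.

```csharp
using System;
using System.Text.RegularExpressions;

static class IdPattern
{
    // Same pattern as in the answer, compiled once and reused.
    static readonly Regex IdRegex = new Regex(@"(?<=[\?&]id=)\d+(?=\&|\#|$)");

    // Returns the digits of the "id" parameter, or "" when nothing matches.
    public static string Extract(string href)
    {
        return IdRegex.Match(href).Value;
    }
}
```

One thing to watch for in the loop itself: as far as I know, `SelectNodes` returns null rather than an empty collection when the XPath matches nothing, so a null check before the `foreach` is probably worth adding.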