Parsing an HTML document?

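Problem description:

How can I replace the image sources in HTML text that is stored in a database? Do I need to parse the HTML document?

Thanks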

See http://stackoverflow.com/questions/56107/what-is-the-best-way-to-parse-html-in-c – Foole 2010-07-08 11:52:39

You should try the HTML Agility Pack.
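
A minimal sketch of how that might look, assuming the HtmlAgilityPack NuGet package is referenced. The class name ImgSrcReplacer, the method ReplaceImageSources, and the mapSrc callback are hypothetical names chosen for illustration; they are not part of the library. Loading the HTML from the database and saving it back are left out.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class ImgSrcReplacer
{
    // Parses the HTML fragment, rewrites the src attribute of every <img>
    // that has one, and returns the modified markup.
    public static string ReplaceImageSources(string html, Func<string, string> mapSrc)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var images = doc.DocumentNode.SelectNodes("//img[@src]");
        if (images != null)
        {
            foreach (var img in images)
            {
                string oldSrc = img.GetAttributeValue("src", string.Empty);
                img.SetAttributeValue("src", mapSrc(oldSrc));
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

You would call it with something like ImgSrcReplacer.ReplaceImageSources(html, src => "/cdn/" + src), where the "/cdn/" prefix is only an example mapping. The serialized markup may differ cosmetically from the input (for example in attribute quoting), so compare results by content rather than byte for byte.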

It is good for HTML, but keep in mind that it is buggy with pure XML – 2010-07-08 11:54:12

I haven't tried the HTML Agility Pack on XML, since there are various libraries dedicated to manipulating XML documents. – 2010-07-08 12:03:13

http://social.msdn.microsoft.com/Forums/en-US/regexp/thread/37f46a25-113c-420d-aab8-eea98d341189

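using System.Text.RegularExpressions;

string s = "<img src='1.jpg'><img src='2.jpg'><img src='3.jpg'><img src='4.jpg'>";
// In a verbatim string (@"..."), a double quote must be written as "".
string pattern = @"<img\s+src\s*=\s*[""']([^""']+)[""']\s*/*>";
// Replace each <img> tag with a placeholder containing its src value.
string output = Regex.Replace(s, pattern, "[Image: $1]");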
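
The snippet above replaces the whole tag. Since the question asks to replace only the image source, here is a rough sketch of a variant that rewrites just the src value and leaves the rest of the tag alone. The names ImgSrcRewriter, Rewrite, and mapSrc are made up for this example, and it assumes src values are always quoted. A regex like this will not cope with every piece of real-world HTML, so treat it as a quick fix rather than a parser.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class ImgSrcRewriter
{
    // Group 1: the tag up to and including "src=",
    // group 2: the opening quote character,
    // group 3: the src value (up to the matching closing quote).
    // Requiring whitespace before "src" avoids matching attributes like data-src.
    private static readonly Regex ImgSrc = new Regex(
        @"(<img\b[^>]*?\ssrc\s*=\s*)([""'])(.*?)\2",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns html with each <img> src value replaced by mapSrc(oldValue).
    public static string Rewrite(string html, Func<string, string> mapSrc)
    {
        return ImgSrc.Replace(html, m =>
            m.Groups[1].Value          // "<img ... src="
            + m.Groups[2].Value        // opening quote
            + mapSrc(m.Groups[3].Value)
            + m.Groups[2].Value);      // same quote to close
    }
}
```

For example, ImgSrcRewriter.Rewrite("<img src='1.jpg'>", src => "/cdn/" + src) gives <img src='/cdn/1.jpg'>, with the original quote style and any other attributes preserved.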

Regex to get src value from an img tag