Multithreading on a list not working as intended

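Problem description:

I have a list of about 1000 items that I need to search for on Facebook. I made it multithreaded to speed up the search. Unfortunately, it seems that some threads just take a combination item and never process it.

The issue is that when a thread gets a new combination, I remove it from my List<string> so that it doesn't get searched more than once.

I'm only using 3 threads, so it's not as if I'm using a great many.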

class BrowserHandler 
{ 
    public static readonly ILogger Logger = LogManager.GetCurrentClassLogger(); 

    public BrowserHandler() 
    { 
     StartBrowser(); 
     StartBrowser(); 
    } 

    private void StartBrowser() 
    { 
     var combination = Program.GetServer().GetNextCombination(); 
     Logger.Debug(combination); 
     runBrowserThread(new Uri("https://www.facebook.com/search/top/?q=" + combination)); 
    } 

    private void runBrowserThread(Uri url) 
    { 
     var th = new Thread(() => { 
      var br = new WebBrowser(); 
      br.DocumentCompleted += browser_DocumentCompleted; 
      br.Navigate(url); 
      Application.Run(); 
     }); 

     th.SetApartmentState(ApartmentState.STA); 
     th.Start(); 
    } 

    void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e) 
    { 
     var br = sender as WebBrowser; 

     if (br.Url == e.Url) 
     { 
      if (br.DocumentText.Contains("_52eh _5bcu")) 
      { 
       var links = br.Document.GetElementsByTagName("div"); 

       foreach (HtmlElement link in links) 
       { 
        if (link.GetAttribute("className") == "_52eh _5bcu") 
        { 
         Logger.Warn("Found owner for [" + e.Url.ToString().Split('=')[1] + "] " + link.InnerText); 
        } 
       } 
      } 
      else 
      { 
       Logger.Warn("Finished checking [" + e.Url.ToString().Split('=')[1] + "] and found no owner."); 
      } 

      Application.ExitThread(); 
      StartBrowser(); 
     } 
    } 
} 

Output:

    12:56:24 - 07999999991 
    12:56:24 - 07999999891 
    12:56:24 - 07999999791 
    12:56:27 - Found owner for [07999999991] Kaydie-anne Hairdressa Reid 
    12:56:27 - Found owner for [07999999891] Yuli Berk 
    12:56:28 - 07999999691 
    12:56:28 - 07999999591 
    12:56:29 - Finished checking [07999999791] and found no owner. 
    12:56:29 - Finished checking [07999999691] and found no owner. 
    12:56:29 - 07999999491 
    12:56:29 - 07999999391 
    12:56:29 - Finished checking [07999999591] and found no owner. 
    12:56:30 - 07999999291 
    12:56:30 - Finished checking [07999999491] and found no owner. 
    12:56:31 - 07999999191 
    12:56:31 - Finished checking [07999999391] and found no owner. 
    12:56:31 - 07999999091 
    12:56:32 - Finished checking [07999999291] and found no owner. 
    12:56:32 - Finished checking [07999999191] and found no owner. 
    12:56:32 - 07999998991 
    12:56:32 - 07999998891 
    12:56:33 - Finished checking [07999999091] and found no owner. 
    12:56:33 - 07999998791 
    12:56:34 - Found owner for [07999998991] Suzanne McMaster 
    12:56:34 - 07999998691 
    12:56:35 - Finished checking [07999998891] and found no owner. 
    12:56:35 - 07999998591 
    12:56:35 - Finished checking [07999998791] and found no owner. 
    12:56:36 - 07999998491 

As you can see, 16 combinations were taken from the list, but only 13 finished checking.
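The question does not show GetNextCombination, so the following is only a minimal sketch of one way to hand out each combination once and to see which ones were taken but never reported back. CombinationTracker, TryTake, MarkFinished and InFlight are hypothetical names, not part of the original code; the sketch assumes the combinations are plain strings and that the completion handler would call MarkFinished after logging its result.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not from the question): gives out each combination
// once and remembers which ones are taken but not yet finished.
public sealed class CombinationTracker
{
    private readonly ConcurrentQueue<string> _pending;
    private readonly ConcurrentDictionary<string, DateTime> _inFlight =
        new ConcurrentDictionary<string, DateTime>();

    public CombinationTracker(IEnumerable<string> combinations)
    {
        _pending = new ConcurrentQueue<string>(combinations);
    }

    // Dequeues the next combination in a thread-safe way, so two threads
    // can never receive the same item. Returns false when the queue is empty.
    public bool TryTake(out string combination)
    {
        if (!_pending.TryDequeue(out combination))
        {
            return false;
        }
        _inFlight[combination] = DateTime.UtcNow; // record when it was taken
        return true;
    }

    // Intended to be called once a result for the combination has been logged.
    // Returns false if the combination was not in flight.
    public bool MarkFinished(string combination)
    {
        DateTime takenAt;
        return _inFlight.TryRemove(combination, out takenAt);
    }

    // Snapshot of combinations that were taken but have not finished yet.
    public IReadOnlyCollection<string> InFlight
    {
        get { return _inFlight.Keys.ToList(); }
    }
}
```

Logging InFlight alongside the other messages would show whether the "missing" items were dropped or are simply still being processed. In the log above, the three numbers without a result line (07999998691, 07999998591, 07999998491) are also the last three that were taken, which is what InFlight would contain if those searches were still running when the log was captured.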

+3

This seems like a very high-overhead approach. Why not use an HTML parser (e.g. AngleSharp) instead of spinning up lots of browser instances? – Richard

+2

How do you dare to use even a single WebBrowser control from a non-UI thread? Let alone multiple instances. –

+3

I need a browser because Facebook forces you to log in to search for certain queries. – distributi0n

Some results may not be passing the test inside your foreach iteration. You could track that case with a boolean flag, like this:

class BrowserHandler 
{ 
    public static readonly ILogger Logger = LogManager.GetCurrentClassLogger(); 

    public BrowserHandler() 
    { 
     StartBrowser(); 
     StartBrowser(); 
    } 

    private void StartBrowser() 
    { 
     var combination = Program.GetServer().GetNextCombination(); 
     Logger.Debug(combination); 
     runBrowserThread(new Uri("https://www.facebook.com/search/top/?q=" + combination)); 
    } 

    private void runBrowserThread(Uri url) 
    { 
     var th = new Thread(() => { 
      var br = new WebBrowser(); 
      br.DocumentCompleted += browser_DocumentCompleted; 
      br.Navigate(url); 
      Application.Run(); 
     }); 

     th.SetApartmentState(ApartmentState.STA); 
     th.Start(); 
    } 

    void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e) 
    { 
     var br = sender as WebBrowser; 

     if (br.Url == e.Url) 
     { 
      if (br.DocumentText.Contains("_52eh _5bcu")) 
      { 
       var links = br.Document.GetElementsByTagName("div"); 
       bool AnyFound=false; 
       foreach (HtmlElement link in links) 
       { 
        if (link.GetAttribute("className") == "_52eh _5bcu") 
        { 
         Logger.Warn("Found owner for [" + e.Url.ToString().Split('=')[1] + "] " + link.InnerText); 
         AnyFound = true; 
        } 
       } 

       if (!AnyFound) 
       { 
        Logger.Warn("Finished checking [" + e.Url.ToString().Split('=')[1] + "] and found no owner."); 
       } 
      } 
      else 
      { 
       Logger.Warn("Finished checking [" + e.Url.ToString().Split('=')[1] + "] and found no owner."); 
      } 

      Application.ExitThread(); 
      StartBrowser(); 
     } 
    } 
} 
+3

I've checked this, and it still doesn't show the 3 missing items. – distributi0n

You should try adding an else inside your foreach; your 3 missing items may be ending up there:

foreach (HtmlElement link in links) 
{ 
    if (link.GetAttribute("className") == "_52eh _5bcu") 
    { 
     Logger.Warn("Found owner for [" + e.Url.ToString().Split('=')[1] + "] " + link.InnerText); 
    } 
    else 
    { 
     // some code here 
    } 
} 
+3

I've checked, and it still doesn't show the 3 missing items. – distributi0n

+0

Could your application be finishing before the last three threads complete? Is that why you're missing those 3 log messages? – VDN

+3

When you say "the application finished", what do you mean? Finished as in closed? – distributi0n
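To make the question in the comments above concrete, here is a minimal sketch of waiting for worker threads before a program ends. WorkerRunner and RunAndWait are hypothetical names. Note that threads created with new Thread(...), as in the question, are foreground threads by default and keep the process alive on their own, so this would only matter if the threads were marked as background threads (as this sketch does deliberately) or if the process is ended explicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper (not from the question): runs each job on its own
// thread and blocks until every job has returned.
public static class WorkerRunner
{
    public static void RunAndWait(IEnumerable<Action> jobs)
    {
        var threads = new List<Thread>();

        foreach (var job in jobs)
        {
            var local = job; // capture a per-iteration copy for the lambda
            var th = new Thread(() => local());

            // Background threads do not keep the process alive, so without
            // the Join calls below their last log lines could be lost.
            th.IsBackground = true;
            th.Start();
            threads.Add(th);
        }

        // Wait for every worker to finish before returning to the caller.
        foreach (var th in threads)
        {
            th.Join();
        }
    }
}
```

Calling RunAndWait from Main with one job per search would ensure that every job has returned, and therefore had the chance to log its result, before Main exits.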