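HttpClient hangs in httpClient.execute()
These are my notes on tracking down and fixing an HttpClient hang in java.net.SocketInputStream.socketRead0(Native Method).
I used to write crawlers in C# and never ran into a problem like this. Some time ago I switched to Java, using Apache HttpClient for the HTTP requests. I wrote six crawlers in total, and five of them ran without trouble. After one of the target sites moved from HTTP to HTTPS, however, its crawler (running 200 threads) would hang on this statement: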
httpClient.execute(httpGet);
Looking at the process in jconsole, nearly every thread was stuck in
the native method java.net.SocketInputStream.socketRead0(Native Method).
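The same check can be done from inside the process instead of by eye in jconsole. Below is a minimal sketch, using only the JDK's java.lang.management API, that counts the threads whose innermost stack frame is a given method. The class name StuckThreadReport and the method countThreadsIn are made up for this illustration. The frame name matches the stack trace quoted above; on newer JDKs a blocked socket read may show up under a different class and method.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of the crawler described in this post.
class StuckThreadReport {

    /** Counts live threads whose innermost stack frame is className.methodName. */
    static int countThreadsIn(String className, String methodName) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // false, false: skip monitor and synchronizer details, stack traces are enough
        ThreadInfo[] infos = mx.dumpAllThreads(false, false);
        int count = 0;
        for (ThreadInfo info : infos) {
            StackTraceElement[] stack = info.getStackTrace();
            if (stack.length > 0
                    && stack[0].getClassName().equals(className)
                    && stack[0].getMethodName().equals(methodName)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int stuck = countThreadsIn("java.net.SocketInputStream", "socketRead0");
        System.out.println("threads in socketRead0: " + stuck);
    }
}
```

Calling countThreadsIn periodically from a monitoring thread and logging the result would show whether the number of blocked readers keeps growing, which is the pattern seen here with 200 worker threads.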
The HTTP request method I had wrapped looked like this:
public static String httpPost(String url, String postData, Map<String, String> headerMap, String ip) {
    try {
        int socketTimeout = 15000;  // read timeout (ms)
        int connectTimeout = 15000; // connect timeout (ms)
        String html = null;
        // ip is the proxy address in "host:port" form
        String[] ips = ip.split(":");
        HttpHost httpHost = new HttpHost(ips[0], Integer.parseInt(ips[1]));
        RequestConfig config = RequestConfig.custom().setProxy(httpHost)
                .setConnectTimeout(connectTimeout).setSocketTimeout(socketTimeout)
                .setConnectionRequestTimeout(connectTimeout).build();
        CloseableHttpClient httpClient = HttpClientBuilder.create()
                .setDefaultRequestConfig(config).build();
        HttpPost httpost = new HttpPost(url);
        httpost.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36");
        // Apply the caller-supplied headers
        if (null != headerMap) {
            for (Map.Entry<String, String> x : headerMap.entrySet()) {
                httpost.setHeader(x.getKey(), x.getValue());
            }
        }
        // Build the request entity
        StringEntity postEntity = new StringEntity(postData, Charset.forName("UTF-8"));
        postEntity.setContentEncoding("UTF-8");
        // Send as JSON if the body parses as JSON, otherwise as a form post
        try {
            JSONObject jsonObject = JSONObject.parseObject(postData);
            postEntity.setContentType("application/json; charset=UTF-8");
        } catch (Exception e) {
            postEntity.setContentType("application/x-www-form-urlencoded; charset=UTF-8");
        }
        httpost.setEntity(postEntity);
        // Execute the request
        HttpResponse response = httpClient.execute(httpost);
        HttpEntity entity = response.getEntity();
        if (entity != null) {
            html = EntityUtils.toString(entity, "UTF-8");
            EntityUtils.consume(entity); // close the content stream
        }
        // Release the connection
        httpClient.close();
        return html;
    } catch (Exception e) {
        // Proxy requests fail often, so the exception is deliberately not logged
        return "";
    }
}
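Because the site bans IPs, every request goes through a proxy IP, and since a large share of those requests fail, the method does not print exception stack traces. I had already set a read timeout and a connect timeout in RequestConfig, so it did not occur to me that the sockets were stuck because no SocketConfig had been configured. A likely explanation, which I have not verified in the HttpClient source for my exact version: in HttpClient 4.x the RequestConfig socket timeout is applied to the connection only after the route is fully established, so the TLS handshake through the proxy tunnel runs with the SocketConfig default SO_TIMEOUT of 0, which means a read can block forever.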
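What an SO_TIMEOUT of 0 versus a positive value means can be seen at the plain socket level. The sketch below uses only java.net and does not involve HttpClient; the class name SoTimeoutDemo and the method readTimesOut are invented for this illustration. It connects to a local peer that never sends anything and tries to read one byte. With a positive SO_TIMEOUT the read gives up with SocketTimeoutException; with 0 the same read would block indefinitely, which is the state the crawler threads were found in.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not part of the crawler described in this post.
class SoTimeoutDemo {

    /**
     * Reads from a local peer that never writes.
     * Returns true if the read was ended by SocketTimeoutException.
     * Do not pass 0: that disables the timeout and the read never returns.
     */
    static boolean readTimesOut(int soTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket silentPeer = server.accept()) {
            client.setSoTimeout(soTimeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read(); // the peer never sends a byte
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```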
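After reading through a large number of blog posts, I finally found one that showed how to set a SocketConfig. With that change the program ran normally. The final code is as follows: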
public static String httpPost(String url, String postData, Map<String, String> headerMap, String ip) {
    try {
        int socketTimeout = 15000;  // read timeout (ms)
        int connectTimeout = 15000; // connect timeout (ms)
        String html = null;
        // ip is the proxy address in "host:port" form
        String[] ips = ip.split(":");
        HttpHost httpHost = new HttpHost(ips[0], Integer.parseInt(ips[1]));
        // The fix: socket-level defaults, including an explicit SO_TIMEOUT
        SocketConfig socketConfig = SocketConfig.custom()
                .setSoKeepAlive(false)
                .setSoLinger(1)          // linger at most 1 second on close
                .setSoReuseAddress(true)
                .setSoTimeout(10000)     // SO_TIMEOUT in ms, applied when the socket is opened
                .setTcpNoDelay(true).build();
        RequestConfig config = RequestConfig.custom().setProxy(httpHost)
                .setConnectTimeout(connectTimeout).setSocketTimeout(socketTimeout)
                .setConnectionRequestTimeout(connectTimeout).build();
        // try-with-resources closes the client even when the request throws
        try (CloseableHttpClient httpClient = HttpClientBuilder.create()
                .setDefaultSocketConfig(socketConfig)
                .setDefaultRequestConfig(config).build()) {
            HttpPost httpost = new HttpPost(url);
            httpost.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36");
            // Apply the caller-supplied headers
            if (null != headerMap) {
                for (Map.Entry<String, String> x : headerMap.entrySet()) {
                    httpost.setHeader(x.getKey(), x.getValue());
                }
            }
            // Build the request entity
            StringEntity postEntity = new StringEntity(postData, Charset.forName("UTF-8"));
            postEntity.setContentEncoding("UTF-8");
            // Send as JSON if the body parses as JSON, otherwise as a form post
            try {
                JSONObject jsonObject = JSONObject.parseObject(postData);
                postEntity.setContentType("application/json; charset=UTF-8");
            } catch (Exception e) {
                postEntity.setContentType("application/x-www-form-urlencoded; charset=UTF-8");
            }
            httpost.setEntity(postEntity);
            // Execute the request
            HttpResponse response = httpClient.execute(httpost);
            HttpEntity entity = response.getEntity();
            if (entity != null) {
                html = EntityUtils.toString(entity, "UTF-8");
                EntityUtils.consume(entity); // close the content stream
            }
            return html;
        }
    } catch (Exception e) {
        // Proxy requests fail often, so the exception is deliberately not logged
        return "";
    }
}
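Socket timeouts only bound the wait for each individual read, so a peer that keeps trickling bytes can still hold a worker for a long time. One possible extra safeguard, which is my own suggestion and not something the original crawler did, is a hard deadline around the whole call. The sketch below uses only java.util.concurrent; the class HardDeadline and the method callOrFallback are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of the crawler described in this post.
class HardDeadline {

    // Daemon threads so an abandoned task cannot keep the JVM alive
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "deadline-worker");
        t.setDaemon(true);
        return t;
    });

    /** Runs the task and returns fallback if it fails or exceeds the deadline. */
    static <T> T callOrFallback(Callable<T> task, long timeoutMillis, T fallback) {
        Future<T> future = POOL.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread
            return fallback;
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the task itself threw
        }
    }

    public static void main(String[] args) {
        String fast = callOrFallback(() -> "ok", 1000, "");
        String slow = callOrFallback(() -> { Thread.sleep(5000); return "late"; }, 100, "");
        System.out.println("fast=[" + fast + "] slow=[" + slow + "]"); // prints fast=[ok] slow=[]
    }
}
```

A call such as callOrFallback(() -> httpPost(url, postData, headerMap, ip), 30000, "") would keep the caller from waiting more than 30 seconds. Note that interrupting a thread does not unblock a blocking socket read, so the worker may stay stuck in the background; with HttpClient the request's abort() method would be the way to actually tear down the connection.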
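I am writing this post as a reminder to myself: when a problem comes up, calm down and work through it until the solution is found.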