Submitting Historical Resource URLs to Baidu Xiongzhang (熊掌号) with Java

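I recently had a requirement to submit a large number of historical URLs to the Baidu Xiongzhang (熊掌号) resource search platform. Xiongzhang does provide a tool for manual submission, but it is slow and laborious, and clearly inefficient when there are many URLs to submit. The API submission method it offers is the better choice.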

1.   Get a Baidu Xiongzhang account (you can search Baidu for how to apply for one)

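2.   The figure above shows the official API documentation (you need to log in to your own account to see it). It already tells you most of what you need for batch submission, but a few points are worth spelling out:

    1) For batch submission, the type parameter in the URL must be set to batch

    2) A single request can carry at most 2000 URLs; anything beyond that makes the API return an "exceeds submission limit" error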
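
The 2000-per-request limit means a long URL list has to be split before it is sent. Here is a minimal sketch of that splitting step on its own, separate from the HTTP call. The class name UrlBatcher, its method names, and the example.com URLs are hypothetical and only for illustration; the request body format (one URL per line) follows what the code in this post sends.

```java
import java.util.ArrayList;
import java.util.List;

final class UrlBatcher {
    // Per-request limit described in point 2) above.
    static final int BATCH_LIMIT = 2000;

    /** Splits urls into consecutive batches of at most batchSize entries. */
    static List<List<String>> partition(List<String> urls, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < urls.size(); start += batchSize) {
            int end = Math.min(start + batchSize, urls.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(urls.subList(start, end)));
        }
        return batches;
    }

    /** Builds a text/plain request body with one URL per line. */
    static String toRequestBody(List<String> batch) {
        return String.join("\n", batch);
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 4500; i++) {
            urls.add("https://example.com/page/" + i); // placeholder URLs
        }
        // 4500 URLs are split into batches of 2000, 2000 and 500.
        for (List<String> batch : partition(urls, BATCH_LIMIT)) {
            System.out.println("batch size: " + batch.size());
        }
    }
}
```

Stepping the start index by the batch size and clamping the end with Math.min avoids the separate "how many rounds" calculation and the off-by-one risk that comes with it.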

3.   Code implementation

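import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

import com.alibaba.fastjson.JSONObject;   // assumption: JSONObject.parseObject(String, Class) matches Alibaba fastjson
import org.apache.commons.io.FileUtils;
import org.apache.commons.lang3.StringUtils;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.StatusLine;
import org.apache.http.client.HttpResponseException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.util.EntityUtils;

// Create the client once and reuse it; automatic retries are disabled.
// logger, URL, maxPushSize and successSize used below are fields of the enclosing class (not shown here).
private static final CloseableHttpClient client =
        HttpClientBuilder.create().disableAutomaticRetries().build();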

private void urlsPush(List<String> urlList, String type) {
    if (urlList == null || urlList.isEmpty()) {
        return;
    }
    // Submit in groups of at most 2000 URLs, the per-request limit
    for (int start = 0; start < urlList.size(); start += 2000) {
        int end = Math.min(start + 2000, urlList.size());
        List<String> subList = urlList.subList(start, end);
        // Request body: one URL per line
        String params = String.join("\r\n", subList);
        // Decide whether to submit or save to a file: once the remaining quota
        // is no more than one full batch, stop submitting and save instead
        if (maxPushSize <= 2000) {
            saveAsFile(params, type);
            continue;
        }
        HttpPost request = new HttpPost(URL);
        request.setHeader("content-type", "text/plain");
        // Encode explicitly as UTF-8 instead of relying on the platform default charset
        request.setEntity(new StringEntity(params, StandardCharsets.UTF_8));
        String respStr;
        logger.info("Pushing data: {} URLs in this batch, content type: {}", subList.size(), type);
        long singlePushStart = System.currentTimeMillis();
        // try-with-resources closes the response and releases the connection
        try (CloseableHttpResponse response = client.execute(request)) {
            logger.info("Single push finished in {}ms", System.currentTimeMillis() - singlePushStart);
            HttpEntity responseEntity = response.getEntity();
            if (response.getStatusLine().getStatusCode() != 200 || responseEntity == null) {
                logger.warn("Unexpected response: {}", response.getStatusLine());
                saveAsFile(params, type);
                continue;
            }
            respStr = EntityUtils.toString(responseEntity, StandardCharsets.UTF_8);
        } catch (IOException e) {
            logger.error("Failed to push data", e);
            saveAsFile(params, type);
            continue;
        }
        if (StringUtils.isBlank(respStr)) {
            logger.warn("Empty response body, quota counters not updated");
            continue;
        }
        try {
            PushUrlsResponse result = JSONObject.parseObject(respStr, PushUrlsResponse.class);
            this.maxPushSize = result.getRemain_batch();
            this.successSize += result.getSuccess_batch();
        } catch (Exception e) {
            logger.warn("Failed to parse the response, body: {}", respStr);
            saveAsFile(params, type);
        }
    }
}

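/**
 * Response returned by the URL submission API
 */
private static class PushUrlsResponse {
    /**
     * Number of URLs submitted successfully
     */
    private int success_batch;
    /**
     * Remaining number of URLs that can still be submitted
     */
    private int remain_batch;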

    public int getSuccess_batch() {
        return success_batch;
    }

    public void setSuccess_batch(int success_batch) {
        this.success_batch = success_batch;
    }

    public int getRemain_batch() {
        return remain_batch;
    }

    public void setRemain_batch(int remain_batch) {
        this.remain_batch = remain_batch;
    }
}

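In my case, data that was not submitted successfully is written to a file and kept, which is why the code calls a save method:

saveAsFile

   The params argument is the URL content that was to be submitted, and type is the category of the content in the file (my URLs fall into quite a few categories, so a type is needed to mark what each file contains)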
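
The post does not show the body of saveAsFile, so the following is only a sketch of what such a method could look like, not the author's implementation. The class name FailedUrlStore, the constructor argument, and the failed-<type>.txt file naming are assumptions made for illustration. It uses java.nio.file from the JDK rather than the FileUtils class imported above, so that it has no external dependencies.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FailedUrlStore {
    // Directory that holds one file per content type (hypothetical layout).
    private final Path directory;

    FailedUrlStore(Path directory) {
        this.directory = directory;
    }

    /**
     * Appends the unsubmitted URL block to directory/failed-<type>.txt
     * and returns the path of that file.
     */
    Path saveAsFile(String params, String type) throws IOException {
        Files.createDirectories(directory);
        Path file = directory.resolve("failed-" + type + ".txt");
        // Make sure the block ends with a newline so later appends start on a new line.
        String content = params.endsWith("\n") ? params : params + "\n";
        Files.write(file, content.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return file;
    }
}
```

Appending instead of overwriting keeps every failed batch of the same type in one file, so the whole file can be fed back into the submission method later.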