Solr DIH Index Import: A Worked Example
1.1. Software requirements
MySQL 5.5 or later, and a running Solr service
1.2. Create the corresponding MySQL table
CREATE TABLE `company` (
`id` varchar(255) NOT NULL,
`name` varchar(255) DEFAULT NULL,
`province` varchar(255) DEFAULT NULL,
`city` varchar(255) DEFAULT NULL,
`industry` varchar(255) DEFAULT NULL,
`joinerNum` int(11) DEFAULT NULL,
`createTime` timestamp NULL DEFAULT NULL,
`updateTime` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`status` tinyint(4) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
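1.3. DIH configuration for the Solr application
1.3.1. Check whether the WEB-INF\lib directory of the Solr service contains the following JAR files, and add any that are missing
1.3.2. Add the DIH configuration to the application's solrconfig.xml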
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
        <str name="config">db-data-config.xml</str>
    </lst>
</requestHandler>
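1.3.3. Add the configuration file db-data-config.xml to the application
Change the environment-specific values (for example the JDBC URL, user name, and password) to match your setup.
Company (group) entity: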
<dataConfig>
    <dataSource name="jdbc" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://172.30.133.66:3306/db?autoReconnect=true"
                user="root" password="123456"/>
    <document>
        <entity name="company" pk="id" transformer="RegexTransformer"
                query="select * from company where status=1"
                deltaQuery="select id from company where updateTime > '${dataimporter.last_index_time}' and status=1"
                deltaImportQuery="select * from company where id='${dataimporter.delta.id}'">
            <field column="id" name="id" />
            <field column="name" name="name" />
            <field column="industry" splitBy="\|" sourceColName="industry"/>
        </entity>
    </document>
</dataConfig>
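The relationship between deltaQuery and deltaImportQuery can be illustrated with a small shell sketch. It only mimics the placeholder substitution that DIH performs internally; the helper name fill, the timestamp, and the id c001 are hypothetical values chosen for illustration.

```shell
#!/bin/bash
# Replace a DIH-style placeholder ($2) in a query template ($1) with a value ($3).
# This only imitates the substitution for illustration; DIH does it internally.
fill() {
    local template="$1" placeholder="$2" value="$3"
    echo "${template//"$placeholder"/$value}"
}

# Query templates copied from the entity definition above.
delta_query="select id from company where updateTime > '\${dataimporter.last_index_time}' and status=1"
delta_import_query="select * from company where id='\${dataimporter.delta.id}'"

# Hypothetical values, for illustration only.
last_index_time="2024-01-01 04:00:00"
changed_id="c001"

# Step 1: find the ids of rows changed since the last import.
fill "$delta_query" '${dataimporter.last_index_time}' "$last_index_time"
# Step 2: for each id returned by step 1, fetch the full row to index.
fill "$delta_import_query" '${dataimporter.delta.id}' "$changed_id"
```

With these sample values the sketch prints the two concrete SQL statements: first the query that selects changed ids, then the query that loads one changed row by id.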
1.4. Run the data import JAR to load the data into the database
1.5. Ways to run DIH
1.5.1. From the Solr admin console
1.5.2. Via HTTP request
Full import:
http://172.30.133.61:8080/solr/collection/dataimport?command=full-import
Delta import:
http://172.30.133.61:8080/solr/collection/dataimport?command=delta-import&clean=false
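As a minimal sketch, the request URLs above can be assembled and sent from the command line. The helper names dih_url and dih_request are assumptions made for this example, not part of Solr; the host and core name are taken from the URLs above. The block itself only prints URLs and makes no network call.

```shell
#!/bin/bash
# Build a DataImportHandler request URL.
#   $1: Solr base URL, $2: core/collection name, $3: DIH command,
#   $4: optional extra query string (e.g. "clean=false")
dih_url() {
    local url="$1/$2/dataimport?command=$3"
    if [ -n "$4" ]; then
        url="${url}&$4"
    fi
    echo "$url"
}

# Send the request and print only the HTTP status code
# (curl prints 000 when the server cannot be reached).
dih_request() {
    curl -m 10 -s -o /dev/null -w '%{http_code}' "$1"
}

base="http://172.30.133.61:8080/solr"   # address from the examples above
core="collection"                       # core name from the examples above

dih_url "$base" "$core" full-import
dih_url "$base" "$core" delta-import "clean=false"
dih_url "$base" "$core" status
```

To actually trigger an import, pass a generated URL to the request helper, for example dih_request "$(dih_url "$base" "$core" full-import)". The status command is included because, as far as I know, DIH runs imports in the background by default, so a 200 response indicates that the request was accepted rather than that the import has finished.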
1.5.3. Via shell script
Make sure the script is executable. If it is not, run the following command:
chmod +x deltaImport.sh
Delta import script (adjust the environment-specific values, such as the application IDs and server URLs):
#!/bin/bash
deltaImport(){
    # List of application IDs
    appIds=("ce0c305361ca4f4abbb8ddb8d7db1228" "46c457a3852e4614bee9f1c9fd3a7868")
    # Solr cluster server addresses
    serverUrls=("http://172.30.133.61:8080/solr" "http://172.30.133.62:8080/solr" "http://172.30.133.64:8080/solr")
    for appId in "${appIds[@]}"
    do
        for serverUrl in "${serverUrls[@]}"
        do
            importUrl="${serverUrl}/${appId}/dataimport?command=delta-import"
            # -m 10: time out after 10 seconds; -w prints only the HTTP status code
            http_result=$(curl -m 10 -o /dev/null -s -w '%{http_code}' "$importUrl")
            if [ "$http_result" = "200" ]
            then
                echo "${importUrl} OK"
                # Stop at the first server that accepts the request for this application
                break
            else
                echo "${importUrl} FAIL"
            fi
        done
    done
}
echo "process start"
deltaImport
echo "process end"
Full import script (adjust the environment-specific values, such as the application IDs and server URLs):
#!/bin/bash
fullImport(){
    # List of application IDs
    appIds=("ce0c305361ca4f4abbb8ddb8d7db1228" "46c457a3852e4614bee9f1c9fd3a7868")
    # Solr cluster server addresses
    serverUrls=("http://172.30.133.61:8080/solr" "http://172.30.133.62:8080/solr" "http://172.30.133.64:8080/solr")
    for appId in "${appIds[@]}"
    do
        for serverUrl in "${serverUrls[@]}"
        do
            importUrl="${serverUrl}/${appId}/dataimport?command=full-import"
            # -m 10: time out after 10 seconds; -w prints only the HTTP status code
            http_result=$(curl -m 10 -o /dev/null -s -w '%{http_code}' "$importUrl")
            if [ "$http_result" = "200" ]
            then
                echo "${importUrl} OK"
                # Stop at the first server that accepts the request for this application
                break
            else
                echo "${importUrl} FAIL"
            fi
        done
    done
}
echo "process start"
fullImport
echo "process end"
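The scripts above treat an HTTP 200 response as success, which only shows that Solr accepted the request. A possible follow-up, sketched here under assumptions, is to poll command=status until the handler reports idle. The function names dih_state and wait_for_import are hypothetical, and the parser assumes the XML response format (requested explicitly with wt=xml), in which the state appears as a str element named status.

```shell
#!/bin/bash
# Extract the DIH state ("idle" or "busy") from a status response read on stdin.
# Assumes the XML response format, which contains <str name="status">...</str>.
dih_state() {
    sed -n 's/.*<str name="status">\([a-z]*\)<\/str>.*/\1/p' | head -n 1
}

# Poll command=status until the handler reports idle, or give up.
#   $1: dataimport URL without a query string
#   $2: maximum number of polls (default 60)
#   $3: seconds to wait between polls (default 5)
wait_for_import() {
    local url="$1" max_tries="${2:-60}" delay="${3:-5}" i state
    for ((i = 0; i < max_tries; i++)); do
        state=$(curl -m 10 -s "${url}?command=status&wt=xml" | dih_state)
        if [ "$state" = "idle" ]; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}

# Offline check of the parser with a minimal response fragment.
echo '<str name="status">busy</str>' | dih_state
```

A call such as wait_for_import "http://172.30.133.61:8080/solr/collection/dataimport" could follow the import request in either script; it returns a non-zero status if the handler never becomes idle within the allowed number of polls.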
1.5.4. Via scheduled task (cron)
Run crontab -e to edit the scheduled tasks.
Add the following line:
0 4 * * * /home/deltaImport.sh
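Parameter description:
0 4 * * *: run the application's delta index update every day at 04:00
Note: adjust the schedule and the script to run according to your actual needs.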
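If the schedule is tightened, a new run may start while the previous one is still working. One way to guard against that, shown here as a sketch and not as part of the original setup, is a small wrapper that uses an atomically created lock directory. The function name run_exclusive and the lock path are assumptions; the echo command stands in for /home/deltaImport.sh.

```shell
#!/bin/bash
# Run a command only if no other run holds the lock.
# mkdir either creates the directory or fails, so it works as a simple lock.
#   $1: lock directory path, remaining arguments: command to run
run_exclusive() {
    local lock_dir="$1"
    shift
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "previous run still active, skipping"
        return 1
    fi
    "$@"
    local rc=$?
    rmdir "$lock_dir"
    return $rc
}

# Demonstration in a temporary directory (hypothetical lock path);
# the echo stands in for /home/deltaImport.sh.
demo_dir=$(mktemp -d)
run_exclusive "$demo_dir/deltaImport.lock" echo "import would run here"
rmdir "$demo_dir"
```

In the crontab entry, output could also be appended to a log file, for example 0 4 * * * /home/deltaImport.sh >> /home/deltaImport.log 2>&1, where the log path is again only an illustration.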