The "optimized" flag on Solr's statistics page

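The following showed up while I was updating the index:

1. Problem description:

[Image: the "optimized" indicator on Solr's statistics page]

An error is flagged here. It was not there when I configured Solr earlier. The backend log prints: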

2018-01-10 14:22:21 ERROR   Request to collection goods failed due to (409) org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8080/solr/go_shard2_replica3: version conflict for 93cb350b-b9c1-46c3-a6cb-3b71eb45c8e8 expected=1589185408266141696 actual=1589185412370268161, retry? 0 CloudSolrClient.requestWithRetryOnStaleState
org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error from server at http://localhost:8080/solr/go_shard2_replica3: version conflict for 93cb350b-b9c1-46c3-a6cb-3b71eb45c8e8 expected=1589185408266141696 actual=1589185412370268161
at org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:819)
at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1263)
at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1134)
at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1073)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:160)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:106)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:71)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:85)

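2. Solution:

Two operations are involved when adding documents to the index: commit and optimize.

Here is what each one means. When you submit index updates to Solr, the index only changes once a commit has run. That does not mean every submission needs its own commit: if the changes are not urgent, you can submit several times and then run a single commit.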
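
The "submit several times, commit once" idea can be sketched with Solr's XML update messages. This is only a minimal illustration in plain Java: the `title` field and the class and method names are hypothetical, and each message would still have to be POSTed to the collection's `/update` handler.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchThenCommit {
    // Escape the characters that are special in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // One <add> message for a single document.
    // "title" is a hypothetical field used only for this example.
    static String addMessage(String id, String title) {
        return "<add><doc>"
             + "<field name=\"id\">" + escape(id) + "</field>"
             + "<field name=\"title\">" + escape(title) + "</field>"
             + "</doc></add>";
    }

    // Several add messages followed by exactly one commit at the end.
    static List<String> batch(String[][] docs) {
        List<String> messages = new ArrayList<>();
        for (String[] d : docs) {
            messages.add(addMessage(d[0], d[1]));
        }
        messages.add("<commit/>");
        return messages;
    }

    public static void main(String[] args) {
        String[][] docs = {{"1", "first"}, {"2", "a & b"}};
        for (String message : batch(docs)) {
            System.out.println(message);
        }
    }
}
```

With SolrJ the same pattern is several `SolrClient.add(...)` calls followed by one `SolrClient.commit()`.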

optimize

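optimize is a bit like defragmenting a hard disk. To speed up searching, it reorganizes the index and removes the documents that have been deleted or updated. Note that Solr has no in-place update operation, only add and delete: a document that is deleted or replaced is merely marked as deleted, and a new document is created to take the place of a replaced one. optimize is the step that actually cleans these up. During an optimize the index therefore first grows and then shrinks. Because optimize builds a completely new index structure, you need to reserve twice the space your index occupied at commit time.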
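
The disk-space rule can be turned into a quick pre-check. The sketch below is an assumption-laden illustration, not part of Solr: it reads "twice the index size" as "free space at least equal to the current index size", and the class name, method names and the index directory passed on the command line are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class OptimizeHeadroom {
    // Total size in bytes of all regular files below the given directory.
    static long directorySize(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    // Assumption: optimize writes a full second copy of the index,
    // so the usable space must be at least the current index size.
    static boolean hasHeadroom(long indexBytes, long usableBytes) {
        return usableBytes >= indexBytes;
    }

    public static void main(String[] args) throws IOException {
        // Pass the core's index directory as the first argument.
        Path index = Paths.get(args.length > 0 ? args[0] : ".");
        long indexBytes = directorySize(index);
        long usableBytes = Files.getFileStore(index).getUsableSpace();
        System.out.println("index bytes:  " + indexBytes);
        System.out.println("usable bytes: " + usableBytes);
        System.out.println("room for optimize: " + hasHeadroom(indexBytes, usableBytes));
    }
}
```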

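This explanation comes from the blog post at http://xiaofeng.iteye.com/blog/1299148.

Looking it up in the official documentation, the section "Indexing and Basic Data Operations" covers it in more detail.

On page 248 of the official documentation: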

The <commit> operation writes all documents loaded since the last commit to one or more segment files on
the disk. Before a commit has been issued, newly indexed content is not visible to searches. The commit
operation opens a new searcher, and triggers any event listeners that have been configured.

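The commit operation writes all documents loaded since the last commit into segment files on disk. Until a commit has been issued, newly indexed content is not visible to searches.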


The <optimize> operation requests Solr to merge internal data structures in order to improve search
performance. For a large index, optimization will take some time to complete, but by merging many small
segment files into a larger one, search performance will improve. If you are using Solr’s replication mechanism to distribute searches across many systems, be aware that after an optimize, a complete index will need to be transferred. In contrast, post-commit transfers are usually much smaller.

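The optimize operation merges internal data structures to improve search performance. On a large index an optimize takes a while, but merging many small segment files into a larger one improves search performance. Note that if you use replication, the complete index has to be transferred after an optimize, whereas the transfers that follow a commit are usually much smaller.

The documentation also explains some of the runtime parameters:

[Image: runtime parameters for commit and optimize from the official documentation]

SolrJ offers the corresponding operations:

[Image: the corresponding SolrJ methods]

This also brings us to the several ways Solr can commit: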

  <!-- The default high-performance update handler -->
  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- Enables a transaction log, used for real-time get, durability, and
         solr cloud replica recovery.  The log can grow as big as
         uncommitted changes to the index, so use of a hard autoCommit
         is recommended (see below).
         "dir" - the target directory for transaction logs, defaults to the
                solr data directory.
         "numVersionBuckets" - sets the number of buckets used to keep
                track of max version values when checking for re-ordered
                updates; increase this value to reduce the cost of
                synchronizing access to version buckets during high-volume
                indexing, this requires 8 bytes (long) * numVersionBuckets
                of heap space per Solr core.
    -->
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
      <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
    </updateLog>
    <!-- AutoCommit
         Perform a hard commit automatically under certain conditions.
         Instead of enabling autoCommit, consider using "commitWithin"
         when adding documents. 
         http://wiki.apache.org/solr/UpdateXmlMessage
         maxDocs - Maximum number of documents to add since the last
                   commit before automatically triggering a new commit.
         maxTime - Maximum amount of time in ms that is allowed to pass
                   since a document was added before automatically
                   triggering a new commit. 
         openSearcher - if false, the commit causes recent index changes
           to be flushed to stable storage, but does not cause a new
           searcher to be opened to make those changes visible.
         If the updateLog is enabled, then it's highly recommended to
         have some sort of hard autoCommit to limit the log size.
      -->
    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <!-- softAutoCommit is like autoCommit except it causes a
         'soft' commit which only ensures that changes are visible
         but does not ensure that data is synced to disk.  This is
         faster and more near-realtime friendly than a hard commit.
      -->
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
    </autoSoftCommit>
    <!-- Update Related Event Listeners         
         Various IndexWriter related events can trigger Listeners to
         take actions.
         postCommit - fired after every commit or optimize command
         postOptimize - fired after every optimize command
      -->
    <!-- The RunExecutableListener executes an external command from a
         hook such as postCommit or postOptimize.
         
         exe - the name of the executable to run
         dir - dir to use as the current working directory. (default=".")
         wait - the calling thread waits until the executable returns. 
                (default="true")
         args - the arguments to pass to the program.  (default is none)
         env - environment variables to set.  (default is none)
      -->
    <!-- This example shows how RunExecutableListener could be used
         with the script based replication...
         http://wiki.apache.org/solr/CollectionDistribution
      -->
    <!--
       <listener event="postCommit" class="solr.RunExecutableListener">
         <str name="exe">solr/bin/snapshooter</str>
         <str name="dir">.</str>
         <bool name="wait">true</bool>
         <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
         <arr name="env"> <str>MYVAR=val1</str> </arr>
       </listener>
      -->
  </updateHandler>
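
The configuration comments suggest "commitWithin" as an alternative to autoCommit. As a rough sketch in plain Java (the class and method names are hypothetical), an XML update message carrying that attribute could be built like this:

```java
public class CommitWithinMessage {
    // Wrap an already-serialized <doc> element in an <add> message that asks
    // Solr to commit the document within the given number of milliseconds.
    static String addCommitWithin(String docXml, int commitWithinMs) {
        if (commitWithinMs <= 0) {
            throw new IllegalArgumentException("commitWithin must be positive");
        }
        return "<add commitWithin=\"" + commitWithinMs + "\">" + docXml + "</add>";
    }

    public static void main(String[] args) {
        String doc = "<doc><field name=\"id\">1</field></doc>";
        // 15000 ms mirrors the autoCommit maxTime default shown in the config.
        System.out.println(addCommitWithin(doc, 15000));
    }
}
```

The client then no longer sends an explicit commit; Solr schedules one itself within the requested time window.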

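Back to my problem and how to fix it. Posts online say that clicking it in the admin page resolves it, but having no automated way to do this from the backend is painful. Fortunately, optimize can also be triggered programmatically.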
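
One way to do that without the admin page is a plain HTTP request to the update handler with `optimize=true`. The sketch below uses only the JDK's `java.net.http` client (Java 11+). The collection URL `http://localhost:8080/solr/goods` is an assumption pieced together from the log above, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OptimizeTrigger {
    // Build the update-handler URL that asks Solr to optimize the index
    // down to at most maxSegments segments.
    static URI optimizeUri(String collectionUrl, int maxSegments) {
        String base = collectionUrl.endsWith("/")
                ? collectionUrl.substring(0, collectionUrl.length() - 1)
                : collectionUrl;
        return URI.create(base + "/update?optimize=true&maxSegments=" + maxSegments
                + "&waitSearcher=true");
    }

    // Send the request and return the HTTP status code (200 on success).
    static int optimize(String collectionUrl, int maxSegments) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(optimizeUri(collectionUrl, maxSegments))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) {
        // Assumed collection URL; adjust host, port and collection name.
        String collectionUrl = "http://localhost:8080/solr/goods";
        System.out.println(optimizeUri(collectionUrl, 1));
        // Against a running Solr instance, call optimize(collectionUrl, 1)
        // from a scheduled job instead of only printing the URL.
    }
}
```

From SolrJ, the equivalent call is `SolrClient.optimize()`, which can be run right after the indexing job finishes.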