org.infinispan.util.concurrent.TimeoutException:“节点名称”的复制超时
问题描述:
我们有三个服务必须位于群集中。因此,我们使用Infinispan为集群节点和共享这些服务之间的数据。成功重新启动后,有时候我收到异常,并在其他节点中收到了“已更改”事件。实际上所有节点都在运行。我无法弄清楚这个原因。org.infinispan.util.concurrent.TimeoutException:“节点名称”的复制超时
我使用的Infinispan 8.1.3分布式缓存,JGroups的-3.4
org.infinispan.util.concurrent.TimeoutException: Replication timeout for sipproxy-16964
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:765)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$80(JGroupsTransport.java:599)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport$$Lambda$9/1547262581.apply(Unknown Source)
at java.util.concurrent.CompletableFuture$ThenApply.run(CompletableFuture.java:717)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2345)
at org.infinispan.remoting.transport.jgroups.SingleResponseFuture.call(SingleResponseFuture.java:46)
at org.infinispan.remoting.transport.jgroups.SingleResponseFuture.call(SingleResponseFuture.java:17)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2017-08-22 04:44:52,902 INFO [JGroupsTransport] (ViewHandler,ISPN,transport_manager-48870) ISPN000094: Received new cluster view for channel ISPN: [transport_manager-48870|3] (2) [transport_manager-48870, mediaproxy-47178]
2017-08-22 04:44:52,949 WARN [PreferAvailabilityStrategy] (transport-thread-transport_manager-p4-t24) ISPN000313: Cache mediaProxyResponseCache lost data because of abrupt leavers [sipproxy-16964]
2017-08-22 04:44:52,951 WARN [ClusterTopologyManagerImpl] (transport-thread-transport_manager-p4-t24) ISPN000197: Error updating cluster member list
java.lang.IllegalArgumentException: There must be at least one node with a non-zero capacity factor
at org.infinispan.distribution.ch.impl.DefaultConsistentHashFactory.checkCapacityFactors(DefaultConsistentHashFactory.java:57)
at org.infinispan.distribution.ch.impl.DefaultConsistentHashFactory.updateMembers(DefaultConsistentHashFactory.java:74)
at org.infinispan.distribution.ch.impl.DefaultConsistentHashFactory.updateMembers(DefaultConsistentHashFactory.java:26)
at org.infinispan.topology.ClusterCacheStatus.updateCurrentTopology(ClusterCacheStatus.java:431)
at org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.onClusterViewChange(PreferAvailabilityStrategy.java:56)
at org.infinispan.topology.ClusterCacheStatus.doHandleClusterView(ClusterCacheStatus.java:337)
at org.infinispan.topology.ClusterTopologyManagerImpl.updateCacheMembers(ClusterTopologyManagerImpl.java:397)
at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:314)
at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener$1.run(ClusterTopologyManagerImpl.java:571)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
jgroups.xml:
<config xmlns="urn:org:jgroups"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.4.xsd">
<TCP bind_addr="131.10.20.16"
bind_port="8010" port_range="10"
recv_buf_size="20000000"
send_buf_size="640000"
loopback="false"
max_bundle_size="64k"
bundler_type="old"
enable_diagnostics="true"
thread_naming_pattern="cl"
timer_type="new"
timer.min_threads="4"
timer.max_threads="30"
timer.keep_alive_time="3000"
timer.queue_max_size="100"
timer.wheel_size="200"
timer.tick_time="50"
thread_pool.enabled="true"
thread_pool.min_threads="2"
thread_pool.max_threads="30"
thread_pool.keep_alive_time="5000"
thread_pool.queue_enabled="true"
thread_pool.queue_max_size="100"
thread_pool.rejection_policy="discard"
oob_thread_pool.enabled="true"
oob_thread_pool.min_threads="2"
oob_thread_pool.max_threads="30"
oob_thread_pool.keep_alive_time="5000"
oob_thread_pool.queue_enabled="false"
oob_thread_pool.queue_max_size="100"
oob_thread_pool.rejection_policy="discard"/>
<TCPPING initial_hosts="131.10.20.16[8010],131.10.20.17[8010],131.10.20.182[8010]" port_range="2"
timeout="3000" num_initial_members="3" />
<MERGE3 max_interval="30000"
min_interval="10000"/>
<FD_SOCK/>
<FD_ALL interval="3000" timeout="10000" />
<VERIFY_SUSPECT timeout="500" />
<BARRIER />
<pbcast.NAKACK use_mcast_xmit="false"
retransmit_timeout="100,300,600,1200"
discard_delivered_msgs="true" />
<UNICAST3 conn_expiry_timeout="0"/>
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
max_bytes="10m"/>
<pbcast.GMS print_local_addr="true" join_timeout="5000"
max_bundling_time="30"
view_bundling="true"/>
<UFC max_credits="2M"
min_threshold="0.4"/>
<MFC max_credits="2M"
min_threshold="0.4"/>
<FRAG2 frag_size="60000" />
<pbcast.STATE_TRANSFER/>
</config>
答
TimeoutException异常只是一个RPC响应还未内收到说暂停,没有更多。当服务器处于压力下时可能会发生这种情况,但这可能不是这种情况 - 以下日志说节点是'怀疑'的 - 该节点可能无响应超过10秒(这是配置的限制,见FD_ALL)。
首先检查该服务器中的日志是否有错误,以及GC日志是否有任何停止世界暂停。
答
作为@flavius建议的主要原因是您的某个节点由于某种原因停止并且未能回复RPC。
我建议改变的JGroups的日志记录级别,这样你可以看到为什么一个节点被怀疑(它可以由FD_SOCK
或FD_ALL
协议发生),为什么它是从视图中消除(这是很可能是这个发生因为VERIFY_SUSPECT
协议)。
你也可以检查为什么发生。在大多数情况下,这是由于长时间的GC暂停造成的。但是由于其他原因,您的虚拟机也可能被主机暂停。我建议在这两个VM中使用JHiccup,并将其作为Java代理附加到您的进程。这样你应该注意到它是否是JVM停止世界造成这个或是操作系统。
好的,谢谢.i会检查当时是否有gc –
你是对的!完整的GC导致这个:) –