Apache Curator - Zookeeper connection loss exception, possible memory leak

Problem description:

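I have been working on a process that continuously monitors distributed atomic long counters. It checks them once a minute using the getCounter method shown below, together with the ZkClient enum. In fact, I have multiple threads running, each one monitoring a different counter (a distributed atomic long) stored in a Zookeeper node. Each thread specifies the path of its counter through the parameters of the getCounter method.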

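import java.util.Properties;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.atomic.DistributedAtomicLong;
import org.apache.curator.framework.recipes.atomic.PromotedToLock;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.curator.retry.RetryNTimes;

public class TagserterZookeeperManager {

public enum ZkClient {
    COUNTER("10.11.18.25:2181"); // Integration URL

    // One shared client per enum constant, started once when the enum is loaded
    private CuratorFramework client;
    private ZkClient(String servers) {
     // TagserterConfigs is the project's own configuration class (not shown here)
     Properties props = TagserterConfigs.ZOOKEEPER.getProperties();
     String zkFromConfig = props.getProperty("servers", "");
     // A configured server list overrides the hard-coded default
     if (zkFromConfig != null && !zkFromConfig.isEmpty()) {
      servers = zkFromConfig.trim();
     }
     ExponentialBackoffRetry exponentialBackoffRetry = new ExponentialBackoffRetry(1000, 3);
     client = CuratorFrameworkFactory.newClient(servers, exponentialBackoffRetry);
     client.start();
    }

    public CuratorFramework getClient() {
     return client;
    }
}

// Joins the non-null, non-empty segments into a path such as /a/b/c
public static String buildPath(String ... node) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < node.length; i++) {
     if (node[i] != null && !node[i].isEmpty()) {
      sb.append("/");
      sb.append(node[i]);
     }
    }
    return sb.toString();
}

// Builds a new DistributedAtomicLong for the given counter path on every call
public static DistributedAtomicLong getCounter(String taskType, int hid, String jobId, String countType) {
    String path = buildPath(taskType, hid+"", jobId, countType);
    PromotedToLock.Builder builder = PromotedToLock.builder().lockPath(path + "/lock").retryPolicy(new ExponentialBackoffRetry(10, 10));
    DistributedAtomicLong count = new DistributedAtomicLong(ZkClient.COUNTER.getClient(), path, new RetryNTimes(5, 20), builder.build());
    return count;
}

}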

From within a thread, this is how the method is called:

DistributedAtomicLong counterTotal = TagserterZookeeperManager 
         .getCounter("testTopic", hid, jobId, "test");
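To make the call above concrete, here is a minimal sketch of which znode paths it resolves to. It copies buildPath from the question into a standalone class; the class name PathDemo and the values 42 and "job-1" standing in for hid and jobId are hypothetical and not from the original post.

```java
public class PathDemo {

    // Same logic as buildPath in the question: skip null or empty segments,
    // prefix every remaining segment with "/".
    public static String buildPath(String... node) {
        StringBuilder sb = new StringBuilder();
        for (String segment : node) {
            if (segment != null && !segment.isEmpty()) {
                sb.append("/").append(segment);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int hid = 42;           // hypothetical value
        String jobId = "job-1"; // hypothetical value

        // Path of the counter node that getCounter would use
        String counterPath = buildPath("testTopic", hid + "", jobId, "test");
        // Path handed to PromotedToLock as the lock node
        String lockPath = counterPath + "/lock";

        System.out.println(counterPath);
        System.out.println(lockPath);
    }
}
```

Because empty segments are silently skipped, a thread that passes an empty jobId ends up on a shorter path than intended, which is worth keeping in mind when several threads are expected to watch different counters.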

Now, seemingly after the threads have been running for a few hours, at some stage I start getting the following org.apache.zookeeper.KeeperException$ConnectionLossException inside the getCounter method, where it tries to read the count:

org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /contentTaskProd
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073)
    at org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:215)
    at org.apache.curator.utils.EnsurePath$InitialHelper$1.call(EnsurePath.java:148)
    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
    at org.apache.curator.utils.EnsurePath$InitialHelper.ensure(EnsurePath.java:141)
    at org.apache.curator.utils.EnsurePath.ensure(EnsurePath.java:99)
    at org.apache.curator.framework.recipes.atomic.DistributedAtomicValue.getCurrentValue(DistributedAtomicValue.java:254)
    at org.apache.curator.framework.recipes.atomic.DistributedAtomicValue.get(DistributedAtomicValue.java:91)
    at org.apache.curator.framework.recipes.atomic.DistributedAtomicLong.get(DistributedAtomicLong.java:72)
    ...

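I keep getting this exception for a while, and I get the feeling that it causes some internal memory leak, which eventually leads to an OutOfMemory error, and the whole process bails out. Does anyone have any idea what the reason could be? Why would Zookeeper suddenly start throwing connection loss exceptions? After the process exits, I can connect to Zookeeper manually through another small console program I wrote (also using Curator), and everything looks fine there.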

Hi, how did you eventually solve this problem? I seem to be running into the same issue even when explicitly calling close() on the Curator framework.

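@SumitNigam Sorry for getting back to you so late on this one. I have actually stopped working on that project, and it has been a while since then. It turned out that we would probably need to rewrite and refactor a major part of the project, for a number of reasons. Sorry about that.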

Answer

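To monitor a node in Zookeeper with Curator, you can use NodeCache. This will not solve your connection problem, but instead of polling the node once a minute, you get a push event whenever it changes.

In my experience, NodeCache handles disconnects and reconnects quite well.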
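As a rough illustration of the NodeCache approach, the sketch below registers a listener on one counter node and prints its value on every change. The class name CounterWatcher and the method watchCounter are hypothetical names introduced here. It also assumes the node holds the counter as an 8-byte long, which is how DistributedAtomicLong is understood to store its value. It needs the curator-recipes library and a reachable Zookeeper ensemble to run.

```java
import java.nio.ByteBuffer;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.cache.ChildData;
import org.apache.curator.framework.recipes.cache.NodeCache;
import org.apache.curator.framework.recipes.cache.NodeCacheListener;

public class CounterWatcher {

    // Hypothetical helper: watches a single counter node and logs each change.
    // The caller owns the returned NodeCache and should close() it when done.
    public static NodeCache watchCounter(CuratorFramework client, final String path)
            throws Exception {
        final NodeCache cache = new NodeCache(client, path);

        cache.getListenable().addListener(new NodeCacheListener() {
            @Override
            public void nodeChanged() throws Exception {
                ChildData current = cache.getCurrentData();
                if (current == null) {
                    // getCurrentData() returns null when the node does not exist
                    System.out.println(path + " does not exist");
                    return;
                }
                byte[] bytes = current.getData();
                // Assumption: the counter is stored as an 8-byte long
                long value = (bytes != null && bytes.length >= 8)
                        ? ByteBuffer.wrap(bytes).getLong()
                        : 0L;
                System.out.println(path + " = " + value);
            }
        });

        cache.start();
        return cache;
    }
}
```

With the code from the question, a thread could call CounterWatcher.watchCounter(ZkClient.COUNTER.getClient(), path) once at startup instead of calling getCounter every minute, and close the returned cache when it stops.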
