Kubernetes node not responding after reboot

Problem description:

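I have a Kubernetes cluster with one master and four nodes. kube-proxy works fine on all four nodes, and I can reach a service through any node regardless of where it is running; i.e. http://node1:30000 through http://node4:30000 all give the same response.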

After I rebooted node4 with shutdown -r now, it came back up, but I noticed that the node no longer responds to requests. I run the following command:

curl http://node4:30000 

If I run it from my PC, or from any other machine in the cluster (node1 through node3, or the master), I get:

curl: (7) Failed to connect to node4 port 30000: Connection timed out 

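However, if I run it from node4 itself, it responds successfully. This leads me to believe that kube-proxy is running fine, but something is blocking external connections.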
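
The local-versus-remote comparison above can be scripted so it is easy to repeat. This is only a rough sketch: check_nodeport and interpret are hypothetical helper names, the 5-second timeout is an arbitrary choice, and the host and port are the ones from this question.

```shell
#!/bin/sh
# Hypothetical helper: probe a NodePort and print "ok" or "fail".
# --max-time caps the wait so a silently dropped connection does not hang.
check_nodeport() {
    host=$1
    port=$2
    if curl --silent --output /dev/null --max-time 5 "http://$host:$port"; then
        echo ok
    else
        echo fail
    fi
}

# Hypothetical helper: turn the on-node and off-node results into a hint.
interpret() {
    local_result=$1
    remote_result=$2
    case "$local_result/$remote_result" in
        ok/ok)   echo "service reachable" ;;
        ok/fail) echo "traffic from outside the node is blocked or dropped" ;;
        fail/*)  echo "service is not answering on the node itself" ;;
        *)       echo "unknown" ;;
    esac
}

# Usage: run check_nodeport node4 30000 once on node4 and once from another
# machine, then pass the two results to interpret, for example:
#   interpret ok fail
```

With the results described in this question (success on node4, timeout everywhere else), interpret ok fail reports that traffic from outside the node is being blocked or dropped.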

When I run kubectl describe node node4, the output looks normal:

Name:     node4 
Labels:     beta.kubernetes.io/arch=amd64 
         beta.kubernetes.io/os=linux 
         kubernetes.io/hostname=node4 
Taints:     <none> 
CreationTimestamp:  Tue, 21 Feb 2017 15:21:17 -0400 
Phase: 
Conditions: 
    Type     Status LastHeartbeatTime      LastTransitionTime      Reason       Message 
    ----     ------ -----------------      ------------------      ------       ------- 
    OutOfDisk    False Wed, 22 Feb 2017 08:03:40 -0400   Tue, 21 Feb 2017 15:21:18 -0400   KubeletHasSufficientDisk  kubelet has sufficient disk space available 
    MemoryPressure  False Wed, 22 Feb 2017 08:03:40 -0400   Tue, 21 Feb 2017 15:21:18 -0400   KubeletHasSufficientMemory  kubelet has sufficient memory available 
    DiskPressure   False Wed, 22 Feb 2017 08:03:40 -0400   Tue, 21 Feb 2017 15:21:18 -0400   KubeletHasNoDiskPressure  kubelet has no disk pressure 
    Ready     True Wed, 22 Feb 2017 08:03:40 -0400   Tue, 21 Feb 2017 15:21:28 -0400   KubeletReady     kubelet is posting ready status. AppArmor enabled 
Addresses:    10.6.81.64,10.6.81.64,node4 
Capacity: 
alpha.kubernetes.io/nvidia-gpu:  0 
cpu:         2 
memory:        4028748Ki 
pods:         110 
Allocatable: 
alpha.kubernetes.io/nvidia-gpu:  0 
cpu:         2 
memory:        4028748Ki 
pods:         110 
System Info: 
Machine ID:     dbc0bb6ba10acae66b1061f958220ade 
System UUID:     4229186F-AA5C-59CE-E5A2-258C1BBE9D2C 
Boot ID:      a3968e6c-eba3-498c-957f-f29283af1cff 
Kernel Version:    4.4.0-63-generic 
OS Image:      Ubuntu 16.04.1 LTS 
Operating System:    linux 
Architecture:     amd64 
Container Runtime Version:  docker://1.13.0 
Kubelet Version:    v1.5.2 
Kube-Proxy Version:   v1.5.2 
ExternalID:      node4 
Non-terminated Pods:   (27 in total) 
    Namespace      Name                 CPU Requests CPU Limits  Memory Requests Memory Limits 
    ---------      ----                 ------------ ----------  --------------- ------------- 
    << application pods listed here >> 
    kube-system     kube-proxy-0p3lj              0 (0%)   0 (0%)   0 (0%)   0 (0%) 
    kube-system     weave-net-uqmr1               20m (1%)  0 (0%)   0 (0%)   0 (0%) 
Allocated resources: 
    (Total limits may be over 100 percent, i.e., overcommitted. 
    CPU Requests CPU Limits  Memory Requests Memory Limits 
    ------------ ----------  --------------- ------------- 
    20m (1%)  0 (0%)   0 (0%)   0 (0%) 

Is there anything specific I need to do after a system reboot to bring the node back online?

How did you deploy/create this cluster? Can you check whether the other nodes are also running docker 1.13.x or whether they are still on 1.12.x?
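
One possible way to compare runtime versions across nodes, as a sketch only: runtime_minor is a hypothetical helper name, and the input format is assumed to be the "docker://1.13.0" style string that kubectl describe node prints under Container Runtime Version.

```shell
#!/bin/sh
# Hypothetical helper: print the major.minor part of a runtime string
# such as "docker://1.13.0".
runtime_minor() {
    version=${1#*://}          # strip the "docker://" scheme prefix
    echo "$version" | cut -d. -f1,2
}

# List every node with its runtime version (needs cluster access):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'
```

Feeding each reported version to runtime_minor makes it easy to spot a node that is on 1.13 while the others are still on 1.12.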

My team was able to solve this by downgrading docker to 1.12. It appears the problem is related to this issue:

https://github.com/kubernetes/kubernetes/issues/40182

After downgrading docker to 1.12, everything worked.
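
As far as I understand it, the linked issue concerns Docker 1.13 setting the default policy of the iptables FORWARD chain to DROP, which would match the symptom of requests working on the node but timing out from elsewhere. To check whether a node is in that state, something like the sketch below may help. forward_policy is a hypothetical helper name; it assumes the usual iptables -S output, where the policy line looks like "-P FORWARD DROP".

```shell
#!/bin/sh
# Hypothetical helper: extract the default policy from the text printed by
# `iptables -S FORWARD`, e.g. "-P FORWARD DROP" -> "DROP".
forward_policy() {
    printf '%s\n' "$1" | awk '$1 == "-P" && $2 == "FORWARD" { print $3; exit }'
}

# On the affected node (iptables requires root):
#   docker version --format '{{.Server.Version}}'
#   forward_policy "$(iptables -S FORWARD)"
```

If this prints DROP on a node running docker 1.13 and ACCEPT on the nodes still on 1.12, that would be consistent with the explanation above.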