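Kubernetes node not responding after reboot
Problem description:
I have a Kubernetes cluster with one master and four nodes. kube-proxy was working fine on all four nodes, and I could reach a service through any node regardless of where it was running; i.e., http://node1:30000 through http://node4:30000 all gave the same response.
After rebooting node4 by running shutdown -r now, it came back up, but I noticed that the node no longer responds to requests. I ran the following command: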
curl http://node4:30000
If I run it from my PC, or from another machine in the cluster (node1 through node3, or the master), I get:
curl: (7) Failed to connect to node4 port 30000: Connection timed out
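However, if I run it from node4 itself, it responds successfully. This leads me to believe that kube-proxy is running fine, but that something is blocking connections from outside the node.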
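The comparison above (works locally on node4, times out from everywhere else) can be repeated with a small script. This is only a sketch: the helper names can_connect and probe_nodes are made up for illustration, and the host names and port 30000 are taken from the question.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name-resolution failures.
        return False


def probe_nodes(hosts, port, timeout=5.0):
    """Map each host name to whether the port accepted a TCP connection."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Calling probe_nodes(["node1", "node2", "node3", "node4"], 30000) once from an outside machine and once on node4 itself should show whether the failure depends on where the request originates. Note that this only tests whether the TCP connection opens, not whether the service returns a valid HTTP response.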
When I run kubectl describe node node4, the output looks normal:
Name: node4
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=node4
Taints: <none>
CreationTimestamp: Tue, 21 Feb 2017 15:21:17 -0400
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Wed, 22 Feb 2017 08:03:40 -0400 Tue, 21 Feb 2017 15:21:18 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Wed, 22 Feb 2017 08:03:40 -0400 Tue, 21 Feb 2017 15:21:18 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 22 Feb 2017 08:03:40 -0400 Tue, 21 Feb 2017 15:21:18 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Wed, 22 Feb 2017 08:03:40 -0400 Tue, 21 Feb 2017 15:21:28 -0400 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses: 10.6.81.64,10.6.81.64,node4
Capacity:
alpha.kubernetes.io/nvidia-gpu: 0
cpu: 2
memory: 4028748Ki
pods: 110
Allocatable:
alpha.kubernetes.io/nvidia-gpu: 0
cpu: 2
memory: 4028748Ki
pods: 110
System Info:
Machine ID: dbc0bb6ba10acae66b1061f958220ade
System UUID: 4229186F-AA5C-59CE-E5A2-258C1BBE9D2C
Boot ID: a3968e6c-eba3-498c-957f-f29283af1cff
Kernel Version: 4.4.0-63-generic
OS Image: Ubuntu 16.04.1 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.13.0
Kubelet Version: v1.5.2
Kube-Proxy Version: v1.5.2
ExternalID: node4
Non-terminated Pods: (27 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
<< application pods listed here >>
kube-system kube-proxy-0p3lj 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system weave-net-uqmr1 20m (1%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
20m (1%) 0 (0%) 0 (0%) 0 (0%)
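Is there anything specific I need to do to bring a node back online after a system reboot?
Answer
My team was able to resolve this by downgrading Docker to 1.12. The problem appears to be related to this issue: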
https://github.com/kubernetes/kubernetes/issues/40182
After downgrading Docker to 1.12, everything worked.
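As an assumption not stated in this thread: the linked issue is, as far as I understand it, about Docker 1.13 switching the default policy of the iptables FORWARD chain to DROP, which would match the symptom of local requests succeeding while forwarded ones time out. A minimal sketch for checking that policy on a node follows; the function names are hypothetical, and the parser expects the output format of iptables -S FORWARD.

```python
import subprocess


def forward_policy(rules_text: str):
    """Return the FORWARD chain's default policy from `iptables -S FORWARD` output, or None."""
    for line in rules_text.splitlines():
        parts = line.split()
        # The policy line looks like: -P FORWARD DROP
        if len(parts) == 3 and parts[0] == "-P" and parts[1] == "FORWARD":
            return parts[2]
    return None


def read_forward_rules() -> str:
    """Run `iptables -S FORWARD` on the local machine (typically requires root)."""
    result = subprocess.run(
        ["iptables", "-S", "FORWARD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On the affected node, forward_policy(read_forward_rules()) returning "DROP" would be consistent with that explanation, though it does not prove it. Comparing the result with a healthy node is probably more informative than the value alone.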
How did you deploy/create this cluster? Can you check whether the other nodes are also running Docker 1.13.x, or whether they are still on 1.12.x?
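One way to answer the version question is to list each node's container runtime and flag those on Docker 1.13 or later. The sketch below assumes the listing was produced with something like kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name,RUNTIME:.status.nodeInfo.containerRuntimeVersion, which prints one "name runtime" pair per line in the same docker://1.13.0 form shown in the describe output. The function names are hypothetical.

```python
import re

# Matches runtime strings such as "docker://1.13.0" and captures major and minor.
_DOCKER_VERSION = re.compile(r"^docker://(\d+)\.(\d+)")


def docker_major_minor(runtime: str):
    """Return (major, minor) for a docker:// runtime string, or None for anything else."""
    match = _DOCKER_VERSION.match(runtime.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def nodes_on_docker_1_13_or_later(listing: str):
    """Return the names of nodes whose Docker version is 1.13 or newer."""
    flagged = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        name, runtime = fields
        version = docker_major_minor(runtime)
        if version is not None and version >= (1, 13):
            flagged.append(name)
    return flagged
```

Versions are compared as integer tuples rather than strings so that, for example, 1.9 sorts before 1.13.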