prometheus-operator monitoring alert
Alert:
KubeControllerManagerDown
message:
KubeControllerManager has disappeared from Prometheus target discovery.
Following the ServiceMonitor -> Service -> Endpoints (pod) service discovery chain, we can see that KubeControllerManager has no corresponding Service, so we need to create one.
Looking at the kube-controller-manager ServiceMonitor shows the labels and the port name it expects:
# kubectl get servicemonitor kube-controller-manager -n monitoring -o yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: 2019-02-27T08:16:09Z
  generation: 1
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: monitoring
  resourceVersion: "15981895"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/servicemonitors/kube-controller-manager
  uid: effe150e-3a67-11e9-a3ab-00163f007b7a
spec:
  endpoints:
  - interval: 30s
    metricRelabelings:
    - action: drop
      regex: etcd_(debugging|disk|request|server).*
      sourceLabels:
      - __name__
    port: http-metrics
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-controller-manager
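The Service therefore has to live in the kube-system namespace, carry the label k8s-app: kube-controller-manager, and name its port http-metrics, so the configuration becomes: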
kind: Service
apiVersion: v1
metadata:
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
  namespace: kube-system
spec:
  clusterIP: None
  ports:
  - protocol: TCP
    port: 10252
    targetPort: 10252
    name: http-metrics
  selector:
    component: kube-controller-manager
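Once the Service has been applied, it may help to check that its selector really matches the controller-manager pod and that Endpoints were generated for it. The following is a minimal sketch, assuming kubectl is configured for the cluster; the variable names are only for illustration and the values are taken from the Service manifest above.

```shell
#!/bin/sh
# Values copied from the Service manifest; the variable names are illustrative.
ns="kube-system"
svc="kube-controller-manager"
selector="component=kube-controller-manager"

if command -v kubectl >/dev/null 2>&1; then
    # Pods that the Service selector is expected to match.
    kubectl -n "$ns" get pods -l "$selector" -o wide
    # A non-empty ENDPOINTS column means the Service resolved to a pod IP.
    kubectl -n "$ns" get endpoints "$svc"
else
    echo "kubectl not found; run this on a machine with cluster access" >&2
fi
```

If the ENDPOINTS column is empty, the selector label probably does not match the labels on the pod.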
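If the target then reports this error:
Get http://172.20.3.140:10252/metrics/: dial tcp 172.20.3.140:10252: connect: connection refused
Checking the port on the service itself shows that it is listening on 127.0.0.1, so we only need to change the address KubeControllerManager binds to.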
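One way to confirm the bind address on the control-plane node is to filter the output of ss -lnt by port. The sketch below is only an illustration: listen_addrs is a hypothetical helper name, and it is fed sample rows here instead of live output so it can be run anywhere.

```shell
#!/bin/sh
# listen_addrs PORT: read `ss -lnt` style rows on stdin and print the local
# address of every listening socket bound to PORT.
# (Hypothetical helper, for illustration only.)
listen_addrs() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            n = split($4, parts, ":")   # the last ":"-separated piece is the port
            if (parts[n] == port) print $4
        }'
}

# Sample rows in the column layout of `ss -lnt`:
# State Recv-Q Send-Q Local-Address:Port Peer-Address:Port
sample='LISTEN 0 128 127.0.0.1:10252 0.0.0.0:*
LISTEN 0 128 *:10250 *:*'

result="$(printf '%s\n' "$sample" | listen_addrs 10252)"
echo "$result"

# On a real node: ss -lnt | listen_addrs 10252
```

A result of 127.0.0.1:10252 means the metrics port is reachable only from the node itself, which is why Prometheus gets connection refused on the node IP.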
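How to change it on a kubeadm cluster
On the host, find the kube-controller-manager manifest under /etc/kubernetes/manifests and change address to 0.0.0.0. The kubelet watches this directory and reloads the configuration automatically, so no manual restart is needed.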
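The edit can also be scripted. This is a sketch under assumptions: the manifest is assumed to contain a line of the form --address=127.0.0.1 (or --bind-address=127.0.0.1 on newer versions), patch_bind_address is a hypothetical helper name, and the demo runs against a temporary sample file rather than the real manifest. It deliberately avoids leaving a backup copy next to the manifest, because the kubelet treats the files in its manifests directory as static pod definitions.

```shell
#!/bin/sh
# patch_bind_address FILE: replace a loopback bind address with 0.0.0.0 on the
# --address / --bind-address flag line. (Hypothetical helper, not part of kubeadm.)
patch_bind_address() {
    manifest="$1"
    tmp="$(mktemp)" || return 1
    if sed -E '/--(bind-)?address=/ s/=127\.0\.0\.1[[:space:]]*$/=0.0.0.0/' \
            "$manifest" > "$tmp"; then
        # Write back in place so the owner and mode of the manifest are kept.
        cat "$tmp" > "$manifest"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Demo on a temporary sample file (assumed layout of the command section).
demo="$(mktemp)"
cat > "$demo" <<'EOF'
spec:
  containers:
  - command:
    - kube-controller-manager
    - --address=127.0.0.1
    - --leader-elect=true
EOF

patch_bind_address "$demo"
grep -- 'address=' "$demo"
```

On a kubeadm control-plane node the file is normally /etc/kubernetes/manifests/kube-controller-manager.yaml, and the helper would be called with that path as root.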
Result after the fix