Kubernetes Cluster Monitoring with Prometheus Operator

What is Prometheus Operator?

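Prometheus is an open-source combination of system monitoring, alerting, and a time-series database. Prometheus Operator, open-sourced by CoreOS, is a controller for managing Prometheus on Kubernetes clusters. Its purpose is to simplify deploying, managing, and running Prometheus and Alertmanager clusters on Kubernetes.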

Features

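  • Create/Destroy: easily launch a Prometheus instance for your Kubernetes namespace, a specific application, or a team using the Operator.

  • Simple configuration: configure the fundamentals of Prometheus, such as version, persistence, retention policy, and replicas, from a native Kubernetes resource.

  • Target services via labels: monitoring target configuration is generated automatically from familiar Kubernetes label queries; there is no need to learn a Prometheus-specific configuration language.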
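Label-based targeting is expressed through a ServiceMonitor resource. A minimal sketch, assuming a hypothetical application whose Service in the default namespace carries the label app: example-app and exposes a port named web (the name, labels, and port are all illustrative):

```yaml
# Hypothetical ServiceMonitor: every Service labeled app=example-app
# in the "default" namespace becomes a scrape target.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app            # hypothetical name
  namespace: monitoring
  labels:
    team: frontend             # hypothetical label; a Prometheus resource can select ServiceMonitors by it
spec:
  selector:
    matchLabels:
      app: example-app         # plain Kubernetes label query on Services
  namespaceSelector:
    matchNames:
    - default                  # namespace(s) searched for matching Services
  endpoints:
  - port: web                  # name of the Service port to scrape (assumed)
    path: /metrics
    interval: 30s
```

The Operator turns this into a scrape configuration entry in the Prometheus config it generates, so no hand-written Prometheus scrape or relabeling rules are needed.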

Prometheus Operator vs. kube-prometheus

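  • The Prometheus Operator makes Prometheus configuration Kubernetes-native and manages and operates Prometheus and Alertmanager clusters. It is one piece of the puzzle of complete end-to-end monitoring.

  • kube-prometheus combines the Prometheus Operator with a collection of manifests to help you get started monitoring Kubernetes itself and the applications running on it.

  • kube-prometheus is not versioned and is released at the same pace as the Prometheus Operator. The release notes describe only changes to the Operator, and the release archives contain only the matching changes to the Operator code. For changes to kube-prometheus, always refer to the master branch of the prometheus-operator repository.

  • kube-prometheus is a separate project and will have its own repository in the future [1].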

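Prometheus Operator custom resources

  • Prometheus: defines a desired Prometheus deployment. The Operator always ensures that a deployment matching the resource definition is running.

  • ServiceMonitor: declaratively specifies how groups of services should be monitored. The Operator automatically generates Prometheus scrape configuration from the definition.

  • PrometheusRule: defines a desired Prometheus rule file containing Prometheus alerting and recording rules, which can be loaded by a Prometheus instance.

  • Alertmanager: defines a desired Alertmanager deployment. The Operator always ensures that a deployment matching the resource definition is running.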
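To show how these resources fit together, here is a sketch of a Prometheus resource that loads ServiceMonitors labeled team: frontend and PrometheusRules labeled role: alert-rules, plus one such rule. All names and labels are hypothetical; the service account prometheus-k8s and the alertmanager-main Service with a port named web are assumed to exist (kube-prometheus creates objects with these names).

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example                      # Operator creates StatefulSet "prometheus-example"
  namespace: monitoring
spec:
  replicas: 2
  retention: 24h                     # how long samples are kept
  serviceAccountName: prometheus-k8s # assumed to exist with discovery/scrape RBAC
  serviceMonitorSelector:
    matchLabels:
      team: frontend                 # which ServiceMonitors to load
  ruleSelector:
    matchLabels:
      role: alert-rules              # which PrometheusRules to load
  alerting:
    alertmanagers:
    - namespace: monitoring
      name: alertmanager-main        # Service in front of the Alertmanager pods (assumed)
      port: web
  resources:
    requests:
      memory: 400Mi
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules
  namespace: monitoring
  labels:
    role: alert-rules                # matched by ruleSelector above
spec:
  groups:
  - name: example.rules
    rules:
    - alert: ExampleAppDown
      # Assumes the job label equals the scraped Service's name, "example-app"
      expr: up{job="example-app"} == 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "An example-app target has been down for 5 minutes"
```

Selecting by label keeps the Prometheus resource stable: new teams add ServiceMonitors or rules with the right label instead of editing Prometheus itself.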

Prometheus Operator on GitHub:

https://github.com/coreos/prometheus-operator.git

kube-prometheus has since moved to coreos/kube-prometheus:
https://github.com/coreos/kube-prometheus.git

Location of all the kube-prometheus YAML manifests:
https://github.com/coreos/prometheus-operator/contrib/kube-prometheus/manifests
now moved to
https://github.com/coreos/kube-prometheus/manifests

https://github.com/coreos/
Edit bundle.yaml in the prometheus-operator-0.23.2 directory and set every namespace field to monitoring:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-operator
subjects:
- kind: ServiceAccount
  name: prometheus-operator
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-operator
rules:
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - '*'
- apiGroups:
  - monitoring.coreos.com
  resources:
  - alertmanagers
  - prometheuses
  - prometheuses/finalizers
  - alertmanagers/finalizers
  - servicemonitors
  - prometheusrules
  verbs:
  - '*'
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - list
  - delete
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  verbs:
  - get
  - create
  - update
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - list
  - watch
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: prometheus-operator
  name: prometheus-operator
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: prometheus-operator
  template:
    metadata:
      labels:
        k8s-app: prometheus-operator
    spec:
      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --logtostderr=true
        - --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1
        - --prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0.23.2
        image: quay.io/coreos/prometheus-operator:v0.23.2
        name: prometheus-operator
        ports:
        - containerPort: 8080
          name: http
        resources:
          limits:
            cpu: 200m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
      nodeSelector:
        beta.kubernetes.io/os: linux
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
      serviceAccountName: prometheus-operator
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-operator
  namespace: monitoring
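Every object in the bundle now points at the monitoring namespace, and the ServiceAccount and Deployment cannot be created until that namespace exists. If it is missing, it can be created first, for example with this manifest (equivalent to kubectl create namespace monitoring):

```yaml
# Namespace referenced by the ServiceAccount, the Deployment and the
# ClusterRoleBinding subject in bundle.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
```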

Create the resources:

kubectl create -f bundle.yaml 

Deploy kube-prometheus:

kubectl create -f prometheus-operator/contrib/kube-prometheus/manifests

Check the resources in the monitoring namespace:

$ kubectl get all -n monitoring
NAME                                       READY   STATUS             RESTARTS   AGE
pod/alertmanager-main-0                    2/2     Running            0          3h53m
pod/alertmanager-main-1                    2/2     Running            0          3h52m
pod/alertmanager-main-2                    2/2     Running            0          3h52m
pod/grafana-5c54dbc48b-jvhcd               1/1     Running            0          3h54m
pod/kube-state-metrics-fd9b964d5-srwkp     4/4     Running            0          3h49m
pod/node-exporter-5ndbs                    2/2     Running            0          3h54m
pod/node-exporter-nts45                    2/2     Running            0          3h54m
pod/node-exporter-pxtw5                    2/2     Running            0          3h54m
pod/node-exporter-tvntn                    1/2     CrashLoopBackOff   47         3h54m
pod/node-exporter-wb7sx                    2/2     Running            0          3h54m
pod/prometheus-k8s-0                       3/3     Running            1          3h53m
pod/prometheus-k8s-1                       3/3     Running            1          3h49m
pod/prometheus-operator-76599f4b8c-zm5wl   1/1     Running            0          3h50m

NAME                            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/alertmanager-main       NodePort    10.99.5.67      <none>        9093:30662/TCP      3h54m
service/alertmanager-operated   ClusterIP   None            <none>        9093/TCP,6783/TCP   3h53m
service/grafana                 NodePort    10.106.129.28   <none>        3000:31844/TCP      3h54m
service/kube-state-metrics      ClusterIP   None            <none>        8443/TCP,9443/TCP   3h54m
service/node-exporter           ClusterIP   None            <none>        9100/TCP            3h54m
service/prometheus-k8s          NodePort    10.99.129.143   <none>        9090:31144/TCP      3h54m
service/prometheus-operated     ClusterIP   None            <none>        9090/TCP            3h53m
service/prometheus-operator     ClusterIP   None            <none>        8080/TCP            3h54m

NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
daemonset.apps/node-exporter   5         5         4       5            4           beta.kubernetes.io/os=linux   3h54m

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/grafana               1/1     1            1           3h54m
deployment.apps/kube-state-metrics    1/1     1            1           3h54m
deployment.apps/prometheus-operator   1/1     1            1           3h54m

NAME                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/grafana-5c54dbc48b               1         1         1       3h54m
replicaset.apps/kube-state-metrics-6f5c6d88d5    0         0         0       3h54m
replicaset.apps/kube-state-metrics-fd9b964d5     1         1         1       3h53m
replicaset.apps/prometheus-operator-76599f4b8c   1         1         1       3h54m
replicaset.apps/prometheus-operator-f9fcb78bd    0         0         0       3h50m

NAME                                 READY   AGE
statefulset.apps/alertmanager-main   3/3     3h53m
statefulset.apps/prometheus-k8s      2/2     3h53m

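Change the access method (access from outside the cluster)
Switch the services to the NodePort type.
Edit each one with kubectl edit svc [svcname] -n monitoring.
The services to change are alertmanager-main, grafana, and prometheus-k8s.
Example: kubectl edit svc alertmanager-main -n monitoring
The edited grafana Service, for instance, looks like this: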

apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  type: NodePort      # added
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 30100   # added
  selector:
    app: grafana
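
After the change, list the services again. The node ports in this output (for example 31844 for grafana, rather than the 30100 from the example) are the ones used in the rest of this article:

$ kubectl get svc -n monitoring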
NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
alertmanager-main       NodePort    10.99.5.67      <none>        9093:30662/TCP      3h55m
alertmanager-operated   ClusterIP   None            <none>        9093/TCP,6783/TCP   3h54m
grafana                 NodePort    10.106.129.28   <none>        3000:31844/TCP      3h55m
kube-state-metrics      ClusterIP   None            <none>        8443/TCP,9443/TCP   3h55m
node-exporter           ClusterIP   None            <none>        9100/TCP            3h55m
prometheus-k8s          NodePort    10.99.129.143   <none>        9090:31144/TCP      3h55m
prometheus-operated     ClusterIP   None            <none>        9090/TCP            3h54m
prometheus-operator     ClusterIP   None            <none>        8080/TCP            3h55m
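
As a non-interactive alternative to kubectl edit, the same change can be applied as a strategic merge patch. A sketch for prometheus-k8s; the file name nodeport-patch.yaml and the fixed nodePort 30090 are hypothetical, and the value must lie inside the cluster's NodePort range (30000-32767 by default):

```yaml
# nodeport-patch.yaml (hypothetical): switch prometheus-k8s to NodePort with a fixed port
spec:
  type: NodePort
  ports:
  - port: 9090        # existing Service port; list entries are merged on this key
    nodePort: 30090   # hypothetical fixed node port
```

It could then be applied with kubectl patch svc prometheus-k8s -n monitoring -p "$(cat nodeport-patch.yaml)", since kubectl patch accepts the patch body as YAML as well as JSON. Omitting nodePort lets Kubernetes pick a free port, as happened for the services above.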

Access Prometheus on node port 31144, for example http://118.31.17.205:31144/graph
Visit http://118.31.17.205:31144/targets to confirm that Prometheus has connected to the Kubernetes apiserver.
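The apiserver target comes from a ServiceMonitor shipped with kube-prometheus. The sketch below approximates that resource, so compare it with the actual file in your manifests directory. It selects the built-in kubernetes Service in the default namespace, which carries the labels component=apiserver and provider=kubernetes:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-apiserver
  namespace: monitoring
  labels:
    k8s-app: apiserver
spec:
  jobLabel: component              # job name taken from the Service's "component" label
  selector:
    matchLabels:
      component: apiserver
      provider: kubernetes
  namespaceSelector:
    matchNames:
    - default                      # the kubernetes Service lives in "default"
  endpoints:
  - port: https
    scheme: https
    interval: 30s
    # Authenticate with the Prometheus pod's service account token
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      serverName: kubernetes
```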
Access alertmanager-main on node port 30662, for example http://118.31.17.205:30662
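The alertmanager-main StatefulSet and its three pods in the earlier output are created by the Operator from an Alertmanager resource named main; the Operator prefixes the StatefulSet name with alertmanager-. A minimal sketch, in which the service account name is an assumption:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: main                             # -> StatefulSet alertmanager-main
  namespace: monitoring
spec:
  replicas: 3                            # -> pods alertmanager-main-0, -1, -2
  serviceAccountName: alertmanager-main  # assumed to exist
```

The Alertmanager configuration itself is read from a Secret named alertmanager-main (key alertmanager.yaml) in the same namespace.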
View service discovery at http://118.31.17.205:31144/service-discovery
Access Grafana on node port 31844, for example http://118.31.17.205:31844
Log in with the initial username and password (both admin).
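Add a data source
Grafana already has a Prometheus data source configured by default, so it can be used directly. Grafana supports many time-series data sources, each with its own query editor.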
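Grafana has this data source without a manual step because kube-prometheus configures it through Grafana's file-based provisioning. A sketch of such a provisioning file, assuming Grafana reaches Prometheus through the in-cluster prometheus-k8s Service on port 9090 listed earlier; the data source name is illustrative:

```yaml
# Grafana datasource provisioning file (provisioning format apiVersion 1)
apiVersion: 1
datasources:
- name: prometheus
  type: prometheus
  access: proxy                  # Grafana's backend queries Prometheus, not the browser
  orgId: 1
  url: http://prometheus-k8s.monitoring.svc:9090
  isDefault: true
  editable: false
```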
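Import a dashboard: enter the template ID 315 to import it online, or download the corresponding JSON template file and import it locally.
Dashboard templates on grafana.com:
https://grafana.com/dashboards/315
https://grafana.com/dashboards/8919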

Once a dashboard is imported, its monitoring data is displayed. Click Home to choose a dashboard; Grafana also ships with a set of predefined dashboards:
View cluster monitoring information

References

https://github.com/coreos/prometheus-operator