k8s Practice 17: Deploying, Configuring, and Testing the Monitoring Tool Prometheus with Helm


1. Deploying Helm

Helm deployment reference

Grafana and Prometheus will be deployed with Helm later, so Helm must be installed first and verified to be working.

The Helm client installation steps are shown below:

[root@k8s-node1 helm]# curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 > get_helm.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6617  100  6617    0     0   5189      0  0:00:01  0:00:01 --:--:--  5193
[root@k8s-node1 helm]# ls
get_helm.sh
[root@k8s-node1 helm]# chmod 700 get_helm.sh
[root@k8s-node1 helm]# ./get_helm.sh 
Downloading https://get.helm.sh/helm-v3.0.2-linux-amd64.tar.gz
Preparing to install helm into /usr/local/bin
helm installed into /usr/local/bin/helm
[root@k8s-node1 helm]# helm version
version.BuildInfo{Version:"v3.0.2", GitCommit:"19e47ee3283ae98139d98460de796c1be1e3975f", GitTreeState:"clean", GoVersion:"go1.13.5"}

Add the stable chart repository:

[root@k8s-node1 helm]# helm repo add stable https://kubernetes-charts.storage.googleapis.com/
"stable" has been added to your repositories

Search the repository to see what is available:

[root@k8s-node1 helm]# helm search repo stable |grep grafana
stable/grafana                          4.2.2           6.5.2                   The leading tool for querying and visualizing t...
[root@k8s-node1 helm]# helm search repo stable |grep prometheus
stable/helm-exporter                    0.3.1           0.4.0                   Exports helm release stats to prometheus          
stable/prometheus                       9.7.2           2.13.1                  Prometheus is a monitoring system and time seri...
stable/prometheus-adapter               1.4.0           v0.5.0                  A Helm chart for k8s prometheus adapter           
stable/prometheus-blackbox-exporter     1.6.0           0.15.1                  Prometheus Blackbox Exporter                      
stable/prometheus-cloudwatch-exporter   0.5.0           0.6.0                   A Helm chart for prometheus cloudwatch-exporter   
stable/prometheus-consul-exporter       0.1.4           0.4.0                   A Helm chart for the Prometheus Consul Exporter   
stable/prometheus-couchdb-exporter      0.1.1           1.0                     A Helm chart to export the metrics from couchdb...
stable/prometheus-mongodb-exporter      2.4.0           v0.10.0                 A Prometheus exporter for MongoDB metrics         
stable/prometheus-mysql-exporter        0.5.2           v0.11.0                 A Helm chart for prometheus mysql exporter with...
stable/prometheus-nats-exporter         2.3.0           0.6.0                   A Helm chart for prometheus-nats-exporter         
stable/prometheus-node-exporter         1.8.1           0.18.1                  A Helm chart for prometheus node-exporter         
stable/prometheus-operator              8.5.0           0.34.0                  Provides easy monitoring definitions for Kubern...
stable/prometheus-postgres-exporter     1.1.1           0.5.1                   A Helm chart for prometheus postgres-exporter     
stable/prometheus-pushgateway           1.2.10          1.0.1                   A Helm chart for prometheus pushgateway           
stable/prometheus-rabbitmq-exporter     0.5.5           v0.29.0                 Rabbitmq metrics exporter for prometheus          
stable/prometheus-redis-exporter        3.2.0           1.0.4                   Prometheus exporter for Redis metrics             
stable/prometheus-snmp-exporter         0.0.4           0.14.0                  Prometheus SNMP Exporter                          
stable/prometheus-to-sd                 0.3.0           0.5.2                   Scrape metrics stored in prometheus format and ...

Deploy a test application:

[root@k8s-node1 helm]# helm install stable/nginx-ingress --generate-name
NAME: nginx-ingress-1577092943
LAST DEPLOYED: Mon Dec 23 17:22:26 2019
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The nginx-ingress controller has been installed.
It may take a few minutes for the LoadBalancer IP to be available.
You can watch the status by running 'kubectl --namespace default get services -o wide -w nginx-ingress-1577092943-controller'
[root@k8s-node1 helm]# helm ls
NAME                        NAMESPACE   REVISION    UPDATED                                 STATUS      CHART                   APP VERSION
nginx-ingress-1577092943    default     1           2019-12-23 17:22:26.230661264 +0800 CST deployed    nginx-ingress-1.27.0    0.26.1  

Everything is up, as shown below:

[root@k8s-node1 helm]# kubectl get all |grep nginx
pod/nginx-ingress-1577092943-controller-8468884448-9wszl        1/1     Running   0          4m49s
pod/nginx-ingress-1577092943-default-backend-74c4db5b5b-clc2s   1/1     Running   0          4m49s
service/nginx-ingress-1577092943-controller        LoadBalancer   10.254.229.168        80:8691/TCP,443:8569/TCP   4m49s
service/nginx-ingress-1577092943-default-backend   ClusterIP      10.254.37.89             80/TCP                     4m49s
deployment.apps/nginx-ingress-1577092943-controller        1/1     1            1           4m49s
deployment.apps/nginx-ingress-1577092943-default-backend   1/1     1            1           4m49s
replicaset.apps/nginx-ingress-1577092943-controller-8468884448        1         1         1       4m49s
replicaset.apps/nginx-ingress-1577092943-default-backend-74c4db5b5b   1         1         1       4m49s

The deployment works as expected, so remove the test release.

[root@k8s-node1 helm]# helm ls
NAME                        NAMESPACE   REVISION    UPDATED                                 STATUS      CHART                   APP VERSION
nginx-ingress-1577092943    default     1           2019-12-23 17:22:26.230661264 +0800 CST deployed    nginx-ingress-1.27.0    0.26.1     
[root@k8s-node1 helm]# helm uninstall nginx-ingress-1577092943
release "nginx-ingress-1577092943" uninstalled

2. Deploying Prometheus with Helm

Reference: deploying Prometheus with Helm

Prometheus official site
Prometheus documentation

2.1. Start the deployment

[root@k8s-node1 ~]# helm install stable/prometheus --generate-name
NAME: prometheus-1577239571
LAST DEPLOYED: Wed Dec 25 10:06:14 2019
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Prometheus server can be accessed via port 80 on the following DNS name from within your cluster:
prometheus-1577239571-server.default.svc.cluster.local

2.2. Problems encountered

Check the services and pods that were started:

[root@k8s-node1 ~]# kubectl get svc,pod -o wide|grep prometheus
service/prometheus-1577239571-alertmanager         ClusterIP   10.254.251.30            80/TCP          2m26s   app=prometheus,component=alertmanager,release=prometheus-1577239571
service/prometheus-1577239571-kube-state-metrics   ClusterIP   None                     80/TCP          2m26s   app=prometheus,component=kube-state-metrics,release=prometheus-1577239571
service/prometheus-1577239571-node-exporter        ClusterIP   None                     9100/TCP        2m26s   app=prometheus,component=node-exporter,release=prometheus-1577239571
service/prometheus-1577239571-pushgateway          ClusterIP   10.254.188.166           9091/TCP        2m26s   app=prometheus,component=pushgateway,release=prometheus-1577239571
service/prometheus-1577239571-server               ClusterIP   10.254.128.74            80/TCP          2m26s   app=prometheus,component=server,release=prometheus-1577239571
pod/prometheus-1577239571-alertmanager-67b967b8c7-lmjf7         0/2     Pending   0          2m25s
pod/prometheus-1577239571-kube-state-metrics-6d86bf588b-w7hrq   1/1     Running   0          2m25s   172.30.4.7        k8s-node1
pod/prometheus-1577239571-node-exporter-k9bsf                   1/1     Running   0          2m25s   192.168.174.130   k8s-node3
pod/prometheus-1577239571-node-exporter-rv9k8                   1/1     Running   0          2m25s   192.168.174.129   k8s-node2
pod/prometheus-1577239571-node-exporter-xc8f2                   1/1     Running   0          2m25s   192.168.174.128   k8s-node1
pod/prometheus-1577239571-pushgateway-d9b4cb944-zppfm           1/1     Running   0          2m25s   172.30.26.7       k8s-node3
pod/prometheus-1577239571-server-c5d4dffbf-gzk9n                0/2     Pending   0          2m25s

Two pods stay in Pending state. Investigating with kubectl describe shows the following error:

Warning  FailedScheduling  25s (x5 over 4m27s)  default-scheduler  pod has unbound immediate PersistentVolumeClaims (repeated 3 times)

This is a PVC error, so check the PVCs:

[root@k8s-node1 templates]# kubectl get pvc |grep prometheus
prometheus-1577239571-alertmanager   Pending                                      21m
prometheus-1577239571-server         Pending                                      21m

Describing the PVCs gives the details: there is no available PV and no storage class is set, so the claims cannot be bound. The error is shown below:

 Normal  FailedBinding  16s (x82 over 20m)  persistentvolume-controller  no persistent volumes available for this claim and no storage class is set

What to do? This cluster already provisions PVCs dynamically from NFS storage, so can the chart be changed to use that NFS-backed storage?

See the earlier article for how the NFS-backed storage was set up; the storage class name is shown below:

[root@k8s-node1 templates]# kubectl get storageclass
NAME                  PROVISIONER      AGE
managed-nfs-storage   fuseim.pri/ifs   5d17h

2.3. Connecting the storage and fixing the error

Inspect the values of the stable/prometheus chart and look for the PV-related settings behind the error. Reference command:

helm show values stable/prometheus

Since the values need to be edited after inspection, pull the chart down locally and modify it there.

[root@k8s-node1 prometheus-grafana]# helm pull stable/prometheus
[root@k8s-node1 prometheus-grafana]# ls
prometheus-9.7.2.tgz
[root@k8s-node1 prometheus-grafana]# tar zxvf prometheus-9.7.2.tgz  --warning=no-timestamp
[root@k8s-node1 prometheus-grafana]# ls
prometheus  prometheus-9.7.2.tgz
[root@k8s-node1 prometheus-grafana]# tree prometheus
prometheus
├── Chart.yaml
├── README.md
├── templates
│   ├── alertmanager-clusterrolebinding.yaml
│   ├── alertmanager-clusterrole.yaml
│   ├── alertmanager-configmap.yaml
│   ├── alertmanager-deployment.yaml
│   ├── alertmanager-ingress.yaml
│   ├── alertmanager-networkpolicy.yaml
│   ├── alertmanager-pdb.yaml
│   ├── alertmanager-podsecuritypolicy.yaml
│   ├── alertmanager-pvc.yaml
│   ├── alertmanager-serviceaccount.yaml
│   ├── alertmanager-service-headless.yaml
│   ├── alertmanager-service.yaml
│   ├── alertmanager-statefulset.yaml
│   ├── _helpers.tpl
│   ├── kube-state-metrics-clusterrolebinding.yaml
│   ├── kube-state-metrics-clusterrole.yaml
│   ├── kube-state-metrics-deployment.yaml
│   ├── kube-state-metrics-networkpolicy.yaml
│   ├── kube-state-metrics-pdb.yaml
│   ├── kube-state-metrics-podsecuritypolicy.yaml
│   ├── kube-state-metrics-serviceaccount.yaml
│   ├── kube-state-metrics-svc.yaml
│   ├── node-exporter-daemonset.yaml
│   ├── node-exporter-podsecuritypolicy.yaml
│   ├── node-exporter-rolebinding.yaml
│   ├── node-exporter-role.yaml
│   ├── node-exporter-serviceaccount.yaml
│   ├── node-exporter-service.yaml
│   ├── NOTES.txt
│   ├── pushgateway-clusterrolebinding.yaml
│   ├── pushgateway-clusterrole.yaml
│   ├── pushgateway-deployment.yaml
│   ├── pushgateway-ingress.yaml
│   ├── pushgateway-networkpolicy.yaml
│   ├── pushgateway-pdb.yaml
│   ├── pushgateway-podsecuritypolicy.yaml
│   ├── pushgateway-pvc.yaml
│   ├── pushgateway-serviceaccount.yaml
│   ├── pushgateway-service.yaml
│   ├── server-clusterrolebinding.yaml
│   ├── server-clusterrole.yaml
│   ├── server-configmap.yaml
│   ├── server-deployment.yaml
│   ├── server-ingress.yaml
│   ├── server-networkpolicy.yaml
│   ├── server-pdb.yaml
│   ├── server-podsecuritypolicy.yaml
│   ├── server-pvc.yaml
│   ├── server-serviceaccount.yaml
│   ├── server-service-headless.yaml
│   ├── server-service.yaml
│   ├── server-statefulset.yaml
│   └── server-vpa.yaml
└── values.yaml

1 directory, 56 files

All of the chart's variables are defined in the values.yaml file, so examine that file.

It contains a great deal and has to be gone through item by item. One of the PV-related configuration blocks is shown below:

persistentVolume:
  ## If true, alertmanager will create/use a Persistent Volume Claim
  ## If false, use emptyDir
  ##
  enabled: true

  ## alertmanager data Persistent Volume access modes
  ## Must match those of existing PV or dynamic provisioner
  ## Ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
  ##
  accessModes:
    - ReadWriteOnce

  ## alertmanager data Persistent Volume Claim annotations
  ##
  annotations: {}

  ## alertmanager data Persistent Volume existing claim name
  ## Requires alertmanager.persistentVolume.enabled: true
  ## If defined, PVC must be created manually before volume will be bound
  existingClaim: ""

  ## alertmanager data Persistent Volume mount root path
  ##
  mountPath: /data

  ## alertmanager data Persistent Volume size
  ##
  size: 2Gi

  ## alertmanager data Persistent Volume Storage Class
  ## If defined, storageClassName: <storageClass>
  ## If set to "-", storageClassName: "", which disables dynamic provisioning
  ## If undefined (the default) or set to null, no storageClassName spec is
  ##   set, choosing the default provisioner.  (gp2 on AWS, standard on
  ##   GKE, AWS & OpenStack)
  ##
  # storageClass: "-"

  ## alertmanager data Persistent Volume Binding Mode
  ## If defined, volumeBindingMode: <volumeBindingMode>
  ## If undefined (the default) or set to null, no volumeBindingMode spec is
  ##   set, choosing the default mode.
  ##

From the comments above we can see that the chart defines a 2Gi PVC, and the parameter that binds the PVC to a dynamic provisioner is storageClass, which is commented out by default. Enable this parameter to point at the cluster's storage class.

Change # storageClass: "-" to storageClass: managed-nfs-storage (managed-nfs-storage is the name of the storage class configured in this cluster; three places need this change in total). The result is verified below:

[root@k8s-node1 prometheus-grafana]# cat prometheus/values.yaml  |grep -B 8  managed
    ## alertmanager data Persistent Volume Storage Class
    ## If defined, storageClassName: <storageClass>
    ## If set to "-", storageClassName: "", which disables dynamic provisioning
    ## If undefined (the default) or set to null, no storageClassName spec is
    ##   set, choosing the default provisioner.  (gp2 on AWS, standard on
    ##   GKE, AWS & OpenStack)
    ##
    # storageClass: "-"
    storageClass: managed-nfs-storage
--
    ## Prometheus server data Persistent Volume Storage Class
    ## If defined, storageClassName: <storageClass>
    ## If set to "-", storageClassName: "", which disables dynamic provisioning
    ## If undefined (the default) or set to null, no storageClassName spec is
    ##   set, choosing the default provisioner.  (gp2 on AWS, standard on
    ##   GKE, AWS & OpenStack)
    ##
    # storageClass: "-"
    storageClass: managed-nfs-storage
--
    ## pushgateway data Persistent Volume Storage Class
    ## If defined, storageClassName: <storageClass>
    ## If set to "-", storageClassName: "", which disables dynamic provisioning
    ## If undefined (the default) or set to null, no storageClassName spec is
    ##   set, choosing the default provisioner.  (gp2 on AWS, standard on
    ##   GKE, AWS & OpenStack)
    ##
    # storageClass: "-"
    storageClass: managed-nfs-storage
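
The article does not show the exact install command used after the edit; a minimal sketch is to install from the local, modified chart directory. Alternatively (assuming the standard value paths of the stable/prometheus chart), the storage class can be overridden at install time without editing values.yaml at all:

# Install the locally modified chart (release name is auto-generated)
helm install ./prometheus --generate-name

# Or override the three storageClass values on the command line instead of editing the file
helm install stable/prometheus --generate-name \
  --set alertmanager.persistentVolume.storageClass=managed-nfs-storage \
  --set server.persistentVolume.storageClass=managed-nfs-storage \
  --set pushgateway.persistentVolume.storageClass=managed-nfs-storage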

It works: after changing the storage class parameter, the installation succeeds, as shown below:

[root@k8s-node1 prometheus-grafana]# kubectl get svc,pod -o wide |grep prometheus
service/prometheus-1577263826-alertmanager         ClusterIP   10.254.112.105           80/TCP          4m6s   app=prometheus,component=alertmanager,release=prometheus-1577263826
service/prometheus-1577263826-kube-state-metrics   ClusterIP   None                     80/TCP          4m6s   app=prometheus,component=kube-state-metrics,release=prometheus-1577263826
service/prometheus-1577263826-node-exporter        ClusterIP   None                     9100/TCP        4m6s   app=prometheus,component=node-exporter,release=prometheus-1577263826
service/prometheus-1577263826-pushgateway          ClusterIP   10.254.185.145           9091/TCP        4m6s   app=prometheus,component=pushgateway,release=prometheus-1577263826
service/prometheus-1577263826-server               ClusterIP   10.254.132.104           80/TCP          4m6s   app=prometheus,component=server,release=prometheus-1577263826
pod/prometheus-1577263826-alertmanager-5cfccc55b7-6hdqn         2/2     Running   0          4m5s    172.30.26.8       k8s-node3
pod/prometheus-1577263826-kube-state-metrics-697db589d4-d5rmm   1/1     Running   0          4m5s    172.30.26.7       k8s-node3
pod/prometheus-1577263826-node-exporter-5gcc2                   1/1     Running   0          4m5s    192.168.174.129   k8s-node2
pod/prometheus-1577263826-node-exporter-b569p                   1/1     Running   0          4m5s    192.168.174.130   k8s-node3
pod/prometheus-1577263826-node-exporter-mft6l                   1/1     Running   0          4m5s    192.168.174.128   k8s-node1
pod/prometheus-1577263826-pushgateway-95c67bd5d-28p25           1/1     Running   0          4m5s    172.30.4.7        k8s-node1
pod/prometheus-1577263826-server-88fbdfc47-p2bfm                2/2     Running   0          4m5s    172.30.4.8        k8s-node1

2.4. Prometheus basic concepts

What each of these Prometheus components does

Source reference

prometheus server

Prometheus Server is the core component of Prometheus; it is responsible for collecting, storing, and querying monitoring data.

Prometheus Server also ships with a built-in expression browser UI, through which data can be queried and visualized directly using PromQL.
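
For example, the following PromQL expressions (illustrative only; the metric name comes from node_exporter and may differ by exporter version) can be run directly in the expression browser:

# Per-node CPU busy fraction over the last 5 minutes
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

# Scrape targets that are currently up, grouped by job
sum by (job) (up)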

node-exporter

An exporter exposes the metrics it collects as an HTTP endpoint; Prometheus Server scrapes that endpoint to obtain the monitoring data it needs.
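
The node-exporter pods in this release serve their metrics on port 9100 (see the node-exporter service above), so the endpoint can be spot-checked with a plain curl, assuming that port is reachable from where you run it:

# The /metrics endpoint returns plain-text samples in the Prometheus exposition format
curl -s http://192.168.174.128:9100/metrics | head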

alertmanager

Prometheus Server supports alerting rules defined with PromQL: when a rule's expression is satisfied, an alert is fired, and the subsequent handling of that alert is managed by Alertmanager. Alertmanager integrates with built-in notification channels such as email and Slack, and custom handling can be added through webhooks. It is the alert-handling hub of the Prometheus ecosystem.
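
A minimal rule, in standard Prometheus rule-file syntax, looks like this (illustrative sketch; the names are made up and not part of this deployment):

# Fire when any scrape target has been down for 5 minutes
groups:
  - name: example-rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} has been down for more than 5 minutes"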

pushgateway

Because Prometheus collects data with a pull model, the network must allow Prometheus Server to reach every exporter directly. When that is not possible, the Pushgateway can act as an intermediary: jobs on the isolated network push their metrics to the Pushgateway, and Prometheus Server then pulls the metrics from the Pushgateway in the usual way.

This environment does not need it.
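
For completeness, a push is just an HTTP POST of text-format samples; a sketch against the pushgateway service created above (demo_metric and demo_job are made-up names, and this is not used further in this article):

# Push one sample for job "demo_job" to the Pushgateway service
echo "demo_metric 42" | curl --data-binary @- \
  http://prometheus-1577263826-pushgateway.default.svc.cluster.local:9091/metrics/job/demo_job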

kube-state-metrics

The basic idea: kube-state-metrics polls the Kubernetes API and turns structured information about Kubernetes objects into metrics, for example how many replicas a controller has scheduled, how many of them are currently available, or how many Jobs are running.
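
Its metrics can then be queried with PromQL like any others, for example (illustrative queries; the metric names are standard kube-state-metrics names):

# Available replicas per deployment
kube_deployment_status_replicas_available

# Pods currently in the Running phase
sum(kube_pod_status_phase{phase="Running"})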

2.5. Configuring web access to prometheus server and kube-state-metrics

Traefik is already deployed in this environment, so all that is needed is to add Ingress resources, shown below:

prometheus server

[root@k8s-node1 prometheus-grafana]# cat prometheus-server-ingress.yaml 
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: prometheus-server
  namespace: default
spec:
  rules:
  - host: prometheus-server
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus-1577263826-server
          servicePort: 80

kube-state-metrics

[root@k8s-node1 prometheus-grafana]# cat kube-state-ingress.yaml 
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: kube-state
  namespace: default
spec:
  rules:
  - host: kube-state
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus-1577263826-kube-state-metrics
          servicePort: 80

Point host resolution for the two hostnames at the ingress and they can be accessed normally; note that both are served over HTTPS. For how Traefik itself is configured, refer to the earlier Traefik article.
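
For a quick test from a client machine, the two hostnames can simply be mapped in /etc/hosts (sketch; 192.0.2.10 is a placeholder, substitute the address where the Traefik entry point is exposed):

# /etc/hosts on the client used for browsing
192.0.2.10   prometheus-server
192.0.2.10   kube-state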
