千家信息网

How to implement ping monitoring between Kubernetes nodes

Published: 2024-11-25. Author: 千家信息网 editorial staff

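This article shows how to implement ping monitoring between Kubernetes nodes. We hope you find it useful, so let's dig in.

Script and configuration

The main component of our solution is a script that watches the .status.addresses value of every node. When that value changes for any node (for example, when a new node is added), the script passes the node list to the Helm chart through Helm values, and the chart renders it as a ConfigMap: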

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ping-exporter-config
      namespace: d8-system
    data:
      nodes.json: >
        {{ .Values.pingExporter.targets | toJson }}

Note that the script below opens /config/targets.json, while a ConfigMap mounted at /config exposes each key as a file of the same name, so the key must match the file name the script reads.

.Values.pingExporter.targets looks like this:

    {
      "cluster_targets": [
        {"ipAddress": "192.168.191.11", "name": "kube-a-3"},
        {"ipAddress": "192.168.191.12", "name": "kube-a-2"},
        {"ipAddress": "192.168.191.22", "name": "kube-a-1"},
        {"ipAddress": "192.168.191.23", "name": "kube-db-1"},
        {"ipAddress": "192.168.191.9", "name": "kube-db-2"},
        {"ipAddress": "51.75.130.47", "name": "kube-a-4"}
      ],
      "external_targets": [
        {"host": "8.8.8.8", "name": "google-dns"},
        {"host": "youtube.com"}
      ]
    }

Here is the Python script:
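The article does not show the part that turns the node list into cluster_targets, so the following is only a minimal sketch of how that step might look. It assumes the input is a NodeList already parsed into a dict (the shape the Kubernetes API returns), and the function name build_cluster_targets and the InternalIP-then-ExternalIP preference are illustrative assumptions, not the authors' implementation.

```python
import json


def build_cluster_targets(node_list):
    """Turn a Kubernetes NodeList (parsed JSON) into ping-exporter cluster targets."""
    targets = []
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        addresses = node.get("status", {}).get("addresses", [])
        by_type = {a["type"]: a["address"] for a in addresses}
        # Assumption: prefer InternalIP and fall back to ExternalIP.
        ip = by_type.get("InternalIP") or by_type.get("ExternalIP")
        if ip:
            targets.append({"ipAddress": ip, "name": name})
    # Sort so that an unchanged cluster always yields an identical value.
    return sorted(targets, key=lambda t: t["name"])


# Hypothetical input, reusing node names and addresses from the example above.
example_node_list = {
    "items": [
        {"metadata": {"name": "kube-a-4"},
         "status": {"addresses": [{"type": "ExternalIP", "address": "51.75.130.47"}]}},
        {"metadata": {"name": "kube-a-2"},
         "status": {"addresses": [{"type": "InternalIP", "address": "192.168.191.12"},
                                  {"type": "Hostname", "address": "kube-a-2"}]}},
    ]
}

cluster_targets = build_cluster_targets(example_node_list)
print(json.dumps({"cluster_targets": cluster_targets}))
```

Sorting the result keeps the rendered ConfigMap stable, so it only changes when the set of nodes or their addresses actually changes.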

    #!/usr/bin/env python3

    import subprocess
    import prometheus_client
    import re
    import statistics
    import os
    import json
    import glob
    import better_exchook
    import datetime

    # Print extended tracebacks for uncaught exceptions.
    better_exchook.install()

    # -C 30: 30 probes per target; -p 1000: 1000 ms between probes to one target.
    FPING_CMDLINE = "/usr/sbin/fping -p 1000 -C 30 -B 1 -q -r 1".split(" ")
    # Matches lines of the form "<host> : <rtt> <rtt> - <rtt> ...".
    FPING_REGEX = re.compile(r"^(\S*)\s*: (.*)$", re.MULTILINE)
    CONFIG_PATH = "/config/targets.json"

    registry = prometheus_client.CollectorRegistry()

    prometheus_exceptions_counter = \
        prometheus_client.Counter('kube_node_ping_exceptions', 'Total number of exceptions', [], registry=registry)

    prom_metrics_cluster = {
        "sent": prometheus_client.Counter('kube_node_ping_packets_sent_total',
                                          'ICMP packets sent',
                                          ['destination_node', 'destination_node_ip_address'],
                                          registry=registry),
        "received": prometheus_client.Counter('kube_node_ping_packets_received_total',
                                              'ICMP packets received',
                                              ['destination_node', 'destination_node_ip_address'],
                                              registry=registry),
        "rtt": prometheus_client.Counter('kube_node_ping_rtt_milliseconds_total',
                                         'round-trip time',
                                         ['destination_node', 'destination_node_ip_address'],
                                         registry=registry),
        "min": prometheus_client.Gauge('kube_node_ping_rtt_min', 'minimum round-trip time',
                                       ['destination_node', 'destination_node_ip_address'],
                                       registry=registry),
        "max": prometheus_client.Gauge('kube_node_ping_rtt_max', 'maximum round-trip time',
                                       ['destination_node', 'destination_node_ip_address'],
                                       registry=registry),
        "mdev": prometheus_client.Gauge('kube_node_ping_rtt_mdev',
                                        'mean deviation of round-trip times',
                                        ['destination_node', 'destination_node_ip_address'],
                                        registry=registry)}

    prom_metrics_external = {
        "sent": prometheus_client.Counter('external_ping_packets_sent_total',
                                          'ICMP packets sent',
                                          ['destination_name', 'destination_host'],
                                          registry=registry),
        "received": prometheus_client.Counter('external_ping_packets_received_total',
                                              'ICMP packets received',
                                              ['destination_name', 'destination_host'],
                                              registry=registry),
        "rtt": prometheus_client.Counter('external_ping_rtt_milliseconds_total',
                                         'round-trip time',
                                         ['destination_name', 'destination_host'],
                                         registry=registry),
        "min": prometheus_client.Gauge('external_ping_rtt_min', 'minimum round-trip time',
                                       ['destination_name', 'destination_host'],
                                       registry=registry),
        "max": prometheus_client.Gauge('external_ping_rtt_max', 'maximum round-trip time',
                                       ['destination_name', 'destination_host'],
                                       registry=registry),
        "mdev": prometheus_client.Gauge('external_ping_rtt_mdev',
                                        'mean deviation of round-trip times',
                                        ['destination_name', 'destination_host'],
                                        registry=registry)}


    def validate_envs():
        """Return the required environment variables, failing if any is empty."""
        envs = {"MY_NODE_NAME": os.getenv("MY_NODE_NAME"), "PROMETHEUS_TEXTFILE_DIR": os.getenv("PROMETHEUS_TEXTFILE_DIR"),
                "PROMETHEUS_TEXTFILE_PREFIX": os.getenv("PROMETHEUS_TEXTFILE_PREFIX")}

        for k, v in envs.items():
            if not v:
                raise ValueError("{} environment variable is empty".format(k))

        return envs


    @prometheus_exceptions_counter.count_exceptions()
    def compute_results(results):
        """Parse fping output into per-host statistics."""
        computed = {}

        matches = FPING_REGEX.finditer(results)
        for match in matches:
            host = match.group(1)
            ping_results = match.group(2)
            if "duplicate" in ping_results:
                continue
            splitted = ping_results.split(" ")
            if len(splitted) != 30:
                raise ValueError("ping returned wrong number of results: \"{}\"".format(splitted))

            # fping marks a lost probe with "-"; everything else is an RTT in ms.
            positive_results = [float(x) for x in splitted if x != "-"]
            if len(positive_results) > 0:
                computed[host] = {"sent": 30, "received": len(positive_results),
                                  "rtt": sum(positive_results),
                                  "max": max(positive_results), "min": min(positive_results),
                                  "mdev": statistics.pstdev(positive_results)}
            else:
                computed[host] = {"sent": 30, "received": len(positive_results), "rtt": 0,
                                  "max": 0, "min": 0, "mdev": 0}
        if not computed:
            raise ValueError("regex match\"{}\" found nothing in fping output \"{}\"".format(FPING_REGEX, results))
        return computed


    @prometheus_exceptions_counter.count_exceptions()
    def call_fping(ips):
        """Run fping against the given targets and return its combined output."""
        cmdline = FPING_CMDLINE + ips
        process = subprocess.run(cmdline, stdout=subprocess.PIPE,
                                 stderr=subprocess.STDOUT, universal_newlines=True)
        if process.returncode == 3:
            raise ValueError("invalid arguments: {}".format(cmdline))
        if process.returncode == 4:
            # stderr is merged into stdout above, so process.stderr would be None here.
            raise OSError("fping reported syscall error: {}".format(process.stdout))

        return process.stdout


    envs = validate_envs()

    # Remove files left in the textfile directory by previous runs.
    files = glob.glob(envs["PROMETHEUS_TEXTFILE_DIR"] + "*")
    for f in files:
        os.remove(f)

    labeled_prom_metrics = {"cluster_targets": [], "external_targets": []}

    while True:
        # Re-read the target list on every round so ConfigMap updates are picked up.
        with open(CONFIG_PATH, "r") as f:
            config = json.loads(f.read())
            config["external_targets"] = [] if config["external_targets"] is None else config["external_targets"]
            for target in config["external_targets"]:
                target["name"] = target["host"] if "name" not in target.keys() else target["name"]

        # Drop label sets of targets that are no longer in the configuration.
        if labeled_prom_metrics["cluster_targets"]:
            for metric in labeled_prom_metrics["cluster_targets"]:
                if (metric["node_name"], metric["ip"]) not in [(node["name"], node["ipAddress"]) for node in config['cluster_targets']]:
                    for k, v in prom_metrics_cluster.items():
                        v.remove(metric["node_name"], metric["ip"])

        if labeled_prom_metrics["external_targets"]:
            for metric in labeled_prom_metrics["external_targets"]:
                if (metric["target_name"], metric["host"]) not in [(target["name"], target["host"]) for target in config['external_targets']]:
                    for k, v in prom_metrics_external.items():
                        v.remove(metric["target_name"], metric["host"])

        labeled_prom_metrics = {"cluster_targets": [], "external_targets": []}

        for node in config["cluster_targets"]:
            metrics = {"node_name": node["name"], "ip": node["ipAddress"], "prom_metrics": {}}

            for k, v in prom_metrics_cluster.items():
                metrics["prom_metrics"][k] = v.labels(node["name"], node["ipAddress"])

            labeled_prom_metrics["cluster_targets"].append(metrics)

        for target in config["external_targets"]:
            metrics = {"target_name": target["name"], "host": target["host"], "prom_metrics": {}}

            for k, v in prom_metrics_external.items():
                metrics["prom_metrics"][k] = v.labels(target["name"], target["host"])

            labeled_prom_metrics["external_targets"].append(metrics)

        out = call_fping([prom_metric["ip"] for prom_metric in labeled_prom_metrics["cluster_targets"]] +
                         [prom_metric["host"] for prom_metric in labeled_prom_metrics["external_targets"]])
        computed = compute_results(out)

        for dimension in labeled_prom_metrics["cluster_targets"]:
            result = computed[dimension["ip"]]
            dimension["prom_metrics"]["sent"].inc(result["sent"])
            dimension["prom_metrics"]["received"].inc(result["received"])
            dimension["prom_metrics"]["rtt"].inc(result["rtt"])
            dimension["prom_metrics"]["min"].set(result["min"])
            dimension["prom_metrics"]["max"].set(result["max"])
            dimension["prom_metrics"]["mdev"].set(result["mdev"])

        for dimension in labeled_prom_metrics["external_targets"]:
            result = computed[dimension["host"]]
            dimension["prom_metrics"]["sent"].inc(result["sent"])
            dimension["prom_metrics"]["received"].inc(result["received"])
            dimension["prom_metrics"]["rtt"].inc(result["rtt"])
            dimension["prom_metrics"]["min"].set(result["min"])
            dimension["prom_metrics"]["max"].set(result["max"])
            dimension["prom_metrics"]["mdev"].set(result["mdev"])

        prometheus_client.write_to_textfile(
            envs["PROMETHEUS_TEXTFILE_DIR"] + envs["PROMETHEUS_TEXTFILE_PREFIX"] + envs["MY_NODE_NAME"] + ".prom", registry)

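The script runs on every Kubernetes node and continuously pings all instances of the cluster: with the fping options it uses (-C 30 -p 1000), each round sends 30 ICMP packets to every target, one per second. The collected results are stored in a text file.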
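The parsing step of the script can be tried in isolation, without fping or Prometheus installed. The sketch below reuses the script's regular expression on a hand-written sample that imitates the "host : rtt rtt - rtt" lines produced by fping -C. The function name summarize, the expected_count parameter and the sample values are illustrative assumptions (the real script hard-codes 30 probes).

```python
import re
import statistics

# Same pattern as in the script: "<host> : <space-separated results>".
FPING_REGEX = re.compile(r"^(\S*)\s*: (.*)$", re.MULTILINE)


def summarize(fping_output, expected_count):
    """Compute per-host ping statistics from fping -C style output."""
    summary = {}
    for match in FPING_REGEX.finditer(fping_output):
        host = match.group(1)
        samples = match.group(2).split(" ")
        if len(samples) != expected_count:
            raise ValueError("unexpected number of results for {}: {}".format(host, samples))
        # A lost probe is reported as "-"; the rest are round-trip times in ms.
        rtts = [float(x) for x in samples if x != "-"]
        summary[host] = {
            "sent": expected_count,
            "received": len(rtts),
            "rtt": sum(rtts),
            "min": min(rtts) if rtts else 0,
            "max": max(rtts) if rtts else 0,
            "mdev": statistics.pstdev(rtts) if rtts else 0,
        }
    return summary


# Hand-written sample: one host with a single lost probe, one fully unreachable.
sample_output = (
    "192.168.191.11 : 0.50 1.50 - 1.00\n"
    "8.8.8.8        : - - - -\n"
)

result = summarize(sample_output, expected_count=4)
for host, stats in result.items():
    print(host, stats)
```

Because "sent", "received" and "rtt" feed counters while "min", "max" and "mdev" feed gauges, a fully unreachable host still increases the sent counter, which is what makes packet loss visible later.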

The script is packaged into a Docker image:

    FROM python:3.6-alpine3.8
    COPY rootfs /
    WORKDIR /app
    RUN pip3 install --upgrade pip && pip3 install -r requirements.txt && apk add --no-cache fping
    ENTRYPOINT ["python3", "/app/ping-exporter.py"]

In addition, we created a ServiceAccount and a matching role whose only permission is to list nodes (which is how we learn their IP addresses):

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: ping-exporter
      namespace: d8-system
    ---
    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: d8-system:ping-exporter
    rules:
    - apiGroups: [""]
      resources: ["nodes"]
      verbs: ["list"]
    ---
    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: d8-system:kube-ping-exporter
    subjects:
    - kind: ServiceAccount
      name: ping-exporter
      namespace: d8-system
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: d8-system:ping-exporter

Finally, we need a DaemonSet so that the exporter runs on every instance in the cluster:

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: ping-exporter
      namespace: d8-system
    spec:
      updateStrategy:
        type: RollingUpdate
      selector:
        matchLabels:
          name: ping-exporter
      template:
        metadata:
          labels:
            name: ping-exporter
        spec:
          terminationGracePeriodSeconds: 0
          tolerations:
          - operator: "Exists"
          hostNetwork: true
          serviceAccountName: ping-exporter
          priorityClassName: cluster-low
          containers:
          - image: private-registry.flant.com/ping-exporter/ping-exporter:v1
            name: ping-exporter
            env:
              - name: MY_NODE_NAME
                valueFrom:
                  fieldRef:
                    fieldPath: spec.nodeName
              - name: PROMETHEUS_TEXTFILE_DIR
                value: /node-exporter-textfile/
              - name: PROMETHEUS_TEXTFILE_PREFIX
                value: ping-exporter_
            volumeMounts:
              - name: textfile
                mountPath: /node-exporter-textfile
              - name: config
                mountPath: /config
          volumes:
            - name: textfile
              hostPath:
                path: /var/run/node-exporter-textfile
            - name: config
              configMap:
                name: ping-exporter-config
          imagePullSecrets:
          - name: private-registry

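The remaining operational details of the solution are:

  • When the Python script runs, its results (text files stored on the host in the /var/run/node-exporter-textfile directory) are handed over to node-exporter, which also runs as a DaemonSet.

  • node-exporter is started with the --collector.textfile.directory /host/textfile flag, where /host/textfile is the hostPath directory /var/run/node-exporter-textfile. (See the node-exporter documentation for more on its textfile collector.)

  • Finally, node-exporter reads these files, and Prometheus scrapes all the data from the node-exporter instances.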
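To make the hand-off through the shared directory more concrete, here is a small sketch of what such a text file contains and how it can be written safely. It is not the exporter's code (the script relies on prometheus_client.write_to_textfile); the helper names render_counter and write_textfile_atomically are hypothetical, and a temporary directory stands in for /var/run/node-exporter-textfile. The idea shown is to write to a temporary file and rename it, so that the textfile collector, which only reads files ending in .prom, never sees a half-written file.

```python
import os
import tempfile


def render_counter(name, help_text, samples):
    """Render one counter in the Prometheus text exposition format."""
    lines = ["# HELP {} {}".format(name, help_text),
             "# TYPE {} counter".format(name)]
    for labels, value in samples:
        label_str = ",".join('{}="{}"'.format(k, v) for k, v in sorted(labels.items()))
        lines.append("{}{{{}}} {}".format(name, label_str, value))
    return "\n".join(lines) + "\n"


def write_textfile_atomically(path, content):
    """Write to a temporary file in the same directory, then rename it into place."""
    directory = os.path.dirname(path)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        tmp.write(content)
    # mkstemp creates the file as 0600; make it readable by the collector.
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, path)


# A temporary directory stands in for the hostPath directory here.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "ping-exporter_kube-a-1.prom")

text = render_counter(
    "kube_node_ping_packets_sent_total",
    "ICMP packets sent",
    [({"destination_node": "kube-a-2",
       "destination_node_ip_address": "192.168.191.12"}, 30.0)],
)
write_textfile_atomically(demo_path, text)

with open(demo_path) as f:
    written = f.read()
print(written)
```

The file name follows the pattern the script uses: PROMETHEUS_TEXTFILE_PREFIX, then the node name, then .prom.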

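So what are the results?

Now it is time to enjoy the long-awaited results. Once the metrics exist, we can use them and, of course, visualize them. Here is what they look like.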
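The article does not list the dashboard queries, so as an illustration only, here is the arithmetic such panels typically rest on: because sent, received and rtt are counters, packet loss and average round-trip time for a time window come from the differences between two scrapes. The function name ping_stats and the sample numbers are assumptions, and counter resets are deliberately ignored in this sketch.

```python
def ping_stats(previous, current):
    """Derive loss ratio and average RTT from two snapshots of the counters."""
    sent = current["sent"] - previous["sent"]
    received = current["received"] - previous["received"]
    rtt = current["rtt"] - previous["rtt"]
    # Guard against empty windows to avoid dividing by zero.
    loss_ratio = 1 - received / sent if sent > 0 else None
    avg_rtt_ms = rtt / received if received > 0 else None
    return {"loss_ratio": loss_ratio, "avg_rtt_ms": avg_rtt_ms}


# Hypothetical counter values for one destination, two rounds apart.
before = {"sent": 300, "received": 300, "rtt": 150.0}
after = {"sent": 360, "received": 345, "rtt": 174.75}

stats = ping_stats(before, after)
print(stats)
```

In this window 60 packets were sent and 45 came back, so a quarter of them were lost, a signal the min, max and mdev gauges alone would not reveal.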

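First, there is a general selector that lets us choose the nodes whose "source" and "destination" connections we want to check. It produces a summary table of ping results for the selected nodes over the time range set in the Grafana dashboard.

Next come graphs with combined statistics for the selected nodes.

We also have a list of records, each of which links to the graphs for one specific node selected as a "source".

Expanding a record shows detailed ping statistics from that node to every other node selected as a destination, together with the corresponding graphs.

What does a graph look like when pings between nodes run into problems?

If you observe something similar in real life, it is time to start troubleshooting!

Finally, this is how we visualize pings to external hosts.

We can look at the overall view across all nodes, or at the graphs for any single node.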
