
Kubernetes readiness, liveness, and startup probes

Published: 2025-02-04 Author: 千家信息网 editor

千家信息网 last updated: 2025-02-04

Kubernetes Pod lifecycle (readiness, liveness, and startup probes)

Container probes

Why use readiness and liveness probes?

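Kubernetes relies heavily on asynchronous mechanisms and on decoupled relationships between its objects. When application instances are added or removed, or when a version change triggers a rolling update, the system cannot guarantee that the related Service and Ingress configuration is always refreshed in time. In some cases a new Pod has only finished its own initialization, and the system has not yet refreshed the externally reachable access information such as Endpoints and load balancers, when the old Pod is already deleted. The result is a brief service outage, which is unacceptable in production. This is where the readiness probe comes in.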

Startup probe (startupProbe)

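Sometimes a service is not usable immediately after it starts. The usual approach in the past was to set initialDelaySeconds on the liveness probe (how many seconds after container start the first probe runs) and let the probe decide whether the service is alive. The configuration looks roughly like this: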

livenessProbe:
  httpGet:
    path: /test
    port: 80
  failureThreshold: 1
  initialDelaySeconds: 10
  periodSeconds: 10

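This breaks down in the following case. Suppose service A needs 60s to start. With the probe above, the Pod gets stuck in a restart loop: the first probe runs 10s after start, finds the service unhealthy, and the container is restarted according to the restart policy, over and over. You might think we can simply raise initialDelaySeconds. But can you be sure you know how many seconds every service needs to start?
You might then think of raising failureThreshold instead. But to what value? Suppose we set: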

livenessProbe:
  httpGet:
    path: /test
    port: 80
  failureThreshold: 5
  initialDelaySeconds: 10
  periodSeconds: 10

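With this setting the service gets more time to start (roughly 50s before the fifth consecutive failure, so a 60s service would need an even higher threshold). The cost is that later on, when the service really does become unavailable, it takes 5 * 10s = 50s to notice. That is not acceptable in production. Instead, we add a startupProbe that uses the same check as the livenessProbe to determine whether the service has started successfully: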

livenessProbe:
  httpGet:
    path: /test
    port: 80
  failureThreshold: 1
  initialDelaySeconds: 10
  periodSeconds: 10
startupProbe:
  httpGet:
    path: /test
    port: 80
  failureThreshold: 10
  initialDelaySeconds: 10
  periodSeconds: 10

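With this configuration the service may finish starting at any point within 10 * 10 = 100s. The liveness probe does not run until the startup probe has succeeded. Once it has, the livenessProbe takes over, and a later failure is detected within 1 * 10 = 10s, so we can respond promptly.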
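To make the arithmetic concrete, here is a minimal sketch of a complete Pod for the 60s service A discussed above. The Pod name and image are placeholders and not taken from a real deployment; the /test endpoint and port 80 follow the fragments above. The startup budget is failureThreshold * periodSeconds.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: service-a                      # hypothetical name
spec:
  containers:
  - name: service-a
    image: example.com/service-a:1.0   # placeholder image
    ports:
    - containerPort: 80
    startupProbe:
      httpGet:
        path: /test
        port: 80
      periodSeconds: 10
      failureThreshold: 10             # startup budget: 10 * 10s = 100s, enough for a 60s start
    livenessProbe:
      httpGet:
        path: /test
        port: 80
      periodSeconds: 10
      failureThreshold: 1              # after startup: a single failed probe (about 10s) triggers a restart
```

If a service is known to start more slowly, only the startupProbe's failureThreshold needs to grow; the livenessProbe can stay strict.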

Readiness probe (readinessProbe)

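Checks whether the program in the container has started and is ready to serve. The Pod is only marked Ready once the probe succeeds. Until then, even though the container itself has started successfully and is Running, it is still reported as not ready, because the service inside it has not passed the probe.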
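As an illustration, the sketch below pairs a Pod that has a readinessProbe with a Service that selects it. The names web and app: web are hypothetical; the image and probe path are borrowed from the nginx example later in this article. The Pod only receives traffic through the Service while its readiness probe is passing.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                  # hypothetical name
  labels:
    app: web
spec:
  containers:
  - name: web
    image: nginx:1.14.1
    ports:
    - containerPort: 80
    readinessProbe:
      httpGet:
        path: /index.html
        port: 80
      periodSeconds: 5
      failureThreshold: 3    # marked not ready after 3 consecutive failures
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web                 # only Ready Pods with this label receive traffic
  ports:
  - port: 80
    targetPort: 80
```

A failing readiness probe does not restart the container. It only takes the Pod out of the Service's endpoints until the probe passes again.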

Liveness probe (livenessProbe): is the container running?

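Checks whether the container is still alive. It only answers whether the container should keep running, not whether the service is ready to receive traffic (that is the readiness probe's job). If the probe fails, the kubelet kills the container, and the container is then handled according to its restart policy.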

Three types of handlers:

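1, ExecAction: runs a custom command inside the container; an exit code of 0 means the probe succeeded, and a non-zero exit code means it failed.
2, TCPSocketAction: performs a TCP check against a port on the container; if the port is open, the probe succeeded.
3, HTTPGetAction: sends an HTTP GET request to the specified port and URL path; if the response status code is greater than or equal to 200 and less than 400, the probe succeeded.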
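The three handlers map onto the exec, tcpSocket, and httpGet fields of a probe. The fragments below are a sketch only: each one belongs under a container in a Pod spec, and the file path, ports, and URL path are made-up values for illustration.

```yaml
# ExecAction: exit code 0 means success
livenessProbe:
  exec:
    command: ["cat", "/tmp/healthy"]   # hypothetical marker file
---
# TCPSocketAction: success if the port accepts a connection
livenessProbe:
  tcpSocket:
    port: 3306                         # hypothetical port
---
# HTTPGetAction: success if the status code is >= 200 and < 400
livenessProbe:
  httpGet:
    path: /healthz                     # hypothetical path
    port: 8080
```

The same three handlers can be used with readinessProbe and startupProbe as well.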

Each probe has exactly one of the following three results:

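1, Success: the container passed the check
2, Failure: the container failed the check
3, Unknown: the check itself could not be completed, so no action is taken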

Probe example:

ExecAction

# cat nginx.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  restartPolicy: OnFailure
  containers:
  - name: nginx
    image: nginx:1.14.1
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
    - name: https
      containerPort: 443
      protocol: TCP
    livenessProbe:
      exec:
        command: ["test","-f","/usr/share/nginx/html/index.html"]
      failureThreshold: 3
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 5
    readinessProbe:
      httpGet:
        port: 80
        path: /index.html
      initialDelaySeconds: 15
      timeoutSeconds: 1

Create the Pod and test the probes.

kubectl create -f nginx.yaml

Now exec into the nginx container, delete the index.html file, and look at the Pod details:

# kubectl describe pod nginx
.....
Events:
  Type     Reason     Age                From                    Message
  ----     ------     ----               ----                    -------
  Normal   Scheduled  4m24s              default-scheduler       Successfully assigned default/nginx to 192.168.1.124
  Normal   Pulling    4m23s              kubelet, 192.168.1.124  pulling image "nginx:1.14.1"
  Normal   Pulled     4m1s               kubelet, 192.168.1.124  Successfully pulled image "nginx:1.14.1"
  Warning  Unhealthy  57s                kubelet, 192.168.1.124  Readiness probe failed: HTTP probe failed with statuscode: 404
  Warning  Unhealthy  50s (x3 over 60s)  kubelet, 192.168.1.124  Liveness probe failed:
  Normal   Killing    50s                kubelet, 192.168.1.124  Killing container with id docker://nginx:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Pulled     50s                kubelet, 192.168.1.124  Container image "nginx:1.14.1" already present on machine
  Normal   Created    49s (x2 over 4m)   kubelet, 192.168.1.124  Created container
  Normal   Started    49s (x2 over 4m)   kubelet, 192.168.1.124  Started container

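The events show that the readiness probe failed with a 404 and the liveness probe failed three times, after which the container was killed and recreated right away.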

Probe parameters:

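exec: probe by running a custom command
httpGet: probe with an HTTP request
tcpSocket: probe with a TCP socket
failureThreshold: number of consecutive failures before the probe is considered failed
initialDelaySeconds: seconds to wait after the container starts before probing begins (the service inside the container needs time to start)
periodSeconds: interval between probes, in seconds
timeoutSeconds: seconds after which a single probe attempt times out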
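For reference, here is a sketch of one probe with the timing parameters from the list above spelled out, annotated with the defaults Kubernetes applies when a field is omitted. The handler reuses the path and port from the nginx example. successThreshold is not in the list above and is included only for completeness.

```yaml
readinessProbe:
  httpGet:
    path: /index.html
    port: 80
  initialDelaySeconds: 5   # default 0
  periodSeconds: 10        # default 10
  timeoutSeconds: 1        # default 1
  failureThreshold: 3      # default 3
  successThreshold: 1      # default 1; must be 1 for liveness and startup probes
```

With these values, a container that stops responding is marked not ready after about 3 * 10s = 30s.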

HTTPGet probe parameters: