How the Eviction Manager in K8S Implements Pod Eviction

Published: 2025-01-28 | Author: 千家信息网 editor

千家信息网, last updated 2025-01-28

This article walks through how the Eviction Manager in K8S implements Pod eviction, a mechanism many readers may not know in detail.

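To keep a Node stable, the kubelet proactively evicts some Pods to free resources when memory or storage runs short. The component that implements this is the Eviction Manager.

When it evicts a Pod, the kubelet kills all containers in the Pod and sets the Pod's phase to Failed. If the Pod is managed by a controller such as a Deployment, its replacement may be scheduled onto another node.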

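You can define thresholds that tell the kubelet when to evict Pods. There are two types of thresholds:

Soft Eviction Thresholds - when the threshold is reached, eviction does not start immediately; the kubelet first waits for a user-configured grace period.

Hard Eviction Thresholds - Pods are killed immediately.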
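The difference between the two threshold types can be illustrated with a small Go sketch. The `Threshold` type and its `ShouldEvict` method are hypothetical simplifications written for this article, not the kubelet's own API; the sketch assumes that a zero grace period marks a hard threshold.

```go
package main

import (
	"fmt"
	"time"
)

// Threshold is a hypothetical, simplified stand-in for an eviction threshold.
// A zero GracePeriod models a hard threshold; a positive one models a soft threshold.
type Threshold struct {
	Signal      string        // e.g. "memory.available"
	GracePeriod time.Duration // 0 for hard thresholds
}

// IsHard reports whether the threshold acts without any grace period.
func (t Threshold) IsHard() bool { return t.GracePeriod == 0 }

// ShouldEvict reports whether eviction may start, given how long the
// threshold has been continuously exceeded.
func (t Threshold) ShouldEvict(exceededFor time.Duration) bool {
	return exceededFor >= t.GracePeriod
}

func main() {
	soft := Threshold{Signal: "memory.available", GracePeriod: 90 * time.Second}
	hard := Threshold{Signal: "memory.available"}

	fmt.Println(soft.ShouldEvict(30 * time.Second))  // false: still inside the grace period
	fmt.Println(soft.ShouldEvict(120 * time.Second)) // true: the grace period has elapsed
	fmt.Println(hard.ShouldEvict(0))                 // true: hard thresholds act immediately
}
```

Modelling a hard threshold as a soft one with a zero grace period keeps a single code path for both cases.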

Implementation

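The Eviction Manager code lives in the package /pkg/kubelet/eviction, and its core logic is the managerImpl.synchronize method. The Eviction Manager calls synchronize periodically in a dedicated goroutine to carry out eviction.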
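The "periodic call in a dedicated goroutine" pattern can be sketched as follows. This is an illustration only: `syncFunc` and `start` are hypothetical names, and the loop is capped at a fixed number of rounds so the example terminates, whereas the real loop presumably runs for as long as the kubelet does.

```go
package main

import (
	"fmt"
	"time"
)

// syncFunc is a hypothetical stand-in for managerImpl.synchronize; it returns
// the names of the Pods evicted in one round (nil if none).
type syncFunc func() []string

// start runs sync in its own goroutine, once per interval, for a fixed number
// of rounds. The returned channel is closed when the loop has finished.
func start(sync syncFunc, interval time.Duration, rounds int) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for i := 0; i < rounds; i++ {
			if evicted := sync(); len(evicted) > 0 {
				fmt.Println("evicted:", evicted)
			}
			time.Sleep(interval)
		}
	}()
	return done
}

func main() {
	calls := 0
	done := start(func() []string {
		calls++ // only touched by the loop goroutine until done is closed
		return nil
	}, 10*time.Millisecond, 3)

	<-done
	fmt.Println("synchronize rounds:", calls) // synchronize rounds: 3
}
```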

synchronize consists mainly of the following steps:

1. Initialize configuration

func (m *managerImpl) synchronize(diskInfoProvider DiskInfoProvider, podFunc ActivePodsFunc, capacityProvider CapacityProvider) []*v1.Pod {
	// 1. Read all thresholds from the configuration
	thresholds := m.config.Thresholds
	if len(thresholds) == 0 {
		return nil
	}
	....
	// 2. Initialize the rank funcs, reclaim funcs, etc.
	if m.dedicatedImageFs == nil {
		// note: ok holds an error value here, not a bool
		hasImageFs, ok := diskInfoProvider.HasDedicatedImageFs()
		if ok != nil {
			return nil
		}
		m.dedicatedImageFs = &hasImageFs
		m.resourceToRankFunc = buildResourceToRankFunc(hasImageFs)
		m.resourceToNodeReclaimFuncs = buildResourceToNodeReclaimFuncs(m.imageGC, m.containerGC, hasImageFs)
	}
	// 3. Get the active Pods through the func passed in at initialization
	activePods := podFunc()
	// 4. Get the current resource usage from the summary provider
	observations, statsFunc, err := makeSignalObservations(m.summaryProvider, capacityProvider, activePods)
	....
	// 5. Use memcg notifications to react faster to memory usage; entered only while notifiersInitialized is false
	if m.config.KernelMemcgNotification && !m.notifiersInitialized {
		....
		m.notifiersInitialized = true // mark initialization as done
		err = startMemoryThresholdNotifier(m.config.Thresholds, observations, true, func(desc string) {
			// callback: a memcg notification immediately triggers synchronize
			glog.Infof("hard memory eviction threshold crossed at %s", desc)
			m.synchronize(diskInfoProvider, podFunc, capacityProvider)
		})
		....
	}
	....
}
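To make the role of the notifiersInitialized guard concrete, here is a stripped-down sketch of "register the callback once, then let the callback trigger an extra round". The `manager` and `fakeNotifier` types are hypothetical and single-threaded; they only imitate the control flow and do not talk to memcg.

```go
package main

import "fmt"

// manager is a hypothetical, stripped-down stand-in for managerImpl.
type manager struct {
	notifiersInitialized bool
	syncRounds           int
}

// fakeNotifier is a hypothetical stand-in for the kernel notification: it
// records registered callbacks so the example can fire them by hand.
type fakeNotifier struct {
	callbacks []func(desc string)
}

func (n *fakeNotifier) register(cb func(desc string)) {
	n.callbacks = append(n.callbacks, cb)
}

// synchronize runs one round and registers the notifier callback on the
// first round only, guarded by notifiersInitialized.
func (m *manager) synchronize(n *fakeNotifier) {
	m.syncRounds++
	if !m.notifiersInitialized {
		m.notifiersInitialized = true
		n.register(func(desc string) {
			fmt.Println("threshold crossed at", desc)
			m.synchronize(n) // react at once instead of waiting for the next period
		})
	}
}

func main() {
	m, n := &manager{}, &fakeNotifier{}
	m.synchronize(n)
	m.synchronize(n)
	fmt.Println(len(n.callbacks)) // 1: the callback was registered only once

	n.callbacks[0]("100Mi") // simulate a notification
	fmt.Println(m.syncRounds) // 3: two periodic rounds plus one triggered round
}
```

Without the guard, every round would register another callback, and a single notification would then fan out into many redundant rounds.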

2. Compute thresholds

func (m *managerImpl) synchronize(diskInfoProvider DiskInfoProvider, podFunc ActivePodsFunc, capacityProvider CapacityProvider) []*v1.Pod {
	....
	// 1. From the configured thresholds and the current resource usage, compute which thresholds have been crossed
	thresholds = thresholdsMet(thresholds, observations, false)
	// 2. Merge in the thresholds met in the previous round that are not yet resolved
	if len(m.thresholdsMet) > 0 {
		thresholdsNotYetResolved := thresholdsMet(m.thresholdsMet, observations, true)
		thresholds = mergeThresholds(thresholds, thresholdsNotYetResolved)
	}
	// 3. Filter out soft thresholds that are not yet in effect (grace period not elapsed)
	now := m.clock.Now()
	thresholdsFirstObservedAt := thresholdsFirstObservedAt(thresholds, m.thresholdsFirstObservedAt, now)
	....
	thresholds = thresholdsMetGracePeriod(thresholdsFirstObservedAt, now)
	// 4. Store the computed results
	m.Lock()
	m.nodeConditions = nodeConditions
	m.thresholdsFirstObservedAt = thresholdsFirstObservedAt
	m.nodeConditionsLastObservedAt = nodeConditionsLastObservedAt
	m.thresholdsMet = thresholds
	// determine the set of thresholds whose stats have been updated since the last sync
	thresholds = thresholdsUpdatedStats(thresholds, observations, m.lastObservations)
	debugLogThresholdsWithObservation("thresholds - updated stats", thresholds, observations)
	m.lastObservations = observations
	m.Unlock()
	...
}
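The two calls to thresholdsMet differ only in their last argument. A simplified sketch of what that flag plausibly does is shown below, under the assumption that a threshold from a previous round counts as unresolved until the available amount exceeds the threshold value plus a minimum-reclaim margin. The `threshold` struct and the byte-based arithmetic are hypothetical simplifications, not the kubelet's types.

```go
package main

import "fmt"

// threshold is a hypothetical simplification: eviction is signalled when the
// available amount of a resource drops below value (all quantities in bytes).
type threshold struct {
	signal     string
	value      int64
	minReclaim int64
}

// thresholdsMet returns the thresholds whose signal is currently below its limit.
// With enforceMinReclaim set, the limit is raised by minReclaim, so a threshold
// that fired earlier stays unresolved until enough headroom has been reclaimed.
func thresholdsMet(ts []threshold, available map[string]int64, enforceMinReclaim bool) []threshold {
	var met []threshold
	for _, t := range ts {
		avail, ok := available[t.signal]
		if !ok {
			continue // no observation for this signal
		}
		limit := t.value
		if enforceMinReclaim {
			limit += t.minReclaim
		}
		if avail < limit {
			met = append(met, t)
		}
	}
	return met
}

func main() {
	const Mi = 1 << 20
	ts := []threshold{{signal: "memory.available", value: 100 * Mi, minReclaim: 50 * Mi}}
	obs := map[string]int64{"memory.available": 120 * Mi}

	fmt.Println(len(thresholdsMet(ts, obs, false))) // 0: 120Mi is above the 100Mi threshold
	fmt.Println(len(thresholdsMet(ts, obs, true)))  // 1: 120Mi is still below 100Mi + 50Mi
}
```

The margin acts as hysteresis: it keeps the node from flapping in and out of the pressure state when usage hovers around the threshold.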

3. Determine the resource examined in this eviction round

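In each eviction round the kubelet kills at most one Pod. Because the Eviction Manager handles shortages of several resources (memory/storage) at once, it first picks the resource type to consider in this round, then ranks the Pods by their usage of that resource and selects the Pod to kill.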
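The two-stage selection described here can be sketched in isolation from the kubelet source. Everything in the sketch is hypothetical: the `pod` type, the `rank` ordering (memory before everything else is an assumption), and ranking purely by usage, which follows the description above and ignores any other criteria the real rank functions may apply.

```go
package main

import (
	"fmt"
	"sort"
)

// pod is a hypothetical record of per-resource usage, in bytes.
type pod struct {
	name  string
	usage map[string]int64
}

// rank is an assumed priority among starved resources: memory is reclaimed
// first, every other resource afterwards.
func rank(resource string) int {
	if resource == "memory" {
		return 0
	}
	return 1
}

// pickVictim chooses one starved resource to reclaim, then the single Pod
// that uses the most of it. ok is false when there is nothing to do.
func pickVictim(starved []string, pods []pod) (resource, victim string, ok bool) {
	if len(starved) == 0 || len(pods) == 0 {
		return "", "", false
	}
	// Stage 1: order the starved resources and take the first one.
	res := append([]string(nil), starved...)
	sort.SliceStable(res, func(i, j int) bool { return rank(res[i]) < rank(res[j]) })
	resource = res[0]

	// Stage 2: order the Pods by usage of that resource, highest first.
	ranked := append([]pod(nil), pods...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return ranked[i].usage[resource] > ranked[j].usage[resource]
	})
	return resource, ranked[0].name, true
}

func main() {
	pods := []pod{
		{name: "web", usage: map[string]int64{"memory": 300, "ephemeral-storage": 900}},
		{name: "cache", usage: map[string]int64{"memory": 800, "ephemeral-storage": 100}},
	}
	resource, victim, ok := pickVictim([]string{"ephemeral-storage", "memory"}, pods)
	fmt.Println(resource, victim, ok) // memory cache true
}
```

Returning a single victim mirrors the at-most-one-Pod-per-round rule: after one Pod is killed, the next round re-measures usage before deciding whether another eviction is needed.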

func (m *managerImpl) synchronize(diskInfoProvider DiskInfoProvider, podFunc ActivePodsFunc, capacityProvider CapacityProvider) []*v1.Pod {
	....
	// 1. Collect every resource type that is currently starved
	starvedResources := getStarvedResources(thresholds)
	if len(starvedResources) == 0 {
		glog.V(3).Infof("eviction manager: no resources are starved")
		return nil
	}
	// 2. Sort them and pick one resource
	sort.Sort(byEvictionPriority(starvedResources))
	resourceToReclaim := starvedResources[0]
	// determine if this is a soft or hard eviction associated with the resource