千家信息网

k8s cpu limit node节点异常 怎么办

发表于:2024-11-24 作者:千家信息网编辑
千家信息网最后更新 2024年11月24日,本篇文章为大家展示了k8s cpu limit node节点异常 怎么办,内容简明扼要并且容易理解,绝对能使你眼前一亮,通过这篇文章的详细介绍希望你能有所收获。日志信息:OCI runtime cre
千家信息网最后更新 2024年11月24日k8s cpu limit node节点异常 怎么办

本篇文章为大家展示了k8s cpu limit node节点异常 怎么办,内容简明扼要并且容易理解,绝对能使你眼前一亮,通过这篇文章的详细介绍希望你能有所收获。

日志信息:

OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"process_linux.go:367: setting cgroup config for procHooks process caused \\\"failed to write 100000 to cpu.cfs_period_us: write /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/pod0f117614-9bae-11e9-b422-525400530fa1/fa79711ac936f4650b00bab66e21f5efb64fac642aa38e9c314385ab4023b484/cpu.cfs_period_us: invalid argument\\\"\"": unknown

从日志信息描述来看:cgroup cput 参数:cpu.cfs_period_us: invalid argument\\\"\"": unknown 有问题!

参数文件目录:/sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/pod0f117614-9bae-11e9-b422-525400530fa1/fa79711ac936f4650b00bab66e21f5efb64fac642aa38e9c314385ab4023b484/cpu.cfs_period_us

具体原因导致未知!

出现的症状:

新调度过来的pod起步来了,报上面那个错误,以前的pod好像没有问题

https://my.oschina.net/xiaominmin/blog/3068378

https://kubernetes.io/zh/docs/tasks/administer-cluster/cpu-management-policies/

https://qingwave.github.io/2019/01/09/%E6%B7%B1%E5%85%A5%E7%90%86%E8%A7%A3K8s%E8%B5%84%E6%BA%90%E9%99%90%E5%88%B6/#%E5%86%85%E5%AD%98%E9%99%90%E5%88%B6

https://www.yangcs.net/posts/understanding-resource-limits-in-kubernetes-cpu-time/

原因:内核问题

解决:node 执行脚本

#/usr/bin/python2"""Fixs a CFS bug in Container-optimized OS (COS) nodes.The bug is described at:https://docs.google.com/document/d/13KLD__6A935igLXpTFFomqfclATC89nAkhPOxsuKA0I/edit#Examples:To check script sanity in COS VM: sudo python fix-cfs.py --dry_runTo fix cfs bug in COS VM: sudo python fix-cfs.pyWhen running in a container/DaemonSet: sudo python fix-cfs.py --sys=/host/sys --dry_run sudo python fix-cfs.py --sys=/host/sys sudo python fix-cfs.py --sys=/host/sys --interval=10"""from __future__ import print_functionimport argparseimport mathimport osimport timedef _ParseCommandLine():  parser = argparse.ArgumentParser(description='Fix cfs bug.')  parser.add_argument('--dry_run', dest='dry_run', action='store_true',                      help='Whether or not to run in dry_run mode')  parser.set_defaults(dry_run=False)  parser.add_argument('--sys', type=str, default='/sys',                      help='The root directory of the /sys fs')  parser.add_argument('--interval', type=int,                      help='Seconds to wait between invocations')  return parser.parse_args()def ReadFile(directory, filename):  with open(os.path.join(directory, filename), 'r') as f:    return f.read()def WriteFile(directory, filename, number):  with open(os.path.join(directory, filename), 'w') as f:    f.write('%d' % number)def ListSubdirs(directory):  return [      os.path.join(directory, f)      for f in os.listdir(directory)      if os.path.isdir(os.path.join(directory, f))  ]def CalculateNewQuotaPeriodForPod(quota, period):  scaled_times = math.ceil(      (math.log(period) - math.log(100000)) / (math.log(147) - math.log(128)))  new_period = 100000  new_quota = math.floor(      quota * math.pow(128.0 / 147, scaled_times)) + 1 + scaled_times  return new_quota, new_perioddef CalculateNewQuotaPeriodForContainer(quota, period):  scaled_times = math.ceil(      (math.log(period) - math.log(100000)) / (math.log(147) - math.log(128)))  new_period = 100000  new_quota = math.floor(quota * math.pow(128.0 / 147, scaled_times))  new_quota = max(new_quota, 1000)  return new_quota, new_perioddef FixPodIfAffected(pod_dir, dry_run):  try:    quota = long(ReadFile(pod_dir, 'cpu.cfs_quota_us'))    period = long(ReadFile(pod_dir, 'cpu.cfs_period_us'))  except IOError:    print('Skipping pod_dir %s. The pod may have disappeared from cfs before',          ' it could be examined' % (pod_dir))    return  if quota <= 0 or period <= 0:    return  if period <= 100000:    return  print('Found a problem:')  print('pod %s has quota %d, period %d' % (pod_dir, quota, period))  new_quota, new_period = CalculateNewQuotaPeriodForPod(quota, period)  if dry_run:    print('dry_run: would fix pod %s with quota %d, period %d' %          (pod_dir, new_quota, new_period))    return  try:    WriteFile(pod_dir, 'cpu.cfs_period_us', new_period)    WriteFile(pod_dir, 'cpu.cfs_quota_us', new_quota)    print('fixed pod %s with quota %d, period %d' %          (pod_dir, new_quota, new_period))  except IOError:    print('Warning: failed to fix cfs at pod_dir %s, ',          'the directory may have disappeared.')    returndef FixContainerIfAffected(container_dir, dry_run):  try:    quota = long(ReadFile(container_dir, 'cpu.cfs_quota_us'))    period = long(ReadFile(container_dir, 'cpu.cfs_period_us'))  except IOError:    print('Skipping container_dir %s. The container may have disappeared from',          ' cfs before it could be examined' % (container_dir))    return  if quota <= 0 or period <= 0:    return  if period <= 100000:    return  print('Found a problem:')  print('container %s has quota %d, period %d' % (container_dir, quota, period))  new_quota, new_period = CalculateNewQuotaPeriodForContainer(quota, period)  if dry_run:    print('dry_run: would fix container %s with quota %d, period %d' %          (container_dir, new_quota, new_period))    return  try:    WriteFile(container_dir, 'cpu.cfs_quota_us', new_quota)    WriteFile(container_dir, 'cpu.cfs_period_us', new_period)    print('fixed container %s with quota %d, period %d' %          (container_dir, new_quota, new_period))  except IOError:    print('Warning: failed to fix cfs at container_dir %s, ',          'the directory may have disappeared.')    returndef FixAllPods(sysfs='/sys', dry_run=True):  pods_dir = os.path.join(sysfs, 'fs/cgroup/cpu/kubepods/burstable')  pod_dirs = ListSubdirs(pods_dir)  for pod_dir in pod_dirs:    FixPodIfAffected(pod_dir, dry_run)    container_dirs = ListSubdirs(pod_dir)    for container_dir in container_dirs:      FixContainerIfAffected(container_dir, dry_run)def main():  args = _ParseCommandLine()  while True:    FixAllPods(sysfs=args.sys, dry_run=args.dry_run)    if args.interval:      time.sleep(args.interval)    else:      breakif __name__ == '__main__':  main()

上述内容就是k8s cpu limit node节点异常 怎么办,你们学到知识或技能了吗?如果还想学到更多技能或者丰富自己的知识储备,欢迎关注行业资讯频道。

0