Understanding the Clock Sweep Algorithm in PostgreSQL
This article explains the Clock Sweep buffer-replacement algorithm in PostgreSQL: first the data structures it operates on, then a walkthrough of the relevant source code.
1. Data Structures
BufferDesc
Shared descriptor/state data for a single shared buffer.
/*
 * Flags for buffer descriptors
 *
 * Note: TAG_VALID essentially means that there is a buffer hashtable
 * entry associated with the buffer's tag.
 */
#define BM_LOCKED               (1U << 22)  /* buffer header is locked */
#define BM_DIRTY                (1U << 23)  /* data needs writing */
#define BM_VALID                (1U << 24)  /* data is valid */
#define BM_TAG_VALID            (1U << 25)  /* tag is assigned */
#define BM_IO_IN_PROGRESS       (1U << 26)  /* read or write in progress */
#define BM_IO_ERROR             (1U << 27)  /* previous I/O failed */
#define BM_JUST_DIRTIED         (1U << 28)  /* dirtied since write started */
#define BM_PIN_COUNT_WAITER     (1U << 29)  /* have waiter for sole pin */
#define BM_CHECKPOINT_NEEDED    (1U << 30)  /* must write for checkpoint */
#define BM_PERMANENT            (1U << 31)  /* permanent buffer (not unlogged,
                                             * or init fork) */

/*
 * BufferDesc -- shared descriptor/state data for a single shared buffer.
 *
 * Note: Buffer header lock (BM_LOCKED flag) must be held to examine or change
 * the tag, state or wait_backend_pid fields.  In general, buffer header lock
 * is a spinlock which is combined with flags, refcount and usagecount into
 * single atomic variable.  This layout allow us to do some operations in a
 * single atomic operation, without actually acquiring and releasing spinlock;
 * for instance, increase or decrease refcount.  buf_id field never changes
 * after initialization, so does not need locking.  freeNext is protected by
 * the buffer_strategy_lock not buffer header lock.  The LWLock can take care
 * of itself.  The buffer header lock is *not* used to control access to the
 * data in the buffer!
 *
 * It's assumed that nobody changes the state field while buffer header lock
 * is held.  Thus buffer header lock holder can do complex updates of the
 * state variable in single write, simultaneously with lock release (cleaning
 * BM_LOCKED flag).  On the other hand, updating of state without holding
 * buffer header lock is restricted to CAS, which insure that BM_LOCKED flag
 * is not set.  Atomic increment/decrement, OR/AND etc. are not allowed.
 *
 * An exception is that if we have the buffer pinned, its tag can't change
 * underneath us, so we can examine the tag without locking the buffer header.
 * Also, in places we do one-time reads of the flags without bothering to
 * lock the buffer header; this is generally for situations where we don't
 * expect the flag bit being tested to be changing.
 *
 * We can't physically remove items from a disk page if another backend has
 * the buffer pinned.  Hence, a backend may need to wait for all other pins
 * to go away.  This is signaled by storing its own PID into
 * wait_backend_pid and setting flag bit BM_PIN_COUNT_WAITER.  At present,
 * there can be only one such waiter per buffer.
 *
 * We use this same struct for local buffer headers, but the locks are not
 * used and not all of the flag bits are useful either.  To avoid unnecessary
 * overhead, manipulations of the state field should be done without actual
 * atomic operations (i.e. only pg_atomic_read_u32() and
 * pg_atomic_unlocked_write_u32()).
 *
 * Be careful to avoid increasing the size of the struct when adding or
 * reordering members.  Keeping it below 64 bytes (the most common CPU
 * cache line size) is fairly important for performance.
 */
typedef struct BufferDesc
{
    BufferTag   tag;            /* ID of page contained in buffer */
    int         buf_id;         /* buffer's index number (from 0) */

    /* state of the tag, containing flags, refcount and usagecount */
    pg_atomic_uint32 state;

    int         wait_backend_pid;   /* backend PID of pin-count waiter */
    int         freeNext;       /* link in freelist chain */

    LWLock      content_lock;   /* to lock access to buffer contents */
} BufferDesc;
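The comment above says that the flags, refcount and usage count all share one atomic state word. For orientation, the masks and accessors below show that packing; they follow the definitions in buf_internals.h (reproduced from memory here, so verify against your source tree):

#define BUF_REFCOUNT_ONE        1
#define BUF_REFCOUNT_MASK       ((1U << 18) - 1)
#define BUF_USAGECOUNT_MASK     0x003C0000U
#define BUF_USAGECOUNT_ONE      (1U << 18)
#define BUF_USAGECOUNT_SHIFT    18
#define BUF_FLAG_MASK           0xFFC00000U

/* Get refcount and usagecount from buffer state */
#define BUF_STATE_GET_REFCOUNT(state)   ((state) & BUF_REFCOUNT_MASK)
#define BUF_STATE_GET_USAGECOUNT(state) \
    (((state) & BUF_USAGECOUNT_MASK) >> BUF_USAGECOUNT_SHIFT)

So bits 0-17 hold the pin count, bits 18-21 the usage count, and bits 22-31 the BM_* flag bits listed above.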
BufferTag
The buffer tag identifies which disk block the buffer contains.
/*
 * Buffer tag identifies which disk block the buffer contains.
 *
 * Note: the BufferTag data must be sufficient to determine where to write the
 * block, without reference to pg_class or pg_tablespace entries.  It's
 * possible that the backend flushing the buffer doesn't even believe the
 * relation is visible yet (its xact may have started before the xact that
 * created the rel).  The storage manager must be able to cope anyway.
 *
 * Note: if there's any pad bytes in the struct, INIT_BUFFERTAG will have
 * to be fixed to zero them, since this struct is used as a hash key.
 */
typedef struct buftag
{
    RelFileNode rnode;          /* physical relation identifier */
    ForkNumber  forkNum;
    BlockNumber blockNum;       /* blknum relative to begin of reln */
} BufferTag;
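The second note, about pad bytes, matters because the tag is compared bytewise when used as a hash key. A minimal, self-contained illustration (ToyBufferTag is a hypothetical stand-in, not the real struct):

#include <stdio.h>
#include <string.h>

/* A simplified stand-in for BufferTag (the real fields live in
 * buf_internals.h). */
typedef struct
{
    unsigned spcNode;   /* tablespace */
    unsigned dbNode;    /* database */
    unsigned relNode;   /* relation */
    int      forkNum;
    unsigned blockNum;
} ToyBufferTag;

int
main(void)
{
    ToyBufferTag a, b;

    /* Zero the structs before filling them: since the tag is used as a
     * hash key, any pad bytes must compare equal too -- this is what the
     * comment about INIT_BUFFERTAG is warning about. */
    memset(&a, 0, sizeof(a));
    memset(&b, 0, sizeof(b));

    a.relNode = b.relNode = 16384;
    a.blockNum = b.blockNum = 42;

    printf("tags %s\n",
           memcmp(&a, &b, sizeof(a)) == 0 ? "match" : "differ");
    return 0;
}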
2. Source Code Walkthrough
The idea behind the algorithm has already been covered in detail in the Buffer Manager material; the pseudocode is excerpted below:
WHILE true
(1)     Obtain the candidate buffer descriptor pointed to by nextVictimBuffer
(2)     IF the candidate descriptor is unpinned THEN
(3)         IF the candidate descriptor's usage_count == 0 THEN
                BREAK WHILE LOOP  /* the corresponding slot of this descriptor is the victim slot */
            ELSE
                Decrease the candidate descriptor's usage_count by 1
            END IF
        END IF
(4)     Advance nextVictimBuffer to the next one
END WHILE
(5) RETURN buffer_id of the victim
The idea is plain and simple, much simpler than a full-blown LRU algorithm.
nextVictimBuffer can be thought of as the hand of a clock, with the buffer pool laid out as a ring that the hand sweeps around, pass after pass. If a buffer satisfies unpinned (refcount == 0) && usage_count == 0, it is selected as the victim; otherwise its usage_count is decremented by one and the sweep continues until a qualifying buffer is found. The selected buffer is therefore one of the less recently used buffers in the pool, as the short simulation below illustrates.
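To make the sweep concrete, here is a minimal single-threaded simulation (plain C, not PostgreSQL code; FakeBufferDesc, pool and clock_sweep are illustrative names):

#include <stdio.h>

#define NBUFFERS 4

typedef struct
{
    int refcount;       /* pinned if > 0 */
    int usage_count;    /* bumped on each access */
} FakeBufferDesc;

static FakeBufferDesc pool[NBUFFERS];
static int nextVictimBuffer = 0;

/* Sweep the "clock" until an unpinned buffer with usage_count == 0 is found. */
static int
clock_sweep(void)
{
    int trycounter = NBUFFERS;

    for (;;)
    {
        int id = nextVictimBuffer;

        nextVictimBuffer = (nextVictimBuffer + 1) % NBUFFERS;   /* advance hand */

        if (pool[id].refcount == 0)
        {
            if (pool[id].usage_count == 0)
                return id;              /* victim found */
            pool[id].usage_count--;     /* give it one more chance */
            trycounter = NBUFFERS;
        }
        else if (--trycounter == 0)
            return -1;                  /* every buffer is pinned */
    }
}

int
main(void)
{
    pool[0] = (FakeBufferDesc) {.refcount = 1, .usage_count = 3};   /* pinned */
    pool[1] = (FakeBufferDesc) {.refcount = 0, .usage_count = 2};
    pool[2] = (FakeBufferDesc) {.refcount = 0, .usage_count = 0};
    pool[3] = (FakeBufferDesc) {.refcount = 0, .usage_count = 1};

    printf("victim = %d\n", clock_sweep());     /* prints 2 */
    return 0;
}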
The relevant code fragment from StrategyGetBuffer:
    ...
    /* Nothing on the freelist, so run the "clock sweep" algorithm */
    trycounter = NBuffers;      /* NBuffers = size of the buffer pool */
    for (;;)
    {
        /* advance the clock hand one tick and fetch the descriptor there */
        buf = GetBufferDescriptor(ClockSweepTick());

        /*
         * If the buffer is pinned or has a nonzero usage_count, we cannot use
         * it; decrement the usage_count (unless pinned) and keep scanning.
         */
        local_buf_state = LockBufHdr(buf);

        if (BUF_STATE_GET_REFCOUNT(local_buf_state) == 0)
        {
            if (BUF_STATE_GET_USAGECOUNT(local_buf_state) != 0)
            {
                /* usage_count > 0: decrement it and reset the try counter */
                local_buf_state -= BUF_USAGECOUNT_ONE;

                trycounter = NBuffers;
            }
            else
            {
                /* Found a usable buffer */
                if (strategy != NULL)
                    /* add it to this strategy's ring buffer */
                    AddBufferToRing(strategy, buf);

                /* set the output parameter and return the buffer */
                *buf_state = local_buf_state;
                return buf;
            }
        }
        else if (--trycounter == 0)
        {
            /*
             * We've scanned all the buffers without making any state changes,
             * so all the buffers are pinned (or were when we looked at them).
             * We could hope that someone will free one eventually, but it's
             * probably better to fail than to risk getting stuck in an
             * infinite loop.
             */
            UnlockBufHdr(buf, local_buf_state);
            elog(ERROR, "no unpinned buffers available");
        }
        /* unlock the buffer header and keep sweeping */
        UnlockBufHdr(buf, local_buf_state);
    }
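Two details are worth noting in this fragment. First, trycounter is reset to NBuffers whenever a usage_count is successfully decremented: making any state change proves the sweep is not stuck, so the error is raised only after NBuffers consecutive pinned buffers. Second, since pinning a buffer raises usage_count no higher than BM_MAX_USAGE_COUNT (5 in the standard source), an unpinned buffer survives at most five full passes of the hand without being referenced again.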
ClockSweepTick() is a helper routine for StrategyGetBuffer(): it moves the clock hand one buffer ahead of its current position and returns the id of the buffer now under the hand.
Its implementation logic is as follows:
/*
 * ClockSweepTick - Helper routine for StrategyGetBuffer()
 *
 * Move the clock hand one buffer ahead of its current position and return the
 * id of the buffer now under the hand.
 */
static inline uint32
ClockSweepTick(void)
{
    uint32      victim;

    /*
     * Atomically move hand ahead one buffer - if there's several processes
     * doing this, this can lead to buffers being returned slightly out of
     * apparent order.
     */
    victim =
        pg_atomic_fetch_add_u32(&StrategyControl->nextVictimBuffer, 1);

    if (victim >= NBuffers)
    {
        /* the candidate ran past NBuffers: remember the pre-wrap value */
        uint32      originalVictim = victim;

        /* always wrap what we look up in BufferDescriptors */
        victim = victim % NBuffers;

        /*
         * If we're the one that just caused a wraparound, force
         * completePasses to be incremented while holding the spinlock. We
         * need the spinlock so StrategySyncStart() can return a consistent
         * value consisting of nextVictimBuffer and completePasses.
         */
        if (victim == 0)
        {
            uint32      expected;   /* expected value, ignoring wraparound */
            uint32      wrapped;    /* value after wraparound */
            bool        success = false;

            expected = originalVictim + 1;

            while (!success)
            {
                /*
                 * Acquire the spinlock while increasing completePasses. That
                 * allows other readers to read nextVictimBuffer and
                 * completePasses in a consistent manner which is required for
                 * StrategySyncStart().  In theory delaying the increment
                 * could lead to an overflow of nextVictimBuffers, but that's
                 * highly unlikely and wouldn't be particularly harmful.
                 */
                SpinLockAcquire(&StrategyControl->buffer_strategy_lock);

                wrapped = expected % NBuffers;

                /* atomically compare and exchange the counter */
                success = pg_atomic_compare_exchange_u32(&StrategyControl->nextVictimBuffer,
                                                         &expected, wrapped);
                if (success)
                    StrategyControl->completePasses++;
                SpinLockRelease(&StrategyControl->buffer_strategy_lock);
            }
        }
    }
    return victim;
}

/*
 * pg_atomic_compare_exchange_u32 - CAS operation
 *
 * Atomically compare the current value of ptr with *expected and store newval
 * iff ptr and *expected have the same value. The current value of *ptr will
 * always be stored in *expected.
 *
 * Return true if values have been exchanged, false otherwise.
 *
 * Full barrier semantics.
 */
static inline bool
pg_atomic_compare_exchange_u32(volatile pg_atomic_uint32 *ptr,
                               uint32 *expected, uint32 newval)
{
    AssertPointerAlignment(ptr, 4);
    AssertPointerAlignment(expected, 4);

    return pg_atomic_compare_exchange_u32_impl(ptr, expected, newval);
}

bool
pg_atomic_compare_exchange_u32_impl(volatile pg_atomic_uint32 *ptr,
                                    uint32 *expected, uint32 newval)
{
    bool        ret;

    /*
     * Do atomic op under a spinlock. It might look like we could just skip
     * the cmpxchg if the lock isn't available, but that'd just emulate a
     * 'weak' compare and swap. I.e. one that allows spurious failures. Since
     * several algorithms rely on a strong variant and that is efficiently
     * implementable on most major architectures let's emulate it here as
     * well.
     */
    SpinLockAcquire((slock_t *) &ptr->sema);

    /* perform compare/exchange logic */
    ret = ptr->value == *expected;
    *expected = ptr->value;
    if (ret)
        ptr->value = newval;

    /* and release lock */
    SpinLockRelease((slock_t *) &ptr->sema);

    return ret;
}
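The generic fallback above emulates a strong CAS under a spinlock; on platforms with native support the same operation maps onto hardware CAS, with semantics equivalent to C11's atomic_compare_exchange_strong. A small standalone sketch of the same wraparound update (illustrative values, assuming NBuffers == 8; not PostgreSQL code):

#include <stdatomic.h>
#include <stdio.h>

#define NBUFFERS 8

int
main(void)
{
    /* State just after the fetch_add that wrapped: the old value (the
     * victim) was 8, so the counter now holds 9. */
    atomic_uint nextVictimBuffer = 9;
    unsigned    expected = 9;                   /* originalVictim + 1 */
    unsigned    wrapped = expected % NBUFFERS;  /* 9 % 8 == 1 */

    /* Strong CAS: succeeds only if the counter still holds 'expected';
     * on failure, 'expected' is overwritten with the current value --
     * exactly how pg_atomic_compare_exchange_u32 behaves. */
    if (atomic_compare_exchange_strong(&nextVictimBuffer, &expected, wrapped))
        printf("hand wrapped to %u\n", wrapped);
    else
        printf("another backend moved the hand; counter is now %u\n", expected);

    return 0;
}

On a CAS failure, ClockSweepTick() simply loops: the updated expected reflects where other backends have pushed the hand, and wrapped is recomputed from it until the exchange succeeds.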
"怎么理解PostgreSQL中Clock Sweep算法"的内容就介绍到这里了,感谢大家的阅读。如果想了解更多行业相关的知识可以关注网站,小编将为大家输出更多高质量的实用文章!