What Does PostgreSQL's BufferAlloc Function Do?


This article examines what the BufferAlloc function does in PostgreSQL: first the shared-memory data structures involved, then a commented walkthrough of the function's source code, and finally a gdb trace of an actual call.

I. Data Structures

BufferDesc
Shared descriptor/state data for a single shared buffer.

/*
 * Flags for buffer descriptors
 *
 * Note: TAG_VALID essentially means that there is a buffer hashtable
 * entry associated with the buffer's tag.
 */
#define BM_LOCKED               (1U << 22)  /* buffer header is locked */
#define BM_DIRTY                (1U << 23)  /* data needs writing */
#define BM_VALID                (1U << 24)  /* data is valid */
#define BM_TAG_VALID            (1U << 25)  /* tag is assigned */
#define BM_IO_IN_PROGRESS       (1U << 26)  /* read or write in progress */
#define BM_IO_ERROR             (1U << 27)  /* previous I/O failed */
#define BM_JUST_DIRTIED         (1U << 28)  /* dirtied since write started */
#define BM_PIN_COUNT_WAITER     (1U << 29)  /* have waiter for sole pin */
#define BM_CHECKPOINT_NEEDED    (1U << 30)  /* must write for checkpoint */
#define BM_PERMANENT            (1U << 31)  /* permanent buffer (not unlogged,
                                             * or init fork) */

/*
 *  BufferDesc -- shared descriptor/state data for a single shared buffer.
 *
 * Note: Buffer header lock (BM_LOCKED flag) must be held to examine or change
 * the tag, state or wait_backend_pid fields.  In general, buffer header lock
 * is a spinlock which is combined with flags, refcount and usagecount into
 * single atomic variable.  This layout allow us to do some operations in a
 * single atomic operation, without actually acquiring and releasing spinlock;
 * for instance, increase or decrease refcount.  buf_id field never changes
 * after initialization, so does not need locking.  freeNext is protected by
 * the buffer_strategy_lock not buffer header lock.  The LWLock can take care
 * of itself.  The buffer header lock is *not* used to control access to the
 * data in the buffer!
 *
 * It's assumed that nobody changes the state field while buffer header lock
 * is held.  Thus buffer header lock holder can do complex updates of the
 * state variable in single write, simultaneously with lock release (cleaning
 * BM_LOCKED flag).  On the other hand, updating of state without holding
 * buffer header lock is restricted to CAS, which insure that BM_LOCKED flag
 * is not set.  Atomic increment/decrement, OR/AND etc. are not allowed.
 *
 * An exception is that if we have the buffer pinned, its tag can't change
 * underneath us, so we can examine the tag without locking the buffer header.
 * Also, in places we do one-time reads of the flags without bothering to
 * lock the buffer header; this is generally for situations where we don't
 * expect the flag bit being tested to be changing.
 *
 * We can't physically remove items from a disk page if another backend has
 * the buffer pinned.  Hence, a backend may need to wait for all other pins
 * to go away.  This is signaled by storing its own PID into
 * wait_backend_pid and setting flag bit BM_PIN_COUNT_WAITER.  At present,
 * there can be only one such waiter per buffer.
 *
 * We use this same struct for local buffer headers, but the locks are not
 * used and not all of the flag bits are useful either. To avoid unnecessary
 * overhead, manipulations of the state field should be done without actual
 * atomic operations (i.e. only pg_atomic_read_u32() and
 * pg_atomic_unlocked_write_u32()).
 *
 * Be careful to avoid increasing the size of the struct when adding or
 * reordering members.  Keeping it below 64 bytes (the most common CPU
 * cache line size) is fairly important for performance.
 */
typedef struct BufferDesc
{
    BufferTag   tag;            /* ID of page contained in buffer */
    int         buf_id;         /* buffer's index number (from 0) */

    /* state of the tag, containing flags, refcount and usagecount */
    pg_atomic_uint32 state;

    int         wait_backend_pid;   /* backend PID of pin-count waiter */
    int         freeNext;       /* link in freelist chain */

    LWLock      content_lock;   /* to lock access to buffer contents */
} BufferDesc;
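
For reference, the layout of the single atomic state word is defined by companion macros in buf_internals.h (PostgreSQL 11): the low 18 bits hold the refcount, bits 18-21 the usage count, and bits 22-31 the flags listed above.

#define BUF_REFCOUNT_ONE        1
#define BUF_REFCOUNT_MASK       ((1U << 18) - 1)
#define BUF_USAGECOUNT_MASK     0x003C0000U
#define BUF_USAGECOUNT_ONE      (1U << 18)
#define BUF_USAGECOUNT_SHIFT    18
#define BUF_FLAG_MASK           0xFFC00000U

/* Get refcount and usagecount from buffer state */
#define BUF_STATE_GET_REFCOUNT(state) ((state) & BUF_REFCOUNT_MASK)
#define BUF_STATE_GET_USAGECOUNT(state) \
    (((state) & BUF_USAGECOUNT_MASK) >> BUF_USAGECOUNT_SHIFT)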

BufferTag
A buffer tag identifies which disk block the buffer contains.

/*
 * Buffer tag identifies which disk block the buffer contains.
 *
 * Note: the BufferTag data must be sufficient to determine where to write the
 * block, without reference to pg_class or pg_tablespace entries.  It's
 * possible that the backend flushing the buffer doesn't even believe the
 * relation is visible yet (its xact may have started before the xact that
 * created the rel).  The storage manager must be able to cope anyway.
 *
 * Note: if there's any pad bytes in the struct, INIT_BUFFERTAG will have
 * to be fixed to zero them, since this struct is used as a hash key.
 */
typedef struct buftag
{
    RelFileNode rnode;          /* physical relation identifier */
    ForkNumber  forkNum;
    BlockNumber blockNum;       /* blknum relative to begin of reln */
} BufferTag;
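
For reference, the macros that build and clear a tag (from buf_internals.h; CLEAR_BUFFERTAG exists precisely because of the pad-byte note above, zeroing every field so stray padding cannot pollute the hash key):

#define CLEAR_BUFFERTAG(a) \
( \
    (a).rnode.spcNode = InvalidOid, \
    (a).rnode.dbNode = InvalidOid, \
    (a).rnode.relNode = InvalidOid, \
    (a).forkNum = InvalidForkNumber, \
    (a).blockNum = InvalidBlockNumber \
)
#define INIT_BUFFERTAG(a,xx_rnode,xx_forkNum,xx_blockNum) \
( \
    (a).rnode = (xx_rnode), \
    (a).forkNum = (xx_forkNum), \
    (a).blockNum = (xx_blockNum) \
)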

SMgrRelation
smgr.c maintains a hash table of SMgrRelation objects, which are essentially cached file handles.

/*
 * smgr.c maintains a table of SMgrRelation objects, which are essentially
 * cached file handles.  An SMgrRelation is created (if not already present)
 * by smgropen(), and destroyed by smgrclose().  Note that neither of these
 * operations imply I/O, they just create or destroy a hashtable entry.
 * (But smgrclose() may release associated resources, such as OS-level file
 * descriptors.)
 *
 * An SMgrRelation may have an "owner", which is just a pointer to it from
 * somewhere else; smgr.c will clear this pointer if the SMgrRelation is
 * closed.  We use this to avoid dangling pointers from relcache to smgr
 * without having to make the smgr explicitly aware of relcache.  There
 * can't be more than one "owner" pointer per SMgrRelation, but that's
 * all we need.
 *
 * SMgrRelations that do not have an "owner" are considered to be transient,
 * and are deleted at end of transaction.
 */
typedef struct SMgrRelationData
{
    /* rnode is the hashtable lookup key, so it must be first! */
    RelFileNodeBackend smgr_rnode;  /* relation physical identifier */

    /* pointer to owning pointer, or NULL if none */
    struct SMgrRelationData **smgr_owner;

    /*
     * These next three fields are not actually used or manipulated by smgr,
     * except that they are reset to InvalidBlockNumber upon a cache flush
     * event (in particular, upon truncation of the relation).  Higher levels
     * store cached state here so that it will be reset when truncation
     * happens.  In all three cases, InvalidBlockNumber means "unknown".
     */
    BlockNumber smgr_targblock; /* current insertion target block */
    BlockNumber smgr_fsm_nblocks;   /* last known size of fsm fork */
    BlockNumber smgr_vm_nblocks;    /* last known size of vm fork */

    /* additional public fields may someday exist here */

    /*
     * Fields below here are intended to be private to smgr.c and its
     * submodules.  Do not touch them from elsewhere.
     */
    int         smgr_which;     /* storage manager selector */

    /*
     * for md.c; per-fork arrays of the number of open segments
     * (md_num_open_segs) and the segments themselves (md_seg_fds).
     */
    int         md_num_open_segs[MAX_FORKNUM + 1];
    struct _MdfdVec *md_seg_fds[MAX_FORKNUM + 1];

    /* if unowned, list link in list of all unowned SMgrRelations */
    struct SMgrRelationData *next_unowned_reln;
} SMgrRelationData;

typedef SMgrRelationData *SMgrRelation;
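
A minimal usage sketch of this API (hedged: rnode here stands in for an already-valid RelFileNode; the calls themselves are PostgreSQL 11's real smgr.c interface). Note that only smgrread() touches the file; open/close just manage the hashtable entry, with md.c opening file descriptors lazily.

    char         page[BLCKSZ];
    SMgrRelation reln;

    reln = smgropen(rnode, InvalidBackendId);   /* regular (shared) relation */
    smgrread(reln, MAIN_FORKNUM, 0, page);      /* read block 0 of the main fork */
    smgrclose(reln);                            /* drop the hashtable entry */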

RelFileNodeBackend
Augmenting a relfilenode with the backend ID provides all the information needed to locate the relation's physical storage.

/*
 * Augmenting a relfilenode with the backend ID provides all the information
 * we need to locate the physical storage.  The backend ID is InvalidBackendId
 * for regular relations (those accessible to more than one backend), or the
 * owning backend's ID for backend-local relations.  Backend-local relations
 * are always transient and removed in case of a database crash; they are
 * never WAL-logged or fsync'd.
 */
typedef struct RelFileNodeBackend
{
    RelFileNode node;           /* physical relation identifier */
    BackendId   backend;        /* owning backend, or InvalidBackendId */
} RelFileNodeBackend;
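
A small related macro from relfilenode.h makes the convention explicit: a relation is backend-local (temporary) exactly when the backend field holds a real backend ID.

#define RelFileNodeBackendIsTemp(rnode) \
    ((rnode).backend != InvalidBackendId)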

II. Source Code Walkthrough

BufferAlloc is a subroutine of ReadBuffer that handles the lookup of a shared buffer. If the buffer does not already exist, it selects a replacement victim and evicts the old page, but does NOT read in the new page.
The main processing logic is as follows:
1. Initialize; compute the hash value and partition lock ID from the tag
2. Check whether the block is already in the buffer pool
3. The buffer is found in the pool (buf_id >= 0)
3.1 Get the buffer descriptor and pin the buffer
3.2 If PinBuffer returns false (the page is not valid), call StartBufferIO; if StartBufferIO returns true, set *foundPtr to false so the caller performs the read
3.3 Return buf
4. The buffer is not found in the pool (buf_id < 0)
4.1 Release newPartitionLock
4.2 Loop until a suitable victim buffer is found
4.2.1 Ensure, while the spinlock is not yet held, that there is a free refcount entry
4.2.2 Select a victim buffer
4.2.3 Copy the buffer flags into oldFlags
4.2.4 Pin the buffer, then release the buffer spinlock
4.2.5 If the buffer is marked BM_DIRTY, write it out with FlushBuffer
4.2.6 If the buffer is marked BM_TAG_VALID, compute the old tag's hash code and partition lock ID, and lock the old and new partition locks in address order;
otherwise only the new partition is needed: lock the new partition lock and reset oldPartitionLock and oldHash
4.2.7 Try to make a hashtable entry for the buffer under its new tag
4.2.8 On a collision (buf_id >= 0), handle it just as if the buffer had been found in the buffer pool at the start
4.2.9 No collision (buf_id < 0): lock the buffer header; if the buffer has not been re-dirtied and carries no pin other than ours, the victim is usable, so break out of the loop;
otherwise unlock the buffer header, delete the hashtable entry, release the locks, and go look for another buffer
4.3 It is now safe to set the new buffer tag; afterwards unlock the buffer header, delete the old tag's hashtable entry, and release the partition locks
4.4 Call StartBufferIO and set the *foundPtr flag accordingly
4.5 Return buf

/*
 * BufferAlloc -- subroutine for ReadBuffer.  Handles lookup of a shared
 *      buffer.  If no buffer exists already, selects a replacement
 *      victim and evicts the old page, but does NOT read in new page.
 *
 * "strategy" can be a buffer replacement strategy object, or NULL for
 * the default strategy.  The selected buffer's usage_count is advanced when
 * using the default strategy, but otherwise possibly not (see PinBuffer).
 *
 * The returned buffer is pinned and is already marked as holding the
 * desired page.  If it already did have the desired page, *foundPtr is
 * set true.  Otherwise, *foundPtr is set false and the buffer is marked
 * as IO_IN_PROGRESS; ReadBuffer will now need to do I/O to fill it.
 *
 * *foundPtr is actually redundant with the buffer's BM_VALID flag, but
 * we keep it for simplicity in ReadBuffer.
 *
 * No locks are held either at entry or exit.
 */
static BufferDesc *
BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
            BlockNumber blockNum,
            BufferAccessStrategy strategy,
            bool *foundPtr)
{
    BufferTag   newTag;         /* identity of requested block */
    uint32      newHash;        /* hash value for newTag */
    LWLock     *newPartitionLock;   /* buffer partition lock for it */
    BufferTag   oldTag;         /* previous identity of selected buffer */
    uint32      oldHash;        /* hash value for oldTag */
    LWLock     *oldPartitionLock;   /* buffer partition lock for it */
    uint32      oldFlags;
    int         buf_id;
    BufferDesc *buf;
    bool        valid;
    uint32      buf_state;

    /* create a tag so we can lookup the buffer */
    INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);

    /* determine its hash code and partition lock ID */
    newHash = BufTableHashCode(&newTag);
    newPartitionLock = BufMappingPartitionLock(newHash);

    /* see if the block is in the buffer pool already */
    LWLockAcquire(newPartitionLock, LW_SHARED);
    buf_id = BufTableLookup(&newTag, newHash);
    if (buf_id >= 0)
    {
        /*
         * Found it.  Now, pin the buffer so no one can steal it from the
         * buffer pool, and check to see if the correct data has been loaded
         * into the buffer.
         */
        buf = GetBufferDescriptor(buf_id);

        valid = PinBuffer(buf, strategy);

        /* Can release the mapping lock as soon as we've pinned it */
        LWLockRelease(newPartitionLock);

        *foundPtr = true;

        if (!valid)
        {
            /*
             * We can only get here if (a) someone else is still reading in
             * the page, or (b) a previous read attempt failed.  We have to
             * wait for any active read attempt to finish, and then set up our
             * own read attempt if the page is still not BM_VALID.
             * StartBufferIO does it all.
             */
            if (StartBufferIO(buf, true))
            {
                /*
                 * If we get here, previous attempts to read the buffer must
                 * have failed ... but we shall bravely try again.
                 */
                *foundPtr = false;
            }
        }

        return buf;
    }

    /*
     * Didn't find it in the buffer pool.  We'll have to initialize a new
     * buffer.  Remember to unlock the mapping lock while doing the work.
     */
    LWLockRelease(newPartitionLock);

    /* Loop here in case we have to try another victim buffer */
    for (;;)
    {
        /*
         * Ensure, while the spinlock's not yet held, that there's a free
         * refcount entry.
         */
        ReservePrivateRefCountEntry();

        /*
         * Select a victim buffer.  The buffer is returned with its header
         * spinlock still held!
         */
        buf = StrategyGetBuffer(strategy, &buf_state);

        Assert(BUF_STATE_GET_REFCOUNT(buf_state) == 0);

        /* Must copy buffer flags while we still hold the spinlock */
        oldFlags = buf_state & BUF_FLAG_MASK;

        /* Pin the buffer and then release the buffer spinlock */
        PinBuffer_Locked(buf);

        /*
         * If the buffer was dirty, try to write it out.  There is a race
         * condition here, in that someone might dirty it after we released it
         * above, or even while we are writing it out (since our share-lock
         * won't prevent hint-bit updates).  We will recheck the dirty bit
         * after re-locking the buffer header.
         */
        if (oldFlags & BM_DIRTY)
        {
            /*
             * We need a share-lock on the buffer contents to write it out
             * (else we might write invalid data, eg because someone else is
             * compacting the page contents while we write).  We must use a
             * conditional lock acquisition here to avoid deadlock.  Even
             * though the buffer was not pinned (and therefore surely not
             * locked) when StrategyGetBuffer returned it, someone else could
             * have pinned and exclusive-locked it by the time we get here. If
             * we try to get the lock unconditionally, we'd block waiting for
             * them; if they later block waiting for us, deadlock ensues.
             * (This has been observed to happen when two backends are both
             * trying to split btree index pages, and the second one just
             * happens to be trying to split the page the first one got from
             * StrategyGetBuffer.)
             */
            if (LWLockConditionalAcquire(BufferDescriptorGetContentLock(buf),
                                         LW_SHARED))
            {
                /*
                 * If using a nondefault strategy, and writing the buffer
                 * would require a WAL flush, let the strategy decide whether
                 * to go ahead and write/reuse the buffer or to choose another
                 * victim.  We need lock to inspect the page LSN, so this
                 * can't be done inside StrategyGetBuffer.
                 */
                if (strategy != NULL)
                {
                    XLogRecPtr  lsn;

                    /* Read the LSN while holding buffer header lock */
                    buf_state = LockBufHdr(buf);
                    lsn = BufferGetLSN(buf);
                    UnlockBufHdr(buf, buf_state);

                    if (XLogNeedsFlush(lsn) &&
                        StrategyRejectBuffer(strategy, buf))
                    {
                        /* Drop lock/pin and loop around for another buffer */
                        LWLockRelease(BufferDescriptorGetContentLock(buf));
                        UnpinBuffer(buf, true);
                        continue;
                    }
                }

                /* OK, do the I/O */
                TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_START(forkNum, blockNum,
                                                          smgr->smgr_rnode.node.spcNode,
                                                          smgr->smgr_rnode.node.dbNode,
                                                          smgr->smgr_rnode.node.relNode);

                FlushBuffer(buf, NULL);
                LWLockRelease(BufferDescriptorGetContentLock(buf));

                ScheduleBufferTagForWriteback(&BackendWritebackContext,
                                              &buf->tag);

                TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_DONE(forkNum, blockNum,
                                                         smgr->smgr_rnode.node.spcNode,
                                                         smgr->smgr_rnode.node.dbNode,
                                                         smgr->smgr_rnode.node.relNode);
            }
            else
            {
                /*
                 * Someone else has locked the buffer, so give it up and loop
                 * back to get another one.
                 */
                UnpinBuffer(buf, true);
                continue;
            }
        }

        /*
         * To change the association of a valid buffer, we'll need to have
         * exclusive lock on both the old and new mapping partitions.
         */
        if (oldFlags & BM_TAG_VALID)
        {
            /*
             * Need to compute the old tag's hashcode and partition lock ID.
             * XXX is it worth storing the hashcode in BufferDesc so we need
             * not recompute it here?  Probably not.
             */
            oldTag = buf->tag;
            oldHash = BufTableHashCode(&oldTag);
            oldPartitionLock = BufMappingPartitionLock(oldHash);

            /*
             * Must lock the lower-numbered partition first to avoid
             * deadlocks.
             */
            if (oldPartitionLock < newPartitionLock)
            {
                LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
                LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
            }
            else if (oldPartitionLock > newPartitionLock)
            {
                LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
                LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
            }
            else
            {
                /* only one partition, only one lock */
                LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
            }
        }
        else
        {
            /* if it wasn't valid, we need only the new partition */
            LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
            /* remember we have no old-partition lock or tag */
            oldPartitionLock = NULL;
            /* this just keeps the compiler quiet about uninit variables */
            oldHash = 0;
        }

        /*
         * Try to make a hashtable entry for the buffer under its new tag.
         * This could fail because while we were writing someone else
         * allocated another buffer for the same block we want to read in.
         * Note that we have not yet removed the hashtable entry for the old
         * tag.
         */
        buf_id = BufTableInsert(&newTag, newHash, buf->buf_id);

        if (buf_id >= 0)
        {
            /*
             * Got a collision. Someone has already done what we were about to
             * do. We'll just handle this as if it were found in the buffer
             * pool in the first place.  First, give up the buffer we were
             * planning to use.
             */
            UnpinBuffer(buf, true);

            /* Can give up that buffer's mapping partition lock now */
            if (oldPartitionLock != NULL &&
                oldPartitionLock != newPartitionLock)
                LWLockRelease(oldPartitionLock);

            /* remaining code should match code at top of routine */

            buf = GetBufferDescriptor(buf_id);

            valid = PinBuffer(buf, strategy);

            /* Can release the mapping lock as soon as we've pinned it */
            LWLockRelease(newPartitionLock);

            *foundPtr = true;

            if (!valid)
            {
                /*
                 * We can only get here if (a) someone else is still reading
                 * in the page, or (b) a previous read attempt failed.  We
                 * have to wait for any active read attempt to finish, and
                 * then set up our own read attempt if the page is still not
                 * BM_VALID.  StartBufferIO does it all.
                 */
                if (StartBufferIO(buf, true))
                {
                    /*
                     * If we get here, previous attempts to read the buffer
                     * must have failed ... but we shall bravely try again.
                     */
                    *foundPtr = false;
                }
            }

            return buf;
        }

        /*
         * Need to lock the buffer header too in order to change its tag.
         */
        buf_state = LockBufHdr(buf);

        /*
         * Somebody could have pinned or re-dirtied the buffer while we were
         * doing the I/O and making the new hashtable entry.  If so, we can't
         * recycle this buffer; we must undo everything we've done and start
         * over with a new victim buffer.
         */
        oldFlags = buf_state & BUF_FLAG_MASK;
        if (BUF_STATE_GET_REFCOUNT(buf_state) == 1 && !(oldFlags & BM_DIRTY))
            break;

        UnlockBufHdr(buf, buf_state);
        BufTableDelete(&newTag, newHash);
        if (oldPartitionLock != NULL &&
            oldPartitionLock != newPartitionLock)
            LWLockRelease(oldPartitionLock);
        LWLockRelease(newPartitionLock);
        UnpinBuffer(buf, true);
    }

    /*
     * Okay, it's finally safe to rename the buffer.
     *
     * Clearing BM_VALID here is necessary, clearing the dirtybits is just
     * paranoia.  We also reset the usage_count since any recency of use of
     * the old content is no longer relevant.  (The usage_count starts out at
     * 1 so that the buffer can survive one clock-sweep pass.)
     *
     * Make sure BM_PERMANENT is set for buffers that must be written at every
     * checkpoint.  Unlogged buffers only need to be written at shutdown
     * checkpoints, except for their "init" forks, which need to be treated
     * just like permanent relations.
     */
    buf->tag = newTag;
    buf_state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED |
                   BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT |
                   BUF_USAGECOUNT_MASK);
    if (relpersistence == RELPERSISTENCE_PERMANENT || forkNum == INIT_FORKNUM)
        buf_state |= BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE;
    else
        buf_state |= BM_TAG_VALID | BUF_USAGECOUNT_ONE;

    UnlockBufHdr(buf, buf_state);

    if (oldPartitionLock != NULL)
    {
        BufTableDelete(&oldTag, oldHash);
        if (oldPartitionLock != newPartitionLock)
            LWLockRelease(oldPartitionLock);
    }

    LWLockRelease(newPartitionLock);

    /*
     * Buffer contents are currently invalid.  Try to get the io_in_progress
     * lock.  If StartBufferIO returns false, then someone else managed to
     * read it before we did, so there's nothing left for BufferAlloc() to do.
     */
    if (StartBufferIO(buf, true))
        *foundPtr = false;
    else
        *foundPtr = true;

    return buf;
}

III. Tracing and Analysis

Test script: query a data table:

10:01:54 (xdb@[local]:5432)testdb=# select * from t1 limit 10;

Start gdb and set a breakpoint:

(gdb) b BufferAlloc
Breakpoint 1 at 0x8778ad: file bufmgr.c, line 1005.
(gdb) c
Continuing.

Breakpoint 1, BufferAlloc (smgr=0x2267430, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=0, strategy=0x0,
    foundPtr=0x7ffcc97fb4f3) at bufmgr.c:1005
1005        INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);
(gdb)

Input parameters:
smgr - pointer to the SMgrRelationData struct
relpersistence - relation persistence ('p' = permanent, 'u' = unlogged, 't' = temporary; the 112 in the trace is ASCII 'p')
forkNum - fork type; MAIN_FORKNUM is the data file proper, with other forks covering the fsm/vm files
blockNum - block number
strategy - buffer access strategy, NULL here (the default strategy)
*foundPtr - output parameter

(gdb) p *smgr
$1 = {smgr_rnode = {node = {spcNode = 1663, dbNode = 16402, relNode = 51439}, backend = -1}, smgr_owner = 0x7f86133f3778,
  smgr_targblock = 4294967295, smgr_fsm_nblocks = 4294967295, smgr_vm_nblocks = 4294967295, smgr_which = 0,
  md_num_open_segs = {0, 0, 0, 0}, md_seg_fds = {0x0, 0x0, 0x0, 0x0}, next_unowned_reln = 0x0}
(gdb) p *smgr->smgr_owner
$2 = (struct SMgrRelationData *) 0x2267430
(gdb) p **smgr->smgr_owner
$3 = {smgr_rnode = {node = {spcNode = 1663, dbNode = 16402, relNode = 51439}, backend = -1}, smgr_owner = 0x7f86133f3778,
  smgr_targblock = 4294967295, smgr_fsm_nblocks = 4294967295, smgr_vm_nblocks = 4294967295, smgr_which = 0,
  md_num_open_segs = {0, 0, 0, 0}, md_seg_fds = {0x0, 0x0, 0x0, 0x0}, next_unowned_reln = 0x0}
(gdb)

1. Initialize; compute the hash value and partition lock ID from the tag

(gdb) n
1008        newHash = BufTableHashCode(&newTag);
(gdb) p newTag
$4 = {rnode = {spcNode = 1663, dbNode = 16402, relNode = 51439}, forkNum = MAIN_FORKNUM, blockNum = 0}
(gdb) n
1009        newPartitionLock = BufMappingPartitionLock(newHash);
(gdb)
1012        LWLockAcquire(newPartitionLock, LW_SHARED);
(gdb)
1013        buf_id = BufTableLookup(&newTag, newHash);
(gdb) p newHash
$5 = 1398580903
(gdb) p newPartitionLock
$6 = (LWLock *) 0x7f85e5db9600
(gdb) p *newPartitionLock
$7 = {tranche = 59, state = {value = 536870913}, waiters = {head = 2147483647, tail = 2147483647}}
(gdb)
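
For context, the partition lock is derived from the hash code by these buf_internals.h macros (PostgreSQL 11, where NUM_BUFFER_PARTITIONS is 128), so the newHash above maps to mapping partition 1398580903 % 128 = 39:

#define BufTableHashPartition(hashcode) \
    ((hashcode) % NUM_BUFFER_PARTITIONS)
#define BufMappingPartitionLock(hashcode) \
    (&MainLWLockArray[BUFFER_MAPPING_LWLOCK_OFFSET + \
        BufTableHashPartition(hashcode)].lock)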

2. Check whether the block is already in the buffer pool

(gdb) n
1014        if (buf_id >= 0)
(gdb) p buf_id
$8 = -1

4. The buffer is not found in the pool (buf_id < 0)
4.1 Release newPartitionLock
4.2 Loop until a suitable victim buffer is found
4.2.1 Ensure, while the spinlock is not yet held, that there is a free refcount entry --> ReservePrivateRefCountEntry

(gdb) n
1056        LWLockRelease(newPartitionLock);
(gdb)
1065            ReservePrivateRefCountEntry();
(gdb)

4.2.2 Select a victim buffer

(gdb) n
1071            buf = StrategyGetBuffer(strategy, &buf_state);
(gdb) n
1073            Assert(BUF_STATE_GET_REFCOUNT(buf_state) == 0);
(gdb) p buf
$9 = (BufferDesc *) 0x7f85e705fd80
(gdb) p *buf
$10 = {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forkNum = InvalidForkNumber, blockNum = 4294967295},
  buf_id = 104, state = {value = 4194304}, wait_backend_pid = 0, freeNext = -2, content_lock = {tranche = 54, state = {
      value = 536870912}, waiters = {head = 2147483647, tail = 2147483647}}}
(gdb)
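
StrategyGetBuffer first tries the freelist and only then falls back to a clock sweep over the buffer pool. A greatly simplified sketch of that sweep follows (hedged: ClockSweepTick() is the real file-local helper in freelist.c, but the actual code also handles the strategy ring and bgwriter statistics). Note the victim is returned with its header spinlock still held, which is why BufferAlloc copies oldFlags before releasing it.

    for (;;)
    {
        BufferDesc *candidate = GetBufferDescriptor(ClockSweepTick());
        uint32      local_state = LockBufHdr(candidate);

        if (BUF_STATE_GET_REFCOUNT(local_state) == 0)
        {
            if (BUF_STATE_GET_USAGECOUNT(local_state) == 0)
                return candidate;           /* cold and unpinned: evict it;
                                             * header spinlock stays held */
            local_state -= BUF_USAGECOUNT_ONE;  /* warm: decay, keep sweeping */
        }
        UnlockBufHdr(candidate, local_state);
    }

Incidentally, the state value 4194304 printed above is exactly 1<<22, i.e. BM_LOCKED: the only bit set on the freshly chosen victim is the header spinlock itself.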

4.2.3 Copy the buffer flags into oldFlags

(gdb) n
1076            oldFlags = buf_state & BUF_FLAG_MASK;
(gdb)

4.2.4 Pin the buffer, then release the buffer spinlock

(gdb)
1079            PinBuffer_Locked(buf);
(gdb)

4.2.5 If the buffer is marked BM_DIRTY, write it out with FlushBuffer (not taken in this trace: the victim is clean)

1088            if (oldFlags & BM_DIRTY)
(gdb)

4.2.6 If the buffer is marked BM_TAG_VALID, compute the old tag's hash code and partition lock ID, and lock the old and new partition locks in address order;
otherwise only the new partition is needed: lock the new partition lock and reset oldPartitionLock and oldHash

(gdb)
1166            if (oldFlags & BM_TAG_VALID)
(gdb)
1200                LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
(gdb)
1202                oldPartitionLock = NULL;
(gdb)
1204                oldHash = 0;
(gdb) p oldFlags
$11 = 4194304
(gdb)

4.2.7 Try to make a hashtable entry for the buffer under its new tag

(gdb)
1214            buf_id = BufTableInsert(&newTag, newHash, buf->buf_id);
(gdb) n
1216            if (buf_id >= 0)
(gdb) p buf_id
$12 = -1
(gdb)

4.2.9 No collision (buf_id < 0): lock the buffer header; if the buffer has not been re-dirtied and carries no pin other than ours, the victim is usable, so break out of the loop;
otherwise unlock the buffer header, delete the hashtable entry, release the locks, and go look for another buffer

(gdb) n
1267            buf_state = LockBufHdr(buf);
(gdb)
1275            oldFlags = buf_state & BUF_FLAG_MASK;
(gdb)
1276            if (BUF_STATE_GET_REFCOUNT(buf_state) == 1 && !(oldFlags & BM_DIRTY))
(gdb)
1277                break;
(gdb)

4.3 It is now safe to set the new buffer tag; afterwards unlock the buffer header, delete the old tag's hashtable entry, and release the partition locks

1301        buf->tag = newTag;
(gdb)
1302        buf_state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED |
(gdb)
1305        if (relpersistence == RELPERSISTENCE_PERMANENT || forkNum == INIT_FORKNUM)
(gdb)
1306            buf_state |= BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE;
(gdb)
1310        UnlockBufHdr(buf, buf_state);
(gdb)
1312        if (oldPartitionLock != NULL)
(gdb)
1319        LWLockRelease(newPartitionLock);
(gdb) p *buf
$13 = {tag = {rnode = {spcNode = 1663, dbNode = 16402, relNode = 51439}, forkNum = MAIN_FORKNUM, blockNum = 0},
  buf_id = 104, state = {value = 2181300225}, wait_backend_pid = 0, freeNext = -2, content_lock = {tranche = 54, state = {
      value = 536870912}, waiters = {head = 2147483647, tail = 2147483647}}}
(gdb)
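
As a cross-check, the new state.value = 2181300225 = 0x82040001 decodes, per the bit layout shown in section I, to BM_PERMANENT (1<<31) | BM_TAG_VALID (1<<25) | usage_count = 1 | refcount = 1, which is exactly what lines 1302-1306 just set. BM_VALID is still clear because the page has not been read in yet.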

4.4 Call StartBufferIO and set the *foundPtr flag accordingly

(gdb)
1326        if (StartBufferIO(buf, true))
(gdb) n
1327            *foundPtr = false;
(gdb)

4.5 Return buf

(gdb)
1331        return buf;
(gdb)
1332    }
(gdb)

Execution completes and control returns to ReadBuffer_common:

(gdb)
ReadBuffer_common (smgr=0x2267430, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=0, mode=RBM_NORMAL, strategy=0x0,
    hit=0x7ffcc97fb5eb) at bufmgr.c:747
747         if (found)
(gdb)
750             pgBufferUsage.shared_blks_read++;
(gdb)
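
For reference, this is roughly how the caller consumes the result (condensed from ReadBuffer_common around the lines shown in the trace; not the verbatim source):

    bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
                         strategy, &found);
    if (found)
        pgBufferUsage.shared_blks_hit++;    /* page was already valid: no I/O */
    else
        pgBufferUsage.shared_blks_read++;   /* cache miss: this backend now
                                             * owns the BM_IO_IN_PROGRESS read */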

This concludes the walkthrough of what BufferAlloc does in PostgreSQL: it either finds the requested block already in the shared buffer pool, or claims a victim buffer, re-tags it, and hands it back for the caller to fill.
