千家信息网

PostgreSQL 源码解读(247)- HTAB动态扩展图解#1

发表于:2024-09-22 作者:千家信息网编辑
千家信息网最后更新 2024年09月22日,本节简单介绍了PostgreSQL中的HTAB如何动态扩展,这是第1部分.一、数据结构/* * Top control structure for a hashtable --- in a share
千家信息网最后更新 2024年09月22日PostgreSQL 源码解读(247)- HTAB动态扩展图解#1

本节简单介绍了PostgreSQL中的HTAB如何动态扩展,这是第1部分.

一、数据结构

/* * Top control structure for a hashtable --- in a shared table, each backend * has its own copy (OK since no fields change at runtime) * 哈希表的顶层控制结构. * 在这个共享哈希表中,每一个后台进程都有自己的拷贝 * (之所以没有问题是因为fork出来后,在运行期没有字段会变化) */struct HTAB{    //指向共享的控制信息    HASHHDR    *hctl;           /* => shared control information */    //段目录    HASHSEGMENT *dir;           /* directory of segment starts */    //哈希函数    HashValueFunc hash;         /* hash function */    //哈希键比较函数    HashCompareFunc match;      /* key comparison function */    //哈希键拷贝函数    HashCopyFunc keycopy;       /* key copying function */    //内存分配器    HashAllocFunc alloc;        /* memory allocator */    //内存上下文    MemoryContext hcxt;         /* memory context if default allocator used */    //表名(用于错误信息)    char       *tabname;        /* table name (for error messages) */    //如在共享内存中,则为T    bool        isshared;       /* true if table is in shared memory */    //如为T,则固定大小不能扩展    bool        isfixed;        /* if true, don't enlarge */    /* freezing a shared table isn't allowed, so we can keep state here */    //不允许冻结共享表,因此这里会保存相关状态    bool        frozen;         /* true = no more inserts allowed */    /* We keep local copies of these fixed values to reduce contention */    //保存这些固定值的本地拷贝,以减少冲突    //哈希键长度(以字节为单位)    Size        keysize;        /* hash key length in bytes */    //段大小,必须为2的幂    long        ssize;          /* segment size --- must be power of 2 */    //段偏移,ssize的对数    int         sshift;         /* segment shift = log2(ssize) */};/* * Header structure for a hash table --- contains all changeable info * 哈希表的头部结构 -- 存储所有可变信息 * * In a shared-memory hash table, the HASHHDR is in shared memory, while * each backend has a local HTAB struct.  For a non-shared table, there isn't * any functional difference between HASHHDR and HTAB, but we separate them * anyway to share code between shared and non-shared tables. * 在共享内存哈希表中,HASHHDR位于共享内存中,每一个后台进程都有一个本地HTAB结构. * 对于非共享哈希表,HASHHDR和HTAB没有任何功能性的不同, * 但无论如何,我们还是把它们区分为共享和非共享表. */struct HASHHDR{    /*     * The freelist can become a point of contention in high-concurrency hash     * tables, so we use an array of freelists, each with its own mutex and     * nentries count, instead of just a single one.  Although the freelists     * normally operate independently, we will scavenge entries from freelists     * other than a hashcode's default freelist when necessary.     * 在高并发的哈希表中,空闲链表会成为竞争热点,因此我们使用空闲链表数组,     *   数组中的每一个元素都有自己的mutex和条目统计,而不是使用一个.     *     * If the hash table is not partitioned, only freeList[0] is used and its     * spinlock is not used at all; callers' locking is assumed sufficient.     * 如果哈希表没有分区,那么只有freelist[0]元素是有用的,自旋锁没有任何用处;     * 调用者锁定被认为已足够OK.     */    /* Number of freelists to be used for a partitioned hash table. */    //#define NUM_FREELISTS           32    FreeListData freeList[NUM_FREELISTS];    /* These fields can change, but not in a partitioned table */    //这些域字段可以改变,但不适用于分区表    /* Also, dsize can't change in a shared table, even if unpartitioned */    //同时,就算是非分区表,共享表的dsize也不能改变    //目录大小    long        dsize;          /* directory size */    //已分配的段大小(<= dsize)    long        nsegs;          /* number of allocated segments (<= dsize) */    //正在使用的最大桶ID    uint32      max_bucket;     /* ID of maximum bucket in use */    //进入整个哈希表的模掩码    uint32      high_mask;      /* mask to modulo into entire table */    //进入低位哈希表的模掩码    uint32      low_mask;       /* mask to modulo into lower half of table */    /* These fields are fixed at hashtable creation */    //下面这些字段在哈希表创建时已固定    //哈希键大小(以字节为单位)    Size        keysize;        /* hash key length in bytes */    //所有用户元素大小(以字节为单位)    Size        entrysize;      /* total user element size in bytes */    //分区个数(2的幂),或者为0    long        num_partitions; /* # partitions (must be power of 2), or 0 */    //目标的填充因子    long        ffactor;        /* target fill factor */    //如目录是固定大小,则该值为dsize的上限值    long        max_dsize;      /* 'dsize' limit if directory is fixed size */    //段大小,必须是2的幂    long        ssize;          /* segment size --- must be power of 2 */    //段偏移,ssize的对数    int         sshift;         /* segment shift = log2(ssize) */    //一次性分配的条目个数    int         nelem_alloc;    /* number of entries to allocate at once */#ifdef HASH_STATISTICS    /*     * Count statistics here.  NB: stats code doesn't bother with mutex, so     * counts could be corrupted a bit in a partitioned table.     * 统计信息.     * 注意:统计相关的代码不会影响mutex,因此对于分区表,统计可能有一点点问题     */    long        accesses;    long        collisions;#endif};/* * Per-freelist data. * 空闲链表数据. * * In a partitioned hash table, each freelist is associated with a specific * set of hashcodes, as determined by the FREELIST_IDX() macro below. * nentries tracks the number of live hashtable entries having those hashcodes * (NOT the number of entries in the freelist, as you might expect). * 在一个分区哈希表中,每一个空闲链表与特定的hashcodes集合相关,通过下面的FREELIST_IDX()宏进行定义. * nentries跟踪有这些hashcodes的仍存活的hashtable条目个数. * (注意不要搞错,不是空闲的条目个数) * * The coverage of a freelist might be more or less than one partition, so it * needs its own lock rather than relying on caller locking.  Relying on that * wouldn't work even if the coverage was the same, because of the occasional * need to "borrow" entries from another freelist; see get_hash_entry(). * 空闲链表的覆盖范围可能比一个分区多或少,因此需要自己的锁而不能仅仅依赖调用者的锁. * 依赖调用者锁在覆盖面一样的情况下也不会起效,因为偶尔需要从另一个自由列表"借用"条目,详细参见get_hash_entry() * * Using an array of FreeListData instead of separate arrays of mutexes, * nentries and freeLists helps to reduce sharing of cache lines between * different mutexes. * 使用FreeListData数组而不是一个独立的mutexes,nentries和freelists数组有助于减少不同mutexes之间的缓存线共享. */typedef struct{    //该空闲链表的自旋锁    slock_t     mutex;          /* spinlock for this freelist */    //相关桶中的条目个数    long        nentries;       /* number of entries in associated buckets */    //空闲元素链    HASHELEMENT *freeList;      /* chain of free elements */} FreeListData;/* * HASHELEMENT is the private part of a hashtable entry.  The caller's data * follows the HASHELEMENT structure (on a MAXALIGN'd boundary).  The hash key * is expected to be at the start of the caller's hash entry data structure. * HASHELEMENT是哈希表条目的私有部分. * 调用者的数据按照HASHELEMENT结构组织(位于MAXALIGN的边界). * 哈希键应位于调用者hash条目数据结构的开始位置. */typedef struct HASHELEMENT{    //链接到相同桶中的下一个条目    struct HASHELEMENT *link;   /* link to next entry in same bucket */    //该条目的哈希函数结果    uint32      hashvalue;      /* hash function result for this entry */} HASHELEMENT;/* Hash table header struct is an opaque type known only within dynahash.c *///哈希表头部结构,非透明类型,用于dynahash.ctypedef struct HASHHDR HASHHDR;/* Hash table control struct is an opaque type known only within dynahash.c *///哈希表控制结构,非透明类型,用于dynahash.ctypedef struct HTAB HTAB;/* Parameter data structure for hash_create *///hash_create使用的参数数据结构/* Only those fields indicated by hash_flags need be set *///根据hash_flags标记设置相应的字段typedef struct HASHCTL{    //分区个数(必须是2的幂)    long        num_partitions; /* # partitions (must be power of 2) */    //段大小    long        ssize;          /* segment size */    //初始化目录大小    long        dsize;          /* (initial) directory size */    //dsize上限    long        max_dsize;      /* limit to dsize if dir size is limited */    //填充因子    long        ffactor;        /* fill factor */    //哈希键大小(字节为单位)    Size        keysize;        /* hash key length in bytes */    //参见上述数据结构注释    Size        entrysize;      /* total user element size in bytes */    //    HashValueFunc hash;         /* hash function */    HashCompareFunc match;      /* key comparison function */    HashCopyFunc keycopy;       /* key copying function */    HashAllocFunc alloc;        /* memory allocator */    MemoryContext hcxt;         /* memory context to use for allocations */    //共享内存中的哈希头部结构地址    HASHHDR    *hctl;           /* location of header in shared mem */} HASHCTL;/* A hash bucket is a linked list of HASHELEMENTs *///哈希桶是HASHELEMENTs链表typedef HASHELEMENT *HASHBUCKET;/* A hash segment is an array of bucket headers *///hash segment是桶数组typedef HASHBUCKET *HASHSEGMENT;/* * Hash functions must have this signature. * Hash函数必须有它自己的标识 */typedef uint32 (*HashValueFunc) (const void *key, Size keysize); /* * Key comparison functions must have this signature.  Comparison functions * return zero for match, nonzero for no match.  (The comparison function * definition is designed to allow memcmp() and strncmp() to be used directly * as key comparison functions.) * 哈希键对比函数必须有自己的标识. * 如匹配则对比函数返回0,不匹配返回非0. * (对比函数定义被设计为允许在对比键值时可直接使用memcmp()和strncmp()) */typedef int (*HashCompareFunc) (const void *key1, const void *key2, Size keysize); /* * Key copying functions must have this signature.  The return value is not * used.  (The definition is set up to allow memcpy() and strlcpy() to be * used directly.) * 键拷贝函数必须有自己的标识. * 返回值无用. */typedef void *(*HashCopyFunc) (void *dest, const void *src, Size keysize);/* * Space allocation function for a hashtable --- designed to match malloc(). * Note: there is no free function API; can't destroy a hashtable unless you * use the default allocator. * 哈希表的恐惧分配函数 -- 被设计为与malloc()函数匹配. * 注意:这里没有释放函数API;不能销毁哈希表,除非使用默认的分配器. */typedef void *(*HashAllocFunc) (Size request);

其结构如下图所示:

扩展后的结构如下图所示:

二、跟踪分析

N/A

三、参考资料

N/A

0