
How to Use PostgreSQL's tuplesort_performsort Function

Published: 2025-01-22 | Last updated: 2025-01-22 | Author: Qianjia Information Network (千家信息网) editor

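This article explains how PostgreSQL's tuplesort_performsort function works. It first reviews the data structures involved, then walks through the function's source code, and finally traces the sort for a simple ORDER BY query in gdb.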

I. Data Structures

TupleTableSlot
The executor stores tuples in a "tuple table", which is a List of independent TupleTableSlots.

```c
/*----------
 * The executor stores tuples in a "tuple table" which is a List of
 * independent TupleTableSlots.  There are several cases we need to handle:
 *      1. physical tuple in a disk buffer page
 *      2. physical tuple constructed in palloc'ed memory
 *      3. "minimal" physical tuple constructed in palloc'ed memory
 *      4. "virtual" tuple consisting of Datum/isnull arrays
 *
 * The first two cases are similar in that they both deal with "materialized"
 * tuples, but resource management is different.  For a tuple in a disk page
 * we need to hold a pin on the buffer until the TupleTableSlot's reference
 * to the tuple is dropped; while for a palloc'd tuple we usually want the
 * tuple pfree'd when the TupleTableSlot's reference is dropped.
 *
 * A "minimal" tuple is handled similarly to a palloc'd regular tuple.
 * At present, minimal tuples never are stored in buffers, so there is no
 * parallel to case 1.  Note that a minimal tuple has no "system columns".
 * (Actually, it could have an OID, but we have no need to access the OID.)
 *
 * A "virtual" tuple is an optimization used to minimize physical data
 * copying in a nest of plan nodes.  Any pass-by-reference Datums in the
 * tuple point to storage that is not directly associated with the
 * TupleTableSlot; generally they will point to part of a tuple stored in
 * a lower plan node's output TupleTableSlot, or to a function result
 * constructed in a plan node's per-tuple econtext.  It is the responsibility
 * of the generating plan node to be sure these resources are not released
 * for as long as the virtual tuple needs to be valid.  We only use virtual
 * tuples in the result slots of plan nodes --- tuples to be copied anywhere
 * else need to be "materialized" into physical tuples.  Note also that a
 * virtual tuple does not have any "system columns".
 *
 * It is also possible for a TupleTableSlot to hold both physical and minimal
 * copies of a tuple.  This is done when the slot is requested to provide
 * the format other than the one it currently holds.  (Originally we attempted
 * to handle such requests by replacing one format with the other, but that
 * had the fatal defect of invalidating any pass-by-reference Datums pointing
 * into the existing slot contents.)  Both copies must contain identical data
 * payloads when this is the case.
 *
 * The Datum/isnull arrays of a TupleTableSlot serve double duty.  When the
 * slot contains a virtual tuple, they are the authoritative data.  When the
 * slot contains a physical tuple, the arrays contain data extracted from
 * the tuple.  (In this state, any pass-by-reference Datums point into
 * the physical tuple.)  The extracted information is built "lazily",
 * ie, only as needed.  This serves to avoid repeated extraction of data
 * from the physical tuple.
 *
 * A TupleTableSlot can also be "empty", holding no valid data.  This is
 * the only valid state for a freshly-created slot that has not yet had a
 * tuple descriptor assigned to it.  In this state, tts_isempty must be
 * true, tts_shouldFree false, tts_tuple NULL, tts_buffer InvalidBuffer,
 * and tts_nvalid zero.
 *
 * The tupleDescriptor is simply referenced, not copied, by the TupleTableSlot
 * code.  The caller of ExecSetSlotDescriptor() is responsible for providing
 * a descriptor that will live as long as the slot does.  (Typically, both
 * slots and descriptors are in per-query memory and are freed by memory
 * context deallocation at query end; so it's not worth providing any extra
 * mechanism to do more.  However, the slot will increment the tupdesc
 * reference count if a reference-counted tupdesc is supplied.)
 *
 * When tts_shouldFree is true, the physical tuple is "owned" by the slot
 * and should be freed when the slot's reference to the tuple is dropped.
 *
 * If tts_buffer is not InvalidBuffer, then the slot is holding a pin
 * on the indicated buffer page; drop the pin when we release the
 * slot's reference to that buffer.  (tts_shouldFree should always be
 * false in such a case, since presumably tts_tuple is pointing at the
 * buffer page.)
 *
 * tts_nvalid indicates the number of valid columns in the tts_values/isnull
 * arrays.  When the slot is holding a "virtual" tuple this must be equal
 * to the descriptor's natts.  When the slot is holding a physical tuple
 * this is equal to the number of columns we have extracted (we always
 * extract columns from left to right, so there are no holes).
 *
 * tts_values/tts_isnull are allocated when a descriptor is assigned to the
 * slot; they are of length equal to the descriptor's natts.
 *
 * tts_mintuple must always be NULL if the slot does not hold a "minimal"
 * tuple.  When it does, tts_mintuple points to the actual MinimalTupleData
 * object (the thing to be pfree'd if tts_shouldFreeMin is true).  If the slot
 * has only a minimal and not also a regular physical tuple, then tts_tuple
 * points at tts_minhdr and the fields of that struct are set correctly
 * for access to the minimal tuple; in particular, tts_minhdr.t_data points
 * MINIMAL_TUPLE_OFFSET bytes before tts_mintuple.  This allows column
 * extraction to treat the case identically to regular physical tuples.
 *
 * tts_slow/tts_off are saved state for slot_deform_tuple, and should not
 * be touched by any other code.
 *----------
 */

/* TupleTableSlot in PostgreSQL 11 and earlier (the layout seen in the gdb session below) */
typedef struct TupleTableSlot
{
    NodeTag     type;
    bool        tts_isempty;    /* true = slot is empty */
    bool        tts_shouldFree; /* should pfree tts_tuple? */
    bool        tts_shouldFreeMin;  /* should pfree tts_mintuple? */
#define FIELDNO_TUPLETABLESLOT_SLOW 4
    bool        tts_slow;       /* saved state for slot_deform_tuple */
#define FIELDNO_TUPLETABLESLOT_TUPLE 5
    HeapTuple   tts_tuple;      /* physical tuple, or NULL if virtual */
#define FIELDNO_TUPLETABLESLOT_TUPLEDESCRIPTOR 6
    TupleDesc   tts_tupleDescriptor;    /* slot's tuple descriptor */
    MemoryContext tts_mcxt;     /* slot itself is in this context */
    Buffer      tts_buffer;     /* tuple's buffer, or InvalidBuffer */
#define FIELDNO_TUPLETABLESLOT_NVALID 9
    int         tts_nvalid;     /* # of valid values in tts_values */
#define FIELDNO_TUPLETABLESLOT_VALUES 10
    Datum      *tts_values;     /* current per-attribute values */
#define FIELDNO_TUPLETABLESLOT_ISNULL 11
    bool       *tts_isnull;     /* current per-attribute isnull flags */
    MinimalTuple tts_mintuple;  /* minimal tuple, or NULL if none */
    HeapTupleData tts_minhdr;   /* workspace for minimal-tuple-only case */
#define FIELDNO_TUPLETABLESLOT_OFF 14
    uint32      tts_off;        /* saved state for slot_deform_tuple */
    bool        tts_fixedTupleDescriptor;   /* descriptor can't be changed */
} TupleTableSlot;

/* base tuple table slot type (PostgreSQL 12 and later) */
typedef struct TupleTableSlot
{
    NodeTag     type;
#define FIELDNO_TUPLETABLESLOT_FLAGS 1
    uint16      tts_flags;      /* Boolean states */
#define FIELDNO_TUPLETABLESLOT_NVALID 2
    AttrNumber  tts_nvalid;     /* # of valid values in tts_values */
    const TupleTableSlotOps *const tts_ops; /* implementation of slot */
#define FIELDNO_TUPLETABLESLOT_TUPLEDESCRIPTOR 4
    TupleDesc   tts_tupleDescriptor;    /* slot's tuple descriptor */
#define FIELDNO_TUPLETABLESLOT_VALUES 5
    Datum      *tts_values;     /* current per-attribute values */
#define FIELDNO_TUPLETABLESLOT_ISNULL 6
    bool       *tts_isnull;     /* current per-attribute isnull flags */
    MemoryContext tts_mcxt;     /* slot itself is in this context */
} TupleTableSlot;

/* routines for a TupleTableSlot implementation */
struct TupleTableSlotOps
{
    /* Minimum size of the slot */
    size_t      base_slot_size;

    /* Initialization. */
    void        (*init) (TupleTableSlot *slot);

    /* Destruction. */
    void        (*release) (TupleTableSlot *slot);

    /*
     * Clear the contents of the slot. Only the contents are expected to be
     * cleared and not the tuple descriptor. Typically an implementation of
     * this callback should free the memory allocated for the tuple contained
     * in the slot.
     */
    void        (*clear) (TupleTableSlot *slot);

    /*
     * Fill up first natts entries of tts_values and tts_isnull arrays with
     * values from the tuple contained in the slot. The function may be called
     * with natts more than the number of attributes available in the tuple,
     * in which case it should set tts_nvalid to the number of returned
     * columns.
     */
    void        (*getsomeattrs) (TupleTableSlot *slot, int natts);

    /*
     * Returns value of the given system attribute as a datum and sets isnull
     * to false, if it's not NULL. Throws an error if the slot type does not
     * support system attributes.
     */
    Datum       (*getsysattr) (TupleTableSlot *slot, int attnum, bool *isnull);

    /*
     * Make the contents of the slot solely depend on the slot, and not on
     * underlying resources (like another memory context, buffers, etc).
     */
    void        (*materialize) (TupleTableSlot *slot);

    /*
     * Copy the contents of the source slot into the destination slot's own
     * context. Invoked using callback of the destination slot.
     */
    void        (*copyslot) (TupleTableSlot *dstslot, TupleTableSlot *srcslot);

    /*
     * Return a heap tuple "owned" by the slot. It is slot's responsibility to
     * free the memory consumed by the heap tuple. If the slot can not "own" a
     * heap tuple, it should not implement this callback and should set it as
     * NULL.
     */
    HeapTuple   (*get_heap_tuple) (TupleTableSlot *slot);

    /*
     * Return a minimal tuple "owned" by the slot. It is slot's responsibility
     * to free the memory consumed by the minimal tuple. If the slot can not
     * "own" a minimal tuple, it should not implement this callback and should
     * set it as NULL.
     */
    MinimalTuple (*get_minimal_tuple) (TupleTableSlot *slot);

    /*
     * Return a copy of heap tuple representing the contents of the slot. The
     * copy needs to be palloc'd in the current memory context. The slot
     * itself is expected to remain unaffected. It is *not* expected to have
     * meaningful "system columns" in the copy. The copy is not be "owned" by
     * the slot i.e. the caller has to take responsibilty to free memory
     * consumed by the slot.
     */
    HeapTuple   (*copy_heap_tuple) (TupleTableSlot *slot);

    /*
     * Return a copy of minimal tuple representing the contents of the slot. The
     * copy needs to be palloc'd in the current memory context. The slot
     * itself is expected to remain unaffected. It is *not* expected to have
     * meaningful "system columns" in the copy. The copy is not be "owned" by
     * the slot i.e. the caller has to take responsibilty to free memory
     * consumed by the slot.
     */
    MinimalTuple (*copy_minimal_tuple) (TupleTableSlot *slot);
};

/*
 * Tuple descriptor (PostgreSQL 12 layout; the PostgreSQL 11 build traced
 * below also has a tdhasoid field, visible in the gdb output).
 */
typedef struct tupleDesc
{
    int         natts;          /* number of attributes in the tuple */
    Oid         tdtypeid;       /* composite type ID for tuple type */
    int32       tdtypmod;       /* typmod for tuple type */
    int         tdrefcount;     /* reference count, or -1 if not counting */
    TupleConstr *constr;        /* constraints, or NULL if none */
    /* attrs[N] is the description of Attribute Number N+1 */
    FormData_pg_attribute attrs[FLEXIBLE_ARRAY_MEMBER];
}          *TupleDesc;
```
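The following minimal sketch makes two ideas from the comments above concrete. First, a slot's behavior is selected through a table of function pointers, which is what TupleTableSlotOps provides. Second, the Datum/isnull arrays are filled "lazily": columns are extracted left to right and only as far as a caller asks, and tts_nvalid (here `nvalid`) records how far extraction has gone. Every name in it (ToySlot, ToySlotOps, toy_getsomeattrs, toy_getattr) is hypothetical, and the "physical tuple" is just an int array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ATTS 8

typedef uintptr_t Datum;            /* stand-in for PostgreSQL's Datum */
typedef struct ToySlot ToySlot;

/* Hypothetical ops table in the spirit of TupleTableSlotOps. */
typedef struct ToySlotOps
{
    void        (*clear) (ToySlot *slot);
    void        (*getsomeattrs) (ToySlot *slot, int natts);
} ToySlotOps;

struct ToySlot
{
    const ToySlotOps *ops;          /* like tts_ops: the implementation */
    int         natts;              /* descriptor's attribute count */
    int         nvalid;             /* like tts_nvalid */
    Datum       values[TOY_MAX_ATTS];   /* like tts_values */
    bool        isnull[TOY_MAX_ATTS];   /* like tts_isnull */
    const int  *physical;           /* hypothetical "physical tuple" */
    int         extracted;          /* how many column extractions happened */
};

/* Extract columns left to right up to natts, skipping those already valid. */
static void
toy_getsomeattrs(ToySlot *slot, int natts)
{
    assert(slot->natts <= TOY_MAX_ATTS && slot->physical != NULL);
    if (natts > slot->natts)
        natts = slot->natts;        /* cannot return more columns than exist */
    for (int i = slot->nvalid; i < natts; i++)
    {
        slot->values[i] = (Datum) slot->physical[i];
        slot->isnull[i] = false;
        slot->extracted++;
    }
    if (natts > slot->nvalid)
        slot->nvalid = natts;
}

/* Drop the tuple; the Datum/isnull arrays are no longer valid. */
static void
toy_clear(ToySlot *slot)
{
    slot->physical = NULL;
    slot->nvalid = 0;
}

static const ToySlotOps ToyHeapLikeOps = {
    .clear = toy_clear,
    .getsomeattrs = toy_getsomeattrs,
};

/* Return attribute attnum (1-based), deforming lazily only if needed. */
static Datum
toy_getattr(ToySlot *slot, int attnum, bool *isnull)
{
    assert(attnum >= 1 && attnum <= slot->natts);
    if (attnum > slot->nvalid)
        slot->ops->getsomeattrs(slot, attnum);
    *isnull = slot->isnull[attnum - 1];
    return slot->values[attnum - 1];
}
```

Here is how it behaves. Asking a fresh slot for column 2 extracts columns 1 and 2. A later request for column 1 costs nothing. `clear()` only resets `nvalid` to 0, which invalidates the arrays without freeing them.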

SortState
Run-time state information for a Sort node.

```c
/* ----------------
 *   SortState information
 * ----------------
 */
typedef struct SortState
{
    ScanState   ss;             /* its first field is NodeTag */
    bool        randomAccess;   /* need random access to sort output? */
    bool        bounded;        /* is the result set bounded? */
    int64       bound;          /* if bounded, how many tuples are needed */
    bool        sort_Done;      /* sort completed yet? */
    bool        bounded_Done;   /* value of bounded we did the sort with */
    int64       bound_Done;     /* value of bound we did the sort with */
    void       *tuplesortstate; /* private state of tuplesort.c */
    bool        am_worker;      /* are we a worker? */
    SharedSortInfo *shared_info;    /* one entry per worker */
} SortState;

/* ----------------
 *   Shared memory container for per-worker sort information
 * ----------------
 */
typedef struct SharedSortInfo
{
    int         num_workers;    /* number of parallel workers */
    TuplesortInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];   /* one per worker */
} SharedSortInfo;
```
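A LIMIT makes the output bounded (SortState.bounded / bound). In that case tuplesort can keep only the best `bound` tuples in a heap. In tuplesort_performsort's TSS_BOUNDED branch, sort_bounded_heap() then turns that heap into a sorted array. The sketch below shows the same idea on plain ints. It is a hypothetical illustration, not PostgreSQL's code. The real functions work on SortTuple entries with the sort comparator, while BoundedHeap, bounded_put and bounded_finish are invented names.

```c
#include <assert.h>
#include <stddef.h>

/* Toy bounded sort over ints: keep only the `bound` smallest values. */
typedef struct
{
    int        *heap;       /* max-heap: heap[0] is the largest value kept */
    size_t      count;
    size_t      bound;
} BoundedHeap;

static void
swap_int(int *a, int *b)
{
    int         t = *a;

    *a = *b;
    *b = t;
}

static void
sift_down(int *h, size_t n, size_t i)
{
    for (;;)
    {
        size_t      l = 2 * i + 1, r = l + 1, largest = i;

        if (l < n && h[l] > h[largest])
            largest = l;
        if (r < n && h[r] > h[largest])
            largest = r;
        if (largest == i)
            return;
        swap_int(&h[i], &h[largest]);
        i = largest;
    }
}

static void
sift_up(int *h, size_t i)
{
    while (i > 0)
    {
        size_t      parent = (i - 1) / 2;

        if (h[parent] >= h[i])
            return;
        swap_int(&h[parent], &h[i]);
        i = parent;
    }
}

/* Offer one value; once full, only values below the root are kept. */
static void
bounded_put(BoundedHeap *bh, int v)
{
    assert(bh->bound > 0);
    if (bh->count < bh->bound)
    {
        bh->heap[bh->count] = v;
        sift_up(bh->heap, bh->count++);
    }
    else if (v < bh->heap[0])
    {
        bh->heap[0] = v;            /* replace the current worst kept value */
        sift_down(bh->heap, bh->count, 0);
    }
}

/* Turn the heap into an ascending array in place (heapsort's 2nd phase). */
static void
bounded_finish(BoundedHeap *bh)
{
    for (size_t n = bh->count; n > 1; n--)
    {
        swap_int(&bh->heap[0], &bh->heap[n - 1]);
        sift_down(bh->heap, n - 1, 0);
    }
}
```

For example, feeding 7, 2, 9, 4, 1, 8, 3 with bound = 3 leaves {1, 2, 3}. Once the heap is full, each new value costs one comparison with the root, and the heap never holds more than `bound` values.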

TuplesortInstrumentation
Data structures for reporting sort statistics.

```c
/*
 * Data structures for reporting sort statistics.  Note that
 * TuplesortInstrumentation can't contain any pointers because we
 * sometimes put it in shared memory.
 */
typedef enum
{
    SORT_TYPE_STILL_IN_PROGRESS = 0,    /* sort still in progress */
    SORT_TYPE_TOP_N_HEAPSORT,           /* top-N (bounded) heapsort */
    SORT_TYPE_QUICKSORT,                /* in-memory quicksort */
    SORT_TYPE_EXTERNAL_SORT,            /* external (on-disk) sort */
    SORT_TYPE_EXTERNAL_MERGE            /* external sort, final merge done on the fly */
} TuplesortMethod;

typedef enum
{
    SORT_SPACE_TYPE_DISK,               /* spaceUsed is disk space */
    SORT_SPACE_TYPE_MEMORY              /* spaceUsed is memory */
} TuplesortSpaceType;

typedef struct TuplesortInstrumentation
{
    TuplesortMethod sortMethod; /* sort algorithm used */
    TuplesortSpaceType spaceType;   /* type of space spaceUsed represents */
    long        spaceUsed;      /* space consumption, in kB */
} TuplesortInstrumentation;
```
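TuplesortInstrumentation contains no pointers, so an array of them can sit in shared memory after a small header. The flexible array member in SharedSortInfo is what allows this. The sketch below sizes such a block with offsetof() and formats one entry in the style of the "Sort Method" line printed by EXPLAIN ANALYZE. It re-declares the types locally and uses malloc() in place of shared memory. The helper names (method_name, space_type_name, shared_sort_info_size, format_sort_line) are hypothetical, though the strings are meant to match what EXPLAIN prints for each method.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum
{
    SORT_TYPE_STILL_IN_PROGRESS = 0,
    SORT_TYPE_TOP_N_HEAPSORT,
    SORT_TYPE_QUICKSORT,
    SORT_TYPE_EXTERNAL_SORT,
    SORT_TYPE_EXTERNAL_MERGE
} TuplesortMethod;

typedef enum
{
    SORT_SPACE_TYPE_DISK,
    SORT_SPACE_TYPE_MEMORY
} TuplesortSpaceType;

typedef struct TuplesortInstrumentation
{
    TuplesortMethod sortMethod;
    TuplesortSpaceType spaceType;
    long        spaceUsed;      /* kB */
} TuplesortInstrumentation;

typedef struct SharedSortInfo
{
    int         num_workers;
    TuplesortInstrumentation sinstrument[];    /* C99 flexible array member */
} SharedSortInfo;

static const char *
method_name(TuplesortMethod m)
{
    switch (m)
    {
        case SORT_TYPE_STILL_IN_PROGRESS: return "still in progress";
        case SORT_TYPE_TOP_N_HEAPSORT:    return "top-N heapsort";
        case SORT_TYPE_QUICKSORT:         return "quicksort";
        case SORT_TYPE_EXTERNAL_SORT:     return "external sort";
        case SORT_TYPE_EXTERNAL_MERGE:    return "external merge";
    }
    return "unknown";
}

static const char *
space_type_name(TuplesortSpaceType t)
{
    return t == SORT_SPACE_TYPE_DISK ? "Disk" : "Memory";
}

/* Header plus n pointer-free entries: safe to place in shared memory. */
static size_t
shared_sort_info_size(int nworkers)
{
    assert(nworkers >= 0);
    return offsetof(SharedSortInfo, sinstrument)
        + (size_t) nworkers * sizeof(TuplesortInstrumentation);
}

/* Format one entry roughly the way EXPLAIN ANALYZE's text output does. */
static int
format_sort_line(char *buf, size_t len, const TuplesortInstrumentation *si)
{
    return snprintf(buf, len, "Sort Method: %s  %s: %ldkB",
                    method_name(si->sortMethod),
                    space_type_name(si->spaceType), si->spaceUsed);
}
```

Because the block is sized as "header + n entries", a single allocation holds the statistics of every worker, and each worker only writes its own `sinstrument[i]`.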

II. Source Code Walkthrough

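tuplesort_performsort is called once all input tuples have been passed in (via tuplesort_puttupleslot or the other tuplesort_put* functions). Depending on the current state, it performs the actual sort or finishes one that is already under way.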
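The switch statement in the source below can be summarized as a table of status transitions, and the following sketch encodes that table. The enum values follow tuplesort.c's TupSortStatus names. ToyRole, after_mergeruns and the `merge_on_the_fly` flag are hypothetical stand-ins for decisions the real code makes inside mergeruns().

```c
#include <assert.h>
#include <stdbool.h>

/* TupSortStatus values, with the comments they carry in tuplesort.c */
typedef enum
{
    TSS_INITIAL,                /* Loading tuples; still within memory limit */
    TSS_BOUNDED,                /* Loading tuples into bounded-size heap */
    TSS_BUILDRUNS,              /* Loading tuples; writing to tape */
    TSS_SORTEDINMEM,            /* Sort completed entirely in memory */
    TSS_SORTEDONTAPE,           /* Sort completed, final run is on tape */
    TSS_FINALMERGE              /* Performing final merge on-the-fly */
} TupSortStatus;

/* Hypothetical: which of SERIAL(), WORKER() or LEADER() is true */
typedef enum { ROLE_SERIAL, ROLE_WORKER, ROLE_LEADER } ToyRole;

/*
 * Hypothetical stand-in for the status mergeruns() leaves behind.  Per the
 * source comment, runs are left one per tape (final merge on the fly) only
 * when !randomAccess and !WORKER(); merge_on_the_fly stands for the rest
 * of that decision.
 */
static TupSortStatus
after_mergeruns(ToyRole role, bool random_access, bool merge_on_the_fly)
{
    if (merge_on_the_fly && !random_access && role != ROLE_WORKER)
        return TSS_FINALMERGE;
    return TSS_SORTEDONTAPE;
}

/* Status after tuplesort_performsort(), given the status on entry. */
static TupSortStatus
status_after_performsort(TupSortStatus s, ToyRole role,
                         bool random_access, bool merge_on_the_fly)
{
    switch (s)
    {
        case TSS_INITIAL:
            if (role == ROLE_SERIAL)
                return TSS_SORTEDINMEM;     /* qsort the memtuples array */
            if (role == ROLE_WORKER)
                return TSS_SORTEDONTAPE;    /* one run dumped, no merge */
            /* leader: take over worker tapes, then mergeruns() */
            return after_mergeruns(role, random_access, merge_on_the_fly);
        case TSS_BOUNDED:
            return TSS_SORTEDINMEM;         /* sort_bounded_heap() */
        case TSS_BUILDRUNS:
            /* dumptuples(), then mergeruns() */
            return after_mergeruns(role, random_access, merge_on_the_fly);
        default:
            assert(!"invalid tuplesort state");  /* elog(ERROR) in tuplesort.c */
            return s;
    }
}
```

For the serial query traced in section III, the expected path is presumably TSS_INITIAL to TSS_SORTEDINMEM, provided all input fits in workMem.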

```c
/*
 * All tuples have been provided; finish the sort.
 */
void
tuplesort_performsort(Tuplesortstate *state)
{
    MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);

#ifdef TRACE_SORT
    if (trace_sort)
        elog(LOG, "performsort of worker %d starting: %s",
             state->worker, pg_rusage_show(&state->ru_start));
#endif

    /* dispatch on the current state of the sort */
    switch (state->status)
    {
        case TSS_INITIAL:

            /*
             * We were able to accumulate all the tuples within the allowed
             * amount of memory, or leader to take over worker tapes
             */
            if (SERIAL(state))
            {
                /* Just qsort 'em and we're done */
                tuplesort_sort_memtuples(state);
                state->status = TSS_SORTEDINMEM;
            }
            else if (WORKER(state))
            {
                /*
                 * Parallel workers must still dump out tuples to tape.  No
                 * merge is required to produce single output run, though.
                 */
                inittapes(state, false);
                dumptuples(state, true);
                worker_nomergeruns(state);
                state->status = TSS_SORTEDONTAPE;
            }
            else
            {
                /*
                 * Leader will take over worker tapes and merge worker runs.
                 * Note that mergeruns sets the correct state->status.
                 */
                leader_takeover_tapes(state);
                mergeruns(state);
            }
            state->current = 0;
            state->eof_reached = false;
            state->markpos_block = 0L;
            state->markpos_offset = 0;
            state->markpos_eof = false;
            break;

        case TSS_BOUNDED:       /* bounded (top-N) heap sort */

            /*
             * We were able to accumulate all the tuples required for output
             * in memory, using a heap to eliminate excess tuples.  Now we
             * have to transform the heap to a properly-sorted array.
             */
            sort_bounded_heap(state);
            state->current = 0;
            state->eof_reached = false;
            state->markpos_offset = 0;
            state->markpos_eof = false;
            state->status = TSS_SORTEDINMEM;
            break;

        case TSS_BUILDRUNS:

            /*
             * Finish tape-based sort.  First, flush all tuples remaining in
             * memory out to tape; then merge until we have a single remaining
             * run (or, if !randomAccess and !WORKER(), one run per tape).
             * Note that mergeruns sets the correct state->status.
             */
            dumptuples(state, true);    /* flush remaining in-memory tuples to tape */
            mergeruns(state);           /* merge the sorted runs */
            state->eof_reached = false;
            state->markpos_block = 0L;
            state->markpos_offset = 0;
            state->markpos_eof = false;
            break;

        default:
            elog(ERROR, "invalid tuplesort state");
            break;
    }

#ifdef TRACE_SORT
    if (trace_sort)
    {
        if (state->status == TSS_FINALMERGE)
            elog(LOG, "performsort of worker %d done (except %d-way final merge): %s",
                 state->worker, state->activeTapes,
                 pg_rusage_show(&state->ru_start));
        else
            elog(LOG, "performsort of worker %d done: %s",
                 state->worker, pg_rusage_show(&state->ru_start));
    }
#endif

    MemoryContextSwitchTo(oldcontext);
}
```
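In the TSS_BUILDRUNS branch, dumptuples() writes the last in-memory batch out as a sorted run, and mergeruns() then merges the runs. The core of that step is a k-way merge, sketched below on in-memory int arrays. This is a simplified illustration. PostgreSQL reads runs from logical tapes and keeps the current run heads in a heap, whereas this hypothetical merge_runs() scans the heads linearly to stay short.

```c
#include <assert.h>
#include <stddef.h>

/* A hypothetical in-memory "run": a sorted int array plus a read cursor. */
typedef struct
{
    const int  *vals;
    size_t      len;
    size_t      pos;
} Run;

/*
 * Merge k sorted runs into out[], which must hold their total length.
 * Each step emits the smallest current head; on ties the earlier run wins.
 */
static size_t
merge_runs(Run *runs, size_t k, int *out)
{
    size_t      n = 0;

    for (;;)
    {
        Run        *best = NULL;

        for (size_t i = 0; i < k; i++)
        {
            Run        *r = &runs[i];

            if (r->pos < r->len &&
                (best == NULL || r->vals[r->pos] < best->vals[best->pos]))
                best = r;
        }
        if (best == NULL)
            return n;               /* every run is exhausted */
        out[n++] = best->vals[best->pos++];
    }
}
```

With a heap, as in the real code, each output tuple costs O(log k) comparisons instead of O(k). That matters when many runs are merged at once.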

III. Tracing and Analysis

Test script

```sql
select * from t_sort order by c1,c2;
```

Trace

```
(gdb) b tuplesort_begin_heap
Breakpoint 1 at 0xa6ffa1: file tuplesort.c, line 812.
(gdb) b tuplesort_puttupleslot
Breakpoint 2 at 0xa7119d: file tuplesort.c, line 1436.
(gdb) b tuplesort_performsort
Breakpoint 3 at 0xa71f45: file tuplesort.c, line 1792.
(gdb) c
Continuing.

Breakpoint 1, tuplesort_begin_heap (tupDesc=0x208fa40, nkeys=2, attNums=0x2081858, sortOperators=0x2081878,
    sortCollations=0x2081898, nullsFirstFlags=0x20818b8, workMem=4096, coordinate=0x0, randomAccess=false)
    at tuplesort.c:812
812        Tuplesortstate *state = tuplesort_begin_common(workMem, coordinate,
(gdb)
```

tuplesort_begin_heap
Input parameters

```
(gdb) p *tupDesc
$1 = {natts = 7, tdtypeid = 2249, tdtypmod = -1, tdhasoid = false, tdrefcount = -1, constr = 0x0, attrs = 0x208fa60}
(gdb) p *tupDesc->attrs
$2 = {attrelid = 0, attname = {data = '\000' }, atttypid = 1043, attstattarget = -1, attlen = -1,
  attnum = 1, attndims = 0, attcacheoff = -1, atttypmod = 24, attbyval = false, attstorage = 120 'x', attalign = 105 'i',
  attnotnull = false, atthasdef = false, atthasmissing = false, attidentity = 0 '\000', attisdropped = false,
  attislocal = true, attinhcount = 0, attcollation = 100}
(gdb) p *attNums
$3 = 2
(gdb) p *sortOperators
$4 = 97
(gdb) p *sortCollations
$5 = 0
(gdb) p nullsFirstFlags
$6 = (_Bool *) 0x20818b8
(gdb) p *nullsFirstFlags
$7 = false
(gdb)
```

The sort state returned by tuplesort_begin_common; its status is TSS_INITIAL.

```
(gdb) p *state
$8 = {status = TSS_INITIAL, nKeys = 0, randomAccess = false, bounded = false, boundUsed = false, bound = 0, tuples = true,
  availMem = 4169704, allowedMem = 4194304, maxTapes = 0, tapeRange = 0, sortcontext = 0x2093290, tuplecontext = 0x20992c0,
  tapeset = 0x0, comparetup = 0x0, copytup = 0x0, writetup = 0x0, readtup = 0x0, memtuples = 0x209b310, memtupcount = 0,
  memtupsize = 1024, growmemtuples = true, slabAllocatorUsed = false, slabMemoryBegin = 0x0, slabMemoryEnd = 0x0,
  slabFreeHead = 0x0, read_buffer_size = 0, lastReturnedTuple = 0x0, currentRun = 0, mergeactive = 0x0, Level = 0,
  destTape = 0, tp_fib = 0x0, tp_runs = 0x0, tp_dummy = 0x0, tp_tapenum = 0x0, activeTapes = 0, result_tape = -1,
  current = 0, eof_reached = false, markpos_block = 0, markpos_offset = 0, markpos_eof = false, worker = -1, shared = 0x0,
  nParticipants = -1, tupDesc = 0x0, sortKeys = 0x0, onlyKey = 0x0, abbrevNext = 0, indexInfo = 0x0, estate = 0x0,
  heapRel = 0x0, indexRel = 0x0, enforceUnique = false, high_mask = 0, low_mask = 0, max_buckets = 0, datumType = 0,
  datumTypeLen = 0, ru_start = {tv = {tv_sec = 0, tv_usec = 0}, ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {
        tv_sec = 0, tv_usec = 0}, {ru_maxrss = 0, __ru_maxrss_word = 0}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {
        ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 0, __ru_minflt_word = 0}, {
        ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, __ru_inblock_word = 0},
      {ru_oublock = 0, __ru_oublock_word = 0}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0,
        __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 0, __ru_nvcsw_word = 0}, {
        ru_nivcsw = 0, __ru_nivcsw_word = 0}}}}
```
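The memory numbers in this dump can be checked by hand. workMem = 4096 (kB) gives allowedMem = 4096 × 1024 = 4194304 bytes, and availMem is 24600 bytes lower. That gap is consistent with charging the initial memtuples array. memtupsize = 1024 slots of SortTuple, which is 24 bytes on a typical 64-bit build, comes to 24576 bytes. The remaining 24 bytes we assume to be the allocator's chunk header, whose size depends on the build. The sketch below reproduces the arithmetic. Its constants and function names are assumptions for illustration, and the 64 kB floor and the Max(1024, ...) sizing rule are modeled on tuplesort_begin_common() in PostgreSQL 11.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for a typical 64-bit build (see the lead-in). */
#define SORT_TUPLE_SIZE        24    /* sizeof(SortTuple): pointer, Datum, bool, int */
#define SEPARATE_THRESHOLD   8192    /* assumed ALLOCSET_SEPARATE_THRESHOLD */

/* allowedMem: workMem (kB) with a 64 kB floor, converted to bytes */
static int64_t
allowed_mem(int work_mem_kb)
{
    int64_t     kb = work_mem_kb < 64 ? 64 : work_mem_kb;

    return kb * 1024;
}

/* Initial memtuples length: Max(1024, threshold / sizeof(SortTuple) + 1) */
static int
initial_memtupsize(void)
{
    int         n = SEPARATE_THRESHOLD / SORT_TUPLE_SIZE + 1;

    return n > 1024 ? n : 1024;
}

/* availMem after charging the memtuples array plus an allocator overhead */
static int64_t
avail_after_memtuples(int work_mem_kb, int chunk_overhead)
{
    int64_t     charged = (int64_t) initial_memtupsize() * SORT_TUPLE_SIZE
        + chunk_overhead;

    assert(chunk_overhead >= 0);
    return allowed_mem(work_mem_kb) - charged;
}
```

With a 24-byte overhead, `avail_after_memtuples(4096, 24)` gives 4169704, the value gdb printed for availMem.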

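Set up the rest of the run-time state: the heap-tuple callbacks (comparetup_heap, copytup_heap, writetup_heap, readtup_heap), the tuple descriptor, and the sortKeys array.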

```
(gdb) n
819        AssertArg(nkeys > 0);
(gdb)
822        if (trace_sort)
(gdb)
828        state->nKeys = nkeys;
(gdb)
830        TRACE_POSTGRESQL_SORT_START(HEAP_SORT,
(gdb)
837        state->comparetup = comparetup_heap;
(gdb)
838        state->copytup = copytup_heap;
(gdb)
839        state->writetup = writetup_heap;
(gdb)
840        state->readtup = readtup_heap;
(gdb)
842        state->tupDesc = tupDesc;    /* assume we need not copy tupDesc */
(gdb)
843        state->abbrevNext = 10;
(gdb)
846        state->sortKeys = (SortSupport) palloc0(nkeys * sizeof(SortSupportData));
(gdb)
848        for (i = 0; i < nkeys; i++)
(gdb) p *state
$9 = {status = TSS_INITIAL, nKeys = 2, randomAccess = false, bounded = false, boundUsed = false, bound = 0, tuples = true,
  availMem = 4169704, allowedMem = 4194304, maxTapes = 0, tapeRange = 0, sortcontext = 0x2093290, tuplecontext = 0x20992c0,
  tapeset = 0x0, comparetup = 0xa7525b , copytup = 0xa76247 ,
  writetup = 0xa76de1 , readtup = 0xa76ec6 , memtuples = 0x209b310, memtupcount = 0,
  memtupsize = 1024, growmemtuples = true, slabAllocatorUsed = false, slabMemoryBegin = 0x0, slabMemoryEnd = 0x0,
  slabFreeHead = 0x0, read_buffer_size = 0, lastReturnedTuple = 0x0, currentRun = 0, mergeactive = 0x0, Level = 0,
  destTape = 0, tp_fib = 0x0, tp_runs = 0x0, tp_dummy = 0x0, tp_tapenum = 0x0, activeTapes = 0, result_tape = -1,
  current = 0, eof_reached = false, markpos_block = 0, markpos_offset = 0, markpos_eof = false, worker = -1, shared = 0x0,
  nParticipants = -1, tupDesc = 0x208fa40, sortKeys = 0x20937c0, onlyKey = 0x0, abbrevNext = 10, indexInfo = 0x0,
  estate = 0x0, heapRel = 0x0, indexRel = 0x0, enforceUnique = false, high_mask = 0, low_mask = 0, max_buckets = 0,
  datumType = 0, datumTypeLen = 0, ru_start = {tv = {tv_sec = 0, tv_usec = 0}, ru = {ru_utime = {tv_sec = 0, tv_usec = 0},
      ru_stime = {tv_sec = 0, tv_usec = 0}, {ru_maxrss = 0, __ru_maxrss_word = 0}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {
        ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 0, __ru_minflt_word = 0}, {
        ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, __ru_inblock_word = 0},
      {ru_oublock = 0, __ru_oublock_word = 0}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0,
        __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 0, __ru_nvcsw_word = 0}, {
        ru_nivcsw = 0, __ru_nivcsw_word = 0}}}}
(gdb)
```

Fill in the SortSupport data for each sort column (c1 and c2); the sortKeys array was allocated with palloc0 just before the loop.

```
(gdb) n
850            SortSupport sortKey = state->sortKeys + i;
(gdb)
852            AssertArg(attNums[i] != 0);
(gdb) p *state->sortKeys
$10 = {ssup_cxt = 0x0, ssup_collation = 0, ssup_reverse = false, ssup_nulls_first = false, ssup_attno = 0,
  ssup_extra = 0x0, comparator = 0x0, abbreviate = false, abbrev_converter = 0x0, abbrev_abort = 0x0,
  abbrev_full_comparator = 0x0}
(gdb) n
853            AssertArg(sortOperators[i] != 0);
(gdb)
855            sortKey->ssup_cxt = CurrentMemoryContext;
(gdb)
856            sortKey->ssup_collation = sortCollations[i];
(gdb)
857            sortKey->ssup_nulls_first = nullsFirstFlags[i];
(gdb)
858            sortKey->ssup_attno = attNums[i];
(gdb)
860            sortKey->abbreviate = (i == 0);
(gdb)
862            PrepareSortSupportFromOrderingOp(sortOperators[i], sortKey);
(gdb)
848        for (i = 0; i < nkeys; i++)
(gdb)
850            SortSupport sortKey = state->sortKeys + i;
(gdb)
852            AssertArg(attNums[i] != 0);
(gdb)
853            AssertArg(sortOperators[i] != 0);
(gdb)
855            sortKey->ssup_cxt = CurrentMemoryContext;
(gdb)
856            sortKey->ssup_collation = sortCollations[i];
(gdb)
857            sortKey->ssup_nulls_first = nullsFirstFlags[i];
(gdb)
858            sortKey->ssup_attno = attNums[i];
(gdb)
860            sortKey->abbreviate = (i == 0);
(gdb)
862            PrepareSortSupportFromOrderingOp(sortOperators[i], sortKey);
(gdb)
848        for (i = 0; i < nkeys; i++)
(gdb)
```
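Each SortSupport entry filled in above carries the per-key options: ssup_attno, ssup_nulls_first, ssup_reverse and comparator. Comparing two rows for ORDER BY c1, c2 then amounts to comparing key by key until one differs, with NULLs placed first or last according to the key. The sketch models that logic, with NULL handling patterned on ApplySortComparator() in sortsupport.h. ToySortSupport, int4_cmp, apply_cmp and compare_rows are hypothetical simplifications. The real comparetup_heap also uses the cached datum1 value and abbreviated keys for the first column.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef intptr_t Datum;     /* hypothetical: holds int4 values directly */

/* Minimal stand-in for SortSupportData: only the fields used below. */
typedef struct
{
    bool        ssup_reverse;       /* DESC? */
    bool        ssup_nulls_first;   /* NULLS FIRST? */
    int         ssup_attno;         /* 1-based column number */
    int         (*comparator) (Datum a, Datum b);
} ToySortSupport;

static int
int4_cmp(Datum a, Datum b)
{
    return (a > b) - (a < b);
}

/* NULL ordering first, then the type's comparator (inverted for DESC). */
static int
apply_cmp(Datum d1, bool n1, Datum d2, bool n2, const ToySortSupport *ssup)
{
    if (n1)
        return n2 ? 0 : (ssup->ssup_nulls_first ? -1 : 1);
    if (n2)
        return ssup->ssup_nulls_first ? 1 : -1;

    int         c = ssup->comparator(d1, d2);

    return ssup->ssup_reverse ? -c : c;
}

/* Compare two rows key by key; the first differing key decides. */
static int
compare_rows(const Datum *r1, const bool *n1, const Datum *r2, const bool *n2,
             const ToySortSupport *keys, int nkeys)
{
    for (int i = 0; i < nkeys; i++)
    {
        int         att = keys[i].ssup_attno - 1;
        int         c = apply_cmp(r1[att], n1[att], r2[att], n2[att], &keys[i]);

        if (c != 0)
            return c;
    }
    return 0;
}
```

Negating `c` for DESC is safe here only because `int4_cmp` returns -1, 0 or 1. PostgreSQL uses INVERT_COMPARE_RESULT for this step so that comparators returning INT_MIN are handled too.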

Initialization is complete, and the function returns state.

```
(gdb)
871        if (nkeys == 1 && !state->sortKeys->abbrev_converter)
(gdb) n
874        MemoryContextSwitchTo(oldcontext);
(gdb)
876        return state;
(gdb) p *state
$11 = {status = TSS_INITIAL, nKeys = 2, randomAccess = false, bounded = false, boundUsed = false, bound = 0, tuples = true,
  availMem = 4169704, allowedMem = 4194304, maxTapes = 0, tapeRange = 0, sortcontext = 0x2093290, tuplecontext = 0x20992c0,
  tapeset = 0x0, comparetup = 0xa7525b , copytup = 0xa76247 ,
  writetup = 0xa76de1 , readtup = 0xa76ec6 , memtuples = 0x209b310, memtupcount = 0,
  memtupsize = 1024, growmemtuples = true, slabAllocatorUsed = false, slabMemoryBegin = 0x0, slabMemoryEnd = 0x0,
  slabFreeHead = 0x0, read_buffer_size = 0, lastReturnedTuple = 0x0, currentRun = 0, mergeactive = 0x0, Level = 0,
  destTape = 0, tp_fib = 0x0, tp_runs = 0x0, tp_dummy = 0x0, tp_tapenum = 0x0, activeTapes = 0, result_tape = -1,
  current = 0, eof_reached = false, markpos_block = 0, markpos_offset = 0, markpos_eof = false, worker = -1, shared = 0x0,
  nParticipants = -1, tupDesc = 0x208fa40, sortKeys = 0x20937c0, onlyKey = 0x0, abbrevNext = 10, indexInfo = 0x0,
  estate = 0x0, heapRel = 0x0, indexRel = 0x0, enforceUnique = false, high_mask = 0, low_mask = 0, max_buckets = 0,
  datumType = 0, datumTypeLen = 0, ru_start = {tv = {tv_sec = 0, tv_usec = 0}, ru = {ru_utime = {tv_sec = 0, tv_usec = 0},
      ru_stime = {tv_sec = 0, tv_usec = 0}, {ru_maxrss = 0, __ru_maxrss_word = 0}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {
        ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 0, __ru_minflt_word = 0}, {
        ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, __ru_inblock_word = 0},
      {ru_oublock = 0, __ru_oublock_word = 0}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0,
        __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 0, __ru_nvcsw_word = 0}, {
        ru_nivcsw = 0, __ru_nivcsw_word = 0}}}}
(gdb)
```

tuplesort_puttupleslot
It is called inside a loop in ExecSort (nodeSort.c):

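```c
for (;;)
{
    /* fetch the next tuple from the outer (child) plan */
    slot = ExecProcNode(outerNode);

    /* stop once the child plan has no more tuples */
    if (TupIsNull(slot))
        break;

    /* hand the tuple to tuplesort; the sort itself happens later, in tuplesort_performsort */
    tuplesort_puttupleslot(tuplesortstate, slot);
}
```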
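The loop above is the first half of the tuplesort calling pattern. Every input tuple is handed over first, tuplesort_performsort() is called once, and only then are the sorted tuples read back (with tuplesort_gettupleslot in ExecSort). The hypothetical ToySort below mimics that put / performsort / get sequence on ints, with an equally hypothetical ToyOuter standing in for the child plan. It sorts with the C library's qsort() and has none of tuplesort's memory accounting or spilling to tape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sorter with tuplesort's put / performsort / get shape. */
typedef struct
{
    int        *vals;
    size_t      count, cap, next;
    bool        sorted;             /* set by toysort_performsort() */
} ToySort;

static int
cmp_int(const void *a, const void *b)
{
    int         x = *(const int *) a;
    int         y = *(const int *) b;

    return (x > y) - (x < y);
}

/* Like tuplesort_puttupleslot(): accept one more input value. */
static bool
toysort_put(ToySort *ts, int v)
{
    assert(!ts->sorted);            /* no input after performsort */
    if (ts->count == ts->cap)
    {
        size_t      ncap = ts->cap ? ts->cap * 2 : 4;
        int        *p = realloc(ts->vals, ncap * sizeof *p);

        if (p == NULL)
            return false;
        ts->vals = p;
        ts->cap = ncap;
    }
    ts->vals[ts->count++] = v;
    return true;
}

/* Like tuplesort_performsort(): all input is in, so sort it once. */
static void
toysort_performsort(ToySort *ts)
{
    if (ts->count > 1)
        qsort(ts->vals, ts->count, sizeof *ts->vals, cmp_int);
    ts->sorted = true;
    ts->next = 0;
}

/* Like tuplesort_gettupleslot(): returns false once output is exhausted. */
static bool
toysort_get(ToySort *ts, int *out)
{
    assert(ts->sorted);
    if (ts->next >= ts->count)
        return false;
    *out = ts->vals[ts->next++];
    return true;
}

/* Hypothetical child plan: returns values until exhausted. */
typedef struct
{
    const int  *src;
    size_t      n, i;
} ToyOuter;

static bool
toy_outer_next(ToyOuter *o, int *v)
{
    if (o->i >= o->n)
        return false;               /* plays the role of TupIsNull(slot) */
    *v = o->src[o->i++];
    return true;
}

/* ExecSort-like driver: drain the child, sort once, then read back. */
static size_t
run_sort(ToyOuter *outer, int *result)
{
    ToySort     ts = {0};
    size_t      n = 0;
    int         v;

    while (toy_outer_next(outer, &v))
        if (!toysort_put(&ts, v))
            abort();                /* out of memory in this toy */
    toysort_performsort(&ts);
    while (toysort_get(&ts, &v))
        result[n++] = v;
    free(ts.vals);
    return n;
}
```

Separating "put everything" from "sort once" is what lets tuplesort choose among quicksort, a bounded heap or an external merge after it has seen how much input there is.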

Take one of these slots as an example:

```
(gdb) c
Continuing.

Breakpoint 2, tuplesort_puttupleslot (state=0x20933a8, slot=0x208f8c8) at tuplesort.c:1436
1436        MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
```

Input parameters: state is the state returned earlier by tuplesort_begin_heap, and slot is the tuple slot returned by the outer node.

```
(gdb) p *slot
$12 = {type = T_TupleTableSlot, tts_isempty = false, tts_shouldFree = false, tts_shouldFreeMin = false, tts_slow = false,
  tts_tuple = 0x2090678, tts_tupleDescriptor = 0x7f061a300380, tts_mcxt = 0x208f270, tts_buffer = 103, tts_nvalid = 0,
  tts_values = 0x208f928, tts_isnull = 0x208f960, tts_mintuple = 0x0, tts_minhdr = {t_len = 0, t_self = {ip_blkid = {
        bi_hi = 0, bi_lo = 0}, ip_posid = 0}, t_tableOid = 0, t_data = 0x0}, tts_off = 0, tts_fixedTupleDescriptor = true}
(gdb)
```

Tuple data in the slot

```
(gdb) p *slot->tts_values
$13 = 0
(gdb) p *slot->tts_tuple
$14 = {t_len = 56, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 1}, t_tableOid = 286759, t_data = 0x7f05ee0c4648}
(gdb) p *slot->tts_tuple->t_data
$15 = {t_choice = {t_heap = {t_xmin = 839, t_xmax = 0, t_field3 = {t_cid = 0, t_xvac = 0}}, t_datum = {datum_len_ = 839,
      datum_typmod = 0, datum_typeid = 0}}, t_ctid = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 1}, t_infomask2 = 7,
  t_infomask = 2306, t_hoff = 24 '\030', t_bits = 0x7f05ee0c465f ""}
(gdb) p *slot->tts_tuple->t_data->t_bits
$16 = 0 '\000'
(gdb) x/16ux *slot->tts_tuple->t_data->t_bits
0x0:    Cannot access memory at address 0x0
(gdb) x/16ux slot->tts_tuple->t_data->t_bits
0x7f05ee0c465f:    0x5a470b00    0x00003130    0x00000100    0x00000100
0x7f05ee0c466f:    0x00000100    0x00000100    0x00000100    0x00000100
0x7f05ee0c467f:    0x00000000    0x8f282800    0x000000da    0x40023800
0x7f05ee0c468f:    0x04200002    0x00000020    0x709fc800    0x709f9000
(gdb) x/16bx slot->tts_tuple->t_data->t_bits
0x7f05ee0c465f:    0x00    0x0b    0x47    0x5a    0x30    0x31    0x00    0x00
0x7f05ee0c4667:    0x00    0x01    0x00    0x00    0x00    0x01    0x00    0x00
(gdb) x/16bc slot->tts_tuple->t_data->t_bits
0x7f05ee0c465f:    0 '\000'    11 '\v'    71 'G'    90 'Z'    48 '0'    49 '1'    0 '\000'    0 '\000'
0x7f05ee0c4667:    0 '\000'    1 '\001'    0 '\000'    0 '\000'    0 '\000'    1 '\001'    0 '\000'    0 '\000'
(gdb) p *slot->tts_tupleDescriptor
$17 = {natts = 7, tdtypeid = 286761, tdtypmod = -1, tdhasoid = false, tdrefcount = 2, constr = 0x0, attrs = 0x7f061a3003a0}
(gdb) p *slot
$18 = {type = T_TupleTableSlot, tts_isempty = false, tts_shouldFree = false, tts_shouldFreeMin = false, tts_slow = false,
  tts_tuple = 0x2090678, tts_tupleDescriptor = 0x7f061a300380, tts_mcxt = 0x208f270, tts_buffer = 103, tts_nvalid = 0,
  tts_values = 0x208f928, tts_isnull = 0x208f960, tts_mintuple = 0x0, tts_minhdr = {t_len = 0, t_self = {ip_blkid = {
        bi_hi = 0, bi_lo = 0}, ip_posid = 0}, t_tableOid = 0, t_data = 0x0}, tts_off = 0, tts_fixedTupleDescriptor = true}
(gdb) p *slot->tts_values[0]
Cannot access memory at address 0x0
(gdb) p slot->tts_values[0]
$19 = 0
(gdb) x/32bc slot->tts_tuple->t_data->t_bits
0x7f05ee0c465f:    0 '\000'    11 '\v'    71 'G'    90 'Z'    48 '0'    49 '1'    0 '\000'    0 '\000'
0x7f05ee0c4667:    0 '\000'    1 '\001'    0 '\000'    0 '\000'    0 '\000'    1 '\001'    0 '\000'    0 '\000'
0x7f05ee0c466f:    0 '\000'    1 '\001'    0 '\000'    0 '\000'    0 '\000'    1 '\001'    0 '\000'    0 '\000'
0x7f05ee0c4677:    0 '\000'    1 '\001'    0 '\000'    0 '\000'    0 '\000'    1 '\001'    0 '\000'    0 '\000'
(gdb) x/32bx slot->tts_tuple->t_data->t_bits
0x7f05ee0c465f:    0x00    0x0b    0x47    0x5a    0x30    0x31    0x00    0x00
0x7f05ee0c4667:    0x00    0x01    0x00    0x00    0x00    0x01    0x00    0x00
0x7f05ee0c466f:    0x00    0x01    0x00    0x00    0x00    0x01    0x00    0x00
0x7f05ee0c4677:    0x00    0x01    0x00    0x00    0x00    0x01    0x00    0x00
```

Copy the tuple (COPYTUP) and append it to state->memtuples

(gdb) n
1443        COPYTUP(state, &stup, (void *) slot);
(gdb) 
1445        puttuple_common(state, &stup);
(gdb) step
puttuple_common (state=0x20933a8, tuple=0x7ffe890e0b00) at tuplesort.c:1639
1639        Assert(!LEADER(state));
(gdb) n
1641        switch (state->status)
(gdb) p state->status
$20 = TSS_INITIAL
(gdb) n
1652                if (state->memtupcount >= state->memtupsize - 1)
(gdb) p state->memtupcount
$21 = 0
(gdb) p state->memtupsize - 1
$22 = 1023
(gdb) n
1657                state->memtuples[state->memtupcount++] = *tuple;
(gdb) 
1671                if (state->bounded &&
(gdb) p state->bounded
$23 = false
(gdb) n
1688                if (state->memtupcount < state->memtupsize && !LACKMEM(state))
(gdb) 
1689                    return;
(gdb) 
1743    }
(gdb) 
tuplesort_puttupleslot (state=0x20933a8, slot=0x208f8c8) at tuplesort.c:1447
1447        MemoryContextSwitchTo(oldcontext);
(gdb) 
1448    }
(gdb) 
(gdb) p state->memtuples[0]
$25 = {tuple = 0x20993d8, datum1 = 1, isnull1 = false, tupindex = 0}

tuplesort_performsort

(gdb) info b
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x0000000000a6ffa1 in tuplesort_begin_heap at tuplesort.c:812
        breakpoint already hit 1 time
2       breakpoint     keep y   0x0000000000a7119d in tuplesort_puttupleslot at tuplesort.c:1436
        breakpoint already hit 1 time
3       breakpoint     keep y   0x0000000000a71f45 in tuplesort_performsort at tuplesort.c:1792
(gdb) del 2
(gdb) c
Continuing.

Breakpoint 3, tuplesort_performsort (state=0x20933a8) at tuplesort.c:1792
1792        MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
(gdb)

Input parameters

(gdb) p *state
$27 = {status = TSS_BUILDRUNS, nKeys = 2, randomAccess = false, bounded = false, boundUsed = false, bound = 0, 
  tuples = true, availMem = 824360, allowedMem = 4194304, maxTapes = 16, tapeRange = 15, sortcontext = 0x2093290, 
  tuplecontext = 0x20992c0, tapeset = 0x2093a00, comparetup = 0xa7525b , 
  copytup = 0xa76247 , writetup = 0xa76de1 , readtup = 0xa76ec6 , 
  memtuples = 0x2611570, memtupcount = 26592, memtupsize = 37448, growmemtuples = false, slabAllocatorUsed = false, 
  slabMemoryBegin = 0x0, slabMemoryEnd = 0x0, slabFreeHead = 0x0, read_buffer_size = 0, lastReturnedTuple = 0x0, 
  currentRun = 2, mergeactive = 0x2093878, Level = 1, destTape = 2, tp_fib = 0x20938a0, tp_runs = 0x20938f8, 
  tp_dummy = 0x2093950, tp_tapenum = 0x20939a8, activeTapes = 0, result_tape = -1, current = 0, eof_reached = false, 
  markpos_block = 0, markpos_offset = 0, markpos_eof = false, worker = -1, shared = 0x0, nParticipants = -1, 
  tupDesc = 0x208fa40, sortKeys = 0x20937c0, onlyKey = 0x0, abbrevNext = 10, indexInfo = 0x0, estate = 0x0, heapRel = 0x0, 
  indexRel = 0x0, enforceUnique = false, high_mask = 0, low_mask = 0, max_buckets = 0, datumType = 0, datumTypeLen = 0, 
  ru_start = {tv = {tv_sec = 0, tv_usec = 0}, ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, 
        tv_usec = 0}, {ru_maxrss = 0, __ru_maxrss_word = 0}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 0, 
        __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 0, __ru_minflt_word = 0}, {ru_majflt = 0, 
        __ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, __ru_inblock_word = 0}, {
        ru_oublock = 0, __ru_oublock_word = 0}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0, 
        __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 0, __ru_nvcsw_word = 0}, {
        ru_nivcsw = 0, __ru_nivcsw_word = 0}}}}
(gdb) p state->memtupsize
$28 = 37448
(gdb)

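state->status has already switched to TSS_BUILDRUNS: the input outgrew the memory budget while tuples were being added, so the sort moved to tape-based operation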

(gdb) n
1795        if (trace_sort)
(gdb) 
1800        switch (state->status)
(gdb) p state->status
$29 = TSS_BUILDRUNS
(gdb)

Dump all tuples still in memory to tape (disk), then merge the runs

(gdb) n
1864                dumptuples(state, true);
(gdb) 
1865                mergeruns(state);
(gdb) 
1866                state->eof_reached = false;
(gdb) 
1867                state->markpos_block = 0L;
(gdb) 
1868                state->markpos_offset = 0;
(gdb) 
1869                state->markpos_eof = false;
(gdb) 
1870                break;
(gdb) 
1878        if (trace_sort)
(gdb) 
1890        MemoryContextSwitchTo(oldcontext);
(gdb) 
1891    }
(gdb)

By now you should have a better understanding of how to use PostgreSQL's tuplesort_performsort function; try stepping through it yourself!
