How PostgreSQL calls the mergeruns function
Published: 2025-01-20 Author: 千家信息网 editor
This article covers how PostgreSQL calls the mergeruns function. It first reviews the data structures involved, then reads through the source code, and finally traces an actual call in gdb.
1. Data structures
TupleTableSlot
The executor stores tuples in a "tuple table", which is a List of independent TupleTableSlots.
/*----------
 * The executor stores tuples in a "tuple table" which is a List of
 * independent TupleTableSlots. There are several cases we need to handle:
 *      1. physical tuple in a disk buffer page
 *      2. physical tuple constructed in palloc'ed memory
 *      3. "minimal" physical tuple constructed in palloc'ed memory
 *      4. "virtual" tuple consisting of Datum/isnull arrays
 *
 * The first two cases are similar in that they both deal with "materialized"
 * tuples, but resource management is different. For a tuple in a disk page
 * we need to hold a pin on the buffer until the TupleTableSlot's reference
 * to the tuple is dropped; while for a palloc'd tuple we usually want the
 * tuple pfree'd when the TupleTableSlot's reference is dropped.
 *
 * A "minimal" tuple is handled similarly to a palloc'd regular tuple.
 * At present, minimal tuples never are stored in buffers, so there is no
 * parallel to case 1. Note that a minimal tuple has no "system columns".
 * (Actually, it could have an OID, but we have no need to access the OID.)
 *
 * A "virtual" tuple is an optimization used to minimize physical data
 * copying in a nest of plan nodes. Any pass-by-reference Datums in the
 * tuple point to storage that is not directly associated with the
 * TupleTableSlot; generally they will point to part of a tuple stored in
 * a lower plan node's output TupleTableSlot, or to a function result
 * constructed in a plan node's per-tuple econtext. It is the responsibility
 * of the generating plan node to be sure these resources are not released
 * for as long as the virtual tuple needs to be valid. We only use virtual
 * tuples in the result slots of plan nodes --- tuples to be copied anywhere
 * else need to be "materialized" into physical tuples. Note also that a
 * virtual tuple does not have any "system columns".
 *
 * It is also possible for a TupleTableSlot to hold both physical and minimal
 * copies of a tuple. This is done when the slot is requested to provide
 * the format other than the one it currently holds. (Originally we attempted
 * to handle such requests by replacing one format with the other, but that
 * had the fatal defect of invalidating any pass-by-reference Datums pointing
 * into the existing slot contents.) Both copies must contain identical data
 * payloads when this is the case.
 *
 * The Datum/isnull arrays of a TupleTableSlot serve double duty. When the
 * slot contains a virtual tuple, they are the authoritative data. When the
 * slot contains a physical tuple, the arrays contain data extracted from
 * the tuple. (In this state, any pass-by-reference Datums point into
 * the physical tuple.) The extracted information is built "lazily",
 * ie, only as needed. This serves to avoid repeated extraction of data
 * from the physical tuple.
 *
 * A TupleTableSlot can also be "empty", holding no valid data. This is
 * the only valid state for a freshly-created slot that has not yet had a
 * tuple descriptor assigned to it. In this state, tts_isempty must be
 * true, tts_shouldFree false, tts_tuple NULL, tts_buffer InvalidBuffer,
 * and tts_nvalid zero.
 *
 * The tupleDescriptor is simply referenced, not copied, by the TupleTableSlot
 * code. The caller of ExecSetSlotDescriptor() is responsible for providing
 * a descriptor that will live as long as the slot does. (Typically, both
 * slots and descriptors are in per-query memory and are freed by memory
 * context deallocation at query end; so it's not worth providing any extra
 * mechanism to do more. However, the slot will increment the tupdesc
 * reference count if a reference-counted tupdesc is supplied.)
 *
 * When tts_shouldFree is true, the physical tuple is "owned" by the slot
 * and should be freed when the slot's reference to the tuple is dropped.
 *
 * If tts_buffer is not InvalidBuffer, then the slot is holding a pin
 * on the indicated buffer page; drop the pin when we release the
 * slot's reference to that buffer. (tts_shouldFree should always be
 * false in such a case, since presumably tts_tuple is pointing at the
 * buffer page.)
 *
 * tts_nvalid indicates the number of valid columns in the tts_values/isnull
 * arrays. When the slot is holding a "virtual" tuple this must be equal
 * to the descriptor's natts. When the slot is holding a physical tuple
 * this is equal to the number of columns we have extracted (we always
 * extract columns from left to right, so there are no holes).
 *
 * tts_values/tts_isnull are allocated when a descriptor is assigned to the
 * slot; they are of length equal to the descriptor's natts.
 *
 * tts_mintuple must always be NULL if the slot does not hold a "minimal"
 * tuple. When it does, tts_mintuple points to the actual MinimalTupleData
 * object (the thing to be pfree'd if tts_shouldFreeMin is true). If the slot
 * has only a minimal and not also a regular physical tuple, then tts_tuple
 * points at tts_minhdr and the fields of that struct are set correctly
 * for access to the minimal tuple; in particular, tts_minhdr.t_data points
 * MINIMAL_TUPLE_OFFSET bytes before tts_mintuple. This allows column
 * extraction to treat the case identically to regular physical tuples.
 *
 * tts_slow/tts_off are saved state for slot_deform_tuple, and should not
 * be touched by any other code.
 *----------
 */
typedef struct TupleTableSlot
{
    NodeTag     type;
    bool        tts_isempty;        /* true = slot is empty */
    bool        tts_shouldFree;     /* should pfree tts_tuple? */
    bool        tts_shouldFreeMin;  /* should pfree tts_mintuple? */
#define FIELDNO_TUPLETABLESLOT_SLOW 4
    bool        tts_slow;           /* saved state for slot_deform_tuple */
#define FIELDNO_TUPLETABLESLOT_TUPLE 5
    HeapTuple   tts_tuple;          /* physical tuple, or NULL if virtual */
#define FIELDNO_TUPLETABLESLOT_TUPLEDESCRIPTOR 6
    TupleDesc   tts_tupleDescriptor;    /* slot's tuple descriptor */
    MemoryContext tts_mcxt;         /* slot itself is in this context */
    Buffer      tts_buffer;         /* tuple's buffer, or InvalidBuffer */
#define FIELDNO_TUPLETABLESLOT_NVALID 9
    int         tts_nvalid;         /* # of valid values in tts_values */
#define FIELDNO_TUPLETABLESLOT_VALUES 10
    Datum      *tts_values;         /* current per-attribute values */
#define FIELDNO_TUPLETABLESLOT_ISNULL 11
    bool       *tts_isnull;         /* current per-attribute isnull flags */
    MinimalTuple tts_mintuple;      /* minimal tuple, or NULL if none */
    HeapTupleData tts_minhdr;       /* workspace for minimal-tuple-only case */
#define FIELDNO_TUPLETABLESLOT_OFF 14
    uint32      tts_off;            /* saved state for slot_deform_tuple */
    bool        tts_fixedTupleDescriptor;   /* descriptor can't be changed */
} TupleTableSlot;

/* base tuple table slot type */
typedef struct TupleTableSlot
{
    NodeTag     type;
#define FIELDNO_TUPLETABLESLOT_FLAGS 1
    uint16      tts_flags;          /* Boolean states */
#define FIELDNO_TUPLETABLESLOT_NVALID 2
    AttrNumber  tts_nvalid;         /* # of valid values in tts_values */
    const TupleTableSlotOps *const tts_ops; /* implementation of slot */
#define FIELDNO_TUPLETABLESLOT_TUPLEDESCRIPTOR 4
    TupleDesc   tts_tupleDescriptor;    /* slot's tuple descriptor */
#define FIELDNO_TUPLETABLESLOT_VALUES 5
    Datum      *tts_values;         /* current per-attribute values */
#define FIELDNO_TUPLETABLESLOT_ISNULL 6
    bool       *tts_isnull;         /* current per-attribute isnull flags */
    MemoryContext tts_mcxt;         /* slot itself is in this context */
} TupleTableSlot;

/* routines for a TupleTableSlot implementation */
struct TupleTableSlotOps
{
    /* Minimum size of the slot */
    size_t      base_slot_size;

    /* Initialization. */
    void        (*init) (TupleTableSlot *slot);

    /* Destruction. */
    void        (*release) (TupleTableSlot *slot);

    /*
     * Clear the contents of the slot. Only the contents are expected to be
     * cleared and not the tuple descriptor. Typically an implementation of
     * this callback should free the memory allocated for the tuple contained
     * in the slot.
     */
    void        (*clear) (TupleTableSlot *slot);

    /*
     * Fill up first natts entries of tts_values and tts_isnull arrays with
     * values from the tuple contained in the slot. The function may be called
     * with natts more than the number of attributes available in the tuple,
     * in which case it should set tts_nvalid to the number of returned
     * columns.
     */
    void        (*getsomeattrs) (TupleTableSlot *slot, int natts);

    /*
     * Returns value of the given system attribute as a datum and sets isnull
     * to false, if it's not NULL. Throws an error if the slot type does not
     * support system attributes.
     */
    Datum       (*getsysattr) (TupleTableSlot *slot, int attnum, bool *isnull);

    /*
     * Make the contents of the slot solely depend on the slot, and not on
     * underlying resources (like another memory context, buffers, etc).
     */
    void        (*materialize) (TupleTableSlot *slot);

    /*
     * Copy the contents of the source slot into the destination slot's own
     * context. Invoked using callback of the destination slot.
     */
    void        (*copyslot) (TupleTableSlot *dstslot, TupleTableSlot *srcslot);

    /*
     * Return a heap tuple "owned" by the slot. It is slot's responsibility to
     * free the memory consumed by the heap tuple. If the slot can not "own" a
     * heap tuple, it should not implement this callback and should set it as
     * NULL.
     */
    HeapTuple   (*get_heap_tuple) (TupleTableSlot *slot);

    /*
     * Return a minimal tuple "owned" by the slot. It is slot's responsibility
     * to free the memory consumed by the minimal tuple. If the slot can not
     * "own" a minimal tuple, it should not implement this callback and should
     * set it as NULL.
     */
    MinimalTuple (*get_minimal_tuple) (TupleTableSlot *slot);

    /*
     * Return a copy of heap tuple representing the contents of the slot. The
     * copy needs to be palloc'd in the current memory context. The slot
     * itself is expected to remain unaffected. It is *not* expected to have
     * meaningful "system columns" in the copy. The copy is not be "owned" by
     * the slot i.e. the caller has to take responsibility to free memory
     * consumed by the slot.
     */
    HeapTuple   (*copy_heap_tuple) (TupleTableSlot *slot);

    /*
     * Return a copy of minimal tuple representing the contents of the slot. The
     * copy needs to be palloc'd in the current memory context. The slot
     * itself is expected to remain unaffected. It is *not* expected to have
     * meaningful "system columns" in the copy. The copy is not be "owned" by
     * the slot i.e. the caller has to take responsibility to free memory
     * consumed by the slot.
     */
    MinimalTuple (*copy_minimal_tuple) (TupleTableSlot *slot);
};

typedef struct tupleDesc
{
    int         natts;          /* number of attributes in the tuple */
    Oid         tdtypeid;       /* composite type ID for tuple type */
    int32       tdtypmod;       /* typmod for tuple type */
    int         tdrefcount;     /* reference count, or -1 if not counting */
    TupleConstr *constr;        /* constraints, or NULL if none */
    /* attrs[N] is the description of Attribute Number N+1 */
    FormData_pg_attribute attrs[FLEXIBLE_ARRAY_MEMBER];
} *TupleDesc;
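The comment above says that column values are extracted "lazily" and always left to right, with tts_nvalid recording how many columns have been extracted so far. The following is a minimal sketch of that idea only. ToyTuple, ToySlot, and the toy_* functions are hypothetical names invented for illustration, the "tuple" is just a fixed array of ints, and none of this is PostgreSQL's actual code.

```c
#include <assert.h>

#define TOY_NATTS 4

/* Hypothetical stand-in for a physical tuple: four fixed-width int columns. */
typedef struct ToyTuple
{
    int cols[TOY_NATTS];
} ToyTuple;

/* Hypothetical stand-in for a slot holding a physical tuple. */
typedef struct ToySlot
{
    const ToyTuple *tuple;      /* the "physical" tuple */
    int values[TOY_NATTS];      /* extracted values, valid for [0, nvalid) */
    int nvalid;                 /* number of columns extracted so far */
    int extract_calls;          /* counts single-column extractions */
} ToySlot;

/* Store a new tuple: nothing is extracted yet. */
void toy_slot_store(ToySlot *slot, const ToyTuple *tuple)
{
    slot->tuple = tuple;
    slot->nvalid = 0;
}

/* Extract columns left to right until at least natts are valid. */
void toy_slot_getsomeattrs(ToySlot *slot, int natts)
{
    assert(natts >= 0 && natts <= TOY_NATTS);
    while (slot->nvalid < natts)
    {
        slot->values[slot->nvalid] = slot->tuple->cols[slot->nvalid];
        slot->nvalid++;
        slot->extract_calls++;
    }
}

/* Fetch one attribute (1-based), extracting only what is still missing. */
int toy_slot_getattr(ToySlot *slot, int attnum)
{
    assert(attnum >= 1 && attnum <= TOY_NATTS);
    toy_slot_getsomeattrs(slot, attnum);
    return slot->values[attnum - 1];
}
```

Asking for attribute 3 extracts columns 1 through 3 once; a later request for attribute 2 is served from the cached array without touching the tuple again, which is the repeated-extraction cost the comment describes avoiding.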
SortState
Run-time state information for sorting.
/* ----------------
 *   SortState information
 * ----------------
 */
typedef struct SortState
{
    ScanState   ss;             /* its first field is NodeTag */
    bool        randomAccess;   /* need random access to sort output? */
    bool        bounded;        /* is the result set bounded? */
    int64       bound;          /* if bounded, how many tuples are needed */
    bool        sort_Done;      /* sort completed yet? */
    bool        bounded_Done;   /* value of bounded we did the sort with */
    int64       bound_Done;     /* value of bound we did the sort with */
    void       *tuplesortstate; /* private state of tuplesort.c */
    bool        am_worker;      /* are we a worker? */
    SharedSortInfo *shared_info;    /* one entry per worker */
} SortState;

/* ----------------
 *   Shared memory container for per-worker sort information
 * ----------------
 */
typedef struct SharedSortInfo
{
    int         num_workers;    /* number of workers */
    /* per-worker sort instrumentation */
    TuplesortInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
} SharedSortInfo;
TuplesortInstrumentation
Data structures for reporting sort statistics.
/*
 * Data structures for reporting sort statistics. Note that
 * TuplesortInstrumentation can't contain any pointers because we
 * sometimes put it in shared memory.
 */
typedef enum
{
    SORT_TYPE_STILL_IN_PROGRESS = 0,    /* sort still in progress */
    SORT_TYPE_TOP_N_HEAPSORT,           /* top-N heapsort */
    SORT_TYPE_QUICKSORT,                /* quicksort */
    SORT_TYPE_EXTERNAL_SORT,            /* external sort */
    SORT_TYPE_EXTERNAL_MERGE            /* external sort followed by a merge */
} TuplesortMethod;

typedef enum
{
    SORT_SPACE_TYPE_DISK,               /* space was used on disk */
    SORT_SPACE_TYPE_MEMORY              /* space was used in memory */
} TuplesortSpaceType;

typedef struct TuplesortInstrumentation
{
    TuplesortMethod sortMethod;     /* sort algorithm used */
    TuplesortSpaceType spaceType;   /* type of space spaceUsed represents */
    long        spaceUsed;          /* space consumption, in kB */
} TuplesortInstrumentation;
2. Source code walkthrough
mergeruns merges all the completed initial runs.
/*
 * mergeruns -- merge all the completed initial runs.
 *
 * This implements steps D5, D6 of Algorithm D. All input data has
 * already been written to initial runs on tape (see dumptuples).
 */
static void
mergeruns(Tuplesortstate *state)
{
    int         tapenum,
                svTape,
                svRuns,
                svDummy;
    int         numTapes;
    int         numInputTapes;

    Assert(state->status == TSS_BUILDRUNS);
    Assert(state->memtupcount == 0);

    if (state->sortKeys != NULL && state->sortKeys->abbrev_converter != NULL)
    {
        /*
         * If there are multiple runs to be merged, when we go to read back
         * tuples from disk, abbreviated keys will not have been stored, and
         * we don't care to regenerate them. Disable abbreviation from this
         * point on.
         */
        state->sortKeys->abbrev_converter = NULL;
        state->sortKeys->comparator = state->sortKeys->abbrev_full_comparator;

        /* Not strictly necessary, but be tidy */
        state->sortKeys->abbrev_abort = NULL;
        state->sortKeys->abbrev_full_comparator = NULL;
    }

    /*
     * Reset tuple memory. We've freed all the tuples that we previously
     * allocated. We will use the slab allocator from now on.
     */
    MemoryContextDelete(state->tuplecontext);
    state->tuplecontext = NULL;

    /*
     * We no longer need a large memtuples array. (We will allocate a smaller
     * one for the heap later.)
     */
    FREEMEM(state, GetMemoryChunkSpace(state->memtuples));
    pfree(state->memtuples);
    state->memtuples = NULL;

    /*
     * If we had fewer runs than tapes, refund the memory that we imagined we
     * would need for the tape buffers of the unused tapes.
     *
     * numTapes and numInputTapes reflect the actual number of tapes we will
     * use. Note that the output tape's tape number is maxTapes - 1, so the
     * tape numbers of the used tapes are not consecutive, and you cannot just
     * loop from 0 to numTapes to visit all used tapes!
     */
    if (state->Level == 1)
    {
        numInputTapes = state->currentRun;
        numTapes = numInputTapes + 1;
        FREEMEM(state, (state->maxTapes - numTapes) * TAPE_BUFFER_OVERHEAD);
    }
    else
    {
        numInputTapes = state->tapeRange;
        numTapes = state->maxTapes;
    }

    /*
     * Initialize the slab allocator. We need one slab slot per input tape,
     * for the tuples in the heap, plus one to hold the tuple last returned
     * from tuplesort_gettuple. (If we're sorting pass-by-val Datums,
     * however, we don't need to do allocate anything.)
     *
     * From this point on, we no longer use the USEMEM()/LACKMEM() mechanism
     * to track memory usage of individual tuples.
     */
    if (state->tuples)
        init_slab_allocator(state, numInputTapes + 1);
    else
        init_slab_allocator(state, 0);

    /*
     * Allocate a new 'memtuples' array, for the heap. It will hold one tuple
     * from each input tape.
     */
    state->memtupsize = numInputTapes;
    state->memtuples = (SortTuple *) palloc(numInputTapes * sizeof(SortTuple));
    USEMEM(state, GetMemoryChunkSpace(state->memtuples));

    /*
     * Use all the remaining memory we have available for read buffers among
     * the input tapes.
     *
     * We don't try to "rebalance" the memory among tapes, when we start a new
     * merge phase, even if some tapes are inactive in the new phase. That
     * would be hard, because logtape.c doesn't know where one run ends and
     * another begins. When a new merge phase begins, and a tape doesn't
     * participate in it, its buffer nevertheless already contains tuples from
     * the next run on same tape, so we cannot release the buffer. That's OK
     * in practice, merge performance isn't that sensitive to the amount of
     * buffers used, and most merge phases use all or almost all tapes,
     * anyway.
     */
#ifdef TRACE_SORT
    if (trace_sort)
        elog(LOG, "worker %d using " INT64_FORMAT " KB of memory for read buffers among %d input tapes",
             state->worker, state->availMem / 1024, numInputTapes);
#endif

    state->read_buffer_size = Max(state->availMem / numInputTapes, 0);
    USEMEM(state, state->read_buffer_size * numInputTapes);

    /* End of step D2: rewind all output tapes to prepare for merging */
    for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
        LogicalTapeRewindForRead(state->tapeset, tapenum, state->read_buffer_size);

    /* main merge loop */
    for (;;)
    {
        /*
         * At this point we know that tape[T] is empty. If there's just one
         * (real or dummy) run left on each input tape, then only one merge
         * pass remains. If we don't have to produce a materialized sorted
         * tape, we can stop at this point and do the final merge on-the-fly.
         */
        if (!state->randomAccess && !WORKER(state))
        {
            bool        allOneRun = true;

            Assert(state->tp_runs[state->tapeRange] == 0);
            for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
            {
                if (state->tp_runs[tapenum] + state->tp_dummy[tapenum] != 1)
                {
                    allOneRun = false;
                    break;
                }
            }
            if (allOneRun)
            {
                /* Tell logtape.c we won't be writing anymore */
                LogicalTapeSetForgetFreeSpace(state->tapeset);
                /* Initialize for the final merge pass */
                beginmerge(state);
                state->status = TSS_FINALMERGE;
                return;
            }
        }

        /* Step D5: merge runs onto tape[T] until tape[P] is empty */
        while (state->tp_runs[state->tapeRange - 1] ||
               state->tp_dummy[state->tapeRange - 1])
        {
            bool        allDummy = true;

            for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
            {
                if (state->tp_dummy[tapenum] == 0)
                {
                    allDummy = false;
                    break;
                }
            }

            if (allDummy)
            {
                state->tp_dummy[state->tapeRange]++;
                for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
                    state->tp_dummy[tapenum]--;
            }
            else
                mergeonerun(state);
        }

        /* Step D6: decrease level */
        if (--state->Level == 0)
            break;

        /* rewind output tape T to use as new input */
        LogicalTapeRewindForRead(state->tapeset, state->tp_tapenum[state->tapeRange],
                                 state->read_buffer_size);

        /* rewind used-up input tape P, and prepare it for write pass */
        LogicalTapeRewindForWrite(state->tapeset, state->tp_tapenum[state->tapeRange - 1]);
        state->tp_runs[state->tapeRange - 1] = 0;

        /*
         * reassign tape units per step D6; note we no longer care about A[]
         */
        svTape = state->tp_tapenum[state->tapeRange];
        svDummy = state->tp_dummy[state->tapeRange];
        svRuns = state->tp_runs[state->tapeRange];
        for (tapenum = state->tapeRange; tapenum > 0; tapenum--)
        {
            state->tp_tapenum[tapenum] = state->tp_tapenum[tapenum - 1];
            state->tp_dummy[tapenum] = state->tp_dummy[tapenum - 1];
            state->tp_runs[tapenum] = state->tp_runs[tapenum - 1];
        }
        state->tp_tapenum[0] = svTape;
        state->tp_dummy[0] = svDummy;
        state->tp_runs[0] = svRuns;
    }

    /*
     * Done. Knuth says that the result is on TAPE[1], but since we exited
     * the loop without performing the last iteration of step D6, we have not
     * rearranged the tape unit assignment, and therefore the result is on
     * TAPE[T]. We need to do it this way so that we can freeze the final
     * output tape while rewinding it. The last iteration of step D6 would be
     * a waste of cycles anyway...
     */
    state->result_tape = state->tp_tapenum[state->tapeRange];
    if (!WORKER(state))
        LogicalTapeFreeze(state->tapeset, state->result_tape, NULL);
    else
        worker_freeze_result_tape(state);
    state->status = TSS_SORTEDONTAPE;

    /* Release the read buffers of all the other tapes, by rewinding them. */
    for (tapenum = 0; tapenum < state->maxTapes; tapenum++)
    {
        if (tapenum != state->result_tape)
            LogicalTapeRewindForWrite(state->tapeset, tapenum);
    }
}
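The tape reassignment at the end of step D6 is easy to misread, so here is a small standalone sketch of just that rotation: the output tape T moves to logical position 0 and every input tape shifts up by one. ToyTapes and toy_rotate_tapes are hypothetical names and the arrays have a fixed toy size; only the shifting logic mirrors the listing above.

```c
#include <assert.h>

#define TOY_MAX_TAPES 8

/* Hypothetical, simplified stand-in for the tp_* arrays in Tuplesortstate. */
typedef struct ToyTapes
{
    int tape_range;                 /* number of input tapes (P) */
    int tp_tapenum[TOY_MAX_TAPES];  /* logical position -> physical tape number */
    int tp_runs[TOY_MAX_TAPES];     /* real runs on each logical tape */
    int tp_dummy[TOY_MAX_TAPES];    /* dummy runs on each logical tape */
} ToyTapes;

/* Rotate the logical tape assignment as in step D6. */
void toy_rotate_tapes(ToyTapes *t)
{
    int range = t->tape_range;
    int sv_tape, sv_dummy, sv_runs;

    assert(range >= 1 && range < TOY_MAX_TAPES);

    /* remember the old output tape (logical position T = range) */
    sv_tape = t->tp_tapenum[range];
    sv_dummy = t->tp_dummy[range];
    sv_runs = t->tp_runs[range];

    /* shift every input tape one position up */
    for (int i = range; i > 0; i--)
    {
        t->tp_tapenum[i] = t->tp_tapenum[i - 1];
        t->tp_dummy[i] = t->tp_dummy[i - 1];
        t->tp_runs[i] = t->tp_runs[i - 1];
    }

    /* the old output tape becomes the first input tape */
    t->tp_tapenum[0] = sv_tape;
    t->tp_dummy[0] = sv_dummy;
    t->tp_runs[0] = sv_runs;
}
```

Because the emptied tape P (position tape_range - 1) has its run count reset to 0 before the rotation, it ends up at position tape_range afterwards, that is, as the new empty output tape.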
3. Trace analysis
Test script
select * from t_sort order by c1,c2;
Trace
(gdb) b mergeruns
Breakpoint 1 at 0xa73508: file tuplesort.c, line 2570.
(gdb) 
Note: breakpoint 1 also set at pc 0xa73508.
Breakpoint 2 at 0xa73508: file tuplesort.c, line 2570.
Input parameters
(gdb) c
Continuing.

Breakpoint 1, mergeruns (state=0x2b808a8) at tuplesort.c:2570
2570        Assert(state->status == TSS_BUILDRUNS);
(gdb) p *state
$1 = {status = TSS_BUILDRUNS, nKeys = 2, randomAccess = false, bounded = false, boundUsed = false, bound = 0, tuples = true, availMem = 3164456, allowedMem = 4194304, maxTapes = 16, tapeRange = 15, sortcontext = 0x2b80790, tuplecontext = 0x2b827a0, tapeset = 0x2b81480, comparetup = 0xa7525b, copytup = 0xa76247 , writetup = 0xa76de1 , readtup = 0xa76ec6 , memtuples = 0x7f0cfeb14050, memtupcount = 0, memtupsize = 37448, growmemtuples = false, slabAllocatorUsed = false, slabMemoryBegin = 0x0, slabMemoryEnd = 0x0, slabFreeHead = 0x0, read_buffer_size = 0, lastReturnedTuple = 0x0, currentRun = 3, mergeactive = 0x2b81350, Level = 1, destTape = 2, tp_fib = 0x2b80d58, tp_runs = 0x2b81378, tp_dummy = 0x2b813d0, tp_tapenum = 0x2b81428, activeTapes = 0, result_tape = -1, current = 0, eof_reached = false, markpos_block = 0, markpos_offset = 0, markpos_eof = false, worker = -1, shared = 0x0, nParticipants = -1, tupDesc = 0x2b67ae0, sortKeys = 0x2b80cc0, onlyKey = 0x0, abbrevNext = 10, indexInfo = 0x0, estate = 0x0, heapRel = 0x0, indexRel = 0x0, enforceUnique = false, high_mask = 0, low_mask = 0, max_buckets = 0, datumType = 0, datumTypeLen = 0, ru_start = {tv = {tv_sec = 0, tv_usec = 0}, ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, {ru_maxrss = 0, __ru_maxrss_word = 0}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 0, __ru_minflt_word = 0}, {ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, __ru_inblock_word = 0}, { ru_oublock = 0, __ru_oublock_word = 0}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0, __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 0, __ru_nvcsw_word = 0}, { ru_nivcsw = 0, __ru_nivcsw_word = 0}}}}
(gdb)
Sort key information
(gdb) n
2571        Assert(state->memtupcount == 0);
(gdb) 
2573        if (state->sortKeys != NULL && state->sortKeys->abbrev_converter != NULL)
(gdb) p *state->sortKeys
$2 = {ssup_cxt = 0x2b80790, ssup_collation = 0, ssup_reverse = false, ssup_nulls_first = false, ssup_attno = 2, ssup_extra = 0x0, comparator = 0x4fd4af, abbreviate = true, abbrev_converter = 0x0, abbrev_abort = 0x0, abbrev_full_comparator = 0x0}
(gdb) p *state->sortKeys->abbrev_converter
Cannot access memory at address 0x0
Reset tuple memory; the large memtuples array is no longer needed.
(gdb) n
2593        MemoryContextDelete(state->tuplecontext);
(gdb) 
2594        state->tuplecontext = NULL;
(gdb) 
(gdb) n
2600        FREEMEM(state, GetMemoryChunkSpace(state->memtuples));
(gdb) 
2601        pfree(state->memtuples);
(gdb) 
2602        state->memtuples = NULL;
(gdb) 
2613        if (state->Level == 1)
(gdb)
Compute the number of tapes
(gdb) n
2615            numInputTapes = state->currentRun;
(gdb) p state->currentRun
$3 = 3
(gdb) p state->Level
$4 = 1
(gdb) p state->tapeRange
$5 = 15
(gdb) p state->maxTapes
$6 = 16
(gdb) n
2616            numTapes = numInputTapes + 1;
(gdb) 
2617            FREEMEM(state, (state->maxTapes - numTapes) * TAPE_BUFFER_OVERHEAD);
(gdb) 
2634        if (state->tuples)
(gdb) p numInputTapes
$7 = 3
(gdb) p numTapes
$8 = 4
(gdb)
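The tape-count step traced above can be reproduced in isolation. The sketch below recomputes numInputTapes, numTapes, and the refunded buffer memory from the traced values (Level = 1, currentRun = 3, tapeRange = 15, maxTapes = 16). ToyTapeCounts and toy_count_tapes are hypothetical names, and TOY_TAPE_BUFFER_OVERHEAD = 8192 is an assumption (one 8 kB block per tape); the trace itself does not show the value of TAPE_BUFFER_OVERHEAD.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: per-tape buffer overhead of one 8 kB block. */
#define TOY_TAPE_BUFFER_OVERHEAD 8192

typedef struct ToyTapeCounts
{
    int    num_input_tapes;   /* tapes that actually hold input runs */
    int    num_tapes;         /* input tapes plus the output tape */
    size_t refund_bytes;      /* buffer memory handed back for unused tapes */
} ToyTapeCounts;

/* Mirror the Level == 1 / else branch of mergeruns. */
ToyTapeCounts toy_count_tapes(int level, int current_run,
                              int tape_range, int max_tapes)
{
    ToyTapeCounts c;

    assert(level >= 1);
    assert(max_tapes == tape_range + 1);

    if (level == 1)
    {
        /* fewer runs than tapes: one tape per run, plus the output tape */
        c.num_input_tapes = current_run;
        c.num_tapes = current_run + 1;
        c.refund_bytes = (size_t) (max_tapes - c.num_tapes) * TOY_TAPE_BUFFER_OVERHEAD;
    }
    else
    {
        /* every tape is in use, nothing to refund */
        c.num_input_tapes = tape_range;
        c.num_tapes = max_tapes;
        c.refund_bytes = 0;
    }
    return c;
}
```

With the traced inputs this gives 3 input tapes and 4 tapes in total, matching `$7 = 3` and `$8 = 4`; under the 8192-byte assumption, the 12 unused tapes correspond to 98304 bytes refunded.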
Initialize the slab allocator, allocate the new 'memtuples' array for the heap, and rewind all output tapes to prepare for merging
(gdb) n
2635            init_slab_allocator(state, numInputTapes + 1);
(gdb) n
2643        state->memtupsize = numInputTapes;
(gdb) 
2644        state->memtuples = (SortTuple *) palloc(numInputTapes * sizeof(SortTuple));
(gdb) 
2645        USEMEM(state, GetMemoryChunkSpace(state->memtuples));
(gdb) p state->memtupsize
$9 = 3
(gdb) n
2662        if (trace_sort)
(gdb) 
2667        state->read_buffer_size = Max(state->availMem / numInputTapes, 0);
(gdb) 
2668        USEMEM(state, state->read_buffer_size * numInputTapes);
(gdb) p state->read_buffer_size
$10 = 1385762
(gdb) n
2671        for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
(gdb) 
2672            LogicalTapeRewindForRead(state->tapeset, tapenum, state->read_buffer_size);
(gdb) p state->tapeRange
$11 = 15
(gdb) p state->status
$12 = TSS_BUILDRUNS
(gdb)
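The read-buffer sizing at line 2667 is a plain integer division clamped at zero. A minimal sketch follows; toy_read_buffer_size is a hypothetical name, and since the trace does not print availMem at this point, any input value used with it is an assumption rather than a traced value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Split the remaining memory evenly among the input tapes.
 * availMem can be negative if the sort has overrun its budget,
 * in which case the result is clamped to 0, like Max(..., 0).
 */
int64_t toy_read_buffer_size(int64_t avail_mem, int num_input_tapes)
{
    int64_t per_tape;

    assert(num_input_tapes > 0);
    per_tape = avail_mem / num_input_tapes;
    return per_tape > 0 ? per_tape : 0;
}
```

For instance, an assumed availMem of 4157286 bytes split over the 3 input tapes yields 1385762 bytes per tape, the same figure as `$10` in the trace.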
Enter the loop
2671        for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
(gdb) 
2682            if (!state->randomAccess && !WORKER(state))
(gdb) 
2684                bool        allOneRun = true;
(gdb) p state->randomAccess
$15 = false
(gdb) p WORKER(state)
$16 = 0
(gdb) 
Loop over the input tapes to check whether allOneRun becomes false
2687                for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
(gdb) 
2695                if (allOneRun)
(gdb) p allOneRun
$19 = true
(gdb)
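The allOneRun test asks whether every input tape holds exactly one run, counting real and dummy runs together. A standalone sketch of the check is below; toy_all_one_run is a hypothetical name, and any run counts passed to it are illustrative, since the trace never prints the contents of tp_runs or tp_dummy.

```c
#include <assert.h>
#include <stdbool.h>

/* True when each of the tape_range input tapes has exactly one run left. */
bool toy_all_one_run(const int *tp_runs, const int *tp_dummy, int tape_range)
{
    assert(tape_range > 0);
    for (int i = 0; i < tape_range; i++)
    {
        /* a tape counts as "one run" whether that run is real or dummy */
        if (tp_runs[i] + tp_dummy[i] != 1)
            return false;
    }
    return true;
}
```

One distribution that would produce the traced result (`allOneRun = true` with currentRun = 3 and tapeRange = 15) is three tapes with one real run each and twelve tapes with one dummy run each. That layout is an inference, not something the trace shows.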
Begin the merge, set the status, and return
(gdb) n
2698                    LogicalTapeSetForgetFreeSpace(state->tapeset);
(gdb) 
2700                    beginmerge(state);
(gdb) 
2701                    state->status = TSS_FINALMERGE;
(gdb) 
2702                    return;
(gdb) 
2779    }
(gdb) 
tuplesort_performsort (state=0x2b808a8) at tuplesort.c:1866
1866                state->eof_reached = false;
(gdb)
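In the TSS_FINALMERGE state the runs are not merged onto another tape; the final merge happens on the fly as tuples are fetched. The general technique is a k-way merge driven by a small min-heap that holds one head element per run. The sketch below illustrates that technique on plain int arrays. ToyHeapItem, toy_sift_down, and toy_merge_runs are hypothetical names, and this is not the code of beginmerge or tuplesort_gettuple.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_RUNS 16

/* One heap entry: the current head value of a run, and which run it is. */
typedef struct ToyHeapItem
{
    int value;
    int run;
} ToyHeapItem;

/* Restore the min-heap property below index i. */
static void toy_sift_down(ToyHeapItem *heap, int n, int i)
{
    for (;;)
    {
        int smallest = i;
        int l = 2 * i + 1;
        int r = 2 * i + 2;

        if (l < n && heap[l].value < heap[smallest].value)
            smallest = l;
        if (r < n && heap[r].value < heap[smallest].value)
            smallest = r;
        if (smallest == i)
            return;

        ToyHeapItem tmp = heap[i];
        heap[i] = heap[smallest];
        heap[smallest] = tmp;
        i = smallest;
    }
}

/* Merge nruns sorted arrays into out; returns the number of values written. */
size_t toy_merge_runs(const int *const *runs, const size_t *lens,
                      int nruns, int *out)
{
    ToyHeapItem heap[TOY_MAX_RUNS];
    size_t pos[TOY_MAX_RUNS];   /* next unread index in each run */
    int n = 0;                  /* current heap size */
    size_t written = 0;

    assert(nruns >= 0 && nruns <= TOY_MAX_RUNS);

    /* load the first value of every non-empty run into the heap */
    for (int r = 0; r < nruns; r++)
    {
        pos[r] = 0;
        if (lens[r] > 0)
        {
            heap[n].value = runs[r][0];
            heap[n].run = r;
            n++;
            pos[r] = 1;
        }
    }
    for (int i = n / 2 - 1; i >= 0; i--)
        toy_sift_down(heap, n, i);

    /* repeatedly emit the smallest head and refill from the same run */
    while (n > 0)
    {
        int r = heap[0].run;

        out[written++] = heap[0].value;
        if (pos[r] < lens[r])
            heap[0].value = runs[r][pos[r]++];
        else
            heap[0] = heap[--n];    /* run exhausted: shrink the heap */
        toy_sift_down(heap, n, 0);
    }
    return written;
}
```

The heap never holds more than one element per run, which is why mergeruns can shrink memtuples to numInputTapes entries before the merge starts.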
Finish the sort
(gdb) n
1867                state->markpos_block = 0L;
(gdb) 
1868                state->markpos_offset = 0;
(gdb) 
1869                state->markpos_eof = false;
(gdb) 
1870                break;
(gdb) 
1878        if (trace_sort)
(gdb) 
1890        MemoryContextSwitchTo(oldcontext);
(gdb) 
1891    }
(gdb) 
ExecSort (pstate=0x2b67640) at nodeSort.c:123
123             estate->es_direction = dir;
(gdb) c
Continuing.
That concludes this look at how PostgreSQL calls the mergeruns function. Reading the source alongside a live gdb trace is the most effective way to understand it, so try reproducing the steps yourself.