What Does the mdread Function Do in PostgreSQL?
Posted: 2025-01-31  Author: 千家信息网 editor
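This article explains what the mdread function does in PostgreSQL, first through the data structures it relies on, then through its source code, and finally through a gdb session.
mdread is the read routine of PostgreSQL's magnetic disk (md) storage manager: it reads one block of a relation from disk into a buffer.
I. Data Structures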
smgrsw
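f_smgr is a struct of function pointers that defines the API between smgr.c and each individual storage manager module.
md is short for "magnetic disk".
Besides md, PostgreSQL formerly supported two other storage managers, one for the Sony WORM optical disk jukebox and one for persistent main memory,
but both have been dropped, leaving magnetic disk as the only one.
The name "magnetic disk" is itself misleading: md works with any kind of device for which the operating system provides a standard file system.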
/*
 * This struct of function pointers defines the API between smgr.c and
 * any individual storage manager module.  Note that smgr subfunctions are
 * generally expected to report problems via elog(ERROR).  An exception is
 * that smgr_unlink should use elog(WARNING), rather than erroring out,
 * because we normally unlink relations during post-commit/abort cleanup,
 * and so it's too late to raise an error.  Also, various conditions that
 * would normally be errors should be allowed during bootstrap and/or WAL
 * recovery --- see comments in md.c for details.
 */
typedef struct f_smgr
{
    void        (*smgr_init) (void);        /* may be NULL */
    void        (*smgr_shutdown) (void);    /* may be NULL */
    void        (*smgr_close) (SMgrRelation reln, ForkNumber forknum);
    void        (*smgr_create) (SMgrRelation reln, ForkNumber forknum,
                                bool isRedo);
    bool        (*smgr_exists) (SMgrRelation reln, ForkNumber forknum);
    void        (*smgr_unlink) (RelFileNodeBackend rnode, ForkNumber forknum,
                                bool isRedo);
    void        (*smgr_extend) (SMgrRelation reln, ForkNumber forknum,
                                BlockNumber blocknum, char *buffer, bool skipFsync);
    void        (*smgr_prefetch) (SMgrRelation reln, ForkNumber forknum,
                                  BlockNumber blocknum);
    void        (*smgr_read) (SMgrRelation reln, ForkNumber forknum,
                              BlockNumber blocknum, char *buffer);
    void        (*smgr_write) (SMgrRelation reln, ForkNumber forknum,
                               BlockNumber blocknum, char *buffer, bool skipFsync);
    void        (*smgr_writeback) (SMgrRelation reln, ForkNumber forknum,
                                   BlockNumber blocknum, BlockNumber nblocks);
    BlockNumber (*smgr_nblocks) (SMgrRelation reln, ForkNumber forknum);
    void        (*smgr_truncate) (SMgrRelation reln, ForkNumber forknum,
                                  BlockNumber nblocks);
    void        (*smgr_immedsync) (SMgrRelation reln, ForkNumber forknum);
    void        (*smgr_pre_ckpt) (void);    /* may be NULL */
    void        (*smgr_sync) (void);        /* may be NULL */
    void        (*smgr_post_ckpt) (void);   /* may be NULL */
} f_smgr;

/*
 * md is short for "magnetic disk".  The Sony WORM optical disk jukebox and
 * persistent main memory managers that once sat beside it have been dropped,
 * so md is the only entry left.  Despite the name, md works with any device
 * for which the operating system provides a standard file system.
 */
static const f_smgr smgrsw[] = {
    /* magnetic disk */
    {
        .smgr_init = mdinit,
        .smgr_shutdown = NULL,
        .smgr_close = mdclose,
        .smgr_create = mdcreate,
        .smgr_exists = mdexists,
        .smgr_unlink = mdunlink,
        .smgr_extend = mdextend,
        .smgr_prefetch = mdprefetch,
        .smgr_read = mdread,
        .smgr_write = mdwrite,
        .smgr_writeback = mdwriteback,
        .smgr_nblocks = mdnblocks,
        .smgr_truncate = mdtruncate,
        .smgr_immedsync = mdimmedsync,
        .smgr_pre_ckpt = mdpreckpt,
        .smgr_sync = mdsync,
        .smgr_post_ckpt = mdpostckpt
    }
};
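To make the role of the switch table concrete, here is a minimal sketch of the same dispatch pattern in plain C. Everything in it is hypothetical and heavily cut down (toy_smgr, toy_smgrsw, toy_md_read, toy_smgrread, TOY_BLCKSZ); it only illustrates how a generic entry point can reach a concrete storage manager through a table of function pointers, and it is not PostgreSQL's actual smgr.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLCKSZ 16           /* hypothetical tiny block size for the demo */

/* Hypothetical, cut-down analogue of f_smgr: one init hook, one read hook. */
typedef struct toy_smgr
{
    void        (*init) (void);     /* may be NULL */
    void        (*read) (unsigned blocknum, char *buffer);
} toy_smgr;

/* Stand-in "storage manager": fills the block with a letter derived from
 * the block number, so the caller can see which block was "read". */
void
toy_md_read(unsigned blocknum, char *buffer)
{
    memset(buffer, (int) ('A' + blocknum % 26), TOY_BLCKSZ);
}

/* The switch table: a single entry, mirroring the one-entry smgrsw[]. */
static const toy_smgr toy_smgrsw[] = {
    {.init = NULL, .read = toy_md_read},
};

/* Generic entry point: pick a manager by index, then dispatch through
 * its function pointer. */
void
toy_smgrread(int which, unsigned blocknum, char *buffer)
{
    assert(which >= 0 &&
           (size_t) which < sizeof(toy_smgrsw) / sizeof(toy_smgrsw[0]));
    toy_smgrsw[which].read(blocknum, buffer);
}
```

Calling toy_smgrread(0, 2, buf) with a TOY_BLCKSZ-byte buffer fills it with the letter 'C'. Adding another storage manager would only mean appending one more entry to the table, which is the flexibility the original design kept even after the other managers were removed.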
MdfdVec
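The magnetic disk storage manager keeps track of open file descriptors in its own descriptor pool.
It does so to make it easier to support relations larger than the operating system's file size limit (often 2 GB).
To that end, a relation is split into several "segment" files, each smaller than the OS file size limit.
The segment size is set by the RELSEG_SIZE configuration constant in pg_config.h.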
/*
 * The magnetic disk storage manager keeps track of open file
 * descriptors in its own descriptor pool.  This is done to make it
 * easier to support relations that are larger than the operating
 * system's file size limit (often 2GBytes).  In order to do that,
 * we break relations up into "segment" files that are each shorter than
 * the OS file size limit.  The segment size is set by the RELSEG_SIZE
 * configuration constant in pg_config.h.
 *
 * On disk, a relation must consist of consecutively numbered segment
 * files in the pattern
 *  -- Zero or more full segments of exactly RELSEG_SIZE blocks each
 *  -- Exactly one partial segment of size 0 <= size < RELSEG_SIZE blocks
 *  -- Optionally, any number of inactive segments of size 0 blocks.
 * The full and partial segments are collectively the "active" segments.
 * Inactive segments are those that once contained data but are currently
 * not needed because of an mdtruncate() operation.  The reason for leaving
 * them present at size zero, rather than unlinking them, is that other
 * backends and/or the checkpointer might be holding open file references to
 * such segments.  If the relation expands again after mdtruncate(), such
 * that a deactivated segment becomes active again, it is important that
 * such file references still be valid --- else data might get written
 * out to an unlinked old copy of a segment file that will eventually
 * disappear.
 *
 * File descriptors are stored in the per-fork md_seg_fds arrays inside
 * SMgrRelation.  The length of these arrays is stored in md_num_open_segs.
 * Note that a fork's md_num_open_segs having a specific value does not
 * necessarily mean the relation doesn't have additional segments; we may
 * just not have opened the next segment yet.  (We could not have "all
 * segments are in the array" as an invariant anyway, since another backend
 * could extend the relation while we aren't looking.)  We do not have
 * entries for inactive segments, however; as soon as we find a partial
 * segment, we assume that any subsequent segments are inactive.
 *
 * The entire MdfdVec array is palloc'd in the MdCxt memory context.
 */
typedef struct _MdfdVec
{
    File        mdfd_vfd;       /* fd number in fd.c's pool */
    BlockNumber mdfd_segno;     /* segment number, from 0 */
} MdfdVec;
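The segment layout described in the comment can be sketched as simple integer arithmetic. The helper names below (seg_layout, layout_for, segment_of, segment_path) are hypothetical and not part of md.c; RELSEG_SIZE is set to 131072, the value printed in the gdb session later in this article; and the naming rule (segment 0 uses the bare file name, later segments append ".N") follows PostgreSQL's documented on-disk file layout. The base name passed by a caller is an illustration only.

```c
#include <assert.h>
#include <stdio.h>

#define RELSEG_SIZE 131072u     /* blocks per segment, as printed by gdb in this article */

/* How a relation of nblocks blocks is split across active segments. */
typedef struct seg_layout
{
    unsigned    full_segments;  /* segments holding exactly RELSEG_SIZE blocks */
    unsigned    partial_blocks; /* blocks in the single partial segment (may be 0) */
} seg_layout;

seg_layout
layout_for(unsigned nblocks)
{
    seg_layout  l;

    l.full_segments = nblocks / RELSEG_SIZE;
    l.partial_blocks = nblocks % RELSEG_SIZE;
    return l;
}

/* Segment number that holds a given block of the relation. */
unsigned
segment_of(unsigned blocknum)
{
    return blocknum / RELSEG_SIZE;
}

/* Build the file name of a segment: "base" for segment 0, "base.N" otherwise.
 * Returns what snprintf returns (the length the full name needs). */
int
segment_path(char *out, size_t outlen, const char *base, unsigned segno)
{
    assert(out != NULL && base != NULL && outlen > 0);
    if (segno == 0)
        return snprintf(out, outlen, "%s", base);
    return snprintf(out, outlen, "%s.%u", base, segno);
}
```

For a relation of 300000 blocks, layout_for reports 2 full segments plus a partial segment of 37856 blocks, and block 50 (the one read in the trace below) lives in segment 0.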
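II. Source Code Walkthrough
mdread() reads the specified block from a relation.
The code is fairly simple: it locates the segment file that holds the block, computes the block's byte offset within that file, and calls FileRead to perform the actual read.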
/*
 *  mdread() -- Read the specified block from a relation.
 */
void
mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
       char *buffer)
{
    off_t       seekpos;        /* byte offset of the block within its segment file */
    int         nbytes;         /* number of bytes actually read */
    MdfdVec    *v;              /* descriptor entry of the segment holding the block */

    TRACE_POSTGRESQL_SMGR_MD_READ_START(forknum, blocknum,
                                        reln->smgr_rnode.node.spcNode,
                                        reln->smgr_rnode.node.dbNode,
                                        reln->smgr_rnode.node.relNode,
                                        reln->smgr_rnode.backend);

    /* look up the segment entry that holds this block */
    v = _mdfd_getseg(reln, forknum, blocknum, false,
                     EXTENSION_FAIL | EXTENSION_CREATE_RECOVERY);

    /* compute the block's offset within the segment file */
    seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));

    /* sanity check: the offset must fall inside one segment */
    Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);

    /* read the block into buffer; FileRead returns the number of bytes read */
    nbytes = FileRead(v->mdfd_vfd, buffer, BLCKSZ, seekpos, WAIT_EVENT_DATA_FILE_READ);

    /* trace probe */
    TRACE_POSTGRESQL_SMGR_MD_READ_DONE(forknum, blocknum,
                                       reln->smgr_rnode.node.spcNode,
                                       reln->smgr_rnode.node.dbNode,
                                       reln->smgr_rnode.node.relNode,
                                       reln->smgr_rnode.backend,
                                       nbytes,
                                       BLCKSZ);

    if (nbytes != BLCKSZ)
    {
        /* fewer bytes than one block were read: a read error or a short read */
        if (nbytes < 0)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not read block %u in file \"%s\": %m",
                            blocknum, FilePathName(v->mdfd_vfd))));

        /*
         * Short read: we are at or past EOF, or we read a partial block at
         * EOF.  Normally this is an error; upper levels should never try to
         * read a nonexistent block.  However, if zero_damaged_pages is ON or
         * we are InRecovery, we should instead return zeroes without
         * complaining.  This allows, for example, the case of trying to
         * update a block that was later truncated away.
         */
        if (zero_damaged_pages || InRecovery)
            MemSet(buffer, 0, BLCKSZ);
        else
            ereport(ERROR,
                    (errcode(ERRCODE_DATA_CORRUPTED),
                     errmsg("could not read block %u in file \"%s\": read only %d of %d bytes",
                            blocknum, FilePathName(v->mdfd_vfd),
                            nbytes, BLCKSZ)));
    }
}
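The three outcomes of mdread (full block, I/O error, short read) can be reproduced outside PostgreSQL with standard C file I/O. The sketch below is an illustration under assumptions, not md.c's implementation: it uses fseek and fread on a FILE * instead of PostgreSQL's virtual file descriptors, ignores segmentation, and returns a status code where mdread would call ereport. The names read_block, block_status and DEMO_BLCKSZ are hypothetical, and zero_on_short_read stands in for the condition zero_damaged_pages || InRecovery.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_BLCKSZ 8192        /* same block size as shown by gdb in this article */

typedef enum
{
    BLOCK_OK,                   /* a full block was read */
    BLOCK_ZEROED,               /* short read; buffer was zero-filled instead */
    BLOCK_SHORT,                /* short read; mdread would report data corruption */
    BLOCK_IO_ERROR              /* seek or read failed; mdread would report the OS error */
} block_status;

block_status
read_block(FILE *f, unsigned blocknum, char *buffer, int zero_on_short_read)
{
    long        seekpos = (long) DEMO_BLCKSZ * (long) blocknum;
    size_t      nbytes;

    assert(f != NULL && buffer != NULL);

    /* position at the start of the requested block */
    if (fseek(f, seekpos, SEEK_SET) != 0)
        return BLOCK_IO_ERROR;

    /* try to read exactly one block */
    nbytes = fread(buffer, 1, DEMO_BLCKSZ, f);
    if (nbytes == DEMO_BLCKSZ)
        return BLOCK_OK;
    if (ferror(f))
        return BLOCK_IO_ERROR;

    /* short read: at or past EOF, or only a partial block exists at EOF */
    if (zero_on_short_read)
    {
        memset(buffer, 0, DEMO_BLCKSZ);
        return BLOCK_ZEROED;
    }
    return BLOCK_SHORT;
}
```

With a file holding one full block followed by 100 stray bytes, read_block returns BLOCK_OK for block 0, BLOCK_SHORT for block 1 when zero_on_short_read is 0, and BLOCK_ZEROED (with an all-zero buffer) when it is 1. Returning a status rather than aborting is a choice made for this sketch only; in the server the error paths do not return to the caller.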
III. Tracing with gdb
Test script
11:15:11 (xdb@[local]:5432)testdb=# insert into t1(id) select generate_series(100,500);
Start gdb, attach to the backend process, and trace.
Inspect the call stack
(gdb) b mdread
Breakpoint 3 at 0x8b669b: file md.c, line 738.
(gdb) c
Continuing.

Breakpoint 3, mdread (reln=0x2d09be0, forknum=MAIN_FORKNUM, blocknum=50, buffer=0x7f3823369c00 "") at md.c:738
738         TRACE_POSTGRESQL_SMGR_MD_READ_START(forknum, blocknum,
(gdb) bt
#0  mdread (reln=0x2d09be0, forknum=MAIN_FORKNUM, blocknum=50, buffer=0x7f3823369c00 "") at md.c:738
#1  0x00000000008b92d5 in smgrread (reln=0x2d09be0, forknum=MAIN_FORKNUM, blocknum=50, buffer=0x7f3823369c00 "") at smgr.c:628
#2  0x00000000008793f9 in ReadBuffer_common (smgr=0x2d09be0, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=50, mode=RBM_NORMAL, strategy=0x0, hit=0x7ffd5fb2948b) at bufmgr.c:890
#3  0x0000000000878cd4 in ReadBufferExtended (reln=0x7f3836e1e788, forkNum=MAIN_FORKNUM, blockNum=50, mode=RBM_NORMAL, strategy=0x0) at bufmgr.c:664
#4  0x0000000000878bb1 in ReadBuffer (reln=0x7f3836e1e788, blockNum=50) at bufmgr.c:596
#5  0x00000000004eeb96 in ReadBufferBI (relation=0x7f3836e1e788, targetBlock=50, bistate=0x0) at hio.c:87
#6  0x00000000004ef387 in RelationGetBufferForTuple (relation=0x7f3836e1e788, len=32, otherBuffer=0, options=0, bistate=0x0, vmbuffer=0x7ffd5fb295ec, vmbuffer_other=0x0) at hio.c:415
#7  0x00000000004df1f8 in heap_insert (relation=0x7f3836e1e788, tup=0x2ca6770, cid=0, options=0, bistate=0x0) at heapam.c:2468
#8  0x0000000000709dda in ExecInsert (mtstate=0x2ca4c40, slot=0x2ca3418, planSlot=0x2ca3418, estate=0x2ca48d8, canSetTag=true) at nodeModifyTable.c:529
#9  0x000000000070c475 in ExecModifyTable (pstate=0x2ca4c40) at nodeModifyTable.c:2159
#10 0x00000000006e05cb in ExecProcNodeFirst (node=0x2ca4c40) at execProcnode.c:445
#11 0x00000000006d552e in ExecProcNode (node=0x2ca4c40) at ../../../src/include/executor/executor.h:247
#12 0x00000000006d7d66 in ExecutePlan (estate=0x2ca48d8, planstate=0x2ca4c40, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x2d41a30, execute_once=true) at execMain.c:1723
#13 0x00000000006d5af8 in standard_ExecutorRun (queryDesc=0x2ca24b8, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:364
#14 0x00000000006d5920 in ExecutorRun (queryDesc=0x2ca24b8, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:307
#15 0x00000000008c1092 in ProcessQuery (plan=0x2d418b8, sourceText=0x2c7eec8 "insert into t1(id) select generate_series(100,500);", params=0x0, queryEnv=0x0, dest=0x2d41a30,
---Type <return> to continue, or q to quit---
    completionTag=0x7ffd5fb29b80 "") at pquery.c:161
#16 0x00000000008c29a1 in PortalRunMulti (portal=0x2ce4488, isTopLevel=true, setHoldSnapshot=false, dest=0x2d41a30, altdest=0x2d41a30, completionTag=0x7ffd5fb29b80 "") at pquery.c:1286
#17 0x00000000008c1f7a in PortalRun (portal=0x2ce4488, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x2d41a30, altdest=0x2d41a30, completionTag=0x7ffd5fb29b80 "") at pquery.c:799
#18 0x00000000008bbf16 in exec_simple_query (query_string=0x2c7eec8 "insert into t1(id) select generate_series(100,500);") at postgres.c:1145
#19 0x00000000008c01a1 in PostgresMain (argc=1, argv=0x2ca8af8, dbname=0x2ca8960 "testdb", username=0x2c7bba8 "xdb") at postgres.c:4182
#20 0x000000000081e07c in BackendRun (port=0x2ca0940) at postmaster.c:4361
#21 0x000000000081d7ef in BackendStartup (port=0x2ca0940) at postmaster.c:4033
#22 0x0000000000819be9 in ServerLoop () at postmaster.c:1706
#23 0x000000000081949f in PostmasterMain (argc=1, argv=0x2c79b60) at postmaster.c:1379
#24 0x0000000000742941 in main (argc=1, argv=0x2c79b60) at main.c:228
(gdb)
Compute the read offset
(gdb) n
744         v = _mdfd_getseg(reln, forknum, blocknum, false,
(gdb)
747         seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
(gdb) p *v
$1 = {mdfd_vfd = 26, mdfd_segno = 0}
(gdb) p BLCKSZ
$2 = 8192
(gdb) p blocknum
$3 = 50
(gdb) p RELSEG_SIZE
$4 = 131072
(gdb) n
749         Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
(gdb) p seekpos
$5 = 409600
(gdb)
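The value printed for seekpos can be checked by hand: 8192 * (50 % 131072) = 8192 * 50 = 409600. A minimal standalone sketch of the same calculation follows; block_seekpos is a hypothetical helper name, and the two constants are simply the values gdb printed above, not values read from a real pg_config.h.

```c
#include <assert.h>

#define BLCKSZ      8192u       /* block size in bytes, as printed by gdb above */
#define RELSEG_SIZE 131072u     /* blocks per segment, as printed by gdb above */

/* Byte offset of a block inside the segment file that contains it. */
unsigned long long
block_seekpos(unsigned blocknum)
{
    unsigned long long seekpos =
        (unsigned long long) BLCKSZ * (blocknum % RELSEG_SIZE);

    /* same sanity check as the Assert in mdread: stay inside one segment */
    assert(seekpos < (unsigned long long) BLCKSZ * RELSEG_SIZE);
    return seekpos;
}
```

block_seekpos(50) yields 409600, matching the trace. Because of the modulo, block 131072 (the first block of segment 1) maps back to offset 0 of its own segment file, and block 131073 maps to offset 8192.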
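Perform the read. Note that the traced build first positions the file with FileSeek and then calls FileRead without an offset argument, whereas the mdread listing above passes seekpos directly to FileRead, so the binary and the listing appear to come from different PostgreSQL versions.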
(gdb) n
751         if (FileSeek(v->mdfd_vfd, seekpos, SEEK_SET) != seekpos)
(gdb)
757         nbytes = FileRead(v->mdfd_vfd, buffer, BLCKSZ, WAIT_EVENT_DATA_FILE_READ);
(gdb)
759         TRACE_POSTGRESQL_SMGR_MD_READ_DONE(forknum, blocknum,
(gdb) p nbytes
$6 = 8192
(gdb) p *buffer
$7 = 1 '\001'
(gdb) n
767         if (nbytes != BLCKSZ)
(gdb)
792     }
(gdb)
smgrread (reln=0x2d09be0, forknum=MAIN_FORKNUM, blocknum=50, buffer=0x7f3823369c00 "\001") at smgr.c:629
629     }
(gdb)