Analyzing Large Table Joins in PostgreSQL
Published: 2025-01-22 | Author: 千家信息网 editor
This article looks at how to analyze large table joins in PostgreSQL. Many people run into questions about this in day-to-day work, so this post walks through a simple, practical approach. Follow along and give it a try!
Database Configuration
Host: 4 CPU cores, 4 GB RAM; PostgreSQL shared_buffers is 128MB and work_mem is 4MB.
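To double-check these values on your own instance, you can query them from psql. A minimal sketch (show and the pg_settings view are standard PostgreSQL; the expected values in the comments assume the test box above):

-- Inspect the memory-related settings used in this test
show shared_buffers;           -- 128MB on this host
show work_mem;                 -- 4MB on this host
show effective_cache_size;

-- The same information via the pg_settings catalog view
select name, setting, unit
from pg_settings
where name in ('shared_buffers', 'work_mem', 'effective_cache_size');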
Test Data
Create four tables with 10 million rows each, roughly 1 GB of data per table, about 8x the size of PG's shared buffers.
drop table t_big_1;
drop table t_big_2;
drop table t_big_3;
drop table t_big_4;

create table t_big_1(id int, c1 varchar(30), c2 varchar(30), c3 varchar(30));
create table t_big_2(id int, c1 varchar(30), c2 varchar(30), c3 varchar(30));
create table t_big_3(id int, c1 varchar(30), c2 varchar(30), c3 varchar(30));
create table t_big_4(id int, c1 varchar(30), c2 varchar(30), c3 varchar(30));

insert into t_big_1 select x, rpad('c1'||x,30,'c1'), rpad('c2'||x,30,'c2'), rpad('c3'||x,30,'c3') from generate_series(1,10000000) as x;
insert into t_big_2 select x, rpad('c1'||x,30,'c1'), rpad('c2'||x,30,'c2'), rpad('c3'||x,30,'c3') from generate_series(1,10000000) as x;
insert into t_big_3 select x, rpad('c1'||x,30,'c1'), rpad('c2'||x,30,'c2'), rpad('c3'||x,30,'c3') from generate_series(1,10000000) as x;
insert into t_big_4 select x, rpad('c1'||x,30,'c1'), rpad('c2'||x,30,'c2'), rpad('c3'||x,30,'c3') from generate_series(1,10000000) as x;

show shared_buffers;
show effective_cache_size;
show work_mem;

select pg_size_pretty(pg_table_size('t_big_1'));
select pg_size_pretty(pg_table_size('t_big_2'));
select pg_size_pretty(pg_table_size('t_big_3'));
select pg_size_pretty(pg_table_size('t_big_4'));

analyze t_big_1, t_big_2, t_big_3, t_big_4;

explain verbose
select a.*
from t_big_1 a join t_big_2 b on a.c1 = b.c1;

explain verbose
select a.id, b.c1, c.c2, d.c3
from t_big_1 a, t_big_2 b, t_big_3 c, t_big_4 d
where a.id = b.id and b.id = c.id and c.id = d.id;

explain verbose
select a.id, b.c1, c.c2, d.c3
from t_big_1 a, t_big_2 b, t_big_3 c, t_big_4 d
where a.id = b.id and b.c1 = c.c1 and c.c2 = d.c2;
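To confirm that each table really is about 8x shared_buffers, a quick sketch (pg_table_size and pg_class are standard PostgreSQL; the 128MB divisor is an assumption matching the configuration above):

-- Size of each test table, expressed as a multiple of the 128MB shared_buffers
select relname,
       pg_size_pretty(pg_table_size(oid)) as table_size,
       round(pg_table_size(oid) / (128 * 1024.0 * 1024.0), 1) as x_shared_buffers
from pg_class
where relname in ('t_big_1', 't_big_2', 't_big_3', 't_big_4');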
Large Table Joins
Before analyzing the tables
[local:/data/run/pg12]:5120 pg12@testdb=# explain verbose
pg12@testdb-# select a.id,b.c1,c.c2,d.c3 
pg12@testdb-# from t_big_1 a,t_big_2 b,t_big_3 c,t_big_4 d
pg12@testdb-# where a.id = b.id and b.c1 = c.c1 and c.c2 = d.c2;
                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=164722831406.26..1096915306139605248.00 rows=73127676034285903872 width=238)
   Output: a.id, b.c1, c.c2, d.c3
   Merge Cond: ((b.c1)::text = (c.c1)::text)
   ->  Sort  (cost=58799667920.13..59102008117.66 rows=120936079012 width=82)
         Output: a.id, b.c1
         Sort Key: b.c1
         ->  Merge Join  (cost=2124653.55..1816202724.10 rows=120936079012 width=82)
               Output: a.id, b.c1
               Merge Cond: (a.id = b.id)
               ->  Sort  (cost=894232.27..906527.40 rows=4918050 width=4)
                     Output: a.id
                     Sort Key: a.id
                     ->  Seq Scan on public.t_big_1 a  (cost=0.00..213115.50 rows=4918050 width=4)
                           Output: a.id
               ->  Materialize  (cost=1230421.27..1255011.52 rows=4918050 width=82)
                     Output: b.c1, b.id
                     ->  Sort  (cost=1230421.27..1242716.40 rows=4918050 width=82)
                           Output: b.c1, b.id
                           Sort Key: b.id
                           ->  Seq Scan on public.t_big_2 b  (cost=0.00..213115.50 rows=4918050 width=82)
                                 Output: b.c1, b.id
   ->  Materialize  (cost=105923163486.13..106527843881.19 rows=120936079012 width=234)
         Output: c.c2, c.c1, d.c3
         ->  Sort  (cost=105923163486.13..106225503683.66 rows=120936079012 width=234)
               Output: c.c2, c.c1, d.c3
               Sort Key: c.c1
               ->  Merge Join  (cost=3066006.55..1817144077.10 rows=120936079012 width=234)
                     Output: c.c2, c.c1, d.c3
                     Merge Cond: ((c.c2)::text = (d.c2)::text)
                     ->  Sort  (cost=1533003.27..1545298.40 rows=4918050 width=156)
                           Output: c.c2, c.c1
                           Sort Key: c.c2
                           ->  Seq Scan on public.t_big_3 c  (cost=0.00..213115.50 rows=4918050 width=156)
                                 Output: c.c2, c.c1
                     ->  Materialize  (cost=1533003.27..1557593.52 rows=4918050 width=156)
                           Output: d.c3, d.c2
                           ->  Sort  (cost=1533003.27..1545298.40 rows=4918050 width=156)
                                 Output: d.c3, d.c2
                                 Sort Key: d.c2
                                 ->  Seq Scan on public.t_big_4 d  (cost=0.00..213115.50 rows=4918050 width=156)
                                       Output: d.c3, d.c2
(41 rows)
As you can see, before ANALYZE the planner chooses a Merge Join and the plan's estimated cost is astronomically large: with no fresh statistics for the newly loaded tables, the join is estimated to produce more than 7e19 rows.
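The inflated estimates come from missing statistics: right after a bulk load, pg_class.reltuples and the per-column statistics in pg_stats have not been refreshed, so the planner guesses join selectivity badly. A hedged way to see what the planner is working from (all catalog names below are standard PostgreSQL):

-- Row and page counts the planner uses; stale before the first ANALYZE
select relname, reltuples, relpages
from pg_class
where relname like 't_big_%';

-- Per-column statistics gathered by ANALYZE; empty before the first run
select tablename, attname, n_distinct, null_frac
from pg_stats
where tablename = 't_big_2' and attname = 'c1';

-- Refresh statistics so the planner can estimate join selectivity properly
analyze t_big_1, t_big_2, t_big_3, t_big_4;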
After running ANALYZE
[local:/data/run/pg12]:5120 pg12@testdb=# explain (analyze,buffers,verbose)
select a.id,b.c1,c.c2,d.c3 
from t_big_1 a,t_big_2 b,t_big_3 c,t_big_4 d
where a.id = b.id and b.c1 = c.c1 and c.c2 = d.c2;
                                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=896126.19..2564935.91 rows=9999844 width=97) (actual time=393803.655..404902.025 rows=10000000 loops=1)
   Output: a.id, b.c1, c.c2, d.c3
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=266 read=655676 dirtied=486717 written=486414, temp read=473954 written=486068
   ->  Parallel Hash Join  (cost=895126.19..1563951.51 rows=4166602 width=97) (actual time=393672.896..398825.027 rows=3333333 loops=3)
         Output: a.id, b.c1, c.c2, d.c3
         Hash Cond: ((c.c2)::text = (d.c2)::text)
         Buffers: shared hit=266 read=655676 dirtied=486717 written=486414, temp read=473954 written=486068
         Worker 0: actual time=393629.565..399028.498 rows=3549817 loops=1
           Buffers: shared hit=118 read=218079 dirtied=161599 written=161495, temp read=162307 written=161880
         Worker 1: actual time=393585.994..399049.295 rows=3609509 loops=1
           Buffers: shared hit=119 read=217313 dirtied=161014 written=160913, temp read=163324 written=160736
         ->  Parallel Hash Join  (cost=592683.65..1070481.02 rows=4166681 width=66) (actual time=328335.871..378143.916 rows=3333333 loops=3)
               Output: a.id, b.c1, c.c2
               Hash Cond: ((b.c1)::text = (c.c1)::text)
               Buffers: shared hit=63 read=491773 dirtied=352782 written=352575, temp read=267125 written=274312
               Worker 0: actual time=328475.430..378240.528 rows=3325497 loops=1
                 Buffers: shared hit=25 read=164024 dirtied=117445 written=117373, temp read=88941 written=91448
               Worker 1: actual time=328084.038..377943.176 rows=3311112 loops=1
                 Buffers: shared hit=29 read=163900 dirtied=117550 written=117481, temp read=88747 written=91320
               ->  Parallel Hash Join  (cost=290238.33..609558.42 rows=4166681 width=35) (actual time=158380.042..198763.345 rows=3333333 loops=3)
                     Output: a.id, b.c1
                     Hash Cond: (a.id = b.id)
                     Buffers: shared hit=63 read=327838 dirtied=218847 written=218710, temp read=98317 written=100856
                     Worker 0: actual time=158518.764..199077.411 rows=3331104 loops=1
                       Buffers: shared hit=25 read=109394 dirtied=72893 written=72845, temp read=32790 written=33668
                     Worker 1: actual time=158520.409..198920.394 rows=3332824 loops=1
                       Buffers: shared hit=29 read=109323 dirtied=73002 written=72956, temp read=32934 written=33560
                     ->  Parallel Seq Scan on public.t_big_1 a  (cost=0.00..205601.81 rows=4166681 width=4) (actual time=239.830..75704.152 rows=3333333 loops=3)
                           Output: a.id
                           Buffers: shared read=163935 dirtied=109449 written=109391
                           Worker 0: actual time=239.584..75677.703 rows=3327794 loops=1
                             Buffers: shared read=54554 dirtied=36489 written=36468
                           Worker 1: actual time=240.355..75258.837 rows=3347802 loops=1
                             Buffers: shared read=54882 dirtied=36486 written=36467
                     ->  Parallel Hash  (cost=205601.81..205601.81 rows=4166681 width=35) (actual time=65812.428..65812.431 rows=3333333 loops=3)
                           Output: b.c1, b.id
                           Buckets: 65536  Batches: 256  Memory Usage: 3328kB
                           Buffers: shared hit=32 read=163903 dirtied=109398 written=109319, temp written=70136
                           Worker 0: actual time=65812.900..65812.904 rows=3345876 loops=1
                             Buffers: shared hit=11 read=54840 dirtied=36404 written=36377, temp written=23428
                           Worker 1: actual time=65812.873..65812.875 rows=3321816 loops=1
                             Buffers: shared hit=15 read=54441 dirtied=36516 written=36489, temp written=23320
                           ->  Parallel Seq Scan on public.t_big_2 b  (cost=0.00..205601.81 rows=4166681 width=35) (actual time=1.490..47839.237 rows=3333333 loops=3)
                                 Output: b.c1, b.id
                                 Buffers: shared hit=32 read=163903 dirtied=109398 written=109319
                                 Worker 0: actual time=1.464..47814.446 rows=3345876 loops=1
                                   Buffers: shared hit=11 read=54840 dirtied=36404 written=36377
                                 Worker 1: actual time=1.470..47104.413 rows=3321816 loops=1
                                   Buffers: shared hit=15 read=54441 dirtied=36516 written=36489
               ->  Parallel Hash  (cost=205601.81..205601.81 rows=4166681 width=62) (actual time=113720.080..113720.080 rows=3333333 loops=3)
                     Output: c.c2, c.c1
                     Buckets: 65536  Batches: 512  Memory Usage: 2432kB
                     Buffers: shared read=163935 dirtied=133935 written=133865, temp written=103856
                     Worker 0: actual time=113719.124..113719.124 rows=3332395 loops=1
                       Buffers: shared read=54630 dirtied=44552 written=44528, temp written=34648
                     Worker 1: actual time=113720.557..113720.558 rows=3329197 loops=1
                       Buffers: shared read=54577 dirtied=44548 written=44525, temp written=34576
                     ->  Parallel Seq Scan on public.t_big_3 c  (cost=0.00..205601.81 rows=4166681 width=62) (actual time=0.126..80608.068 rows=3333333 loops=3)
                           Output: c.c2, c.c1
                           Buffers: shared read=163935 dirtied=133935 written=133865
                           Worker 0: actual time=0.260..80737.065 rows=3332395 loops=1
                             Buffers: shared read=54630 dirtied=44552 written=44528
                           Worker 1: actual time=0.049..80943.448 rows=3329197 loops=1
                             Buffers: shared read=54577 dirtied=44548 written=44525
         ->  Parallel Hash  (cost=205601.02..205601.02 rows=4166602 width=62) (actual time=10279.722..10279.722 rows=3333333 loops=3)
               Output: d.c3, d.c2
               Buckets: 65536  Batches: 512  Memory Usage: 2400kB
               Buffers: shared hit=32 read=163903 dirtied=133935 written=133839, temp written=103004
               Worker 0: actual time=10222.812..10222.812 rows=3297904 loops=1
                 Buffers: shared hit=9 read=54055 dirtied=44154 written=44122, temp written=34236
               Worker 1: actual time=10222.839..10222.839 rows=3258559 loops=1
                 Buffers: shared hit=6 read=53413 dirtied=43464 written=43432, temp written=33504
               ->  Parallel Seq Scan on public.t_big_4 d  (cost=0.00..205601.02 rows=4166602 width=62) (actual time=0.163..7282.409 rows=3333333 loops=3)
                     Output: d.c3, d.c2
                     Buffers: shared hit=32 read=163903 dirtied=133935 written=133839
                     Worker 0: actual time=0.108..7244.071 rows=3297904 loops=1
                       Buffers: shared hit=9 read=54055 dirtied=44154 written=44122
                     Worker 1: actual time=0.034..7223.191 rows=3258559 loops=1
                       Buffers: shared hit=6 read=53413 dirtied=43464 written=43432
 Planning Time: 1.134 ms
 Execution Time: 405878.841 ms
(83 rows)
[local:/data/run/pg12]:5120 pg12@testdb=#
As you can see, after ANALYZE the cost estimates return to normal values and the planner switches to a Parallel Hash Join. Because work_mem (4MB) is far too small to hold each hash table in memory, PG splits the data into multiple batches (Batches: 256 and 512 in the plan above), spills the hash partitions to temporary files on disk, and executes the join one batch at a time.
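If the batching itself is the bottleneck, one common experiment is to raise work_mem for the session and re-run the query; with enough memory the hash join may need only a single batch. A sketch (the 256MB value is illustrative only, size it to your actual RAM):

-- Give this session more hash/sort memory, then re-check Batches in the plan
set work_mem = '256MB';

explain (analyze, buffers, verbose)
select a.id, b.c1, c.c2, d.c3
from t_big_1 a, t_big_2 b, t_big_3 c, t_big_4 d
where a.id = b.id and b.c1 = c.c1 and c.c2 = d.c2;

reset work_mem;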
That wraps up this look at analyzing large table joins in PostgreSQL. Hopefully it has cleared up some questions. Theory works best when paired with hands-on practice, so go try it yourself! For more practical articles like this one, keep following the site.