分析PostgreSQL中的大表连接
发表于:2024-11-15 作者:千家信息网编辑
千家信息网最后更新 2024年11月15日,这篇文章主要介绍"分析PostgreSQL中的大表连接",在日常操作中,相信很多人在分析PostgreSQL中的大表连接问题上存在疑惑,小编查阅了各式资料,整理出简单好用的操作方法,希望对大家解答"分
千家信息网最后更新 2024年11月15日分析PostgreSQL中的大表连接
这篇文章主要介绍"分析PostgreSQL中的大表连接",在日常操作中,相信很多人在分析PostgreSQL中的大表连接问题上存在疑惑,小编查阅了各式资料,整理出简单好用的操作方法,希望对大家解答"分析PostgreSQL中的大表连接"的疑惑有所帮助!接下来,请跟着小编一起来学习吧!
数据库配置
主机CPU 4核,内存4G,PG共享缓存128MB,work_mem 4MB。
测试数据
创建4张表,每张表1000w行,数据量约1G,是PG共享内存的8倍。
drop table t_big_1;drop table t_big_2;drop table t_big_3;drop table t_big_4;create table t_big_1(id int,c1 varchar(30),c2 varchar(30),c3 varchar(30));create table t_big_2(id int,c1 varchar(30),c2 varchar(30),c3 varchar(30));create table t_big_3(id int,c1 varchar(30),c2 varchar(30),c3 varchar(30));create table t_big_4(id int,c1 varchar(30),c2 varchar(30),c3 varchar(30));insert into t_big_1 select x,rpad('c1'||x,30,'c1'),rpad('c2'||x,30,'c2'),rpad('c3'||x,30,'c3') from generate_series(1,10000000) as x;insert into t_big_2 select x,rpad('c1'||x,30,'c1'),rpad('c2'||x,30,'c2'),rpad('c3'||x,30,'c3') from generate_series(1,10000000) as x;insert into t_big_3 select x,rpad('c1'||x,30,'c1'),rpad('c2'||x,30,'c2'),rpad('c3'||x,30,'c3') from generate_series(1,10000000) as x;insert into t_big_4 select x,rpad('c1'||x,30,'c1'),rpad('c2'||x,30,'c2'),rpad('c3'||x,30,'c3') from generate_series(1,10000000) as x;show shared_buffers;show effective_cache_size;show work_mem;select pg_size_pretty(pg_table_size('t_big_1'));select pg_size_pretty(pg_table_size('t_big_2'));select pg_size_pretty(pg_table_size('t_big_3'));select pg_size_pretty(pg_table_size('t_big_4'));analyze t_big_1,t_big_2,t_big_3,t_big_4;explain verboseselect a.*from t_big_1 a join t_big_2 b on a.c1 = b.c1;explain verboseselect a.id,b.c1,c.c2,d.c3 from t_big_1 a,t_big_2 b,t_big_3 c,t_big_4 dwhere a.id = b.id and b.id = c.id and c.id = d.id;explain verboseselect a.id,b.c1,c.c2,d.c3 from t_big_1 a,t_big_2 b,t_big_3 c,t_big_4 dwhere a.id = b.id and b.c1 = c.c1 and c.c2 = d.c2;
大表连接
未分析数据表前
[local:/data/run/pg12]:5120 pg12@testdb=# explain verbosepg12@testdb-# select a.id,b.c1,c.c2,d.c3 pg12@testdb-# from t_big_1 a,t_big_2 b,t_big_3 c,t_big_4 dpg12@testdb-# where a.id = b.id and b.c1 = c.c1 and c.c2 = d.c2; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Merge Join (cost=164722831406.26..1096915306139605248.00 rows=73127676034285903872 width=238) Output: a.id, b.c1, c.c2, d.c3 Merge Cond: ((b.c1)::text = (c.c1)::text) -> Sort (cost=58799667920.13..59102008117.66 rows=120936079012 width=82) Output: a.id, b.c1 Sort Key: b.c1 -> Merge Join (cost=2124653.55..1816202724.10 rows=120936079012 width=82) Output: a.id, b.c1 Merge Cond: (a.id = b.id) -> Sort (cost=894232.27..906527.40 rows=4918050 width=4) Output: a.id Sort Key: a.id -> Seq Scan on public.t_big_1 a (cost=0.00..213115.50 rows=4918050 width=4) Output: a.id -> Materialize (cost=1230421.27..1255011.52 rows=4918050 width=82) Output: b.c1, b.id -> Sort (cost=1230421.27..1242716.40 rows=4918050 width=82) Output: b.c1, b.id Sort Key: b.id -> Seq Scan on public.t_big_2 b (cost=0.00..213115.50 rows=4918050 width=82) Output: b.c1, b.id -> Materialize (cost=105923163486.13..106527843881.19 rows=120936079012 width=234) Output: c.c2, c.c1, d.c3 -> Sort (cost=105923163486.13..106225503683.66 rows=120936079012 width=234) Output: c.c2, c.c1, d.c3 Sort Key: c.c1 -> Merge Join (cost=3066006.55..1817144077.10 rows=120936079012 width=234) Output: c.c2, c.c1, d.c3 Merge Cond: ((c.c2)::text = (d.c2)::text) -> Sort (cost=1533003.27..1545298.40 rows=4918050 width=156) Output: c.c2, c.c1 Sort Key: c.c2 -> Seq Scan on public.t_big_3 c (cost=0.00..213115.50 rows=4918050 width=156) Output: c.c2, c.c1 -> Materialize (cost=1533003.27..1557593.52 rows=4918050 width=156) Output: d.c3, d.c2 -> Sort (cost=1533003.27..1545298.40 rows=4918050 width=156) Output: d.c3, d.c2 Sort Key: d.c2 -> Seq Scan on public.t_big_4 d (cost=0.00..213115.50 rows=4918050 width=156) Output: d.c3, d.c2(41 rows)
可以看到,未分析前,执行计划使用merge join,计划的cost是一个大数。
执行分析后
[local:/data/run/pg12]:5120 pg12@testdb=# explain (analyze,buffers,verbose)select a.id,b.c1,c.c2,d.c3 from t_big_1 a,t_big_2 b,t_big_3 c,t_big_4 dwhere a.id = b.id and b.c1 = c.c1 and c.c2 = d.c2; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=896126.19..2564935.91 rows=9999844 width=97) (actual time=393803.655..404902.025 rows=10000000 loops=1) Output: a.id, b.c1, c.c2, d.c3 Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=266 read=655676 dirtied=486717 written=486414, temp read=473954 written=486068 -> Parallel Hash Join (cost=895126.19..1563951.51 rows=4166602 width=97) (actual time=393672.896..398825.027 rows=3333333 loops=3) Output: a.id, b.c1, c.c2, d.c3 Hash Cond: ((c.c2)::text = (d.c2)::text) Buffers: shared hit=266 read=655676 dirtied=486717 written=486414, temp read=473954 written=486068 Worker 0: actual time=393629.565..399028.498 rows=3549817 loops=1 Buffers: shared hit=118 read=218079 dirtied=161599 written=161495, temp read=162307 written=161880 Worker 1: actual time=393585.994..399049.295 rows=3609509 loops=1 Buffers: shared hit=119 read=217313 dirtied=161014 written=160913, temp read=163324 written=160736 -> Parallel Hash Join (cost=592683.65..1070481.02 rows=4166681 width=66) (actual time=328335.871..378143.916 rows=3333333 loops=3) Output: a.id, b.c1, c.c2 Hash Cond: ((b.c1)::text = (c.c1)::text) Buffers: shared hit=63 read=491773 dirtied=352782 written=352575, temp read=267125 written=274312 Worker 0: actual time=328475.430..378240.528 rows=3325497 loops=1 Buffers: shared hit=25 read=164024 dirtied=117445 written=117373, temp read=88941 written=91448 Worker 1: actual time=328084.038..377943.176 rows=3311112 loops=1 Buffers: shared hit=29 read=163900 dirtied=117550 written=117481, temp read=88747 written=91320 -> Parallel Hash Join (cost=290238.33..609558.42 rows=4166681 width=35) (actual time=158380.042..198763.345 rows=3333333 loops=3) Output: a.id, b.c1 Hash Cond: (a.id = b.id) Buffers: shared hit=63 read=327838 dirtied=218847 written=218710, temp read=98317 written=100856 Worker 0: actual time=158518.764..199077.411 rows=3331104 loops=1 Buffers: shared hit=25 read=109394 dirtied=72893 written=72845, temp read=32790 written=33668 Worker 1: actual time=158520.409..198920.394 rows=3332824 loops=1 Buffers: shared hit=29 read=109323 dirtied=73002 written=72956, temp read=32934 written=33560 -> Parallel Seq Scan on public.t_big_1 a (cost=0.00..205601.81 rows=4166681 width=4) (actual time=239.830..75704.152 rows=3333333 loops=3) Output: a.id Buffers: shared read=163935 dirtied=109449 written=109391 Worker 0: actual time=239.584..75677.703 rows=3327794 loops=1 Buffers: shared read=54554 dirtied=36489 written=36468 Worker 1: actual time=240.355..75258.837 rows=3347802 loops=1 Buffers: shared read=54882 dirtied=36486 written=36467 -> Parallel Hash (cost=205601.81..205601.81 rows=4166681 width=35) (actual time=65812.428..65812.431 rows=3333333 loops=3) Output: b.c1, b.id Buckets: 65536 Batches: 256 Memory Usage: 3328kB Buffers: shared hit=32 read=163903 dirtied=109398 written=109319, temp written=70136 Worker 0: actual time=65812.900..65812.904 rows=3345876 loops=1 Buffers: shared hit=11 read=54840 dirtied=36404 written=36377, temp written=23428 Worker 1: actual time=65812.873..65812.875 rows=3321816 loops=1 Buffers: shared hit=15 read=54441 dirtied=36516 written=36489, temp written=23320 -> Parallel Seq Scan on public.t_big_2 b (cost=0.00..205601.81 rows=4166681 width=35) (actual time=1.490..47839.237 rows=3333333 loops=3) Output: b.c1, b.id Buffers: shared hit=32 read=163903 dirtied=109398 written=109319 Worker 0: actual time=1.464..47814.446 rows=3345876 loops=1 Buffers: shared hit=11 read=54840 dirtied=36404 written=36377 Worker 1: actual time=1.470..47104.413 rows=3321816 loops=1 Buffers: shared hit=15 read=54441 dirtied=36516 written=36489 -> Parallel Hash (cost=205601.81..205601.81 rows=4166681 width=62) (actual time=113720.080..113720.080 rows=3333333 loops=3) Output: c.c2, c.c1 Buckets: 65536 Batches: 512 Memory Usage: 2432kB Buffers: shared read=163935 dirtied=133935 written=133865, temp written=103856 Worker 0: actual time=113719.124..113719.124 rows=3332395 loops=1 Buffers: shared read=54630 dirtied=44552 written=44528, temp written=34648 Worker 1: actual time=113720.557..113720.558 rows=3329197 loops=1 Buffers: shared read=54577 dirtied=44548 written=44525, temp written=34576 -> Parallel Seq Scan on public.t_big_3 c (cost=0.00..205601.81 rows=4166681 width=62) (actual time=0.126..80608.068 rows=3333333 loops=3) Output: c.c2, c.c1 Buffers: shared read=163935 dirtied=133935 written=133865 Worker 0: actual time=0.260..80737.065 rows=3332395 loops=1 Buffers: shared read=54630 dirtied=44552 written=44528 Worker 1: actual time=0.049..80943.448 rows=3329197 loops=1 Buffers: shared read=54577 dirtied=44548 written=44525 -> Parallel Hash (cost=205601.02..205601.02 rows=4166602 width=62) (actual time=10279.722..10279.722 rows=3333333 loops=3) Output: d.c3, d.c2 Buckets: 65536 Batches: 512 Memory Usage: 2400kB Buffers: shared hit=32 read=163903 dirtied=133935 written=133839, temp written=103004 Worker 0: actual time=10222.812..10222.812 rows=3297904 loops=1 Buffers: shared hit=9 read=54055 dirtied=44154 written=44122, temp written=34236 Worker 1: actual time=10222.839..10222.839 rows=3258559 loops=1 Buffers: shared hit=6 read=53413 dirtied=43464 written=43432, temp written=33504 -> Parallel Seq Scan on public.t_big_4 d (cost=0.00..205601.02 rows=4166602 width=62) (actual time=0.163..7282.409 rows=3333333 loops=3) Output: d.c3, d.c2 Buffers: shared hit=32 read=163903 dirtied=133935 written=133839 Worker 0: actual time=0.108..7244.071 rows=3297904 loops=1 Buffers: shared hit=9 read=54055 dirtied=44154 written=44122 Worker 1: actual time=0.034..7223.191 rows=3258559 loops=1 Buffers: shared hit=6 read=53413 dirtied=43464 written=43432 Planning Time: 1.134 ms Execution Time: 405878.841 ms(83 rows)[local:/data/run/pg12]:5120 pg12@testdb=#
可以看到,执行计划中的成本回归一个正常的数值,算法使用Hash Join。由于内存不足,PG把数据拆分为N份,使用临时表来临时缓存Hash Table,使用不同的Batch来执行Join。
到此,关于"分析PostgreSQL中的大表连接"的学习就结束了,希望能够解决大家的疑惑。理论与实践的搭配能更好的帮助大家学习,快去试试吧!若想继续学习更多相关知识,请继续关注网站,小编会继续努力为大家带来更多实用的文章!
分析
数据
学习
内存
更多
缓存
帮助
不同
实用
接下来
主机
大数
成本
数值
数据库
数据表
文章
方法
理论
知识
数据库的安全要保护哪些东西
数据库安全各自的含义是什么
生产安全数据库录入
数据库的安全性及管理
数据库安全策略包含哪些
海淀数据库安全审计系统
建立农村房屋安全信息数据库
易用的数据库客户端支持安全管理
连接数据库失败ssl安全错误
数据库的锁怎样保障安全
计算机服务器结构测试
帝国时代2 服务器错误
说服务器打瞌睡了请稍后重试
网络安全硬件设备配置和策略
英国网络安全法案
成都diy服务器
服务器接口设置
查看服务器的校时源
计算机网络安全的类型是什么
mac连vpn服务器
辽宁省在哪里找软件开发公司
河北数据库空投箱市场价格
网络安全大队重要吗
网络技术从犯
shell数据库配置文件
英语成绩比较差选软件开发
战斗之夜服务器奖励
超星 网络安全问题
传奇 人物数据库
数据库开发的语言基础教程
软件开发工作量评估合理性
河南科技大学互联网医院
网络技术专业的知识
汕尾专业软件开发
中国工业网络技术
兰考软件开发
软件开发行业分析201
无锡加工软件开发怎么样
其他软件开发行业增值税
株洲众邦互联网科技