Oracle直方图统计信息的应用
Oracle直方图统计信息说明了表中数据的分布情况,用于在表中数据分布十分不均衡的情况下,指导CBO优化器选择最优的执行计划。以下例子说明了这一应用。
创建表
create table scott.t(id number);
创建索引
create index scott.idx_t_id on scott.t(id) compute statistics parallel;
插入数据
begin
for i in 1 .. 29990 loop
insert into scott.t values (1);
end loop;
commit;
end;
/
begin
for i in 29991 .. 30000 loop
insert into scott.t values (mod(i, 7));
end loop;
commit;
end;
/
查看数据分布
select id,
count(*) cardinality,
sum(count(*)) over(order by id range unbounded preceding) sum_cardinality
from scott.t
group by id;
ID CARDINALITY SUM_CARDINALITY
---------- ----------- ---------------
0 1 1
1 29991 29992
2 1 29993
3 2 29995
4 2 29997
5 2 29999
6 1 30000
可以看到表中数据的分布严重不均衡,ID为0、2和6的记录各只有1条,而ID为1的记录有29991条,ID为3、4、5的记录也各只有1条。
在这种情况下执行查询,看执行计划,可以看到,由于谓词ID=1选择性差,导致采用全表扫描
set autot trace exp
select * from scott.t where id=1;
执行计划
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 29991 | 89973 | 15 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 29991 | 89973 | 15 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ID"=1)
下面查询唯一性高的条件,由于谓词ID=0选择性好,因此采用了索引扫描
select * from scott.t where id=0;
执行计划
----------------------------------------------------------
Plan hash value: 371777749
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| IDX_T_ID | 1 | 3 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=0)
创建与distinct_keys=7相等数量的宽度均衡的直方图
begin
dbms_stats.gather_table_stats(ownname => 'SCOTT',
tabname => 'T',
estimate_percent => 100,
method_opt => 'FOR COLUMNS SIZE 7 ID',
degree => 4,
cascade => true);
end;
/
查询直方图buckets数据分布信息
col owner for a10
col table_name for a20
col column_name for a20
col endpoint_number for a20
col endpoint_value for a20
select h.owner,
h.table_name,
h.column_name,
to_char(h.endpoint_number) endpoint_number,
to_char(h.endpoint_value) endpoint_value
from dba_histograms h
where h.owner = 'SCOTT'
and h.table_name = 'T';
OWNER TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE
---------- -------------------- -------------------- -------------------- --------------------
SCOTT T ID 1 0
SCOTT T ID 29992 1
SCOTT T ID 29993 2
SCOTT T ID 29995 3
SCOTT T ID 29997 4
SCOTT T ID 29999 5
SCOTT T ID 30000 6
在直方图统计信息的基础上如果不使用绑定变量,查询选择性低的谓词也是不会走索引的
select * from scott.t where id=1;
执行计划
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 29991 | 89973 | 15 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 29991 | 89973 | 15 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ID"=1)
但如果使用绑定变量,效果就不一样了
var i number
exec :i:=1;
select * from scott.t where id=:i;
执行计划
----------------------------------------------------------
Plan hash value: 371777749
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4286 | 12858 | 9 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| IDX_T_ID | 4286 | 12858 | 9 (0)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=TO_NUMBER(:I))
以上测试说明,在表中数据分布不均衡的情况下执行选择性低的查询,如果有完整准确的直方图统计信息,并且采用绑定变量,CBO会选择索引扫描。如果没有直方图信息,CBO将不走索引而选择全表扫描。此处使用直方图统计信息使得查询性能得到了较大提升。