The Four Compression Formats in HBase

Published: 2025-01-20 (last updated 2025-01-20). Author: 千家信息网 editorial staff

Compression formats supported by HBase: GZ (GZIP), LZO, LZ4, and Snappy.

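GZ: suited to cold data. Compared with Snappy and LZO, GZIP achieves a higher compression ratio, but it uses more CPU and both compression and decompression are slower.

Snappy and LZO: suited to hot data. They use less CPU and compress and decompress faster than GZ, but their compression ratio is lower.

Snappy versus LZO: Snappy performs better overall. Its compression ratio is lower than LZO's, but it compresses and decompresses faster.

LZ4 versus LZO: the compression ratios are about the same, but LZ4 compresses and decompresses faster.

In most cases Snappy or LZO is a good choice, because both have low compression overhead while still saving space.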
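The ratio-versus-CPU trade-off described above can be observed with a small measurement. This is only a rough sketch: Python's standard library has no Snappy, LZO, or LZ4 bindings, so zlib (the DEFLATE algorithm behind GZ) at level 1 and level 9 stands in for a "fast" and a "dense" setting. The helper names `sample_rows` and `measure` and the synthetic rows are assumptions made for illustration, and the numbers say nothing about how HBase itself will behave on real data.

```python
import time
import zlib


def sample_rows(count: int) -> bytes:
    """Build repetitive, row-like bytes; real table data will behave differently."""
    return b"".join(
        b"row%06d|f1:q|status=ok|region=eu-west\n" % i for i in range(count)
    )


def measure(data: bytes, level: int) -> dict:
    """Compress and decompress `data` at a zlib level; report ratio and timings."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    compress_seconds = time.perf_counter() - start

    start = time.perf_counter()
    unpacked = zlib.decompress(packed)
    decompress_seconds = time.perf_counter() - start

    if unpacked != data:
        raise ValueError("round trip did not reproduce the input")

    return {
        "level": level,
        "ratio": len(data) / len(packed),  # higher means smaller output
        "compress_seconds": compress_seconds,
        "decompress_seconds": decompress_seconds,
    }


if __name__ == "__main__":
    data = sample_rows(50_000)
    for level in (1, 9):  # 1 = fastest setting, 9 = densest setting
        result = measure(data, level)
        print(
            f"level={result['level']} ratio={result['ratio']:.1f} "
            f"compress={result['compress_seconds']:.4f}s "
            f"decompress={result['decompress_seconds']:.4f}s"
        )
```

Running the same measurement on a sample of your own rows gives a more useful comparison than any synthetic input.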

Specifying the compression format when creating a table

hbase(main):013:0> create 'test3',{NAME=>'f1'},{NAME=>'f2',COMPRESSION=>'Snappy'}
0 row(s) in 1.2740 seconds

=> Hbase::Table - test3
hbase(main):014:0> desc 'test3'
Table test3 is ENABLED
test3
COLUMN FAMILIES DESCRIPTION
{NAME => 'f1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'f2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.0300 seconds

hbase(main):002:0> create 'test4' ,{NAME=>'f1'},{NAME=>'f2',COMPRESSION=>'GZ'}
0 row(s) in 1.4900 seconds

=> Hbase::Table - test4
hbase(main):003:0> desc 'test4'
Table test4 is ENABLED
test4
COLUMN FAMILIES DESCRIPTION
{NAME => 'f1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'f2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.1290 seconds
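Checking the COMPRESSION attribute by eye in long desc output is error-prone, so the check can be done mechanically. The following is a minimal sketch that assumes the flat `{NAME => '...', ..., COMPRESSION => '...'}` shape shown in the transcripts above; `family_codecs` is a hypothetical helper, not part of HBase, and the sample text is an abbreviated copy of the desc output with most attributes omitted.

```python
import re

# One column family descriptor: a brace-delimited block with no nested braces.
FAMILY_BLOCK = re.compile(r"\{[^{}]*\}")
NAME_ATTR = re.compile(r"\bNAME\s*=>\s*'([^']*)'")
COMPRESSION_ATTR = re.compile(r"\bCOMPRESSION\s*=>\s*'([^']*)'")


def family_codecs(describe_output: str) -> dict:
    """Map each column family name to its COMPRESSION value."""
    codecs = {}
    for block in FAMILY_BLOCK.findall(describe_output):
        name = NAME_ATTR.search(block)
        codec = COMPRESSION_ATTR.search(block)
        if name and codec:
            codecs[name.group(1)] = codec.group(1)
    return codecs


# Abbreviated desc output (assumption: only a few attributes are kept here).
SAMPLE = """Table test3 is ENABLED
test3
COLUMN FAMILIES DESCRIPTION
{NAME => 'f1', BLOOMFILTER => 'ROW', COMPRESSION => 'NONE', MIN_VERSIONS => '0'}
{NAME => 'f2', BLOOMFILTER => 'ROW', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0'}
2 row(s) in 0.0300 seconds
"""

if __name__ == "__main__":
    print(family_codecs(SAMPLE))  # {'f1': 'NONE', 'f2': 'SNAPPY'}
```

Descriptors that contain nested braces would need a real parser rather than these regular expressions.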

Changing a column family's compression format after table creation

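The correct procedure is to disable the table first, alter the column family's compression format, then enable the table and run major_compact. The major compaction rewrites the existing store files, so data already on disk ends up stored with the new codec.

For example: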

hbase(main):004:0> desc 'test1'
Table test1 is ENABLED
test1
COLUMN FAMILIES DESCRIPTION
{NAME => 'f1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'f2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.0230 seconds

hbase(main):005:0> disable 'test1'
0 row(s) in 2.2870 seconds

hbase(main):006:0> alter 'test1' ,{NAME=>'f1',COMPRESSION=>'Snappy'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9510 seconds

hbase(main):007:0> enable 'test1'
0 row(s) in 1.2820 seconds

hbase(main):008:0> desc 'test1'
Table test1 is ENABLED
test1
COLUMN FAMILIES DESCRIPTION
{NAME => 'f1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'f2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.0310 seconds

hbase(main):009:0> major_compact 'test1'
0 row(s) in 0.1380 seconds

hbase(main):010:0> desc 'test1'
Table test1 is ENABLED
test1
COLUMN FAMILIES DESCRIPTION
{NAME => 'f1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'f2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.0260 seconds
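When the same change has to be applied to several tables, the disable, alter, enable, major_compact sequence can be generated rather than typed. The sketch below only builds the shell commands as text and does not talk to HBase. `compression_change_script` and `SUPPORTED_CODECS` are hypothetical names, and the codec list is deliberately limited to the formats named in this article plus NONE.

```python
# Codecs named in this article, plus NONE; an assumption, not an exhaustive list.
SUPPORTED_CODECS = {"NONE", "GZ", "LZO", "LZ4", "SNAPPY"}


def compression_change_script(table: str, family: str, codec: str) -> str:
    """Return the HBase shell commands that switch one column family's codec."""
    codec = codec.upper()  # desc reports codec names in upper case
    if codec not in SUPPORTED_CODECS:
        raise ValueError(f"unsupported codec: {codec}")
    for value in (table, family):
        if "'" in value:
            # A quote would break the single-quoted shell literals below.
            raise ValueError(f"name must not contain a single quote: {value!r}")

    commands = [
        f"disable '{table}'",
        f"alter '{table}', {{NAME => '{family}', COMPRESSION => '{codec}'}}",
        f"enable '{table}'",
        f"major_compact '{table}'",  # rewrites existing store files
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    print(compression_change_script("test1", "f1", "Snappy"), end="")
```

The generated text can be reviewed and then pasted into the HBase shell or saved as a command file for it.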

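However, altering the column family without disabling the table and without running major_compact also succeeded (the author had not yet determined why). A likely explanation is that alter only updates the column family descriptor, which HBase accepts on an enabled table when online schema updates are allowed. The desc command reports that descriptor, while store files written earlier keep their old codec until a compaction rewrites them.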

hbase(main):001:0> desc 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'fam1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.3680 seconds

hbase(main):002:0> alter 'test',{NAME=>'fam1',COMPRESSION=>'LZ4'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.0460 seconds

hbase(main):003:0> desc 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'fam1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'LZ4', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0280 seconds

