
Hive File Compression Test


Hive can store data in several formats, such as plain text, LZO, and ORC. To work out how these formats relate to one another, I ran the following test.


I. Create a sample table

hive> create table tbl( id int, name string ) row format delimited fields terminated by '|' stored as textfile;

OK

Time taken: 0.338 seconds


hive> load data local inpath '/home/grid/users.txt' into table tbl;

Copying data from file:/home/grid/users.txt

Copying file: file:/home/grid/users.txt

Loading data to table default.tbl

Table default.tbl stats: [numFiles=1, numRows=0, totalSize=111, rawDataSize=0]

OK

Time taken: 0.567 seconds


hive> select * from tbl;

OK

1 Awyp

2 Azs

3 Als

4 Aww

5 Awyp2

6 Awyp3

7 Awyp4

8 Awyp5

9 Awyp6

10 Awyp7

11 Awyp8

12 Awyp5

13 Awyp9

14 Awyp20

Time taken: 0.237 seconds, Fetched: 14 row(s)
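
For reference, users.txt is presumably a pipe-delimited text file (the delimiter comes from the DDL above); reconstructed from the query output, its contents would look like this, which also matches the 111-byte totalSize reported by the load:

1|Awyp
2|Azs
3|Als
4|Aww
5|Awyp2
6|Awyp3
7|Awyp4
8|Awyp5
9|Awyp6
10|Awyp7
11|Awyp8
12|Awyp5
13|Awyp9
14|Awyp20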

II. Testing writes

1. No compression

hive> set hive.exec.compress.output;

hive.exec.compress.output=false


hive>

>

> create table tbltxt as select * from tbl;

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1498527794024_0001, Tracking URL = http://hadoop1:8088/proxy/application_1498527794024_0001/

Kill Command = /opt/hadoop/bin/hadoop job -kill job_1498527794024_0001

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2017-06-27 10:55:29,906 Stage-1 map = 0%, reduce = 0%

2017-06-27 10:55:39,532 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.66 sec

MapReduce Total cumulative CPU time: 2 seconds 660 msec

Ended Job = job_1498527794024_0001

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-27_10-55-18_962_2187345348997213497-1/-ext-10001

Moving data to: hdfs://hadoop1:9000/user/hive/warehouse/tbltxt

Table default.tbltxt stats: [numFiles=1, numRows=14, totalSize=111, rawDataSize=97]

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 2.66 sec HDFS Read: 318 HDFS Write: 181 SUCCESS

Total MapReduce CPU Time Spent: 2 seconds 660 msec

OK

Time taken: 22.056 seconds


hive>

> show create table tbltxt;

OK

CREATE TABLE `tbltxt`(

`id` int,

`name` string)

ROW FORMAT SERDE

'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

STORED AS INPUTFORMAT

'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT

'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

LOCATION

'hdfs://hadoop1:9000/user/hive/warehouse/tbltxt'

TBLPROPERTIES (

'COLUMN_STATS_ACCURATE'='true',

'numFiles'='1',

'numRows'='14',

'rawDataSize'='97',

'totalSize'='111',

'transient_lastDdlTime'='1498532140')

Time taken: 0.202 seconds, Fetched: 18 row(s)


hive>

>

> select * from tbltxt;

OK

1 Awyp

2 Azs

3 Als

4 Aww

5 Awyp2

6 Awyp3

7 Awyp4

8 Awyp5

9 Awyp6

10 Awyp7

11 Awyp8

12 Awyp5

13 Awyp9

14 Awyp20

Time taken: 0.059 seconds, Fetched: 14 row(s)


hive>

>

> dfs -ls /user/hive/warehouse/tbltxt;

Found 1 items

-rwxr-xr-x 1 grid supergroup 111 2017-06-27 10:55 /user/hive/warehouse/tbltxt/000000_0


hive>

>

> dfs -cat /user/hive/warehouse/tbltxt/000000_0;

1Awyp

2Azs

3Als

4Aww

5Awyp2

6Awyp3

7Awyp4

8Awyp5

9Awyp6

10Awyp7

11Awyp8

12Awyp5

13Awyp9

14Awyp20


The read and write formats are:

STORED AS INPUTFORMAT

'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT

'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

The data reads back normally, the on-disk format is plain text, and it can be viewed directly with cat.
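
As an aside, the CTAS above simply inherits the default text storage. A minimal sketch (the table name tbltxt2 is hypothetical, for illustration only): declaring the storage explicitly expands to exactly the input/output format pair shown above.

create table tbltxt2              -- hypothetical name, for illustration only
  stored as textfile              -- expands to TextInputFormat + HiveIgnoreKeyTextOutputFormat
  as select * from tbl;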


2. Compression enabled, with the default codec

hive>

> set hive.exec.compress.output=true;

hive>

>

> set mapred.output.compression.codec;

mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec


The current codec is the default, DefaultCodec.
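
A sketch of the two settings this test relies on. Note that mapred.output.compression.codec is the old MRv1-style property name; on Hadoop 2.x the equivalent property is mapreduce.output.fileoutputformat.compress.codec (an assumption here is that both names are honoured on this cluster, as the transcript suggests for the old one).

-- enable compression of final Hive output, then pick the codec
set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec;
-- newer equivalent property name on Hadoop 2.x:
-- set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec;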


hive>

> create table tbldefault as select * from tbl;

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1498527794024_0002, Tracking URL = http://hadoop1:8088/proxy/application_1498527794024_0002/

Kill Command = /opt/hadoop/bin/hadoop job -kill job_1498527794024_0002

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2017-06-27 11:14:44,845 Stage-1 map = 0%, reduce = 0%

2017-06-27 11:14:48,964 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.08 sec

MapReduce Total cumulative CPU time: 1 seconds 80 msec

Ended Job = job_1498527794024_0002

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-27_11-14-39_351_6035948930260680086-1/-ext-10001

Moving data to: hdfs://hadoop1:9000/user/hive/warehouse/tbldefault

Table default.tbldefault stats: [numFiles=1, numRows=14, totalSize=76, rawDataSize=97]

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 1.08 sec HDFS Read: 318 HDFS Write: 150 SUCCESS

Total MapReduce CPU Time Spent: 1 seconds 80 msec

OK

Time taken: 10.842 seconds


hive>

>

> show create table tbldefault;

OK

CREATE TABLE `tbldefault`(

`id` int,

`name` string)

ROW FORMAT SERDE

'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

STORED AS INPUTFORMAT

'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT

'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

LOCATION

'hdfs://hadoop1:9000/user/hive/warehouse/tbldefault'

TBLPROPERTIES (

'COLUMN_STATS_ACCURATE'='true',

'numFiles'='1',

'numRows'='14',

'rawDataSize'='97',

'totalSize'='76',

'transient_lastDdlTime'='1498533290')

Time taken: 0.044 seconds, Fetched: 18 row(s)


hive>

>

> select * from tbldefault;

OK

1 Awyp

2 Azs

3 Als

4 Aww

5 Awyp2

6 Awyp3

7 Awyp4

8 Awyp5

9 Awyp6

10 Awyp7

11 Awyp8

12 Awyp5

13 Awyp9

14 Awyp20

Time taken: 0.037 seconds, Fetched: 14 row(s)


hive>

>

> dfs -ls /user/hive/warehouse/tbldefault;

Found 1 items

-rwxr-xr-x 1 grid supergroup 76 2017-06-27 11:14 /user/hive/warehouse/tbldefault/000000_0.deflate

hive>

> dfs -cat /user/hive/warehouse/tbldefault/000000_0.deflate;

xws

dfX0)60K:HB

hive>

>

>

With the default compression, the table's read/write formats are the same as for plain text, but the data file is compressed with the default (zlib/deflate) codec and carries the .deflate suffix, so its contents cannot be viewed directly. In other words, org.apache.hadoop.mapred.TextInputFormat recognizes the default compression from the file suffix and can still read the contents.
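
If you do want to inspect a compressed file by hand, dfs -text (unlike dfs -cat) decompresses by file suffix for any codec registered with Hadoop, so something along these lines should print the plain rows. This is a hedged aside, not part of the original transcript:

-- decompresses the .deflate (DefaultCodec) output before printing
dfs -text /user/hive/warehouse/tbldefault/000000_0.deflate;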


3. LZO compression

hive>

> set mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;


hive>

>

> create table tbllzo as select * from tbl;

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1498527794024_0003, Tracking URL = http://hadoop1:8088/proxy/application_1498527794024_0003/

Kill Command = /opt/hadoop/bin/hadoop job -kill job_1498527794024_0003

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2017-06-27 11:29:08,436 Stage-1 map = 0%, reduce = 0%

2017-06-27 11:29:14,638 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec

MapReduce Total cumulative CPU time: 1 seconds 870 msec

Ended Job = job_1498527794024_0003

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-27_11-29-03_249_4340474818139134521-1/-ext-10001

Moving data to: hdfs://hadoop1:9000/user/hive/warehouse/tbllzo

Table default.tbllzo stats: [numFiles=1, numRows=14, totalSize=106, rawDataSize=97]

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 1.87 sec HDFS Read: 318 HDFS Write: 176 SUCCESS

Total MapReduce CPU Time Spent: 1 seconds 870 msec

OK

Time taken: 13.744 seconds


hive>

>

> show create table tbllzo;

OK

CREATE TABLE `tbllzo`(

`id` int,

`name` string)

ROW FORMAT SERDE

'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

STORED AS INPUTFORMAT

'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT

'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

LOCATION

'hdfs://hadoop1:9000/user/hive/warehouse/tbllzo'

TBLPROPERTIES (

'COLUMN_STATS_ACCURATE'='true',

'numFiles'='1',

'numRows'='14',

'rawDataSize'='97',

'totalSize'='106',

'transient_lastDdlTime'='1498534156')

Time taken: 0.044 seconds, Fetched: 18 row(s)


hive>

> select * from tbllzo;

OK

1 Awyp

2 Azs

3 Als

4 Aww

5 Awyp2

6 Awyp3

7 Awyp4

8 Awyp5

9 Awyp6

10 Awyp7

11 Awyp8

12 Awyp5

13 Awyp9

14 Awyp20

Time taken: 0.032 seconds, Fetched: 14 row(s)


hive>

>

> dfs -ls /user/hive/warehouse/tbllzo;

Found 1 items

-rwxr-xr-x 1 grid supergroup 106 2017-06-27 11:29 /user/hive/warehouse/tbllzo/000000_0.lzo_deflate

hive>

>

> dfs -cat /user/hive/warehouse/tbllzo/000000_0.lzo_deflate;

ob1Awyp

2Azs

3Als

4Aww

5Awyp2

6

7

8

9

10

1

125

13Awyp9

14Awyp20


With LZO compression, the table's read/write formats are still org.apache.hadoop.mapred.TextInputFormat; the data file carries the .lzo_deflate suffix and cannot be viewed directly. In other words, org.apache.hadoop.mapred.TextInputFormat can also recognize LZO compression and read the contents. (Impressive!)
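
How does TextInputFormat know which codec to apply? It asks Hadoop's CompressionCodecFactory, which maps file suffixes to the codecs registered in io.compression.codecs. A quick check (a sketch; the actual output depends on the cluster configuration and is not shown in the original transcript):

-- list the codecs registered on this cluster; the .lzo_deflate suffix resolves to LzoCodec
-- only if com.hadoop.compression.lzo.LzoCodec appears in this list
set io.compression.codecs;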


4. LZOP compression

hive>

> set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;


hive>

> create table tbllzop as select * from tbl;

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1498527794024_0004, Tracking URL = http://hadoop1:8088/proxy/application_1498527794024_0004/

Kill Command = /opt/hadoop/bin/hadoop job -kill job_1498527794024_0004

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2017-06-27 11:37:28,010 Stage-1 map = 0%, reduce = 0%

2017-06-27 11:37:32,127 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.1 sec

MapReduce Total cumulative CPU time: 2 seconds 100 msec

Ended Job = job_1498527794024_0004

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-27_11-37-23_099_3493082162039010112-1/-ext-10001

Moving data to: hdfs://hadoop1:9000/user/hive/warehouse/tbllzop

Table default.tbllzop stats: [numFiles=1, numRows=14, totalSize=148, rawDataSize=97]

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 2.1 sec HDFS Read: 318 HDFS Write: 219 SUCCESS

Total MapReduce CPU Time Spent: 2 seconds 100 msec

OK

Time taken: 10.233 seconds


hive>

>

> show create table tbllzop;

OK

CREATE TABLE `tbllzop`(

`id` int,

`name` string)

ROW FORMAT SERDE

'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

STORED AS INPUTFORMAT

'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT

'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

LOCATION

'hdfs://hadoop1:9000/user/hive/warehouse/tbllzop'

TBLPROPERTIES (

'COLUMN_STATS_ACCURATE'='true',

'numFiles'='1',

'numRows'='14',

'rawDataSize'='97',

'totalSize'='148',

'transient_lastDdlTime'='1498534653')

Time taken: 0.046 seconds, Fetched: 18 row(s)


hive>

>

>

> select * from tbllzop;

OK

1 Awyp

2 Azs

3 Als

4 Aww

5 Awyp2

6 Awyp3

7 Awyp4

8 Awyp5

9 Awyp6

10 Awyp7

11 Awyp8

12 Awyp5

13 Awyp9

14 Awyp20

Time taken: 0.033 seconds, Fetched: 14 row(s)


hive>

>

> dfs -ls /user/hive/warehouse/tbllzop;

Found 1 items

-rwxr-xr-x 1 grid supergroup 148 2017-06-27 11:37 /user/hive/warehouse/tbllzop/000000_0.lzo

hive>

>

> dfs -cat /user/hive/warehouse/tbllzop/000000_0.lzo;

ob1Awyp

2Azs

3Als

4Aww

5Awyp2

6

7

8

9

10

1

125

13Awyp9

14Awyp20


Likewise, with LZOP compression the table's read/write formats are still org.apache.hadoop.mapred.TextInputFormat; the data file carries the .lzo suffix and cannot be viewed directly. org.apache.hadoop.mapred.TextInputFormat can recognize LZOP compression and read the contents.
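
One practical difference between the two LZO codecs, as a hedged aside: LzopCodec writes the standard lzop container format (hence the .lzo suffix), so the file can also be decompressed off-cluster with the lzop tool, whereas the .lzo_deflate files written by LzoCodec cannot. A sketch, assuming lzop is installed on the local machine:

-- pull the file down, then decompress it locally with the lzop utility
dfs -get /user/hive/warehouse/tbllzop/000000_0.lzo /tmp/000000_0.lzo;
-- then, from an OS shell: lzop -dc /tmp/000000_0.lzo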



From the cases above we can see that, whichever compression is used, Hive still treats the data as plain text (just compressed in different ways), and org.apache.hadoop.mapred.TextInputFormat can read all of them. Moreover, when writing, Hive compresses purely according to mapred.output.compression.codec, regardless of the inputformat/outputformat declared in the table definition. The following verifies this:


1. With mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec set, the inserted data file is LZOP-compressed and can be read back normally.


hive> set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;


hive>

> create table tbltest1( id int, name string )

> stored as inputformat 'org.apache.hadoop.mapred.TextInputFormat'

> outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

OK

Time taken: 0.493 seconds


hive>

> insert into table tbltest1 select * from tbl;

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1498660018952_0001, Tracking URL = http://hadoop1:8088/proxy/application_1498660018952_0001/

Kill Command = /opt/hadoop/bin/hadoop job -kill job_1498660018952_0001

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2017-06-28 22:59:27,886 Stage-1 map = 0%, reduce = 0%

2017-06-28 22:59:36,427 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.25 sec

MapReduce Total cumulative CPU time: 2 seconds 250 msec

Ended Job = job_1498660018952_0001

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-28_22-59-14_730_4437480099583255943-1/-ext-10000

Loading data to table default.tbltest1

Table default.tbltest1 stats: [numFiles=1, numRows=14, totalSize=148, rawDataSize=97]

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 2.25 sec HDFS Read: 318 HDFS Write: 220 SUCCESS

Total MapReduce CPU Time Spent: 2 seconds 250 msec

OK

Time taken: 24.151 seconds


hive>

> dfs -ls /user/hive/warehouse/tbltest1;

Found 1 items

-rwxr-xr-x 1 grid supergroup 148 2017-06-28 22:59 /user/hive/warehouse/tbltest1/000000_0.lzo


hive>

> select * from tbltest1;

OK

1 Awyp

2 Azs

3 Als

4 Aww

5 Awyp2

6 Awyp3

7 Awyp4

8 Awyp5

9 Awyp6

10 Awyp7

11 Awyp8

12 Awyp5

13 Awyp9

14 Awyp20

Time taken: 0.055 seconds, Fetched: 14 row(s)


2. With mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec set, the inserted data file uses the default compression and can be read back normally.


hive> set mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec;


hive> create table tbltest2( id int, name string )

> stored as inputformat 'org.apache.hadoop.mapred.TextInputFormat'

> outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

OK

Time taken: 0.142 seconds


hive> insert into table tbltest2 select * from tbl;

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1498660018952_0002, Tracking URL = http://hadoop1:8088/proxy/application_1498660018952_0002/

Kill Command = /opt/hadoop/bin/hadoop job -kill job_1498660018952_0002

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2017-06-28 23:09:06,439 Stage-1 map = 0%, reduce = 0%

2017-06-28 23:09:11,668 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.15 sec

MapReduce Total cumulative CPU time: 1 seconds 150 msec

Ended Job = job_1498660018952_0002

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-28_23-09-01_674_9172062679713398655-1/-ext-10000

Loading data to table default.tbltest2

Table default.tbltest2 stats: [numFiles=1, numRows=14, totalSize=76, rawDataSize=97]

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 1.15 sec HDFS Read: 318 HDFS Write: 148 SUCCESS

Total MapReduce CPU Time Spent: 1 seconds 150 msec

OK

Time taken: 11.278 seconds


hive>

>

>

> dfs -ls /user/hive/warehouse/tbltest2;

Found 1 items

-rwxr-xr-x 1 grid supergroup 76 2017-06-28 23:09 /user/hive/warehouse/tbltest2/000000_0.deflate


hive>

> select * from tbltest2;

OK

1 Awyp

2 Azs

3 Als

4 Aww

5 Awyp2

6 Awyp3

7 Awyp4

8 Awyp5

9 Awyp6

10 Awyp7

11 Awyp8

12 Awyp5

13 Awyp9

14 Awyp20

Time taken: 0.035 seconds, Fetched: 14 row(s)


3. When the table is stored as ORC, the data is compressed according to the ORC format itself and is not affected by mapred.output.compression.codec or hive.exec.compress.output.

hive> set hive.exec.compress.output=false;

hive> create table tbltest3( id int, name string )

> stored as orc tblproperties("orc.compress"="SNAPPY");

OK

Time taken: 0.08 seconds


hive> insert into table tbltest3 select * from tbl;

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1498660018952_0003, Tracking URL = http://hadoop1:8088/proxy/application_1498660018952_0003/

Kill Command = /opt/hadoop/bin/hadoop job -kill job_1498660018952_0003

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2017-06-28 23:30:29,865 Stage-1 map = 0%, reduce = 0%

2017-06-28 23:30:34,007 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.14 sec

MapReduce Total cumulative CPU time: 1 seconds 140 msec

Ended Job = job_1498660018952_0003

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-28_23-30-25_350_7458831371800658041-1/-ext-10000

Loading data to table default.tbltest3

Table default.tbltest3 stats: [numFiles=1, numRows=14, totalSize=365, rawDataSize=1288]

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 1.14 sec HDFS Read: 318 HDFS Write: 439 SUCCESS

Total MapReduce CPU Time Spent: 1 seconds 140 msec

OK

Time taken: 9.963 seconds


hive> dfs -ls /user/hive/warehouse/tbltest3;

Found 1 items

-rwxr-xr-x 1 grid supergroup 365 2017-06-28 23:30 /user/hive/warehouse/tbltest3/000000_0


hive>

> dfs -cat /user/hive/warehouse/tbltest3/000000_0;

ORC

)

9

"

A+_Az_

+@DA+y-Az_A+_A++A+y-2345678,5A+y-9A+y-20

hive>

> show create table tbltest3;

OK

CREATE TABLE `tbltest3`(

`id` int,

`name` string)

ROW FORMAT SERDE

'org.apache.hadoop.hive.ql.io.orc.OrcSerde'

STORED AS INPUTFORMAT

'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'

OUTPUTFORMAT

'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

LOCATION

'hdfs://hadoop1:9000/user/hive/warehouse/tbltest3'

TBLPROPERTIES (

'COLUMN_STATS_ACCURATE'='true',

'numFiles'='1',

'numRows'='14',

'orc.compress'='SNAPPY',

'rawDataSize'='1288',

'totalSize'='365',

'transient_lastDdlTime'='1498663835')

Time taken: 0.217 seconds, Fetched: 19 row(s)


hive>

> select * from tbltest3;

OK

1 Awyp

2 Azs

3 Als

4 Aww

5 Awyp2

6 Awyp3

7 Awyp4

8 Awyp5

9 Awyp6

10 Awyp7

11 Awyp8

12 Awyp5

13 Awyp9

14 Awyp20

Time taken: 0.689 seconds, Fetched: 14 row(s)


Evidently, with ORC the inserted data is not affected by the compression parameters, and the inputformat/outputformat are no longer the text ones.
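
ORC compression is instead controlled on the table itself via the orc.compress table property, with hive.exec.orc.default.compress as the session-wide default (assumption: available in the Hive version used here). A minimal sketch, with a hypothetical table name:

-- ORC picks its codec from the table property, not from mapred.output.compression.codec
create table tbltest3_zlib ( id int, name string )
  stored as orc tblproperties ("orc.compress"="ZLIB");   -- orc.compress accepts NONE, ZLIB, SNAPPY
-- session default applied when no table property is given:
set hive.exec.orc.default.compress;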


III. Summary

1. Whether the data is uncompressed, default-compressed, or compressed with LZO or LZOP, it is all a text format as far as Hive is concerned: the codec is identified automatically from the data file's suffix on read, and on write the session parameters decide whether to compress and with which codec.

2. ORC is a different format to Hive: regardless of the session parameters, data is read and written according to the format specified in the CREATE TABLE statement.
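
For reference, the file sizes observed above side by side (14 rows in every table; rawDataSize is 97 for the text-based tables and 1288 for ORC). With only 14 tiny rows, these numbers say nothing about real-world compression ratios:

table        write setting                                     file suffix     totalSize
tbltxt       hive.exec.compress.output=false                   (none)          111
tbldefault   org.apache.hadoop.io.compress.DefaultCodec        .deflate        76
tbllzo       com.hadoop.compression.lzo.LzoCodec               .lzo_deflate    106
tbllzop      com.hadoop.compression.lzo.LzopCodec              .lzo            148
tbltest3     stored as orc, orc.compress=SNAPPY                (ORC file)      365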

