0010-Hive Multi-Delimiter Support Example
Published: 2024-11-24 (last updated 2024-11-24), by 千家信息网 editor
1. Problem Description
How do you load a data file whose field delimiter is a multi-character string into a Hive table? The sample data below uses "@#$" as the field delimiter:

```
test1@#$test1name@#$test2value
test2@#$test2name@#$test2value
test3@#$test3name@#$test4value
```
The goal is to load the sample data above into a Hive table (multi_delimiter_test) with the following structure:
| Field name | Field type |
|---|---|
| s1 | String |
| s2 | String |
| s3 | String |
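The expected parse can be sketched outside Hive: splitting each record on the full three-character delimiter "@#$" (which is what MultiDelimitSerDe does when deserializing each row) yields exactly the three columns above. A minimal illustration with awk, treating the multi-character separator as a regular expression:

```shell
# Split each record on the full multi-character delimiter "@#$".
# The "$" is bracketed so awk's regex field separator treats it literally.
printf '%s\n' \
  'test1@#$test1name@#$test2value' \
  'test2@#$test2name@#$test2value' \
  'test3@#$test3name@#$test4value' |
awk -F '@#[$]' '{ printf "s1=%s s2=%s s3=%s\n", $1, $2, $3 }'
# → s1=test1 s2=test1name s3=test2value   (and similarly for the other rows)
```

This is only a sanity check on the data format; a delimiter declared with FIELDS TERMINATED BY would be truncated to a single character, which is why the SerDe below is needed.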
2. Hive Multi-Delimiter Support
Hive 0.14 and later support multi-character field delimiters via MultiDelimitSerDe; see https://cwiki.apache.org/confluence/display/Hive/MultiDelimitSerDe
3. Implementation
- Test environment: CDH 5.11.1, Hive 1.1.0, Red Hat 6.5
- Steps
1. Prepare the multi-delimiter data file and load it into the corresponding HDFS directory
```
[ec2-user@ip-172-31-8-141 ~]$ cat multi_delimiter_test.dat
test1@#$test1name@#$test2value
test2@#$test2name@#$test2value
test3@#$test3name@#$test4value
[ec2-user@ip-172-31-8-141 ~]$ hadoop dfs -put multi_delimiter_test.dat /fayson/multi_delimiter_test
[ec2-user@ip-172-31-8-141 ~]$ hadoop dfs -ls /fayson/multi_delimiter_test
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 1 items
-rw-r--r--   3 user_r supergroup         93 2017-08-23 03:24 /fayson/multi_delimiter_test/multi_delimiter_test.dat
[ec2-user@ip-172-31-8-141 ~]$
```
2. Create a table over the prepared multi-delimiter file
```sql
create external table multi_delimiter_test(
  s1 string,
  s2 string,
  s3 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim"="@#$")
stored as textfile
location '/fayson/multi_delimiter_test';
```
3. Test
```
2: jdbc:hive2://localhost:10000/default> select * from multi_delimiter_test;
+--------------------------+--------------------------+--------------------------+--+
| multi_delimiter_test.s1  | multi_delimiter_test.s2  | multi_delimiter_test.s3  |
+--------------------------+--------------------------+--------------------------+--+
| test1                    | test1name                | test2value               |
| test2                    | test2name                | test2value               |
| test3                    | test3name                | test4value               |
+--------------------------+--------------------------+--------------------------+--+
```
```
2: jdbc:hive2://localhost:10000/default> select count(*) from multi_delimiter_test;
INFO  : Ended Job = job_1503469952834_0006
INFO  : MapReduce Jobs Launched:
INFO  : Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.25 sec   HDFS Read: 6755 HDFS Write: 2 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 3 seconds 250 msec
INFO  : Completed executing command(queryId=hive_20170823041818_ce58aae2-e6db-4eed-b6af-652235a6e66a); Time taken: 33.286 seconds
INFO  : OK
+------+--+
| _c0  |
+------+--+
| 3    |
+------+--+
1 row selected (33.679 seconds)
```
4. Common Issues
1. Error when running a count query
- Exception log
Running the count query through beeline fails:
```
2: jdbc:hive2://localhost:10000/default> select count(*) from multi_delimiter_test;
INFO  : Compiling command(queryId=hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97): select count(*) from multi_delimiter_test
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97); Time taken: 0.291 seconds
INFO  : Executing command(queryId=hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97): select count(*) from multi_delimiter_test
INFO  : Query ID = hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1503469952834_0002
INFO  : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (token for hive: HDFS_DELEGATION_TOKEN owner=hive/ip-172-31-8-141.ap-southeast-1.compute.internal@CLOUDERA.COM, renewer=yarn, realUser=, issueDate=1503475160778, maxDate=1504079960778, sequenceNumber=27, masterKeyId=9)
INFO  : The url to track the job: http://ip-172-31-9-186.ap-southeast-1.compute.internal:8088/proxy/application_1503469952834_0002/
INFO  : Starting Job = job_1503469952834_0002, Tracking URL = http://ip-172-31-9-186.ap-southeast-1.compute.internal:8088/proxy/application_1503469952834_0002/
INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.5/lib/hadoop/bin/hadoop job -kill job_1503469952834_0002
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO  : 2017-08-23 03:59:32,039 Stage-1 map = 0%, reduce = 0%
INFO  : 2017-08-23 04:00:08,106 Stage-1 map = 100%, reduce = 100%
ERROR : Ended Job = job_1503469952834_0002 with errors
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
INFO  : MapReduce Jobs Launched:
INFO  : Stage-Stage-1: Map: 1  Reduce: 1   HDFS Read: 0 HDFS Write: 0 FAIL
INFO  : Total MapReduce CPU Time Spent: 0 msec
INFO  : Completed executing command(queryId=hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97); Time taken: 48.737 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
```
Running the same query from the Hive shell reports:
```
Error: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:449)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
        ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:147)
        ... 22 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
        at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:323)
        at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:333)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:116)
        ... 22 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
        at org.apache.hadoop.hive.ql.plan.PartitionDesc.getDeserializer(PartitionDesc.java:140)
        at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:297)
        ... 24 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 ms
```
- Root cause analysis
The org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe class lives in hive-contrib.jar.
Non-aggregate queries run fine, but aggregate functions such as count(*) fail; this tells us the jar dependency is missing only when a MapReduce job is launched. MapReduce tasks run as YARN jobs, so it is the YARN runtime environment that lacks hive-contrib.jar on its classpath.
- Solution
On every node of the CDH cluster, copy hive-contrib-1.1.0-cdh5.10.2.jar into YARN's lib directory:

```
sudo scp -r /opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib-1.1.0-cdh5.10.2.jar /opt/cloudera/parcels/CDH/lib/hadoop-yarn/lib/
```
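Since the copy must happen on every node, it can be scripted. The sketch below generates one scp command per host read from stdin; the host list, and the unversioned hive-contrib.jar symlink that CDH parcels typically provide, are assumptions — substitute your own hostnames and the exact jar name on your cluster. Pipe the output to `sh` to actually execute the copies.

```shell
# Assumed paths: CDH parcels usually expose an unversioned hive-contrib.jar
# symlink; adjust JAR to the versioned name if your parcel lacks the symlink.
JAR=/opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib.jar
DEST=/opt/cloudera/parcels/CDH/lib/hadoop-yarn/lib/

# Print one scp command per cluster hostname read from stdin
# (one hostname per line, e.g. from a hypothetical nodes.txt).
copy_cmds() {
  while read -r host; do
    printf 'scp %s %s:%s\n' "$JAR" "$host" "$DEST"
  done
}
```

Usage: `copy_cmds < nodes.txt | sh` (or review the printed commands first and run them manually).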
Re-run the count statement; it now succeeds.