0010-Hive Multi-Delimiter Support Example
Published: 2025-02-03
1. Problem Description
How do you load a data file that uses a multi-character string as its field delimiter into a Hive table? The sample data below uses "@#$" as the field delimiter:

```
test1@#$test1name@#$test2value
test2@#$test2name@#$test2value
test3@#$test3name@#$test4value
```
How do we load the sample data above into a Hive table (multi_delimiter_test) with the following structure:
| Column | Type |
|---|---|
| s1 | String |
| s2 | String |
| s3 | String |
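As a quick sanity check outside Hive, the sample records can be split locally on the literal "@#$" delimiter. This is a hypothetical local check, not part of the original walkthrough; note that awk treats a multi-character field separator as a regular expression, so the `$` must be escaped:

```shell
# Hypothetical local check (not part of the original walkthrough):
# recreate the sample file and split each record on the literal "@#$".
cat > multi_delimiter_test.dat <<'EOF'
test1@#$test1name@#$test2value
test2@#$test2name@#$test2value
test3@#$test3name@#$test4value
EOF

# awk treats a multi-character FS as a regex, so "$" is escaped with "\".
awk -F'@#\$' '{print $1 "|" $2 "|" $3}' multi_delimiter_test.dat
```

Each line splits into exactly the three columns (s1, s2, s3) that the table definition expects.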
2. Hive Multi-Delimiter Support
Hive 0.14 and later support multi-character field delimiters; see https://cwiki.apache.org/confluence/display/Hive/MultiDelimitSerDe
3. Implementation
Test environment
The tests were run on CDH 5.11.1, with Hive 1.1.0, on Red Hat 6.5.
- Steps
1. Prepare the multi-delimiter file and load it into the corresponding HDFS directory
```shell
[ec2-user@ip-172-31-8-141 ~]$ cat multi_delimiter_test.dat
test1@#$test1name@#$test2value
test2@#$test2name@#$test2value
test3@#$test3name@#$test4value
[ec2-user@ip-172-31-8-141 ~]$ hadoop dfs -put multi_delimiter_test.dat /fayson/multi_delimiter_test
[ec2-user@ip-172-31-8-141 ~]$ hadoop dfs -ls /fayson/multi_delimiter_test
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 1 items
-rw-r--r--   3 user_r supergroup         93 2017-08-23 03:24 /fayson/multi_delimiter_test/multi_delimiter_test.dat
[ec2-user@ip-172-31-8-141 ~]$
```
2. Create a table over the prepared multi-delimiter file
```sql
create external table multi_delimiter_test(
  s1 string,
  s2 string,
  s3 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim"="@#$")
stored as textfile
location '/fayson/multi_delimiter_test';
```
3. Test
```
2: jdbc:hive2://localhost:10000/default> select * from multi_delimiter_test;
+--------------------------+--------------------------+--------------------------+--+
| multi_delimiter_test.s1  | multi_delimiter_test.s2  | multi_delimiter_test.s3  |
+--------------------------+--------------------------+--------------------------+--+
| test1                    | test1name                | test2value               |
| test2                    | test2name                | test2value               |
| test3                    | test3name                | test4value               |
+--------------------------+--------------------------+--------------------------+--+
```
```
2: jdbc:hive2://localhost:10000/default> select count(*) from multi_delimiter_test;
INFO  : Ended Job = job_1503469952834_0006
INFO  : MapReduce Jobs Launched:
INFO  : Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.25 sec   HDFS Read: 6755 HDFS Write: 2 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 3 seconds 250 msec
INFO  : Completed executing command(queryId=hive_20170823041818_ce58aae2-e6db-4eed-b6af-652235a6e66a); Time taken: 33.286 seconds
INFO  : OK
+------+--+
| _c0  |
+------+--+
| 3    |
+------+--+
1 row selected (33.679 seconds)
```
4. Common Issues
1. count queries fail
- Error logs
A count query run through Beeline fails:
```
2: jdbc:hive2://localhost:10000/default> select count(*) from multi_delimiter_test;
INFO  : Compiling command(queryId=hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97): select count(*) from multi_delimiter_test
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97); Time taken: 0.291 seconds
INFO  : Executing command(queryId=hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97): select count(*) from multi_delimiter_test
INFO  : Query ID = hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1503469952834_0002
INFO  : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (token for hive: HDFS_DELEGATION_TOKEN owner=hive/ip-172-31-8-141.ap-southeast-1.compute.internal@CLOUDERA.COM, renewer=yarn, realUser=, issueDate=1503475160778, maxDate=1504079960778, sequenceNumber=27, masterKeyId=9)
INFO  : The url to track the job: http://ip-172-31-9-186.ap-southeast-1.compute.internal:8088/proxy/application_1503469952834_0002/
INFO  : Starting Job = job_1503469952834_0002, Tracking URL = http://ip-172-31-9-186.ap-southeast-1.compute.internal:8088/proxy/application_1503469952834_0002/
INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.5/lib/hadoop/bin/hadoop job -kill job_1503469952834_0002
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO  : 2017-08-23 03:59:32,039 Stage-1 map = 0%,  reduce = 0%
INFO  : 2017-08-23 04:00:08,106 Stage-1 map = 100%,  reduce = 100%
ERROR : Ended Job = job_1503469952834_0002 with errors
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
INFO  : MapReduce Jobs Launched:
INFO  : Stage-Stage-1: Map: 1  Reduce: 1   HDFS Read: 0 HDFS Write: 0 FAIL
INFO  : Total MapReduce CPU Time Spent: 0 msec
INFO  : Completed executing command(queryId=hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97); Time taken: 48.737 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
```
Running the same query from the Hive shell fails as follows:
```
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:449)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
    ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:147)
    ... 22 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
    at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:323)
    at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:333)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:116)
    ... 22 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
    at org.apache.hadoop.hive.ql.plan.PartitionDesc.getDeserializer(PartitionDesc.java:140)
    at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:297)
    ... 24 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 ms
```
- Root cause
The org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe class lives in hive-contrib.jar. Non-aggregate queries run fine, but aggregate queries fail. Aggregates launch a MapReduce job, so the failure means the job itself is missing a jar dependency; since MapReduce jobs run on YARN, the YARN runtime environment is what is missing hive-contrib.jar.
- Solution
On every node of the CDH cluster, copy hive-contrib-1.1.0-cdh5.10.2.jar into YARN's lib directory:

```shell
sudo cp /opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib-1.1.0-cdh5.10.2.jar /opt/cloudera/parcels/CDH/lib/hadoop-yarn/lib/
```
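Copying the jar onto every node requires cluster-wide access. A lighter, session-scoped alternative (an assumption on my part, not tested in the original article) is Hive's ADD JAR command, which puts the jar on the session classpath and ships it with the MapReduce job; the path below assumes the same parcel layout as this cluster:

```sql
-- Session-scoped alternative (assumption, not verified in the original article):
-- ADD JAR distributes hive-contrib with the job instead of patching YARN's lib.
ADD JAR /opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib.jar;
SELECT COUNT(*) FROM multi_delimiter_test;
```

The downside is that every session (or every script) must repeat the ADD JAR, whereas copying into YARN's lib directory fixes it once per node.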
Re-run the count query; it now succeeds.