
How to fix two Hadoop configuration problems encountered when querying HDFS data with Hive

Published: 2024-11-19  Author: 千家信息网 editors

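This article explains how to fix two Hadoop configuration problems encountered when querying HDFS data with Hive. Both problems and their fixes are described in detail, so it should be a useful reference.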

While querying HDFS data with Hive, a simple statement, select * from logs where time = '2014-10-16';, caused Hive to fail with the following error:

Error: java.io.IOException: java.io.IOException: java.io.IOException: Filesystem closed
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:256)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:171)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:198)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:184)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.IOException: java.io.IOException: Filesystem closed
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:344)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:122)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:254)
        ... 11 more
Caused by: java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:707)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:776)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:837)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
        at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:209)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
        ... 15 more

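A search online suggests the following cause. When Hadoop runs a MapReduce job, several readers need to access the HDFS FileSystem at the same time, and they share a cached FileSystem instance. If one of them closes that filesystem, because of a network problem or for some other reason, the others are still holding the same cached instance, and their next read triggers the IOException. There are two possible fixes (see http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201207.mbox/%3CCAL=yAAE1mM-JRb=eJGkAtxWQ7AJ3e7WJCT9BhgWq7XDTNxrwfw@mail.gmail.com%3E):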

  1. Disable JVM reuse (not tested here).

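  2. Disable reuse of the cached HDFS FileSystem instance. Add the following to the Hadoop configuration file core-site.xml:

    <property>
        <name>fs.hdfs.impl.disable.cache</name>
        <value>true</value>
    </property>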
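
To make the caching problem described above concrete, here is a toy Python model of it. This is only a sketch: ToyFileSystem, get_filesystem and read_after_peer_close are invented names for illustration and are not part of Hadoop's API. The model only mimics the general behavior, where a cached handle is shared by every caller, so one caller closing it breaks the others, while disabling the cache gives each caller its own handle.

```python
class ToyFileSystem:
    """Stand-in for a filesystem client handle (hypothetical, not a Hadoop class)."""

    def __init__(self, uri):
        self.uri = uri
        self.closed = False

    def read(self):
        # Mirrors the check that fails in the stack trace: reading a closed handle.
        if self.closed:
            raise IOError("Filesystem closed")
        return "data from " + self.uri

    def close(self):
        self.closed = True


# One shared handle per URI, as long as caching is enabled.
_CACHE = {}


def get_filesystem(uri, disable_cache=False):
    """Return a handle for uri; shared through _CACHE unless the cache is disabled."""
    if disable_cache:
        return ToyFileSystem(uri)  # a fresh, private handle on every call
    if uri not in _CACHE:
        _CACHE[uri] = ToyFileSystem(uri)
    return _CACHE[uri]


def read_after_peer_close(disable_cache):
    """Two callers obtain a handle; one closes its handle, then the other reads."""
    uri = "hdfs://namenode/logs"  # made-up URI for the example
    reader = get_filesystem(uri, disable_cache)
    peer = get_filesystem(uri, disable_cache)
    peer.close()
    try:
        return reader.read()
    except IOError as exc:
        return "IOError: " + str(exc)
```

With the cache enabled, reader and peer are the same object, so the read fails after peer.close(). With the cache disabled, each caller holds a separate handle and the read succeeds. The trade-off in the toy model is the same one to expect in general: every caller now pays for creating its own handle.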

After restarting the Hadoop cluster and rerunning the SQL statement, the IOException was gone, but a new problem appeared:

Container [pid=820,containerID=container_1413449189065_0001_01_000008] is running beyond virtual memory limits. Current usage: 182.7 MB of 1 GB physical memory used; 2.3 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1413449189065_0001_01_000008 :
        |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
        |- 820 319 820 820 (bash) 0 1 9695232 297 /bin/bash -c /opt/tools/jdk1.8.0_05/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN  -Xmx200m -Djava.io.tmpdir=/opt/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1413449189065_0001/container_1413449189065_0001_01_000008/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/opt/tools/hadoop-2.4.1/logs/userlogs/application_1413449189065_0001/container_1413449189065_0001_01_000008 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 172.17.0.10 38988 attempt_1413449189065_0001_m_000001_3 8 1>/opt/tools/hadoop-2.4.1/logs/userlogs/application_1413449189065_0001/container_1413449189065_0001_01_000008/stdout 2>/opt/tools/hadoop-2.4.1/logs/userlogs/application_1413449189065_0001/container_1413449189065_0001_01_000008/stderr
        |- 826 820 820 820 (java) 2007 146 2414481408 46477 /opt/tools/jdk1.8.0_05/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx200m -Djava.io.tmpdir=/opt/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1413449189065_0001/container_1413449189065_0001_01_000008/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/opt/tools/hadoop-2.4.1/logs/userlogs/application_1413449189065_0001/container_1413449189065_0001_01_000008 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 172.17.0.10 38988 attempt_1413449189065_0001_m_000001_3 8
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 2   Cumulative CPU: 6.02 sec   HDFS Read: 282824 HDFS Write: 8022 FAIL
Total MapReduce CPU Time Spent: 6 seconds 20 msec

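This error means the container exceeded its virtual memory limit (2.3 GB used against a 2.1 GB limit), even though it had used only about 180 MB of its 1 GB of physical memory. The fix is to turn off the virtual memory check in the configuration file yarn-site.xml (the configuration used here also turns off the physical memory check):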

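    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>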

Restart Hadoop, and the problem is resolved.
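
The numbers in the container log can be checked with a little arithmetic. The sketch below assumes that the 2.1 GB limit in the log is the container's 1 GB physical allocation multiplied by yarn.nodemanager.vmem-pmem-ratio, whose default is 2.1, and that the usage compared against it is the sum of the VMEM_USAGE(BYTES) values in the process-tree dump. The function names are made up for illustration.

```python
GIB = 2 ** 30

# VMEM_USAGE(BYTES) values copied from the process-tree dump in the log.
BASH_VMEM_BYTES = 9695232
JAVA_VMEM_BYTES = 2414481408


def vmem_limit_bytes(pmem_bytes, vmem_pmem_ratio=2.1):
    """Virtual memory limit, assumed to be physical allocation times the ratio."""
    return int(pmem_bytes * vmem_pmem_ratio)


def exceeds_vmem_limit(vmem_used_bytes, pmem_bytes, vmem_pmem_ratio=2.1):
    """True if the process tree uses more virtual memory than the limit allows."""
    return vmem_used_bytes > vmem_limit_bytes(pmem_bytes, vmem_pmem_ratio)


used = BASH_VMEM_BYTES + JAVA_VMEM_BYTES   # about 2.26 GiB, shown as 2.3 GB in the log
limit = vmem_limit_bytes(1 * GIB)          # 2.1 GiB for a 1 GiB container

over_default_ratio = exceeds_vmem_limit(used, 1 * GIB)        # ratio 2.1
over_higher_ratio = exceeds_vmem_limit(used, 1 * GIB, 2.5)    # hypothetical larger ratio
```

Under these assumptions the container is over the limit at the default ratio but would fit under a ratio of 2.5. That suggests raising yarn.nodemanager.vmem-pmem-ratio as a possible alternative to disabling the checks entirely, although that variant was not tried in this article.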
