Broken pipe when accessing HBase with happybase: two "earth-shattering" bugs
Posted: 2025-01-24 Author: 千家信息网 editorial staff
When reading from or writing to HBase with happybase through the thrift interface, a Broken pipe error occurs. Troubleshooting steps:
1. Check the HBase logs:
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
17/05/12 18:08:41 INFO util.VersionInfo: HBase 1.2.0-cdh6.10.1
17/05/12 18:08:41 INFO util.VersionInfo: Source code repository file:///data/jenkins/workspace/generic-package-centos64-7-0/topdir/BUILD/hbase-1.2.0-cdh6.10.1 revision=Unknown
17/05/12 18:08:41 INFO util.VersionInfo: Compiled by jenkins on Mon Mar 20 02:46:09 PDT 2017
17/05/12 18:08:41 INFO util.VersionInfo: From source with checksum c6d9864e1358df7e7f39d39a40338b4e
17/05/12 18:08:41 INFO thrift.ThriftServerRunner: Using default thrift server type
17/05/12 18:08:41 INFO thrift.ThriftServerRunner: Using thrift server type threadpool
17/05/12 18:08:42 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
17/05/12 18:08:42 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
17/05/12 18:08:42 INFO impl.MetricsSystemImpl: HBase metrics system started
17/05/12 18:08:42 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
17/05/12 18:08:42 INFO http.HttpRequestLog: Http request log for http.requests.thrift is not defined
17/05/12 18:08:42 INFO http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter)
17/05/12 18:08:42 INFO http.HttpServer: Added global filter 'clickjackingprevention' (class=org.apache.hadoop.hbase.http.ClickjackingPreventionFilter)
17/05/12 18:08:42 INFO http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context thrift
17/05/12 18:08:42 INFO http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
17/05/12 18:08:42 INFO http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
17/05/12 18:08:42 INFO http.HttpServer: Jetty bound to port 9095
17/05/12 18:08:42 INFO mortbay.log: jetty-6.1.26.cloudera.4
17/05/12 18:08:42 WARN mortbay.log: Can't reuse /tmp/Jetty_0_0_0_0_9095_thrift____.vqpz9l, using /tmp/Jetty_0_0_0_0_9095_thrift____.vqpz9l_5120175032480185058
17/05/12 18:08:43 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:9095
17/05/12 18:08:43 INFO thrift.ThriftServerRunner: starting TBoundedThreadPoolServer on /0.0.0.0:9090 with readTimeout 300000ms; min worker threads=128, max worker threads=1000, max queued requests=1000
.../05/08 15:05:51 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x645132bf connecting to ZooKeeper ensemble=cdh-master-slave1:2181,cdh-slave2:2181,cdh-slave3:2181
17/05/08 15:05:51 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=cdh-master-slave1:2181,cdh-slave2:2181,cdh-slave3:2181 sessionTimeout=60000 watcher=hconnection-0x64513-master-slave1:2181,cdh-slave2:2181,cdh-slave3:2181, baseZNode=/hbase
17/05/08 15:05:51 INFO zookeeper.ClientCnxn: Opening socket connection to server cdh-slave3/192.168.10.219:2181. Will not attempt to authenticate using SASL (unknown error)
17/05/08 15:05:51 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /192.168.10.23:43170, server: cdh-slave3/192.168.10.219:2181
17/05/08 15:05:51 INFO zookeeper.ClientCnxn: Session establishment complete on server cdh-slave3/192.168.10.219:2181, sessionid = 0x35bd74a77802148, negotiated timeout = 60000
[caitinggui@cdh-master-slave1 example]$ 17/05/08 15:32:50 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x35bd74a77802148
17/05/08 15:32:51 INFO zookeeper.ZooKeeper: Session: 0x35bd74a77802148 closed
17/05/08 15:32:51 INFO zookeeper.ClientCnxn: EventThread shut down
17/05/08 15:38:53 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0xb876351 connecting to ZooKeeper ensemble=cdh-master-slave1:2181,cdh-slave2:2181,cdh-slave3:2181
17/05/08 15:38:53 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=cdh-master-slave1:2181,cdh-slave2:2181,cdh-slave3:2181 sessionTimeout=60000 watcher=hconnection-0xb8763510x0, quorum=cdh-master-slave1:2181,cdh-slave2:2181,cdh-slave3:2181, baseZNode=/hbase
17/05/08 15:38:53 INFO zookeeper.ClientCnxn: Opening socket connection to server cdh-master-slave1/192.168.10.23:2181. Will not attempt to authenticate using SASL (unknown error)
17/05/08 15:38:53 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /192.168.10.23:35526, server: cdh-master-slave1/192.168.10.23:2181
17/05/08 15:38:53 INFO zookeeper.ClientCnxn: Session establishment complete on server cdh-master-slave1/192.168.10.23:2181, sessionid = 0x15ba3ddc6cc90d4, negotiated timeout = 60000
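My first guess was that HBase has some timeout configured, and that this timeout is what drops the connection.
2. Read the official documentation. I did not find any timeout parameter that looked relevant.
3. Google similar issues
A similar report turned up: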
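One quick way to test the timeout hypothesis is to read the effective value straight from the Thrift server's startup line in the log above. The sketch below is illustrative only: the helper name `find_read_timeout_ms` is made up here, and the regular expression assumes the `readTimeout <n>ms` wording seen in this particular log.

```python
import re
from typing import Optional

# Matches the timeout announced in the Thrift server startup line, e.g.
# "... with readTimeout 300000ms; min worker threads=128 ..."
READ_TIMEOUT_RE = re.compile(r"readTimeout (\d+)ms")


def find_read_timeout_ms(log_text: str) -> Optional[int]:
    """Return the first readTimeout value (in ms) found in the log, or None."""
    match = READ_TIMEOUT_RE.search(log_text)
    return int(match.group(1)) if match else None


# Startup line copied from the log shown above.
startup_line = (
    "17/05/12 18:08:43 INFO thrift.ThriftServerRunner: starting "
    "TBoundedThreadPoolServer on /0.0.0.0:9090 with readTimeout 300000ms; "
    "min worker threads=128, max worker threads=1000, max queued requests=1000"
)

timeout_ms = find_read_timeout_ms(startup_line)
print(timeout_ms)
```

If the value printed is shorter than the longest pause the client makes between requests, a server-side read timeout is a plausible cause of the dropped connection.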
HBASE-14926: Hung ThriftServer; no timeout on read from client; if client crashes, worker thread gets stuck reading
Type: Bug
Status: RESOLVED
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.0, 1.2.0, 1.1.2, 1.3.0, 1.0.3, 0.98.16
Fix Version/s: 2.0.0, 1.2.0, 1.3.0, 0.98.17
Component/s: Thrift
Labels: None
Hadoop Flags: Reviewed
Release Note: Adds a timeout to server read from clients. Adds new configs hbase.thrift.server.socket.read.timeout for setting read timeout on server socket in milliseconds. Default is 60000;
Description:
Thrift server is hung. All worker threads are doing this:
"thrift-worker-0" daemon prio=10 tid=0x00007f0bb95c2800 nid=0xf6a7 runnable [0x00007f0b956e0000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    - locked <0x000000066d859490> (a java.io.BufferedInputStream)
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
    at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601)
    at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
    at org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:289)
    at org.apache.hadoop.hbase.thrift.CallQueue$Call.run(CallQueue.java:64)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
They never recover.
I don't have client side logs.
We've been here before: HBASE-4967 "connected client thrift sockets should have a server side read timeout" but this patch only got applied to fb branch (and thrift has changed since then)
Source: https://issues.apache.org/jira/browse/HBASE-14926
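4. Google "hbase.thrift.server.socket.read.timeout"
One of the results reads:
Background: The test environment is a distributed Hadoop setup on three servers. Versions: hadoop-2.7.3, Hbase-1.2.4, zookeeper-3.4.9. When writing data to HBase through the thrift C++ interface, writes always succeed at first and start failing after a while. The previously used hbase-0.94.27, with the same configuration, never had this problem and had been running fine.
Fixing the thrift interface error: A packet capture shows that the HBase server responds with an RST packet, which breaks the connection. Starting the server with bin/hbase thrift start -threadpool shows that readTimeout is set to 60s. Testing confirmed that the problem is indeed related to this setting. The option had never been configured; reading the code shows that 60s is the default, which applies whenever nothing is configured. So it is enough to add the setting to conf/hbase-site.xml: property hbase.thrift.server.socket.read.timeout with value 6000000 (unit: milliseconds).
Source: http://blog.csdn.net/wwlhz/article/details/56012053
So I added the parameter, restarted HBase Thrift, and the problem was gone.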
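The setting described above goes into conf/hbase-site.xml as a standard Hadoop-style `<property>` element inside `<configuration>`. As a minimal sketch, the helper below (the name `hbase_property` is invented for this example) renders that element so it can be pasted into the file; the value 6000000 is the one quoted from the blog post, not a recommendation.

```python
import xml.etree.ElementTree as ET


def hbase_property(name: str, value: str) -> str:
    """Render one <property> element in the hbase-site.xml layout."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(prop, encoding="unicode")


# Read timeout in milliseconds, as quoted in the article.
snippet = hbase_property("hbase.thrift.server.socket.read.timeout", "6000000")
print(snippet)
```

The printed element belongs inside the existing `<configuration>` root of hbase-site.xml, and the Thrift server has to be restarted before the change takes effect.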
5. Reading the source code shows:
# https://github.com/apache/hbase/blob/master/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServerRunner.java
...
  public static final String THRIFT_SERVER_SOCKET_READ_TIMEOUT_KEY =
      "hbase.thrift.server.socket.read.timeout";
  public static final int THRIFT_SERVER_SOCKET_READ_TIMEOUT_DEFAULT = 60000;
...
  int readTimeout = conf.getInt(THRIFT_SERVER_SOCKET_READ_TIMEOUT_KEY,
      THRIFT_SERVER_SOCKET_READ_TIMEOUT_DEFAULT);
  TServerTransport serverTransport = new TServerSocket(
      new TServerSocket.ServerSocketTransportArgs().
          bindAddr(new InetSocketAddress(listenAddress, listenPort)).
          backlog(backlog).
          clientTimeout(readTimeout));
Problem solved!
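Because this timeout is expressed in milliseconds, the values that appear in this article are easy to misread. The small conversion below puts them side by side; `describe_ms` is a throwaway helper written for this illustration.

```python
def describe_ms(ms: int) -> str:
    """Format a millisecond value together with its seconds and minutes."""
    seconds = ms / 1000
    return f"{ms} ms = {seconds:g} s = {seconds / 60:g} min"


# Default from the source, value from the startup log, value from the blog post.
for value in (60000, 300000, 6000000):
    print(describe_ms(value))
```

So the source default is one minute, the startup log above reports five minutes, and the value from the blog post allows a client to stay silent for 100 minutes before the server closes the socket.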
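6. But was the problem really solved?
In fact it was not. Some time later I found that during continuous scans the connection was dropped again after a little over 20 minutes. Another painful search showed that this is a bug in this HBase version itself: it treats every connection as idle by default, whether or not it is in use, and then applies the hbase.thrift.connection.max-idletime setting. So I set this option to 31104000 (intended as one year, that is 360 days counted in seconds; it is worth verifying which unit your HBase version expects for this property). On CDH, this should be configured in the management console (see figure).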
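Independently of the server-side settings, a client can be written so that a dropped connection is reopened and the call retried. The following is only a sketch under stated assumptions: `call_with_reconnect` and `FlakyTable` are hypothetical names, `FlakyTable` is a stand-in so the example runs without a cluster, and the exception types caught here (`BrokenPipeError`, `ConnectionResetError`) are an assumption, since the exception that actually surfaces depends on the happybase and Thrift library versions in use.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_reconnect(operation: Callable[[], T],
                        reconnect: Callable[[], None],
                        retries: int = 1) -> T:
    """Run operation; on a dropped connection, reconnect and try again."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return operation()
        except (BrokenPipeError, ConnectionResetError):
            if attempt == retries:
                raise  # out of retries: surface the original error
            reconnect()
    raise AssertionError("unreachable")


class FlakyTable:
    """Hypothetical stand-in for a table whose connection has gone stale."""

    def __init__(self) -> None:
        self.connected = False
        self.rows = {}

    def put(self, row: bytes, data: dict) -> None:
        if not self.connected:
            raise BrokenPipeError(32, "Broken pipe")
        self.rows[row] = data

    def reopen(self) -> None:
        self.connected = True


table = FlakyTable()
call_with_reconnect(lambda: table.put(b"row1", {b"cf:col": b"v"}), table.reopen)
print(table.rows)
```

With happybase, the `reconnect` callable would typically close and reopen the `Connection`. A long scan cannot simply be replayed this way; it would need to resume from the last row key it had seen.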
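General steps when you run into a problem:
If the goal is to improve your skills:
1. Check the logs, find where the error is reported, and roughly locate the problem
2. Read the official documentation
3. Google similar issues, or read the source code to pin down the problem
If the goal is a quick fix:
1. Check the logs, find where the error is reported, and roughly locate the problem
2. Google similar issues
3. Read the official documentation, or read the source code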