千家信息网

An Analysis of an HBase Cluster Outage

Published: 2024-11-16  Author: 千家信息网 editorial staff

This article walks through an HBase cluster outage and how the cluster was recovered, as a reference for anyone who runs into a similar failure.

Version information

  • cdh-6.0.1

  • hadoop-3.0

  • hbase-2.0.0

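Problem

During a quiet period I wanted to restart HBase to free some memory and change a few YARN settings at the same time. After the shutdown, HBase would not come back up. The error was that the hbase:namespace table "is not online", and the master stayed stuck in initialization. The detailed errors: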

15:41:59.313 [ProcExecTimeout] WARN  org.apache.hadoop.hbase.master.assignment.AssignmentManager - STUCK Region-In-Transition rit=OPENING, location=node4,16020,1589648302672, table=real_time_data, region=74cac15d22e99800ad0ace14c9ed74d6
15:41:59.313 [ProcExecTimeout] WARN  org.apache.hadoop.hbase.master.assignment.AssignmentManager - STUCK Region-In-Transition rit=OPENING, location=node3,16020,1596598630022, table=real_time_data, region=8e68891d5826c09974d81ad5d705c3b6
15:41:59.313 [ProcExecTimeout] WARN  org.apache.hadoop.hbase.master.assignment.AssignmentManager - STUCK Region-In-Transition rit=OPENING, location=node3,16020,1596598630022, table=real_time_data, region=75c42d75e2556bf70ff527f2425e8509
15:41:59.313 [ProcExecTimeout] WARN  org.apache.hadoop.hbase.master.assignment.AssignmentManager - STUCK Region-In-Transition rit=OPENING, location=node3,16020,1596598630022, table=real_time_data, region=2eee04869ac2c35984d4d22e6e9f2f31
15:42:08.264 [master/node3:16000] INFO  org.apache.hadoop.hbase.client.RpcRetryingCallerImpl - Call exception, tries=15, retries=15, started=128887 ms ago, cancelled=false, msg=org.apache.hadoop.hbase.NotServingRegionException: hbase:namespace,,1558205786137.40562c48c9210c06813adce48773cb6a. is not online on node1,16020,1596957741742
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3273)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3250)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1414)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2446)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
, details=row 'default' on table 'hbase:namespace' at region=hbase:namespace,,1558205786137.40562c48c9210c06813adce48773cb6a., hostname=node1,16020,1589648239142, seqNum=55
... ...
15:44:58.229 [qtp1792826268-435] WARN  org.eclipse.jetty.servlet.ServletHandler - /master-status
org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
    at org.apache.hadoop.hbase.master.HMaster.isInMaintenanceMode(HMaster.java:2827) ~[hbase-server-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:271) ~[hbase-server-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:389) ~[hbase-server-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:380) ~[hbase-server-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:81) ~[hbase-server-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) ~[javax.servlet-api-3.1.0.jar:3.1.0]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ~[javax.servlet-api-3.1.0.jar:3.1.0]
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) ~[jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) ~[jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:112) ~[hbase-http-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) ~[jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.apache.hadoop.hbase.http.ClickjackingPreventionFilter.doFilter(ClickjackingPreventionFilter.java:48) ~[hbase-http-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) ~[jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1374) ~[hbase-http-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) ~[jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49) ~[hbase-http-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) ~[jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49) ~[hbase-http-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) ~[jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) [jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) [jetty-security-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) [jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.Server.handle(Server.java:534) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) [jetty-io-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) [jetty-io-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) [jetty-io-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) [jetty-util-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) [jetty-util-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) [jetty-util-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) [jetty-util-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) [jetty-util-9.3.19.v20170502.jar:9.3.19.v20170502]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
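When the master log contains many of these warnings, it helps to summarize which regions are stuck and on which servers. The following is a minimal Python sketch, not part of the original troubleshooting; the regular expression is inferred only from the STUCK Region-In-Transition lines shown above, and the function names are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the WARN lines in the master log above (an assumption,
# not an official log format specification).
STUCK_RIT = re.compile(
    r"STUCK Region-In-Transition rit=(?P<state>\w+), "
    r"location=(?P<server>[^,\s]+)(?:,\d+,\d+)?, "
    r"table=(?P<table>[^,\s]+), region=(?P<region>[0-9a-f]{32})"
)

# Two of the warnings from the log above, used as sample input.
SAMPLE_LOG = (
    "15:41:59.313 [ProcExecTimeout] WARN  "
    "org.apache.hadoop.hbase.master.assignment.AssignmentManager - "
    "STUCK Region-In-Transition rit=OPENING, "
    "location=node4,16020,1589648302672, table=real_time_data, "
    "region=74cac15d22e99800ad0ace14c9ed74d6\n"
    "15:41:59.313 [ProcExecTimeout] WARN  "
    "org.apache.hadoop.hbase.master.assignment.AssignmentManager - "
    "STUCK Region-In-Transition rit=OPENING, "
    "location=node3,16020,1596598630022, table=real_time_data, "
    "region=8e68891d5826c09974d81ad5d705c3b6\n"
)


def find_stuck_regions(log_text):
    """Return one dict (state, server, table, region) per stuck-RIT warning."""
    return [m.groupdict() for m in STUCK_RIT.finditer(log_text)]


def count_by_server(entries):
    """Count stuck regions per region server host name."""
    return Counter(e["server"] for e in entries)


if __name__ == "__main__":
    entries = find_stuck_regions(SAMPLE_LOG)
    for entry in entries:
        print(entry["table"], entry["region"], entry["state"], entry["server"])
    print(dict(count_by_server(entries)))
```

Because the pattern is applied with finditer, it also works on log text whose line breaks were lost, as long as the individual fields are intact.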

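The usual approach

At this point I tried the hbck command to inspect the details and repair the cluster, only to find that in HBase 2.0.0 the repair options of hbck are no longer supported.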

-----------------------------------------------------------------------
NOTE: As of HBase version 2.0, the hbck tool is significantly changed.
In general, all Read-Only options are supported and can be be used
safely. Most -fix/ -repair options are NOT supported. Please see usage
below for details on which options are not supported.
-----------------------------------------------------------------------

(output omitted)...
(output omitted)...
(output omitted)...

NOTE: Following options are NOT supported as of HBase version 2.0+.

  UNSUPPORTED Metadata Repair options: (expert features, use with caution!)
   -fix              Try to fix region assignments.  This is for backwards compatiblity
   -fixAssignments   Try to fix region assignments.  Replaces the old -fix
   -fixMeta          Try to fix meta problems.  This assumes HDFS region info is good.
   -fixHdfsHoles     Try to fix region holes in hdfs.
   -fixHdfsOrphans   Try to fix region dirs with no .regioninfo file in hdfs
   -fixTableOrphans  Try to fix table dirs with no .tableinfo file in hdfs (online mode only)
   -fixHdfsOverlaps  Try to fix region overlaps in hdfs.
   -maxMerge <n>     When fixing region overlaps, allow at most <n> regions to merge. (n=5 by default)
   -sidelineBigOverlaps  When fixing region overlaps, allow to sideline big overlaps
   -maxOverlapsToSideline <n>  When fixing region overlaps, allow at most <n> regions to sideline per group. (n=2 by default)
   -fixSplitParents  Try to force offline split parents to be online.
   -removeParents    Try to offline and sideline lingering parents and keep daughter regions.
   -fixEmptyMetaCells  Try to fix hbase:meta entries not referencing any region (empty REGIONINFO_QUALIFIER rows)

  UNSUPPORTED Metadata Repair shortcuts
   -repair           Shortcut for -fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans -fixHdfsOverlaps -fixVersionFile -sidelineBigOverlaps -fixReferenceFiles-fixHFileLinks
   -repairHoles      Shortcut for -fixAssignments -fixMeta -fixHdfsHoles

After some reading I found hbck2 (official repository: https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2). I thought I had found a lifeline, but its documentation says:

===================================================================
HBCK2 Overview

HBCK2 is currently a simple tool that does one thing at a time only.

In hbase-2.x, the Master is the final arbiter of all state, so a general principal for most HBCK2 commands is that it asks the Master to effect all repair. This means a Master must be up before you can run HBCK2 commands.

The HBCK2 implementation approach is to make use of an HbckService hosted on the Master. The Service publishes a few methods for the HBCK2 tool to pull on. Therefore, for HBCK2 commands relying on Master's HbckService facade, first thing HBCK2 does is poke the cluster to ensure the service is available. This will fail if the remote Server does not publish the Service or if the HbckService is lacking the requested method. For the latter case, if you can, update your cluster to obtain more fix facility.

HBCK2 versions should be able to work across multiple hbase-2 releases. It will fail with a complaint if it is unable to run. There is no HbckService in versions of hbase before 2.0.3 and 2.1.1. HBCK2 will not work against these versions.

Next we look first at how you 'find' issues in your running cluster followed by a section on how you 'fix' found problems.
===================================================================

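That was the last straw. HBase 2.0.0 through 2.0.2 and HBase 2.1.0 are left out: on these versions neither the hbck repair options nor hbck2 can be used, so there is a gap in the tooling.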
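The version rule quoted from the HBCK2 overview (no HbckService before 2.0.3 and 2.1.1) can be written down as a small check. This is a sketch in Python under stated assumptions: it looks only at the first three numeric components of the version string, the function names are hypothetical, and vendor builds that backport the service would not be detected this way.

```python
def parse_version(version):
    """Turn '2.0.0' or '2.0.0-cdh6.0.1' into a (major, minor, patch) tuple."""
    core = version.split("-", 1)[0]
    parts = [int(p) for p in core.split(".")[:3]]
    # Pad short versions such as '2.1' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_hbck_service(version):
    """Apply the rule quoted above: HbckService needs 2.0.3+ or 2.1.1+."""
    v = parse_version(version)
    if v < (2, 0, 3):
        return False  # covers 1.x and 2.0.0 through 2.0.2
    if (2, 1, 0) <= v < (2, 1, 1):
        return False  # covers 2.1.0
    return True


if __name__ == "__main__":
    for candidate in ("2.0.0", "2.0.3", "2.1.0", "2.1.1"):
        print(candidate, has_hbck_service(candidate))
```

For the hbase-2.0.0 cluster in this article the check returns False, which matches the gap described above.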

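Solution

1. Repair the master so that the cluster can start

The master cannot finish initializing, so the cluster cannot start. The cause is damaged information in the metadata table hbase:meta, which leaves the hbase:namespace table "not online". The first priority is to bring hbase:namespace online and get the cluster started, because no further repair is possible until then; the remaining tables can be repaired afterwards. (By this point I was close to despair and ready to rebuild the cluster from scratch.)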

Reading the HBase source (TableNamespaceManager.java) shows that the metadata table hbase:namespace is recreated if it does not exist.

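Approach: back up the HDFS data of the hbase:namespace table, delete the table, let it be recreated at startup, and then bulk load the backed-up data into the newly created hbase:namespace table.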

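Delete the hbase:namespace row from hbase:meta, and mv the HDFS directory of the hbase:namespace table to a temporary directory as a backup. Together these two steps amount to deleting the hbase:namespace table.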

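Then restart the HBase cluster. The namespace table is recreated and the cluster finally comes up. At this point hbase:namespace contains only the default namespace. Using the bulkload command, load the HFiles from the temporary directory into hbase:namespace, which restores the namespace table.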
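The article does not list the exact commands, so the following Python sketch only prints a possible command plan for the steps above instead of running anything. Everything specific in it is an assumption to verify on your own cluster: the HBase root directory /hbase, the hypothetical backup directory /tmp/namespace_backup, the use of deleteall in a non-interactive hbase shell, and the bulk load class org.apache.hadoop.hbase.tool.LoadIncrementalHFiles. The hbase:meta row key is the region name taken from the log earlier in this article.

```python
import re

# Region name of hbase:namespace as it appears in the log; it is also the
# row key of that region in hbase:meta.
META_ROW = re.compile(r"^hbase:namespace,,\d+\.(?P<encoded>[0-9a-f]{32})\.$")


def namespace_rebuild_plan(meta_row, root_dir="/hbase",
                           backup_dir="/tmp/namespace_backup"):
    """Build (but do not run) the commands for the namespace rebuild.

    root_dir and backup_dir are assumptions; adjust them to hbase.rootdir
    and to a backup location of your choice.
    """
    match = META_ROW.match(meta_row)
    if match is None:
        raise ValueError("not an hbase:namespace region name: %r" % meta_row)
    encoded = match.group("encoded")
    table_dir = f"{root_dir}/data/hbase/namespace"
    # After the mv below, the region's column family directories live here.
    hfile_dir = f"{backup_dir}/namespace/{encoded}"
    return {
        "before_restart": [
            # Remove the region row so the master no longer expects the table.
            f"echo \"deleteall 'hbase:meta', '{meta_row}'\" | hbase shell -n",
            # Move the table directory aside as a backup.
            f"hdfs dfs -mkdir -p {backup_dir}",
            f"hdfs dfs -mv {table_dir} {backup_dir}/",
        ],
        "after_restart": [
            # Bulk load the backed-up HFiles into the recreated table.
            "hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles "
            f"{hfile_dir} hbase:namespace",
        ],
    }


if __name__ == "__main__":
    plan = namespace_rebuild_plan(
        "hbase:namespace,,1558205786137.40562c48c9210c06813adce48773cb6a."
    )
    for phase, commands in plan.items():
        print(f"# {phase}")
        for command in commands:
            print(command)
```

Printing the plan first makes it possible to review each command by hand before touching hbase:meta or HDFS, which seems prudent given that no repair tool is available on this version.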

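2. Repair the HBase tables

After all that effort the HBase cluster was finally up, but the web UI showed that all the tables were empty: the HDFS data files belonging to each region could not be found.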

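The hbase:meta table stores the region assignment information for every table. The abnormal shutdown most likely damaged hbase:meta, and that damage is presumably why hbase:namespace could not find its assigned region.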

Approach: repair the hbase:meta table from the .regioninfo files.
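The article stops before showing how this repair is carried out. As a preparatory step only, the Python sketch below takes the text output of a recursive HDFS listing (for example from hdfs dfs -ls -R on the data directory) and reports which regions of which tables still have a .regioninfo file. It relies on the standard layout data/namespace/table/encoded-region/.regioninfo; the function name and the sample listing are hypothetical, and the sketch does not read or rewrite hbase:meta.

```python
import re
from collections import defaultdict

# Standard HBase layout: <rootdir>/data/<namespace>/<table>/<region>/.regioninfo
REGIONINFO_PATH = re.compile(
    r"/data/(?P<namespace>[^/]+)/(?P<table>[^/]+)/"
    r"(?P<region>[0-9a-f]{32})/\.regioninfo$"
)

# Made-up listing for illustration; region names are borrowed from the log.
SAMPLE_LISTING = """\
drwxr-xr-x   - hbase hbase          0 2020-08-09 15:30 /hbase/data/default/real_time_data
drwxr-xr-x   - hbase hbase          0 2020-08-09 15:30 /hbase/data/default/real_time_data/74cac15d22e99800ad0ace14c9ed74d6
-rw-r--r--   3 hbase hbase         48 2020-08-09 15:30 /hbase/data/default/real_time_data/74cac15d22e99800ad0ace14c9ed74d6/.regioninfo
-rw-r--r--   3 hbase hbase         48 2020-08-09 15:30 /hbase/data/default/real_time_data/8e68891d5826c09974d81ad5d705c3b6/.regioninfo
"""


def regions_with_regioninfo(ls_output):
    """Map 'namespace:table' to the encoded regions that have a .regioninfo."""
    found = defaultdict(list)
    for line in ls_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The path is the last column of each listing line.
        match = REGIONINFO_PATH.search(fields[-1])
        if match:
            key = f"{match['namespace']}:{match['table']}"
            found[key].append(match["region"])
    return dict(found)


if __name__ == "__main__":
    for table, regions in regions_with_regioninfo(SAMPLE_LISTING).items():
        print(table, len(regions), "regions with .regioninfo")
```

Comparing this inventory with the rows that actually exist in hbase:meta would show which regions are missing from the metadata.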
