一次oplog损坏事故
发表于:2024-11-25 作者:千家信息网编辑
千家信息网最后更新 2024年11月25日,问题1)程序报警Execution TimeoutCouldn't get a connection within the time limit2)mongod日志Jun 11 21:48:35 mo
千家信息网最后更新 2024年11月25日一次oplog损坏事故
问题
1)程序报警
Execution TimeoutCouldn't get a connection within the time limit
2)mongod日志
Jun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.122+0800 I NETWORK [initandlisten] connection accepted from 10.0.0.1:24321 #32 (32 connections now open)Jun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.136+0800 I ACCESS [conn32] Successfully authenticated as principal __system on localJun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.349+0800 I - [rsSync] Assertion: 10334:BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOOJun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.358+0800 I CONTROL [rsSync]Jun 11 21:48:35 mongod mongod: 0x132c032 0x12c9988 0x12b29f8 0x12b2aac 0x9da659 0xae692f 0x106e4dc 0x1066b1e 0x1066d69 0xfd64f5 0xaea0fe 0xaea621 0xebe304 0xf563ae 0xf57c78 0xf4d29b 0x1b5c330 0x7efd30830dc5 0x7efd3055f73dJun 11 21:48:35 mongod mongod: ----- BEGIN BACKTRACE -----Jun 11 21:48:35 mongod mongod: {"backtrace":[{"b":"400000","o":"F2C032","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"EC9988","s":"_ZN5mongo10logContextEPKc"},{"b":"400000","o":"EB29F8","s":"_ZN5mongo11msgassertedEiPKc"},{"b":"400000","o":"EB2AAC"},{"b":"400000","o":"5DA659","s":"_ZNK5mongo7BSONObj14_assertInvalidEv"},{"b":"400000","o":"6E692F","s":"_ZN5mongo10Collection19aboutToDeleteCappedEPNS_16OperationContextERKNS_8RecordIdENS_10RecordDataE"},{"b":"400000","o":"C6E4DC","s":"_ZN5mongo19CappedRecordStoreV111allocRecordEPNS_16OperationContextEib"},{"b":"400000","o":"C66B1E","s":"_ZN5mongo17RecordStoreV1Base13_insertRecordEPNS_16OperationContextEPKcib"},{"b":"400000","o":"C66D69","s":"_ZN5mongo17RecordStoreV1Base12insertRecordEPNS_16OperationContextEPKcib"},{"b":"400000","o":"BD64F5","s":"_ZN5mongo11RecordStore13insertRecordsEPNS_16OperationContextEPSt6vectorINS_6RecordESaIS4_EEb"},{"b":"400000","o":"6EA0FE","s":"_ZN5mongo10Collection16_insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_b"},{"b":"400000","o":"6EA621","s":"_ZN5mongo10Collection15insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_bb"},{"b":"400000","o":"ABE304","s":"_ZN5mongo4repl15writeOpsToOplogEPNS_16OperationContextERKSt6vectorINS_7BSONObjESaIS4_EE"},{"b":"400000","o":"B563AE","s":"_ZN5mongo4repl8SyncTail10multiApplyEPNS_16OperationContextERKNS1_7OpQueueE"},{"b":"400000","o":"B57C78","s":"_ZN5mongo4repl8SyncTail16oplogApplicationEPNS0_16StorageInterfaceE"},{"b":"400000","o":"B4D29B","s":"_ZN5mongo4repl13runSyncThreadEv"},{"b":"400000","o":"175C330","s":"execute_native_thread_routine"},{"b":"7EFD30829000","o":"7DC5"},{"b":"7EFD30468000","o":"F773D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.12", "gitVersion" : "ef3e1bc78e997f0d9f22f45aeb1d8e3b6ac14a14", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-514.6.2.el7.x86_64", "version" : "#1 SMP Thu Feb 23 03:04:39 UTC 2017", "machine" : "x86_64" }, "somap" : [ {Jun 11 21:48:35 mongod mongod: "elfType" : 2, "b" : "400000", "buildId" : "BAAC1A970F6D0F06B88D0DE75BF06E4C260939EC" }, { "b" : "7FFE4D008000", "elfType" : 3, "buildId" : "7112DB1E073B211AB0CF8DE793F40F3FF996F5B4" }, { "b" : "7EFD31753000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "90EAF65D9B0EEEB1424241281F7F197451D4317D" }, { "b" : "7EFD31369000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "7278C69EE161D98DDD0FA00F92B67AD78C7B7F40" }, { "b" : "7EFD31161000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "82E77ADE22BC9FFF8D3458BD37331E7EDF174C28" }, { "b" : "7EFD30F5D000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "C5F560504E1AF52E29679C3B52FF11121015D6BB" }, { "b" : "7EFD30C5B000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "721C7CC9488EFA25F83B48AF713AB27DBE48EF3E" }, { "b" : "7EFD30A45000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "408B46E291B2D4C9612E27C0509D165D7E186D40" }, { "b" : "7EFD30829000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "C3DEB1FA27CD0C1C3CC575B944ABACBA0698B0F2" }, { "b" : "7EFD30468000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "8B2C421716985B927AA0CAF2A05D0B1F452367F7" }, { "b" : "7EFD319C1000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "8F3E366E2DB73C330A3791DEAE31AE9579099B44" }, { "b" : "7EFD3021A000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "A2499C359AA179EE23324ED949C0E508E4434F10" }, { "b" : "7EFD2FF33000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "E09A34D9083DC6FEAF7018C09D55631DEEE2836D" }, { "b" : "7EFD2FD2F000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "BF54B7C8932E450769FBBB8B18864D1DD70BBC67" }, { "b" : "7EFD2FAFD000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "BF8F00D7CB849ADB0B7A4703BC7B8D66AEE6A49C" }, { "b" : "7EFD2F8E7000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "EA8E45DC8E395CC5E26890470112D97A1F1E0B65" }, { "b" : "7EFD2F6D8000", "path" : "/lib64Jun 11 21:48:35 mongod mongod: /libkrb5support.so.0", "elfType" : 3, "buildId" : "1E7A92FDD6FB3871DA97F4BCA2E147E72B6B6E1F" }, { "b" : "7EFD2F4D4000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7EFD2F2BA000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "FE7AE845A123A3DFC0FDC2408BCBC2BA8B61B158" }, { "b" : "7EFD2F093000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "76687CA31A406854DF3BCF8D03055656F56E6892" }, { "b" : "7EFD2EE32000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "AE64AA461A26E01F60408013D361749D56DD0AE1" } ] }}Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x132c032]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10logContextEPKc+0x138) [0x12c9988]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo11msgassertedEiPKc+0x88) [0x12b29f8]Jun 11 21:48:35 mongod mongod: mongod(+0xEB2AAC) [0x12b2aac]Jun 11 21:48:35 mongod mongod: mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x3B9) [0x9da659]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10Collection19aboutToDeleteCappedEPNS_16OperationContextERKNS_8RecordIdENS_10RecordDataE+0xBF) [0xae692f]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo19CappedRecordStoreV111allocRecordEPNS_16OperationContextEib+0x46C) [0x106e4dc]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo17RecordStoreV1Base13_insertRecordEPNS_16OperationContextEPKcib+0x5E) [0x1066b1e]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo17RecordStoreV1Base12insertRecordEPNS_16OperationContextEPKcib+0xA9) [0x1066d69]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo11RecordStore13insertRecordsEPNS_16OperationContextEPSt6vectorINS_6RecordESaIS4_EEb+0xB5) [0xfd64f5]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10Collection16_insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_b+0x16E) [0xaea0fe]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10Collection15insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_bb+0x1B1) [0xaea621]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl15writeOpsToOplogEPNS_16OperationContextERKSt6vectorINS_7BSONObjESaIS4_EE+0x144) [0xebe304]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl8SyncTail10multiApplyEPNS_16OperationContextERKNS1_7OpQueueE+0x98E) [0xf563ae]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl8SyncTail16oplogApplicationEPNS0_16StorageInterfaceE+0xD08) [0xf57c78]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl13runSyncThreadEv+0x2BB) [0xf4d29b]Jun 11 21:48:35 mongod mongod: mongod(execute_native_thread_routine+0x20) [0x1b5c330]Jun 11 21:48:35 mongod mongod: libpthread.so.0(+0x7DC5) [0x7efd30830dc5]Jun 11 21:48:35 mongod mongod: libc.so.6(clone+0x6D) [0x7efd3055f73d]Jun 11 21:48:35 mongod mongod: ----- END BACKTRACE -----Jun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.389+0800 F - [rsSync] terminate() called. An exception is active; attempting to gather more informationJun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.393+0800 F - [rsSync] DBException::toString(): 10334 BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOOJun 11 21:48:35 mongod mongod: Actual exception type: mongo::MsgAssertionExceptionJun 11 21:48:35 mongod mongod: 0x132c032 0x132bb82 0x1b14356 0x1b14383 0xf4d3e0 0x1b5c330 0x7efd30830dc5 0x7efd3055f73dJun 11 21:48:35 mongod mongod: ----- BEGIN BACKTRACE -----
一、集群环境
3个mongos、3个mongo config、3个shard(每个分片各三台,一主两从)
二、事故描述
次数1:某次活动,由于并发太多,导致mongod连接数太多,内存消耗太多(每个连接上来会分配一定的内存空间),导致整体query慢,产生大量堆积(应用报超时等错误)。直接重启mongod,重启前未stepDown,其中一个从节点2发现错误。临时剔除节点,打算后续有时间解决(原因未知)
次数2:此时距离上次事故6天,节点2还未修复。同一个分片主节点1意外出现同样的错误(原因未知)此时只有一个主节点,熬夜紧急修复这两个节点
次数3:距离第二次事故两天,同一个分片主节点3又出现了同样的错误(原因未知)此时一阵冷汗,同一个分片轮番坏了一遍,辛亏解决的及时。
三、解决步骤
1、查看报错信息,感觉像是数据块损坏问题。并且此分片不能做更新操作,但是可以正常查询。
2、问题定位,oplog.rs损坏
> db> local db.oplog.rs.find() error: { "$err" : "BSONObj size: 1852142352 (0x1073656E) is invalid. Size must be between 0 and 16793600(16MB) First element: Status: ?type=100", "code" : 10334 }
3、重建oplog
1)找到最近的一条oplog记录
> db.oplog.rs.find( { }, { ts: 1, h: 1 } ).sort( {$natural : -1} ).limit(1).next(){"ts" : Timestamp(1497172747, 46), "h" : NumberLong("4489544342319430008")}
2)保存记录
> db.temp.save(db.oplog.rs.find( { }, { ts: 1, h: 1 } ).sort( {$natural : -1} ).limit(1).next() )//确认操作,很重要>db.temp.find()
3)删除oplog.rs,物理文件不会删除
> db.oplog.rs.drop()
4)建立新的oplog
> db.runCommand( { create: "oplog.rs", capped: true, size: (50 * 1024 * 1024 * 1024) } ){ "ok" : 0, "errmsg" : "not authorized on local to execute command { create: \"oplog.rs\", capped: true, size: 53687091200.0 }", "code" : 13}//很悲催,重建capped类型的表,显示没权限,而oplog.rs必须要是capped类型,可是已经是root最大权限
将shardsvr角色更改为configsvr角色,去掉keyfile认证(相当于去掉权限),重启服务,再次执行
configsvr> db.runCommand( { create: "oplog.rs", capped: true, size: (50 * 1024 * 1024 * 1024) } );{ "ok" : 1 }
为什么要更改角色呢?
--因为shardsvr角色,去掉keyfile认证,服务启动不了。哭~~~
5)将上次操作的最近一条记录写到oplog中
> db.oplog.rs.save( db.temp.findOne() )
6)确认操作
> db.oplog.rs.find(){"ts" : Timestamp(1497172747, 46), "h" : NumberLong("4489544342319430008")}
7)修改配置文件,更改会原来的配置,重启服务。重新加入到集群中。查看延迟状态
rs:SECONDARY> db.printSlaveReplicationInfo();source: mongod122:10000 syncedTo: Mon Jun 12 2017 03:11:41 GMT+0800 (CST) 0 secs (0 hrs) behind the primary source: mongod121:10000 syncedTo: Sun Jun 11 2017 23:52:12 GMT+0800 (CST) 11969 secs (3.32 hrs) behind the primary
待延时追上后,一切归于正常。
至今,还不知道是什么原因导致的该分片频繁发生此类错误,后续如有进展,再更新。
节点
错误
原因
角色
事故
权限
次数
问题
服务
内存
文件
类型
集群
更新
认证
配置
最大
重要
频繁
意外
数据库的安全要保护哪些东西
数据库安全各自的含义是什么
生产安全数据库录入
数据库的安全性及管理
数据库安全策略包含哪些
海淀数据库安全审计系统
建立农村房屋安全信息数据库
易用的数据库客户端支持安全管理
连接数据库失败ssl安全错误
数据库的锁怎样保障安全
郑州学习软件开发一般要多少钱
建立参展商数据库包括哪些内容
巴适游戏服务器不可用
查重知网的数据库有哪些内容
奇安信网络安全产业学院
网络安全法附则法内容
上位机软件开发教程 c
ssms 连接服务器名称
应届生做单片机软件开发
服务器远程管理界面
成都网络安全培训学院
自学单片机软件开发
如何学习计算机网络技术
攀枝花网络技术要多少钱
蓝科科技服务器
电脑怎么建立新的数据库
oracle数据库设计作业
惠普服务器远程管理接口图片
济南医院软件开发公司
连云港工程软件开发价格大全
网络技术的应用和感想
怎么把数据库断开连接
云南时代网络技术服务推广
大流量服务器攻击网站的方法
东莞法盛互联网科技怎么样
徐州提供网络技术服务电话
棋牌手游软件开发公司
软件开发学习有几个阶段
筑业资料数据库怎么打开
青山定制软件开发价格