一次oplog损坏事故
发表于:2025-01-20 作者:千家信息网编辑
千家信息网最后更新 2025年01月20日,问题1)程序报警Execution TimeoutCouldn't get a connection within the time limit2)mongod日志Jun 11 21:48:35 mo
千家信息网最后更新 2025年01月20日一次oplog损坏事故
问题
1)程序报警
Execution TimeoutCouldn't get a connection within the time limit
2)mongod日志
Jun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.122+0800 I NETWORK [initandlisten] connection accepted from 10.0.0.1:24321 #32 (32 connections now open)Jun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.136+0800 I ACCESS [conn32] Successfully authenticated as principal __system on localJun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.349+0800 I - [rsSync] Assertion: 10334:BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOOJun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.358+0800 I CONTROL [rsSync]Jun 11 21:48:35 mongod mongod: 0x132c032 0x12c9988 0x12b29f8 0x12b2aac 0x9da659 0xae692f 0x106e4dc 0x1066b1e 0x1066d69 0xfd64f5 0xaea0fe 0xaea621 0xebe304 0xf563ae 0xf57c78 0xf4d29b 0x1b5c330 0x7efd30830dc5 0x7efd3055f73dJun 11 21:48:35 mongod mongod: ----- BEGIN BACKTRACE -----Jun 11 21:48:35 mongod mongod: {"backtrace":[{"b":"400000","o":"F2C032","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"EC9988","s":"_ZN5mongo10logContextEPKc"},{"b":"400000","o":"EB29F8","s":"_ZN5mongo11msgassertedEiPKc"},{"b":"400000","o":"EB2AAC"},{"b":"400000","o":"5DA659","s":"_ZNK5mongo7BSONObj14_assertInvalidEv"},{"b":"400000","o":"6E692F","s":"_ZN5mongo10Collection19aboutToDeleteCappedEPNS_16OperationContextERKNS_8RecordIdENS_10RecordDataE"},{"b":"400000","o":"C6E4DC","s":"_ZN5mongo19CappedRecordStoreV111allocRecordEPNS_16OperationContextEib"},{"b":"400000","o":"C66B1E","s":"_ZN5mongo17RecordStoreV1Base13_insertRecordEPNS_16OperationContextEPKcib"},{"b":"400000","o":"C66D69","s":"_ZN5mongo17RecordStoreV1Base12insertRecordEPNS_16OperationContextEPKcib"},{"b":"400000","o":"BD64F5","s":"_ZN5mongo11RecordStore13insertRecordsEPNS_16OperationContextEPSt6vectorINS_6RecordESaIS4_EEb"},{"b":"400000","o":"6EA0FE","s":"_ZN5mongo10Collection16_insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_b"},{"b":"400000","o":"6EA621","s":"_ZN5mongo10Collection15insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_bb"},{"b":"400000","o":"ABE304","s":"_ZN5mongo4repl15writeOpsToOplogEPNS_16OperationContextERKSt6vectorINS_7BSONObjESaIS4_EE"},{"b":"400000","o":"B563AE","s":"_ZN5mongo4repl8SyncTail10multiApplyEPNS_16OperationContextERKNS1_7OpQueueE"},{"b":"400000","o":"B57C78","s":"_ZN5mongo4repl8SyncTail16oplogApplicationEPNS0_16StorageInterfaceE"},{"b":"400000","o":"B4D29B","s":"_ZN5mongo4repl13runSyncThreadEv"},{"b":"400000","o":"175C330","s":"execute_native_thread_routine"},{"b":"7EFD30829000","o":"7DC5"},{"b":"7EFD30468000","o":"F773D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.12", "gitVersion" : "ef3e1bc78e997f0d9f22f45aeb1d8e3b6ac14a14", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-514.6.2.el7.x86_64", "version" : "#1 SMP Thu Feb 23 03:04:39 UTC 2017", "machine" : "x86_64" }, "somap" : [ {Jun 11 21:48:35 mongod mongod: "elfType" : 2, "b" : "400000", "buildId" : "BAAC1A970F6D0F06B88D0DE75BF06E4C260939EC" }, { "b" : "7FFE4D008000", "elfType" : 3, "buildId" : "7112DB1E073B211AB0CF8DE793F40F3FF996F5B4" }, { "b" : "7EFD31753000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "90EAF65D9B0EEEB1424241281F7F197451D4317D" }, { "b" : "7EFD31369000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "7278C69EE161D98DDD0FA00F92B67AD78C7B7F40" }, { "b" : "7EFD31161000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "82E77ADE22BC9FFF8D3458BD37331E7EDF174C28" }, { "b" : "7EFD30F5D000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "C5F560504E1AF52E29679C3B52FF11121015D6BB" }, { "b" : "7EFD30C5B000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "721C7CC9488EFA25F83B48AF713AB27DBE48EF3E" }, { "b" : "7EFD30A45000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "408B46E291B2D4C9612E27C0509D165D7E186D40" }, { "b" : "7EFD30829000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "C3DEB1FA27CD0C1C3CC575B944ABACBA0698B0F2" }, { "b" : "7EFD30468000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "8B2C421716985B927AA0CAF2A05D0B1F452367F7" }, { "b" : "7EFD319C1000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "8F3E366E2DB73C330A3791DEAE31AE9579099B44" }, { "b" : "7EFD3021A000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "A2499C359AA179EE23324ED949C0E508E4434F10" }, { "b" : "7EFD2FF33000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "E09A34D9083DC6FEAF7018C09D55631DEEE2836D" }, { "b" : "7EFD2FD2F000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "BF54B7C8932E450769FBBB8B18864D1DD70BBC67" }, { "b" : "7EFD2FAFD000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "BF8F00D7CB849ADB0B7A4703BC7B8D66AEE6A49C" }, { "b" : "7EFD2F8E7000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "EA8E45DC8E395CC5E26890470112D97A1F1E0B65" }, { "b" : "7EFD2F6D8000", "path" : "/lib64Jun 11 21:48:35 mongod mongod: /libkrb5support.so.0", "elfType" : 3, "buildId" : "1E7A92FDD6FB3871DA97F4BCA2E147E72B6B6E1F" }, { "b" : "7EFD2F4D4000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7EFD2F2BA000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "FE7AE845A123A3DFC0FDC2408BCBC2BA8B61B158" }, { "b" : "7EFD2F093000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "76687CA31A406854DF3BCF8D03055656F56E6892" }, { "b" : "7EFD2EE32000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "AE64AA461A26E01F60408013D361749D56DD0AE1" } ] }}Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x132c032]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10logContextEPKc+0x138) [0x12c9988]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo11msgassertedEiPKc+0x88) [0x12b29f8]Jun 11 21:48:35 mongod mongod: mongod(+0xEB2AAC) [0x12b2aac]Jun 11 21:48:35 mongod mongod: mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x3B9) [0x9da659]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10Collection19aboutToDeleteCappedEPNS_16OperationContextERKNS_8RecordIdENS_10RecordDataE+0xBF) [0xae692f]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo19CappedRecordStoreV111allocRecordEPNS_16OperationContextEib+0x46C) [0x106e4dc]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo17RecordStoreV1Base13_insertRecordEPNS_16OperationContextEPKcib+0x5E) [0x1066b1e]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo17RecordStoreV1Base12insertRecordEPNS_16OperationContextEPKcib+0xA9) [0x1066d69]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo11RecordStore13insertRecordsEPNS_16OperationContextEPSt6vectorINS_6RecordESaIS4_EEb+0xB5) [0xfd64f5]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10Collection16_insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_b+0x16E) [0xaea0fe]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10Collection15insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_bb+0x1B1) [0xaea621]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl15writeOpsToOplogEPNS_16OperationContextERKSt6vectorINS_7BSONObjESaIS4_EE+0x144) [0xebe304]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl8SyncTail10multiApplyEPNS_16OperationContextERKNS1_7OpQueueE+0x98E) [0xf563ae]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl8SyncTail16oplogApplicationEPNS0_16StorageInterfaceE+0xD08) [0xf57c78]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl13runSyncThreadEv+0x2BB) [0xf4d29b]Jun 11 21:48:35 mongod mongod: mongod(execute_native_thread_routine+0x20) [0x1b5c330]Jun 11 21:48:35 mongod mongod: libpthread.so.0(+0x7DC5) [0x7efd30830dc5]Jun 11 21:48:35 mongod mongod: libc.so.6(clone+0x6D) [0x7efd3055f73d]Jun 11 21:48:35 mongod mongod: ----- END BACKTRACE -----Jun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.389+0800 F - [rsSync] terminate() called. An exception is active; attempting to gather more informationJun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.393+0800 F - [rsSync] DBException::toString(): 10334 BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOOJun 11 21:48:35 mongod mongod: Actual exception type: mongo::MsgAssertionExceptionJun 11 21:48:35 mongod mongod: 0x132c032 0x132bb82 0x1b14356 0x1b14383 0xf4d3e0 0x1b5c330 0x7efd30830dc5 0x7efd3055f73dJun 11 21:48:35 mongod mongod: ----- BEGIN BACKTRACE -----
一、集群环境
3个mongos、3个mongo config、3个shard(每个分片各三台,一主两从)
二、事故描述
次数1:某次活动,由于并发太多,导致mongod连接数太多,内存消耗太多(每个连接上来会分配一定的内存空间),导致整体query慢,产生大量堆积(应用报超时等错误)。直接重启mongod,重启前未stepDown,其中一个从节点2发现错误。临时剔除节点,打算后续有时间解决(原因未知)
次数2:此时距离上次事故6天,节点2还未修复。同一个分片主节点1意外出现同样的错误(原因未知)此时只有一个主节点,熬夜紧急修复这两个节点
次数3:距离第二次事故两天,同一个分片主节点3又出现了同样的错误(原因未知)此时一阵冷汗,同一个分片轮番坏了一遍,辛亏解决的及时。
三、解决步骤
1、查看报错信息,感觉像是数据块损坏问题。并且此分片不能做更新操作,但是可以正常查询。
2、问题定位,oplog.rs损坏
> db> local db.oplog.rs.find() error: { "$err" : "BSONObj size: 1852142352 (0x1073656E) is invalid. Size must be between 0 and 16793600(16MB) First element: Status: ?type=100", "code" : 10334 }
3、重建oplog
1)找到最近的一条oplog记录
> db.oplog.rs.find( { }, { ts: 1, h: 1 } ).sort( {$natural : -1} ).limit(1).next(){"ts" : Timestamp(1497172747, 46), "h" : NumberLong("4489544342319430008")}
2)保存记录
> db.temp.save(db.oplog.rs.find( { }, { ts: 1, h: 1 } ).sort( {$natural : -1} ).limit(1).next() )//确认操作,很重要>db.temp.find()
3)删除oplog.rs,物理文件不会删除
> db.oplog.rs.drop()
4)建立新的oplog
> db.runCommand( { create: "oplog.rs", capped: true, size: (50 * 1024 * 1024 * 1024) } ){ "ok" : 0, "errmsg" : "not authorized on local to execute command { create: \"oplog.rs\", capped: true, size: 53687091200.0 }", "code" : 13}//很悲催,重建capped类型的表,显示没权限,而oplog.rs必须要是capped类型,可是已经是root最大权限
将shardsvr角色更改为configsvr角色,去掉keyfile认证(相当于去掉权限),重启服务,再次执行
configsvr> db.runCommand( { create: "oplog.rs", capped: true, size: (50 * 1024 * 1024 * 1024) } );{ "ok" : 1 }
为什么要更改角色呢?
--因为shardsvr角色,去掉keyfile认证,服务启动不了。哭~~~
5)将上次操作的最近一条记录写到oplog中
> db.oplog.rs.save( db.temp.findOne() )
6)确认操作
> db.oplog.rs.find(){"ts" : Timestamp(1497172747, 46), "h" : NumberLong("4489544342319430008")}
7)修改配置文件,更改会原来的配置,重启服务。重新加入到集群中。查看延迟状态
rs:SECONDARY> db.printSlaveReplicationInfo();source: mongod122:10000 syncedTo: Mon Jun 12 2017 03:11:41 GMT+0800 (CST) 0 secs (0 hrs) behind the primary source: mongod121:10000 syncedTo: Sun Jun 11 2017 23:52:12 GMT+0800 (CST) 11969 secs (3.32 hrs) behind the primary
待延时追上后,一切归于正常。
至今,还不知道是什么原因导致的该分片频繁发生此类错误,后续如有进展,再更新。
节点
错误
原因
角色
事故
权限
次数
问题
服务
内存
文件
类型
集群
更新
认证
配置
最大
重要
频繁
意外
数据库的安全要保护哪些东西
数据库安全各自的含义是什么
生产安全数据库录入
数据库的安全性及管理
数据库安全策略包含哪些
海淀数据库安全审计系统
建立农村房屋安全信息数据库
易用的数据库客户端支持安全管理
连接数据库失败ssl安全错误
数据库的锁怎样保障安全
笔记本远程访问连接管理服务器
产权局数据库
港口行业价格数据库
我的世界服务器网易安全区指令
u8数据库恢复出厂设置
人人车 软件开发工资
手机怎样连接服务器
县网信办网络安全特色做法
服务器网络编号是ip地址吗
汽车企业 技术 软件开发
甘肃手机软件开发定制
数据库有左连接为什么还要右连接
灌篮高手没有服务器
上海机架式服务器厂家供应
2019湖南长沙网络安全会
石家庄企业软件开发哪家好
vf工作区能打开数据库吗
局域网服务器没有权限
嘉定区推广网络技术客户至上
软件开发项目如何谈判
无网络怎么更新病毒数据库
网络安全应急演练落实情况
河南软件开发哪家好又便宜
网络安全法治知识
科技互联网理论
网络安全四个级别
上海java软件开发项目
java软件开发哪个质量好
凤城达圆网络技术
服务器行业投资策略