Oracle数据库块的物理损坏与逻辑损坏
笔者认为不能简单的说物理硬盘的损坏导致块的物理损坏,Oracle bug导致块的逻辑损坏。这篇文章我们来详细讨论一下Oracle数据库的物理损坏以及逻辑损坏的概念。下面是一篇METALINK的文章:
Physical and Logical Block Corruptions. All you wanted to know about it. [ID 840978.1] | |||||
修改时间 21-MAY-2012 类型 BULLETIN 状态 PUBLISHED |
In this Document
Purpose |
Scope |
Details |
Physical Block Corruptions |
Fractured Block |
Bad Checksum |
Block Misplaced |
Logical Block Corruptions |
References |
Applies to:
Oracle Server - Enterprise Edition - Version 9.2.0.1 to 11.2.0.3 [Release 9.2 to 11.2]Information in this document applies to any platform.
***Checked for relevance on 21-May-2012***
Purpose
Oracle classifies the Data File Block corruptions as Physical and Logical. This is also referred as intra block corruptions. This document is intended to provide detailed information and errors example about it.
Scope
This document is intended for Database Administrators.
Details
Physical Block Corruptions
This kind of block corruptions are normally reported by Oracle with error ORA-1578 and the detailed corruption description is printed in the alert log.
Corruption Examples are:
- Bad header - the beginning of the block (cache header) is corrupt with invalid values
- The block is Fractured/Incomplete - header and footer of the block do not match
- The block checksum is invalid
- The block is misplaced
- Zeroed out blocks / ORA-8103
Detailed Corruption Description:
Fractured Block
A Fractured block means that the block is incomplete. Information from the block header does not match the block tail.
Corrupt block relative dba: 0x0380e573 (file 14, block 58739)Fractured block found during buffer read
Data in bad block -
type: 6 format: 2 rdba: 0x0380e573
last change scn: 0x0288.8e5a2f78 seq: 0x1 flg: 0x04
consistency value in tail: 0x00780601
check value in block header: 0x8739, computed block checksum: 0x2f00
spare1: 0x0, spare2: 0x0, spare3: 0x0
***
Reread of rdba: 0x0380e573 (file 14, block 58739) found same corrupted data
Bad Checksum
Block Checksums are used to identify if the block was changed by something external to Oracle and after the block was last written by Oracle.
Checksum is calculated by DBWR or direct loader before writing the block to disk and stored in the block header. Every time that the block is read and if db_block_checksum is different than false, Oracle calculates a checksum and compares it to the one stored in the block header. Reference Note 30706.1
Example of a corrupt block due to invalid checksum:
Bad check value found during buffer read
Data in bad block -
type: 6 format: 2 rdba: 0x0380a58f
last change scn: 0x0288.7784c5ee seq: 0x1 flg: 0x06
consistency value in tail: 0xc5ee0601
check value in block header: 0x68a7, computed block checksum: 0x2f00
spare1: 0x0, spare2: 0x0, spare3: 0x0
***
Reread of rdba: 0x0380a58f (file 14, block 42383) found same corrupted data
A value different than zero (0x0) in "computed block checksum" means that the checksum
differs and the result of this comparison is printed.
Block Misplaced
This is when Oracle detected that the content of the block being read belongs to a different block and the checksum is valid:
Corrupt block relative dba: 0x0d805a89 (file 54, block 23177)Bad header found during buffer read
Data in bad block -
type: 6 format: 2 rdba: 0x0d805b08 ----> Block is different than expected 0x0d805a89
last change scn: 0x0692.86dc08e3 seq: 0x1 flg: 0x04
consistency value in tail: 0x08e30601
check value in block header: 0x2a6e, computed block checksum: 0x0
spare1: 0x0, spare2: 0x0, spare3: 0x0
***
Logical Block Corruptions
This is when block contains a valid checksum and the structure below the beginning of the block is corrupt (Block content is corrupt). It may cause different ORA-600 errors.
The detailed corruption description for Logical Corruptions are not normally printed in the alert.log. DBVerify will report what is logically corrupted in the block.
Corruption Examples are:
- row locked by non-existent transaction - ORA-600 [4512],etc
- the amount of space used is not equal to block size
- avsp bad
- etc.
When db_block_checking is enabled, it may produce the internal errors ORA-600 [kddummy_blkchk] or ORA-600 [kdBlkCheckError].
If db_block_checking is enabled and the block is already logically corrupt on disk, the next block update will mark the block as Soft Corrupt and future reads of this block will produce the error ORA-1578. In that case DBVerify reports this corruption with error "DBV-200: Block, dba
References
NOTE:1088018.1 - Master Note for Handling Oracle Database Corruption IssuesNOTE:1578.1 - OERR: ORA-1578 "ORACLE data block corrupted (file # %s, block # %s)" Master Note
NOTE:28814.1 - Handling Oracle Block Corruptions in Oracle7/8/8i/9i/10g/11g
NOTE:794505.1 - ORA-1578 / ORA-26040 Corrupt blocks by NOLOGGING - Error explanation and solution
NOTE:819533.1 - How to identify the corrupt Object reported by ORA-1578 / RMAN / DBVERIFY
NOTE:836658.1 - Identify the corruption extension using RMAN/DBV/ANALYZE etc
一.物理块损坏: 各种各样的块损坏通常是通过ORA-1578报告出来的,详细的信息被记录在alert日志中。物理损坏的例子包括:>坏头>块破坏/不完整 (Fractured Block)>块的checksum不正确 (Bad Checksum)>块的位置错误 (Block Misplaced)>归零的块/ORA-8103
二.逻辑块损坏: 当块包含一个正确的checksum和结构,但是块头一下的部分(块的内容)被损坏,可能引起不同的ORA-600错误。 逻辑损坏详细的损坏信息通常不打印在alert日志中,DBV将报告逻辑损坏的块。逻辑损坏的例子包括:>行被不存在的事务锁定-ORA-600[4512]等>使用的空间大小不等于块的大小>avsp bad>等等 当启用db_block_checking,可能会产生内部错误ORA-600 [kddummy_blkchk] 或者 ORA-600 [kdBlkCheckError]。
三.DB_BLOCK_CHECKSUM和DB_BLOCK_CHECKING
DB_BLOCK_CHECKSUM用于防止物理IO的损坏,默认值是TYPICAL,在DBWR进程写入磁盘的时候会记录数据的CHECKSUM值,将其存储在块头中,下次在读取的时候会重新计算块的CHECKSUM值,与块头进行比对以判断该块是否损坏。如果将其设置为FULL,还会验证内存中的块的CHECKSUM值,避免内存的问题导致块的损坏。 即使将DB_BLOCK_CHECKSUM值设置为FALSE,对于SYSTEM表空间也会进行相关的验证。
DB_BLOCK_CHECKING用于逻辑控制块, 如果db_block_checking是启用,磁盘的块已经被逻辑损坏,下一次更新块将以软损坏标记块,将来对这个块的读将产生ORA-1578错误,在这种情况下报告损坏错误"DBV-200:Block,dba
四.验证工具介绍
对于坏块的验证主要有两个工具DBV和RMAN。
1.DBV工具
DBV是DBVERIFY的缩写,它是执行物理数据结构完整性检查的外部命令行工具,它能用在脱机或联机的数据库,也可以用在备份文件上。因为DBVERYIFY能对脱机数据库运行,完整性检查速度大幅度提高。DBV检查被限制在缓存管理的块(数据块),DBV只用于数据文件,不能用于控制文件或Redo日志文件的检查。
dbv help=y可以看到DBV工具参数详细的帮助信息。
2.RMAN工具
下面是有关RMAN的VALIDATE命令验证逻辑坏块和物理坏块的内容:
How to check for physical and logical database corruption using "backup validate check logical database" command for database on a non-archivelog mode [ID 466875.1] | |||||
修改时间 30-JUN-2011 类型 FAQ 状�?/em> PUBLISHED |
In this Document
Purpose
Questions and Answers
Purpose
How can we use this command for a non-archivelog database so we can use this as opposed to the dbv command ?RMAN> backup validate check logical database;
Questions and Answers
"backup validate check logical database" RMAN command could NOT be used for a non-archivelog database.RMAN> backup validate check logical database;
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_2 channel at 11/06/2007 14:56:31
ORA-19602: cannot backup or copy active file in NOARCHIVELOG mode
The only way to do that for database on no-archive log mode is to perform. above command while the target database in MOUNT status not OPEN
Additional Information:
------------------------------------
- RMAN restore/backup validate with check logical will ensure that there is no logical and also physical corruption.However , DBV would report only physical corruption.
- If the backup validate discovers corrupt blocks, then RMAN updates the V$DATABASE_BLOCK_CORRUPTION view with rows describing the corruptions.
- After corrupt block is repaired the row identifies this block is deleted from the view.
从上面的描述可以看出,DBV只报告物理损坏,validate check logical database可以验证物理损坏和逻辑损坏,执行VALIDATE命令后通过查询V$DATABASE_BLOCK_CORRUPTION视图可以获得坏块的详细信息,块被修复后V$DATABASE_BLOCK_CORRUPTION中相应的记录会被删除。该命令只能运行在归档模式下的数据库,如果要在非归档模式下执行该命令,必须使数据库在MOUNT模式。
执行下面的命令能验证归档日志的逻辑损坏和物理损坏:
RMAN > BACKUP VALIDATE CHECK LOGICAL DATABASE ARCHIVELOG ALL;
执行下面的命令验证数据库物理损坏的块:
RMAN > BACKUP VALIDATE DATABASE|ARCHIVELOG ALL;
没有CHECK LOGICAL关键字的BACKUP命令只会验证物理损坏。