Recovering from an Oracle OCR/VF Disk Group Failure
Recovering the disk group that holds the OCR and the voting files (VF):
Backup preparation (a manual export, run as root):
# /oracle/grid/crs_1/bin/ocrconfig -export ocr_export
# ll ocr_export
-rw------- 1 root root 123903 Jul 22 17:43 ocr_export
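A single export file named ocr_export is easy to overwrite by accident. The following is a minimal sketch, not part of the original procedure, that writes each export to a timestamped file. GRID_HOME matches the path used in this article; BACKUP_DIR and both function names are hypothetical.

```shell
# Sketch (bash): timestamped manual OCR export.
# GRID_HOME is the path used in this article; BACKUP_DIR and the
# function names are illustrative assumptions.
GRID_HOME=${GRID_HOME:-/oracle/grid/crs_1}
BACKUP_DIR=${BACKUP_DIR:-/root/ocr_backup}

# Print the export path for a given timestamp (defaults to the current time).
ocr_export_path() {
    local stamp=${1:-$(date +%Y%m%d_%H%M%S)}
    printf '%s/ocr_export_%s\n' "$BACKUP_DIR" "$stamp"
}

# Run the export as root. Nothing happens unless this function is called.
export_ocr() {
    local target
    target=$(ocr_export_path) || return 1
    mkdir -p "$BACKUP_DIR" || return 1
    "$GRID_HOME/bin/ocrconfig" -export "$target" && ls -l "$target"
}
```

Calling export_ocr as root on a cluster node would create one new file per run, so older exports stay available.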
State of the cluster while it is running normally:
$ ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3240
Available space (kbytes) : 258880
ID : 785902757
Device/File Name : +OCR
Device/File integrity check succeeded
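When the same check is repeated before and after a recovery, it can help to extract single fields from the ocrcheck output instead of comparing it by eye. The sketch below assumes the "label : value" layout shown above; the function names are hypothetical.

```shell
# Sketch (bash): read ocrcheck output from stdin and print one field.
# $1 is the label before the colon, for example "Used space (kbytes)".
ocrcheck_field() {
    awk -F':' -v label="$1" '
        { key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }
        key == label {
            val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            print val
            exit
        }
    '
}

# Succeed only if the integrity check line is present in the output.
ocrcheck_ok() {
    grep -q 'Device/File integrity check succeeded'
}
```

For example, `ocrcheck | ocrcheck_field 'ID'` would print the OCR ID, which can be recorded before the failure and compared after the import.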
$ crsctl query css votedisk
##  STATE    File Universal Id                  File Name          Disk group
 1. ONLINE   17e2d627b8c84fabbffc1f533d533d0d   (/dev/asm-disk2)   [OCR]
 2. ONLINE   a8ef977544d64f0dbffab877ec75e1dd   (/dev/asm-disk3)   [OCR]
 3. ONLINE   2fb3c679b7b64f8abf43887b3fbff81d   (/dev/asm-disk4)   [OCR]
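The number of ONLINE voting files is the value that matters in this listing. A small sketch for counting them, assuming data rows of the form "N. STATE id (path) [diskgroup]" as shown above; the function name is hypothetical.

```shell
# Sketch (bash): count ONLINE voting files in `crsctl query css votedisk`
# output read from stdin. Header and message lines are ignored.
count_online_votedisks() {
    awk '$1 ~ /^[0-9]+\.$/ && $2 == "ONLINE" { n++ } END { print n + 0 }'
}
```

Used as `crsctl query css votedisk | count_online_votedisks`, it prints 0 when the command returns nothing.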
Fault simulation:
Destroy asm-disk2, asm-disk3, and asm-disk4 by overwriting the first 1 MiB of each device with zeros (as root):
# dd if=/dev/zero of=/dev/asm-disk2 bs=1024 count=1024
# dd if=/dev/zero of=/dev/asm-disk3 bs=1024 count=1024
# dd if=/dev/zero of=/dev/asm-disk4 bs=1024 count=1024
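On a test system it can be useful to keep a copy of exactly the region that the dd commands overwrite. This is not part of the original walkthrough; it is a sketch under the assumption that only the first 1 MiB is destroyed, as in the commands above. The function name and the target file are hypothetical.

```shell
# Sketch (bash): copy the first 1 MiB of a device to a file, using the
# same block size and count as the destructive dd in this article.
# $1 = source device or file, $2 = destination file.
save_disk_header() {
    local src=$1 dest=$2
    dd if="$src" of="$dest" bs=1024 count=1024 2>/dev/null
}
```

Writing the saved file back with dd would restore the overwritten bytes; whether ASM then mounts the disk group again is not something this article tests.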
$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
$ ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3240
Available space (kbytes) : 258880
ID : 785902757
Device/File Name : +OCR
Device/File integrity check succeeded
No errors are reported at this point, so restart the cluster. Stop the stack as root on each node (rac1 and rac2):
# /oracle/grid/crs_1/bin/crsctl stop crs
Start it again on each node (the stack now fails to come up):
# /oracle/grid/crs_1/bin/crsctl start crs
Cluster state afterwards (only the OHASD process has started):
# /oracle/grid/crs_1/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
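The difference between the healthy and the failed state is whether every CRS message ends in "is online". A sketch that classifies `crsctl check crs` output on that basis, assuming one CRS message per line; the function name and the three labels are hypothetical.

```shell
# Sketch (bash): classify `crsctl check crs` output read from stdin.
# Prints "healthy" if every CRS-nnnn line ends in "is online",
# "degraded" if at least one does not, "unknown" if no CRS line is seen.
crs_stack_state() {
    awk '
        /^CRS-[0-9]+:/           { total++ }
        /^CRS-[0-9]+:.*is online[ \t\r]*$/ { online++ }
        END {
            if (total == 0)           print "unknown"
            else if (online == total) print "healthy"
            else                      print "degraded"
        }
    '
}
```

With the output above, where only Oracle High Availability Services is online, the function reports a degraded stack.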
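Log information: the cluster alert log is at $ORACLE_HOME/log/<hostname>/alert<hostname>.log, where $ORACLE_HOME is the Grid Infrastructure home.
The CSS log reports the failure, while the CRSD log shows nothing abnormal.
The CSS log shows that the voting files cannot be discovered, so the cause is clear.
Try checking the OCR state with ocrcheck:
$ ocrcheck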
PROT-602: Failed to retrieve data from the cluster registry
PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
ORA-29701: unable to connect to Cluster Synchronization Service
$ crsctl query css votedisk
(no output)
Starting the recovery:
Recovery approach:
First, try restoring directly with an import:
# /oracle/grid/crs_1/bin/ocrconfig -import ocr_export
PROT-1: Failed to initialize ocrconfig
PROC-26: Error while accessing the physical storage
ORA-29701: unable to connect to Cluster Synchronization Service
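As the errors show, ocrconfig -import cannot restore the OCR directly at this point. The OCR disk group no longer exists: dd wiped the disks, and the disk group metadata went with them. A new OCR disk group therefore has to be created by hand before ocrconfig -import can load the OCR into it. Creating a disk group requires a running ASM instance, but ASM cannot start while the cluster is broken. The way around this is to start the cluster in exclusive mode: CRS is not started, but the ASM instance comes up successfully.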
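The approach described here, together with the later steps of this walkthrough, can be summarized as an ordered command list. The sketch below only prints the commands and executes nothing. The default paths and names are the ones used in this article; the function name is hypothetical.

```shell
# Sketch (bash): print the recovery sequence as a dry-run plan.
# $1 = Grid home, $2 = disk group name, $3 = OCR export file.
print_recovery_plan() {
    local grid_home=${1:-/oracle/grid/crs_1}
    local dg=${2:-OCR}
    local export_file=${3:-ocr_export}
    cat <<EOF
1. $grid_home/bin/crsctl start crs -excl -nocrs   # as root, exclusive mode
2. sqlplus / as sysasm   # create disk group $dg with compatible.asm 11.2
3. $grid_home/bin/ocrconfig -import $export_file   # as root
4. $grid_home/bin/crsctl replace votedisk +$dg   # as root
5. $grid_home/bin/crsctl stop crs -f   # leave exclusive mode
6. $grid_home/bin/crsctl start crs   # normal start on every node
EOF
}
```

Printing the plan first makes it easier to review paths and names before any command is run against a damaged cluster.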
Shut down the cluster and start it in exclusive mode:
Because the OHASD process is already running, it first has to be shut down forcibly (as root on each node):
# /oracle/grid/crs_1/bin/crsctl stop has -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
# /oracle/grid/crs_1/bin/crsctl stop has -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac2'
Both nodes hang at this point, and OHASD cannot be shut down forcibly.
The remaining option is to reboot the operating system. Disable cluster autostart before rebooting; otherwise OHASD starts again automatically once the system is back up.
# /oracle/grid/crs_1/bin/crsctl disable has
# /oracle/grid/crs_1/bin/crsctl disable crs
Cluster state after the reboot:
$ crsctl check crs
CRS-4639: Could not contact Oracle High Availability Services
Start the cluster in exclusive mode (as root, on one node only; rac1 here):
# /oracle/grid/crs_1/bin/crsctl start crs -excl -nocrs
The output is as follows:
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2679: Attempting to clean 'ora.asm' on 'rac1'
CRS-2681: Clean of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
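The line that matters in this output is the successful start of ora.asm, because the following steps need the ASM instance. A sketch that checks for it, assuming the CRS-2676 message format shown above; the function name is hypothetical.

```shell
# Sketch (bash): succeed if the start output read from stdin reports a
# successful start of ora.asm on the given node. $1 = node name.
asm_started_on() {
    grep -q "CRS-2676: Start of 'ora.asm' on '$1' succeeded"
}
```

A script could save the output of the exclusive-mode start to a file and stop early if `asm_started_on rac1 < file` fails.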
Connect to the ASM instance:
$ sqlplus / as sysasm
List the current disk groups:
SQL> select name,state from v$asm_diskgroup;
NAME STATE
--- ----
DATA MOUNTED
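Create the disk group:
SQL> create diskgroup ocr normal redundancy disk '/dev/asm-disk2', '/dev/asm-disk3', '/dev/asm-disk4' attribute 'compatible.asm'='11.2.0.0.0';
Diskgroup created.
Note: the attribute 'compatible.asm'='11.2.0.0.0' must be specified. Without it the disk group is created with the default compatibility of 10.1, which is too low for storing the OCR and voting files in ASM, and the attribute has to be raised afterwards.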
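Because the disk list and the compatibility attribute are easy to mistype, the statement can also be generated from arguments. This is a sketch that only builds the SQL text for a normal-redundancy disk group in the form used above; the function name is hypothetical.

```shell
# Sketch (bash): print a CREATE DISKGROUP statement.
# $1 = disk group name, $2 = compatible.asm value, remaining args = disks.
build_create_diskgroup_sql() {
    local name=$1 compat=$2
    shift 2
    local disks="" d
    for d in "$@"; do
        # Quote each disk path and join the paths with ", ".
        disks="${disks:+$disks, }'$d'"
    done
    printf "CREATE DISKGROUP %s NORMAL REDUNDANCY DISK %s ATTRIBUTE 'compatible.asm'='%s';\n" \
        "$name" "$disks" "$compat"
}
```

The printed statement can be reviewed and then pasted into the sqlplus session. Afterwards, the COMPATIBILITY column of v$asm_diskgroup shows the value that was actually applied.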
Check again:
SQL> select name,state from v$asm_diskgroup;
NAME STATE
---- ----
DATA MOUNTED
OCR MOUNTED
Restore the OCR with ocrconfig (as root):
# /oracle/grid/crs_1/bin/ocrconfig -import ocr_export
After the import completes, run ocrcheck:
$ ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3240
Available space (kbytes) : 258880
ID : 785902757
Device/File Name : +OCR
Device/File integrity check succeeded
The OCR has been restored successfully.
Restore the voting files (as root):
# /oracle/grid/crs_1/bin/crsctl replace votedisk +OCR
Successful addition of voting disk 5d0d201e0ab24f66bf24bfd4a88f2f30.
Successful addition of voting disk 9520d9d3ab8d4fefbfe5d05b62dac9cf.
Successful addition of voting disk 361f26ddd0b34feabfeba6a1123533d7.
Successfully replaced voting disk group with +OCR.
CRS-4266: Voting file(s) successfully replaced
Check the voting file information:
$ crsctl query css votedisk
##  STATE    File Universal Id                  File Name          Disk group
 1. ONLINE   5d0d201e0ab24f66bf24bfd4a88f2f30   (/dev/asm-disk2)   [OCR]
 2. ONLINE   9520d9d3ab8d4fefbfe5d05b62dac9cf   (/dev/asm-disk3)   [OCR]
 3. ONLINE   361f26ddd0b34feabfeba6a1123533d7   (/dev/asm-disk4)   [OCR]
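After the replacement, every voting file should sit in the new disk group. A sketch that verifies this from the listing, assuming the disk group appears in square brackets as the last column; the function name is hypothetical.

```shell
# Sketch (bash): succeed only if the votedisk listing read from stdin
# contains at least one voting file and all of them are in disk group $1.
votedisks_all_in_group() {
    awk -v dg="[$1]" '
        $1 ~ /^[0-9]+\.$/ { total++; if ($NF == dg) hits++ }
        END { exit !(total > 0 && hits == total) }
    '
}
```

Used as `crsctl query css votedisk | votedisks_all_in_group OCR`, it fails when the listing is empty or when any voting file is reported in a different disk group.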
Stop the Clusterware stack that is running in exclusive mode (as root):
# /oracle/grid/crs_1/bin/crsctl stop crs -f
Start CRS normally on all nodes (as root on rac1 and rac2):
# /oracle/grid/crs_1/bin/crsctl start crs
The resource status then shows nothing abnormal. Autostart was disabled before the reboot, so run crsctl enable crs on each node if the stack should start at boot again.