Oracle RAC Failover 说明
Oracle RAC 同时具备HA(High Availiablity) 和LB(LoadBalance). 而其高可用性的基础就是Failover(故障转移)。
RAC Failover指集群中任何一个节点的故障都不会影响用户的使用,连接到故障节点的用户会被自动转移到健康节点,从用户感受而言, 是感觉不到这种切换。
一、Oracle RAC 的Failover 可以分为3种:
1. Client-Side Connect time Failover
2. Client-Side TAF
3. Service-Side TAF
注意事项:
不能在listener.ora 文件中设置GLOBAL_NAME, 因为这个参数会禁用Connect-time Failover 和 Transparent Application Failover(TAF)。
1.1、Client-Side Connect Time Failover
1.1.1、Client-Side Connect Time Failover的含义
如果用户端tnsname 中配置了多个地址,用户发起连接请求时,会先尝试连接地址表中的第一个地址,如果这个连接尝试失败,则继续尝试使用第二个地址,直至连接成功或者遍历了所有的地址。
1.1.2、Client-Side Connect Time Failover的特点
只在建立连接那一时刻起作用,也就是说,这种Failover方式只在发起连接时才会去感知节点故障,如果节点没有反应,则自动尝试地址列表中的下一个地址。一旦连接建立之后,节点出现故障都不会做处理,从客户端的表现就是会话断开了,用户程序必须重新建立连接。
启用这种Failover的方法就是在客户端的tnsnames.ora中添加FAILOVER=ON 条目,这个参数默认就是ON,所以即使不添加这个条目,客户端也会获得这种Failover能力。
配置如下:
1.2、Client-Side TAF(Transparent Application Failover
1.2.1、Client-Side TAF(Transparent Application Failover)的含义:
所谓TAF,就是连接建立以后,应用系统运行过程中,如果某个实例发生故障,连接到这个实例上的用户会被自动迁移到其他的健康实例上。
1.2.2、TAF的配置方法:
TAF 的配置也很简单,只需要在客户端的tnsnames.ora中添加FAILOVER_MODE配置项。这个条目有4个子项目需要定义:
A、METHOD: 用户定义何时创建到其实例的连接,有BASIC 和 PRECONNECT 两种可选值
A-1、BASIC: 是指在感知到节点故障时才创建到其他实例的连接。
A-2、PRECONNECT::是在最初建立连接时就同时建立到所有实例的连接,当发生故障时,立刻就可以切换到其他链路上。
两种方法比较: BASIC方式在Failover时会有时间延迟,PRECONNECT方式虽然没有时间延迟,但是建立多个冗余连接会消耗更多资源,两者就是是用时间换资源和用资源换时间的区别。
B、TYPE: 用于定义发生故障时对完成的SQL 语句如何处理,其中有2种类型:session 和select。
这2种方式对于未提交的事务都会自动回滚,区别在于对select 语句的处理,对于select,用户正在执行的select语句会被转移到新的实例上,在新的节点上继续返回后续结果集,而已经返回的记录集则抛弃。
假设用户正在节点1上执行查询,整个结果集共有100条记录,现在已从节点1上返回10条记录,这时节点1宕机,用户连接被转移到节点2上,如果是session模式,则需要重新执行查询语句。
如果是select方式,会从节点2上继续返回剩下的90条记录,而已经从节点1返回的10条记录不会重复返回给用户,对于用户而言,感受不到这种切换。
显然为了实现select 方式,Oracle 必须为每个session保存更多的内容,包括游标,用户上下文等,需要更多的资源也是用资源换时间的方案。
C、DELAY 和 RETRIES: 这2个参数分别代表重试间隔时间和重试次数
1.2.3、配置示例
(TNS端)(该方式虽然配置简单,但是依赖于用户层面做修改,因此实际运用上不方便)
以上就是关于Client-Side TAF的配置方式,虽然该方式也可以较好的实现RAC环境的高可用性,但是他要求在客户端的TNSNAME中配置,假如现在有1000多个用户连接数据库,那么我就需要管理这1000多个用户的tnsname,那么显然这种方式是不合理的。所以这就是以下Server-Side TAF要解决的。
配置方法如下:
配置客户端的tnsname.ora,加入如下内容(节点1为例):
OR11G1 =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.127.88)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.127.90)(PORT = 1521))
(LOAD_BALANCE = yes)
)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = or11g)
(
FAILOVER_MODE=
(TYPE=select)
(METHOD=basic)
(RETRIES=180)
(DELAY=5)
)
)
)
然后再来测试Failover(在节点1上操作)
[[email protected] ~]$ sqlplus system/[email protected]
SQL*Plus: Release 11.2.0.4.0 Production on Fri Apr 12 15:48:19 2019
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
or11g1
SQL>
可以看出,当前登录的实例是or11g1
然后在窗口1中关闭节点1数据库
[[email protected] ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Fri Apr 12 15:52:27 2019
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
or11g1
SQL>
SQL> shutdown abort;
ORACLE instance shut down.
SQL>
然后直接在之前以system登录的窗口中执行之前的之前同样的查询语句
[[email protected] ~]$ sqlplus system/[email protected]
SQL*Plus: Release 11.2.0.4.0 Production on Fri Apr 12 15:48:19 2019
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
or11g1
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
or11g2
SQL>
以上就是TYPE=select的情况,如果将TYPE=session,结果如何?(节点1上tnsname.ora)
将节点1的数据库启动
SQL> shutdown abort;
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 1202556928 bytes
Fixed Size 2252704 bytes
Variable Size 838860896 bytes
Database Buffers 352321536 bytes
Redo Buffers 9121792 bytes
Database mounted.
Database opened.
SQL>
然后再修改节点1的tnsname.ora文件
OR11G1 =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.127.88)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.127.90)(PORT = 1521))
(LOAD_BALANCE = yes)
)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = or11g)
(
FAILOVER_MODE=
(TYPE=session)
(METHOD=basic)
(RETRIES=180)
(DELAY=5)
)
)
)
以system用户登录节点1
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
[[email protected] ~]$
[[email protected] ~]$
[[email protected] ~]$ sqlplus system/[email protected]
SQL*Plus: Release 11.2.0.4.0 Production on Fri Apr 12 16:05:06 2019
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
or11g1
SQL>
接下来在打开节点1窗口以dba身份登录数据库,并关闭数据库
[[email protected] ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Fri Apr 12 16:06:38 2019
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL>
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
or11g1
SQL> shutdown abort
ORACLE instance shut down.
SQL>
再回到以system身份登录节点1数据库的窗口,执行同样的查询,看看结果如何:
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
or11g1
SQL> select instance_name from v$instance;
select instance_name from v$instance
*
ERROR at line 1:
ORA-25408: can not safely replay call
SQL>
结论:
1、当TYPE=session时,用户的select语句正在执行中的时候,如果此时数据库异常关闭,持续返回的结果信息被中断,紧随其后的是报错信息“ORA-25401: can not continue fetches”,在报错的同时该连接已经重新连接到其他可用的数据库实例上。
**************************************************************************************************************************************************************************************************************************************************************************************************
测试对用户查询的影响
a、启动节点1的数据库,然后往wireless_site.CLICKTHROUGHRATE中插入200多W行数据
SQL> select count(*) from wireless_site.CLICKTHROUGHRATE;
COUNT(*)
----------
2384026
SQL>
b、配置tnsname.ora文件
OR11G1 =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.127.88)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.127.90)(PORT = 1521))
(LOAD_BALANCE = yes)
)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = or11g)
(
FAILOVER_MODE=
(TYPE=session)
(METHOD=basic)
(RETRIES=180)
(DELAY=5)
)
)
)
c、执行查询语句:
select siteid from wireless_site.CLICKTHROUGHRATE;
[[email protected] ~]$ sqlplus system/[email protected]
SQL*Plus: Release 11.2.0.4.0 Production on Fri Apr 12 16:36:19 2019
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
or11g1
SQL>select siteid from wireless_site.CLICKTHROUGHRATE;
d、在执行查询中,关闭节点1数据库,看看会发生什么
[[email protected] ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Fri Apr 12 16:37:42 2019
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> shutdown abort;
ORACLE instance shut down.
SQL>
再看看查询会有什么变化
SITEID
----------------
ST001
ST001
ST001
ST001
ST001
ST001
ST001
ST001
ST001
ST001
ST001
SITEID
----------------
ST001
ST001
ST001
ST001
ST001
ST001
ST001
ST001
ST001
ST001
ST001
SITEID
----------------
ST001
ST001
ST001
ST001
ST001
ST001
ST001
ST001
ST001
ERROR:
ORA-25401: can not continue fetches
19215 rows selected.
SQL>
表wireless_site.CLICKTHROUGHRATE中2384026条数据,在执行查询的时候,我突然将节点1数据库关闭,此时数据查询窗口会中断并收到一个报错信息ORA-25401:: can not continue fetches
e、现在将tnsname.ora中的配置改为如下:
OR11G1 =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.127.88)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.127.90)(PORT = 1521))
(LOAD_BALANCE = yes)
)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = or11g)
(
FAILOVER_MODE=
(TYPE=select)
(METHOD=basic)
(RETRIES=180)
(DELAY=5)
)
)
)
f、启动节点1数据库,并执行查询
SQL> shutdown abort;
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 1202556928 bytes
Fixed Size 2252704 bytes
Variable Size 838860896 bytes
Database Buffers 352321536 bytes
Redo Buffers 9121792 bytes
Database mounted.
Database opened.
SQL>
执行查询中关闭数据库
[[email protected] ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Fri Apr 12 16:47:20 2019
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
or11g1
SQL> shutdown abort;
ORACLE instance shut down.
SQL>
当执行到34080行时,将数据库异常关闭,此时sql界面会短暂停顿(大概3-5s),然后继续执行,直到执行完毕。
ST001
SITEID
----------------
ST001
ST001
ST001
ST001
ST001
ST001
ST001
2384026 rows selected.
SQL>