Prerequisites:
1) This setup uses two test nodes, node1.amd5.cn and node2.amd5.cn, whose IP addresses are 172.16.100.11 and 172.16.100.12 respectively;
2) The cluster service is Apache's httpd service;
3) The address used to provide the web service is 172.16.100.1;
4) The operating system is RHEL 5.8.
1. Preparation
To configure a Linux host as an HA node, the following preparation is usually needed:
1) Hostname resolution must work for every node, and each node's hostname must match the output of "uname -n". Make sure, therefore, that /etc/hosts on both nodes contains the following entries:
172.16.100.11 node1.amd5.cn node1
172.16.100.12 node2.amd5.cn node2
To keep these hostnames across reboots, also run commands like the following on the respective nodes:
Node1:
# sed -i 's@\(HOSTNAME=\).*@\1node1.amd5.cn@g' /etc/sysconfig/network
# hostname node1.amd5.cn
Node2:
# sed -i 's@\(HOSTNAME=\).*@\1node2.amd5.cn@g' /etc/sysconfig/network
# hostname node2.amd5.cn
2) Set up key-based ssh communication between the two nodes, which can be done with commands like the following:
Node1:
# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2
Node2:
# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
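A quick sanity check, assuming the /etc/hosts entries above are in place: the following should run without a password prompt and print the peer's hostname; comparing the two date outputs also confirms the clocks are roughly in sync, which matters when reading cluster logs.
Node1:
# ssh node2 'uname -n'
# date; ssh node2 'date'
Node2:
# ssh node1 'uname -n'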
2. Install the following RPM packages:
libibverbs, librdmacm, lm_sensors, libtool-ltdl, openhpi-libs, openhpi, perl-TimeDate
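Assuming the nodes have a working yum repository (RHEL base plus the Cluster channel), these dependencies can be installed in one step on each node:
# yum -y install libibverbs librdmacm lm_sensors libtool-ltdl openhpi-libs openhpi perl-TimeDate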
3. Install corosync and pacemaker. First download the required packages listed below to a dedicated local directory (here /root/cluster):
cluster-glue
cluster-glue-libs
heartbeat
resource-agents
corosync
heartbeat-libs
pacemaker
corosynclib
libesmtp
pacemaker-libs
Download location: http://clusterlabs.org/. Choose the packages that match your hardware platform and operating system; using the latest available version of each package is recommended.
Install them with the following commands:
# cd /root/cluster
# yum -y --nogpgcheck localinstall *.rpm
4. Configure corosync (the following commands are run on node1.amd5.cn):
# cd /etc/corosync
# cp corosync.conf.example corosync.conf
Then edit corosync.conf and add the following:
service {
  ver: 0
  name: pacemaker
  # use_mgmtd: yes
}
aisexec {
  user: root
  group: root
}
Also set bindnetaddr in this file to the network address of the network your NIC is on. Our two nodes are on the 172.16.0.0 network, so we set it to 172.16.0.0, as follows:
bindnetaddr: 172.16.0.0
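For orientation, the totem section of corosync.conf then looks roughly like the sketch below. The mcastaddr and mcastport values are the defaults from corosync.conf.example and are only illustrative; secauth is shown switched on so that the authkey generated in the next step is actually used.
totem {
  version: 2
  secauth: on
  threads: 0
  interface {
    ringnumber: 0
    bindnetaddr: 172.16.0.0
    mcastaddr: 226.94.1.1
    mcastport: 5405
  }
}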
Generate the authentication key used for communication between the nodes:
# corosync-keygen
Copy corosync.conf and authkey to node2:
# scp -p corosync.conf authkey node2:/etc/corosync/
Create the directory that will hold the corosync logs on each of the two nodes:
# mkdir /var/log/cluster
# ssh node2 'mkdir /var/log/cluster'
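If you want corosync to write its own log file under /var/log/cluster in addition to syslog, the logging section of corosync.conf can be adjusted along the following lines. This is only a sketch; the checks below grep /var/log/messages, so to_syslog is left enabled.
logging {
  fileline: off
  to_stderr: no
  to_logfile: yes
  logfile: /var/log/cluster/corosync.log
  to_syslog: yes
  timestamp: on
}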
5. Try starting the service (run the following command on node1):
# /etc/init.d/corosync start
Check whether the corosync engine started properly:
# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/messages
Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1397.
Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Check whether the initial membership notifications were sent out properly:
# grep TOTEM /var/log/messages
Jun 14 19:03:49 node1 corosync[5120]: [TOTEM ] Initializing transport (UDP/IP).
Jun 14 19:03:49 node1 corosync[5120]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jun 14 19:03:50 node1 corosync[5120]: [TOTEM ] The network interface [172.16.100.11] is now up.
Jun 14 19:03:50 node1 corosync[5120]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Check whether any errors occurred during startup:
# grep ERROR: /var/log/messages | grep -v unpack_resources
Check whether pacemaker started properly:
# grep pcmk_startup /var/log/messages
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: CRM: Initialized
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] Logging: Initialized pcmk_startup
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Service: 9
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Local hostname: node1.amd5.cn
If all of the commands above completed without problems, you can start corosync on node2 with the following command:
# ssh node2 -- /etc/init.d/corosync start
Note: node2 should be started from node1 with the command above; do not start corosync directly on node2.
Check the startup state of the cluster nodes with:
# crm status
============
Last updated: Tue Jun 14 19:07:06 2011
Stack: openais
Current DC: node1.amd5.cn - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
0 Resources configured.
============
Online: [ node1.amd5.cn node2.amd5.cn ]
The output above shows that both nodes have started correctly and that the cluster is in a normal working state.
Running ps auxf shows the processes started by corosync:
root 4665 0.4 0.8 86736 4244 ? Ssl 17:00 0:04 corosync
root 4673 0.0 0.4 11720 2260 ? S 17:00 0:00 \_ /usr/lib/heartbeat/stonithd
101 4674 0.0 0.7 12628 4100 ? S 17:00 0:00 \_ /usr/lib/heartbeat/cib
root 4675 0.0 0.3 6392 1852 ? S 17:00 0:00 \_ /usr/lib/heartbeat/lrmd
101 4676 0.0 0.4 12056 2528 ? S 17:00 0:00 \_ /usr/lib/heartbeat/attrd
101 4677 0.0 0.5 8692 2784 ? S 17:00 0:00 \_ /usr/lib/heartbeat/pengine
101 4678 0.0 0.5 12136 3012 ? S 17:00 0:00 \_ /usr/lib/heartbeat/crmd
6. Configure a cluster property: disable stonith
corosync enables stonith by default, but this cluster has no stonith device, so the default configuration is not yet usable. This can be verified with the following command:
# crm_verify -L
crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
-V may provide more details
For now we can disable stonith with the following command:
# crm configure property stonith-enabled=false
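After this change, re-running the check from above should no longer report the STONITH errors:
# crm_verify -L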
View the current configuration with:
# crm configure show
node node1.amd5.cn
node node2.amd5.cn
property $id="cib-bootstrap-options" \
  dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
  cluster-infrastructure="openais" \
  expected-quorum-votes="2" \
  stonith-enabled="false"
This shows that stonith has been disabled.
The crm and crm_verify commands used above are part of the command-line cluster management tools shipped with pacemaker 1.0 and later; they can be run on any node in the cluster.
7. Add resources to the cluster
corosync supports resource agents of the heartbeat, LSB, OCF and other classes; the most commonly used today are LSB and OCF, while the stonith class is dedicated to configuring stonith devices.
The resource agent classes supported by the current cluster can be listed with:
# crm ra classes
heartbeat
lsb
ocf / heartbeat pacemaker
stonith
To list all resource agents within a given class, use commands like the following:
# crm ra list lsb
# crm ra list ocf heartbeat
# crm ra list ocf pacemaker
# crm ra list stonith
Detailed information about a particular resource agent can be shown with:
# crm ra info [<class>:[<provider>:]]<resource_agent>
For example:
# crm ra info ocf:heartbeat:IPaddr
8. Create an IP address resource for the web cluster we are building, to be used when the web service is provided through the cluster. This can be done as follows:
Syntax:
primitive <rsc> [<class>:[<provider>:]]<type>
[params attr_list]
[operations id_spec]
[op op_type [<attribute>=<value>...] ...]
op_type :: start | stop | monitor
Example:
primitive apcfence stonith:apcsmart \
params ttydev=/dev/ttyS0 hostlist="node1 node2" \
op start timeout=60s \
op monitor interval=30m timeout=60s
Applied to our cluster:
# crm configure primitive WebIP ocf:heartbeat:IPaddr params ip=172.16.100.1
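As a side note, had we wanted an explicit netmask and a recurring health check, the same primitive could have been defined along the following lines instead (a sketch only; the nic, cidr_netmask and monitor values are illustrative and not part of the steps above):
# crm configure primitive WebIP ocf:heartbeat:IPaddr \
    params ip=172.16.100.1 nic=eth0 cidr_netmask=16 \
    op monitor interval=30s timeout=20s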
The output of the following command shows that this resource has been started on node1.amd5.cn:
# crm status
============
Last updated: Tue Jun 14 19:31:05 2011
Stack: openais
Current DC: node1.amd5.cn - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.amd5.cn node2.amd5.cn ]
WebIP (ocf::heartbeat:IPaddr): Started node1.amd5.cn
You can also run ifconfig on node1 and see that the address is up as an alias on eth0:
# ifconfig
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:AA:DD:CF
inet addr:172.16.100.1 Bcast:172.16.100.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:67 Base address:0x2000
Next, from node2, stop the corosync service on node1 with the following command:
# ssh node1 -- /etc/init.d/corosync stop
Check the cluster's working state:
# crm status
============
Last updated: Tue Jun 14 19:37:23 2011
Stack: openais
Current DC: node2.amd5.cn - partition WITHOUT quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node2.amd5.cn ]
OFFLINE: [ node1.amd5.cn ]
The output above shows that node1.amd5.cn is offline, yet the WebIP resource was not started on node2.amd5.cn. The reason is that the cluster is now in the "WITHOUT quorum" state: quorum has been lost, so the cluster itself no longer meets the conditions for normal operation, which makes little sense for a two-node cluster. We can therefore tell the cluster to ignore the quorum check with the following command:
# crm configure property no-quorum-policy=ignore
A moment later, the cluster starts the resource on node2, the node that is still running, as shown below:
# crm status
============
Last updated: Tue Jun 14 19:43:42 2011
Stack: openais
Current DC: node2.amd5.cn - partition WITHOUT quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node2.amd5.cn ]
OFFLINE: [ node1.amd5.cn ]
WebIP (ocf::heartbeat:IPaddr): Started node2.amd5.cn
With that verified, we start node1.amd5.cn again normally:
# ssh node1 -- /etc/init.d/corosync start
Once node1.amd5.cn is back up, the WebIP resource will most likely migrate from node2.amd5.cn back to node1.amd5.cn. Every such move between nodes leaves the resource unreachable for a period of time, so we sometimes want a resource that has failed over because of a node failure to stay where it is, even after the original node recovers. This can be achieved by defining resource stickiness, which can be specified either when the resource is created or afterwards.
Resource stickiness values and their effect:
0: the default. The resource is placed in the most suitable location in the system, which means it is moved whenever a "better" or less loaded node becomes available. This is essentially equivalent to automatic failback, except that the resource may move to a node other than the one it was previously active on;
greater than 0: the resource prefers to stay where it is, but will move if a more suitable node becomes available. The higher the value, the stronger the preference for staying put;
less than 0: the resource prefers to move away from its current location. The higher the absolute value, the stronger the preference for leaving;
INFINITY: unless the resource is forced off because the node can no longer run it (node shutdown, node standby, migration-threshold reached, or configuration change), it always stays where it is. This is almost equivalent to completely disabling automatic failback;
-INFINITY: the resource always moves away from its current location;
Here we set a default stickiness value for all resources as follows:
# crm configure rsc_defaults resource-stickiness=100
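Stickiness can also be set per resource through its meta attributes; a sketch, not required for this setup (it would apply only to WebIP):
# crm resource meta WebIP set resource-stickiness 100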
9. Building on the IP address resource configured above, turn this cluster into an active/passive web (httpd) service cluster
To use this cluster as a web (httpd) server cluster, we first need to install httpd on each node and configure it to serve a local test page.
Node1:
# yum -y install httpd
# echo "<h1>Node1.amd5.cn</h1>" > /var/www/html/index.html
Node2:
# yum -y install httpd
# echo "<h1>Node2.amd5.cn</h1>" > /var/www/html/index.html
Then start httpd by hand on each node and confirm that it serves the page correctly. Afterwards, stop httpd with the commands below and make sure it will not start automatically at boot (run once on each node):
# /etc/init.d/httpd stop
# chkconfig httpd off
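The manual check mentioned above can look like this on each node (run it before the stop/chkconfig step; curl is assumed to be installed, and the output shown is the test page written earlier):
# service httpd start
# curl http://localhost
<h1>Node1.amd5.cn</h1>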
Next we add the httpd service as a cluster resource. Two resource agent classes are available for this, lsb and ocf:heartbeat; for simplicity we use the lsb class here.
First, the metadata of the lsb httpd resource agent can be viewed with:
# crm ra info lsb:httpd
lsb:httpd
Apache is a World Wide Web server. It is used to serve HTML files and CGI.
Operations' defaults (advisory minimum):
start timeout=15
stop timeout=15
status timeout=15
restart timeout=15
force-reload timeout=15
monitor interval=15 timeout=15 start-delay=15
Next, create the WebSite resource:
# crm configure primitive WebSite lsb:httpd
View the definitions now present in the configuration:
# crm configure show
node node1.amd5.cn
node node2.amd5.cn
primitive WebIP ocf:heartbeat:IPaddr \
  params ip="172.16.100.1"
primitive WebSite lsb:httpd
property $id="cib-bootstrap-options" \
  dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
  cluster-infrastructure="openais" \
  expected-quorum-votes="2" \
  stonith-enabled="false" \
  no-quorum-policy="ignore"
Check the running state of the resources:
# crm status
============
Last updated: Tue Jun 14 19:57:31 2011
Stack: openais
Current DC: node2.amd5.cn - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ node1.amd5.cn node2.amd5.cn ]
WebIP (ocf::heartbeat:IPaddr): Started node1.amd5.cn
WebSite (lsb:httpd): Started node2.amd5.cn
The output above shows that WebIP and WebSite may well run on two different nodes, which does not work for an application that serves the web through this IP: the two resources must run on the same node.
This shows that even when the cluster owns all the resources it needs, it may still not handle them correctly. Resource constraints are used to specify on which cluster nodes resources run, in what order they are started, and which other resources a given resource depends on. Pacemaker provides three kinds of resource constraints:
1) Resource Location: defines on which nodes a resource may, may not, or should preferably run;
2) Resource Colocation: defines whether cluster resources may or may not run together on a node;
3) Resource Order: defines the order in which cluster resources are started on a node;
When defining constraints you also have to assign scores. Scores of all kinds are an essential part of how the cluster works: practically everything, from migrating resources to deciding which resources to stop in a degraded cluster, is done by manipulating scores in some way. Scores are calculated per resource, and any node whose score for a resource is negative cannot run that resource. After the scores are computed, the cluster chooses the node with the highest score. INFINITY is currently defined as 1,000,000. Adding and subtracting infinity follows three basic rules:
1) any value + INFINITY = INFINITY
2) any value - INFINITY = -INFINITY
3) INFINITY - INFINITY = -INFINITY
When defining resource constraints you can also give each constraint a score. The score represents the value assigned to that constraint: constraints with higher scores are applied before those with lower scores. By creating several location constraints with different scores for a given resource, you can control the order of nodes the resource fails over to.
Consequently, the problem described above, WebIP and WebSite possibly running on different nodes, can be solved with the following command:
# crm configure colocation website-with-ip INFINITY: WebSite WebIP
Next, we also have to make sure that WebIP is started before WebSite on a node, which can be done with:
# crm configure order httpd-after-ip mandatory: WebIP WebSite
In addition, since an HA cluster does not by itself require every node to have the same or similar performance, we may sometimes want the service, under normal conditions, to run on a particular, more powerful node. That can be expressed with a location constraint:
# crm configure location prefer-node1 WebSite rule 200: #uname eq node1.amd5.cn
This command makes WebSite prefer node1.amd5.cn with a score of 200.
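As an aside, an alternative to the colocation-plus-order pair above is to put the two resources into a group, since group members are implicitly colocated and started in the listed order. A sketch only; in this setup we keep the explicit constraints:
# crm configure group WebService WebIP WebSite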
Additional background:
A multicast address identifies a group of hosts that have joined a multicast group. On Ethernet, a multicast address is a 48-bit identifier naming the set of stations on that network that should receive a given frame. In IPv4, multicast addresses were historically called class D addresses, a class of IP addresses ranging from 224.0.0.0 to 239.255.255.255, or equivalently 224.0.0.0/4. In IPv6, multicast addresses have the prefix ff00::/8.
On Ethernet, a multicast address is any address whose first byte has its lowest-order bit set to 1, for example 01-12-0f-00-00-02. The broadcast address, all ones in all 48 bits, is also a multicast address, but it is a special case of one, much as a square is a rectangle with properties an ordinary rectangle does not have.
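On a running node you can see the multicast group corosync has joined with either of the following commands (the group shown will match the mcastaddr setting in corosync.conf):
# netstat -gn
# ip maddr show eth0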
colocation (collocation)
This constraint expresses the placement relation between two or more resources. If there are more than two resources, then the constraint is called a resource set. Collocation resource sets have an extra attribute to allow for sets of resources which don’t depend on each other in terms of state. The shell syntax for such sets is to put resources in parentheses.
Usage:
colocation <id> <score>: <rsc>[:<role>] <rsc>[:<role>] ...
Example:
colocation dummy_and_apache -inf: apache dummy
colocation c1 inf: A ( B C )
order
This constraint expresses the order of actions on two or more resources. If there are more than two resources, then the constraint is called a resource set. Ordered resource sets have an extra attribute to allow for sets of resources whose actions may run in parallel. The shell syntax for such sets is to put resources in parentheses.
Usage:
order <id> score-type: <rsc>[:<action>] <rsc>[:<action>] ...
[symmetrical=<bool>]
score-type :: advisory | mandatory | <score>
Example:
order c_apache_1 mandatory: apache:start ip_1
order o1 inf: A ( B C )
property
Set the cluster (crm_config) options.
Usage:
property [$id=<set_id>] <option>=<value> [<option>=<value> ...]
Example:
property stonith-enabled=true
rsc_defaults
Set defaults for the resource meta attributes.
Usage:
rsc_defaults [$id=<set_id>] <option>=<value> [<option>=<value> ...]
Example:
rsc_defaults failure-timeout=3m
Shadow CIB usage
Shadow CIBs are a newer feature. A shadow CIB can be manipulated in the same way as the live CIB, but the changes have no effect on the cluster resources: nothing takes effect before the configure commit command.
crm(live)configure# cib new test-2
INFO: test-2 shadow CIB created
crm(test-2)configure# commit
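A slightly fuller workflow, as a sketch (command names are those of the crm shell's cib level; check "help cib" on your version): create a shadow copy, make changes there, then either apply the shadow to the live CIB or throw it away.
crm(live)configure# cib new test-2
INFO: test-2 shadow CIB created
crm(test-2)configure# ...                  (edit the configuration here; the live cluster is untouched)
crm(test-2)configure# cib use live
crm(live)configure# cib commit test-2      (apply the shadow to the live CIB)
crm(live)configure# cib delete test-2      (or discard the shadow without applying it)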
Global Cluster Options
no-quorum-policy
ignore
The quorum state does not influence cluster behavior at all; resource management continues.
freeze
If quorum is lost, the cluster freezes. Resource management is continued: running resources are not stopped (but possibly restarted in response to monitor events), but no further resources are started within the affected partition.
stop (default value)
If quorum is lost, all resources in the affected cluster partition are stopped in an orderly fashion.
suicide
Fence all nodes in the affected cluster partition.
stonith-enabled
This global option defines whether to apply fencing, allowing STONITH devices to shoot failed nodes and nodes with resources that cannot be stopped.
Supported Resource Agent Classes
Legacy Heartbeat 1 Resource Agents
Linux Standards Base (LSB) Scripts
Open Cluster Framework (OCF) Resource Agents
STONITH Resource Agents
Types of Resources
Primitives
Groups
Clones
Masters
Resource Options (Meta Attributes)
Resource stickiness:
resource-stickiness
>0: the resource prefers to stay on its current node
<0: the resource prefers to leave its current node
=0: the HA stack decides where the resource should run
INFINITY: the resource always stays on its current node
-INFINITY: the resource always moves away from its current node
Constraints:
location (where a resource runs)
order (the sequence in which resources start)
colocation (which resources run together)
Partitioned cluster: votes, quorum
Reference example: a shadow-CIB session from an active/active configuration (cloned IP, DRBD and GFS2), kept here for comparison:
# crm
crm(live)# cib new active
INFO: active shadow CIB created
crm(active)# configure clone WebIP ClusterIP \
  meta globally-unique="true" clone-max="2" clone-node-max="2"
crm(active)# configure show
node pcmk-1
node pcmk-2
primitive WebData ocf:linbit:drbd \
params drbd_resource="wwwdata" \
op monitor interval="60s"
primitive WebFS ocf:heartbeat:Filesystem \
params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="gfs2"
primitive WebSite ocf:heartbeat:apache \
params configfile="/etc/httpd/conf/httpd.conf" \
op monitor interval="1min"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="192.168.122.101" cidr_netmask="32" clusterip_hash="sourceip" \
op monitor interval="30s"
ms WebDataClone WebData \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
clone WebIP ClusterIP \
meta globally-unique="true" clone-max="2" clone-node-max="2"
colocation WebSite-with-WebFS inf: WebSite WebFS
colocation fs_on_drbd inf: WebFS WebDataClone:Master
colocation website-with-ip inf: WebSite WebIP
order WebFS-after-WebData inf: WebDataClone:promote WebFS:start
order WebSite-after-WebFS inf: WebFS WebSite
order apache-after-ip inf: WebIP WebSite
property $id="cib-bootstrap-options" \
dc-versi \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"


