High-Availability Cluster Principles Explained
Resource stickiness:
Resource constraints (constraint):
Colocation constraint (colocation):
whether resources may run on the same node
score:
positive value: the resources may run together
negative value: the resources must not run together
Location constraint (location), expressed as a score:
positive value: the resource prefers this node
negative value: the resource prefers to stay away from this node
Order constraint (order):
defines the order in which resources are started or stopped
e.g. vip, ipvs
ipvs --> vip
-inf: negative infinity
inf: positive infinity
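A minimal sketch of what these three constraint types and scores look like in the pacemaker crm shell (assumption: a pacemaker-based stack; the resource names webip and webserver and the node name are made up for illustration):
crm configure location webserver-prefers-node1 webserver 100: node1.magedu.com
crm configure colocation webserver-with-ip inf: webserver webip
crm configure order ip-before-webserver inf: webip webserver
A score of inf makes a colocation or order constraint mandatory; -inf in a location or colocation constraint forbids that placement outright.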
Resource fencing (isolation):
node level: STONITH
resource level:
for example, an FC SAN switch can deny a particular node access at the storage-resource level
STONITH:
split-brain: when cluster nodes can no longer reliably obtain each other's state information, the cluster splits into independent partitions
one possible consequence: the partitions fight over the shared storage
active/active: high availability
High-Availability Cluster Principles: Shared Storage
IDE (ATA): about 133 MB/s
SATA: up to 600 MB/s
7200 rpm
IOPS: roughly 100
SCSI: up to 320 MB/s
SAS:
15000 rpm
IOPS: roughly 200
USB 3.0: roughly 400 MB/s
Mechanical (rotating) disks:
random read/write
sequential read/write
Solid-state disks:
IDE, SCSI: parallel interfaces
SATA, SAS, USB: serial interfaces
DAS:
Direct Attached Storage
attached directly to the host's bus (on the mainboard)
data is exchanged at the block level
NAS:
Network Attached Storage
a file server: file-level access
SAN:
Storage Area Network
FC SAN
IP SAN: iSCSI
SCSI: Small Computer System Interface
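As a quick illustration of an IP SAN, this is roughly how an initiator discovers and logs in to an iSCSI target with the standard open-iscsi tools (a sketch; the portal address reuses the NFS server IP from later in these notes and the IQN is a made-up example):
iscsiadm -m discovery -t sendtargets -p 172.16.100.10
iscsiadm -m node -T iqn.2012-01.com.magedu:storage.disk1 -p 172.16.100.10 -l
After login the target appears as a local block device (e.g. /dev/sdb), i.e. block-level access, unlike the file-level access of NAS.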
High-Availability Cluster Principles: Multi-Node Clusters
crm: gives high availability to services that are not highly available by themselves; the resource manager itself is essentially a script.
Resource stickiness: how strongly a resource depends on (wants to stay on) a given node, defined via a score
Resource constraints:
location: how strongly a resource prefers a particular node
colocation: dependencies between resources
order: the order in which actions are taken on resources
Heartbeat v1 ships with its own resource manager:
haresources
Heartbeat v2 ships with two resource managers:
haresources
crm
Heartbeat v3: the crm resource manager was split out into an independent project, pacemaker
Resource Type:
primitive: a primary resource; at any given moment it can run on only one node
clone: can run on several nodes at the same time
group: bundles several primitives into a group; a group normally contains only primitives
master/slave: e.g. drbd, which can run on only two nodes
RA: Resource Agent
RA Classes:
Legacy Heartbeat v1 RA
LSB (/etc/rc.d/init.d/), Linux Standard Base
OCF (Open Cluster Framework)
pacemaker
linbit (drbd)
STONITH: used to drive hardware STONITH (fencing) devices
Fencing levels:
node level
STONITH
resource level
FC SAN switch
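On a pacemaker-based system the agents available in each of these classes can be listed from the crm shell (a sketch; the output of course depends on the installed packages):
crm ra classes
crm ra list ocf heartbeat
crm ra list lsb
crm ra meta ocf:heartbeat:IPaddr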
STONITH devices
1. Power Distribution Units (PDU)
Power Distribution Units are an essential element in managing power capacity and functionality for critical network, server and data center equipment. They can provide remote load monitoring of connected equipment and individual outlet power control for remote power recycling.
2. Uninterruptible Power Supplies (UPS)
A stable power supply provides emergency power to connected equipment by supplying power from a separate source in the event of utility power failure.
3. Blade Power Control Devices
If you are running a cluster on a set of blades, then the power control device in the blade enclosure is the only candidate for fencing. Of course, this device must be capable of managing single blade computers.
4. Lights-out Devices
Lights-out devices (IBM RSA, HP iLO, Dell DRAC) are becoming increasingly popular and may even become standard in off-the-shelf computers. However, they are inferior to UPS devices, because they share a power supply with their host (a cluster node). If a node stays without power, the device supposed to control it would be just as useless. In that case, the CRM would continue its attempts to fence the node indefinitely while all other resource operations would wait for the fencing/STONITH operation to complete.
5. Testing Devices
Testing devices are used exclusively for testing purposes. They are usually more gentle on the hardware. Once the cluster goes into production, they must be replaced with real fencing devices.
stonithd
stonithd is a daemon which can be accessed by local processes or over the network. It accepts the commands which correspond to fencing operations: reset, power-off, and power-on. It can also check the status of the fencing device.
The stonithd daemon runs on every node in the CRM HA cluster. The stonithd instance running on the DC node receives a fencing request from the CRM. It is up to this and other stonithd programs to carry out the desired fencing operation.
STONITH Plug-ins
For every supported fencing device there is a STONITH plug-in which is capable of controlling said device. A STONITH plug-in is the interface to the fencing device.
On each node, all STONITH plug-ins reside in /usr/lib/stonith/plugins (or in /usr/lib64/stonith/plugins for 64-bit architectures). All STONITH plug-ins look the same to stonithd, but are quite different on the other side reflecting the nature of the fencing device.
Some plug-ins support more than one device. A typical example is ipmilan (or external/ipmi) which implements the IPMI protocol and can control any device which supports this protocol.
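As an illustration of how such a plug-in is used, the installed plug-in types can be listed with stonith -L, and a fencing resource for a lights-out device could be declared roughly like this on a pacemaker stack (a hedged sketch; the IPMI address and credentials are placeholders):
stonith -L
crm configure primitive fence-node1 stonith:external/ipmi params hostname=node1.magedu.com ipaddr=172.16.100.100 userid=admin passwd=secret interface=lan op monitor interval=60s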
Heartbeat communication: UDP port 694
High-Availability Clusters: Installing and Configuring Heartbeat
Two nodes: 172.16.100.6 and 172.16.100.7
vip: 172.16.100.1
1. Make sure the two nodes can communicate with each other.
2. Configure the host names: hostname node1.magedu.com (verify with uname -n)
To make it permanent: vim /etc/sysconfig/network and set HOSTNAME=node1.magedu.com
3. Set up passwordless SSH mutual trust between the two nodes (see the sketch after this list).
4. Configure host-name resolution:
vim /etc/hosts
172.16.100.6 node1.magedu.com node1
172.16.100.7 node2.magedu.com node2
Disable iptables.
5. Synchronize time:
ntpdate 172.16.0.1
service ntpd stop
chkconfig ntpd off
To keep the clocks in sync from now on: crontab -e
*/5 * * * * /sbin/ntpdate 172.16.0.1 &> /dev/null
scp /var/spool/cron/root node2:/var/spool/cron/
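A minimal sketch of step 3, the SSH mutual trust (standard OpenSSH tools with default key paths; run the equivalent commands on node2 as well):
ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ''
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2.magedu.com
ssh node2.magedu.com 'date'   (verify that passwordless login works, e.g. by comparing clocks)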
Install heartbeat from the EPEL repository; the relevant packages are:
heartbeat – Heartbeat subsystem for High-Availability Linux
heartbeat-devel – Heartbeat development package
heartbeat-gui – Provides a gui interface to manage heartbeat clusters
heartbeat-ldirectord – Monitor daemon for maintaining high availability resources; used with an HA ipvs setup, it generates the ipvs rules automatically and health-checks the back-end real servers
heartbeat-pils – Provides a general plugin and interface loading library
heartbeat-stonith – Provides an interface to Shoot The Other Node In The Head
http://dl.fedoraproject.org/pub/epel/5/i386/repoview/letter_h.group.html
Three configuration files:
1. the authentication key file (must be mode 600): authkeys
2. the heartbeat service configuration file: ha.cf
3. the resource management configuration file:
haresources
vim authkeys
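For example, authkeys could look like this (a sketch; the sha1 secret is an arbitrary placeholder, e.g. generated with openssl rand -hex 8, and the file must be chmod 600):
auth 1
1 sha1 0e9b6df95a3c8d71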
vim ha.cf
logfacility local0
keepalive 1
node node1.magedu.com
node node2.magedu.com
ping 172.16.0.1
vim haresources (make sure the httpd service is not running and is disabled at boot: chkconfig httpd off)
node1.magedu.com IPaddr::172.16.100.1/16/eth0 httpd
Browse to the vip: 172.16.100.1
Simulate a failure of 172.16.100.6; browsing to 172.16.100.1 now returns the page served by node2.magedu.com
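One way to simulate the failure without pulling the power is heartbeat's own standby helper on node1, or simply stopping heartbeat there (a sketch; the script path varies between the 32-bit and 64-bit packages):
/usr/lib/heartbeat/hb_standby    (or /usr/lib64/heartbeat/hb_standby)
service heartbeat stop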
mkdir /web/htdocs -pv
vim /etc/exports
/web/htdocs 172.16.0.0/16(ro)
service nfs restart
showmount -e 172.16.100.10 lists the export as expected
With the cluster service stopped, test the mount by hand: mount 172.16.100.10:/web/htdocs /mnt
ls /mnt shows index.html
umount /mnt
Edit haresources: node1.magedu.com IPaddr::172.16.100.1/16/eth0 Filesystem::172.16.100.10:/web/htdocs::/var/www/html::nfs httpd
scp haresources node2:/etc/ha.d/
tail -f /var/log/messages
High-Availability Clusters: Heartbeat Resource Management with crm
RA classes:
OCF
pacemaker
linbit
LSB
Legacy Heartbeat V1
STONITH
RA: Resource Agent
manages resources on behalf of the cluster
LRM: Local Resource Manager
DC: runs the TE (Transition Engine) and the PE (Policy Engine)
CRM: Cluster Resource Manager
haresources (heartbeat v1)
crm, haresources (heartbeat v2)
pacemaker (heartbeat v3)
rgmanager (RHCS)
provides the underlying platform that non-HA-aware applications rely on to gain high availability;
crmd: exposes the management API, used by the GUI and CLI tools
web service (three resources): vip, httpd, filesystem (a crm configuration sketch follows the constraint list below)
Resource Type:
primitive(native)
group
clone
STONITH
Cluster Filesystem: needs a dlm (Distributed Lock Manager)
master/slave:drbd
Resource stickiness: whether a resource prefers to remain on its current node
positive: prefers to stay
negative: prefers to leave
Resource constraints:
location: location constraint; colocation: colocation constraint; order: order constraint
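Putting the web service above together, a hedged sketch in the pacemaker crm shell (the heartbeat v2 CRM itself is usually driven through the GUI or cibadmin; the resource names are invented, and the vip and NFS export match the earlier haresources example):
crm configure primitive webip ocf:heartbeat:IPaddr params ip=172.16.100.1 cidr_netmask=16 nic=eth0
crm configure primitive webstore ocf:heartbeat:Filesystem params device=172.16.100.10:/web/htdocs directory=/var/www/html fstype=nfs
crm configure primitive webserver lsb:httpd
crm configure group webservice webip webstore webserver
crm configure rsc_defaults resource-stickiness=100
Grouping the three primitives keeps them on the same node and starts them in the listed order, which replaces the implicit ordering of the haresources line.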
heartbeat:
authkeys
ha.cf
node
bcast, mcast, ucast
haresources
HA prerequisites:
1. time synchronization; 2. SSH mutual trust; 3. the host name must match the output of uname -n and must resolve via /etc/hosts
CIB: Cluster Information Base
stored in XML format
crm --> pacemaker
crm respawn|on  (the ha.cf directive that enables the v2 CRM)
mcast eth0 225.0.100.19 694 1 0  (interface, multicast group, UDP port, TTL, loop)
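In context, switching heartbeat v2 from haresources to CRM-managed mode amounts to something like the following in /etc/ha.d/ha.cf on both nodes (a sketch; the multicast group is just an example value):
mcast eth0 225.0.100.19 694 1 0
crm respawn
After restarting heartbeat, crm_mon -1 gives a one-shot view of node and resource status.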
How IP multicast works
Multicast packets use class D IP addresses, in the range 224.0.0.0 to 239.255.255.255, as their destination address; a class D address must never appear in the source-address field of an IP packet. In unicast transmission a packet travels from the source address to a single destination address, forwarded hop by hop through the IP network. In IP multicast, however, the destination is not one host but a group of hosts identified by a group address. All intended receivers join the group, and once they have joined, traffic sent to the group address immediately starts flowing to them, so every member of the group receives the packets. Membership is dynamic: a host may join or leave a multicast group at any time.
Types of multicast groups
A multicast group can be permanent or temporary. Some multicast group addresses are officially assigned; these are called permanent multicast groups. What stays fixed in a permanent group is its IP address; its membership can change and may be of any size, even zero. Multicast addresses not reserved for permanent groups are available to temporary groups.
224.0.0.0 ~ 224.0.0.255 are reserved multicast addresses (permanent group addresses); 224.0.0.0 itself is never assigned, and the others are used by routing protocols;
224.0.1.0 ~ 224.0.1.255 are public multicast addresses, usable on the Internet;
224.0.2.0 ~ 238.255.255.255 are user-assignable multicast addresses (temporary group addresses), valid network-wide;
239.0.0.0 ~ 239.255.255.255 are administratively scoped multicast addresses, valid only within a specific local scope.
Commonly used reserved multicast addresses are listed below:
224.0.0.0  base address (reserved)
224.0.0.1  all hosts (including all routers)
224.0.0.2  all multicast routers
224.0.0.3  unassigned
224.0.0.4  DVMRP routers
224.0.0.5  OSPF routers
224.0.0.6  OSPF designated routers
224.0.0.7  ST routers
224.0.0.8  ST hosts
224.0.0.9  RIP-2 routers
224.0.0.10  EIGRP routers
224.0.0.11  mobile agents
224.0.0.12  DHCP servers / relay agents
224.0.0.13  all PIM routers
224.0.0.14  RSVP encapsulation
224.0.0.15  all CBT routers
224.0.0.16  designated SBM
224.0.0.17  all SBMs
224.0.0.18  VRRP
When Ethernet carries a unicast IP packet, the destination MAC address is the receiver's MAC address. When it carries a multicast packet, the destination is no longer one specific receiver but a group whose membership is not fixed, so a multicast MAC address is used instead. Multicast MAC addresses correspond to multicast IP addresses: IANA (the Internet Assigned Numbers Authority) specifies that the high 24 bits of a multicast MAC address are 0x01005e, and the low 23 bits of the MAC address are the low 23 bits of the multicast IP address.
Because only 23 of the last 28 bits of an IP multicast address are mapped into the MAC address, 32 different IP multicast addresses map onto the same MAC address.
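A small bash sketch of that mapping, using the multicast group from the ha.cf example above (the variable name is arbitrary):
ip=225.0.100.19
IFS=. read -r a b c d <<< "$ip"
printf '01:00:5e:%02x:%02x:%02x\n' $((b & 0x7f)) "$c" "$d"
This prints 01:00:5e:00:64:13; masking the second octet with 0x7f is what discards 5 of the 28 group-ID bits and produces the 32-to-1 overlap.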
High-Availability Clusters: an HA MySQL Service Based on Heartbeat and NFS