lost and found ( for me ? )

Linux : Heartbeat ( ha cluster )

As a test, I made Apache redundant with Heartbeat.

[root@hat1-vm ~]# cat /etc/redhat-release
CentOS release 5.5 (Final)
[root@hat1-vm ~]# uname -r
2.6.18-194.11.1.el5

Cluster hosts: hat1-vm, hat2-vm
Service VIP: 192.168.1.150
Service eth0: 192.168.1.50, 192.168.1.51
Heartbeat eth1: 192.168.122.161, 192.168.122.141

        Router 192.168.1.254
                 |
                L2SW
          |             |
        eth0          eth0
       hat1-vm       hat2-vm
        eth1          eth1
          |-------------|

[ Installing heartbeat ]

Install heartbeat on both hosts.

[root@hat1-vm ~]# yum install -y heartbeat.i386
[root@hat2-vm ~]# yum install -y heartbeat.i386

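An error occurred. The %pre scriptlet of the heartbeat package failed because useradd reported that the user hacluster already exists (the Japanese-locale message in the output below).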

  Installing     : heartbeat-pils                                           1/4
 Installing     : heartbeat-stonith                                        2/4
 Installing     : PyXML                                                    3/4
useradd: ユーザ hacluster は存在します
error: %pre(heartbeat-2.1.3-3.el5.centos.i386) scriptlet failed, exit status 9
error:   install: %pre scriptlet failed (2), skipping heartbeat-2.1.3-3.el5.centos

Installed:
 heartbeat.i386 0:2.1.3-3.el5.centos                                           

Dependency Installed:
 PyXML.i386 0:0.8.4-4.el5_4.2                                                  
 heartbeat-pils.i386 0:2.1.3-3.el5.centos                                      
 heartbeat-stonith.i386 0:2.1.3-3.el5.centos                                   

Complete!

Despite the "Installed:" line, heartbeat.i386 itself was not installed.

[root@hat1-vm ~]# rpm -qa | grep heartbeat
heartbeat-stonith-2.1.3-3.el5.centos
heartbeat-pils-2.1.3-3.el5.centos

Running the install once more gets heartbeat.i386 installed.

[root@hat1-vm ~]# yum install -y heartbeat.i386
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* addons: ftp.jaist.ac.jp
* base: ftp.jaist.ac.jp
* extras: ftp.jaist.ac.jp
* updates: ftp.jaist.ac.jp
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package heartbeat.i386 0:2.1.3-3.el5.centos set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
Package           Arch         Version                    Repository      Size
================================================================================
Installing:
heartbeat         i386         2.1.3-3.el5.centos         extras         1.7 M

Transaction Summary
================================================================================
Install       1 Package(s)
Upgrade       0 Package(s)

Total download size: 1.7 M
Downloading Packages:
heartbeat-2.1.3-3.el5.centos.i386.rpm                    | 1.7 MB     00:01     
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
 Installing     : heartbeat                                                1/1

Installed:
 heartbeat.i386 0:2.1.3-3.el5.centos      

In short, running yum install -y heartbeat.i386 twice does the job.

 508  yum install -y heartbeat.i386
 511  yum install -y heartbeat.i386
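
The "install twice" workaround can be scripted. The following is only a minimal sketch and not part of the original procedure: install_until_present is a hypothetical helper name, the default of three attempts is an assumption, and it treats rpm -q as the source of truth because yum printed "Complete!" even though the scriptlet had failed.

```shell
# Hypothetical helper (not from the original post): repeat "yum install"
# until "rpm -q" confirms the package, giving up after a few attempts.
install_until_present() {
    pkg=$1            # package name, e.g. heartbeat
    max=${2:-3}       # maximum number of install attempts (assumed default)
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        # rpm -q exits 0 only when the package is really installed
        if rpm -q "$pkg" >/dev/null 2>&1; then
            return 0
        fi
        # yum's exit status is deliberately ignored; the rpm -q check decides
        yum install -y "$pkg.i386" >/dev/null 2>&1 || true
        attempt=$((attempt + 1))
    done
    # final verdict after the last attempt
    rpm -q "$pkg" >/dev/null 2>&1
}

# Usage (as root, on each node):
#   install_until_present heartbeat || echo "heartbeat still missing" >&2
```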

[ Configuring heartbeat ]

Three configuration files are required in the /etc/ha.d/ directory.

[root@hat1-vm ha.d]# less README.config
You need three configuration files to make heartbeat happy,
and they all go in this directory.

They are:
       ha.cf           Main configuration file
       haresources     Resource configuration file
       authkeys        Authentication information

Sample configuration files are in the following directory.

[root@hat1-vm heartbeat-2.1.3]# pwd
/usr/share/doc/heartbeat-2.1.3

・ha.cf

Overall heartbeat settings.

- hat1-vm

Copy the sample configuration file.

[root@hat1-vm ~]# cp /usr/share/doc/heartbeat-2.1.3/ha.cf /etc/ha.d/


[root@hat1-vm ~]# egrep -v "^#" /etc/ha.d/ha.cf | grep -v "^$"
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
ucast eth1 192.168.122.141 # IP of hat2-vm
auto_failback off
watchdog /dev/watchdog
node hat1-vm.localdomain hat2-vm.localdomain  # use the names shown by uname -n
ping 192.168.1.254 # ping check against the default gateway
respawn hacluster /usr/lib/heartbeat/ipfail # run ipfail as hacluster and restart it if it exits
apiauth ipfail gid=haclient uid=hacluster
[root@hat1-vm ~]#

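ipfail is a program that switches the resources over to the other node when the ping target 192.168.1.254 becomes unreachable.

- hat2-vm

The ucast IP is that of hat1-vm. Everything else is the same as on hat1-vm.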

[root@hat2-vm ~]# egrep -v "^#" /etc/ha.d/ha.cf | egrep -v "^$"
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
ucast eth1 192.168.122.161
auto_failback off
watchdog /dev/watchdog
node hat1-vm.localdomain hat2-vm.localdomain
ping 192.168.1.254
respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
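
Since the two files differ only in the ucast peer address, they can be produced from one template. This is a sketch under that assumption; render_ha_cf is a hypothetical helper name, and every value in it is copied from the listings above.

```shell
# Sketch: print an ha.cf like the one in this post for a given peer address.
# render_ha_cf is a hypothetical helper; it is not shipped with heartbeat.
render_ha_cf() {
    peer_ip=$1    # heartbeat-LAN (eth1) address of the OTHER node
    cat <<EOF
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
ucast eth1 $peer_ip
auto_failback off
watchdog /dev/watchdog
node hat1-vm.localdomain hat2-vm.localdomain
ping 192.168.1.254
respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
EOF
}

# Usage:
#   on hat1-vm: render_ha_cf 192.168.122.141 > /etc/ha.d/ha.cf
#   on hat2-vm: render_ha_cf 192.168.122.161 > /etc/ha.d/ha.cf
```

Keeping udpport above ucast in the template matters: heartbeat complains when udpport comes after a media statement.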


・haresources

Shared resource settings.

Run the httpd service with hat1-vm as the active node.
The standby hat2-vm gets the identical configuration.

[root@hat1-vm ~]# cat /etc/ha.d/haresources
hat1-vm.localdomain httpd 192.168.1.150/24

[root@hat2-vm ~]# cat /etc/ha.d/haresources
hat1-vm.localdomain httpd 192.168.1.150/24
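
In a haresources line the first field is the preferred node and the remaining fields are the resources, which heartbeat starts from left to right. The failover log later in this post shows that order (httpd is started, then IPaddr for the bare IP address). The sketch below only splits such a line to make the order visible; haresources_order is a hypothetical helper name.

```shell
# Hypothetical helper: show the preferred node and the start order
# encoded in a single haresources line.
haresources_order() {
    # $1: one haresources line
    set -- $1             # split the line on whitespace (no globs expected)
    echo "preferred node: $1"
    shift
    n=1
    for res in "$@"; do
        echo "start $n: $res"
        n=$((n + 1))
    done
}

# Usage:
#   haresources_order "$(grep -v '^#' /etc/ha.d/haresources | grep -v '^$')"
```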

・authkeys

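Authentication method and key for the heartbeat traffic between the servers.

Here only packet integrity is checked with CRC, which provides no real authentication. sha1 or md5 with a shared key can be used instead.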

[root@hat1-vm ha.d]# tail -5 /etc/ha.d/authkeys
#
auth 1
1 crc
#2 sha1 HI!
#3 md5 Hello!

[root@hat2-vm ~]# tail -5 /etc/ha.d/authkeys
#
auth 1
1 crc
#2 sha1 HI!
#3 md5 Hello!
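
For the sha1 alternative, the key should be something less guessable than the sample strings. The following is a sketch, assuming /dev/urandom, head -c, od and tr are available; write_authkeys is a hypothetical helper name. The resulting file must be identical on both nodes.

```shell
# Hypothetical helper: write an authkeys file that uses sha1 with a random key.
write_authkeys() {
    file=$1
    # 32 random bytes rendered as 64 hex characters
    key=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')
    (
        umask 077     # create the file without group/other access
        printf 'auth 1\n1 sha1 %s\n' "$key" > "$file"
    )
    chmod 600 "$file" # also tighten a file that already existed
}

# Usage (generate once, then copy the same file to the other node):
#   write_authkeys /etc/ha.d/authkeys
#   scp -p /etc/ha.d/authkeys hat2-vm:/etc/ha.d/authkeys
```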

[ Starting heartbeat ]

The first attempt failed ([失敗] in the output is the Japanese-locale [FAILED]).

[root@hat1-vm ~]# /etc/init.d/heartbeat start
Starting High-Availability services:
                                                          [失敗]
heartbeat: udpport setting must precede media statementsheartbeat[27527]: 2010/08/16_01:23:46 ERROR: Bad permissions on keyfile [/etc/ha.d/authkeys], 600 recommended.
heartbeat[27527]: 2010/08/16_01:23:46 ERROR: Authentication configuration error.
heartbeat[27527]: 2010/08/16_01:23:46 ERROR: Configuration error, heartbeat not started.

Change the permissions on authkeys.

[root@hat1-vm ~]# chmod 600 /etc/ha.d/authkeys

Now it starts.

[root@hat1-vm ~]# /etc/init.d/heartbeat start
logd is already running
Starting High-Availability services:
                                                          [  OK  ]

[root@hat2-vm ~]# chmod 600 /etc/ha.d/authkeys
[root@hat2-vm ~]# /etc/init.d/heartbeat start
Starting High-Availability services:
                                                          [  OK  ]
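
The failed start printed two complaints: udpport must precede the media statements, and the authkeys permissions were too open. Both can be checked before starting the service. This is a sketch only; the function names are hypothetical, and the list of media directives (ucast, bcast, mcast) is my assumption about what heartbeat counts as a media statement.

```shell
# Hypothetical pre-flight checks for the two startup complaints above.

check_udpport_order() {
    # succeeds unless a udpport line appears after a media line
    awk '
        /^[ \t]*#/ { next }                                   # skip comment lines
        $1 == "ucast" || $1 == "bcast" || $1 == "mcast" { media = 1 }
        $1 == "udpport" && media { bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

check_authkeys_mode() {
    # succeeds only when the file is mode 600 (owner read/write, nothing else)
    [ "$(ls -l "$1" | cut -c1-10)" = "-rw-------" ]
}

# Usage:
#   check_udpport_order /etc/ha.d/ha.cf    || echo "move udpport above ucast" >&2
#   check_authkeys_mode /etc/ha.d/authkeys || echo "chmod 600 authkeys" >&2
```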

The VIP 192.168.1.150 was assigned to the master, hat1-vm.
httpd was also started automatically.

[root@hat1-vm ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 52:54:00:75:75:13  
         inet addr:192.168.1.50  Bcast:192.168.1.255  Mask:255.255.255.0
         inet6 addr: fe80::5054:ff:fe75:7513/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:8408 errors:0 dropped:0 overruns:0 frame:0
         TX packets:5158 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:761031 (743.1 KiB)  TX bytes:800311 (781.5 KiB)

eth0:0    Link encap:Ethernet  HWaddr 52:54:00:75:75:13  
         inet addr:192.168.1.150  Bcast:192.168.1.255  Mask:255.255.255.0
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

[root@hat1-vm ~]# ps -C httpd
 PID TTY          TIME CMD
27740 ?        00:00:00 httpd
27741 ?        00:00:00 httpd
27742 ?        00:00:00 httpd
27743 ?        00:00:00 httpd
27744 ?        00:00:00 httpd
27745 ?        00:00:00 httpd
27746 ?        00:00:00 httpd
27763 ?        00:00:00 httpd
27764 ?        00:00:00 httpd

The standby hat2-vm has no VIP, and httpd is not running there.

[root@hat2-vm ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 52:54:00:54:D3:B6  
         inet addr:192.168.1.51  Bcast:192.168.1.255  Mask:255.255.255.0
         inet6 addr: fe80::5054:ff:fe54:d3b6/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:3096 errors:0 dropped:0 overruns:0 frame:0
         TX packets:1970 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:287589 (280.8 KiB)  TX bytes:285573 (278.8 KiB)

[root@hat2-vm ~]# ps -C httpd
 PID TTY          TIME CMD
[root@hat2-vm ~]#
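
The same check can be done without reading the ifconfig output by eye. A minimal sketch, assuming the old net-tools ifconfig format shown above ("inet addr:..."); has_vip is a hypothetical helper name.

```shell
# Hypothetical helper: report whether ifconfig-style text on stdin
# contains the given address.
has_vip() {
    # $1: address to look for
    # -F matches the dots literally; the trailing space keeps a search for
    # 192.168.1.15 from matching 192.168.1.150
    grep -qF "inet addr:$1 "
}

# Usage:
#   ifconfig | has_vip 192.168.1.150 && echo active || echo standby
```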

[ Heartbeat packets ]

I captured the traffic (UDP 694) to see what kind of packets are exchanged.
Packets flow in both directions.

[root@hat1-vm ~]# lsof -i:694
COMMAND     PID   USER   FD   TYPE DEVICE SIZE NODE NAME
heartbeat 28266 nobody    8u  IPv4  42684       UDP *:ha-cluster
heartbeat 28267 nobody    8u  IPv4  42684       UDP *:ha-cluster

[root@hat1-vm ~]# tshark -i eth1 port 694 -w aaa.pcap
Running as user "root" and group "root". This could be dangerous.
Capturing on eth1
6

[root@hat1-vm ~]# tshark -r aaa.pcap
Running as user "root" and group "root". This could be dangerous.
 1   0.000000 192.168.122.141 -> 192.168.122.161 UDP Source port: 37822  Destination port: ha-cluster
 2   0.520165 192.168.122.161 -> 192.168.122.141 UDP Source port: 48484  Destination port: ha-cluster

The payload carries things like the host name and uuid.

Data (199 bytes)

0000  3e 3e 3e 0a 74 3d 4e 53 5f 61 63 6b 6d 73 67 0a   >>>.t=NS_ackmsg.
0010  64 65 73 74 3d 68 61 74 32 2d 76 6d 2e 6c 6f 63   dest=hat2-vm.loc
0020  61 6c 64 6f 6d 61 69 6e 0a 61 63 6b 73 65 71 3d   aldomain.ackseq=
0030  62 36 0a 28 31 29 64 65 73 74 75 75 69 64 3d 66   b6.(1)destuuid=f
0040  70 61 51 32 6c 6f 69 53 51 71 6b 78 52 4b 48 79   paQ2loiSQqkxRKHy
0050  6f 31 4c 46 67 3d 3d 0a 73 72 63 3d 68 61 74 32   o1LFg==.src=hat2
0060  2d 76 6d 2e 6c 6f 63 61 6c 64 6f 6d 61 69 6e 0a   -vm.localdomain.
0070  28 31 29 73 72 63 75 75 69 64 3d 66 70 61 51 32   (1)srcuuid=fpaQ2
0080  6c 6f 69 53 51 71 6b 78 52 4b 48 79 6f 31 4c 46   loiSQqkxRKHyo1LF
0090  67 3d 3d 0a 68 67 3d 34 63 36 38 31 35 32 37 0a   g==.hg=4c681527.
00a0  74 73 3d 34 63 36 38 31 36 37 61 0a 74 74 6c 3d   ts=4c68167a.ttl=
00b0  34 0a 61 75 74 68 3d 31 20 63 35 38 62 39 61 39   4.auth=1 c58b9a9
00c0  36 0a 3c 3c 3c 0a 00                              6.<<<..
   Data: 3E3E3E0A743D4E535F61636B6D73670A646573743D686174...
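
The payload is plain text, one key=value pair per line between ">>>" and "<<<", so individual fields are easy to pull out. The sketch below assumes the text has already been extracted from the capture; hb_field and hex_to_epoch are hypothetical helper names, and reading ts as a hexadecimal Unix timestamp is my interpretation rather than something the capture states.

```shell
# Hypothetical helpers for looking at a decoded heartbeat message.

hb_field() {
    # $1: field name; message text (one key=value per line) is read from stdin
    awk -v k="$1" '{
        i = index($0, "=")        # split on the FIRST "=" only (uuids end in "==")
        if (i > 0 && substr($0, 1, i - 1) == k) print substr($0, i + 1)
    }'
}

hex_to_epoch() {
    # convert a hexadecimal value such as ts=4c68167a to decimal
    echo $((0x$1))
}

# Usage:
#   hex_to_epoch 4c68167a          # prints 1281889914
#   date -d @1281889914            # GNU date; shows the time in the local zone
```

1281889914 corresponds to 2010-08-15 16:31:54 UTC, that is 01:31:54 on Aug 16 in JST, which falls within the time range of the logs in this post.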


[ Access check ]

Access the VIP 192.168.1.150 while hat1-vm is active.
The connection reached hat1-vm.



Stop heartbeat on hat1-vm.

[root@hat1-vm ~]# /etc/init.d/heartbeat stop
Stopping High-Availability services:
                                                          [  OK  ]
[root@hat1-vm ~]#

hat2-vm became active.

Syslog on hat2-vm:

Aug 16 01:37:20 hat2-vm heartbeat: [27425]: info: Received shutdown notice from 'hat1-vm.localdomain'.
Aug 16 01:37:20 hat2-vm heartbeat: [27425]: info: Resources being acquired from hat1-vm.localdomain.
Aug 16 01:37:20 hat2-vm heartbeat: [27507]: info: acquire local HA resources (standby).
Aug 16 01:37:20 hat2-vm heartbeat: [27507]: info: local HA resource acquisition completed (standby).
Aug 16 01:37:20 hat2-vm heartbeat: [27425]: info: Standby resource acquisition done [all].
Aug 16 01:37:20 hat2-vm heartbeat: [27508]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys hat2-vm.localdomain] to acquire.
Aug 16 01:37:20 hat2-vm harc[27533]: info: Running /etc/ha.d/rc.d/status status
Aug 16 01:37:20 hat2-vm mach_down[27549]: info: Taking over resource group httpd
Aug 16 01:37:20 hat2-vm ResourceManager[27575]: info: Acquiring resource group: hat1-vm.localdomain httpd 192.168.1.150/24
Aug 16 01:37:20 hat2-vm ResourceManager[27575]: info: Running /etc/init.d/httpd  start
Aug 16 01:37:20 hat2-vm IPaddr[27644]: INFO:  Resource is stopped
Aug 16 01:37:20 hat2-vm ResourceManager[27575]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.150/24 start
Aug 16 01:37:20 hat2-vm IPaddr[27740]: INFO: Using calculated nic for 192.168.1.150: eth0
Aug 16 01:37:20 hat2-vm IPaddr[27740]: INFO: Using calculated netmask for 192.168.1.150: 255.255.255.0
Aug 16 01:37:20 hat2-vm IPaddr[27740]: INFO: eval ifconfig eth0:0 192.168.1.150 netmask 255.255.255.0 broadcast 192.168.1.255
Aug 16 01:37:20 hat2-vm avahi-daemon[2276]: Registering new address record for 192.168.1.150 on eth0.
Aug 16 01:37:20 hat2-vm IPaddr[27714]: INFO:  Success
Aug 16 01:37:20 hat2-vm mach_down[27549]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Aug 16 01:37:20 hat2-vm mach_down[27549]: info: mach_down takeover complete for node hat1-vm.localdomain.
Aug 16 01:37:20 hat2-vm heartbeat: [27425]: info: mach_down takeover complete.
Aug 16 01:37:20 hat2-vm heartbeat: [27425]: WARN: G_CH_dispatch_int: Dispatch function for FIFO took too long to execute: 60 ms (> 50 ms) (GSource: 0x92c6350)
Aug 16 01:37:53 hat2-vm heartbeat: [27425]: WARN: node hat1-vm.localdomain: is dead
Aug 16 01:37:53 hat2-vm heartbeat: [27425]: info: Dead node hat1-vm.localdomain gave up resources.
Aug 16 01:37:53 hat2-vm heartbeat: [27425]: info: Link hat1-vm.localdomain:eth1 dead.

[root@hat2-vm ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 52:54:00:54:D3:B6  
         inet addr:192.168.1.51  Bcast:192.168.1.255  Mask:255.255.255.0
         inet6 addr: fe80::5054:ff:fe54:d3b6/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:3562 errors:0 dropped:0 overruns:0 frame:0
         TX packets:2399 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:336321 (328.4 KiB)  TX bytes:340282 (332.3 KiB)

eth0:0    Link encap:Ethernet  HWaddr 52:54:00:54:D3:B6  
         inet addr:192.168.1.150  Bcast:192.168.1.255  Mask:255.255.255.0
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

[root@hat2-vm ~]# ps -C httpd
 PID TTY          TIME CMD
27622 ?        00:00:00 httpd
27623 ?        00:00:00 httpd
27624 ?        00:00:00 httpd
27625 ?        00:00:00 httpd
27626 ?        00:00:00 httpd
27627 ?        00:00:00 httpd
27628 ?        00:00:00 httpd
27645 ?        00:00:00 httpd
27646 ?        00:00:00 httpd

Accessing from a browser, requests to the VIP 192.168.1.150 now reached hat2-vm.




Start heartbeat on hat1-vm again.

[root@hat1-vm ~]# /etc/init.d/heartbeat start
Starting High-Availability services:
                                                          [  OK  ]

No failback happens, because auto_failback is off in ha.cf. hat2-vm stays active.

To fail back, run the following command on the active node, hat2-vm.

[root@hat2-vm ~]# /usr/lib/heartbeat/hb_standby
2010/08/16_01:42:41 Going standby [all].

hat1-vm became active again. Syslog on hat1-vm:

Aug 16 01:42:40 hat1-vm heartbeat: [28262]: info: hat2-vm.localdomain wants to go standby [all]
Aug 16 01:42:42 hat1-vm heartbeat: [28262]: info: standby: acquire [all] resources from hat2-vm.localdomain
Aug 16 01:42:42 hat1-vm heartbeat: [28295]: info: acquire all HA resources (standby).
Aug 16 01:42:42 hat1-vm ResourceManager[28308]: info: Acquiring resource group: hat1-vm.localdomain httpd 192.168.1.150/24
Aug 16 01:42:42 hat1-vm ResourceManager[28308]: info: Running /etc/init.d/httpd  start
Aug 16 01:42:42 hat1-vm IPaddr[28377]: INFO:  Resource is stopped
Aug 16 01:42:42 hat1-vm ResourceManager[28308]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.150/24 start
Aug 16 01:42:42 hat1-vm IPaddr[28473]: INFO: Using calculated nic for 192.168.1.150: eth0
Aug 16 01:42:42 hat1-vm IPaddr[28473]: INFO: Using calculated netmask for 192.168.1.150: 255.255.255.0
Aug 16 01:42:42 hat1-vm IPaddr[28473]: INFO: eval ifconfig eth0:0 192.168.1.150 netmask 255.255.255.0 broadcast 192.168.1.255
Aug 16 01:42:42 hat1-vm avahi-daemon[2343]: Registering new address record for 192.168.1.150 on eth0.
Aug 16 01:42:42 hat1-vm IPaddr[28447]: INFO:  Success
Aug 16 01:42:42 hat1-vm heartbeat: [28295]: info: all HA resource acquisition completed (standby).
Aug 16 01:42:42 hat1-vm heartbeat: [28262]: info: Standby resource acquisition done [all].
Aug 16 01:42:42 hat1-vm heartbeat: [28262]: info: remote resource transition completed.

[root@hat1-vm ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 52:54:00:75:75:13  
         inet addr:192.168.1.50  Bcast:192.168.1.255  Mask:255.255.255.0
         inet6 addr: fe80::5054:ff:fe75:7513/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:9394 errors:0 dropped:0 overruns:0 frame:0
         TX packets:6105 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:854015 (833.9 KiB)  TX bytes:941871 (919.7 KiB)

eth0:0    Link encap:Ethernet  HWaddr 52:54:00:75:75:13  
         inet addr:192.168.1.150  Bcast:192.168.1.255  Mask:255.255.255.0
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1      Link encap:Ethernet  HWaddr 52:54:00:16:3D:85  
         inet addr:192.168.122.161  Bcast:192.168.122.255  Mask:255.255.255.0
         inet6 addr: fe80::5054:ff:fe16:3d85/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:6725 errors:0 dropped:0 overruns:0 frame:0
         TX packets:4199 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:8729951 (8.3 MiB)  TX bytes:411126 (401.4 KiB)
         Interrupt:11 Base address:0xe000

lo        Link encap:Local Loopback  
         inet addr:127.0.0.1  Mask:255.0.0.0
         inet6 addr: ::1/128 Scope:Host
         UP LOOPBACK RUNNING  MTU:16436  Metric:1
         RX packets:14 errors:0 dropped:0 overruns:0 frame:0
         TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:0
         RX bytes:928 (928.0 b)  TX bytes:928 (928.0 b)

[root@hat1-vm ~]# ps -C httpd
 PID TTY          TIME CMD
28355 ?        00:00:00 httpd
28357 ?        00:00:00 httpd
28358 ?        00:00:00 httpd
28359 ?        00:00:00 httpd
28360 ?        00:00:00 httpd
28361 ?        00:00:00 httpd
28362 ?        00:00:00 httpd
28383 ?        00:00:00 httpd
28384 ?        00:00:00 httpd
[root@hat1-vm ~]#