lost and found ( for me ? ): Linux : Heartbeat ( ha cluster ) with Caching Name Server ( BIND )

Cluster hosts: hat1-vm , hat2-vm
Service VIP : 192.168.11.230
Service eth0 : 192.168.11.200 , 192.168.11.201
Heartbeat eth1 : 192.168.122.161 , 192.168.122.141

        Router 192.168.1.254
                 |
               L2SW
          |             |
        eth0          eth0
      hat1-vm       hat2-vm
        eth1          eth1
          |-------------|

・haresources

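Shared resource configuration.

BIND (as a caching server) runs with hat1-vm as the active node; hat2-vm is the standby.
The cache lives in memory, so no data is synchronized between the nodes.
After a failover the new active node starts with a cold cache.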

[root@hat1-vm ~]# cat /etc/ha.d/haresources
hat1-vm.localdomain named 192.168.11.230/24

[root@hat2-vm ~]# cat /etc/ha.d/haresources
hat1-vm.localdomain named 192.168.11.230/24

An error. Odd: running service named start directly as root, rather than through heartbeat, worked fine.

[root@hat1-vm ~]# /etc/init.d/heartbeat start
Starting High-Availability services:
rndc: connect failed: 127.0.0.1#953: connection refused
                                                          [ OK ]
[root@hat1-vm ~]#

Change the owner of the config files.

[root@hat1-vm ~]# chown named.named /etc/named.conf
[root@hat1-vm ~]# chown named.named /etc/rndc.conf
[root@hat1-vm ~]#

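It started. (The CRITICAL messages below presumably appear because named was still running from the manual service named start above.)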

[root@hat1-vm ~]# /etc/init.d/heartbeat start
logd is already running
Starting High-Availability services:
2010/09/08_02:10:26 CRITICAL: Resource named is active, and should not be!
2010/09/08_02:10:26 CRITICAL: Non-idle resources can affect data integrity!
2010/09/08_02:10:26 info: If you don't know what this means, then get help!
2010/09/08_02:10:26 info: Read the docs and/or source to /usr/share/heartbeat/ResourceManager for more details.
CRITICAL: Resource named is active, and should not be!
CRITICAL: Non-idle resources can affect data integrity!
info: If you don't know what this means, then get help!
info: Read the docs and/or the source to /usr/share/heartbeat/ResourceManager for more details.
2010/09/08_02:10:26 CRITICAL: Non-idle resources will affect resource takeback!
2010/09/08_02:10:26 CRITICAL: Non-idle resources may affect data integrity!
                                                          [ OK ]

BIND is running.

[root@hat1-vm ~]# lsof -i:53
COMMAND PID USER   FD   TYPE DEVICE SIZE NODE NAME
named   4903 named   21u IPv4 11103       TCP hat1-vm.localdomain:domain (LISTEN)
named   4903 named   22u IPv4 11105       TCP 192.168.11.200:domain (LISTEN)
named   4903 named   23u IPv4 11107       TCP 192.168.122.161:domain (LISTEN)
named   4903 named 512u IPv4 11102       UDP hat1-vm.localdomain:domain
named   4903 named 513u IPv4 11104       UDP 192.168.11.200:domain
named   4903 named 514u IPv4 11106       UDP 192.168.122.161:domain
[root@hat1-vm ~]#

The VIP is assigned as well.

[root@hat1-vm ~]# ifconfig
eth0      Link encap:Ethernet HWaddr 52:54:00:75:75:13
         inet addr:192.168.11.200 Bcast:192.168.11.255 Mask:255.255.255.0
         inet6 addr: fe80::5054:ff:fe75:7513/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
         RX packets:5982 errors:0 dropped:0 overruns:0 frame:0
         TX packets:3112 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:591323 (577.4 KiB) TX bytes:391936 (382.7 KiB)

eth0:0    Link encap:Ethernet HWaddr 52:54:00:75:75:13
         inet addr:192.168.11.230 Bcast:192.168.11.255 Mask:255.255.255.0
         UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Same procedure on hat2-vm.

[root@hat2-vm ~]# /etc/init.d/heartbeat start
Starting High-Availability services:
rndc: connect failed: 127.0.0.1#953: connection refused
                                                          [ OK ]

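I changed the owner of named.conf and rndc.conf, but the error still shows up.

For now, stop heartbeat on hat1-vm and check whether the service fails over.
Oh, hat2-vm took over. Maybe the connect failed message on the standby can be ignored: presumably heartbeat checks the resource status through rndc while named is not running yet.
rndc itself works fine afterwards.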

[root@hat2-vm ~]# ifconfig
eth0      Link encap:Ethernet HWaddr 52:54:00:54:D3:B6
         inet addr:192.168.11.201 Bcast:192.168.11.255 Mask:255.255.255.0
         inet6 addr: fe80::5054:ff:fe54:d3b6/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
         RX packets:9889 errors:0 dropped:0 overruns:0 frame:0
         TX packets:6096 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:951491 (929.1 KiB) TX bytes:716790 (699.9 KiB)

eth0:0    Link encap:Ethernet HWaddr 52:54:00:54:D3:B6
         inet addr:192.168.11.230 Bcast:192.168.11.255 Mask:255.255.255.0
         UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

[root@hat2-vm ~]# rndc status
number of zones: 4
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
recursive clients: 0/1000
tcp clients: 0/100
server is up and running

・Verification

Run dig against the VIP to check that name resolution works.

ping to the VIP is OK.

root@hat1:~# ping 192.168.11.230
PING 192.168.11.230 (192.168.11.230) 56(84) bytes of data.
64 bytes from 192.168.11.230: icmp_seq=1 ttl=64 time=15.2 ms

But dig times out.

root@hat1:~# dig @192.168.11.230 www.google.com

; <<>> DiG 9.7.0-P1 <<>> @192.168.11.230 www.google.com
; (1 server found)
;; global options: +cmd

Hmm, named is not listening on the VIP 11.230.

[root@hat2-vm ~]# netstat -an | grep 53
tcp        0      0 192.168.11.201:53           0.0.0.0:*                   LISTEN
tcp        0      0 192.168.122.141:53          0.0.0.0:*                   LISTEN
tcp        0      0 127.0.0.1:53                0.0.0.0:*                   LISTEN
tcp        0      0 127.0.0.1:953               0.0.0.0:*                   LISTEN
udp        0      0 192.168.11.201:53           0.0.0.0:*
udp        0      0 192.168.122.141:53          0.0.0.0:*
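
Note that grep 53 also matches the rndc control port 953, so the output has to be read carefully. As a rough sketch, the check could be narrowed to an exact match on the local address with awk. The function name listens_on is made up for this example, and it assumes the netstat -an column layout shown above (local address in the 4th column).

```shell
#!/bin/sh
# listens_on ADDR PORT: read `netstat -an` output on stdin and succeed
# only if some tcp/udp socket has exactly ADDR:PORT as its local address.
# (Hypothetical helper; assumes the local address is in column 4.)
listens_on() {
    awk -v want="$1:$2" '
        ($1 == "tcp" || $1 == "udp") && $4 == want { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage on a cluster node:
#   netstat -an | listens_on 192.168.11.230 53 && echo "VIP is listening"
```

Because the comparison is on the whole field, 127.0.0.1:953 does not count as a listener on port 53.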

As an experiment I ran rndc reload, and after that named listens on 11.230.

[root@hat2-vm ~]# netstat -an | grep 53
tcp        0      0 192.168.11.230:53           0.0.0.0:*                   LISTEN
tcp        0      0 192.168.11.201:53           0.0.0.0:*                   LISTEN
tcp        0      0 192.168.122.141:53          0.0.0.0:*                   LISTEN
tcp        0      0 127.0.0.1:53                0.0.0.0:*                   LISTEN
tcp        0      0 127.0.0.1:953               0.0.0.0:*                   LISTEN
tcp        0      0 127.0.0.1:38831             127.0.0.1:953               TIME_WAIT
udp        0      0 192.168.11.230:53           0.0.0.0:*
udp        0      0 192.168.11.201:53           0.0.0.0:*

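So the order is named start -> VIP assignment, which is presumably why named does not listen on the VIP 11.230: the address does not exist yet when named scans the interfaces at startup.
After a failover, once the VIP is up, named has to be reloaded so that it starts listening on the VIP.

The syslog also shows that things happen in the order named start -> VIP assignment.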

Sep 10 01:30:40 hat1-vm named[3607]: running
Sep 10 01:30:40 hat1-vm IPaddr[3630]: INFO: Resource is stopped
Sep 10 01:30:40 hat1-vm ResourceManager[3533]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.11.230/24 start
Sep 10 01:30:40 hat1-vm IPaddr[3724]: INFO: Using calculated nic for 192.168.11.230: eth0
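
This order follows the haresources line: Heartbeat v1 starts the resources on a line from left to right (and stops them right to left), so named comes up before the VIP. An alternative that was not tried in this setup would be to list the address first (hat1-vm.localdomain 192.168.11.230/24 named). The sketch below only checks which order a given line uses. vip_before_named is a hypothetical helper name, and it assumes the IP resource is written in the short a.b.c.d/nn form used here.

```shell
#!/bin/sh
# vip_before_named LINE: succeed if an IPv4 resource appears before
# "named" on a haresources line (field 1 is the preferred node name).
# (Hypothetical helper; only understands the short a.b.c.d[/nn] form.)
vip_before_named() {
    printf '%s\n' "$1" | awk '
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+(\/[0-9]+)?$/) { exit 0 }
                if ($i == "named") { exit 1 }
            }
            exit 1
        }
    '
}

# Usage:
#   vip_before_named "$(grep -v '^#' /etc/ha.d/haresources | head -1)" \
#       || echo "named is started before the VIP"
```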

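So: once the VIP is up, monitor whether TCP/UDP port 53 is listening on the VIP, and run rndc reload if it is not. I'll register a script that does this as a monitoring script.

The idea of the script, step by step:

First, check whether the VIP is assigned.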

[root@hat1-vm ~]# /etc/ha.d/resource.d/IPaddr 192.168.11.230 status
2010/09/10_01:40:32 INFO: Running OK
INFO: Running OK
[root@hat1-vm ~]# echo $?
0

Then check whether named is listening on 192.168.11.230.

[root@hat1-vm ~]# lsof -ni:53 | grep "192.168.11.230"
[root@hat1-vm ~]# echo $?
1

If it is not listening, run rndc reload.

[root@hat1-vm ~]# rndc reload
server reload successful

[root@hat1-vm ~]# lsof -ni:53 | grep "192.168.11.230"
named   3607 named   26u IPv4   9614       TCP 192.168.11.230:domain (LISTEN)
named   3607 named 515u IPv4   9613       UDP 192.168.11.230:domain
[root@hat1-vm ~]# echo $?
0

Here is the script.

Every 10 seconds it checks whether the VIP is assigned, and then whether anything is listening on port 53 (TCP or UDP) of the VIP 192.168.11.230.
If nothing is listening, it runs rndc reload.

[root@hat2-vm ~]# cat dns_port_check.sh
#!/bin/sh
# Reload named when the VIP is up but named is not listening on it.

INTERVAL=10            # seconds between checks
VIP=192.168.11.230

while true
do
        # Only act on the node that currently holds the VIP.
        if /etc/ha.d/resource.d/IPaddr "${VIP}" status > /dev/null 2>&1; then
                # -F matches the address literally; the trailing ":" anchors its end.
                if ! lsof -ni:53 | grep -qF "${VIP}:"; then
                        rndc reload
                fi
        fi

        sleep "${INTERVAL}"
done
[root@hat2-vm ~]#

Register the monitoring script.

[root@hat1-vm ~]# egrep dns_port /etc/ha.d/ha.cf
respawn root /root/dns_port_check.sh

[root@hat2-vm ~]# egrep dns_port /etc/ha.d/ha.cf
respawn root /root/dns_port_check.sh

Stop and then start heartbeat on both nodes and check.

[root@hat1-vm ~]# /etc/init.d/heartbeat stop
[root@hat2-vm ~]# /etc/init.d/heartbeat stop

[root@hat1-vm ~]# /etc/init.d/heartbeat start
[root@hat2-vm ~]# /etc/init.d/heartbeat start

named now listens on the VIP.

[root@hat1-vm ~]# ifconfig
eth0      Link encap:Ethernet HWaddr 52:54:00:75:75:13
         inet addr:192.168.11.200 Bcast:192.168.11.255 Mask:255.255.255.0
         inet6 addr: fe80::5054:ff:fe75:7513/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
         RX packets:10242 errors:0 dropped:0 overruns:0 frame:0
         TX packets:7516 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:1123759 (1.0 MiB) TX bytes:759820 (742.0 KiB)

eth0:0    Link encap:Ethernet HWaddr 52:54:00:75:75:13
         inet addr:192.168.11.230 Bcast:192.168.11.255 Mask:255.255.255.0
         UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

[root@hat1-vm ~]# lsof -ni:53 | grep "11.230"
named   4827 named   26u IPv4 11352       TCP 192.168.11.230:domain (LISTEN)
named   4827 named 515u IPv4 11351       UDP 192.168.11.230:domain

[root@hat1-vm ~]# dig @192.168.11.230 www.google.co.jp

; <<>> DiG 9.7.1 <<>> @192.168.11.230 www.google.co.jp
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59670
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 4, ADDITIONAL: 4

dig against the VIP 192.168.11.230 from a machine outside the heartbeat cluster (hat1): OK.

root@hat1:~# LANG=C ifconfig br0 | grep "inet addr"
         inet addr:192.168.11.100 Bcast:192.168.11.255 Mask:255.255.255.0

root@hat1:~# dig @192.168.11.230 www.google.co.jp

; <<>> DiG 9.7.0-P1 <<>> @192.168.11.230 www.google.co.jp
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3762
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 4, ADDITIONAL: 4

;; QUESTION SECTION:
;www.google.co.jp. IN A

;; ANSWER SECTION:
www.google.co.jp. 345410 IN CNAME www.google.com.
www.google.com. 604610 IN CNAME www.l.google.com.

To keep named from answering on the real IPs ( 11.200 , 11.201 ), list only the VIP and 127.0.0.1 in listen-on in named.conf.

[root@hat1-vm ~]# less /etc/named.conf
options {
       directory "/var/named";
       pid-file "/var/run/named/named.pid";
       max-cache-size 5M;
       recursion yes;
       version "";
#       dnssec-enable yes;
#       dnssec-validation yes;
       listen-on { 192.168.11.230; 127.0.0.1; };
};

[root@hat2-vm ~]# less /etc/named.conf
options {
       directory "/var/named";
       pid-file "/var/run/named/named.pid";
       recursion yes;
       max-cache-size 5M;
       version "";
       listen-on { 192.168.11.230; 127.0.0.1; };
};
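
The two files above are not byte-identical (option order, commented-out lines), so a plain diff is noisy. A small sketch for comparing just the listen-on list on both nodes; listen_on_addrs is a hypothetical helper and only handles a single-line listen-on statement like the one used here.

```shell
#!/bin/sh
# listen_on_addrs: read named.conf text on stdin and print the addresses
# of a single-line listen-on statement, one per line. Lines that start
# with # or // are skipped. (Hypothetical helper; it does not handle
# multi-line statements or "port" clauses.)
listen_on_addrs() {
    grep -v -E '^[[:space:]]*(#|//)' |
        sed -n 's/.*listen-on[[:space:]]*{\(.*\)}.*/\1/p' |
        tr ';' '\n' | tr -d ' \t' | grep -v '^$'
}

# Usage (run on each node and compare the output):
#   listen_on_addrs < /etc/named.conf
```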

Check again.

[root@hat1-vm ~]# /etc/init.d/heartbeat restart
[root@hat2-vm ~]# /etc/init.d/heartbeat restart

named now listens only on the VIP and 127.0.0.1. Done!

[root@hat1-vm ~]# LANG=C ifconfig | grep "inet addr"
         inet addr:192.168.11.200 Bcast:192.168.11.255 Mask:255.255.255.0
         inet addr:192.168.11.230 Bcast:192.168.11.255 Mask:255.255.255.0
         inet addr:192.168.122.161 Bcast:192.168.122.255 Mask:255.255.255.0
         inet addr:127.0.0.1 Mask:255.0.0.0

[root@hat1-vm ~]# lsof -ni:53
COMMAND PID USER   FD   TYPE DEVICE SIZE NODE NAME
named   7834 named   21u IPv4 14624       TCP 127.0.0.1:domain (LISTEN)
named   7834 named   24u IPv4 15071       TCP 192.168.11.230:domain (LISTEN)
named   7834 named 512u IPv4 14623       UDP 127.0.0.1:domain
named   7834 named 513u IPv4 15070       UDP 192.168.11.230:domain

[ config ]

[root@hat1-vm ~]# egrep -v "^#" /etc/ha.d/ha.cf | grep -v "^$"
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
ucast eth1 192.168.122.141
auto_failback off
watchdog /dev/watchdog
node hat1-vm.localdomain hat2-vm.localdomain
ping 192.168.11.1
respawn hacluster /usr/lib/heartbeat/ipfail
respawn root /root/dns_port_check.sh
apiauth ipfail gid=haclient uid=hacluster

[root@hat2-vm ~]# egrep -v "^#" /etc/ha.d/ha.cf | egrep -v "^$"
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
ucast eth1 192.168.122.161
auto_failback off
watchdog /dev/watchdog
node hat1-vm.localdomain hat2-vm.localdomain
ping 192.168.11.1
respawn hacluster /usr/lib/heartbeat/ipfail
respawn root /root/dns_port_check.sh
apiauth ipfail gid=haclient uid=hacluster

[root@hat1-vm ~]# cat /etc/ha.d/haresources
hat1-vm.localdomain named 192.168.11.230/24

[root@hat2-vm ~]# cat /etc/ha.d/haresources
hat1-vm.localdomain named 192.168.11.230/24

[root@hat1-vm ~]# tail -5 /etc/ha.d/authkeys
#
auth 1
1 crc
#2 sha1 HI!
#3 md5 Hello!

[root@hat2-vm ~]# tail -5 /etc/ha.d/authkeys
#
auth 1
1 crc
#2 sha1 HI!
#3 md5 Hello!
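
A note on authkeys: crc only protects against packet corruption and adds no authentication, so it is only reasonable on a physically secure link such as the direct eth1 connection here. A sketch for generating a sha1 entry instead; make_authkeys is a hypothetical name and the key is simply 16 random bytes from /dev/urandom printed as hex.

```shell
#!/bin/sh
# make_authkeys: print an authkeys file body that uses sha1 with a
# random 32-hex-digit key. (Hypothetical helper name.)
make_authkeys() {
    key=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    printf 'auth 1\n1 sha1 %s\n' "$key"
}

# Usage (as root; copy the same file to the other node afterwards):
#   ( umask 077; make_authkeys > /etc/ha.d/authkeys )
```

Both nodes need the same key, and heartbeat expects the file to be readable by root only (mode 600).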
