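How to Build a High-Availability Network
Preface
As networks spread rapidly and applications depend on them more deeply, value-added services of every kind are now widely deployed over the network, and bandwidth is growing exponentially. Even a brief outage can affect a large number of services and cause heavy losses. The high availability (HA) of the underlying network that carries these services has therefore become a growing focus. Against this background, customers from carriers to medium and large enterprises now aim for five-nines availability (99.999%, or about 5 minutes of downtime per year) when they build a production network. For equipment and solution vendors, the ability to deliver an end-to-end high-availability network solution not only reflects technical strength but also determines whether they can survive in an increasingly competitive market.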
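The "about 5 minutes per year" figure follows directly from the definition of five nines. The sketch below is only a worked calculation; the function name is made up for illustration and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(nines: int) -> float:
    """Return the yearly downtime allowed by an availability of N nines."""
    unavailability = 10 ** (-nines)  # five nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5):
    minutes = downtime_minutes_per_year(n)
    print(f"{n} nines: {minutes:.2f} minutes of downtime per year")
```

Five nines works out to roughly 5.26 minutes per year, while three nines already allows almost 9 hours.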
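How to define a high-availability network
So how do we measure the availability of a network? First, a high-availability network must not fail often. Any failure, even a very short interruption, affects business operations, especially now that real-time services sensitive to packet loss and delay, such as voice and video, are widely deployed. Second, when a failure does occur, a high-availability network should recover quickly. A network that fails only once a year but needs hours or even days to recover cannot be called highly available. In short, few failures and short recovery times sum up what a high-availability network is. In real networks, the quality of hardware and software releases has limits, and failures and outages caused by human error and non-technical factors cannot be ruled out. For this reason, technologies that let the network recover quickly from failures are very important. Indeed, if the network can always recover without interrupting (the vast majority of) services, most users will experience it as fault-free.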
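The two properties above (few failures, fast recovery) are commonly combined into one number using the standard steady-state formula availability = MTBF / (MTBF + MTTR). The sketch below applies it to two hypothetical networks; the failure counts and repair times are invented purely for illustration, and one year of operation is used as a rough stand-in for the time between failures.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


HOURS_PER_YEAR = 365 * 24

# Hypothetical network A: about one failure a year, 8 hours to repair.
network_a = availability(HOURS_PER_YEAR, 8.0)

# Hypothetical network B: about four failures a year, 1 minute to recover each.
network_b = availability(HOURS_PER_YEAR / 4, 1 / 60)

print(f"A: {network_a:.5%}")
print(f"B: {network_b:.5%}")
```

Network B fails four times as often yet comes out far more available, which is why fast recovery matters as much as a low failure rate.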
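Overall plan (5 areas)
1. Servers: clustering (Windows, Linux, firewall)
2. Routing nodes (HSRP, VRRP)
3. Links (LAN: STP; WAN: backup technologies)
4. Disks (RAID: RAID1, RAID5, RAID6, RAID10)
5. Network interface cards (bonding)
Implementation steps and procedure:
Plan 2. VRRP
VRRP is the Virtual Router Redundancy Protocol. It is an election protocol that dynamically assigns responsibility for a virtual router to one of the VRRP routers on a LAN. The VRRP router that controls the virtual router's IP addresses is called the master, and it forwards packets sent to those virtual IP addresses. If the master becomes unavailable, the election process provides dynamic failover, which allows the virtual router's IP address to serve as the default first-hop router for end hosts. The benefit of VRRP is a more available default path without having to configure dynamic routing or router discovery protocols on every end host. VRRP packets are sent encapsulated in IP packets.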
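The election rule can be sketched in a few lines. This is a simplified, centralized model under stated assumptions: real VRRP routers decide independently from the advertisements they receive, and the class and function names here are hypothetical. The rule itself (highest priority wins, higher interface IP address breaks a tie) follows the VRRP specification, and the addresses and priorities are those used for virtual router 10 in the configuration that follows.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class VrrpRouter:
    name: str
    priority: int      # default is 100 when not configured
    interface_ip: str  # primary IP address of the VRRP interface
    alive: bool = True


def elect_master(routers):
    """Pick the master: highest priority wins, higher interface IP breaks ties."""
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.priority, IPv4Address(r.interface_ip)))


r1 = VrrpRouter("R1", 120, "192.168.10.1")
r2 = VrrpRouter("R2", 100, "192.168.10.2")
print(elect_master([r1, r2]).name)       # R1

r1_down = VrrpRouter("R1", 120, "192.168.10.1", alive=False)
print(elect_master([r1_down, r2]).name)  # R2
```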
VRRP topology diagram
Required equipment:
Quidway R2621 x 3
Quidway S2403H-EI x 2
Hosts x 4
Procedure:
SW1 configuration
System View: return to User View with Ctrl+Z.
[Quidway]sys
[Quidway]sysname SW1
[SW1]vlan 10
[SW1-vlan10]port e1/0/10
[SW1-vlan10]vlan 20
[SW1-vlan20]port e1/0/20
[SW1-vlan20]int e1/0/1
[SW1-Ethernet1/0/1]port link-type trunk
[SW1-Ethernet1/0/1]port trunk permit vlan all
Please wait........................................... Done.
[SW1-Ethernet1/0/1]dis vlan
The following VLANs exist:
1(default), 10, 20
[SW1-Ethernet1/0/1]int e1/0/24
[SW1-Ethernet1/0/24]port link-type trunk
[SW1-Ethernet1/0/24]port trunk permit vlan all
Please wait........................................... Done.
SW2 configuration
System View: return to User View with Ctrl+Z.
[Quidway]sysname SW2
[SW2]vlan 10
[SW2-vlan10]port e1/0/10
[SW2-vlan10]vlan 20
[SW2-vlan20]port e1/0/20
[SW2-vlan20]int e1/0/1
[SW2-Ethernet1/0/1]port link-type trunk
[SW2-Ethernet1/0/1]port trunk permit vlan all
Please wait........................................... Done.
[SW2-Ethernet1/0/1]int e1/0/24
[SW2-Ethernet1/0/24]port link-type trunk
[SW2-Ethernet1/0/24]port trunk permit vlan all
Please wait........................................... Done.
R3 configuration
[Router]sysname R3
[R3]int e0
[R3-Ethernet0]ip add 3.3.3.3 24
[R3-Ethernet0]loopback
[R3-Ethernet0]int s0
[R3-Serial0]ip add 1.1.1.2 24
[R3-Serial0]shut
[R3-Serial0]undo shut
[R3-Serial0]int s1
[R3-Serial1]ip add 1.1.2.2 24
[R3-Serial1]shut
[R3-Serial1]undo shut
R1 configuration
[Router]sysname R1
[R1]int s0
[R1-Serial0]ip add 1.1.1.1 24
[R1-Serial0]shut
[R1-Serial0]undo shut
[R1-Serial0]int e0.1
[R1-Ethernet0.1]vlan-type dot1q vid 10
[R1-Ethernet0.1]ip add 192.168.10.1 24
[R1-Ethernet0.1]int e0.2
[R1-Ethernet0.2]vlan-type dot1q vid 20
[R1-Ethernet0.2]ip add 192.168.20.1 24
[R1-Ethernet0.2]quit
[R1]ip route-static 0.0.0.0 0.0.0.0 1.1.1.2
[R1]ping 3.3.3.3
PING 3.3.3.3: 56 data bytes, press CTRL_C to break
Reply from 3.3.3.3: bytes=56 Sequence=0 ttl=255 time = 25 ms
Reply from 3.3.3.3: bytes=56 Sequence=1 ttl=255 time = 26 ms
Reply from 3.3.3.3: bytes=56 Sequence=2 ttl=255 time = 25 ms
Reply from 3.3.3.3: bytes=56 Sequence=3 ttl=255 time = 25 ms
Reply from 3.3.3.3: bytes=56 Sequence=4 ttl=255 time = 25 ms
--- 3.3.3.3 ping statistics ---
5 packets transmitted
5 packets received
0.00% packet loss
round-trip min/avg/max = 25/25/26 ms
[R1]acl 2000 match-order auto
[R1-acl-2000]rule permit source any
Rule has been added to normal packet-filtering rules
[R1-acl-2000]quit
[R1]nat address-group 1.1.1.4 1.1.1.6 add
[R1]int s0
[R1-Serial0]nat outbound 2000 address-group add
[R1-Serial0]quit
[R1]vrrp ping-enable
ping vrrp enable
[R1]int e0.1
[R1-Ethernet0.1]vrrp vrid 10 virtual-ip 192.168.10.254
[R1-Ethernet0.1]vrrp vrid 10 priority 120
[R1-Ethernet0.1]vrrp vrid 10 track s0 reduced 30
[R1-Ethernet0.1]int e0.2
[R1-Ethernet0.2]vrrp vrid 20 virtual-ip 192.168.20.254
[R1-Ethernet0.2]quit
R2 configuration
[Router]sysname R2
[R2]int s1
[R2-Serial1]ip add 1.1.2.1 24
[R2-Serial1]shut
[R2-Serial1]undo shut
[R2-Serial1]int e0.1
[R2-Ethernet0.1]vlan-type dot1q vid 10
[R2-Ethernet0.1]ip add 192.168.10.2 24
[R2-Ethernet0.1]int e0.2
[R2-Ethernet0.2]vlan-type dot1q vid 20
[R2-Ethernet0.2]ip add 192.168.20.2 24
[R2-Ethernet0.2]quit
[R2]ip route-static 0.0.0.0 0.0.0.0 1.1.2.2
[R2]acl 2000 m a
[R2-acl-2000]rule permit source any
Rule has been added to normal packet-filtering rules
[R2-acl-2000]quit
[R2]nat address-group 1.1.2.4 1.1.2.6 add
[R2]int s1
[R2-Serial1]nat outbound 2000 address-group add
[R2-Serial1]quit
[R2]vrrp ping-enable
ping vrrp enable
[R2]int e0.1
[R2-Ethernet0.1]vrrp vrid 10 virtual-ip 192.168.10.254
[R2-Ethernet0.1]int e0.2
[R2-Ethernet0.2]vrrp vrid 20 virtual-ip 192.168.20.254
[R2-Ethernet0.2]vrrp vrid 20 priority 120
[R2-Ethernet0.2]vrrp vrid 20 track s1 reduced 30
[R2-Ethernet0.2]quit
[R2]ping 3.3.3.3
PING 3.3.3.3: 56 data bytes, press CTRL_C to break
Reply from 3.3.3.3: bytes=56 Sequence=0 ttl=255 time = 25 ms
Reply from 3.3.3.3: bytes=56 Sequence=1 ttl=255 time = 25 ms
Reply from 3.3.3.3: bytes=56 Sequence=2 ttl=255 time = 25 ms
Reply from 3.3.3.3: bytes=56 Sequence=3 ttl=255 time = 26 ms
Reply from 3.3.3.3: bytes=56 Sequence=4 ttl=255 time = 25 ms
--- 3.3.3.3 ping statistics ---
5 packets transmitted
5 packets received
0.00% packet loss
round-trip min/avg/max = 25/25/26 ms
[R2]dis vrrp
Ethernet0.2 | Virtual Router 20
state : Master
Virtual IP : 192.168.20.254
Priority : 120
Preempt : YES Delay Time : 0
Timer : 1
Auth Type : NO
Track IF : Serial1 Priority reduced : 30
Ethernet0.1 | Virtual Router 10
state : Backup
Virtual IP : 192.168.10.254
Priority : 100
Preempt : YES Delay Time : 0
Timer : 1
Auth Type : NO
Failure simulation 1: the China Telecom link fails.
[R2]int s1
[R2-Serial1]shut //simulate a China Telecom link failure
% Interface Serial1 is down
[R2-Serial1]dis vrrp
Ethernet0.2 | Virtual Router 20
state : Backup
Virtual IP : 192.168.20.254
Priority : 90
Preempt : YES Delay Time : 0
Timer : 1
Auth Type : NO
Track IF : Serial1 Priority reduced : 30
Ethernet0.1 | Virtual Router 10
state : Backup
Virtual IP : 192.168.10.254
Priority : 100
Preempt : YES Delay Time : 0
Timer : 1
Auth Type : NO
[R1]dis vrrp
Ethernet0.2 | Virtual Router 20
state : Master
Virtual IP : 192.168.20.254
Priority : 100
Preempt : YES Delay Time : 0
Timer : 1
Auth Type : NO
Ethernet0.1 | Virtual Router 10
state : Master
Virtual IP : 192.168.10.254
Priority : 120
Preempt : YES Delay Time : 0
Timer : 1
Auth Type : NO
Track IF : Serial0 Priority reduced : 30
All 4 hosts work normally.
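The priority change seen in this simulation (120 dropping to 90 on R2) is plain arithmetic from the "track s1 reduced 30" setting. A minimal sketch, with a hypothetical helper name, using the values from the transcript:

```python
def effective_priority(configured: int, reduced: int, tracked_up: bool) -> int:
    """Priority a router uses while interface tracking is configured."""
    return configured if tracked_up else configured - reduced


# Virtual router 20: R2 is configured with priority 120 and "track s1 reduced 30".
# R1 keeps the default priority of 100 for this virtual router.
r1_priority = 100
r2_normal = effective_priority(120, 30, tracked_up=True)
r2_failed = effective_priority(120, 30, tracked_up=False)

print("Serial1 up:  ", "R2" if r2_normal > r1_priority else "R1", "is master")
print("Serial1 down:", "R2" if r2_failed > r1_priority else "R1", "is master")
```

The reduction has to be larger than the gap between the two configured priorities (20 here); otherwise the router with the failed uplink would remain master.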
Failure simulation 2: the China Netcom link fails.
[R1]int s0
[R1-Serial0]shut //simulate a China Netcom link failure
% Interface Serial0 is shut down
[R2]int s1
[R2-Serial1]undo shut //restore the China Telecom link
% Interface Serial1 is reset
[R2-Serial1]
:41:31: Interface Serial1 is UP
[R2-Serial1]dis vrrp
Ethernet0.2 | Virtual Router 20
state : Master
Virtual IP : 192.168.20.254
Priority : 120
Preempt : YES Delay Time : 0
Timer : 1
Auth Type : NO
Track IF : Serial1 Priority reduced : 30
Ethernet0.1 | Virtual Router 10
state : Master
Virtual IP : 192.168.10.254
Priority : 100
Preempt : YES Delay Time : 0
Timer : 1
Auth Type : NO
[R1]dis vrrp
Ethernet0.2 | Virtual Router 20
state : Backup
Virtual IP : 192.168.20.254
Priority : 100
Preempt : YES Delay Time : 0
Timer : 1
Auth Type : NO
Ethernet0.1 | Virtual Router 10
state : Backup
Virtual IP : 192.168.10.254
Priority : 90
Preempt : YES Delay Time : 0
Timer : 1
Auth Type : NO
Track IF : Serial0 Priority reduced : 30
All 4 hosts work normally.
Failure simulation 3: a subinterface fails.
[R1]int s0
[R1-Serial0]undo shut //restore the China Netcom link
% Interface Serial0 is reset
[R1-Serial0]
:43:07: Interface Serial0 is UP
[R1-Serial0]quit
[R1]int e0.1
[R1-Ethernet0.1]shut //simulate a subinterface failure
% Interface Ethernet0.1 is shut down
[R1-Ethernet0.1]
:43:30: Line protocol ip on the interface Ethernet0.1 is DOWN
[R1-Ethernet0.1]dis vrrp
Ethernet0.2 | Virtual Router 20
state : Backup
Virtual IP : 192.168.20.254
Priority : 100
Preempt : YES Delay Time : 0
Timer : 1
Auth Type : NO
Ethernet0.1 | Virtual Router 10
state : Initialize
Virtual IP : 192.168.10.254
Priority : 120
Preempt : YES Delay Time : 0
Timer : 1
Auth Type : NO
Track IF : Serial0 Priority reduced : 30
[R2]dis vrrp
Ethernet0.2 | Virtual Router 20
state : Master
Virtual IP : 192.168.20.254
Priority : 120
Preempt : YES Delay Time : 0
Timer : 1
Auth Type : NO
Track IF : Serial1 Priority reduced : 30
Ethernet0.1 | Virtual Router 10
state : Master
Virtual IP : 192.168.10.254
Priority : 100
Preempt : YES Delay Time : 0
Timer : 1
Auth Type : NO
All 4 hosts still work normally.
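Plan 3-1: STP and link aggregation
STP stands for Spanning Tree Protocol. It builds a tree topology in the network to eliminate loops, and it can provide path redundancy, although redundancy is not guaranteed in every case. Its main use is to avoid single points of failure and loops in a LAN and to solve the broadcast-storm problem of looped Ethernet networks. In that sense it is a network protection technology that removes loops created by mistake or by accident. However, the protocol mechanism has inherent limits. STP converges slowly: with the default timers (MaxAge 20 s plus two 15 s forward-delay intervals), convergence after a topology change can take up to about 50 seconds. It also provides no load balancing: when a loop exists, STP simply blocks one of the links, so that link forwards no packets and its bandwidth is wasted. Link aggregation addresses both shortcomings. It combines two or more data channels into a single channel that appears as one logical link with higher bandwidth.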
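The slow convergence of classic STP comes from its timers. The sketch below only does the arithmetic for the default values that also appear in the "Bridge Times" line of the transcripts (MaxAge 20 s, FwDly 15 s). The function names are hypothetical, and the switches in this lab report MSTP mode, which normally converges faster than these classic 802.1D figures.

```python
def stp_indirect_failure_delay(max_age: int = 20, forward_delay: int = 15) -> int:
    """Seconds until a blocked port forwards after an indirect link failure."""
    # max_age: wait for the stored root BPDU to age out
    # 2 * forward_delay: time spent in the listening and learning states
    return max_age + 2 * forward_delay


def stp_direct_failure_delay(forward_delay: int = 15) -> int:
    """Seconds until forwarding when the failure is detected on a local port."""
    return 2 * forward_delay


print(stp_indirect_failure_delay())  # 50
print(stp_direct_failure_delay())    # 30
```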
Topology diagram
Required equipment:
Quidway S2403H-EI x 2
Procedure:
SW1 configuration
System View: return to User View with Ctrl+Z.
[Quidway]sysname SW1
[SW1]stp enable
[SW1]dis stp
-------[CIST Global Info][Mode MSTP]-------
CIST Bridge :32768.000f-e274-4920
Bridge Times :Hello 2s MaxAge 20s FwDly 15s MaxHop 20
CIST Root/ERPC :32768.000f-e274-4920 / 0
CIST RegRoot/IRPC :32768.000f-e274-4920 / 0
CIST RootPortId :0.0
BPDU-Protection :disabled
TC-Protection :enabled / Threshold=6
Bridge Config
Digest Snooping :disabled
TC or TCN received :0
Time since last TC :0 days 0h:1m:20s
[SW1]dis stp brief
MSTID Port Role STP State Protection
0 Ethernet1/0/22 DESI FORWARDING NONE
0 Ethernet1/0/24 BACK DISCARDING NONE
[SW1]link-aggregation group 1 mode manual
[SW1]int e1/0/22
[SW1-Ethernet1/0/22]port link-aggregation group 1
[SW1-Ethernet1/0/22]dis link-aggregation summary
Aggregation Group Type:D -- Dynamic, S -- Static , M -- Manual
Loadsharing Type: Shar -- Loadsharing, NonS -- Non-Loadsharing
Actor ID: 0x8000, 000f-e274-4920
AL AL Partner ID Select Unselect Share Master
ID Type Ports Ports Type Port
--------------------------------------------------------------------------------
1 M none 1 0 NonS Ethernet1/0/22
[SW1-Ethernet1/0/22]int e1/0/24
[SW1-Ethernet1/0/24]port link-aggregation group 1
SW2 configuration
System View: return to User View with Ctrl+Z.
[Quidway]sysname SW2
[SW2]stp enable
[SW2]dis stp
-------[CIST Global Info][Mode MSTP]-------
CIST Bridge :32768.000f-e242-8a41
Bridge Times :Hello 2s MaxAge 20s FwDly 15s MaxHop 20
CIST Root/ERPC :32768.000f-e242-8a41 / 0
CIST RegRoot/IRPC :32768.000f-e242-8a41 / 0
CIST RootPortId :0.0
BPDU-Protection :disabled
TC-Protection :enabled / Threshold=6
Bridge Config
Digest Snooping :disabled
TC or TCN received :0
Time since last TC :0 days 0h:4m:5s
[SW2]link-aggregation group 1 mode manual
[SW2]int e1/0/22
[SW2-Ethernet1/0/22]port link-aggregation group 1
[SW2-Ethernet1/0/22]int e1/0/24
[SW2-Ethernet1/0/24]port link-aggregation group 1
[SW2-Ethernet1/0/24]dis link-aggregation summary
Aggregation Group Type:D -- Dynamic, S -- Static , M -- Manual
Loadsharing Type: Shar -- Loadsharing, NonS -- Non-Loadsharing
Actor ID: 0x8000, 000f-e242-8a41
AL AL Partner ID Select Unselect Share Master
ID Type Ports Ports Type Port
--------------------------------------------------------------------------------
1 M none 2 0 Shar Ethernet1/0/22
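Plan 4. RAID (tested on CentOS 6.4)
A disk array (Redundant Array of Independent Disks, RAID) originally carried the sense of an array of inexpensive disks with redundancy. The idea is to group disks into an array and spread data across them to improve data safety. Many relatively cheap disks are combined into one large disk group, and the combined throughput of the individual disks improves the performance of the whole disk system. With this technique, data is cut into many segments that are stored on separate disks. A disk array can also use parity checking so that data can still be read when any one disk in the array fails; during a rebuild, the missing data is recomputed and written to a new disk.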
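The parity idea can be shown with XOR, which is the usual way single-parity RAID levels are explained. This is a toy sketch on in-memory byte strings, not how the kernel MD driver is implemented; the block contents and function name are invented for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, as if stored on three different disks.
data = [b"disk-A..", b"disk-B..", b"disk-C.."]
parity = xor_blocks(data)  # stored on a fourth disk

# The disk holding data[1] fails: rebuild it from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])  # True
```

Because XOR is its own inverse, any single missing block can be recomputed from the remaining blocks and the parity block.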
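Using the mdadm command
--create (or -C) creates a new array and writes key identifying information about the array as metadata into a reserved area of each underlying device
--level (or -l) sets the RAID level of the array
--chunk (or -c) sets the size of each stripe unit in KB. Older mdadm releases defaulted to 64 KB; the version used in the transcripts that follow reports a 512k chunk. The stripe unit size strongly affects array read and write performance under different workloads
--raid-devices (or -n) sets the number of active devices in the array
--spare-devices (or -x) sets the number of hot spares in the array. When a disk in the array fails, the MD kernel driver automatically adds a hot spare to the array and rebuilds the lost disk's data onto it
--verbose (or -v) prints detailed progress
--fail (or -f) marks a device as faulty, used here to simulate a failure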
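To show how these options fit together, here is a small sketch that assembles an mdadm create command from its parts using the long option names listed above. The helper function is hypothetical and only builds the argument list; it does not run mdadm or touch any disk.

```python
import shlex


def build_mdadm_create(array, level, devices, spares=(), chunk_kb=None, verbose=True):
    """Assemble an 'mdadm --create' command line as an argument list."""
    argv = ["mdadm", "--create"]
    if verbose:
        argv.append("--verbose")
    argv += [array, f"--level={level}", f"--raid-devices={len(devices)}"]
    if chunk_kb is not None:
        argv.append(f"--chunk={chunk_kb}")
    if spares:
        argv.append(f"--spare-devices={len(spares)}")
    # Active members come first, hot spares last.
    argv += list(devices) + list(spares)
    return argv


cmd = build_mdadm_create(
    "/dev/md0", 5, ["/dev/sdb", "/dev/sdc", "/dev/sdd"], spares=["/dev/sde"]
)
print(shlex.join(cmd))
```

Deriving --raid-devices and --spare-devices from the device lists keeps the counts and the listed devices from drifting apart.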
raid1
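RAID1 achieves redundancy through disk mirroring: data is written to paired, independent disks that back each other up. When the original copy is busy, data can be read directly from the mirror, so RAID1 can improve read performance. RAID1 has the highest cost per unit of capacity among RAID levels, but it offers very high data safety and availability. When one disk fails, the system automatically switches reads and writes to the mirror disk without having to reconstruct the lost data.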
[root@localhost ~]# mdadm -Cv /dev/md0 -l 1 -n 2 /dev/sdb /dev/sdc -x 1 /dev/sdd
Continue creating array? y
[root@localhost ~]# mkfs -t ext3 /dev/md0
[root@localhost ~]# mkdir /mnt/raid1
[root@localhost ~]# mount /dev/md0 /mnt/raid1/
[root@localhost ~]# df -h
/dev/md0 5.0G 139M 4.6G 3% /mnt/raid1
[root@localhost ~]# cd /mnt/raid1/
[root@localhost raid1]# cp -r /usr/share/* ./
^C
[root@localhost mnt]# du -sh raid1/
97M     raid1/
[root@localhost mnt]# vi /etc/fstab
/dev/md0 /mnt/raid1 auto defaults 0 0
[root@localhost mnt]# mount -a
[root@localhost mnt]# cd raid1/
[root@localhost raid1]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdd[2](S) sdc[1] sdb[0]
5238720 blocks super 1.2 [2/2] [UU]
unused devices: <none>
[root@localhost ~]# mdadm --detail --scan /dev/md0
[root@localhost ~]# mdadm /dev/md0 -f /dev/sdb
mdadm: set /dev/sdb faulty in /dev/md0
[root@localhost ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdd[2] sdc[1] sdb[0](F)
5238720 blocks super 1.2 [2/2] [UU]
unused devices: <none>
[root@localhost ~]# mdadm --detail --scan /dev/md0
[root@localhost ~]# mdadm --detail --scan >/etc/mdadm.conf
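The member flags in /proc/mdstat, (S) for a spare and (F) for a faulty device, can be checked by a script instead of by eye. The parser below is a minimal sketch that handles only the array line format shown in the transcript; the function name is hypothetical, and any other flag is simply treated as an active member.

```python
import re

# Matches members such as "sdd[2](S)", "sdc[1]" or "sdb[0](F)".
DEVICE_RE = re.compile(r"(\w+)\[(\d+)\](\([A-Z]\))?")


def parse_mdstat_line(line):
    """Split one 'mdX : active ...' line into active, spare and faulty members."""
    name, _, rest = line.partition(":")
    result = {"array": name.strip(), "active": [], "spare": [], "faulty": []}
    for dev, _slot, flag in DEVICE_RE.findall(rest):
        if flag == "(S)":
            result["spare"].append(dev)
        elif flag == "(F)":
            result["faulty"].append(dev)
        else:
            result["active"].append(dev)
    return result


before = parse_mdstat_line("md0 : active raid1 sdd[2](S) sdc[1] sdb[0]")
after = parse_mdstat_line("md0 : active raid1 sdd[2] sdc[1] sdb[0](F)")
print(before)
print(after)
```

Comparing the two results shows the hot spare sdd being promoted to an active member once sdb is marked faulty.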
raid5
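RAID 5 is a storage solution that balances performance, data safety, and cost. It can be seen as a compromise between RAID 0 and RAID 1. RAID 5 protects data, but to a lesser degree than mirroring, while using disk space more efficiently than mirroring. Its read speed is close to that of RAID 0; because parity information must also be maintained, writes are slightly slower than writes to a single disk. Since several data blocks share one parity block, RAID 5 has better space utilization than RAID 1 and a lower storage cost.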
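The space-utilization comparison can be made concrete with the usual capacity rules for equal-size member disks. The sketch below is illustrative only: the function name is made up, hot spares are not counted, and RAID 10 is assumed to use two-way mirrors.

```python
def usable_disks(level: int, n: int) -> float:
    """Disks' worth of usable space for n equal-size member disks."""
    if level == 0:
        return n
    if level == 1:
        return 1        # every member holds a full copy
    if level == 5:
        return n - 1    # one disk's worth of parity
    if level == 6:
        return n - 2    # two disks' worth of parity
    if level == 10:
        return n / 2    # assuming two-way mirrors
    raise ValueError(f"unsupported RAID level: {level}")


for level, n in ((1, 2), (5, 3), (6, 4), (10, 4)):
    share = usable_disks(level, n) / n
    print(f"RAID{level} with {n} disks: {share:.0%} of raw capacity usable")
```

This matches the transcripts in this article: the two-disk RAID1 array reports 5238720 blocks, while the three-disk RAID5 array built from the same size of disk reports 10476544 blocks, about twice as much.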
[root@localhost ~]# fdisk -l
[root@localhost ~]# mdadm -Cv /dev/md0 -l 5 -n 3 /dev/sdb /dev/sdc /dev/sdd -x 1 /dev/sde
Continue creating array? y
[root@localhost ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd[4] sde[3](S) sdc[1] sdb[0]
10476544 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
[root@localhost ~]# mdadm --detail /dev/md0
Number Major Minor RaidDevice State
0 8 16 0 active sync /dev/sdb
1 8 32 1 active sync /dev/sdc
4 8 48 2 active sync /dev/sdd
3 8 64 - spare /dev/sde
[root@localhost ~]# mkfs -t ext3 /dev/md0
[root@localhost ~]# mkdir /mnt/raid5
[root@localhost ~]# mount /dev/md0 /mnt/raid5
[root@localhost ~]# vi /etc/fstab
/dev/md0 /mnt/raid5 auto defaults 0 0
[root@localhost ~]# mdadm --detail --scan >/etc/mdadm.conf
[root@localhost ~]# cat /etc/mdadm.conf
ARRAY /dev/md0 metadata=1.2 spares=1 name=localhost.localdomain:0 UUID=52a024d4:76260d6f:9eeac1e5:f4a7d0d9
[root@localhost ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd[4] sde[3](S) sdc[1] sdb[0]
10476544 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
[root@localhost ~]# mdadm --detail /dev/md0
Number Major Minor RaidDevice State
0 8 16 0 active sync /dev/sdb
1 8 32 1 active sync /dev/sdc
4 8 48 2 active sync /dev/sdd
3 8 64 - spare /dev/sde
[root@localhost ~]# mdadm /dev/md0 -f /dev/sdb
mdadm: set /dev/sdb faulty in /dev/md0
[root@localhost ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd[4] sde[3] sdc[1] sdb[0](F)
10476544 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
[root@localhost ~]# mdadm --detail /dev/md0
Number Major Minor RaidDevice State
3 8 64 0 active sync /dev/sde
1 8 32 1 active sync /dev/sdc
4 8 48 2 active sync /dev/sdd
0 8 16 - faulty spare /dev/sdb