Optimizing VirtIO and OVS Networking with DPDK
Preparing the Test Environment
There are two nodes with essentially the same configuration. Node A runs the virtual machines and node B drives the performance tests.
Check the system information
Distribution release:
$ cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)
Kernel version:
$ uname -a
Linux osdev-gpu 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
CPU information:
$ cat /proc/cpuinfo | tail -n26processor : 71vendor_id : GenuineIntelcpu family : 6model : 85model name : Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHzstepping : 4microcode : 0x200002ccpu MHz : 1499.941cache size : 25344 KBphysical id : 1siblings : 36core id : 27cpu cores : 18apicid : 119initial apicid : 119fpu : yesfpu_exception : yescpuid level : 22wp : yesflags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_reqbogomips : 4604.72clflush size : 64cache_alignment : 64address sizes : 46 bits physical, 48 bits virtualpower management:
Memory information:
$ free -h total used free shared buff/cache availableMem: 754G 5.7G 748G 13M 425M 746GSwap: 0B 0B 0B
NIC information:
# 节点A:$ lspci | grep Ethernet1a:00.0 Ethernet controller: Intel Corporation Ethernet Connection X722 for 1GbE (rev 09)1a:00.1 Ethernet controller: Intel Corporation Ethernet Connection X722 for 1GbE (rev 09)1a:00.2 Ethernet controller: Intel Corporation Ethernet Connection X722 for 1GbE (rev 09)1a:00.3 Ethernet controller: Intel Corporation Ethernet Connection X722 for 1GbE (rev 09)5f:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)5f:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)$ lspci | awk '$0~/Ethernet/{printf($1 " ");gsub(":","\\:");cmd="ls /sys/bus/pci/devices/0000\\:" $1 "/driver/module/drivers/"; system(cmd)}'1a:00.0 pci:i40e1a:00.1 pci:i40e1a:00.2 pci:i40e1a:00.3 pci:i40e5f:00.0 pci:i40e5f:00.1 pci:i40e# 节点B:$ lspci | grep Ethernet1a:00.0 Ethernet controller: Intel Corporation Ethernet Connection X722 for 1GbE (rev 09)1a:00.1 Ethernet controller: Intel Corporation Ethernet Connection X722 for 1GbE (rev 09)1a:00.2 Ethernet controller: Intel Corporation Ethernet Connection X722 for 1GbE (rev 09)1a:00.3 Ethernet controller: Intel Corporation Ethernet Connection X722 for 1GbE (rev 09)5e:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)5e:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)60:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)60:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)$ lspci | awk '$0~/Ethernet/{printf($1 " ");gsub(":","\\:");cmd="ls /sys/bus/pci/devices/0000\\:" $1 "/driver/module/drivers/"; system(cmd)}'1a:00.0 pci:i40e1a:00.1 pci:i40e1a:00.2 pci:i40e1a:00.3 pci:i40e5e:00.0 pci:i40e5e:00.1 pci:i40e60:00.0 pci:i40e60:00.1 pci:i40e
Enabling IOMMU
To use the VFIO driver, IOMMU support must be enabled. Enter the motherboard BIOS and enable the processor's Intel VT-d support. If single-root I/O virtualization is also needed, enable the PCI SR-IOV support as well.
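SR-IOV itself is not used in the rest of this walkthrough. As a rough sketch only (the interface name enp95s0f0 and the VF count here are illustrative, not taken from the original setup), virtual functions can be created through sysfs once the BIOS support is enabled:
# Check how many VFs the NIC supports, then create some (example values only):
$ cat /sys/class/net/enp95s0f0/device/sriov_totalvfs
$ echo 4 > /sys/class/net/enp95s0f0/device/sriov_numvfs
$ lspci | grep -i "Virtual Function"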
Modify the Linux kernel boot parameters and add the intel_iommu=on iommu=pt options:
$ vi /etc/default/grub
GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt crashkernel=auto biosdevname=0 rhgb quiet"
Update the GRUB boot configuration and reboot the host:
$ grub2-mkconfig -o /boot/grub2/grub.cfg
$ reboot
After the host reboots, verify that the boot parameters were added correctly:
$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-693.17.1.el7.x86_64 root=UUID=1645c2f2-2308-436e-86e6-91ebdc76477e ro intel_iommu=on iommu=pt crashkernel=auto biosdevname=0 rhgb quiet
Check that the kernel initialized the IOMMU correctly; on success dmesg shows messages like these:
$ dmesg | grep -e DMAR -e IOMMU[ 0.000000] ACPI: DMAR 000000006ca320d8 002A0 (v01 ALASKA A M I 00000001 INTL 20091013)[ 0.000000] DMAR: IOMMU enabled[ 0.348112] DMAR: Host address width 46[ 0.348114] DMAR: DRHD base: 0x000000d37fc000 flags: 0x0[ 0.348123] DMAR: dmar0: reg_base_addr d37fc000 ver 1:0 cap 8d2078c106f0466 ecap f020df[ 0.348124] DMAR: DRHD base: 0x000000e0ffc000 flags: 0x0[ 0.348128] DMAR: dmar1: reg_base_addr e0ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df[ 0.348129] DMAR: DRHD base: 0x000000ee7fc000 flags: 0x0[ 0.348133] DMAR: dmar2: reg_base_addr ee7fc000 ver 1:0 cap 8d2078c106f0466 ecap f020df[ 0.348134] DMAR: DRHD base: 0x000000fbffc000 flags: 0x0[ 0.348138] DMAR: dmar3: reg_base_addr fbffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df[ 0.348139] DMAR: DRHD base: 0x000000aaffc000 flags: 0x0[ 0.348156] DMAR: dmar4: reg_base_addr aaffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df[ 0.348157] DMAR: DRHD base: 0x000000b87fc000 flags: 0x0[ 0.348162] DMAR: dmar5: reg_base_addr b87fc000 ver 1:0 cap 8d2078c106f0466 ecap f020df...
Check that the IOMMU groups are present; a non-empty directory means grouping succeeded:
$ ls /sys/kernel/iommu_groups/0 11 14 17 2 22 25 28 30 33 36 39 41 44 47 5 52 55 58 60 63 66 69 71 74 77 8 82 851 12 15 18 20 23 26 29 31 34 37 4 42 45 48 50 53 56 59 61 64 67 7 72 75 78 80 83 8610 13 16 19 21 24 27 3 32 35 38 40 43 46 49 51 54 57 6 62 65 68 70 73 76 79 81 84 9
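All devices in an IOMMU group must be assigned together when using VFIO, so it can be useful to see which devices ended up in each group. This optional check (not part of the original steps) walks the same sysfs directory:
$ for g in /sys/kernel/iommu_groups/*; do echo "IOMMU group ${g##*/}:"; ls "$g"/devices; done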
Installing Qemu-KVM
Add the Qemu-KVM repositories:
$ yum install -y epel-release
$ yum install -y centos-release-qemu-ev
Install Qemu-KVM and check the version:
$ yum install -y qemu-kvm-ev$ /usr/libexec/qemu-kvm --versionQEMU emulator version 2.9.0(qemu-kvm-ev-2.9.0-16.el7_4.14.1)Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers$ qemu-img --versionqemu-img version 2.9.0(qemu-kvm-ev-2.9.0-16.el7_4.14.1)Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
Preparing the Virtual Machine Image
Download the cloud image
Install the dependency packages:
$ yum install -y libvirt libguestfs-tools libguestfs-xfs genisoimage
$ systemctl enable libvirtd && systemctl start libvirtd && systemctl status libvirtd
Download the CentOS cloud image:
$ export OVS_ROOT=/opt/ovs
$ mkdir -pv $OVS_ROOT/images
$ wget http://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2 --directory-prefix $OVS_ROOT/images
Inspect the image:
$ qemu-img info $OVS_ROOT/images/CentOS-7-x86_64-GenericCloud.qcow2image: /opt/ovs/images/CentOS-7-x86_64-GenericCloud.qcow2file format: qcow2virtual size: 8.0G (8589934592 bytes)disk size: 832Mcluster_size: 65536Format specific information: compat: 0.10 refcount bits: 16$ virt-filesystems --long --parts --blkdevs -h -a $OVS_ROOT/images/CentOS-7-x86_64-GenericCloud.qcow2Name Type MBR Size Parent/dev/sda1 partition 83 8.0G /dev/sda/dev/sda device - 8.0G -$ virt-df -h $OVS_ROOT/images/CentOS-7-x86_64-GenericCloud.qcow2文件系统 大小 已用空间 可用空间 使用百分比%CentOS-7-x86_64-GenericCloud.qcow2:/dev/sda1 8.0G 795M 7.2G 10%
Expand the image partition
Create a new 20 GB target image:
$ qemu-img create -f qcow2 $OVS_ROOT/images/CentOS-7-x86_64.qcow2 20G
Formatting '/opt/ovs/images/CentOS-7-x86_64.qcow2', fmt=qcow2 size=21474836480 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
Expand the partition into the new image:
$ virt-resize $OVS_ROOT/images/CentOS-7-x86_64-GenericCloud.qcow2 $OVS_ROOT/images/CentOS-7-x86_64.qcow2 --expand /dev/sda1[ 0.0] Examining /opt/ovs/images/CentOS-7-x86_64-GenericCloud.qcow2 25% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒══════════════════════════════════════════⟧ --:-- 100% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒⟧ --:--**********Summary of changes:/dev/sda1: This partition will be resized from 8.0G to 20.0G. The filesystem xfs on /dev/sda1 will be expanded using the 'xfs_growfs' method.**********[ 17.3] Setting up initial partition table on /opt/ovs/images/CentOS-7-x86_64.qcow2[ 17.5] Copying /dev/sda1 100% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒⟧ 00:00[ 26.8] Expanding /dev/sda1 using the 'xfs_growfs' methodResize operation completed with no errors. Before deleting the old disk, carefully check that the resized disk boots and works correctly.
Inspect the new image:
$ qemu-img info $OVS_ROOT/images/CentOS-7-x86_64.qcow2image: /opt/ovs/images/CentOS-7-x86_64.qcow2file format: qcow2virtual size: 20G (21474836480 bytes)disk size: 834Mcluster_size: 65536Format specific information: compat: 1.1 lazy refcounts: false refcount bits: 16 corrupt: false$ virt-filesystems --long --parts --blkdevs -h -a $OVS_ROOT/images/CentOS-7-x86_64.qcow2Name Type MBR Size Parent/dev/sda1 partition 83 20G /dev/sda/dev/sda device - 20G -$ virt-df -h $OVS_ROOT/images/CentOS-7-x86_64.qcow2文件系统 大小 已用空间 可用空间 使用百分比%CentOS-7-x86_64.qcow2:/dev/sda1 20G 795M 19G 4%
Create the cloud-init metadata image
Create the meta-data file:
$ vi $OVS_ROOT/images/meta-data
instance-id: centos7-ovs;
local-hostname: centos7-ovs;
Create the user-data file (replace the key under ssh_authorized_keys with your own):
$ vi $OVS_ROOT/images/user-data
#cloud-config
user: root
password: 123456
chpasswd: { expire: False }
ssh_pwauth: True
ssh_authorized_keys:
 - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQD6ijVswX40X2ercmKDZD8mVRD6GFkhdeBM/OjvGd8mSnLDdOjhZw83jhMm/rptNEDlpBW+IjxOZsDO3wm+iVcGn5LblJ3qXtdGEIHlsttFAkLsF8B3jnLBeSTRee8JXZBic5KdfYffq9JC3WrgGJl+OQz6mNW7rqquBFI98QCVlsZEvsJzw5LEm1/ej3Ka2pSkTEei+sB4PBPolnH9cUahq5T8Wwgtlw6JutNob1e5OgFWvPThTRWAtCqLaFWenedagKEA4jPseuF7dq/Eb7nEqL2jYNsWKyR1JpuUdejxAfw434guitORyvLCbj022Sgn5bwEYPIw3EVykcc6XxoZ root@osdev-gpu
final_message: "SYSTEM READY TO LOG IN"
Generate the cloud-init ISO image:
$ genisoimage -output $OVS_ROOT/images/centos7-init.iso -volid cidata -joliet -rock $OVS_ROOT/images/user-data $OVS_ROOT/images/meta-data
Testing Without DPDK
Build and install OVS
Install the dependency packages:
$ yum -y install gcc autoconf automake libtool kernel kernel-devel
Get the source code:
$ export OVS_ROOT=/opt/ovs && cd $OVS_ROOT
$ git clone https://github.com/openvswitch/ovs.git
$ cd ovs && git checkout -b v2.9.0/origin v2.9.0 && git checkout -b v2.9.0/devel
Build and install:
$ ./boot.sh
$ mkdir -pv $OVS_ROOT/build-nodpdk $OVS_ROOT/target-nodpdk && cd $OVS_ROOT/build-nodpdk
$ ../ovs/configure --enable-shared --with-linux=/lib/modules/$(uname -r)/build --prefix=$OVS_ROOT/target-nodpdk CFLAGS="-g -Ofast"
$ make -j16 'CFLAGS=-g -Ofast -march=native' && make install
$ mkdir -pv $OVS_ROOT/target-nodpdk/modules && cp -vf $OVS_ROOT/build-nodpdk/datapath/linux/*ko $OVS_ROOT/target-nodpdk/modules/
Start and stop OVS
Initialize the environment:
$ export OVS_ROOT=/opt/ovs && export OVS_DIR=$OVS_ROOT/target-nodpdk && export PATH=$PATH:$OVS_DIR/share/openvswitch/scripts && cd $OVS_DIR
Start with the helper script:
$ ovs-ctl start
Stop with the helper script:
$ ovs-ctl stop
Start manually, step by step:
$ mkdir -pv $OVS_DIR/var/run/openvswitch $OVS_DIR/etc/openvswitch
# Create the ovsdb-server database:
$ $OVS_DIR/bin/ovsdb-tool create $OVS_DIR/etc/openvswitch/conf.db $OVS_DIR/share/openvswitch/vswitch.ovsschema
# Start the ovsdb-server database service:
$ $OVS_DIR/sbin/ovsdb-server --remote=punix:$OVS_DIR/var/run/openvswitch/db.sock --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach
# Initialize the database (only needed on the first run):
$ $OVS_DIR/bin/ovs-vsctl --no-wait init
# List the kernel modules that openvswitch.ko depends on:
$ cat /usr/lib/modules/`uname -r`/modules.dep | awk '$1~"^extra/openvswitch"{for(i=2;i<=NF;i++) {print $i}}' | xargs echo
kernel/net/ipv6/netfilter/nf_nat_ipv6.ko.xz kernel/net/ipv4/netfilter/nf_nat_ipv4.ko.xz kernel/net/ipv6/netfilter/nf_defrag_ipv6.ko.xz kernel/net/netfilter/nf_nat.ko.xz kernel/net/netfilter/nf_conntrack.ko.xz kernel/net/ipv4/udp_tunnel.ko.xz kernel/lib/libcrc32c.ko.xz
# Load the kernel modules that openvswitch.ko depends on:
$ cat /usr/lib/modules/`uname -r`/modules.dep | awk '$1~"^extra/openvswitch"{for(i=2;i<=NF;i++) {mod=gensub(/\.ko\.xz/,"",1,gensub(/.*\//,"",1,$i));cmd="modprobe " mod;system(cmd)}}'
# Load the openvswitch.ko kernel module:
$ insmod $OVS_DIR/modules/openvswitch.ko
# Start the ovs-vswitchd daemon:
$ $OVS_DIR/sbin/ovs-vswitchd --pidfile --detach --log-file
Stop manually, step by step:
$ pkill -9 ovs
$ rm -rfv $OVS_DIR/var/run/openvswitch $OVS_DIR/etc/openvswitch/ $OVS_DIR/etc/openvswitch/conf.db
Create an environment-variable script:
$ vi $OVS_ROOT/ovs-env-nodpdk.sh
export OVS_ROOT=/opt/ovs
export OVS_DIR=$OVS_ROOT/target-nodpdk
export PATH=$OVS_DIR/bin/:$OVS_DIR/share/openvswitch/scripts:$PATH
Create an OVS start script:
$ vi $OVS_ROOT/start-ovs.sh
#!/bin/bash
$OVS_DIR/sbin/ovsdb-server --remote=punix:$OVS_DIR/var/run/openvswitch/db.sock --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach
cat /usr/lib/modules/`uname -r`/modules.dep | awk '$1~"^extra/openvswitch"{for(i=2;i<=NF;i++) {mod=gensub(/\.ko\.xz/,"",1,gensub(/.*\//,"",1,$i));cmd="modprobe " mod;system(cmd)}}'
insmod $OVS_DIR/modules/openvswitch.ko
$OVS_DIR/sbin/ovs-vswitchd --pidfile --detach --log-file
$ chmod a+x $OVS_ROOT/start-ovs.sh
Create an OVS stop script:
$ vi $OVS_ROOT/stop-ovs.sh
#!/bin/bash
pkill -9 ovs
# rm -rfv $OVS_DIR/var/run/openvswitch $OVS_DIR/etc/openvswitch/ $OVS_DIR/etc/openvswitch/conf.db
$ chmod a+x $OVS_ROOT/stop-ovs.sh
Creating the Network Environment
Set up and inspect the environment
Set the environment variables:
$ export OVS_ROOT=/opt/ovs && export OVS_DIR=$OVS_ROOT/target-nodpdk && export PATH=$OVS_DIR/bin/:$OVS_DIR/share/openvswitch/scripts:$PATH
# Or
$ . /opt/ovs/ovs-env-nodpdk.sh
Enable IP forwarding:
$ echo 1 > /proc/sys/net/ipv4/ip_forward
Check the current NIC configuration:
# 节点A:$ ip a | grep -A2 "enp.*: "2: enp26s0f0:mtu 1500 qdisc noop portid 6c92bf74beac state DOWN qlen 1000 link/ether 6c:92:bf:74:be:ac brd ff:ff:ff:ff:ff:ff3: enp26s0f1: mtu 1500 qdisc noop portid 6c92bf74bead state DOWN qlen 1000 link/ether 6c:92:bf:74:be:ad brd ff:ff:ff:ff:ff:ff4: enp26s0f2: mtu 1500 qdisc mq portid 6c92bf74beae state UP qlen 1000 link/ether 6c:92:bf:74:be:ae brd ff:ff:ff:ff:ff:ff inet 172.29.101.166/24 brd 172.29.101.255 scope global enp26s0f2--5: enp26s0f3: mtu 1500 qdisc noop portid 6c92bf74beaf state DOWN qlen 1000 link/ether 6c:92:bf:74:be:af brd ff:ff:ff:ff:ff:ff6: enp95s0f0: mtu 1500 qdisc mq portid 6c92bf5cdc5e state UP qlen 1000 link/ether 6c:92:bf:5c:dc:5e brd ff:ff:ff:ff:ff:ff inet 172.29.113.28/24 brd 172.29.113.255 scope global enp95s0f0--7: enp95s0f1: mtu 1500 qdisc noop portid 6c92bf5cdc5f state DOWN qlen 1000 link/ether 6c:92:bf:5c:dc:5f brd ff:ff:ff:ff:ff:ff8: docker0: mtu 1500 qdisc noqueue state UP# 节点B:$ ip a | grep -A2 "enp.*: "2: enp26s0f0: mtu 1500 qdisc mq portid 6c92bf74bfd0 state DOWN qlen 1000 link/ether 6c:92:bf:74:bf:d0 brd ff:ff:ff:ff:ff:ff3: enp26s0f1: mtu 1500 qdisc mq portid 6c92bf74bfd1 state DOWN qlen 1000 link/ether 6c:92:bf:74:bf:d1 brd ff:ff:ff:ff:ff:ff4: enp26s0f2: mtu 1500 qdisc mq portid 6c92bf74bfd2 state UP qlen 1000 link/ether 6c:92:bf:74:bf:d2 brd ff:ff:ff:ff:ff:ff5: enp26s0f3: mtu 1500 qdisc mq portid 6c92bf74bfd3 state UP qlen 1000 link/ether 6c:92:bf:74:bf:d3 brd ff:ff:ff:ff:ff:ff inet 172.29.101.171/24 brd 172.29.101.255 scope global enp26s0f3--6: enp94s0f0: mtu 1500 qdisc mq portid 6c92bf5bc5cd state UP qlen 1000 link/ether 6c:92:bf:5b:c5:cd brd ff:ff:ff:ff:ff:ff inet 172.29.122.43/24 brd 172.29.122.255 scope global enp94s0f0--7: enp94s0f1: mtu 1500 qdisc mq portid 6c92bf5bc5ce state UP qlen 1000 link/ether 6c:92:bf:5b:c5:ce brd ff:ff:ff:ff:ff:ff8: enp96s0f0: mtu 1500 qdisc mq portid 6c92bf5bc59d state UP qlen 1000 link/ether 6c:92:bf:5b:c5:9d brd ff:ff:ff:ff:ff:ff9: enp96s0f1: mtu 1500 qdisc mq portid 6c92bf5bc59e state UP qlen 1000 link/ether 6c:92:bf:5b:c5:9e brd ff:ff:ff:ff:ff:ff10: virbr0: mtu 1500 qdisc noqueue state DOWN qlen 1000
Create the NAT network
On node A the NAT network uses subnet 192.168.2.0/24 with gateway 192.168.2.99.
Add an OVS bridge br-ovs:
$ ovs-vsctl add-br br-ovs$ ip link set dev br-ovs up$ ip addr add 192.168.2.99/24 dev br-ovs$ ip addr show br-ovs21: br-ovs:mtu 1500 qdisc noqueue state UNKNOWN qlen 1000 link/ether a2:e8:a6:bb:55:46 brd ff:ff:ff:ff:ff:ff inet 192.168.2.99/24 scope global br-ovs valid_lft forever preferred_lft forever inet6 fe80::a0e8:a6ff:febb:5546/64 scope link valid_lft forever preferred_lft forever
Add the NAT address translation rule:
$ iptables -t nat -A POSTROUTING -s 192.168.2.0/24 ! -d 192.168.2.0/24 -j MASQUERADE
Configure the DHCP service:
$ yum install -y dnsmasq
$ dnsmasq --strict-order --except-interface=lo --interface=br-ovs --listen-address=192.168.2.99 --bind-interfaces --dhcp-range=192.168.2.128,192.168.2.192 --conf-file="" --pid-file=/var/run/br-ovs-dhcp.pid --dhcp-leasefile=/var/run/br-ovs-dhcp.leases --dhcp-no-override
Create a NAT network start script:
$ vi $OVS_ROOT/start-nat.sh
#!/bin/bash
#ovs-vsctl del-br br-ovs || return 0
#ovs-vsctl add-br br-ovs
ip link set dev br-ovs up
ip addr add 192.168.2.99/24 dev br-ovs
iptables -t nat -A POSTROUTING -s 192.168.2.0/24 ! -d 192.168.2.0/24 -j MASQUERADE
dnsmasq --strict-order --except-interface=lo --interface=br-ovs --listen-address=192.168.2.99 --bind-interfaces --dhcp-range=192.168.2.128,192.168.2.192 --conf-file="" --pid-file=/var/run/br-ovs-dhcp.pid --dhcp-leasefile=/var/run/br-ovs-dhcp.leases --dhcp-no-override
$ chmod a+x $OVS_ROOT/start-nat.sh
Test the NAT network
Add a test network namespace:
$ ip netns add ovs-ns
Create a test veth pair:
$ ip link add ovs-veth-br type veth peer name ovs-veth-in
Attach one end of the veth pair to the br-ovs bridge:
$ ovs-vsctl add-port br-ovs ovs-veth-br
$ ip link set dev ovs-veth-br up
Move the other end of the veth pair into the namespace and configure its IP address:
$ ip link set ovs-veth-in netns ovs-ns$ ip netns exec ovs-ns ip link set dev ovs-veth-in up$ ip netns exec ovs-ns ip addr add 192.168.2.95/24 dev ovs-veth-in$ ip netns exec ovs-ns ifconfigovs-veth-in: flags=4163mtu 1500 inet 192.168.2.95 netmask 255.255.255.0 broadcast 0.0.0.0 inet6 fe80::25:e8ff:fe78:9b31 prefixlen 64 scopeid 0x20 ether 02:25:e8:78:9b:31 txqueuelen 1000 (Ethernet) RX packets 8 bytes 648 (648.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 8 bytes 648 (648.0 B) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Add a default route for the network inside the namespace:
$ ip netns exec ovs-ns route add default gw 192.168.2.99$ ip netns exec ovs-ns route -nKernel IP routing tableDestination Gateway Genmask Flags Metric Ref Use Iface0.0.0.0 192.168.2.99 0.0.0.0 UG 0 0 0 ovs-veth-in192.168.2.0 0.0.0.0 255.255.255.0 U 0 0 0 ovs-veth-in
Verify that the NAT network works:
$ ip netns exec ovs-ns ping -c 4 8.8.8.8PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.64 bytes from 8.8.8.8: icmp_seq=1 ttl=43 time=49.5 ms64 bytes from 8.8.8.8: icmp_seq=2 ttl=43 time=49.0 ms64 bytes from 8.8.8.8: icmp_seq=3 ttl=43 time=49.0 ms64 bytes from 8.8.8.8: icmp_seq=4 ttl=43 time=49.2 ms--- 8.8.8.8 ping statistics ---4 packets transmitted, 4 received, 0% packet loss, time 3006msrtt min/avg/max/mdev = 49.022/49.197/49.504/0.193 ms$ ip netns exec ovs-ns ping -c 4 www.163.comPING 163.xdwscache.ourglb0.com (58.221.28.167) 56(84) bytes of data.64 bytes from 58.221.28.167 (58.221.28.167): icmp_seq=1 ttl=47 time=15.1 ms64 bytes from 58.221.28.167 (58.221.28.167): icmp_seq=2 ttl=47 time=14.5 ms64 bytes from 58.221.28.167 (58.221.28.167): icmp_seq=3 ttl=47 time=14.6 ms64 bytes from 58.221.28.167 (58.221.28.167): icmp_seq=4 ttl=47 time=14.8 ms--- 163.xdwscache.ourglb0.com ping statistics ---4 packets transmitted, 4 received, 0% packet loss, time 3009msrtt min/avg/max/mdev = 14.561/14.810/15.183/0.252 ms
Verify that the DHCP service works:
$ ip netns exec ovs-ns dhclient$ tcpdump -n -i br-ovstcpdump: verbose output suppressed, use -v or -vv for full protocol decodelistening on br-ovs, link-type EN10MB (Ethernet), capture size 262144 bytes19:31:30.619593 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 02:25:e8:78:9b:31, length 30019:31:30.619817 IP 192.168.2.99.bootps > 192.168.2.158.bootpc: BOOTP/DHCP, Reply, length 30019:31:30.680620 ARP, Request who-has 192.168.2.158 (Broadcast) tell 0.0.0.0, length 2819:31:31.681806 ARP, Request who-has 192.168.2.158 (Broadcast) tell 0.0.0.0, length 2819:31:35.628085 ARP, Request who-has 192.168.2.158 tell 192.168.2.99, length 2819:31:35.628350 ARP, Reply 192.168.2.158 is-at 02:25:e8:78:9b:31, length 28$ ip netns exec ovs-ns ip addr1: lo:mtu 65536 qdisc noop state DOWN qlen 1 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:0024: ovs-veth-in@if25: mtu 1500 qdisc noqueue state UP qlen 1000 link/ether 02:25:e8:78:9b:31 brd ff:ff:ff:ff:ff:ff link-netnsid 0 inet 192.168.2.95/24 scope global ovs-veth-in valid_lft forever preferred_lft forever inet 192.168.2.158/24 brd 192.168.2.255 scope global secondary dynamic ovs-veth-in valid_lft 3377sec preferred_lft 3377sec inet6 fe80::25:e8ff:fe78:9b31/64 scope link valid_lft forever preferred_lft forever
Create the bridged network
The physical NICs used on the two nodes are connected to the same VLAN on the same switch.
Add the physical port enp26s0f2 to the br-ovs bridge on node A:
$ ovs-vsctl add-port br-ovs enp26s0f2$ ovs-vsctl show249d4a72-bf90-4250-aafd-23e12b2a0868 Bridge br-ovs Port "tap-test1" Interface "tap-test1" Port br-ovs Interface br-ovs type: internal Port ovs-veth-br Interface ovs-veth-br Port "enp26s0f2" Interface "enp26s0f2" Port "tap-test2" Interface "tap-test2"
On node B, create the br-ovs bridge, add the physical NIC enp26s0f2, and set the bridge IP to 192.168.2.98/24:
$ ovs-vsctl add-br br-ovs$ ovs-vsctl set bridge br-ovs stp_enable=false$ ovs-vsctl add-port br-ovs enp26s0f2$ ovs-vsctl showc50241cc-7046-4673-9f4c-c9a33ea3bb28 Bridge br-ovs Port "enp26s0f2" Interface "enp26s0f2" Port br-ovs Interface br-ovs type: internal$ ip link set dev br-ovs up$ ip addr add 192.168.2.98/24 dev br-ovs
Verify that the two nodes can communicate:
# Node A:
$ ping -c 4 192.168.2.98
PING 192.168.2.98 (192.168.2.98) 56(84) bytes of data.
64 bytes from 192.168.2.98: icmp_seq=1 ttl=64 time=0.545 ms
64 bytes from 192.168.2.98: icmp_seq=2 ttl=64 time=0.089 ms
64 bytes from 192.168.2.98: icmp_seq=3 ttl=64 time=0.081 ms
64 bytes from 192.168.2.98: icmp_seq=4 ttl=64 time=0.082 ms
--- 192.168.2.98 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3002ms
rtt min/avg/max/mdev = 0.081/0.199/0.545/0.199 ms
# Node B:
$ ping -c 4 192.168.2.99
PING 192.168.2.99 (192.168.2.99) 56(84) bytes of data.
64 bytes from 192.168.2.99: icmp_seq=1 ttl=64 time=0.113 ms
64 bytes from 192.168.2.99: icmp_seq=2 ttl=64 time=0.092 ms
64 bytes from 192.168.2.99: icmp_seq=3 ttl=64 time=0.085 ms
64 bytes from 192.168.2.99: icmp_seq=4 ttl=64 time=0.081 ms
--- 192.168.2.99 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.081/0.092/0.113/0.017 ms
Create the NIC attach/detach scripts
Create the interface up script:
$ vi $OVS_ROOT/ovs-ifup
#!/bin/sh
set -ue
switch='br-ovs'
/usr/sbin/ip link set dev $1 up
$OVS_DIR/bin/ovs-vsctl add-port ${switch} $1
Create the interface down script:
$ vi $OVS_ROOT/ovs-ifdown
#!/bin/sh
set -ue
switch='br-ovs'
/usr/sbin/ip link set dev $1 down
$OVS_DIR/bin/ovs-vsctl del-port ${switch} $1
Make the scripts executable:
$ chmod a+x $OVS_ROOT/ovs-ifup && chmod a+x $OVS_ROOT/ovs-ifdown
Run the tests
Initialize the environment:
$ . /opt/ovs/ovs-env-nodpdk.sh
Create test snapshots:
$ qemu-img create -f qcow2 -b $OVS_ROOT/images/CentOS-7-x86_64.qcow2 -o backing_fmt=qcow2 $OVS_ROOT/images/CentOS-7-x86_64_Snapshot1.qcow2 20G
$ qemu-img create -f qcow2 -b $OVS_ROOT/images/CentOS-7-x86_64.qcow2 -o backing_fmt=qcow2 $OVS_ROOT/images/CentOS-7-x86_64_Snapshot2.qcow2 20G
Start test VM 1:
$ /usr/libexec/qemu-kvm -smp 2 -m 1024 -serial stdio -cdrom $OVS_ROOT/images/centos7-init.iso -hda $OVS_ROOT/images/CentOS-7-x86_64_Snapshot1.qcow2 -net nic,model=virtio,macaddr=fa:16:3e:4d:58:6f -net tap,ifname=tap-test1,script=$OVS_ROOT/ovs-ifup,downscript=$OVS_ROOT/ovs-ifdown$ ip addr show eth02: eth0:mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether fa:16:3e:4d:58:6f brd ff:ff:ff:ff:ff:ff inet 192.168.2.139/24 brd 192.168.2.255 scope global dynamic eth0 valid_lft 3509sec preferred_lft 3509sec inet6 fe80::f816:3eff:fe4d:586f/64 scope link valid_lft forever preferred_lft forever
Start test VM 2:
$ /usr/libexec/qemu-kvm -smp 2 -m 1024 -serial stdio -cdrom $OVS_ROOT/images/centos7-init.iso -hda $OVS_ROOT/images/CentOS-7-x86_64_Snapshot2.qcow2 -net nic,model=virtio,macaddr=fa:16:3e:4d:58:7f -net tap,ifname=tap-test2,script=$OVS_ROOT/ovs-ifup,downscript=$OVS_ROOT/ovs-ifdown$ ip addr show eth02: eth0:mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether fa:16:3e:4d:58:7f brd ff:ff:ff:ff:ff:ff inet 192.168.2.155/24 brd 192.168.2.255 scope global dynamic eth0 valid_lft 2634sec preferred_lft 2634sec inet6 fe80::f816:3eff:fe4d:587f/64 scope link valid_lft forever preferred_lft forever
Install the test packages in the guests and check the NIC:
$ yum install -y vim pciutils iperf3$ lspci | grep Virtio00:03.0 Ethernet controller: Red Hat, Inc Virtio network device$ ethtool -i eth0driver: virtio_netversion: 1.0.0firmware-version: expansion-rom-version: bus-info: 0000:00:03.0supports-statistics: nosupports-test: nosupports-eeprom-access: nosupports-register-dump: nosupports-priv-flags: no
Test virtio NIC performance between the two VMs on the same node. CPU usage was about 180% (Guest1), 200% (Guest2), and 2.6% (ovs-vswitchd); a sketch of how such figures can be captured follows the iperf output below:
Guest1 $ iperf3 -s-----------------------------------------------------------Server listening on 5201-----------------------------------------------------------Accepted connection from 192.168.2.155, port 41600[ 5] local 192.168.2.139 port 5201 connected to 192.168.2.155 port 41602[ ID] Interval Transfer Bandwidth[ 5] 0.00-1.00 sec 401 MBytes 3.36 Gbits/sec [ 5] 1.00-2.00 sec 415 MBytes 3.48 Gbits/sec [ 5] 2.00-3.00 sec 419 MBytes 3.52 Gbits/sec [ 5] 3.00-4.00 sec 414 MBytes 3.47 Gbits/sec [ 5] 4.00-5.00 sec 320 MBytes 2.69 Gbits/sec [ 5] 5.00-6.00 sec 404 MBytes 3.39 Gbits/sec [ 5] 6.00-7.00 sec 408 MBytes 3.43 Gbits/sec [ 5] 7.00-8.00 sec 386 MBytes 3.24 Gbits/sec [ 5] 8.00-9.00 sec 373 MBytes 3.13 Gbits/sec [ 5] 9.00-10.00 sec 375 MBytes 3.15 Gbits/sec [ 5] 10.00-10.04 sec 15.3 MBytes 3.21 Gbits/sec - - - - - - - - - - - - - - - - - - - - - - - - -[ ID] Interval Transfer Bandwidth[ 5] 0.00-10.04 sec 0.00 Bytes 0.00 bits/sec sender[ 5] 0.00-10.04 sec 3.84 GBytes 3.28 Gbits/sec receiverGuest2 $ iperf3 -c 192.168.2.139Connecting to host 192.168.2.139, port 5201[ 4] local 192.168.2.155 port 41602 connected to 192.168.2.139 port 5201[ ID] Interval Transfer Bandwidth Retr Cwnd[ 4] 0.00-1.00 sec 419 MBytes 3.51 Gbits/sec 18 1.30 MBytes [ 4] 1.00-2.00 sec 415 MBytes 3.48 Gbits/sec 1 1.09 MBytes [ 4] 2.00-3.00 sec 420 MBytes 3.52 Gbits/sec 0 1.35 MBytes [ 4] 3.00-4.00 sec 410 MBytes 3.44 Gbits/sec 1 1.14 MBytes [ 4] 4.00-5.00 sec 324 MBytes 2.72 Gbits/sec 0 1.33 MBytes [ 4] 5.00-6.00 sec 404 MBytes 3.39 Gbits/sec 1 1.12 MBytes [ 4] 6.00-7.00 sec 409 MBytes 3.43 Gbits/sec 0 1.36 MBytes [ 4] 7.00-8.00 sec 385 MBytes 3.23 Gbits/sec 2 1.12 MBytes [ 4] 8.00-9.00 sec 372 MBytes 3.12 Gbits/sec 0 1.35 MBytes [ 4] 9.00-10.00 sec 376 MBytes 3.16 Gbits/sec 4 1.09 MBytes - - - - - - - - - - - - - - - - - - - - - - - - -[ ID] Interval Transfer Bandwidth Retr[ 4] 0.00-10.00 sec 3.84 GBytes 3.30 Gbits/sec 27 sender[ 4] 0.00-10.00 sec 3.84 GBytes 3.30 Gbits/sec receiveriperf Done.
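The CPU-usage figures quoted above were read while iperf3 was running. As a sketch of one way to capture them on the host (assuming the sysstat package provides pidstat; the process names are matched loosely and are the only assumption here):
# Sample the CPU usage of the qemu-kvm guests and ovs-vswitchd once per second for 10 seconds:
$ pidstat -u -p $(pgrep -d, -f 'qemu-kvm|ovs-vswitchd') 1 10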
Test performance across nodes, between a physical NIC on node B and a VM virtio NIC on node A. CPU usage was about 180% (qemu), 2.5% (iperf), and 2.6% (ovs-vswitchd):
Guest1 $ iperf3 -s-----------------------------------------------------------Server listening on 5201-----------------------------------------------------------Accepted connection from 192.168.2.98, port 42720[ 5] local 192.168.2.139 port 5201 connected to 192.168.2.98 port 42722[ ID] Interval Transfer Bandwidth[ 5] 0.00-1.00 sec 108 MBytes 907 Mbits/sec [ 5] 1.00-2.00 sec 112 MBytes 941 Mbits/sec [ 5] 2.00-3.00 sec 112 MBytes 941 Mbits/sec [ 5] 3.00-4.00 sec 112 MBytes 939 Mbits/sec [ 5] 4.00-5.00 sec 112 MBytes 941 Mbits/sec [ 5] 5.00-6.00 sec 112 MBytes 941 Mbits/sec [ 5] 6.00-7.00 sec 112 MBytes 941 Mbits/sec [ 5] 7.00-8.00 sec 112 MBytes 941 Mbits/sec [ 5] 8.00-9.00 sec 112 MBytes 941 Mbits/sec [ 5] 9.00-10.00 sec 112 MBytes 941 Mbits/sec [ 5] 10.00-10.04 sec 4.19 MBytes 942 Mbits/sec - - - - - - - - - - - - - - - - - - - - - - - - -[ ID] Interval Transfer Bandwidth[ 5] 0.00-10.04 sec 0.00 Bytes 0.00 bits/sec sender[ 5] 0.00-10.04 sec 1.10 GBytes 938 Mbits/sec receiverNode B $ iperf3 -c 192.168.2.139Connecting to host 192.168.2.139, port 5201[ 4] local 192.168.2.98 port 42722 connected to 192.168.2.139 port 5201[ ID] Interval Transfer Bandwidth Retr Cwnd[ 4] 0.00-1.00 sec 114 MBytes 958 Mbits/sec 2 370 KBytes [ 4] 1.00-2.00 sec 112 MBytes 939 Mbits/sec 0 373 KBytes [ 4] 2.00-3.00 sec 112 MBytes 940 Mbits/sec 0 379 KBytes [ 4] 3.00-4.00 sec 113 MBytes 944 Mbits/sec 0 379 KBytes [ 4] 4.00-5.00 sec 112 MBytes 937 Mbits/sec 0 382 KBytes [ 4] 5.00-6.00 sec 113 MBytes 946 Mbits/sec 0 383 KBytes [ 4] 6.00-7.00 sec 112 MBytes 937 Mbits/sec 0 385 KBytes [ 4] 7.00-8.00 sec 112 MBytes 942 Mbits/sec 0 386 KBytes [ 4] 8.00-9.00 sec 112 MBytes 940 Mbits/sec 0 427 KBytes [ 4] 9.00-10.00 sec 113 MBytes 952 Mbits/sec 0 553 KBytes - - - - - - - - - - - - - - - - - - - - - - - - -[ ID] Interval Transfer Bandwidth Retr[ 4] 0.00-10.00 sec 1.10 GBytes 944 Mbits/sec 2 sender[ 4] 0.00-10.00 sec 1.10 GBytes 941 Mbits/sec receiver
Testing With DPDK
Build and install DPDK
Install the dependency packages:
$ yum install -y gcc numactl numactl-libs numactl-devel kernel kernel-debug kernel-debug-devel kernel-devel kernel-doc kernel-headers libpcap-devel
Set the environment variables:
$ export OVS_ROOT=/opt/ovs
$ export DPDK_DIR=$OVS_ROOT/dpdk
$ export DPDK_BUILD=$DPDK_DIR/build
$ export DPDK_INSTALL=$DPDK_DIR/install
$ export DPDK_TARGET=x86_64-native-linuxapp-gcc
Get the source code:
$ cd $OVS_ROOT
$ git clone http://dpdk.org/git/dpdk
$ cd dpdk && git checkout -b v18.02/origin v18.02 && git checkout -b v18.02/devel
Configure and build:
$ make config T=$DPDK_TARGET
$ sed -ri 's,(PMD_PCAP=).*,\1y,' build/.config
$ make -j16 && make install DESTDIR=$DPDK_INSTALL
$ make -j16 -C examples RTE_SDK=$DPDK_DIR RTE_TARGET=build O=$DPDK_INSTALL/examples
Configure hugepages
Effective for the current boot only:
$ mkdir -pv /mnt/huge
$ mount -t hugetlbfs nodev /mnt/huge -o pagesize=2MB
$ echo 8192 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
# Or
$ sysctl -w vm.nr_hugepages=8192
Persistent across reboots:
$ vi /etc/fstab
nodev /mnt/huge hugetlbfs pagesize=2MB 0 0
$ echo 'vm.nr_hugepages=8192' > /etc/sysctl.d/hugepages.conf
# Or
$ echo 'vm.nr_hugepages=8192' >> /etc/sysctl.conf
$ sysctl -p
Verify that the configuration took effect:
$ grep HugePages /proc/meminfoAnonHugePages: 77824 kBHugePages_Total: 8192HugePages_Free: 6656HugePages_Rsvd: 0HugePages_Surp: 0
Bind the NIC driver
Check the current NIC driver status:
$ /opt/ovs/dpdk/usertools/dpdk-devbind.py --statusNetwork devices using DPDK-compatible driver============================================Network devices using kernel driver===================================0000:1a:00.0 'Ethernet Connection X722 for 1GbE 37d1' if=enp26s0f0 drv=i40e unused=vfio-pci 0000:1a:00.1 'Ethernet Connection X722 for 1GbE 37d1' if=enp26s0f1 drv=i40e unused=vfio-pci 0000:1a:00.2 'Ethernet Connection X722 for 1GbE 37d1' if=enp26s0f2 drv=i40e unused=vfio-pci 0000:1a:00.3 'Ethernet Connection X722 for 1GbE 37d1' if=enp26s0f3 drv=i40e unused=vfio-pci *Active*0000:5f:00.0 'Ethernet Controller X710 for 10GbE SFP+ 1572' if=enp95s0f0 drv=i40e unused=vfio-pci *Active*0000:5f:00.1 'Ethernet Controller X710 for 10GbE SFP+ 1572' if=enp95s0f1 drv=i40e unused=vfio-pci$ lspci | grep Ethernet | awk '{printf($1 " "); gsub(":","\\:"); cmd="ls /sys/bus/pci/devices/0000\\:" $1 "/driver/module/drivers/"; system(cmd)}'1a:00.0 pci:i40e1a:00.1 pci:i40e1a:00.2 pci:i40e1a:00.3 pci:i40e5f:00.0 pci:i40e5f:00.1 pci:i40e
Load the UIO driver modules:
$ modprobe uio
$ rmmod igb_uio
$ insmod $DPDK_INSTALL/lib/modules/`uname -r`/extra/dpdk/igb_uio.ko
Bind the enp26s0f2 NIC to the UIO driver:
$ $DPDK_DIR/usertools/dpdk-devbind.py -u 0000:1a:00.2
$ $DPDK_DIR/usertools/dpdk-devbind.py --bind=igb_uio 0000:1a:00.2
Check the binding status:
$ $DPDK_DIR/usertools/dpdk-devbind.py --statusNetwork devices using DPDK-compatible driver============================================0000:1a:00.2 'Ethernet Connection X722 for 1GbE 37d1' drv=igb_uio unused=i40eNetwork devices using kernel driver===================================0000:1a:00.0 'Ethernet Connection X722 for 1GbE 37d1' if=enp26s0f0 drv=i40e unused=igb_uio 0000:1a:00.1 'Ethernet Connection X722 for 1GbE 37d1' if=enp26s0f1 drv=i40e unused=igb_uio 0000:1a:00.3 'Ethernet Connection X722 for 1GbE 37d1' if=enp26s0f3 drv=i40e unused=igb_uio *Active*0000:5f:00.0 'Ethernet Controller X710 for 10GbE SFP+ 1572' if=enp95s0f0 drv=i40e unused=igb_uio *Active*0000:5f:00.1 'Ethernet Controller X710 for 10GbE SFP+ 1572' if=enp95s0f1 drv=i40e unused=igb_uio
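Because IOMMU support was already enabled earlier, the in-kernel vfio-pci driver can be used instead of igb_uio. A minimal sketch of the equivalent binding (same PCI address as above), not used in the measurements that follow:
$ modprobe vfio-pci
$ $DPDK_DIR/usertools/dpdk-devbind.py -u 0000:1a:00.2
$ $DPDK_DIR/usertools/dpdk-devbind.py --bind=vfio-pci 0000:1a:00.2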
Build and install OVS with DPDK
Create an environment-variable script:
$ vi $OVS_ROOT/ovs-env-dpdk.sh
export OVS_ROOT=/opt/ovs
export DPDK_DIR=$OVS_ROOT/dpdk
export DPDK_BUILD=$DPDK_DIR/build
export DPDK_INSTALL=$DPDK_DIR/install
export DPDK_TARGET=x86_64-native-linuxapp-gcc
export OVS_DIR=$OVS_ROOT/target-dpdk
export PATH=$OVS_DIR/bin/:$OVS_DIR/share/openvswitch/scripts:$PATH
Build and install:
$ . /opt/ovs/ovs-env-dpdk.sh
$ cd $OVS_ROOT/ovs && ./boot.sh
$ mkdir -pv $OVS_ROOT/build-dpdk $OVS_ROOT/target-dpdk && cd $OVS_ROOT/build-dpdk
$ ../ovs/configure --with-dpdk=$DPDK_BUILD --with-linux=/lib/modules/$(uname -r)/build --prefix=$OVS_ROOT/target-dpdk CFLAGS="-g -Ofast"
$ make -j16 'CFLAGS=-g -Ofast -march=native' && make install
$ mkdir -pv $OVS_ROOT/target-dpdk/modules && cp -vf $OVS_ROOT/build-dpdk/datapath/linux/*ko $OVS_ROOT/target-dpdk/modules/
Start and stop OVS
Start manually, step by step:
$ . /opt/ovs/ovs-env-dpdk.sh && cd $OVS_DIR
$ mkdir -pv $OVS_DIR/var/run/openvswitch $OVS_DIR/etc/openvswitch
# Create the ovsdb-server database:
$ $OVS_DIR/bin/ovsdb-tool create $OVS_DIR/etc/openvswitch/conf.db $OVS_DIR/share/openvswitch/vswitch.ovsschema
# Start the ovsdb-server database service:
$ $OVS_DIR/sbin/ovsdb-server --remote=punix:$OVS_DIR/var/run/openvswitch/db.sock --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach
# Initialize the database (only needed on the first run):
$ $OVS_DIR/bin/ovs-vsctl --no-wait init
# List the kernel modules that openvswitch.ko depends on:
$ cat /usr/lib/modules/`uname -r`/modules.dep | awk '$1~"^extra/openvswitch"{for(i=2;i<=NF;i++) {print $i}}' | xargs echo
kernel/net/ipv6/netfilter/nf_nat_ipv6.ko.xz kernel/net/ipv4/netfilter/nf_nat_ipv4.ko.xz kernel/net/ipv6/netfilter/nf_defrag_ipv6.ko.xz kernel/net/netfilter/nf_nat.ko.xz kernel/net/netfilter/nf_conntrack.ko.xz kernel/net/ipv4/udp_tunnel.ko.xz kernel/lib/libcrc32c.ko.xz
# Load the kernel modules that openvswitch.ko depends on:
$ cat /usr/lib/modules/`uname -r`/modules.dep | awk '$1~"^extra/openvswitch"{for(i=2;i<=NF;i++) {mod=gensub(/\.ko\.xz/,"",1,gensub(/.*\//,"",1,$i));cmd="modprobe " mod;system(cmd)}}'
# Load the openvswitch.ko kernel module:
$ insmod $OVS_DIR/modules/openvswitch.ko
# Start the ovs-vswitchd daemon:
$ $OVS_DIR/sbin/ovs-vswitchd --pidfile --detach --log-file
# Enable DPDK support in ovs-vswitchd:
$ ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
$ ovs-ctl --no-ovsdb-server --db-sock="$OVS_DIR/var/run/openvswitch/db.sock" start
Stop manually, step by step:
$ pkill -9 ovs
$ rm -rfv $OVS_DIR/var/run/openvswitch $OVS_DIR/etc/openvswitch/ $OVS_DIR/etc/openvswitch/conf.db
Create the NAT network
On node A the NAT network again uses subnet 192.168.2.0/24 with gateway 192.168.2.99.
Add an OVS bridge br-ovs (using the userspace netdev datapath):
$ ovs-vsctl add-br br-ovs -- set bridge br-ovs datapath_type=netdev$ ip link set dev br-ovs up$ ip addr add 192.168.2.99/24 dev br-ovs$ ip addr show br-ovs21: br-ovs:mtu 1500 qdisc noqueue state UNKNOWN qlen 1000 link/ether a2:e8:a6:bb:55:46 brd ff:ff:ff:ff:ff:ff inet 192.168.2.99/24 scope global br-ovs valid_lft forever preferred_lft forever inet6 fe80::a0e8:a6ff:febb:5546/64 scope link valid_lft forever preferred_lft forever
Add the NAT address translation rule:
$ iptables -t nat -A POSTROUTING -s 192.168.2.0/24 ! -d 192.168.2.0/24 -j MASQUERADE
Configure the DHCP service:
$ dnsmasq --strict-order --except-interface=lo --interface=br-ovs --listen-address=192.168.2.99 --bind-interfaces --dhcp-range=192.168.2.128,192.168.2.192 --conf-file="" --pid-file=/var/run/br-ovs-dhcp.pid --dhcp-leasefile=/var/run/br-ovs-dhcp.leases --dhcp-no-override
Create the bridged network
The physical NICs used on the two nodes are connected to the same VLAN on the same switch.
Add the physical port enp26s0f2 to the br-ovs bridge on node A as DPDK port dpdk0:
$ ovs-vsctl add-port br-ovs dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:1a:00.2$ ovs-vsctl show429a3e72-c5c5-4330-9670-09492255e7e9 Bridge br-ovs Port "dpdk0" Interface "dpdk0" type: dpdk options: {dpdk-devargs="0000:1a:00.2"} Port br-ovs Interface br-ovs type: internal
On node B, create the br-ovs bridge, add the physical NIC enp26s0f2 as DPDK port dpdk0, and set the bridge IP to 192.168.2.98/24:
$ ovs-vsctl add-br br-ovs -- set bridge br-ovs datapath_type=netdev$ ovs-vsctl set bridge br-ovs stp_enable=false$ ovs-vsctl add-port br-ovs dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:1a:00.2$ ovs-vsctl show653d6824-6977-4806-8717-fc9e27d5ea8d Bridge br-ovs Port "dpdk0" Interface "dpdk0" type: dpdk options: {dpdk-devargs="0000:1a:00.2"} Port br-ovs Interface br-ovs type: internal$ ip link set dev br-ovs up$ ip addr add 192.168.2.98/24 dev br-ovs$ ping -c 4 192.168.2.99PING 192.168.2.99 (192.168.2.99) 56(84) bytes of data.64 bytes from 192.168.2.99: icmp_seq=1 ttl=64 time=0.198 ms64 bytes from 192.168.2.99: icmp_seq=2 ttl=64 time=0.160 ms64 bytes from 192.168.2.99: icmp_seq=3 ttl=64 time=0.123 ms64 bytes from 192.168.2.99: icmp_seq=4 ttl=64 time=0.119 ms--- 192.168.2.99 ping statistics ---4 packets transmitted, 4 received, 0% packet loss, time 2999msrtt min/avg/max/mdev = 0.119/0.150/0.198/0.032 ms
Run the tests
Initialize the environment:
$ . /opt/ovs/ovs-env-dpdk.sh
Using server-mode vhost-user ports
Add dpdkvhostuser ports:
$ ovs-vsctl add-port br-ovs vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser$ ovs-vsctl add-port br-ovs vhost-user2 -- set Interface vhost-user2 type=dpdkvhostuser$ ovs-vsctl show429a3e72-c5c5-4330-9670-09492255e7e9 Bridge br-ovs Port "vhost-user2" Interface "vhost-user2" type: dpdkvhostuser Port "dpdk0" Interface "dpdk0" type: dpdk options: {dpdk-devargs="0000:1a:00.2"} Port br-ovs Interface br-ovs type: internal Port "vhost-user1" Interface "vhost-user1" type: dpdkvhostuser
Start test VM 1:
$ /usr/libexec/qemu-kvm -smp 2 -m 2048 -serial stdio -cdrom $OVS_ROOT/images/centos7-init.iso -hda $OVS_ROOT/images/CentOS-7-x86_64_Snapshot1.qcow2 -net none -chardev socket,id=char1,path=$OVS_DIR/var/run/openvswitch/vhost-user1 -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce -device virtio-net-pci,mac=fa:16:3e:4d:58:6f,netdev=mynet1 -object memory-backend-file,id=mem,size=2048M,mem-path=/mnt/huge,share=on -numa node,memdev=mem -mem-prealloc$ ip addr show eth02: eth0:mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether fa:16:3e:4d:58:6f brd ff:ff:ff:ff:ff:ff inet 192.168.2.139/24 brd 192.168.2.255 scope global dynamic eth0 valid_lft 3540sec preferred_lft 3540sec inet6 fe80::f816:3eff:fe4d:586f/64 scope link valid_lft forever preferred_lft forever
Start test VM 2:
$ /usr/libexec/qemu-kvm -smp 2 -m 2048 -serial stdio -cdrom $OVS_ROOT/images/centos7-init.iso -hda $OVS_ROOT/images/CentOS-7-x86_64_Snapshot2.qcow2 -net none -chardev socket,id=char1,path=$OVS_DIR/var/run/openvswitch/vhost-user2 -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce -device virtio-net-pci,mac=fa:16:3e:4d:58:7f,netdev=mynet1 -object memory-backend-file,id=mem,size=2048M,mem-path=/mnt/huge,share=on -numa node,memdev=mem -mem-prealloc$ ip addr show eth02: eth0:mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether fa:16:3e:4d:58:7f brd ff:ff:ff:ff:ff:ff inet 192.168.2.155/24 brd 192.168.2.255 scope global dynamic eth0 valid_lft 3589sec preferred_lft 3589sec inet6 fe80::f816:3eff:fe4d:587f/64 scope link valid_lft forever preferred_lft forever
Using client-mode vhost-user ports
Add dpdkvhostuserclient ports:
$ ovs-vsctl add-port br-ovs vhost-user-client1 -- set Interface vhost-user-client1 type=dpdkvhostuserclient options:vhost-server-path=$OVS_DIR/var/run/openvswitch/vhost-user-client1$ ovs-vsctl add-port br-ovs vhost-user-client2 -- set Interface vhost-user-client2 type=dpdkvhostuserclient options:vhost-server-path=$OVS_DIR/var/run/openvswitch/vhost-user-client2$ ovs-vsctl show429a3e72-c5c5-4330-9670-09492255e7e9 Bridge br-ovs Port "dpdk0" Interface "dpdk0" type: dpdk options: {dpdk-devargs="0000:1a:00.2"} Port "vhost-user2" Interface "vhost-user2" type: dpdkvhostuser Port "vhost-user-client2" Interface "vhost-user-client2" type: dpdkvhostuserclient options: {vhost-server-path="/opt/ovs/target-dpdk/var/run/openvswitch/vhost-user-client2"} Port ovs-veth-br Interface ovs-veth-br Port br-ovs Interface br-ovs type: internal Port "vhost-user-client1" Interface "vhost-user-client1" type: dpdkvhostuserclient options: {vhost-server-path="/opt/ovs/target-dpdk/var/run/openvswitch/vhost-user-client1"} Port "vhost-user1" Interface "vhost-user1" type: dpdkvhostuser
Start test VM 1:
$ /usr/libexec/qemu-kvm -smp 2 -m 2048 -serial stdio -cdrom $OVS_ROOT/images/centos7-init.iso -hda $OVS_ROOT/images/CentOS-7-x86_64_Snapshot1.qcow2 -net none -chardev socket,id=char1,path=$OVS_DIR/var/run/openvswitch/vhost-user-client1,server -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce -device virtio-net-pci,mac=fa:16:3e:4d:58:6f,netdev=mynet1 -object memory-backend-file,id=mem,size=2048M,mem-path=/mnt/huge,share=on -numa node,memdev=mem -mem-prealloc$ ip addr show eth02: eth0:mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether fa:16:3e:4d:58:6f brd ff:ff:ff:ff:ff:ff inet 192.168.2.139/24 brd 192.168.2.255 scope global dynamic eth0 valid_lft 3540sec preferred_lft 3540sec inet6 fe80::f816:3eff:fe4d:586f/64 scope link valid_lft forever preferred_lft forever
Start test VM 2:
$ /usr/libexec/qemu-kvm -smp 2 -m 2048 -serial stdio -cdrom $OVS_ROOT/images/centos7-init.iso -hda $OVS_ROOT/images/CentOS-7-x86_64_Snapshot2.qcow2 -net none -chardev socket,id=char1,path=$OVS_DIR/var/run/openvswitch/vhost-user-client2,server -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce -device virtio-net-pci,mac=fa:16:3e:4d:58:7f,netdev=mynet1 -object memory-backend-file,id=mem,size=2048M,mem-path=/mnt/huge,share=on -numa node,memdev=mem -mem-prealloc$ ip addr show eth02: eth0:mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether fa:16:3e:4d:58:7f brd ff:ff:ff:ff:ff:ff inet 192.168.2.155/24 brd 192.168.2.255 scope global dynamic eth0 valid_lft 3589sec preferred_lft 3589sec inet6 fe80::f816:3eff:fe4d:587f/64 scope link valid_lft forever preferred_lft forever
Test performance with iperf
Test virtio NIC performance between the two VMs on the same node. CPU usage was about 100% (Guest1), 100% (Guest2), and 100% (ovs-vswitchd):
Guest1 $ iperf3 -s-----------------------------------------------------------Server listening on 5201-----------------------------------------------------------Accepted connection from 192.168.2.155, port 54110[ 5] local 192.168.2.139 port 5201 connected to 192.168.2.155 port 54112[ ID] Interval Transfer Bandwidth[ 5] 0.00-1.00 sec 514 MBytes 4.31 Gbits/sec [ 5] 1.00-2.00 sec 392 MBytes 3.29 Gbits/sec [ 5] 2.00-3.00 sec 781 MBytes 6.55 Gbits/sec [ 5] 3.00-4.00 sec 734 MBytes 6.16 Gbits/sec [ 5] 4.00-5.00 sec 569 MBytes 4.77 Gbits/sec [ 5] 5.00-6.00 sec 968 MBytes 8.12 Gbits/sec [ 5] 6.00-7.00 sec 688 MBytes 5.77 Gbits/sec [ 5] 7.00-8.00 sec 795 MBytes 6.67 Gbits/sec [ 5] 8.00-9.00 sec 693 MBytes 5.82 Gbits/sec [ 5] 9.00-10.00 sec 769 MBytes 6.45 Gbits/sec [ 5] 10.00-10.04 sec 37.6 MBytes 8.22 Gbits/sec - - - - - - - - - - - - - - - - - - - - - - - - -[ ID] Interval Transfer Bandwidth[ 5] 0.00-10.04 sec 0.00 Bytes 0.00 bits/sec sender[ 5] 0.00-10.04 sec 6.78 GBytes 5.80 Gbits/sec receiver-----------------------------------------------------------Server listening on 5201-----------------------------------------------------------Guest2 $ iperf3 -c 192.168.2.139Connecting to host 192.168.2.139, port 5201[ 4] local 192.168.2.155 port 54112 connected to 192.168.2.139 port 5201[ ID] Interval Transfer Bandwidth Retr Cwnd[ 4] 0.00-1.00 sec 529 MBytes 4.43 Gbits/sec 20 460 KBytes [ 4] 1.00-2.00 sec 403 MBytes 3.38 Gbits/sec 29 327 KBytes [ 4] 2.00-3.00 sec 794 MBytes 6.67 Gbits/sec 732 264 KBytes [ 4] 3.00-4.00 sec 728 MBytes 6.10 Gbits/sec 228 505 KBytes [ 4] 4.00-5.00 sec 576 MBytes 4.83 Gbits/sec 230 225 KBytes [ 4] 5.00-6.00 sec 930 MBytes 7.81 Gbits/sec 451 308 KBytes [ 4] 6.00-7.00 sec 721 MBytes 6.04 Gbits/sec 171 503 KBytes [ 4] 7.00-8.00 sec 796 MBytes 6.67 Gbits/sec 159 417 KBytes [ 4] 8.00-9.00 sec 698 MBytes 5.87 Gbits/sec 322 419 KBytes [ 4] 9.00-10.00 sec 768 MBytes 6.44 Gbits/sec 273 329 KBytes - - - - - - - - - - - - - - - - - - - - - - - - -[ ID] Interval Transfer Bandwidth Retr[ 4] 0.00-10.00 sec 6.78 GBytes 5.82 Gbits/sec 2615 sender[ 4] 0.00-10.00 sec 6.78 GBytes 5.82 Gbits/sec receiveriperf Done.
Test performance across nodes, between a physical NIC on node B and a VM virtio NIC on node A. CPU usage was about 100% (qemu), 1.7% (iperf), and 100% (ovs-vswitchd):
Guest1 $ iperf3 -s-----------------------------------------------------------Server listening on 5201-----------------------------------------------------------Accepted connection from 192.168.2.98, port 50942[ 5] local 192.168.2.139 port 5201 connected to 192.168.2.98 port 50944[ ID] Interval Transfer Bandwidth[ 5] 0.00-1.00 sec 62.5 MBytes 524 Mbits/sec [ 5] 1.00-2.00 sec 67.5 MBytes 566 Mbits/sec [ 5] 2.00-3.00 sec 63.0 MBytes 529 Mbits/sec [ 5] 3.00-4.00 sec 61.8 MBytes 519 Mbits/sec [ 5] 4.00-5.00 sec 61.8 MBytes 518 Mbits/sec [ 5] 5.00-6.00 sec 62.0 MBytes 520 Mbits/sec [ 5] 6.00-7.00 sec 61.9 MBytes 519 Mbits/sec [ 5] 7.00-8.00 sec 62.2 MBytes 522 Mbits/sec [ 5] 8.00-9.00 sec 63.0 MBytes 528 Mbits/sec [ 5] 9.00-10.00 sec 65.6 MBytes 550 Mbits/sec [ 5] 10.00-10.05 sec 3.62 MBytes 568 Mbits/sec - - - - - - - - - - - - - - - - - - - - - - - - -[ ID] Interval Transfer Bandwidth[ 5] 0.00-10.05 sec 0.00 Bytes 0.00 bits/sec sender[ 5] 0.00-10.05 sec 635 MBytes 530 Mbits/sec receiverNode B $ iperf3 -c 192.168.2.139Connecting to host 192.168.2.139, port 5201[ 4] local 192.168.2.98 port 50944 connected to 192.168.2.139 port 5201[ ID] Interval Transfer Bandwidth Retr Cwnd[ 4] 0.00-1.00 sec 67.7 MBytes 568 Mbits/sec 45 757 KBytes [ 4] 1.00-2.00 sec 67.5 MBytes 566 Mbits/sec 0 822 KBytes [ 4] 2.00-3.00 sec 63.8 MBytes 535 Mbits/sec 0 880 KBytes [ 4] 3.00-4.00 sec 61.2 MBytes 514 Mbits/sec 0 932 KBytes [ 4] 4.00-5.00 sec 62.5 MBytes 524 Mbits/sec 0 981 KBytes [ 4] 5.00-6.00 sec 61.2 MBytes 514 Mbits/sec 0 1.01 MBytes [ 4] 6.00-7.00 sec 62.5 MBytes 524 Mbits/sec 0 1.05 MBytes [ 4] 7.00-8.00 sec 61.2 MBytes 514 Mbits/sec 0 1.09 MBytes [ 4] 8.00-9.00 sec 62.5 MBytes 524 Mbits/sec 0 1.14 MBytes [ 4] 9.00-10.00 sec 66.2 MBytes 556 Mbits/sec 0 1.18 MBytes - - - - - - - - - - - - - - - - - - - - - - - - -[ ID] Interval Transfer Bandwidth Retr[ 4] 0.00-10.00 sec 636 MBytes 534 Mbits/sec 45 sender[ 4] 0.00-10.00 sec 635 MBytes 533 Mbits/sec receiveriperf Done.
Using the default iperf3 parameters to test between two VMs on the same node, and between a VM and the physical machine on the other node (over a gigabit NIC), shows that at these relatively low rates DPDK cuts the guest-side CPU usage by roughly 80%, while driving the ovs-vswitchd daemon up to 100% of a core, because the DPDK datapath runs in polling mode.
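Because the DPDK datapath polls its ports, a PMD thread spins on a core even when idle. As a sketch (standard OVS-DPDK knobs; the core-mask value here is only an example), the PMD load can be inspected and the polling threads pinned to dedicated cores:
# Show per-PMD-thread cycle statistics:
$ ovs-appctl dpif-netdev/pmd-stats-show
# Pin the PMD threads to cores 1 and 2 (CPU mask 0x6):
$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6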