千家信息网

Deploying a GPU Environment on Ubuntu 16.04

Published: 2024-11-30  Author: 千家信息网 editor

References

https://blog.csdn.net/nwpushuai/article/details/79935740
https://blog.csdn.net/qq_43030766/article/details/91513501
https://blog.csdn.net/zhqh200/article/details/77646497
https://www.cnblogs.com/zixuan-L/p/11023051.html
https://blog.csdn.net/huangfei711/article/details/79230446
https://www.cnblogs.com/yjlch2016/p/8641910.html

Hardware environment

CPU      I7-7700, 8M, 3.6GHz, 4 cores
Memory   DDR4 16G
Disk     SSD 500G
OS       Ubuntu 16.04 Desktop (the graphical interface is needed)
GPU      NVIDIA GeForce GTX 1050 Ti 4G

System environment

1. Dual-NIC bonding

root@mec03:~# cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
bonding mode=0 miimon=100

root@mec03:/etc/network# cat /etc/network/interfaces
auto bond0
iface bond0 inet static
address 172.30.10.249
netmask 255.255.255.0
gateway 172.30.10.254
post-up ifenslave bond0 enp2s0 enp3s0
pre-down ifenslave -d bond0 enp2s0 enp3s0

To load the module at boot, put this command in rc.local:

root@mec03:/etc/network# modprobe bonding

Disable NetworkManager, because it conflicts with bonding:

root@mec03:/etc/network# systemctl disable network-manager.service
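
Once the bond is up, the kernel reports its state in /proc/net/bonding/bond0. The following is a minimal sketch of reading that report, assuming the usual "Key: value" layout of the file. The function name parse_bond_status and the sample text are illustrative and do not come from the original machine.

```python
def parse_bond_status(text):
    """Return (bonding mode, {slave interface: MII status}) from bond status text."""
    mode = None
    slaves = {}
    current = None  # slave section currently being read, if any
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "Key: value"
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
            slaves[current] = None
        elif key == "MII Status" and current is not None:
            # Only MII Status lines inside a slave section belong to a slave.
            slaves[current] = value
    return mode, slaves


# Hypothetical sample in the shape of /proc/net/bonding/bond0 for mode=0.
SAMPLE = """\
Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100

Slave Interface: enp2s0
MII Status: up

Slave Interface: enp3s0
MII Status: up
"""

mode, slaves = parse_bond_status(SAMPLE)
print(mode)
print(slaves)
```

On a real host the text would come from open("/proc/net/bonding/bond0").read(). Both slaves should report "up" if the two cables are connected.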

2. Configure the apt sources

root@mec03:~# cat /etc/apt/sources.list
deb http://mirrors.163.com/ubuntu/ xenial main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ xenial-security main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ xenial-updates main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ xenial-proposed main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ xenial-backports main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ xenial main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ xenial-security main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ xenial-updates main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ xenial-proposed main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ xenial-backports main restricted universe multiverse
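
The ten entries follow one pattern: two entry types (deb, deb-src) times five suites, all with the same components. As a rough sketch, the file can be generated rather than typed by hand. The function name sources_list is hypothetical, and the mirror URL and release name are simply the ones used in this article.

```python
def sources_list(mirror, release, components):
    """Build sources.list text: deb and deb-src entries for each suite."""
    suites = [release] + [
        "{}-{}".format(release, pocket)
        for pocket in ("security", "updates", "proposed", "backports")
    ]
    comps = " ".join(components)
    lines = []
    for kind in ("deb", "deb-src"):
        for suite in suites:
            lines.append("{} {} {} {}".format(kind, mirror, suite, comps))
    return "\n".join(lines) + "\n"


text = sources_list(
    "http://mirrors.163.com/ubuntu/",
    "xenial",
    ["main", "restricted", "universe", "multiverse"],
)
print(text, end="")
```

Writing the result to /etc/apt/sources.list and running apt update is left to the operator. The sketch only produces the text.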

3. Default language settings

root@mec03:~# cat /etc/default/locale
#  File generated by update-locale
# LANG="zh_CN.UTF-8"
# LANGUAGE="zh_CN:zh"
LANG="en_US.UTF-8"
LANGUAGE="en_US:en"

II. Install the NVIDIA GTX 1050 Ti driver

1. Disable the default open-source NVIDIA driver (nouveau) that ships with the system

root@mec03:~# lsmod | grep nouveau
nouveau              1724416  1
mxm_wmi                16384  1 nouveau
wmi                    24576  2 mxm_wmi,nouveau
i2c_algo_bit           16384  1 nouveau
ttm                   106496  1 nouveau
drm_kms_helper        172032  1 nouveau
drm                   401408  4 drm_kms_helper,ttm,nouveau
video                  45056  1 nouveau

2. Blacklist the modules

root@mec03:~# vim /etc/modprobe.d/blacklist.conf

Add the following lines at the end of the file:

blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist rivatv
blacklist nvidiafb

3. Regenerate the initramfs

root@mec03:~#  update-initramfs -u
update-initramfs: Generating /boot/initrd.img-4.15.0-45-generic

4. Reboot

root@mec03:~#  reboot
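
After the reboot, none of the blacklisted modules should appear in lsmod any more. A minimal sketch of that check follows, working on lsmod text. The helper names loaded_modules and still_loaded are hypothetical, and the BEFORE sample is abridged from the pre-reboot output shown in step 1.

```python
BLACKLIST = ["vga16fb", "nouveau", "rivafb", "rivatv", "nvidiafb"]


def loaded_modules(lsmod_text):
    """Return the set of module names listed in `lsmod` output."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        if fields and fields[0] != "Module":  # skip the header row
            names.add(fields[0])
    return names


def still_loaded(lsmod_text, blacklist=BLACKLIST):
    """Return the blacklisted modules that are still loaded."""
    loaded = loaded_modules(lsmod_text)
    return [name for name in blacklist if name in loaded]


# Abridged lsmod output from before the blacklist took effect.
BEFORE = """\
Module                  Size  Used by
nouveau              1724416  1
mxm_wmi                16384  1 nouveau
ttm                   106496  1 nouveau
drm                   401408  4 drm_kms_helper,ttm,nouveau
"""

print(still_loaded(BEFORE))                       # ['nouveau']
print(still_loaded("Module  Size  Used by\n"))    # []
```

An empty list after the reboot suggests the blacklist and the regenerated initramfs took effect.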

5. Upload the cudnn_cuda.zip package

root@mec03:~#  rz
root@mec03:~# ls
cudnn_cuda  cudnn_cuda.zip
root@mec03:~# cd cudnn_cuda/
root@mec03:~/cudnn_cuda# ls
cuda_10.0.130.1_linux.run                libcudnn7-dev_7.6.3.30-1+cuda10.0_amd64.deb
cuda_10.0.130_410.48_linux.run           libcudnn7-doc_7.6.3.30-1+cuda10.0_amd64.deb
libcudnn7_7.6.3.30-1+cuda10.0_amd64.deb  NVIDIA-Linux-x86_64-435.21.run

6. Install the driver

root@mec03:~/cudnn_cuda# systemctl stop lightdm.service
root@mec03:~/cudnn_cuda# sh NVIDIA-Linux-x86_64-435.21.run
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 435.21..........
root@mec03:~/cudnn_cuda# lsmod | grep nvi
nvidia_drm             45056  0
nvidia_modeset       1118208  1 nvidia_drm
nvidia              19472384  1 nvidia_modeset
drm_kms_helper        172032  1 nvidia_drm
drm                   401408  3 drm_kms_helper,nvidia_drm
ipmi_msghandler        53248  2 ipmi_devintf,nvidia

III. Install CUDA 10.0

root@mec03:~/cudnn_cuda# sh cuda_10.0.130_410.48_linux.run
Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
(y)es/(n)o/(q)uit: n

Install the CUDA 10.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
 [ default is /usr/local/cuda-10.0 ]: 

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 10.0 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
 [ default is /root ]: 

Installing the CUDA Toolkit in /usr/local/cuda-10.0 ...
Installing the CUDA Toolkit in /usr/local/cuda-10.0 ...
Missing recommended library: libGLU.so
Missing recommended library: libX11.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so

Installing the CUDA Samples in /root ...
Copying samples to /root/NVIDIA_CUDA-10.0_Samples now...
Finished copying samples.

===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-10.0
Samples:  Installed in /root, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-10.0/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-10.0/lib64, or, add /usr/local/cuda-10.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-10.0/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.0/doc/pdf for detailed information on setting up CUDA.

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 10.0 functionality to work.
To install the driver using this installer, run the following command, replacing  with the name of this run file:
    sudo .run -silent -driver

Logfile is /tmp/cuda_install_9752.log

root@mec03:~/cudnn_cuda# vim /etc/ld.so.conf
root@mec03:~/cudnn_cuda# ldconfig
root@mec03:~# cat /etc/profile
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda
root@mec03:~# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
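
The nvcc --version output is the quickest confirmation that the toolkit on PATH is the one just installed. The following is a small sketch that pulls the release number out of that text. The function name cuda_release is hypothetical, and the sample is the output shown in this article.

```python
import re


def cuda_release(nvcc_output):
    """Return (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


NVCC_OUTPUT = """\
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
"""

print(cuda_release(NVCC_OUTPUT))  # (10, 0)
```

On a live system the text could be obtained with subprocess.run(["nvcc", "--version"], ...). If nvcc is not found at all, the PATH export in /etc/profile has probably not been loaded into the current shell.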

IV. Install cuDNN 7.6

root@mec03:~/cudnn_cuda# dpkg -i libcudnn7_7.6.3.30-1+cuda10.0_amd64.deb
Selecting previously unselected package libcudnn7.
(Reading database ... 184057 files and directories currently installed.)
Preparing to unpack libcudnn7_7.6.3.30-1+cuda10.0_amd64.deb ...
Unpacking libcudnn7 (7.6.3.30-1+cuda10.0) ...
Setting up libcudnn7 (7.6.3.30-1+cuda10.0) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...
root@mec03:~/cudnn_cuda# dpkg -i libcudnn7-dev_7.6.3.30-1+cuda10.0_amd64.deb
Selecting previously unselected package libcudnn7-dev.
(Reading database ... 184063 files and directories currently installed.)
Preparing to unpack libcudnn7-dev_7.6.3.30-1+cuda10.0_amd64.deb ...
Unpacking libcudnn7-dev (7.6.3.30-1+cuda10.0) ...
Setting up libcudnn7-dev (7.6.3.30-1+cuda10.0) ...
update-alternatives: using /usr/include/x86_64-linux-gnu/cudnn_v7.h to provide /usr/include/cudnn.h (libcudnn) in auto mode
root@mec03:~/cudnn_cuda# dpkg -i libcudnn7-doc_7.6.3.30-1+cuda10.0_amd64.deb
Selecting previously unselected package libcudnn7-doc.
(Reading database ... 184069 files and directories currently installed.)
Preparing to unpack libcudnn7-doc_7.6.3.30-1+cuda10.0_amd64.deb ...
Unpacking libcudnn7-doc (7.6.3.30-1+cuda10.0) ...
Setting up libcudnn7-doc (7.6.3.30-1+cuda10.0) ...
root@mec03:~/cudnn_cuda#  cp /usr/include/cudnn.h /usr/local/cuda/include
root@mec03:~/cudnn_cuda# cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 3
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"
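
The grep output carries the three version components and the formula that combines them into CUDNN_VERSION. A short sketch of the same calculation follows. The function names are hypothetical, and the header excerpt is the one printed in this article.

```python
import re


def cudnn_version(header_text):
    """Return (major, minor, patchlevel) read from cudnn.h text."""
    parts = []
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        pattern = r"^#define CUDNN_" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            raise ValueError("CUDNN_" + name + " not found")
        parts.append(int(match.group(1)))
    return tuple(parts)


def cudnn_version_number(major, minor, patchlevel):
    """Mirror the CUDNN_VERSION macro shown in the header."""
    return major * 1000 + minor * 100 + patchlevel


HEADER_EXCERPT = """\
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 3
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""

version = cudnn_version(HEADER_EXCERPT)
print(version)                         # (7, 6, 3)
print(cudnn_version_number(*version))  # 7603
```

The tuple (7, 6, 3) matches the 7.6.3.30 packages installed with dpkg.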

V. Test the GPU

1. Install Python 3.6

root@mec03:~#  add-apt-repository ppa:jonathonf/python-3.6
 A plain backport of *just* Python 3.6. System extensions/Python libraries may or may not work.

Don't remove Python 3.5 from your system - it will break.
 More info: https://launchpad.net/~jonathonf/+archive/ubuntu/python-3.6
Press [ENTER] to continue or ctrl-c to cancel adding it

gpg: keyring `/tmp/tmpec5st1dk/secring.gpg' created
gpg: keyring `/tmp/tmpec5st1dk/pubring.gpg' created
gpg: requesting key F06FC659 from hkp server keyserver.ubuntu.com
gpg: /tmp/tmpec5st1dk/trustdb.gpg: trustdb created
gpg: key F06FC659: public key "Launchpad PPA for J Fernyhough" imported
gpg: Total number processed: 1
gpg:               imported: 1  (RSA: 1)
OK
root@mec03:~#  update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.5 1
update-alternatives: using /usr/bin/python3.5 to provide /usr/bin/python3 (python3) in auto mode
root@mec03:~# update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.6 2
update-alternatives: using /usr/bin/python3.6 to provide /usr/bin/python3 (python3) in auto mode
root@mec03:~# update-alternatives --install /usr/bin/python python /usr/bin/python2 100
update-alternatives: using /usr/bin/python2 to provide /usr/bin/python (python) in auto mode
root@mec03:~# update-alternatives --install /usr/bin/python python /usr/bin/python3 150
update-alternatives: using /usr/bin/python3 to provide /usr/bin/python (python) in auto mode
root@mec03:~# python3
Python 3.6.8 (default, May  7 2019, 14:58:50) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
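
In auto mode, update-alternatives points the link at the candidate with the highest priority, which is why python3 ends up as 3.6 and python as python3 above. A toy model of that selection rule follows. The function auto_choice is illustrative only and is not part of any tool.

```python
def auto_choice(alternatives):
    """Pick the path with the highest priority, as auto mode does.

    `alternatives` is a list of (path, priority) pairs.
    """
    path, _priority = max(alternatives, key=lambda item: item[1])
    return path


# Priorities as registered in the commands above.
python3_group = [("/usr/bin/python3.5", 1), ("/usr/bin/python3.6", 2)]
python_group = [("/usr/bin/python2", 100), ("/usr/bin/python3", 150)]

print(auto_choice(python3_group))  # /usr/bin/python3.6
print(auto_choice(python_group))   # /usr/bin/python3
```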

2. Install pip3

root@mec03:~# apt install python3-pip

3. Install TensorFlow

root@mec03:~# pip3 install tensorflow-gpu==1.13.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
Collecting tensorflow-gpu==1.13.1

4. Test the GPU

Python test statements:

import numpy
import tensorflow as tf
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))
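
The expected result of this test can be worked out without a GPU: a is the 2x3 matrix [[1, 2, 3], [4, 5, 6]] and b is the 3x2 matrix [[1, 2], [3, 4], [5, 6]]. The plain-Python sketch below does the multiplication by hand. The helper matmul here is a stand-in for illustration and is unrelated to tf.matmul's implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # shape [2, 3]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # shape [3, 2]

print(matmul(a, b))  # [[22.0, 28.0], [49.0, 64.0]]
```

If TensorFlow prints the same four numbers, the computation itself is correct. Where it ran is a separate question, answered by the device placement log.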

root@mec03:~# python3
Python 3.6.8 (default, May  7 2019, 14:58:50) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> import tensorflow as tf
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
>>> a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
>>> b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
>>> c = tf.matmul(a, b)
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2019-09-14 12:27:18.309361: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-09-14 12:27:18.360212: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-14 12:27:18.360498: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3bb3a20 executing computations on platform CUDA. Devices:
2019-09-14 12:27:18.360512: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce GTX 1050 Ti, Compute Capability 6.1
2019-09-14 12:27:18.379184: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2019-09-14 12:27:18.380446: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3ccb2f0 executing computations on platform Host. Devices:
2019-09-14 12:27:18.380503: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): , 
2019-09-14 12:27:18.380792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
pciBusID: 0000:01:00.0
totalMemory: 3.94GiB freeMemory: 3.66GiB
2019-09-14 12:27:18.380852: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-09-14 12:27:18.382037: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-14 12:27:18.382075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-09-14 12:27:18.382090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-09-14 12:27:18.382242: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3452 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
2019-09-14 12:27:18.384493: I tensorflow/core/common_runtime/direct_session.cc:317] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1

>>> print(sess.run(c))
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2019-09-14 12:27:20.118473: I tensorflow/core/common_runtime/placer.cc:1059] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-09-14 12:27:20.118492: I tensorflow/core/common_runtime/placer.cc:1059] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-09-14 12:27:20.118502: I tensorflow/core/common_runtime/placer.cc:1059] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
[[22. 28.]
 [49. 64.]]
>>> 
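
With log_device_placement=True, each op is reported on a line of the form "name: (OpType): device". What matters for this test is that MatMul, a and b all landed on /device:GPU:0. The sketch below extracts those placements from captured log text. The names op_devices and all_on_gpu are hypothetical, and the regular expression is fitted only to the lines printed in this session.

```python
import re

# Matches lines such as "MatMul: (MatMul): /job:localhost/.../device:GPU:0".
PLACEMENT = re.compile(r"^(?P<name>\S+): \((?P<op>\w+)\): (?P<device>/job:\S+)$")


def op_devices(log_text):
    """Return {op name: device string} for every placement line in the log."""
    devices = {}
    for line in log_text.splitlines():
        match = PLACEMENT.match(line.strip())
        if match:
            devices[match.group("name")] = match.group("device")
    return devices


def all_on_gpu(devices):
    """True when at least one op was found and every op sits on a GPU device."""
    return bool(devices) and all("/device:GPU:" in d for d in devices.values())


LOG = """\
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2019-09-14 12:27:20.118473: I tensorflow/core/common_runtime/placer.cc:1059] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
"""

placements = op_devices(LOG)
print(placements)
print(all_on_gpu(placements))  # True
```

The timestamped placer.cc lines repeat the same information and are skipped by the pattern.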

5. Check GPU usage

root@mec03:~# nvidia-smi
Fri Sep  6 00:22:27 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0  On |                  N/A |
| 31%   62C    P0    N/A /  80W |   3955MiB /  4038MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      9558      C   python3                                     3865MiB |
|    0     12510      G   /usr/lib/xorg/Xorg                            39MiB |
|    0     12608      G   gnome-shell                                   38MiB |
+-----------------------------------------------------------------------------+
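
The figures of interest in this table are the memory usage (3955MiB of 4038MiB) and the GPU utilization (97%). A rough sketch of pulling them out of the table text follows. The function gpu_usage is hypothetical, and the regular expression is an assumption fitted to the table layout shown in this article, so other driver versions may need adjustments.

```python
import re

# "<used>MiB / <total>MiB | <util>%" as it appears in the per-GPU row.
USAGE = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB\s*\|\s*(\d+)%")


def gpu_usage(smi_text):
    """Return one dict per GPU row with memory use (MiB) and utilization (%)."""
    rows = []
    for match in USAGE.finditer(smi_text):
        used, total, util = (int(group) for group in match.groups())
        rows.append({"used_mib": used, "total_mib": total, "util_pct": util})
    return rows


SMI_ROWS = """\
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0  On |                  N/A |
| 31%   62C    P0    N/A /  80W |   3955MiB /  4038MiB |     97%      Default |
|    0      9558      C   python3                                     3865MiB |
"""

print(gpu_usage(SMI_ROWS))
# [{'used_mib': 3955, 'total_mib': 4038, 'util_pct': 97}]
```

The process row with 3865MiB is ignored because it has no "used / total" pair.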