Win10 + GTX1050Ti + tensorflow-gpu 1.8 + CUDA 9.0 + cuDNN 7.4.1 Environment Setup

Preface

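The CPU build of TensorFlow is painfully slow. A while ago I spent several days tinkering and fell into plenty of pitfalls before finally getting the GPU build installed, and everything now runs dramatically faster. I leaned on many blog posts along the way, so here is a write-up of how I set up the environment.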

I. Machine configuration

Win10
i5-8300H
GTX1050Ti

II. Preparation

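Anaconda3 (add its path to the environment variables during installation)
Anaconda official site
Tsinghua mirror
VS2015 Community Edition
Download the free Community Edition from the official site
CUDA 9.0 (check the official site for the version your graphics card supports)
All CUDA versions
cuDNN
Download the cuDNN release that matches your CUDA version
(If you have no other source, the official site makes you register and log in, which is a hassle, and downloads from the overseas site are very slow.)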

III. Installation

1. Anaconda3 (64-bit)

First install the CPU-only TensorFlow:
Instructions for the CPU version, for reference
Then install the GPU version:
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple --upgrade tensorflow-gpu==1.8
A quick test that TensorFlow can be imported:
```
import tensorflow as tf
```
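A bare import only shows that the package is present. The sketch below goes one step further and reports the version and whether TensorFlow can see a GPU. It uses `tf.test.gpu_device_name()`, which returns an empty string when no GPU is visible; the helper name `describe_tensorflow` is made up for this example. Until CUDA and cuDNN from the later steps are in place, importing tensorflow-gpu may fail with a DLL load error, so it is worth rerunning this after finishing the section.

```python
def describe_tensorflow():
    """Return a one-line summary of the TensorFlow install (hypothetical helper)."""
    try:
        import tensorflow as tf
    except ImportError as exc:
        # A missing package and missing CUDA/cuDNN DLLs both surface as ImportError.
        return "tensorflow could not be imported: %s" % exc
    gpu_name = tf.test.gpu_device_name()  # empty string when no GPU is visible
    return "tensorflow %s, GPU: %s" % (tf.__version__, gpu_name or "none visible")


print(describe_tensorflow())
```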

2. Install VS2015

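In the installer, select only 2015 Update 3 and the C++ libraries. (I ticked other options as well, which made the installation very slow; I had to cancel and repair it several times before it finally completed.)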

3. Install cuda_9.0.176_win10.exe

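Make sure VS2015 is installed before installing CUDA 9.0.
Double-click the downloaded installer and choose an extraction directory (this is only a temporary unpacking directory and can be changed).
The installer runs a compatibility check. It reported my machine as incompatible; I ignored the warning.
Note: choose [Custom] during installation and uncheck the [Visual Studio Integration] option.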
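Do not click Next yet. First copy the CUDAVisualStudioIntegration folder from the temporary extraction directory to another location (for example the desktop), because the folder disappears as soon as the NVIDIA installer is closed.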
[Screenshot: the files contained in the CUDAVisualStudioIntegration folder]
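Click Next. This time the installation completes; the summary simply shows that the unchecked item was not installed.
Now copy all files under "CUDAVisualStudioIntegration\extras\visual_studio_integration\MSBuildExtensions" into the "C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\BuildCustomizations" folder, and the Visual Studio integration is in place.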

4. Install the CUDA 9.0 patches, in order

cuda_9.0.176.1_windows.exe, cuda_9.0.176.2_windows.exe, cuda_9.0.176.3_windows.exe

5. Extract cudnn-9.0-windows10-x64-v7.4.1.5.zip

Copy the files in bin, include and lib into the bin, include and lib folders of the CUDA installation path, .\NVIDIA GPU Computing Toolkit\CUDA\v9.0\.
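If you would rather script the copy than drag files around, here is a minimal sketch. Both folder locations are assumptions (including the `cuda` subfolder inside the extracted archive), and `merge_tree` is a helper written for this example. Writing into Program Files normally needs an elevated prompt.

```python
import os
import shutil


def merge_tree(src_root, dst_root):
    """Copy every file under src_root to the same relative place under dst_root."""
    copied = []
    for folder, _subdirs, files in os.walk(src_root):
        rel = os.path.relpath(folder, src_root)
        target = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target, exist_ok=True)
        for name in files:
            shutil.copy2(os.path.join(folder, name), os.path.join(target, name))
            copied.append(os.path.normpath(os.path.join(rel, name)))
    return sorted(copied)


# Assumed locations; adjust both to match your machine.
CUDNN_DIR = r"C:\Downloads\cudnn-9.0-windows10-x64-v7.4.1.5\cuda"
CUDA_DIR = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0"

# Only act when both folders exist, so the script is harmless elsewhere.
if os.path.isdir(CUDNN_DIR) and os.path.isdir(CUDA_DIR):
    for sub in ("bin", "include", "lib"):
        print(merge_tree(os.path.join(CUDNN_DIR, sub), os.path.join(CUDA_DIR, sub)))
```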
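Check that the environment variables contain the following paths; if any are missing, add them according to your CUDA installation path: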
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\include

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\extras\CUPTI\libx64
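Eyeballing a long PATH is error-prone, so a small check like the following can help. It is only a sketch: `missing_from_path` is a name invented here, and the list assumes the default installation path shown above.

```python
import ntpath
import os

# The four directories listed above (default install location assumed).
REQUIRED_DIRS = [
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin",
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\include",
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64",
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\extras\CUPTI\libx64",
]


def missing_from_path(required, path_value):
    """Return the entries of `required` that do not appear in a Windows PATH string."""
    def norm(p):
        # Ignore case, surrounding quotes and trailing backslashes when comparing.
        return ntpath.normcase(ntpath.normpath(p.strip().strip('"')))

    present = {norm(p) for p in path_value.split(";") if p.strip()}
    return [d for d in required if norm(d) not in present]


for d in missing_from_path(REQUIRED_DIRS, os.environ.get("PATH", "")):
    print("not on PATH:", d)
```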

IV. Verify the CUDA installation

1. Open a command prompt and run: nvcc -V

The CUDA version number is displayed.
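The same check can be done from Python, which is handy in a setup script. This sketch assumes the usual `nvcc -V` output, whose last line looks like `Cuda compilation tools, release 9.0, V9.0.176`; the function names are my own.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract the toolkit release (for example "9.0") from `nvcc -V` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run `nvcc -V` if nvcc is on PATH and return its release string, else None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "-V"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return parse_nvcc_release(result.stdout)


print("CUDA release:", installed_cuda_release())
```

For this guide the expected value is 9.0; `None` means nvcc was not found on PATH or its output could not be parsed.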

2. Build the sample projects with VS2015

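Open C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0 and find the sample solution for your VS version, in this case Samples_vs2015.sln; double-click to open it.
Select Release and x64.
Right-click 1_Utilities and click Build.
A successful build ends with a summary line in the output pane: 5 succeeded…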
At this point the deviceQuery and bandwidthTest executables we need appear in the "C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0\bin\win64\Release" folder.

3. Run deviceQuery and bandwidthTest

Open cmd, change to the C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0\bin\win64\Release directory, and run deviceQuery and bandwidthTest in turn. Output similar to the following means CUDA was installed successfully.
[Screenshot: deviceQuery and bandwidthTest output]
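Both samples normally finish with a line reading `Result = PASS`, so the check can be automated. The sketch below assumes that marker and the default samples directory; `sample_passed` and `run_sample` are names invented for this example.

```python
import os
import subprocess

# Default output folder of the sample build (assumed; adjust if yours differs).
SAMPLES_BIN = r"C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0\bin\win64\Release"


def sample_passed(output):
    """True when a CUDA sample's output contains its success marker."""
    return "Result = PASS" in output


def run_sample(name, bin_dir=SAMPLES_BIN):
    """Run <name>.exe from bin_dir; return True/False, or None if it is not built."""
    exe = os.path.join(bin_dir, name + ".exe")
    if not os.path.isfile(exe):
        return None
    result = subprocess.run([exe], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return sample_passed(result.stdout)


for sample in ("deviceQuery", "bandwidthTest"):
    print(sample, "->", run_sample(sample))
```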

V. A simple test

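With that, the environment is set up!!! >_<
For some reason Spyder does not work on my machine, so I ran the code in PyCharm.
Before running, set the interpreter to the python inside Anaconda's tensorflow environment.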
I found a test case online:

from datetime import datetime
import math
import time
import tensorflow as tf
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
batch_size = 32
num_batches = 100
# Print each layer's name and the shape of its output tensor

def print_activations(t):
    print(t.op.name, ' ', t.get_shape().as_list())

# `with tf.name_scope('conv1') as scope` names the variables created inside it conv1/xxx, which makes components easy to tell apart

def inference(images):
    parameters = []
    # First convolutional layer
    with tf.name_scope('conv1') as scope:
        # Convolution kernel, initialised from a truncated normal distribution
        kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 64],
                                                 dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(images, kernel, [1, 4, 4, 1], padding='SAME')
        # Trainable bias
        biases = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(bias, name=scope)
        print_activations(conv1)
        parameters += [kernel, biases]
        # Add LRN and max pooling. Apart from AlexNet, LRN has largely been abandoned: its benefit is reportedly small and it slows things down.
        lrn1 = tf.nn.lrn(conv1, 4, bias=1.0, alpha=0.001 / 9, beta=0.75, name='lrn1')
        pool1 = tf.nn.max_pool(lrn1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', name='pool1')
        print_activations(pool1)
    # Second convolutional layer; only some parameters differ
    with tf.name_scope('conv2') as scope:
        kernel = tf.Variable(tf.truncated_normal([5, 5, 64, 192], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool1, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[192], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv2)
        # LRN and max pooling again
        lrn2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9, beta=0.75, name='lrn2')
        pool2 = tf.nn.max_pool(lrn2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', name='pool2')
        print_activations(pool2)
    # Third convolutional layer
    with tf.name_scope('conv3') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 192, 384], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool2, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[384], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv3 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv3)
    # Fourth convolutional layer
    with tf.name_scope('conv4') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 256], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv4 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv4)
    # Fifth convolutional layer
    with tf.name_scope('conv5') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 256], dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv5 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv5)
        # Followed by a max-pooling layer
        pool5 = tf.nn.max_pool(conv5, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', name='pool5')
        print_activations(pool5)
        return pool5, parameters
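# (The fully connected layers are not included.)
# Measure the time per step. Arguments: the TensorFlow Session, the op to run, and a label for the report.
# The first few steps are affected by GPU memory loading and cache warm-up, so only steps after the 10th are counted.
def time_tensorflow_run(session, target, info_string):
    num_steps_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0
    # Run num_batches + num_steps_burn_in iterations,
    # timing each with time.time(); reporting starts once the warm-up is over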
    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target)
        duration = time.time() - start_time
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s:step %d, duration = %.3f' % (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration
    # After the loop: mean time per step and its standard deviation (sd)
    mn = total_duration / num_batches
    vr = total_duration_squared / num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' % (datetime.now(), info_string, num_batches, mn, sd))
def run_benchmark():
    # Start by defining the default Graph
    with tf.Graph().as_default():
        # No ImageNet training here; random images are used only to measure the time taken
        image_size = 224
        images = tf.Variable(tf.random_normal([batch_size, image_size, image_size, 3], dtype=tf.float32, stddev=1e-1))
        pool5, parameters = inference(images)
        init = tf.global_variables_initializer()
        sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=False))
        sess.run(init)
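        # Time the forward pass on pool5 directly (there are no fully connected layers).
        # This is only a timing run, not real training.
        time_tensorflow_run(sess, pool5, "Forward")
        # A made-up objective, just so that there are gradients to compute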
        objective = tf.nn.l2_loss(pool5)
        grad = tf.gradients(objective, parameters)
        time_tensorflow_run(sess, grad, "Forward-backward")
run_benchmark()
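The timing summary relies on the identity Var(d) = E[d^2] - (E[d])^2 and is meant to be computed once, after all steps have run. Here is a TensorFlow-free sketch of the same bookkeeping, so the arithmetic can be checked on its own; `summarize` and `time_calls` are names invented for this example.

```python
import math
import time


def summarize(durations):
    """Mean and standard deviation computed from a sum and a sum of squares."""
    n = len(durations)
    total = sum(durations)
    total_squared = sum(d * d for d in durations)
    mean = total / n
    variance = total_squared / n - mean * mean
    # Clamp at zero: rounding can push a tiny variance slightly negative.
    return mean, math.sqrt(max(variance, 0.0))


def time_calls(fn, num_batches=100, burn_in=10):
    """Call fn repeatedly, discard the warm-up calls, and summarize the rest."""
    durations = []
    for i in range(num_batches + burn_in):
        start = time.time()
        fn()
        elapsed = time.time() - start
        if i >= burn_in:
            durations.append(elapsed)
    return summarize(durations)


mean, sd = time_calls(lambda: sum(range(10000)), num_batches=20, burn_in=5)
print("%.6f +/- %.6f sec / call" % (mean, sd))
```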

Result when running on the CPU:
[Screenshot: timing output of the CPU run]
Comment out these two lines to use the GPU:

#os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
#os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
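Instead of commenting lines in and out, the switch can be wrapped in a small function. This is only a sketch: `select_device` is a hypothetical helper, and it has to run before TensorFlow creates its first session, since the variable is read when CUDA is initialised.

```python
import os


def select_device(use_gpu, env=os.environ):
    """Hide or expose GPUs through CUDA_VISIBLE_DEVICES (hypothetical helper)."""
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    if use_gpu:
        env.pop("CUDA_VISIBLE_DEVICES", None)  # unset: every GPU stays visible
    else:
        env["CUDA_VISIBLE_DEVICES"] = "-1"     # -1 hides every GPU
    return env


select_device(use_gpu=True)
# import tensorflow as tf   # import and build the graph only after this call
```

Calling `select_device(use_gpu=False)` reproduces the CPU-only run from the listing above.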

Result when running on the GPU:
[Screenshot: timing output of the GPU run]
As you can see, the difference in speed is night and day!!
