TensorFlow memory problem

Question:

I want to build a Gaussian RBM model with TensorFlow, but the program uses too much memory.

gaussian_rbm.py

import tensorflow as tf
import math
import input_data
import numpy as np


def sample_prob(probs):
    return tf.nn.relu(
        tf.sign(
            probs - tf.random_uniform(tf.shape(probs))))


class RBM(object):
    """ represents a sigmoidal rbm """

    def __init__(self, name, input_size, output_size, gaussian_std_val=0.1):
        with tf.name_scope("rbm_" + name):
            self.weights = tf.Variable(
                tf.truncated_normal([input_size, output_size],
                                    stddev=1.0/math.sqrt(float(input_size))), name="weights")
            self.v_bias = tf.Variable(tf.zeros([input_size]), name="v_bias")
            self.h_bias = tf.Variable(tf.zeros([output_size]), name="h_bias")
            self.input = tf.placeholder("float", shape=[None, 784])

            # Gaussian
            def_a = 1/(np.sqrt(2)*gaussian_std_val)
            def_a = tf.constant(def_a, dtype=tf.float32)
            self.a = tf.Variable(tf.ones(shape=[input_size]) * def_a,
                                 name="a")

    def propup(self, visible):
        """ P(h|v) """
        return tf.nn.sigmoid(tf.matmul(visible, self.weights) + self.h_bias)

    def propdown(self, hidden):
        """ P(v|h) """
        # return tf.nn.sigmoid(tf.matmul(hidden, tf.transpose(self.weights)) + self.v_bias)
        return (tf.matmul(hidden, tf.transpose(self.weights)) + self.v_bias)/(2 * (self.a * self.a))

    def sample_h_given_v(self, v_sample):
        """ Generate a sample from the hidden layer """
        return sample_prob(self.propup(v_sample))

    def sample_v_given_h(self, h_sample):
        """ Generate a sample from the visible layer """
        return self.sample_gaussian(self.propdown(h_sample))

    def gibbs_hvh(self, h0_sample):
        """ A gibbs step starting from the hidden layer """
        v_sample = self.sample_v_given_h(h0_sample)
        h_sample = self.sample_h_given_v(v_sample)
        return [v_sample, h_sample]

    def gibbs_vhv(self, v0_sample):
        """ A gibbs step starting from the visible layer """
        h_sample = self.sample_h_given_v(v0_sample)
        v_sample = self.sample_v_given_h(h_sample)
        return [h_sample, v_sample]

    def sample_gaussian(self, mean_field):
        return tf.random_normal(shape=tf.shape(mean_field),
                                mean=mean_field,
                                stddev=1.0/(np.sqrt(2) * self.a))

    def cd1(self, learning_rate=0.1):
        " One step of contrastive divergence, with Rao-Blackwellization "
        h_start = self.sample_h_given_v(self.input)
        v_end = self.sample_v_given_h(h_start)
        h_end = self.sample_h_given_v(v_end)
        w_positive_grad = tf.matmul(tf.transpose(self.input), h_start)
        w_negative_grad = tf.matmul(tf.transpose(v_end), h_end)

        update_w = self.weights + (learning_rate * (w_positive_grad - w_negative_grad)/tf.to_float(tf.shape(self.input)[0]))

        update_vb = self.v_bias + (learning_rate * tf.reduce_mean(self.input - v_end, 0))

        update_hb = self.h_bias + (learning_rate * tf.reduce_mean(h_start - h_end, 0))

        return [update_w, update_vb, update_hb]

    def cal_err(self):
        err = self.input - self.gibbs_vhv(self.input)[1]
        return tf.reduce_mean(err * err)

test_mnist.py

import tensorflow as tf
import input_data
from gaussian_RBM import RBM

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
trX, trY, teX, teY = mnist.train.images, mnist.train.labels, mnist.test.images, mnist.test.labels

rbm_modle = RBM(name="gaussian_rbm", input_size=784, output_size=1000)

sess = tf.Session()
init_op = tf.initialize_all_variables()
sess.run(init_op)

for i in range(100):
    print "step: %s" % i
    for start, end in zip(range(0, len(trX), 128), range(128, len(trX), 128)):

        rbm_modle.weights, rbm_modle.v_bias, rbm_modle.h_bias = \
            sess.run(rbm_modle.cd1(), feed_dict={rbm_modle.input: trX[start:end]})

        if start % 1280 == 0:
            print sess.run(rbm_modle.cal_err(), feed_dict={rbm_modle.input: teX})

The output is:

run test_mnist.py
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 560
major: 2 minor: 1 memoryClockRate (GHz) 1.62
pciBusID 0000:01:00.0
Total memory: 1018.69MiB
Free memory: 916.73MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:684] Ignoring gpu device (device: 0, name: GeForce GTX 560, pci bus id: 0000:01:00.0) with Cuda compute capability 2.1. The minimum required Cuda capability is.
step: 0
0.0911714
0.0781856
0.0773076
0.0770751
0.0776582
0.0764748
0.0755164
0.0741131
0.0726497
0.0712237
0.0701839
0.0686315
0.0664856
0.0658309
0.0646239
0.0626652
0.0616178
0.0610061
0.0598332
0.0588843
0.0587477
0.0572056
0.0561556
0.0554848
Killed

Is there a way to monitor the memory? Can someone help me?

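Answer

You can monitor GPU memory with the nvidia-smi command.

It looks like your GPU does not meet the CUDA requirement that TensorFlow needs to run: the log reports compute capability 2.1 and says the device is ignored. You can check the CUDA-Enabled GeForce Products list.

From your output it looks like TensorFlow is smart enough not to use the GPU, so either your model/batch size is too large for your RAM, or you have a memory leak.

Try running the session with log_device_placement=True so you can see what TensorFlow is doing step by step, while running top to monitor memory: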

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
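Besides watching top from another terminal, the process can report its own peak memory. The following is a minimal sketch using only Python's standard resource module, so it assumes a Unix-like system; the helper name peak_rss_mib is made up for illustration and is not part of TensorFlow.

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    if sys.platform == "darwin":
        return peak / (1024.0 * 1024.0)
    return peak / 1024.0


before = peak_rss_mib()
data = [bytearray(1024 * 1024) for _ in range(50)]  # allocate about 50 MiB
after = peak_rss_mib()
print("peak RSS: %.1f MiB -> %.1f MiB" % (before, after))
```

Printing this value next to the reconstruction error in the training loop would show whether memory keeps climbing from step to step, which is what a leak looks like.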

Thanks for your help. I tried running this code with CPU-only TensorFlow, but my program still dies partway through. – MilkKnight

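Answer

The answer above seems correct (the compute capability is too low to run the latest versions of CUDA/TensorFlow).

However, the minimum requirement seems to be compute capability 3.0, since my GTX 770M is able to run TensorFlow 1.0 with CUDA 8.0 (see below).

You could also try recompiling TensorFlow from source and including a 2.0 target in the build (the suggested default is 3.5-5.5).

Have a nice day!!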

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.51                 Driver Version: 375.51                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 770M    Off  | 0000:01:00.0     N/A |                  N/A |
|100%   48C    P0    N/A /  N/A |   2819MiB /  3017MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+

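Answer

There may be a problem in the training loop that causes the machine to run out of memory.

On every iteration of your loop you call: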

sess.run(rbm_modle.cd1(), feed_dict={rbm_modle.input : trX[start : end]})

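Inside rbm_modle.cd1() you create several new operations, such as tf.matmul(). Every call to rbm_modle.cd1() therefore adds new operations to the graph, so memory use grows after each iteration.

You should define all operations before the loop, and then create no new operations while calling sess.run().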
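The effect can be seen without TensorFlow at all. The sketch below is a toy model, not TensorFlow code: ToyGraph, build_update, train_leaky and train_fixed are hypothetical names, and the "graph" only records op names. It assumes nothing beyond the behaviour described above, namely that each call to a builder function such as cd1() appends ops to a graph that is never pruned.

```python
class ToyGraph(object):
    """Hypothetical stand-in for a dataflow graph: it only records op names."""

    def __init__(self):
        self.ops = []

    def add_op(self, name):
        self.ops.append(name)
        return name


def build_update(graph):
    """Plays the role of cd1(): every call adds three more ops to the graph."""
    return [graph.add_op("matmul"), graph.add_op("sub"), graph.add_op("add")]


def train_leaky(steps):
    graph = ToyGraph()
    for _ in range(steps):
        build_update(graph)       # ops are rebuilt on every iteration
    return len(graph.ops)


def train_fixed(steps):
    graph = ToyGraph()
    update = build_update(graph)  # ops are built once, before the loop
    for _ in range(steps):
        _ = update                # the loop only reuses the existing ops
    return len(graph.ops)


leaky_ops = train_leaky(100)
fixed_ops = train_fixed(100)
print(leaky_ops, fixed_ops)
```

In the leaky version the op count grows linearly with the number of steps, while in the fixed version it stays at the three ops created before the loop. The same reasoning applies to calling rbm_modle.cal_err() inside the loop.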

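Answer

Make sure there are no memory leaks by making your graph read-only before training:

tf.get_default_graph().finalize()

After that, TensorFlow throws an exception every time you try to add a new node.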
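A hedged sketch of how this fits into a training loop follows. It is not the code from the question: the one-matrix model is a made-up stand-in for the RBM, and it assumes a TensorFlow installation recent enough to ship tensorflow.compat.v1, which is newer than the version shown in the question's log. All ops, including the in-graph variable update and the initializer, are built before finalize() is called.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: a TensorFlow that provides compat.v1

tf.disable_v2_behavior()  # graph mode with sessions, as in the question

# Hypothetical stand-in for the RBM: one weight matrix and one update rule.
x = tf.placeholder(tf.float32, shape=[None, 4])
w = tf.Variable(tf.zeros([4, 3]), name="weights")

# Build every op once, before the loop. tf.assign_add writes the new value
# back into the variable inside the graph instead of in Python.
hidden = tf.nn.sigmoid(tf.matmul(x, w))
delta = tf.matmul(tf.transpose(x), hidden)
batch_size = tf.cast(tf.shape(x)[0], tf.float32)
train_op = tf.assign_add(w, 0.1 * delta / batch_size)
init_op = tf.global_variables_initializer()

graph = tf.get_default_graph()
graph.finalize()  # from here on, adding a node raises RuntimeError
ops_before = len(graph.get_operations())

batch = np.ones((2, 4), dtype=np.float32)
with tf.Session() as sess:
    sess.run(init_op)
    for _ in range(5):
        sess.run(train_op, feed_dict={x: batch})

ops_after = len(graph.get_operations())

try:
    tf.matmul(x, w)  # building a new op now fails loudly
    graph_was_modified = True
except RuntimeError:
    graph_was_modified = False
```

With the graph finalized, a stray call that builds ops inside the loop fails on the first iteration instead of slowly exhausting memory.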
