TensorFlow error: restoring a checkpoint file

Problem description:

I built my own convolutional neural network, in which I track the moving averages of all trainable variables (TensorFlow 1.0):

import os
import time
from datetime import datetime

import numpy as np
import tensorflow as tf

# apply_gradient_op, global_step, loss, summaries and FLAGS are assumed to be
# defined earlier in the training script.
variable_averages = tf.train.ExponentialMovingAverage(
    0.9999, global_step)
variables_averages_op = variable_averages.apply(tf.trainable_variables())
train_op = tf.group(apply_gradient_op, variables_averages_op)
saver = tf.train.Saver(tf.global_variables(), max_to_keep=10)
summary_op = tf.summary.merge(summaries)
init = tf.global_variables_initializer()
sess = tf.Session(config=tf.ConfigProto(
    allow_soft_placement=True,
    log_device_placement=False))
sess.run(init)
# start queue runners
tf.train.start_queue_runners(sess=sess)

summary_writer = tf.summary.FileWriter(FLAGS.train_dir, sess.graph)

# training loop
start_time = time.time()
for step in range(FLAGS.max_steps):
    _, loss_value = sess.run([train_op, loss])
    duration = time.time() - start_time
    start_time = time.time()
    assert not np.isnan(loss_value), 'Model diverged with loss = NaN'

    if step % 1 == 0:
        # print current model status
        num_examples_per_step = FLAGS.batch_size * FLAGS.num_gpus
        examples_per_sec = num_examples_per_step / duration
        sec_per_batch = duration / FLAGS.num_gpus
        format_str = '{} step {}, loss {}, {} examples/sec, {} sec/batch'
        print(format_str.format(datetime.now(), step, loss_value, examples_per_sec, sec_per_batch))
    if step % 50 == 0:
        summary_str = sess.run(summary_op)
        summary_writer.add_summary(summary_str, step)
    # range() stops at max_steps - 1, so that is the last step to save
    if step % 10 == 0 or step == FLAGS.max_steps - 1:
        print('save checkpoint')
        # save checkpoint file
        checkpoint_file = os.path.join(FLAGS.train_dir, 'model.ckpt')
        saver.save(sess, checkpoint_file, global_step=step)
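
For readers unfamiliar with the two arguments in `tf.train.ExponentialMovingAverage(0.9999, global_step)`: as documented for TensorFlow 1.x, each trainable variable gets a shadow copy that is updated as `shadow = decay * shadow + (1 - decay) * variable`, and when a step counter is passed the decay actually used is `min(decay, (1 + num_updates) / (10 + num_updates))`. The following is a minimal plain-Python sketch of that rule, not TensorFlow's own code; the function names are made up for illustration.

```python
def effective_decay(decay, num_updates=None):
    """Decay used for one update; lower early in training if a step count is given."""
    if num_updates is None:
        return decay
    return min(decay, (1.0 + num_updates) / (10.0 + num_updates))


def ema_update(shadow, value, decay, num_updates=None):
    """One moving-average update of a shadow value toward the current value."""
    d = effective_decay(decay, num_updates)
    return d * shadow + (1.0 - d) * value


# At step 0 the effective decay is 0.1, so the shadow value moves quickly.
print(effective_decay(0.9999, 0))  # 0.1
```

These shadow copies are separate variables, which is why they show up under their own names in the checkpoint.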

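The training script works fine and the checkpoint files are saved (saver version V2). I then try to restore the checkpoint in another script that evaluates the model. There I have this piece of code: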

# Restore the moving average version of the learned variables for eval. 
variable_averages = tf.train.ExponentialMovingAverage(
    MOVING_AVERAGE_DECAY) 
variables_to_restore = variable_averages.variables_to_restore() 
saver = tf.train.Saver(variables_to_restore)

There I get the error "NotFoundError (see above for traceback): Key conv1/Variable/ExponentialMovingAverage not found in checkpoint", where conv1/Variable/ is a variable scope.

The error occurs even before I try to restore the variables. Can you help me solve it?
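
For context on the key named in the error: as far as I understand the TensorFlow 1.x behaviour, `variables_to_restore()` returns a dict whose keys are the names the saver will look up in the checkpoint. Trainable variables are keyed by their op name plus `/ExponentialMovingAverage` (the default `name` of the averaging object), and the remaining global variables are keyed by their own names. The sketch below approximates that mapping in plain Python. It is not TensorFlow's implementation, and the variable names in the usage lines are hypothetical.

```python
EMA_SUFFIX = "ExponentialMovingAverage"  # default name of the averaging object


def requested_checkpoint_keys(trainable_names, other_names, ema_name=EMA_SUFFIX):
    """Approximate the checkpoint keys that variables_to_restore() would produce."""
    # Trainable variables are looked up under their shadow-variable name.
    keys = {"{}/{}".format(name, ema_name) for name in trainable_names}
    # Everything else is looked up under its own name.
    for name in other_names:
        if name not in keys:
            keys.add(name)
    return keys


keys = requested_checkpoint_keys(
    trainable_names=["conv1/Variable"],  # hypothetical variable name
    other_names=["global_step"],         # hypothetical variable name
)
print(sorted(keys))
# ['conv1/Variable/ExponentialMovingAverage', 'global_step']
```

So the saver only succeeds if the variable names in the evaluation graph, with that suffix appended, match names that the training script actually wrote to the checkpoint.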

Thanks in advance

TheJude

Answer

I solved it this way:
call tf.reset_default_graph() before creating the second ExponentialMovingAverage(...) in the graph.

# reset the graph before creating a new EMA
tf.reset_default_graph() 
# Restore the moving average version of the learned variables for eval. 
variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY) 
variables_to_restore = variable_averages.variables_to_restore() 
saver = tf.train.Saver(variables_to_restore)

It took me two hours...
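
If the same NotFoundError persists, one way to narrow it down is to compare the names the saver asks for with the names actually stored in the checkpoint. The sketch below assumes TensorFlow 1.x, where `tf.train.NewCheckpointReader` and its `get_variable_to_shape_map()` method are available; the helper names `checkpoint_keys` and `missing_keys` are made up for illustration and are not part of TensorFlow.

```python
def checkpoint_keys(checkpoint_path):
    """Return the variable names stored in a checkpoint (needs TensorFlow 1.x)."""
    # Imported here so that missing_keys() below can be used without TensorFlow.
    import tensorflow as tf
    reader = tf.train.NewCheckpointReader(checkpoint_path)
    return sorted(reader.get_variable_to_shape_map())


def missing_keys(requested, saved):
    """Names the saver would ask for that the checkpoint does not contain."""
    return sorted(set(requested) - set(saved))


# Possible usage in the evaluation script, after variables_to_restore is built:
#   saved = checkpoint_keys(tf.train.latest_checkpoint(FLAGS.train_dir))
#   print(missing_keys(variables_to_restore.keys(), saved))
```

An empty list means every requested name exists in the checkpoint. Anything printed is a name that the current graph expects but the training run never saved under that name.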
