1. Understanding the different optimizers

For how the various optimizers work and how to choose among them, see my earlier blog post:
https://blog.****.net/qq_40314507/article/details/79933289
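As a quick reminder of what the four optimizers actually do, here is a minimal pure-Python sketch of their per-parameter update rules. It follows the formulas given in the PyTorch documentation, restricted to no weight decay, no Nesterov, SGD dampening = 0 and the default eps. The toy objective f(w) = w² (gradient 2w), the starting point 5.0 and the learning rate 0.1 are illustrative assumptions. momentum=0.8, alpha=0.9 and betas=(0.9, 0.99) match the values used later in this post.

```python
import math

def grad(w):
    # gradient of the toy objective f(w) = w ** 2
    return 2.0 * w

def run_sgd(w, lr, steps):
    for _ in range(steps):
        w -= lr * grad(w)                      # w <- w - lr * g
    return w

def run_momentum(w, lr, steps, mu=0.8):
    v = 0.0                                    # momentum buffer
    for _ in range(steps):
        v = mu * v + grad(w)                   # v <- mu * v + g
        w -= lr * v                            # w <- w - lr * v
    return w

def run_rmsprop(w, lr, steps, alpha=0.9, eps=1e-8):
    s = 0.0                                    # running average of g^2
    for _ in range(steps):
        g = grad(w)
        s = alpha * s + (1 - alpha) * g * g
        w -= lr * g / (math.sqrt(s) + eps)     # step scaled by the RMS of recent gradients
    return w

def run_adam(w, lr, steps, betas=(0.9, 0.99), eps=1e-8):
    b1, b2 = betas
    m, v = 0.0, 0.0                            # first and second moment estimates
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)              # bias correction
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

if __name__ == "__main__":
    for name, fn in [("SGD", run_sgd), ("Momentum", run_momentum),
                     ("RMSprop", run_rmsprop), ("Adam", run_adam)]:
        print(name, fn(5.0, lr=0.1, steps=500))
```

SGD and Momentum shrink w geometrically on this quadratic. RMSprop and Adam divide by a running gradient magnitude, so with a fixed learning rate they may settle into a small neighborhood of 0 rather than reaching it exactly.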

2. Writing the optimizer code

We use SGD, Momentum, RMSprop and Adam as optimizers to train on the data from the previous few posts, and compare how well each one does.

2.1 Defining the network

The code is as follows:

import torch
import torch.nn as nn

class module_net(nn.Module):
    def __init__(self, num_input, num_hidden, num_output):
        super(module_net, self).__init__()
        self.layer1 = nn.Linear(num_input, num_hidden)    # input -> hidden
        self.layer2 = nn.ReLU()
        self.layer3 = nn.Linear(num_hidden, num_hidden)   # hidden -> hidden
        self.dropout3 = nn.Dropout(p=0.5)                 # zeroes 50% of activations during training
        self.layer4 = nn.ReLU()
        self.layer5 = nn.Linear(num_hidden, num_hidden)   # hidden -> hidden
        self.dropout5 = nn.Dropout(p=0.5)
        self.layer6 = nn.ReLU()
        self.layer7 = nn.Linear(num_hidden, num_output)   # hidden -> output (raw logit, no sigmoid)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.dropout3(x)
        x = self.layer4(x)
        x = self.layer5(x)
        x = self.dropout5(x)
        x = self.layer6(x)
        x = self.layer7(x)
        return x
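For comparison, the same architecture can be written more compactly with nn.Sequential. This is a sketch of an equivalent alternative, not the author's code; the helper name make_net and the random input batch are illustrative assumptions. The final lines check that a batch of N samples with 8 features produces N raw logits.

```python
import torch
import torch.nn as nn

def make_net(num_input, num_hidden, num_output):
    # same layer order as module_net:
    # Linear-ReLU-Linear-Dropout-ReLU-Linear-Dropout-ReLU-Linear
    return nn.Sequential(
        nn.Linear(num_input, num_hidden),
        nn.ReLU(),
        nn.Linear(num_hidden, num_hidden),
        nn.Dropout(p=0.5),
        nn.ReLU(),
        nn.Linear(num_hidden, num_hidden),
        nn.Dropout(p=0.5),
        nn.ReLU(),
        nn.Linear(num_hidden, num_output),   # raw logit output, no sigmoid
    )

net = make_net(8, 10, 1)
x = torch.randn(4, 8)          # hypothetical batch: 4 samples, 8 features
out = net(x)
print(out.shape)               # torch.Size([4, 1])
```

The explicit class in the post is easier to extend with custom logic in forward(); nn.Sequential is shorter when the layers are simply chained.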

2.2 Creating one network per optimizer

To compare the optimizers, we create a separate neural network for each of them, all built from the same module_net class.

net_SGD         = module_net(8,10,1)
net_Momentum    = module_net(8,10,1)
net_RMSprop     = module_net(8,10,1)
net_Adam        = module_net(8,10,1)
nets = [net_SGD, net_Momentum, net_RMSprop, net_Adam]
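Each module_net(8,10,1) call draws its own random initial weights, so the four networks start from different points. If you want the comparison to reflect only the optimizer, one option (my assumption, not what the code above does) is to build a single network and deep-copy it. A sketch with a small stand-in model in place of module_net:

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
# stand-in for module_net(8, 10, 1); any nn.Module is copied the same way
base = nn.Sequential(nn.Linear(8, 10), nn.ReLU(), nn.Linear(10, 1))

# four independent copies with identical initial weights but separate tensors
nets = [copy.deepcopy(base) for _ in range(4)]
net_SGD, net_Momentum, net_RMSprop, net_Adam = nets
```

Because deepcopy duplicates the parameter tensors, each optimizer later updates only its own copy; the networks share just their starting point.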

2.3 Creating the optimizers

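Next we create a different optimizer to train each network, plus a loss function, criterion, to compute the error. We use four common optimizers: SGD, Momentum, RMSprop and Adam (Momentum is simply torch.optim.SGD with the momentum argument set).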

# LR (learning rate) and device are assumed to be defined earlier, as in the previous posts
opt_SGD         = torch.optim.SGD(net_SGD.parameters(), lr=LR)
opt_Momentum    = torch.optim.SGD(net_Momentum.parameters(), lr=LR, momentum=0.8)
opt_RMSprop     = torch.optim.RMSprop(net_RMSprop.parameters(), lr=LR, alpha=0.9)
opt_Adam        = torch.optim.Adam(net_Adam.parameters(), lr=LR, betas=(0.9, 0.99))
optimizers = [opt_SGD, opt_Momentum, opt_RMSprop, opt_Adam]

losses_his = [[], [], [], []]      # training loss of each network
Accuracy_list = [[], [], [], []]   # training accuracy of each network
criterion = nn.BCEWithLogitsLoss().to(device)  # BCEWithLogitsLoss = sigmoid + binary cross entropy
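Because the networks output raw logits, the training loss is nn.BCEWithLogitsLoss, which applies the sigmoid and the binary cross-entropy in one numerically stable step. The check below uses made-up logits and labels to compare it with the explicit formula mean(-[y·log σ(z) + (1 − y)·log(1 − σ(z))]) and with sigmoid followed by nn.BCELoss.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0], [-1.0], [0.5], [-3.0]])   # hypothetical network outputs
labels = torch.tensor([[1.0], [0.0], [0.0], [1.0]])     # hypothetical 0/1 targets

builtin = nn.BCEWithLogitsLoss()(logits, labels)

p = torch.sigmoid(logits)                                # probabilities in (0, 1)
manual = -(labels * torch.log(p) + (1 - labels) * torch.log(1 - p)).mean()

two_step = nn.BCELoss()(p, labels)                       # sigmoid first, then BCELoss
print(builtin.item(), manual.item(), two_step.item())
```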

2.4 Initializing the regularization term

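weight_decay = 0.01  # regularization strength
# Regularization is assumed to be the custom penalty class from an earlier post (it is not part of torch).
# Keep one penalty object per network; these terms only take effect if they are
# added to the corresponding network's training loss.
reg_losses = []
if weight_decay > 0:
    for net in nets:
        reg_losses.append(Regularization(net, weight_decay, p=2).to(device))
else:
    print("no regularization")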
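Instead of a separate regularization object, every torch.optim optimizer also accepts a weight_decay argument. For plain SGD, weight_decay=λ adds λ·w to each gradient, which is exactly the gradient of the penalty (λ/2)·Σw². The sketch below uses a hypothetical linear model and random data to check that one SGD step with weight_decay matches one step on the loss plus that explicit penalty. (Adam's weight_decay is also added to the gradient before the adaptive scaling, which differs from AdamW's decoupled decay.)

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
lam, lr = 0.01, 0.1
x = torch.randn(16, 8)                       # hypothetical inputs
y = torch.randint(0, 2, (16, 1)).float()     # hypothetical 0/1 labels
criterion = nn.BCEWithLogitsLoss()

net_a = nn.Linear(8, 1)
net_b = copy.deepcopy(net_a)                 # identical starting weights

# (a) built-in: weight_decay adds lam * w to every gradient
opt_a = torch.optim.SGD(net_a.parameters(), lr=lr, weight_decay=lam)
opt_a.zero_grad()
criterion(net_a(x), y).backward()
opt_a.step()

# (b) explicit L2 penalty (lam / 2) * sum(w^2) over all parameters, no weight_decay
opt_b = torch.optim.SGD(net_b.parameters(), lr=lr)
penalty = sum((p ** 2).sum() for p in net_b.parameters())
loss_b = criterion(net_b(x), y) + lam / 2 * penalty
opt_b.zero_grad()
loss_b.backward()
opt_b.step()
```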

2.5 Training

# x, y: training features and labels from the previous posts (float tensors, y of shape (N, 1))
for e in range(100):
    for net, opt, l_his, acc_his in zip(nets, optimizers, losses_his, Accuracy_list):
        net.train()                     # enable dropout
        out = net(x)                    # calling net(x) runs net.forward(x)
        loss = criterion(out, y)
        # -------------------- accuracy --------------------
        out_class = (out > 0).float()   # logit > 0 becomes 1, otherwise 0
        right_num = torch.sum(y == out_class).float()  # number of correct predictions
        precision = right_num / out.shape[0]           # accuracy
        # ---------------------------------------------------
        opt.zero_grad()
        loss.backward()
        opt.step()
        l_his.append(loss.item())       # loss.item() replaces loss.data[0] in PyTorch >= 0.4
        acc_his.append(precision.item())
        if (e + 1) % 50 == 0:
            print('epoch: {}, loss: {}, precision: {}, right_num: {}'.format(
                e + 1, loss.item(), precision.item(), right_num.item()))
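The loop above relies on x, y, nets and optimizers from earlier steps. To try it in isolation, here is a self-contained sketch on hypothetical synthetic data (a made-up separable 8-feature problem; the real data comes from the previous posts), with a smaller stand-in network and LR = 0.01 as an assumed learning rate. Thresholding the logit at 0 is the same as thresholding sigmoid(logit) at 0.5, because sigmoid(0) = 0.5 and the sigmoid is increasing.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
LR = 0.01                                       # assumed learning rate
x = torch.randn(200, 8)                         # hypothetical features
y = (x.sum(dim=1, keepdim=True) > 0).float()    # hypothetical labels, shape (200, 1)

def make_net():
    # small stand-in for module_net(8, 10, 1)
    return nn.Sequential(nn.Linear(8, 10), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(10, 1))

nets = [make_net() for _ in range(4)]
optimizers = [
    torch.optim.SGD(nets[0].parameters(), lr=LR),
    torch.optim.SGD(nets[1].parameters(), lr=LR, momentum=0.8),
    torch.optim.RMSprop(nets[2].parameters(), lr=LR, alpha=0.9),
    torch.optim.Adam(nets[3].parameters(), lr=LR, betas=(0.9, 0.99)),
]
criterion = nn.BCEWithLogitsLoss()
losses_his = [[] for _ in range(4)]
Accuracy_list = [[] for _ in range(4)]

for e in range(100):
    for net, opt, l_his, acc_his in zip(nets, optimizers, losses_his, Accuracy_list):
        net.train()
        out = net(x)
        loss = criterion(out, y)
        out_class = (out > 0).float()                 # same as sigmoid(out) > 0.5
        accuracy = (out_class == y).float().mean()    # fraction of correct predictions
        opt.zero_grad()
        loss.backward()
        opt.step()
        l_his.append(loss.item())
        acc_his.append(accuracy.item())
```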

2.6 Plotting

I trained for only 100 epochs here. Let's look at the loss and accuracy curves; the code is as follows:

import matplotlib.pyplot as plt

x1 = list(range(100))   # one point per epoch
labels_loss = ["SGD_loss", "Momentum_loss", "RMSprop_loss", "Adam_loss"]
labels_acc = ["SGD_acc", "Momentum_acc", "RMSprop_acc", "Adam_acc"]

plt.figure()            # figure 1: training loss
for i in range(4):
    plt.plot(x1, losses_his[i])
plt.legend(labels=labels_loss, loc='upper right')

plt.figure()            # figure 2: training accuracy
for i in range(4):
    plt.plot(x1, Accuracy_list[i])
plt.legend(labels=labels_acc, loc='lower right')
plt.show()

Results:
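[Figure: training loss and accuracy curves of the four optimizers over 100 epochs; image from the post "[PyTorch Deep Learning] 6. Understanding More Neural Network Optimization Methods in PyTorch"]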
RMSprop converges fastest and SGD slowest.

2.7 Evaluating on the test set

Code:

x_test_tensor = x_test_tensor.float()
y_test_tensor = y_test_tensor.float()
opt_name = ['SGD', 'Momentum', 'RMSprop', 'Adam']
for net, name in zip(nets, opt_name):
    net.eval()                          # disable dropout for evaluation
    with torch.no_grad():               # no gradients needed at test time
        out_test = net(x_test_tensor)
        loss_test = criterion(out_test, y_test_tensor).item()
        out_test_class = (out_test > 0).float()   # logit > 0 becomes 1, otherwise 0
        right_num_test = torch.sum(y_test_tensor == out_test_class).float()  # number of correct predictions
        precision_test = right_num_test / out_test.shape[0]                  # accuracy
    print('opt: {}, loss_test: {}, precision_test: {}, right_num_test: {}'.format(
        name, loss_test, precision_test.item(), right_num_test.item()))
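Calling net.eval() before testing matters here because the networks contain Dropout: in training mode each forward pass zeroes a random half of the hidden activations (and scales the rest by 1/(1 − p)), so the test loss would be noisy; in evaluation mode Dropout does nothing. A small sketch with a hypothetical stand-in network and test batch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(8, 10), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(10, 1))  # stand-in model
x_test = torch.randn(5, 8)                  # hypothetical test batch

net.train()                                 # dropout active: outputs usually differ between calls
a, b = net(x_test), net(x_test)

net.eval()                                  # dropout disabled: outputs are deterministic
with torch.no_grad():                       # no autograd graph is built during evaluation
    c, d = net(x_test), net(x_test)

print(torch.equal(a, b), torch.equal(c, d))
```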

Results:
[Figure: test-set loss and accuracy of the four optimizers after 100 epochs]
Adam performs best.

3. Testing the optimizers further

3.1 Increasing the number of iterations

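So far we have trained for only 100 iterations, and all four optimizers do reasonably well. What happens if we train longer? First I increased the count to 1000 iterations; the loss and accuracy curves are: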
[Figure: training loss and accuracy curves of the four optimizers over 1000 iterations]
At this point RMSprop's training loss even drops to about 0.3. But how do the models perform on the test set? Let's see:

This shows that the last two optimizers in the list, RMSprop and Adam, overfit severely.
Now let's train for 10,000 iterations. First, the loss and accuracy curves:

The lowest training loss falls below 0.3. Now let's look at the test-set performance:
[Figure: test-set results of the four optimizers after 10,000 iterations]
As you can see, the overfitting is even more severe.
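Rather than checking the test set only at the end, you can record the test loss every few epochs next to the training loss and find the epoch where the test loss is lowest; past that point the training loss keeps falling while the test loss rises, which is the overfitting seen above. A minimal pure-Python sketch; the function name best_epoch and all loss values are made up for illustration:

```python
def best_epoch(test_losses, every=1):
    """Return (epoch, loss) of the lowest recorded test loss.

    test_losses[i] is assumed to be the test loss measured after epoch (i + 1) * every.
    """
    i = min(range(len(test_losses)), key=lambda k: test_losses[k])
    return (i + 1) * every, test_losses[i]

# hypothetical curves recorded every 100 epochs
train_losses = [0.60, 0.50, 0.42, 0.36, 0.31]
test_losses = [0.62, 0.55, 0.54, 0.58, 0.66]

epoch, loss = best_epoch(test_losses, every=100)
gap = [te - tr for tr, te in zip(train_losses, test_losses)]   # generalization gap per checkpoint
print(epoch, loss)   # 300 0.54
```

A growing gap list (test loss minus training loss) is a simple numeric signal of the same overfitting the test-set results show.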

3.2 Reducing the number of layers

We do not have much data, yet we used several layers. Let's reduce the number of layers and see how the results change.
The modified network:

import torch.nn as nn

class module_net(nn.Module):
    def __init__(self, num_input, num_hidden, num_output):
        super(module_net, self).__init__()
        self.layer1 = nn.Linear(num_input, num_hidden)   # input -> hidden
        self.dropout1 = nn.Dropout(p=0.5)
        self.layer2 = nn.ReLU()
        self.layer3 = nn.Linear(num_hidden, num_output)  # hidden -> output (raw logit)

    def forward(self, x):
        x = self.layer1(x)
        x = self.dropout1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        return x
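To put "fewer layers" in numbers: an nn.Linear(n_in, n_out) layer has n_in·n_out weights plus n_out biases, while ReLU and Dropout have no parameters. With num_input=8, num_hidden=10 and num_output=1 as in section 2.2, the arithmetic below shows the reduced network has less than a third of the original's parameters.

```python
def linear_params(n_in, n_out):
    # weights (n_in * n_out) plus one bias per output unit
    return n_in * n_out + n_out

# original module_net(8, 10, 1): four Linear layers
original = linear_params(8, 10) + 2 * linear_params(10, 10) + linear_params(10, 1)
# reduced module_net(8, 10, 1): two Linear layers
reduced = linear_params(8, 10) + linear_params(10, 1)
print(original, reduced)   # 321 101
```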

Training directly for 10,000 iterations:
[Figure: training loss and accuracy curves of the smaller network over 10,000 iterations]
SGD is still the slowest, but on this dataset it eventually reaches the same level as the other three optimizers.
On the test set:

The loss values are all close to one another.

[Figure: test-set results of the smaller network after 10,000 iterations]