A Simple 3-Layer Neural Network Model, Part 2
Background Knowledge
- Python basics
- Pandas basics
- Matplotlib basics
- sklearn basics
- Machine learning: neural network fundamentals
Abstract
The previous article, "A Simple 3-Layer Neural Network Model Part 1" (https://mp.****.net/mdeditor/90204383#), trained a 3-layer neural network on Lianjia housing data; the evaluation showed that accuracy, AUC and the other metrics were poor.
Compared with that article, this one adds feature preprocessing (StandardScaler and MinMaxScaler) and oversampling to balance the classes, and uses K-fold cross-validation.
With the same network architecture, the model in this article performs better than the one in the previous article.
Main Text
Generating the Training and Cross-Validation Sets
Processing flow: read the data from the CSV file, shuffle it, check whether the classes are balanced and, if not, oversample to balance them. Then use KFold to generate 5 training folds and 5 validation folds; each training fold is normalized with fit_transform and the corresponding validation fold with transform, so the scaler statistics come from the training fold only. Two normalization methods are used: StandardScaler and MinMaxScaler.
import numpy as np
import pandas as pd
from collections import Counter
from sklearn import utils
from sklearn.utils.class_weight import compute_class_weight
from imblearn.over_sampling import RandomOverSampler
from sklearn.model_selection import KFold
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler

pd.set_option('display.max_columns', 1000)


def loadDataSet():
    df = pd.read_csv(r'./lianjia_processed.csv', sep=',')
    # Shuffle the samples
    df = utils.shuffle(df)
    # The is_two_five column becomes the label Y
    Y = df["is_two_five"].values
    # Binarize the label Y
    Y[Y == 5] = 1
    Y[Y != 1] = 0
    # The remaining columns become the features X
    df.drop("is_two_five", axis=1, inplace=True)
    X = df.values
    # Check whether the classes are balanced
    class_weight = 'balanced'
    sample_type = np.array([0, 1])
    weight = compute_class_weight(class_weight, classes=sample_type, y=Y)
    print(sorted(Counter(Y).items()))
    # If not balanced, oversample the minority class
    ros = RandomOverSampler(random_state=0)
    X_resampled, Y_resampled = ros.fit_resample(X, Y)
    print(sorted(Counter(Y_resampled).items()))
    return X_resampled, Y_resampled


X, Y = loadDataSet()
kFolder = KFold(n_splits=5, shuffle=False)
norm_type = 1  # 0: MinMaxScaler  1: StandardScaler
csv_index = 1
for train_index, test_index in kFolder.split(X):
    # print('Train: %s | test: %s' % (train_index, test_index), '\n')
    X_train, X_test = X[train_index], X[test_index]
    # Fit the scaler on the training fold only, then apply it to the validation fold
    if norm_type == 0:
        scaler = MinMaxScaler()
    elif norm_type == 1:
        scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    Y_train, Y_test = Y[train_index], Y[test_index]
    # Convert to DataFrames
    X_train_pd = pd.DataFrame(X_train)
    Y_train_pd = pd.DataFrame(Y_train)
    X_test_pd = pd.DataFrame(X_test)
    Y_test_pd = pd.DataFrame(Y_test)
    # Save the training and validation folds
    if norm_type == 0:
        X_train_pd.to_csv("./dataSet_minmax_scaled/X_train_%d.csv" % csv_index, index=False)
        Y_train_pd.to_csv("./dataSet_minmax_scaled/Y_train_%d.csv" % csv_index, index=False)
        X_test_pd.to_csv("./dataSet_minmax_scaled/X_valid_%d.csv" % csv_index, index=False)
        Y_test_pd.to_csv("./dataSet_minmax_scaled/Y_valid_%d.csv" % csv_index, index=False)
    elif norm_type == 1:
        X_train_pd.to_csv("./dataSet_std_scaled/X_train_%d.csv" % csv_index, index=False)
        Y_train_pd.to_csv("./dataSet_std_scaled/Y_train_%d.csv" % csv_index, index=False)
        X_test_pd.to_csv("./dataSet_std_scaled/X_valid_%d.csv" % csv_index, index=False)
        Y_test_pd.to_csv("./dataSet_std_scaled/Y_valid_%d.csv" % csv_index, index=False)
    csv_index += 1
print("data set create succeed!")
Training the Model
Train on the five StandardScaler-preprocessed training folds to obtain five sets of model parameters, and on the five MinMaxScaler-preprocessed training folds to obtain another five sets.
#!/usr/bin/env Python
# coding=utf-8
import time
import pickle
import pandas as pd
from Layer3NN import nn_model

# Model hyperparameters
n_h = 8
iter_num = 550000
learning_rate = 2.5
lr_decreaseRate = 0.9
norm_type = 1  # 0: MinMaxScaler  1: StandardScaler

for i in range(5):
    print('*' * 30)
    print("training round %d" % (i + 1))
    print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))
    if norm_type == 0:
        X_train = pd.read_csv(r'./dataSet_minmax_scaled/X_train_%d.csv' % (i + 1), sep=',')
        Y_train = pd.read_csv(r'./dataSet_minmax_scaled/Y_train_%d.csv' % (i + 1), sep=',')
    elif norm_type == 1:
        X_train = pd.read_csv(r'./dataSet_std_scaled/X_train_%d.csv' % (i + 1), sep=',')
        Y_train = pd.read_csv(r'./dataSet_std_scaled/Y_train_%d.csv' % (i + 1), sep=',')
    X_tr = X_train.values
    Y_tr = Y_train.values
    # Train the model
    parameters = nn_model(X_tr.T,
                          Y_tr.T,
                          n_h,
                          num_iterations=iter_num,
                          print_cost=True,
                          learning_rate=learning_rate,
                          lr_decreaseRate=lr_decreaseRate)
    print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))
    # Save the learned parameters
    if norm_type == 0:
        with open('./parameters_minmax/parameter_%d.pickle' % (i + 1), 'wb') as f:
            pickle.dump(parameters, f, pickle.HIGHEST_PROTOCOL)
    elif norm_type == 1:
        with open('./parameters_std/parameter_%d.pickle' % (i + 1), 'wb') as f:
            pickle.dump(parameters, f, pickle.HIGHEST_PROTOCOL)
Code of the 3-Layer Neural Network Model (with automatic learning_rate decay added since the previous article)
The model follows Andrew Ng's neural network formulation: the hidden-layer activation is tanh and the output-layer activation is sigmoid; the input layer has 12 neurons and the output layer has 1. The hyperparameters are the number of hidden neurons n_h and the learning rate learning_rate. The parameters are learned by gradient descent, with backpropagation used to compute the gradients of the cross-entropy cost. The code is as follows:
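For reference, the forward pass, cross-entropy cost and gradients implemented below can be written as follows (read directly off the code; sigma is the sigmoid, \odot denotes element-wise multiplication, m is the number of examples, and the 1 - A_1^2 factor comes from tanh'(z) = 1 - tanh^2(z)):

\[
Z_1 = W_1 X + b_1,\qquad A_1 = \tanh(Z_1),\qquad Z_2 = W_2 A_1 + b_2,\qquad A_2 = \sigma(Z_2)
\]
\[
J = -\frac{1}{m}\sum_{i=1}^{m}\Bigl[y^{(i)}\log a_2^{(i)} + \bigl(1-y^{(i)}\bigr)\log\bigl(1-a_2^{(i)}\bigr)\Bigr]
\]
\[
dZ_2 = A_2 - Y,\quad dW_2 = \tfrac{1}{m}\,dZ_2 A_1^{T},\quad db_2 = \tfrac{1}{m}\textstyle\sum dZ_2
\]
\[
dZ_1 = \bigl(W_2^{T} dZ_2\bigr)\odot\bigl(1 - A_1^{2}\bigr),\quad dW_1 = \tfrac{1}{m}\,dZ_1 X^{T},\quad db_1 = \tfrac{1}{m}\textstyle\sum dZ_1
\]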
#!/usr/bin/env Python
# coding=utf-8
import numpy as np


def sigmoid(x):
    """
    Compute the sigmoid of x

    Arguments:
    x -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(x)
    """
    s = 1 / (1 + np.exp(-x))
    return s


def layer_sizes(X, Y):
    """
    Arguments:
    X -- input dataset of shape (input size, number of examples)
    Y -- labels of shape (output size, number of examples)

    Returns:
    n_x -- the size of the input layer
    n_y -- the size of the output layer
    """
    n_x = X.shape[0]  # size of input layer
    n_y = Y.shape[0]  # size of output layer
    return (n_x, n_y)
def initialize_parameters(n_x, n_h, n_y):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer

    Returns:
    params -- python dictionary containing your parameters:
        W1 -- weight matrix of shape (n_h, n_x)
        b1 -- bias vector of shape (n_h, 1)
        W2 -- weight matrix of shape (n_y, n_h)
        b2 -- bias vector of shape (n_y, 1)
    """
    np.random.seed(1)  # fixed seed so that runs are reproducible even though the initialization is random
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    assert (W1.shape == (n_h, n_x))
    assert (b1.shape == (n_h, 1))
    assert (W2.shape == (n_y, n_h))
    assert (b2.shape == (n_y, 1))
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    return parameters
def forward_propagation(X, parameters):
    """
    Argument:
    X -- input data of size (n_x, m)
    parameters -- python dictionary containing your parameters (output of initialization function)

    Returns:
    A2 -- The sigmoid output of the second activation
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    """
    # Retrieve each parameter from the dictionary "parameters"
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    # Implement Forward Propagation to calculate A2 (probabilities)
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    assert (A2.shape == (1, X.shape[1]))  # one output neuron, so A2 has shape (1, m)
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}
    return A2, cache
def compute_cost(A2, Y, parameters):
    """
    Computes the cross-entropy cost

    Arguments:
    A2 -- The sigmoid output of the second activation, of shape (1, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)
    parameters -- python dictionary containing your parameters W1, b1, W2 and b2

    Returns:
    cost -- cross-entropy cost
    """
    m = Y.shape[1]  # number of examples
    # Compute the cross-entropy cost
    logprobs = np.multiply(np.log(A2), Y) + np.multiply(np.log(1 - A2), 1 - Y)
    cost = - np.sum(logprobs) / m
    cost = np.squeeze(cost)  # makes sure cost is the dimension we expect, e.g. turns [[17]] into 17
    assert (isinstance(cost, float))
    return cost
def backward_propagation(parameters, cache, X, Y):
    """
    Implement backward propagation.

    Arguments:
    parameters -- python dictionary containing our parameters
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    X -- input data of shape (n_x, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)

    Returns:
    grads -- python dictionary containing your gradients with respect to different parameters
    """
    m = X.shape[1]
    # First, retrieve W1 and W2 from the dictionary "parameters".
    W1 = parameters["W1"]
    W2 = parameters["W2"]
    # Retrieve also A1 and A2 from dictionary "cache".
    A1 = cache["A1"]
    A2 = cache["A2"]
    # Backward propagation: calculate dW1, db1, dW2, db2.
    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = np.multiply(np.dot(W2.T, dZ2), 1 - np.power(A1, 2))  # tanh'(z) = 1 - tanh(z)^2
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}
    return grads
def update_parameters(parameters, grads, learning_rate=0.005):
    """
    Updates parameters using the gradient descent update rule

    Arguments:
    parameters -- python dictionary containing your parameters
    grads -- python dictionary containing your gradients
    learning_rate -- step size of the gradient descent update

    Returns:
    parameters -- python dictionary containing your updated parameters
    """
    # Retrieve each parameter from the dictionary "parameters"
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    # Retrieve each gradient from the dictionary "grads"
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]
    # Update rule for each parameter
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    return parameters
def nn_model(X, Y, n_h, num_iterations=10000, print_cost=False, learning_rate=0.0005, lr_decreaseRate=0.9):
    """
    Arguments:
    X -- dataset of shape (n_x, number of examples)
    Y -- labels of shape (1, number of examples)
    n_h -- size of the hidden layer
    num_iterations -- Number of iterations in gradient descent loop
    print_cost -- if True, print the cost every 500 iterations
    learning_rate -- initial learning rate
    lr_decreaseRate -- factor applied to the learning rate after every 10% of the iterations

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    np.random.seed(3)
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[1]
    # Initialize parameters. Inputs: "n_x, n_h, n_y". Outputs: "parameters".
    parameters = initialize_parameters(n_x, n_h, n_y)
    # Loop (gradient descent)
    lr = learning_rate
    for i in range(0, num_iterations):
        # Forward propagation. Inputs: "X, parameters". Outputs: "A2, cache".
        A2, cache = forward_propagation(X, parameters)
        # Cost function. Inputs: "A2, Y, parameters". Outputs: "cost".
        cost = compute_cost(A2, Y, parameters)
        # Back propagation. Inputs: "parameters, cache, X, Y". Outputs: "grads".
        grads = backward_propagation(parameters, cache, X, Y)
        # Learning rate decay: shrink lr by lr_decreaseRate after every 10% of the iterations
        if i > 0 and i % max(1, num_iterations // 10) == 0:
            lr = lr_decreaseRate * lr
        # Gradient descent parameter update. Inputs: "parameters, grads". Outputs: "parameters".
        parameters = update_parameters(parameters, grads, learning_rate=lr)
        # Print the cost every 500 iterations
        if print_cost and i % 500 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
        if np.isnan(cost):
            break
    return parameters
def predict(parameters, X):
    """
    Using the learned parameters, computes the output probability for each example in X

    Arguments:
    parameters -- python dictionary containing your parameters
    X -- input data of size (n_x, m)

    Returns:
    A2 -- probabilities output by the network, of shape (1, m); the caller thresholds them at 0.5 to obtain class labels
    """
    # Compute probabilities using forward propagation
    A2, cache = forward_propagation(X, parameters)
    # predictions = (A2 > 0.5)
    return A2
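As a quick sanity check of Layer3NN, separate from the Lianjia data, the model can be run on a small synthetic problem. The shapes mirror the real setup (12 features per column, labels as a 1 x m row vector), but the data and hyperparameter values below are made up for illustration:

import numpy as np
from Layer3NN import nn_model, predict

np.random.seed(0)
m = 200
X_toy = np.random.randn(12, m)                                      # 12 features, m examples (columns)
Y_toy = (X_toy[0, :] + X_toy[1, :] > 0).astype(int).reshape(1, m)   # a simple separable labeling rule

params = nn_model(X_toy, Y_toy, n_h=8,
                  num_iterations=5000, print_cost=False,
                  learning_rate=0.5, lr_decreaseRate=0.9)
probs = predict(params, X_toy)
acc = np.mean((probs > 0.5).astype(int) == Y_toy)
print("toy training accuracy: %.3f" % acc)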
Model Evaluation Code
The model is evaluated with the following code:
#!/usr/bin/env Python
# coding=utf-8
import pickle
import pandas as pd
import matplotlib.pyplot as plt
from Layer3NN import predict
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, fbeta_score
from sklearn.metrics import roc_curve, auc

norm_type = 1  # 0: MinMaxScaler  1: StandardScaler

for i in range(5):
    print('*' * 30)
    print("predict round %d" % (i + 1))
    # Load the trained parameters
    if norm_type == 0:
        with open('./parameters_minmax/parameter_%d.pickle' % (i + 1), 'rb') as f:
            parameters = pickle.load(f)
    elif norm_type == 1:
        with open('./parameters_std/parameter_%d.pickle' % (i + 1), 'rb') as f:
            parameters = pickle.load(f)
    # Load the training fold
    if norm_type == 0:
        X_train = pd.read_csv(r'./dataSet_minmax_scaled/X_train_%d.csv' % (i + 1), sep=',')
        Y_train = pd.read_csv(r'./dataSet_minmax_scaled/Y_train_%d.csv' % (i + 1), sep=',')
    elif norm_type == 1:
        X_train = pd.read_csv(r'./dataSet_std_scaled/X_train_%d.csv' % (i + 1), sep=',')
        Y_train = pd.read_csv(r'./dataSet_std_scaled/Y_train_%d.csv' % (i + 1), sep=',')
    X_tr = X_train.values
    Y_tr = Y_train.values
    # Predict on the training fold
    predictTrainSet = predict(parameters, X_tr.T)
    predictTrainSet = predictTrainSet.reshape(-1)
    trainY = Y_tr.T.reshape(-1)
    # Evaluate on the training fold
    fpr_tr, tpr_tr, threshold_tr = roc_curve(trainY, predictTrainSet)  # false positive rate and true positive rate
    roc_auc_tr = auc(fpr_tr, tpr_tr)  # area under the ROC curve
    lw = 2
    plt.figure(figsize=(10, 10))
    plt.plot(fpr_tr, tpr_tr, color='darkorange', lw=lw,
             label='ROC curve (area = %0.2f)' % roc_auc_tr)  # ROC curve: FPR on the x-axis, TPR on the y-axis
    plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver operating characteristic training example')
    plt.legend(loc="lower right")
    plt.show()
    print('TrainSet Accuracy: %.3f' % accuracy_score(y_true=trainY, y_pred=predictTrainSet > 0.5))
    print('TrainSet Precision: %.3f' % precision_score(y_true=trainY, y_pred=predictTrainSet > 0.5))
    print('TrainSet Recall: %.3f' % recall_score(y_true=trainY, y_pred=predictTrainSet > 0.5))
    print('TrainSet F1: %.3f' % f1_score(y_true=trainY, y_pred=predictTrainSet > 0.5))
    print('TrainSet F_beta: %.3f' % fbeta_score(y_true=trainY, y_pred=predictTrainSet > 0.5, beta=0.8))
    # Load the cross-validation fold
    if norm_type == 0:
        X_test = pd.read_csv(r'./dataSet_minmax_scaled/X_valid_%d.csv' % (i + 1), sep=',')
        Y_test = pd.read_csv(r'./dataSet_minmax_scaled/Y_valid_%d.csv' % (i + 1), sep=',')
    elif norm_type == 1:
        X_test = pd.read_csv(r'./dataSet_std_scaled/X_valid_%d.csv' % (i + 1), sep=',')
        Y_test = pd.read_csv(r'./dataSet_std_scaled/Y_valid_%d.csv' % (i + 1), sep=',')
    X_t = X_test.values
    Y_t = Y_test.values
    # Predict on the cross-validation fold
    predictValidSet = predict(parameters, X_t.T)
    predictValidSet = predictValidSet.reshape(-1)
    validY = Y_t.T.reshape(-1)
    # Evaluate on the cross-validation fold
    fpr_va, tpr_va, threshold_va = roc_curve(validY, predictValidSet)  # false positive rate and true positive rate
    roc_auc_va = auc(fpr_va, tpr_va)  # area under the ROC curve
    lw = 2
    plt.figure(figsize=(10, 10))
    plt.plot(fpr_va, tpr_va, color='darkorange', lw=lw,
             label='ROC curve (area = %0.2f)' % roc_auc_va)  # ROC curve: FPR on the x-axis, TPR on the y-axis
    plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver operating characteristic validation example')
    plt.legend(loc="lower right")
    plt.show()
    print('ValidSet Accuracy: %.3f' % accuracy_score(y_true=validY, y_pred=predictValidSet > 0.5))
    print('ValidSet Precision: %.3f' % precision_score(y_true=validY, y_pred=predictValidSet > 0.5))
    print('ValidSet Recall: %.3f' % recall_score(y_true=validY, y_pred=predictValidSet > 0.5))
    print('ValidSet F1: %.3f' % f1_score(y_true=validY, y_pred=predictValidSet > 0.5))
    print('ValidSet F_beta: %.3f' % fbeta_score(y_true=validY, y_pred=predictValidSet > 0.5, beta=0.8))
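Since K-fold cross-validation produces five numbers per metric, it is often useful to report their mean and standard deviation rather than reading the folds one by one. A minimal sketch (the collection list and its values below are placeholders, not part of the original script):

import numpy as np

# Hypothetical per-fold validation accuracies, collected inside the loop above via
#   valid_acc.append(accuracy_score(y_true=validY, y_pred=predictValidSet > 0.5))
valid_acc = [0.70, 0.72, 0.68, 0.74, 0.71]   # placeholder numbers for illustration only

print("validation accuracy: %.3f +/- %.3f" % (np.mean(valid_acc), np.std(valid_acc)))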
Model Evaluation Results: MinMaxScaler Dataset
Evaluating on the MinMaxScaler-preprocessed dataset gives the following results:
predict round 1
TrainSet Accuracy: 0.823
TrainSet Precision: 0.842
TrainSet Recall: 0.739
TrainSet F1: 0.787
TrainSet F_beta: 0.799
ValidSet Accuracy: 0.606
ValidSet Precision: 0.871
ValidSet Recall: 0.540
ValidSet F1: 0.667
ValidSet F_beta: 0.703
predict round 2
TrainSet Accuracy: 0.848
TrainSet Precision: 0.905
TrainSet Recall: 0.738
TrainSet F1: 0.813
TrainSet F_beta: 0.832
ValidSet Accuracy: 0.700
ValidSet Precision: 0.912
ValidSet Recall: 0.639
ValidSet F1: 0.752
ValidSet F_beta: 0.782
predict round 3
TrainSet Accuracy: 0.861
TrainSet Precision: 0.894
TrainSet Recall: 0.764
TrainSet F1: 0.824
TrainSet F_beta: 0.839
ValidSet Accuracy: 0.681
ValidSet Precision: 0.939
ValidSet Recall: 0.641
ValidSet F1: 0.762
ValidSet F_beta: 0.795
predict round 4
TrainSet Accuracy: 0.816
TrainSet Precision: 0.842
TrainSet Recall: 0.827
TrainSet F1: 0.834
TrainSet F_beta: 0.836
ValidSet Accuracy: 0.722
ValidSet Precision: 0.482
ValidSet Recall: 0.750
ValidSet F1: 0.587
ValidSet F_beta: 0.560
(Figure: ROC curve with the best AUC among the four training folds)
(Figure: the AUC result corresponding to that fold)
Model Evaluation Results: StandardScaler Dataset
Evaluating on the StandardScaler-preprocessed dataset gives the following results:
predict round 1
TrainSet Accuracy: 0.858
TrainSet Precision: 0.831
TrainSet Recall: 0.850
TrainSet F1: 0.840
TrainSet F_beta: 0.838
ValidSet Accuracy: 0.737
ValidSet Precision: 0.934
ValidSet Recall: 0.696
ValidSet F1: 0.798
ValidSet F_beta: 0.824
predict round 2
TrainSet Accuracy: 0.879
TrainSet Precision: 0.905
TrainSet Recall: 0.810
TrainSet F1: 0.855
TrainSet F_beta: 0.865
ValidSet Accuracy: 0.670
ValidSet Precision: 0.919
ValidSet Recall: 0.611
ValidSet F1: 0.734
ValidSet F_beta: 0.768
predict round 3
TrainSet Accuracy: 0.871
TrainSet Precision: 0.859
TrainSet Recall: 0.842
TrainSet F1: 0.850
TrainSet F_beta: 0.852
ValidSet Accuracy: 0.637
ValidSet Precision: 0.875
ValidSet Recall: 0.609
ValidSet F1: 0.718
ValidSet F_beta: 0.747
predict round 4
TrainSet Accuracy: 0.838
TrainSet Precision: 0.854
TrainSet Recall: 0.858
TrainSet F1: 0.856
TrainSet F_beta: 0.856
ValidSet Accuracy: 0.777
ValidSet Precision: 0.538
ValidSet Recall: 0.812
ValidSet F1: 0.647
ValidSet F_beta: 0.620
(Figure: ROC curve with the best AUC among the four training folds)
(Figure: the AUC result corresponding to that fold)
Conclusions
1. With the same neural network architecture, this article achieves better metrics than the previous one, mainly thanks to effective feature preprocessing.
2. This article uses K-fold cross-validation, which makes the evaluation results more trustworthy than those of the previous article.
3. The next article will compare this 3-layer network against sklearn's built-in neural network classifier to investigate why the training accuracy stays below 0.9 and the validation accuracy below 0.8 (a small preview sketch follows the list below).
In general, the common reasons for low model accuracy are:
- the model has high bias
- the model has high variance
- inherent defects in the data itself
- the cost function has converged to a local optimum
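As a preview of that comparison (the full experiment is left to the next article), here is a minimal sketch of how sklearn's MLPClassifier could be configured to roughly mirror this network, using one of the StandardScaler-preprocessed folds produced earlier; the solver, max_iter and other hyperparameters here are illustrative choices, not tuned values from the original experiments:

import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

# One StandardScaler-preprocessed fold saved by the data-preparation script
X_train = pd.read_csv('./dataSet_std_scaled/X_train_1.csv').values
Y_train = pd.read_csv('./dataSet_std_scaled/Y_train_1.csv').values.ravel()
X_valid = pd.read_csv('./dataSet_std_scaled/X_valid_1.csv').values
Y_valid = pd.read_csv('./dataSet_std_scaled/Y_valid_1.csv').values.ravel()

# Single hidden layer with 8 tanh units, roughly matching the hand-written network
clf = MLPClassifier(hidden_layer_sizes=(8,), activation='tanh',
                    solver='adam', max_iter=2000, random_state=0)
clf.fit(X_train, Y_train)

valid_prob = clf.predict_proba(X_valid)[:, 1]
print('ValidSet Accuracy: %.3f' % accuracy_score(Y_valid, valid_prob > 0.5))
print('ValidSet AUC: %.3f' % roc_auc_score(Y_valid, valid_prob))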
4. It can also be seen that the StandardScaler-normalized datasets perform somewhat better on average than the MinMaxScaler ones. The likely reason: this problem is a classification task solved by a multilayer perceptron (MLP) with a single hidden layer. Each hidden unit defines a hyperplane that acts as a decision boundary; the weights w determine the orientation of the hyperplane and the bias b determines its distance from the origin. If b starts out very small (in the code above, b is in fact initialized to zero), all of these hyperplanes pass close to the origin. So if the data is not centered around the origin, a hyperplane may fail to pass through the data cloud at all, i.e. all of the data lies on one side of it, and the optimization is then more likely to get stuck in a local minimum. In this situation, standardizing to a range roughly centered on zero (about [-1, 1]) works better than scaling to [0, 1], which is why StandardScaler beats MinMaxScaler here; see the comparison in the appendix.
Appendix: comparison of the normalization methods:
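At a minimum, the two methods used in this article rescale each feature x as follows (mu and sigma are the mean and standard deviation of the feature, and min/max its extremes, all computed on the training fold):

\[
\text{StandardScaler:}\quad x' = \frac{x - \mu}{\sigma}
\qquad\qquad
\text{MinMaxScaler:}\quad x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
\]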
References
- Andrew Ng, Machine Learning.
- Performance evaluation metrics for classifiers: https://blog.****.net/winycg/article/details/80378847
- Data preprocessing: feature standardization: https://blog.****.net/lipengcn/article/details/50263927