A Simple 3-Layer Neural Network Model, Part 2
Background Knowledge
- Python basics
- Pandas basics
- Matplotlib basics
- sklearn basics
- Machine learning: neural network fundamentals
Abstract
The previous article, "A Simple 3-Layer Neural Network Model Part 1" (https://mp.****.net/mdeditor/90204383#), trained a 3-layer neural network on Lianjia housing data; the evaluation showed that accuracy, AUC and the other metrics were poor.
Compared with that article, this one adds feature preprocessing (StandardScaler and MinMaxScaler) and oversampling to balance the classes, and uses K-fold cross-validation.
With the same network architecture, the model in this article performs better than the one in the previous article.
Main Text
Generating the Training and Cross-Validation Sets
Processing flow: read the data from the CSV file, shuffle it, check whether the classes are balanced and, if not, oversample to balance them. Then use KFold to generate 5 training folds and 5 validation folds; each training fold is normalized with fit_transform and the corresponding validation fold with transform, so the scaler statistics come from the training fold only. Two normalization methods are used: StandardScaler and MinMaxScaler.
import numpy as np
import pandas as pd
from collections import Counter
from sklearn import utils
from sklearn.utils.class_weight import compute_class_weight
from imblearn.over_sampling import RandomOverSampler
from sklearn.model_selection import KFold
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler

pd.set_option('display.max_columns', 1000)


def loadDataSet():
    df = pd.read_csv(r'./lianjia_processed.csv', sep=',')
    # Shuffle the samples
    df = utils.shuffle(df)
    # The is_two_five column becomes the label Y
    Y = df["is_two_five"].values
    # Binarize the label Y
    Y[Y == 5] = 1
    Y[Y != 1] = 0
    # The remaining columns become the features X
    df.drop("is_two_five", axis=1, inplace=True)
    X = df.values
    # Check whether the classes are balanced
    class_weight = 'balanced'
    sample_type = np.array([0, 1])
    weight = compute_class_weight(class_weight, classes=sample_type, y=Y)
    print(sorted(Counter(Y).items()))
    # If not balanced, oversample the minority class
    ros = RandomOverSampler(random_state=0)
    X_resampled, Y_resampled = ros.fit_resample(X, Y)
    print(sorted(Counter(Y_resampled).items()))
    return X_resampled, Y_resampled


X, Y = loadDataSet()
kFolder = KFold(n_splits=5, shuffle=False)
norm_type = 1  # 0: MinMaxScaler  1: StandardScaler
csv_index = 1
for train_index, test_index in kFolder.split(X):
    # print('Train: %s | test: %s' % (train_index, test_index), '\n')
    X_train, X_test = X[train_index], X[test_index]
    # Fit the scaler on the training fold only, then apply it to the validation fold
    if norm_type == 0:
        scaler = MinMaxScaler()
    elif norm_type == 1:
        scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    Y_train, Y_test = Y[train_index], Y[test_index]
    # Convert to DataFrames
    X_train_pd = pd.DataFrame(X_train)
    Y_train_pd = pd.DataFrame(Y_train)
    X_test_pd = pd.DataFrame(X_test)
    Y_test_pd = pd.DataFrame(Y_test)
    # Save the training and validation folds
    if norm_type == 0:
        X_train_pd.to_csv("./dataSet_minmax_scaled/X_train_%d.csv" % csv_index, index=False)
        Y_train_pd.to_csv("./dataSet_minmax_scaled/Y_train_%d.csv" % csv_index, index=False)
        X_test_pd.to_csv("./dataSet_minmax_scaled/X_valid_%d.csv" % csv_index, index=False)
        Y_test_pd.to_csv("./dataSet_minmax_scaled/Y_valid_%d.csv" % csv_index, index=False)
    elif norm_type == 1:
        X_train_pd.to_csv("./dataSet_std_scaled/X_train_%d.csv" % csv_index, index=False)
        Y_train_pd.to_csv("./dataSet_std_scaled/Y_train_%d.csv" % csv_index, index=False)
        X_test_pd.to_csv("./dataSet_std_scaled/X_valid_%d.csv" % csv_index, index=False)
        Y_test_pd.to_csv("./dataSet_std_scaled/Y_valid_%d.csv" % csv_index, index=False)
    csv_index += 1
print("data set create succeed!")
Training the Model
Train on the five StandardScaler-preprocessed training folds to obtain five sets of model parameters, and on the five MinMaxScaler-preprocessed training folds to obtain another five sets.
#!/usr/bin/env Python
# coding=utf-8
import time
import pickle
import pandas as pd
from Layer3NN import nn_model

# Model hyperparameters
n_h = 8
iter_num = 550000
learning_rate = 2.5
lr_decreaseRate = 0.9
norm_type = 1  # 0: MinMaxScaler  1: StandardScaler

for i in range(5):
    print('*' * 30)
    print("training round %d" % (i + 1))
    print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))
    if norm_type == 0:
        X_train = pd.read_csv(r'./dataSet_minmax_scaled/X_train_%d.csv' % (i + 1), sep=',')
        Y_train = pd.read_csv(r'./dataSet_minmax_scaled/Y_train_%d.csv' % (i + 1), sep=',')
    elif norm_type == 1:
        X_train = pd.read_csv(r'./dataSet_std_scaled/X_train_%d.csv' % (i + 1), sep=',')
        Y_train = pd.read_csv(r'./dataSet_std_scaled/Y_train_%d.csv' % (i + 1), sep=',')
    X_tr = X_train.values
    Y_tr = Y_train.values
    # Train the model
    parameters = nn_model(X_tr.T,
                          Y_tr.T,
                          n_h,
                          num_iterations=iter_num,
                          print_cost=True,
                          learning_rate=learning_rate,
                          lr_decreaseRate=lr_decreaseRate)
    print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))
    # Save the learned parameters
    if norm_type == 0:
        with open('./parameters_minmax/parameter_%d.pickle' % (i + 1), 'wb') as f:
            pickle.dump(parameters, f, pickle.HIGHEST_PROTOCOL)
    elif norm_type == 1:
        with open('./parameters_std/parameter_%d.pickle' % (i + 1), 'wb') as f:
            pickle.dump(parameters, f, pickle.HIGHEST_PROTOCOL)
Code of the 3-Layer Neural Network Model (with automatic learning_rate decay added since the previous article)
The model follows Andrew Ng's neural network formulation: the hidden-layer activation is tanh and the output-layer activation is sigmoid; the input layer has 12 neurons and the output layer has 1. The hyperparameters are the number of hidden neurons n_h and the learning rate learning_rate. The parameters are learned by gradient descent, with backpropagation used to compute the gradients of the cross-entropy cost. The code is as follows:
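For reference, the forward pass, cross-entropy cost and gradients implemented below can be written as follows (read directly off the code; sigma is the sigmoid, \odot denotes element-wise multiplication, m is the number of examples, and the 1 - A_1^2 factor comes from tanh'(z) = 1 - tanh^2(z)):

\[
Z_1 = W_1 X + b_1,\qquad A_1 = \tanh(Z_1),\qquad Z_2 = W_2 A_1 + b_2,\qquad A_2 = \sigma(Z_2)
\]
\[
J = -\frac{1}{m}\sum_{i=1}^{m}\Bigl[y^{(i)}\log a_2^{(i)} + \bigl(1-y^{(i)}\bigr)\log\bigl(1-a_2^{(i)}\bigr)\Bigr]
\]
\[
dZ_2 = A_2 - Y,\quad dW_2 = \tfrac{1}{m}\,dZ_2 A_1^{T},\quad db_2 = \tfrac{1}{m}\textstyle\sum dZ_2
\]
\[
dZ_1 = \bigl(W_2^{T} dZ_2\bigr)\odot\bigl(1 - A_1^{2}\bigr),\quad dW_1 = \tfrac{1}{m}\,dZ_1 X^{T},\quad db_1 = \tfrac{1}{m}\textstyle\sum dZ_1
\]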
#!/usr/bin/env Python
# coding=utf-8
import numpy as np


def sigmoid(x):
    """
    Compute the sigmoid of x

    Arguments:
    x -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(x)
    """
    s = 1 / (1 + np.exp(-x))
    return s


def layer_sizes(X, Y):
    """
    Arguments:
    X -- input dataset of shape (input size, number of examples)
    Y -- labels of shape (output size, number of examples)

    Returns:
    n_x -- the size of the input layer
    n_y -- the size of the output layer
    """
    n_x = X.shape[0]  # size of input layer
    n_y = Y.shape[0]  # size of output layer
    return (n_x, n_y)
def initialize_parameters(n_x, n_h, n_y):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer

    Returns:
    params -- python dictionary containing your parameters:
        W1 -- weight matrix of shape (n_h, n_x)
        b1 -- bias vector of shape (n_h, 1)
        W2 -- weight matrix of shape (n_y, n_h)
        b2 -- bias vector of shape (n_y, 1)
    """
    np.random.seed(1)  # fixed seed so that runs are reproducible even though the initialization is random
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    assert (W1.shape == (n_h, n_x))
    assert (b1.shape == (n_h, 1))
    assert (W2.shape == (n_y, n_h))
    assert (b2.shape == (n_y, 1))
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    return parameters
def forward_propagation(X, parameters):
    """
    Argument:
    X -- input data of size (n_x, m)
    parameters -- python dictionary containing your parameters (output of initialization function)

    Returns:
    A2 -- The sigmoid output of the second activation
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    """
    # Retrieve each parameter from the dictionary "parameters"
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    # Implement Forward Propagation to calculate A2 (probabilities)
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    assert (A2.shape == (1, X.shape[1]))  # one output neuron, so A2 has shape (1, m)
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}
    return A2, cache
def compute_cost(A2, Y, parameters):
    """
    Computes the cross-entropy cost

    Arguments:
    A2 -- The sigmoid output of the second activation, of shape (1, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)
    parameters -- python dictionary containing your parameters W1, b1, W2 and b2

    Returns:
    cost -- cross-entropy cost
    """
    m = Y.shape[1]  # number of examples
    # Compute the cross-entropy cost
    logprobs = np.multiply(np.log(A2), Y) + np.multiply(np.log(1 - A2), 1 - Y)
    cost = - np.sum(logprobs) / m
    cost = np.squeeze(cost)  # makes sure cost is the dimension we expect, e.g. turns [[17]] into 17
    assert (isinstance(cost, float))
    return cost
def backward_propagation(parameters, cache, X, Y):
    """
    Implement backward propagation.

    Arguments:
    parameters -- python dictionary containing our parameters
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    X -- input data of shape (n_x, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)

    Returns:
    grads -- python dictionary containing your gradients with respect to different parameters
    """
    m = X.shape[1]
    # First, retrieve W1 and W2 from the dictionary "parameters".
    W1 = parameters["W1"]
    W2 = parameters["W2"]
    # Retrieve also A1 and A2 from dictionary "cache".
    A1 = cache["A1"]
    A2 = cache["A2"]
    # Backward propagation: calculate dW1, db1, dW2, db2.
    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = np.multiply(np.dot(W2.T, dZ2), 1 - np.power(A1, 2))  # tanh'(z) = 1 - tanh(z)^2
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}
    return grads
def update_parameters(parameters, grads, learning_rate=0.005):
    """
    Updates parameters using the gradient descent update rule

    Arguments:
    parameters -- python dictionary containing your parameters
    grads -- python dictionary containing your gradients
    learning_rate -- step size of the gradient descent update

    Returns:
    parameters -- python dictionary containing your updated parameters
    """
    # Retrieve each parameter from the dictionary "parameters"
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    # Retrieve each gradient from the dictionary "grads"
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]
    # Update rule for each parameter
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    return parameters
def nn_model(X, Y, n_h, num_iterations=10000, print_cost=False, learning_rate=0.0005, lr_decreaseRate=0.9):
    """
    Arguments:
    X -- dataset of shape (n_x, number of examples)
    Y -- labels of shape (1, number of examples)
    n_h -- size of the hidden layer
    num_iterations -- Number of iterations in gradient descent loop
    print_cost -- if True, print the cost every 500 iterations
    learning_rate -- initial learning rate
    lr_decreaseRate -- factor applied to the learning rate after every 10% of the iterations

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    np.random.seed(3)
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[1]
    # Initialize parameters. Inputs: "n_x, n_h, n_y". Outputs: "parameters".
    parameters = initialize_parameters(n_x, n_h, n_y)
    # Loop (gradient descent)
    lr = learning_rate
    for i in range(0, num_iterations):
        # Forward propagation. Inputs: "X, parameters". Outputs: "A2, cache".
        A2, cache = forward_propagation(X, parameters)
        # Cost function. Inputs: "A2, Y, parameters". Outputs: "cost".
        cost = compute_cost(A2, Y, parameters)
        # Back propagation. Inputs: "parameters, cache, X, Y". Outputs: "grads".
        grads = backward_propagation(parameters, cache, X, Y)
        # Learning rate decay: shrink lr by lr_decreaseRate after every 10% of the iterations
        if i > 0 and i % max(1, num_iterations // 10) == 0:
            lr = lr_decreaseRate * lr
        # Gradient descent parameter update. Inputs: "parameters, grads". Outputs: "parameters".
        parameters = update_parameters(parameters, grads, learning_rate=lr)
        # Print the cost every 500 iterations
        if print_cost and i % 500 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
        if np.isnan(cost):
            break
    return parameters
def predict(parameters, X):
    """
    Using the learned parameters, computes the output probability for each example in X

    Arguments:
    parameters -- python dictionary containing your parameters
    X -- input data of size (n_x, m)

    Returns:
    A2 -- probabilities output by the network, of shape (1, m); the caller thresholds them at 0.5 to obtain class labels
    """
    # Compute probabilities using forward propagation
    A2, cache = forward_propagation(X, parameters)
    # predictions = (A2 > 0.5)
    return A2
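As a quick sanity check of Layer3NN, separate from the Lianjia data, the model can be run on a small synthetic problem. The shapes mirror the real setup (12 features per column, labels as a 1 x m row vector), but the data and hyperparameter values below are made up for illustration:

import numpy as np
from Layer3NN import nn_model, predict

np.random.seed(0)
m = 200
X_toy = np.random.randn(12, m)                                      # 12 features, m examples (columns)
Y_toy = (X_toy[0, :] + X_toy[1, :] > 0).astype(int).reshape(1, m)   # a simple separable labeling rule

params = nn_model(X_toy, Y_toy, n_h=8,
                  num_iterations=5000, print_cost=False,
                  learning_rate=0.5, lr_decreaseRate=0.9)
probs = predict(params, X_toy)
acc = np.mean((probs > 0.5).astype(int) == Y_toy)
print("toy training accuracy: %.3f" % acc)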
Model Evaluation Code
The model is evaluated with the following code:
#!/usr/bin/env Python
# coding=utf-8
import pickle
import pandas as pd
import matplotlib.pyplot as plt
from Layer3NN import predict
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, fbeta_score
from sklearn.metrics import roc_curve, auc

norm_type = 1  # 0: MinMaxScaler  1: StandardScaler

for i in range(5):
    print('*' * 30)
    print("predict round %d" % (i + 1))
    # Load the trained parameters
    if norm_type == 0:
        with open('./parameters_minmax/parameter_%d.pickle' % (i + 1), 'rb') as f:
            parameters = pickle.load(f)
    elif norm_type == 1:
        with open('./parameters_std/parameter_%d.pickle' % (i + 1), 'rb') as f:
            parameters = pickle.load(f)
    # Load the training fold
    if norm_type == 0:
        X_train = pd.read_csv(r'./dataSet_minmax_scaled/X_train_%d.csv' % (i + 1), sep=',')
        Y_train = pd.read_csv(r'./dataSet_minmax_scaled/Y_train_%d.csv' % (i + 1), sep=',')
    elif norm_type == 1:
        X_train = pd.read_csv(r'./dataSet_std_scaled/X_train_%d.csv' % (i + 1), sep=',')
        Y_train = pd.read_csv(r'./dataSet_std_scaled/Y_train_%d.csv' % (i + 1), sep=',')
    X_tr = X_train.values
    Y_tr = Y_train.values
    # Predict on the training fold
    predictTrainSet = predict(parameters, X_tr.T)
    predictTrainSet = predictTrainSet.reshape(-1)
    trainY = Y_tr.T.reshape(-1)
    # Evaluate on the training fold
    fpr_tr, tpr_tr, threshold_tr = roc_curve(trainY, predictTrainSet)  # false positive rate and true positive rate
    roc_auc_tr = auc(fpr_tr, tpr_tr)  # area under the ROC curve
    lw = 2
    plt.figure(figsize=(10, 10))
    plt.plot(fpr_tr, tpr_tr, color='darkorange', lw=lw,
             label='ROC curve (area = %0.2f)' % roc_auc_tr)  # ROC curve: FPR on the x-axis, TPR on the y-axis
    plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver operating characteristic training example')
    plt.legend(loc="lower right")
    plt.show()
    print('TrainSet Accuracy: %.3f' % accuracy_score(y_true=trainY, y_pred=predictTrainSet > 0.5))
    print('TrainSet Precision: %.3f' % precision_score(y_true=trainY, y_pred=predictTrainSet > 0.5))
    print('TrainSet Recall: %.3f' % recall_score(y_true=trainY, y_pred=predictTrainSet > 0.5))
    print('TrainSet F1: %.3f' % f1_score(y_true=trainY, y_pred=predictTrainSet > 0.5))
    print('TrainSet F_beta: %.3f' % fbeta_score(y_true=trainY, y_pred=predictTrainSet > 0.5, beta=0.8))
    # Load the cross-validation fold
    if norm_type == 0:
        X_test = pd.read_csv(r'./dataSet_minmax_scaled/X_valid_%d.csv' % (i + 1), sep=',')
        Y_test = pd.read_csv(r'./dataSet_minmax_scaled/Y_valid_%d.csv' % (i + 1), sep=',')
    elif norm_type == 1:
        X_test = pd.read_csv(r'./dataSet_std_scaled/X_valid_%d.csv' % (i + 1), sep=',')
        Y_test = pd.read_csv(r'./dataSet_std_scaled/Y_valid_%d.csv' % (i + 1), sep=',')
    X_t = X_test.values
    Y_t = Y_test.values
    # Predict on the cross-validation fold
    predictValidSet = predict(parameters, X_t.T)
    predictValidSet = predictValidSet.reshape(-1)
    validY = Y_t.T.reshape(-1)
    # Evaluate on the cross-validation fold
    fpr_va, tpr_va, threshold_va = roc_curve(validY, predictValidSet)  # false positive rate and true positive rate
    roc_auc_va = auc(fpr_va, tpr_va)  # area under the ROC curve
    lw = 2
    plt.figure(figsize=(10, 10))
    plt.plot(fpr_va, tpr_va, color='darkorange', lw=lw,
             label='ROC curve (area = %0.2f)' % roc_auc_va)  # ROC curve: FPR on the x-axis, TPR on the y-axis
    plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver operating characteristic validation example')
    plt.legend(loc="lower right")
    plt.show()
    print('ValidSet Accuracy: %.3f' % accuracy_score(y_true=validY, y_pred=predictValidSet > 0.5))
    print('ValidSet Precision: %.3f' % precision_score(y_true=validY, y_pred=predictValidSet > 0.5))
    print('ValidSet Recall: %.3f' % recall_score(y_true=validY, y_pred=predictValidSet > 0.5))
    print('ValidSet F1: %.3f' % f1_score(y_true=validY, y_pred=predictValidSet > 0.5))
    print('ValidSet F_beta: %.3f' % fbeta_score(y_true=validY, y_pred=predictValidSet > 0.5, beta=0.8))
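Since K-fold cross-validation produces five numbers per metric, it is often useful to report their mean and standard deviation rather than reading the folds one by one. A minimal sketch (the collection list and its values below are placeholders, not part of the original script):

import numpy as np

# Hypothetical per-fold validation accuracies, collected inside the loop above via
#   valid_acc.append(accuracy_score(y_true=validY, y_pred=predictValidSet > 0.5))
valid_acc = [0.70, 0.72, 0.68, 0.74, 0.71]   # placeholder numbers for illustration only

print("validation accuracy: %.3f +/- %.3f" % (np.mean(valid_acc), np.std(valid_acc)))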
Model Evaluation Results: MinMaxScaler Dataset
Evaluating on the MinMaxScaler-preprocessed dataset gives the following results:
predict round 1
TrainSet Accuracy: 0.823
TrainSet Precision: 0.842
TrainSet Recall: 0.739
TrainSet F1: 0.787
TrainSet F_beta: 0.799
ValidSet Accuracy: 0.606
ValidSet Precision: 0.871
ValidSet Recall: 0.540
ValidSet F1: 0.667
ValidSet F_beta: 0.703
predict round 2
TrainSet Accuracy: 0.848
TrainSet Precision: 0.905
TrainSet Recall: 0.738
TrainSet F1: 0.813
TrainSet F_beta: 0.832
ValidSet Accuracy: 0.700
ValidSet Precision: 0.912
ValidSet Recall: 0.639
ValidSet F1: 0.752
ValidSet F_beta: 0.782
predict round 3
TrainSet Accuracy: 0.861
TrainSet Precision: 0.894
TrainSet Recall: 0.764
TrainSet F1: 0.824
TrainSet F_beta: 0.839
ValidSet Accuracy: 0.681
ValidSet Precision: 0.939
ValidSet Recall: 0.641
ValidSet F1: 0.762
ValidSet F_beta: 0.795
predict round 4
TrainSet Accuracy: 0.816
TrainSet Precision: 0.842
TrainSet Recall: 0.827
TrainSet F1: 0.834
TrainSet F_beta: 0.836
ValidSet Accuracy: 0.722
ValidSet Precision: 0.482
ValidSet Recall: 0.750
ValidSet F1: 0.587
ValidSet F_beta: 0.560
(Figure: ROC curve with the best AUC among the four training folds)
(Figure: the AUC result corresponding to that fold)
Model Evaluation Results: StandardScaler Dataset
Evaluating on the StandardScaler-preprocessed dataset gives the following results:
predict round 1
TrainSet Accuracy: 0.858
TrainSet Precision: 0.831
TrainSet Recall: 0.850
TrainSet F1: 0.840
TrainSet F_beta: 0.838
ValidSet Accuracy: 0.737
ValidSet Precision: 0.934
ValidSet Recall: 0.696
ValidSet F1: 0.798
ValidSet F_beta: 0.824
predict round 2
TrainSet Accuracy: 0.879
TrainSet Precision: 0.905
TrainSet Recall: 0.810
TrainSet F1: 0.855
TrainSet F_beta: 0.865
ValidSet Accuracy: 0.670
ValidSet Precision: 0.919
ValidSet Recall: 0.611
ValidSet F1: 0.734
ValidSet F_beta: 0.768
predict round 3
TrainSet Accuracy: 0.871
TrainSet Precision: 0.859
TrainSet Recall: 0.842
TrainSet F1: 0.850
TrainSet F_beta: 0.852
ValidSet Accuracy: 0.637
ValidSet Precision: 0.875
ValidSet Recall: 0.609
ValidSet F1: 0.718
ValidSet F_beta: 0.747
predict round 4
TrainSet Accuracy: 0.838
TrainSet Precision: 0.854
TrainSet Recall: 0.858
TrainSet F1: 0.856
TrainSet F_beta: 0.856
ValidSet Accuracy: 0.777
ValidSet Precision: 0.538
ValidSet Recall: 0.812
ValidSet F1: 0.647
ValidSet F_beta: 0.620
(Figure: ROC curve with the best AUC among the four training folds)
(Figure: the AUC result corresponding to that fold)
Conclusions
1. With the same neural network architecture, this article achieves better metrics than the previous one, mainly thanks to effective feature preprocessing.
2. This article uses K-fold cross-validation, which makes the evaluation results more trustworthy than those of the previous article.
3. The next article will compare this 3-layer network against sklearn's built-in neural network classifier to investigate why the training accuracy stays below 0.9 and the validation accuracy below 0.8 (a small preview sketch follows the list below).
In general, the common reasons for low model accuracy are:
- the model has high bias
- the model has high variance
- inherent defects in the data itself
- the cost function has converged to a local optimum
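As a preview of that comparison (the full experiment is left to the next article), here is a minimal sketch of how sklearn's MLPClassifier could be configured to roughly mirror this network, using one of the StandardScaler-preprocessed folds produced earlier; the solver, max_iter and other hyperparameters here are illustrative choices, not tuned values from the original experiments:

import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

# One StandardScaler-preprocessed fold saved by the data-preparation script
X_train = pd.read_csv('./dataSet_std_scaled/X_train_1.csv').values
Y_train = pd.read_csv('./dataSet_std_scaled/Y_train_1.csv').values.ravel()
X_valid = pd.read_csv('./dataSet_std_scaled/X_valid_1.csv').values
Y_valid = pd.read_csv('./dataSet_std_scaled/Y_valid_1.csv').values.ravel()

# Single hidden layer with 8 tanh units, roughly matching the hand-written network
clf = MLPClassifier(hidden_layer_sizes=(8,), activation='tanh',
                    solver='adam', max_iter=2000, random_state=0)
clf.fit(X_train, Y_train)

valid_prob = clf.predict_proba(X_valid)[:, 1]
print('ValidSet Accuracy: %.3f' % accuracy_score(Y_valid, valid_prob > 0.5))
print('ValidSet AUC: %.3f' % roc_auc_score(Y_valid, valid_prob))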
4. It can also be seen that the StandardScaler-normalized datasets perform somewhat better on average than the MinMaxScaler ones. The likely reason: this problem is a classification task solved by a multilayer perceptron (MLP) with a single hidden layer. Each hidden unit defines a hyperplane that acts as a decision boundary; the weights w determine the orientation of the hyperplane and the bias b determines its distance from the origin. If b starts out very small (in the code above, b is in fact initialized to zero), all of these hyperplanes pass close to the origin. So if the data is not centered around the origin, a hyperplane may fail to pass through the data cloud at all, i.e. all of the data lies on one side of it, and the optimization is then more likely to get stuck in a local minimum. In this situation, standardizing to a range roughly centered on zero (about [-1, 1]) works better than scaling to [0, 1], which is why StandardScaler beats MinMaxScaler here; see the comparison in the appendix.
Appendix: comparison of the normalization methods:
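At a minimum, the two methods used in this article rescale each feature x as follows (mu and sigma are the mean and standard deviation of the feature, and min/max its extremes, all computed on the training fold):

\[
\text{StandardScaler:}\quad x' = \frac{x - \mu}{\sigma}
\qquad\qquad
\text{MinMaxScaler:}\quad x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
\]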
References
- Andrew Ng, Machine Learning.
- Performance evaluation metrics for classifiers: https://blog.****.net/winycg/article/details/80378847
- Data preprocessing: feature standardization: https://blog.****.net/lipengcn/article/details/50263927