Logistic Regression Implementation
Following the walkthrough in *Machine Learning in Action* (《机器学习实战》) and the theoretical derivation from the previous chapter, let's dig into the implementation.
Test Data
Machine Learning in Action provides a test dataset, testSet.txt; the first few rows look like this:
-0.017612 14.053064 0
-1.395634 4.662541 1
-0.752157 6.538620 0
-1.322371 7.152853 0
0.423363 11.054677 0
0.406704 7.067335 1
0.667394 12.741452 0
-2.460150 6.866805 1
0.569411 9.548755 0
-0.026632 10.427743 0
The three columns are, respectively, the feature x1, the feature x2, and the class label (0 or 1).
Loading the data:
def load_data_set():
    """
    load data
    :return: the matrix of features (each row prefixed with a constant 1.0)
             the list of class labels
    """
    data_mat = []
    label_mat = []
    fr = open('testSet.txt')
    for line in fr.readlines():
        line_arr = line.strip().split()
        # prepend x0 = 1.0 so the bias term becomes part of the weight vector
        data_mat.append([1.0, float(line_arr[0]), float(line_arr[1])])
        label_mat.append(int(line_arr[2]))
    return data_mat, label_mat
Each line of the file is stored as x1 x2 label; load_data_set prepends a constant x0 = 1.0 to every sample so the intercept (bias) can be folded into the weight vector.
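With that convention, the model from the previous chapter can be written compactly (a quick recap, using the same notation as the derivation):

$$
h_w(x) = \sigma(w^{T}x) = \frac{1}{1 + e^{-(w_0 \cdot 1 + w_1 x_1 + w_2 x_2)}}
$$

so a single weight vector $w = (w_0, w_1, w_2)$ covers both the bias and the two features.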
The following code visualizes the dataset and, when a weight vector is supplied, the fitted decision boundary:
def plot_best_fit(weights):
    """
    plot the data_set and the logistic regression line
    :param weights: fitted weight vector, or None to plot only the data points
    :return:
    """
    # NOTE: array, shape and arange come from `from numpy import *` (declared with the training code below)
    import matplotlib.pyplot as plt
    data_mat, label_mat = load_data_set()
    data_arr = array(data_mat)
    n = shape(data_arr)[0]
    xcord1 = []
    ycord1 = []
    xcord2 = []
    ycord2 = []
    for i in range(n):
        if int(label_mat[i]) == 1:
            xcord1.append(data_arr[i, 1])
            ycord1.append(data_arr[i, 2])
        else:
            xcord2.append(data_arr[i, 1])
            ycord2.append(data_arr[i, 2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    if weights is not None:
        x = arange(-3.0, 3.0, 0.1)
        y = (-weights[0] - weights[1] * x) / weights[2]
        ax.plot(x, y.transpose())
    plt.xlabel('X1')
    plt.ylabel('X2')
    plt.show()
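The plotted line is the decision boundary: the set of points where the predicted probability is exactly 0.5, i.e. $\sigma(w^{T}x) = 0.5$, which happens when $w_0 + w_1 x_1 + w_2 x_2 = 0$. Solving for $x_2$ gives the expression used in the code above:

$$
x_2 = \frac{-w_0 - w_1 x_1}{w_2}
$$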
Passing None simply plots the raw data points:
plot_best_fit(None)
The result looks like this:
Gradient Ascent
The iterative update rule for gradient ascent is

$$
w := w + \alpha \nabla_w \ell(w)
$$

From the derivation in the previous chapter, the gradient of the log-likelihood for logistic regression is

$$
\nabla_w \ell(w) = X^{T}\big(y - \sigma(Xw)\big)
$$

where $X$ is the data matrix, $y$ the vector of labels, $\sigma$ the sigmoid function, and $\alpha$ the step size.
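As a quick recap of where that gradient comes from (the same per-sample result derived in the previous chapter, restated here for convenience):

$$
\ell(w) = \sum_{i} \Big[ y_i \log \sigma(w^{T}x_i) + (1 - y_i)\log\big(1 - \sigma(w^{T}x_i)\big) \Big],
\qquad
\frac{\partial \ell}{\partial w} = \sum_{i} \big(y_i - \sigma(w^{T}x_i)\big)\, x_i
$$

which, stacked over all samples, is exactly $X^{T}(y - \sigma(Xw))$.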
Pseudocode for gradient ascent:
- 1 Initialize every regression coefficient to 1
- 2 Repeat R times:
  - Compute the gradient over the entire dataset
  - Update the weight vector using alpha * gradient
- Return the regression coefficients
from numpy import *


def sigmoid(in_x):
    return 1.0 / (1 + exp(-in_x))
def grad_ascent(data_mat_in, class_labels):
    data_matrix = mat(data_mat_in)              # convert to NumPy matrix, shape (m, n)
    label_mat = mat(class_labels).transpose()   # column vector of labels, shape (m, 1)
    m, n = shape(data_matrix)
    alpha = 0.001                               # step size
    max_cycles = 500                            # number of iterations
    weights = ones((n, 1))
    for k in range(max_cycles):                 # heavy on matrix operations
        h = sigmoid(data_matrix * weights)      # predictions, matrix mult
        error = (label_mat - h)                 # vector subtraction
        weights = weights + alpha * data_matrix.transpose() * error  # gradient ascent step
    return weights
data_arr, label_mat = load_data_set()
weights = grad_ascent(data_arr, label_mat)
print(weights)
plot_best_fit(weights)
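Once the weights are learned, classifying a new point follows directly from the model: predict class 1 if the sigmoid output exceeds 0.5. The helper below is a minimal sketch of that rule (classify_vector is my own name here, not a function defined in the code above):

def classify_vector(in_x, weights):
    # in_x: feature vector [1.0, x1, x2]; weights: learned weight vector or (n, 1) matrix
    prob = sigmoid(sum(array(in_x) * array(weights).flatten()))
    return 1 if prob > 0.5 else 0

# e.g. classify the first training sample with the fitted weights
# print(classify_vector(data_arr[0], weights))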
**Stochastic Gradient Ascent**
Gradient ascent has to sweep the entire dataset every time it updates the weight vector. That is acceptable for small datasets, but with billions of samples and tens of thousands of features the computational cost becomes prohibitive. An improvement is to update the weights using the error of a single sample at a time; this is stochastic gradient ascent. Because the classifier can be updated incrementally as new samples arrive, stochastic gradient ascent is an online learning algorithm; by contrast, methods that process the whole dataset at once (such as batch gradient ascent) are called "batch" methods.
def stoc_grad_ascent0(data_matrix, class_labels):
    m, n = shape(data_matrix)
    data_matrix = array(data_matrix)
    alpha = 0.01
    weights = ones(n)                           # initialize to all ones
    for i in range(m):                          # one update per sample, single pass
        h = sigmoid(sum(data_matrix[i] * weights))
        error = class_labels[i] - h
        weights = weights + alpha * error * data_matrix[i]
    return weights
if __name__ == "__main__":
    data_arr, label_mat = load_data_set()
    weights = stoc_grad_ascent0(data_arr, label_mat)
    print(weights)
    plot_best_fit(weights)
As you can see, the fitted line is noticeably worse than the one produced by batch gradient ascent; roughly one third of the samples end up misclassified.
The comparison is not entirely fair, though: each iteration of stochastic gradient ascent is far cheaper, and here it performed only as many updates as there are samples (200), i.e. a single pass over the data.
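To make the comparison concrete, one can count how many training samples each weight vector misclassifies. A minimal sketch (the error_rate helper is my own, built only on the functions defined above):

def error_rate(data_arr, label_mat, weights):
    # fraction of training samples whose predicted class differs from the label
    w = array(weights).flatten()                # works for both matrix and 1-d weights
    errors = 0
    for x, y in zip(data_arr, label_mat):
        pred = 1 if sigmoid(sum(array(x) * w)) > 0.5 else 0
        errors += int(pred != y)
    return errors / len(label_mat)

# e.g.
# print(error_rate(data_arr, label_mat, grad_ascent(data_arr, label_mat)))
# print(error_rate(data_arr, label_mat, stoc_grad_ascent0(data_arr, label_mat)))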
Improved Stochastic Gradient Ascent
Given that stochastic gradient ascent produced poor parameters, is the problem simply that they have not converged enough, while the algorithm itself is sound? To address this, we can gradually shrink the step size over time and pick samples in random order, which suppresses the periodic oscillation of the parameters.
def stoc_grad_ascent1(data_matrix, class_labels, num_iter=150):
    m, n = shape(data_matrix)
    data_matrix = array(data_matrix)
    weights = ones(n)                            # initialize to all ones
    for j in range(num_iter):
        data_index = list(range(m))
        for i in range(m):
            # alpha decreases with iteration but never reaches 0 because of the constant
            alpha = 4 / (1.0 + j + i) + 0.0001
            # pick one of the remaining samples at random so each sample is used once per pass
            rand_index = int(random.uniform(0, len(data_index)))
            sample = data_index[rand_index]
            h = sigmoid(sum(data_matrix[sample] * weights))
            error = class_labels[sample] - h
            weights = weights + alpha * error * data_matrix[sample]
            del data_index[rand_index]
    return weights
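A usage sketch mirroring the earlier drivers (the 150 passes come from the num_iter default above):

if __name__ == "__main__":
    data_arr, label_mat = load_data_set()
    weights = stoc_grad_ascent1(data_arr, label_mat)   # 150 random passes by default
    print(weights)
    plot_best_fit(weights)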