Andrew Ng Machine Learning Exercise 2: Regularized Logistic Regression
This section works through regularized logistic regression for classification.
1. Visualizing the raw data
data = load('ex2data2.txt'); % load the data file into data
X = data(:, [1, 2]); y = data(:, 3); % columns 1 and 2 are the inputs X; column 3 is the label y (y = 0 or y = 1)
plotData(X, y); % call plotData to draw the scatter plot
hold on;
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')
legend('y = 1', 'y = 0')
hold off;
The plotData function is completed as follows:
function plotData(X, y)
figure; hold on;
pos = find(y==1); % indices of the examples with y == 1 (pos is a vector)
neg = find(y==0); % indices of the examples with y == 0 (neg is a vector)
% plot the y == 1 examples as red +
plot(X(pos,1),X(pos,2),'r+','LineWidth',2,'MarkerSize',7);
% plot the y == 0 examples as blue o
plot(X(neg,1),X(neg,2),'bo','LineWidth',2,'MarkerSize',7);
hold off;
end
(Figure: scatter plot of the two microchip test scores, with y = 1 examples as red + and y = 0 examples as blue o; the two classes are clearly not linearly separable.)
2. Expanding the feature vector
The raw data has only two features, x1 and x2. To fit the data better, the feature vector is expanded to 28 dimensions by mapping x1 and x2 onto all polynomial terms up to degree 6:
X = mapFeature(X(:,1), X(:,2));
function out = mapFeature(X1, X2)
degree = 6;
out = ones(size(X1(:,1))); % start with a column of ones (the bias term)
for i = 1:degree
for j = 0:i
out(:, end+1) = (X1.^(i-j)).*(X2.^j); % append the column X1^(i-j) .* X2^j
end
end
end
The resulting out matrix has one row per example and 28 columns, one for each monomial $x_1^{\,i-j} x_2^{\,j}$ with $0 \le j \le i \le 6$ plus the bias column ($1 + \sum_{i=1}^{6}(i+1) = 28$):
$$\text{mapFeature}(x_1, x_2) = \begin{bmatrix} 1 & x_1 & x_2 & x_1^2 & x_1 x_2 & x_2^2 & x_1^3 & \cdots & x_1 x_2^5 & x_2^6 \end{bmatrix}$$
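As a quick sanity check (a sketch, assuming ex2data2.txt and mapFeature are on the path), the mapped design matrix should come out m-by-28; this dataset has m = 118 examples:
data = load('ex2data2.txt');
X = mapFeature(data(:,1), data(:,2));
size(X) % expected: 118 28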
Initialize the fitting parameters:
% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);
% Set regularization parameter lambda to 1
lambda = 1;
3. Cost function and gradient
The regularized cost function is
$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(h_\theta(x^{(i)})\right) - \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
To prevent overfitting, the coefficients of the higher-order terms are penalized by adding the $\frac{\lambda}{2m}\theta_j^2$ terms to the cost; theta0, however, must not be penalized (in MATLAB, theta0 is theta(1), since indexing starts at 1).
The gradient splits into two cases, j = 0 and j >= 1:
$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$
$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j \qquad (j \ge 1)$$
% Compute and display initial cost and gradient for regularized logistic
% regression
[cost, grad] = costFunctionReg(initial_theta, X, y, lambda);
Complete the costFunctionReg function:
function [J, grad] = costFunctionReg(theta, X, y, lambda)
m = length(y); % number of training examples
h = sigmoid(X*theta); % hypothesis values for all m examples
% Regularized cost: theta(1) (i.e. theta0) is excluded from the penalty
J = -(y'*log(h) + (1-y)'*log(1-h))/m + ...
    lambda/(2*m)*(theta'*theta - theta(1)^2);
% Vectorized gradient, kept as a column vector to match theta
grad = X'*(h - y)/m + lambda/m*theta;
% theta(1) is not regularized, so recompute grad(1) without the penalty
grad(1) = X(:,1)'*(h - y)/m;
end
The gradient is computed in vectorized matrix form; since grad(1) follows a different formula, it is simply recomputed afterwards, overwriting the regularized value.
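costFunctionReg relies on the sigmoid helper written in the first part of the exercise; for completeness, a minimal version of the standard logistic function:
function g = sigmoid(z)
% Element-wise logistic function g(z) = 1/(1 + e^(-z));
% works for scalars, vectors, and matrices
g = 1 ./ (1 + exp(-z));
end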
To verify the code, print the cost and the first five gradient entries at the all-zeros initial theta; they match the expected values. (With theta = 0, the hypothesis is 0.5 for every example and the penalty vanishes, so the cost should be log(2) ≈ 0.693.)
fprintf('Cost at initial theta (zeros): %f\n', cost);
fprintf('Expected cost (approx): 0.693\n');
fprintf('Gradient at initial theta (zeros) - first five values only:\n');
fprintf(' %f \n', grad(1:5));
fprintf('Expected gradients (approx) - first five values only:\n');
fprintf(' 0.0085\n 0.0188\n 0.0001\n 0.0503\n 0.0115\n');
The output is:
Cost at initial theta (zeros): 0.693147
Expected cost (approx): 0.693
Gradient at initial theta (zeros) - first five values only:
0.008475
0.018788
0.000078
0.050345
0.011501
Expected gradients (approx) - first five values only:
0.0085
0.0188
0.0001
0.0503
0.0115
Now set every element of theta to 1 and lambda to 10, and recompute:
% Compute and display cost and gradient
% with all-ones theta and lambda = 10
test_theta = ones(size(X,2),1);
[cost, grad] = costFunctionReg(test_theta, X, y, 10);
fprintf('\nCost at test theta (with lambda = 10): %f\n', cost);
fprintf('Expected cost (approx): 3.16\n');
fprintf('Gradient at test theta - first five values only:\n');
fprintf(' %f \n', grad(1:5));
fprintf('Expected gradients (approx) - first five values only:\n');
fprintf(' 0.3460\n 0.1614\n 0.1948\n 0.2269\n 0.0922\n');
The output again matches the expected values, confirming the code is correct; in particular, grad(1) = 0.3460 shows that theta(1) carries no penalty term (with it, the value would be 0.3460 + 10/118 ≈ 0.431). The output is:
Cost at test theta (with lambda = 10): 3.164509
Expected cost (approx): 3.16
Gradient at test theta - first five values only:
0.346045
0.161352
0.194796
0.226863
0.092186
Expected gradients (approx) - first five values only:
0.3460
0.1614
0.1948
0.2269
0.0922
4. Optimization with fminunc
% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);
% Set regularization parameter lambda to 1 (you should vary this)
lambda = 1;
% Set Options
options = optimset('GradObj', 'on', 'MaxIter', 400);
% Optimize
[theta, J, exit_flag] = ...
fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);
% Plot Boundary
plotDecisionBoundary(theta, X, y);
hold on;
title(sprintf('lambda = %g', lambda))
% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')
legend('y = 1', 'y = 0', 'Decision boundary')
% Compute accuracy on our training set
p = predict(theta, X);
fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
fprintf('Expected accuracy (with lambda = 1): 83.1 (approx)\n');
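Here plotDecisionBoundary ships with the exercise skeleton: for mapped features it evaluates z = mapFeature(u, v) * theta over a grid of (u, v) points and draws the contour where z = 0. The predict function is student-written; a minimal sketch that thresholds the hypothesis at 0.5:
function p = predict(theta, X)
% Predict y = 1 whenever sigmoid(X*theta) >= 0.5 (equivalently X*theta >= 0)
p = sigmoid(X*theta) >= 0.5;
end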
Rerunning the optimization with several values of lambda gives the following training accuracies (each run reproducible with the loop sketched after this list):
(1) lambda = 0: Train Accuracy: 87.288136 (about 87.3%). (Figure: decision boundary for lambda = 0.)
(2) lambda = 1: Train Accuracy: 83.050847 (about 83.1%). (Figure: decision boundary for lambda = 1.)
(3) lambda = 10: Train Accuracy: 74.576271 (about 74.6%). (Figure: decision boundary for lambda = 10.)
(4) lambda = 50: Train Accuracy: 66.949153 (about 66.9%). (Figure: decision boundary for lambda = 50.)
(5) lambda = 100: Train Accuracy: 61.016949 (about 61.0%). (Figure: decision boundary for lambda = 100.)
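A minimal sketch of this sweep, assuming X, y, and the functions defined above are already in the workspace:
lambdas = [0 1 10 50 100];
options = optimset('GradObj', 'on', 'MaxIter', 400);
for k = 1:numel(lambdas)
    theta = fminunc(@(t)(costFunctionReg(t, X, y, lambdas(k))), ...
        zeros(size(X, 2), 1), options);
    p = predict(theta, X);
    fprintf('lambda = %3g  Train Accuracy: %f\n', ...
        lambdas(k), mean(double(p == y)) * 100);
end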
In summary, lambda = 0 applies no penalty to theta and overfits the training data, while lambda = 10, 50, and 100 underfit it; lambda = 1, with a training accuracy of about 83%, gives the best fit.