Andrew Ng's Machine Learning Exercise 2: Regularized Logistic Regression

This section works through the regularized logistic regression exercise.
1. Visualizing the raw data

data = load('ex2data2.txt'); % load the dataset into data
X = data(:, [1, 2]); y = data(:, 3); % columns 1 and 2 of data are the inputs X; column 3 is the label y (y = 0 or y = 1)
plotData(X, y); % call plotData to draw the scatter plot
hold on;
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')
legend('y = 1', 'y = 0')
hold off;

The plotData function to be completed is:

function plotData(X, y)
figure; hold on;
pos = find(y==1); % indices of the examples with y == 1 (pos is a vector)
neg = find(y==0); % indices of the examples with y == 0 (neg is a vector)

% plot the y == 1 examples as red plus signs
plot(X(pos,1),X(pos,2),'r+','LineWidth',2,'MarkerSize',7);
% plot the y == 0 examples as blue circles
plot(X(neg,1),X(neg,2),'bo','LineWidth',2,'MarkerSize',7);
hold off;
end

The resulting scatter plot is:
[Figure: scatter plot of Microchip Test 1 vs. Microchip Test 2; y = 1 plotted as +, y = 0 as o]
2. Expanding the feature vector
The raw data has only two features, x1 and x2. To fit the data better, the features are expanded to 28 dimensions: every monomial x1^(i-j) * x2^j up to degree 6, which gives 1 + 2 + ... + 7 = 28 terms:
$$\text{mapFeature}(x) = \begin{bmatrix} 1 & x_1 & x_2 & x_1^2 & x_1 x_2 & x_2^2 & x_1^3 & \cdots & x_1 x_2^5 & x_2^6 \end{bmatrix}^T$$

X = mapFeature(X(:,1), X(:,2));
function out = mapFeature(X1, X2)
degree = 6;
out = ones(size(X1(:,1))); % start with a column of ones (the intercept term)
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (X1.^(i-j)).*(X2.^j); % append the feature column x1^(i-j) .* x2^j
    end
end
end
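
As a quick sanity check (this snippet is my own addition, not part of the exercise), the expanded design matrix should indeed have 28 columns:

% 1 intercept column plus sum over i = 1..6 of (i+1) polynomial columns = 28 total
assert(size(X, 2) == 28);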

The resulting out matrix is:
[Figure: the expanded feature matrix out]
Initialize the fitting parameters:

% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);

% Set regularization parameter lambda to 1
lambda = 1;

3. Cost function and gradient
The regularized cost function is:
$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(h_\theta(x^{(i)})\right) - \left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
To prevent overfitting, the cost function penalizes large parameters by adding a theta_j^2 term for each feature weight; theta_0 (which is theta(1) in MATLAB's 1-based indexing) is not penalized.
The gradient splits into two cases, j = 0 and j >= 1:
$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)} \qquad \text{for } j = 0$$

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j \qquad \text{for } j \ge 1$$

% Compute and display initial cost and gradient for regularized logistic
% regression
[cost, grad] = costFunctionReg(initial_theta, X, y, lambda);

Complete the costFunctionReg function:

function [J, grad] = costFunctionReg(theta, X, y, lambda)
m = length(y); % number of training examples

% Regularized cost: subtracting theta(1)*theta(1) excludes the
% intercept from the penalty term
J = ((log(sigmoid(X*theta)))'*y + (log(1-sigmoid(X*theta)))'*(1-y))/(-m)+...
    (theta'*theta - theta(1)*theta(1))*lambda/(2*m);

% Regularized gradient, as a column vector the same size as theta
grad = X'*(sigmoid(X*theta)-y)/m + theta*lambda/m;
% theta(1) is not regularized, so recompute grad(1) without the penalty
grad(1) = (sigmoid(X*theta)-y)'*X(:,1)/m;
end
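
costFunctionReg calls a sigmoid helper that the post does not show. A minimal version, implementing the standard logistic function g(z) = 1/(1 + e^(-z)) used throughout the exercise:

function g = sigmoid(z)
% element-wise logistic function; works on scalars, vectors and matrices
g = 1 ./ (1 + exp(-z));
end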

In costFunctionReg the gradient is computed with matrix operations; since grad(1) uses a different formula (no regularization term), it is simply recomputed and overwritten afterwards.
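
An equivalent alternative (my own variant, not the exercise skeleton) avoids the separate grad(1) correction by zeroing out the intercept entry of theta before forming the penalty:

h = sigmoid(X*theta);                % hypothesis for all examples
theta_reg = theta; theta_reg(1) = 0; % copy of theta with the intercept zeroed
J    = (-y'*log(h) - (1-y)'*log(1-h))/m + lambda/(2*m)*(theta_reg'*theta_reg);
grad = X'*(h - y)/m + (lambda/m)*theta_reg;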

To verify the code, print the cost at the initial all-zeros theta along with the first five gradient entries; the computed results match the expected values.

fprintf('Cost at initial theta (zeros): %f\n', cost);
fprintf('Expected cost (approx): 0.693\n');
fprintf('Gradient at initial theta (zeros) - first five values only:\n');
fprintf(' %f \n', grad(1:5));
fprintf('Expected gradients (approx) - first five values only:\n');
fprintf(' 0.0085\n 0.0188\n 0.0001\n 0.0503\n 0.0115\n');

The output is:

Cost at initial theta (zeros): 0.693147
Expected cost (approx): 0.693
Gradient at initial theta (zeros) - first five values only:
 0.008475 
 0.018788 
 0.000078 
 0.050345 
 0.011501 
Expected gradients (approx) - first five values only:
 0.0085
 0.0188
 0.0001
 0.0503
 0.0115

Now set every entry of theta to 1 and lambda to 10, and recompute:

% Compute and display cost and gradient
% with all-ones theta and lambda = 10
test_theta = ones(size(X,2),1);
[cost, grad] = costFunctionReg(test_theta, X, y, 10);

fprintf('\nCost at test theta (with lambda = 10): %f\n', cost);
fprintf('Expected cost (approx): 3.16\n');
fprintf('Gradient at test theta - first five values only:\n');
fprintf(' %f \n', grad(1:5));
fprintf('Expected gradients (approx) - first five values only:\n');
fprintf(' 0.3460\n 0.1614\n 0.1948\n 0.2269\n 0.0922\n');

The output again matches the expected values, confirming the code is correct:

Cost at test theta (with lambda = 10): 3.164509
Expected cost (approx): 3.16
Gradient at test theta - first five values only:
 0.346045 
 0.161352 
 0.194796 
 0.226863 
 0.092186 
Expected gradients (approx) - first five values only:
 0.3460
 0.1614
 0.1948
 0.2269
 0.0922

4. Optimizing with fminunc

% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);

% Set regularization parameter lambda to 1 (you should vary this)
lambda = 1;

% Set Options
options = optimset('GradObj', 'on', 'MaxIter', 400);

% Optimize
[theta, J, exit_flag] = ...
	fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);

% Plot Boundary
plotDecisionBoundary(theta, X, y);
hold on;
title(sprintf('lambda = %g', lambda))

% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')

legend('y = 1', 'y = 0', 'Decision boundary')
% Compute accuracy on our training set
p = predict(theta, X);

fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
fprintf('Expected accuracy (with lambda = 1): 83.1 (approx)\n');
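
The script above uses two helpers that the post does not reproduce. The sketches below follow the standard ex2 skeleton, simplified to the polynomial-feature case used here; treat them as my reconstruction rather than verbatim exercise code. plotDecisionBoundary draws the theta'*x = 0 contour over a grid of mapFeature-expanded points, and predict thresholds the hypothesis at 0.5:

function plotDecisionBoundary(theta, X, y)
% X is the mapped design matrix; columns 2 and 3 hold the raw features
plotData(X(:,2:3), y);
hold on
% evaluate theta'*x over a grid of (u, v) points and draw the zero contour
u = linspace(-1, 1.5, 50);
v = linspace(-1, 1.5, 50);
z = zeros(length(u), length(v));
for i = 1:length(u)
    for j = 1:length(v)
        z(i,j) = mapFeature(u(i), v(j))*theta;
    end
end
z = z'; % transpose so contour(u, v, z) lines up with the axes
contour(u, v, z, [0, 0], 'LineWidth', 2)
hold off
end

function p = predict(theta, X)
% predict 1 whenever the estimated probability is at least 0.5
p = sigmoid(X*theta) >= 0.5;
end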

(1) With lambda = 0, the training accuracy is 87.3%, and the decision boundary is:
[Figure: decision boundary with lambda = 0]

lambda = 0 Train Accuracy: 87.288136

(2) With lambda = 1, the training accuracy is 83.1%, and the decision boundary is:
[Figure: decision boundary with lambda = 1]

lambda = 1 Train Accuracy: 83.050847

(3) With lambda = 10, the training accuracy is 74.6%, and the decision boundary is:
[Figure: decision boundary with lambda = 10]

lambda = 10 Train Accuracy: 74.576271

(4) With lambda = 50, the training accuracy is 66.95%, and the decision boundary is:
[Figure: decision boundary with lambda = 50]

lambda = 50 Train Accuracy: 66.949153

(5) With lambda = 100, the training accuracy is 61%, and the decision boundary is:
[Figure: decision boundary with lambda = 100]

lambda = 100 Train Accuracy: 61.016949

In summary, lambda = 0 (no penalty on theta) overfits the training data, while lambda = 10, 50, and 100 underfit it. lambda = 1 strikes the best balance, reaching a training accuracy of about 83%.
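
The comparison above can be reproduced with a short loop (my own wrapper around the exercise code; it assumes X is already mapped and that costFunctionReg and predict are on the path):

for lambda = [0 1 10 50 100]
    initial_theta = zeros(size(X, 2), 1);
    options = optimset('GradObj', 'on', 'MaxIter', 400);
    theta = fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);
    p = predict(theta, X);
    fprintf('lambda = %3g  Train Accuracy: %f\n', lambda, mean(double(p == y)) * 100);
end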