Andrew Ng's Coursera Machine Learning Coding Hw 1

Author: Yu-Shih Chen
December 21, 2018 4:17AM

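Intro:
I'm currently a sophomore at a university in California. I'm very interested in AI and data science, so alongside my regular classes I also like taking online courses. My main goal in sharing my notes here is to help myself learn and review more effectively, so the notes may be a bit messy, and I won't repeat material I don't think I need to review again. Of course, if you're also taking this course, happen to come across my notes, and find them somewhat useful, that would make me very happy, haha! If you have any questions or suggestions, or just want to chat or make friends, feel free to add me on WeChat: y802088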

Week 2 Coding Assignment

Outline:

Warm-up Exercise
Plot Data
Cost Function
Gradient Descent

Warm-up Exercise

function A = warmUpExercise()
%WARMUPEXERCISE Example function in octave
%   A = WARMUPEXERCISE() is an example function that returns the 5x5 identity matrix

A = [];
% ============= YOUR CODE HERE ==============
% Instructions: Return the 5x5 identity matrix 
%               In octave, we return values by defining which variables
%               represent the return values (at the top of the file)
%               and then set them accordingly. 

A = eye(5);  % 5x5 identity matrix (the semicolon suppresses printing)

% ===========================================


end

Nothing much to explain here: build a 5x5 identity matrix, and one line does it.

Plot Data

function plotData(x, y)
%PLOTDATA Plots the data points x and y into a new figure 
%   PLOTDATA(x,y) plots the data points and gives the figure axes labels of
%   population and profit.

figure; % open a new figure window

% ====================== YOUR CODE HERE ======================
% Instructions: Plot the training data into a figure using the 
%               "figure" and "plot" commands. Set the axes labels using
%               the "xlabel" and "ylabel" commands. Assume the 
%               population and revenue data have been passed in
%               as the x and y arguments of this function.
%
% Hint: You can use the 'rx' option with plot to have the markers
%       appear as red crosses. Furthermore, you can make the
%       markers larger by using plot(..., 'rx', 'MarkerSize', 10);

% Use the x and y passed in by the calling script instead of reloading the file
plot(x, y, 'rx', 'MarkerSize', 10);        % red crosses for the training data
ylabel('Profit in $10,000s');              % y-axis label
xlabel('Population of City in 10,000s');   % x-axis label

% ============================================================

end

The data we need are X (the features) and y (the targets): specifically, we use a city's population (X) to predict the profit of a food truck in that city (y). This section just draws the provided dataset as an x-y scatter plot:
[Figure: scatter plot of the training data, population vs. profit]

Compute Cost

This section implements J, the cost function, i.e. the error formula:
$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2, \qquad h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1$$

function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

h_x = X * theta;
J = sum((h_x - y).^2) / (2*m);

% =========================================================================
end

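The key here is keeping track of the matrix dimensions. After a column of 1s (the constant intercept feature; see Coursera's **** for details) is added to X, X is m x 2 (m = total number of samples), and theta is initialized as theta = zeros(2, 1), i.e. a 2 x 1 matrix of zeros. So h_x (the predictions) is X * theta, which comes out as an m x 1 vector, the same shape as y (see matrix multiplication in linear algebra). Then just plug h_x into the formula; simple and direct.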
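To make the dimension bookkeeping concrete, here is a small sketch that runs the same computation on a made-up dataset (three points lying exactly on y = 1 + 2x, not ex1data1.txt). It adds the column of ones, checks the shapes, and evaluates J both with the sum form used in computeCost and with an equivalent inner-product form. At theta = zeros(2, 1) the cost should be (9 + 25 + 49) / 6 = 83/6, and at theta = [1; 2] it should be exactly 0.

```octave
% Hypothetical toy data (not ex1data1.txt): three points on y = 1 + 2x
x = [1; 2; 3];
y = [3; 5; 7];
m = length(y);

X = [ones(m, 1), x];   % prepend the column of 1s: X is m x 2
theta = zeros(2, 1);   % 2 x 1 initial parameters, as in the assignment

h_x = X * theta;                          % (m x 2) * (2 x 1) -> m x 1, same shape as y
J0 = sum((h_x - y).^2) / (2*m);           % sum form, as in computeCost
J0_vec = (h_x - y)' * (h_x - y) / (2*m);  % equivalent inner-product form

% The exact parameters reproduce y, so the cost drops to 0
J_exact = sum((X * [1; 2] - y).^2) / (2*m);
printf('J at theta = 0: %.4f, J at theta = [1; 2]: %.4f\n', J0, J_exact);
```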

Gradient Descent

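There are two ways to write this. Personally I think the second is better because it is more general, but the first may be easier to follow the first time through.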
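For reference, both versions implement the batch gradient-descent update for linear regression with learning rate alpha, applied to every parameter j simultaneously. It is written here in the course's notation, where theta_0 corresponds to theta(1) in Octave and x_0^(i) = 1 is the added column of ones:

```latex
\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}, \qquad j = 0, 1
```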

Version 1:

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by 
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta. 
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %
    
    h_x = X * theta; % m x 1 vector
    % No loop needed here: theta has only 2 elements
    temp1 = theta(1) - (alpha * sum(h_x - y) / m);
    temp2 = theta(2) - (alpha * sum((h_x - y).* X(:,2))/m);
    % Store in temp because we don't want to change theta value before using it.
    theta(1) = temp1;
    theta(2) = temp2;
   
    % ============================================================

    % Save the cost J in every iteration    
    J_history(iter) = computeCost(X, y, theta);

end

end

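A few things to note here:

The new values go into temp1 and temp2 first so that theta(1) and theta(2) are updated simultaneously, both from the same old theta. If theta(1) were overwritten first and the prediction were then recomputed from theta (e.g. writing X * theta inside the theta(2) line), the update for theta(2) would use a different theta than the one used for theta(1). In this code h_x is computed once before either update, so the temporaries are mainly a safeguard that keeps the simultaneous update explicit.
If theta had more elements, we would need a for loop to run the update over every component (that is version 2).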
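The difference between a simultaneous and a sequential update is easy to see on made-up data. This hypothetical example takes a single step from theta = [0; 0] with alpha = 0.1 on three points lying on y = 1 + 2x: the simultaneous version computes both gradients from the same predictions, while the sequential version recomputes X * theta after theta(1) has already changed, which is what the temporaries are there to prevent.

```octave
% Hypothetical toy data (not ex1data1.txt): points on y = 1 + 2x
x = [1; 2; 3];
y = [3; 5; 7];
m = length(y);
X = [ones(m, 1), x];
theta = zeros(2, 1);
alpha = 0.1;

% Simultaneous update: both partial derivatives use the same old theta
h_x = X * theta;
t1 = theta(1) - alpha * sum(h_x - y) / m;
t2 = theta(2) - alpha * sum((h_x - y) .* X(:,2)) / m;
theta_sim = [t1; t2];

% Sequential update (the bug to avoid): theta(2)'s gradient is computed
% from predictions that already use the new theta(1)
theta_seq = theta;
theta_seq(1) = theta_seq(1) - alpha * sum(X * theta_seq - y) / m;
theta_seq(2) = theta_seq(2) - alpha * sum((X * theta_seq - y) .* X(:,2)) / m;

disp([theta_sim, theta_seq]);   % columns: simultaneous vs. sequential
```

Here theta_sim is [0.5; 34/30] while theta_seq is [0.5; 31/30], so the two orderings already disagree after one step.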

Version 2:

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by 
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta. 
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %
    
    h_x = X * theta; % m x 1 vector
    temp = theta;
    % Loop over every element of theta; each update uses h_x, which was
    % computed from the old theta before the loop
    for i = 1:size(theta,1)
        temp(i) = theta(i) - (alpha * sum((h_x - y) .* X(:,i)) / m);
    end
    % Assign all components at once (simultaneous update)
    theta = temp;
    
    % ============================================================

    % Save the cost J in every iteration    
    J_history(iter) = computeCost(X, y, theta);

end

end

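Here theta is copied into a temp matrix (for the same reason as before: theta must not change until every component has been computed), a for loop computes the gradient-descent update for each element of temp, and finally theta is overwritten with temp. That is one iteration of gradient descent; repeating it num_iters times moves theta toward the values that minimize J, provided the learning rate alpha is small enough.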
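As an alternative that this post does not use, the inner loop can be collapsed into one matrix expression: X' * (X*theta - y) / m is the whole gradient vector, computed from the same old theta, so it is a simultaneous update by construction and works for any number of features. The sketch below assumes a hypothetical helper name, gradientDescentVec, and made-up toy data; it is not the assignment's reference solution.

```octave
1;  % a leading statement lets Octave treat this file as a script that defines functions

% Hypothetical helper: vectorized alternative to the two loop versions
function [theta, J_history] = gradientDescentVec(X, y, theta, alpha, num_iters)
  m = length(y);                      % number of training examples
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    % The full gradient comes from the current (old) theta, so every
    % component is updated simultaneously without a temp copy
    theta = theta - (alpha / m) * (X' * (X * theta - y));
    % Same cost as computeCost, inlined so the sketch is self-contained
    J_history(iter) = sum((X * theta - y).^2) / (2 * m);
  end
end

% Usage on made-up data lying exactly on y = 1 + 2x
X = [ones(3, 1), [1; 2; 3]];
y = [3; 5; 7];
[theta, J_history] = gradientDescentVec(X, y, zeros(2, 1), 0.1, 2000);
disp(theta');   % close to [1 2]
```

With this small alpha, J_history should decrease at every iteration; if it ever increases, alpha is too large.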
We visualize the result with a contour plot and an x-y plot:
[Figure: plots of the cost J(theta) with the gradient-descent result marked by a red 'x', and the fitted line over the training data]
The red 'x', the theta found by gradient descent, sits near the bottom of the cost surface in the 3D plot, i.e. close to the minimum of J.

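Comparing it with the very first graph, we can see that this is a reasonably good line of fit. If you've made it this far, congratulations on building your first prediction formula with machine learning! (Cue the applause, clap clap clap)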
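Using the fitted line as a prediction formula just means plugging a new population into h(x) = theta(1) + theta(2) * x. A minimal sketch follows; the theta values are made up for illustration (they are not the result of running the assignment), and it assumes, as the plotData axis labels indicate, that population is in units of 10,000 and profit in units of $10,000.

```octave
% Hypothetical learned parameters, for illustration only
theta = [-1; 1.5];

% Population is in units of 10,000, so 35,000 people is 3.5
population = 3.5;
profit = [1, population] * theta;   % theta(1) + theta(2) * x, in units of $10,000
printf('Predicted profit: $%.2f\n', profit * 10000);
```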

That's it for the Week 2 coding assignment (required section).
Thanks for reading!
