Machine Learning: Model Evaluation (Confusion Matrix, ROC, AUC)
Confusion Matrix
A confusion matrix is a performance measurement for machine learning classification: it counts how predictions fall into four categories.
Understand TP, FP, FN, TN
- True Positive (TP): predicted positive and actually positive.
- False Positive (FP): type 1 error; predicted positive but actually negative.
- False Negative (FN): type 2 error; predicted negative but actually positive.
- True Negative (TN): predicted negative and actually negative.
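The four counts above can be sketched in a few lines of Python (the labels below are illustrative toy data, not from these notes):

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels, where 1 = positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # type 1 error
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # type 2 error
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # toy ground-truth labels
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]  # toy predictions
print(confusion_counts(y_true, y_pred))  # -> (3, 1, 1, 3)
```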
Understand confusion matrix through math
- TPR / Recall / Sensitivity: Out of all actual positives, how many we predicted correctly: TPR = TP / (TP + FN). (If every sample is predicted positive, FN = 0 and recall is 100%.)
- Precision: Out of all the samples we predicted as positive, how many are actually positive: Precision = TP / (TP + FP).
- F-measure: It is difficult to compare two models when one has low precision and high recall and the other the opposite. The F-score measures recall and precision at the same time, using the harmonic mean instead of the arithmetic mean because it punishes extreme values more: F1 = 2 · Precision · Recall / (Precision + Recall).
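The three metrics above can be computed directly from the four confusion-matrix counts; a minimal sketch (the counts TP=3, FP=1, FN=1 are toy values for illustration):

```python
def recall(tp, fn):
    # Out of all actual positives, the fraction predicted positive.
    return tp / (tp + fn)

def precision(tp, fp):
    # Out of all predicted positives, the fraction actually positive.
    return tp / (tp + fp)

def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall; dominated by the smaller value.
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

print(recall(3, 1))       # -> 0.75
print(precision(3, 1))    # -> 0.75
print(f1_score(3, 1, 1))  # -> 0.75
```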
Area Under the Curve and Receiver Operating Characteristics
The AUC-ROC curve is a performance measurement for classification problems at various threshold settings. ROC is a probability curve and AUC represents the degree of separability: it tells how well the model can distinguish between classes. The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s; by analogy, the better it is at distinguishing patients with a disease from those without. An AUC of 0.5 means the model has no class-separation capacity whatsoever.
The ROC curve plots TPR (y-axis) against FPR (x-axis): evaluate the binary classifier at a series of classification thresholds, compute (FPR, TPR) at each, and connect the points to obtain the curve.
When every sample is predicted positive, FN = TN = 0 and TPR = FPR = 1. As the probability threshold for the positive class is lowered from 1.0 to 0.0:
- TN ↓, FP ↑, so FPR = FP / (FP + TN) ↑
- FN ↓, TP ↑, so TPR = TP / (TP + FN) ↑
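The threshold sweep described above can be sketched end to end: lower the threshold step by step, record an (FPR, TPR) point at each step, then integrate with the trapezoidal rule to get AUC. The scores and labels below are toy data for illustration:

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points for thresholds swept from high to low."""
    pos = sum(y_true)            # total actual positives = TP + FN
    neg = len(y_true) - pos      # total actual negatives = FP + TN
    points = [(0.0, 0.0)]        # threshold above every score: nothing positive
    for thr in sorted(set(scores), reverse=True):
        pred = [1 if s >= thr else 0 for s in scores]
        tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
        points.append((fp / neg, tp / pos))  # (FPR, TPR)
    return points  # last point is (1, 1): everything predicted positive

def auc(points):
    """Area under the (FPR, TPR) curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

y_true = [0, 0, 1, 1]            # toy labels
scores = [0.1, 0.4, 0.35, 0.8]   # toy predicted probabilities
pts = roc_points(y_true, scores)
print(auc(pts))  # -> 0.75
```

Note how the final point of the sweep is (FPR, TPR) = (1, 1), matching the observation above that predicting every sample positive gives TPR = FPR = 1.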