Machine Learning Model Implementation: Precision, Recall, and Probability Cut-offs

Congratulations on completing your Machine Learning (ML) pipeline! In the second part of this series, I’ll talk about some metrics and graphics beyond the area under the ROC curve that can be helpful in choosing which model or models we’d like to move forward with and implement in practice, keeping our project’s objectives in mind. Specifically, in this article we’ll:

1. Discuss why the Precision/Recall trade-off might be the optimal framework for many problems, and
2. Display graphics and summaries that can turn a predictive modeling exercise into something highly impactful for decision making in practice.

What are Precision and Recall (and have I seen them before)?

It’s likely you’ve seen Recall before if you have been working with classification problems, because it’s also known as Sensitivity. To review, say we are interested in predicting a 2-class event, Win or Lose. Then Recall = Sensitivity = the probability that a true Win is predicted as a Win; essentially, it’s the proportion of true Wins that we successfully capture in the predicted Win bucket. Next, Precision is also known as Positive Predictive Value (PPV), which I find a bit more descriptive. For PPV we’re flipping things around, asking: Okay, of the predicted Wins that I have in that bucket, which ones are actually true Wins? Of course both of these are important: we’d like to capture all the true Wins in our predicted Win bucket, and we’d also like all the predicted Wins to turn out to be true Wins. But as you’d expect, these goals are somewhat at odds with each other, and their trade-off is captured in the Precision-Recall (PR) curve. We can compute the area under this curve (AUC) just as we compute the AUC for the ROC curve. While the ROC curve has a reference diagonal line from (0, 0) to (1, 1), indicating model performance equal to flipping a coin, the reference for the PR curve is a horizontal line at the frequency of true Wins.

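As a minimal sketch of the two definitions, here is how Recall and Precision could be computed from a 2x2 confusion matrix in R; the counts are made up for the Win/Lose example:

```r
# Hypothetical confusion matrix: rows = predicted class, columns = true class
conf <- matrix(c(80, 20,   # predicted Win:  80 true Wins, 20 true Loses
                 10, 90),  # predicted Lose: 10 true Wins, 90 true Loses
               nrow = 2, byrow = TRUE,
               dimnames = list(predicted = c("Win", "Lose"),
                               truth     = c("Win", "Lose")))

recall    <- conf["Win", "Win"] / sum(conf[, "Win"])  # TP / (TP + FN) = 80/90
precision <- conf["Win", "Win"] / sum(conf["Win", ])  # TP / (TP + FP) = 80/100
```

Both metrics share the same numerator (the true positives) but normalize by different totals, which is exactly why they can move in opposite directions.
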
Now that we understand the Precision-Recall trade-off, why is it important? In many classification problems, one outcome is of greater interest, or is associated with a greater cost or reward, than the other. So while the AUC of the ROC curve is important, the AUC of the PR curve may be more germane to model performance and selection for many prediction problems. At this point you might say, ‘You’ve been talking about a trade-off, but how do I make these trades?’ We make them by varying the probability cut-off we require before calling something a ‘Win.’ The lower this level is set, the higher the Sensitivity/Recall (‘We’ve correctly captured most of the true Wins as predicted Wins’), but there may be many false positives. The higher the cut-off, the higher the Precision/PPV (‘Most of the predicted Wins are actually true Wins’), but a large number of true Wins will not be classified as such (false negatives). Alright, before this discussion becomes too abstract, let’s continue with an example.

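To make the trade-off concrete, here is a small R sketch that sweeps a few cut-offs over simulated predicted probabilities; the simulated data and the cut-off grid are assumptions, purely for illustration:

```r
set.seed(1)
# Simulate 200 true labels and predicted probabilities that separate imperfectly
truth <- factor(rbinom(200, 1, 0.5), labels = c("Lose", "Win"))
probs <- ifelse(truth == "Win", rbeta(200, 4, 2), rbeta(200, 2, 4))

for (cut in c(0.3, 0.5, 0.7)) {
  pred      <- probs >= cut              # call a 'Win' when the probability clears the cut-off
  tp        <- sum(pred & truth == "Win")
  recall    <- tp / sum(truth == "Win")  # share of true Wins captured
  precision <- tp / sum(pred)            # share of predicted Wins that are true Wins
  cat(sprintf("cut-off %.1f: Recall %.2f, Precision %.2f\n", cut, recall, precision))
}
```

As the cut-off rises, Precision climbs while Recall falls; the trade-off in three lines of output.
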
Data

In this article I’ll be using results from seven machine learning models that analyzed the German Credit classification data available in the R caret package. The data consist of 1000 observations of individuals determined to have either Good (n=700) or Bad (n=300) credit. There are 61 predictors covering a variety of factors that may have to do with credit, such as loan features, banking information, demographics, employment, etc. The data were separated into a 70%-30% training-test split, preserving the 70% Good credit frequency in both sets. The training set was split into 5 cross-validation (CV) folds, repeated 5 times, to determine optimal tuning parameters. Modeling was done using the caret package in R (a sketch of this setup appears after the list below). The models fit were the following:

* Linear Discriminant Analysis (LDA)
* Partial Least Squares (PLS)
* Support Vector Machines (SVM, with radial kernel function)
* Neural Network (NN)
* Recursive Partitioning (Rpart, single tree)
* Random Forests (RF)
* Gradient Boosting Machines (GBM)

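For reference, here is a minimal sketch of how this split and resampling setup might look in caret; the seed and the choice of random forest as the example fit are assumptions, not the article’s exact code:

```r
library(caret)

data(GermanCredit)  # the 1000-subject credit data shipped with caret

# 70%-30% split; createDataPartition preserves the Good/Bad class frequencies
set.seed(42)
in_train <- createDataPartition(GermanCredit$Class, p = 0.7, list = FALSE)
training <- GermanCredit[in_train, ]
testing  <- GermanCredit[-in_train, ]

# 5-fold CV repeated 5 times, retaining class probabilities for ROC/PR work
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 5,
                     classProbs = TRUE, summaryFunction = twoClassSummary)

# One of the seven fits, e.g. the random forest
rf_fit <- train(Class ~ ., data = training, method = "rf",
                trControl = ctrl, metric = "ROC")
```

The same trainControl object can be reused for each of the seven model types by swapping the method argument.
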
Below are both the AUC ROC and AUC PR curves for the seven fits applied to the hold-out test set (i.e. not involved in the model cross-validation). With the exception of the single-tree Rpart model, performance on the test set is similar across the fits. Looking at an actual Precision-Recall curve, the trade-off becomes apparent: when our Recall or Sensitivity is low, we are not correctly predicting many of the Good credit subjects, but those we are classifying as Good are mostly actually Good. This corresponds to using a high value for the ‘Good’ probability cut-off. As we decrease the cut-off, the Sensitivity/Recall improves, but at the cost of the Precision/PPV (many of the predicted Good credit subjects actually have Bad credit).

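Curves like these could be produced as sketched below; pROC and PRROC are just two of several packages for the job, and rf_fit is the hypothetical fit from the earlier sketch:

```r
library(pROC)   # ROC curve and its AUC
library(PRROC)  # Precision-Recall curve and its AUC

# Predicted probabilities of Good credit on the hold-out test set
probs <- predict(rf_fit, newdata = testing, type = "prob")[, "Good"]

# ROC: the reference is the diagonal coin-flip line
roc_obj <- roc(response = testing$Class, predictor = probs,
               levels = c("Bad", "Good"))
plot(roc_obj); auc(roc_obj)

# PR: the reference is a horizontal line at the Good-credit frequency (~0.7)
pr_obj <- pr.curve(scores.class0 = probs[testing$Class == "Good"],
                   scores.class1 = probs[testing$Class == "Bad"],
                   curve = TRUE)
plot(pr_obj)
```
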
The German Credit problem is an example of a lopsided classification penalty structure. In this case, it’s worse to classify a subject as having Good credit when their credit is actually Bad: this may result in someone receiving a loan or credit card on which they are likely to default. On the other hand, it is profitable to make loans and issue credit to those who are capable of fulfilling the credit terms, but a successful loan earns less than an unsuccessful one loses.

Another intuitive way to visualize the effect of different cut-offs on the necessary trade-offs is to examine the distributions of predicted probabilities on our test data, grouped by ground truth. An example is shown below; the Rpart model has been removed to improve the figure scaling, and the vertical green line represents a potential cut-off that could be used to determine model-predicted positives (here it’s 0.7). This type of figure displays the regions of true and false positives in a way that is easily digestible. It’s clear from the plot that the models will require different probability cut-offs to achieve the same performance. For example, the PLS model will require lower cut-offs than the others to achieve similar Sensitivity and PPV. Another insight from this figure is that all but the highest cut-offs will retain some false positives, and for some models they may be impossible to eliminate completely.

[Figure: distributions of predicted probabilities on the test set, by true credit class, for six models; the vertical green line marks a candidate cut-off of 0.7]
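
A figure along these lines could be drawn with ggplot2, as in this sketch (reusing the hypothetical probs and testing objects from above):

```r
library(ggplot2)

ggplot(data.frame(prob = probs, truth = testing$Class),
       aes(x = prob, fill = truth)) +
  geom_density(alpha = 0.4) +                       # one distribution per true class
  geom_vline(xintercept = 0.7, colour = "green") +  # candidate probability cut-off
  labs(x = "Predicted probability of Good credit", fill = "True class")
```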

Once we have the big picture, we can dive a bit more into the details, examining the models for the sweet spot that meets our objectives. For example, if a PPV of 76% were tolerable (i.e. 24% of predicted positives being false), the RF model indicates this could be achieved at a Sensitivity of 93% using a cut-off of 0.5. (Note that these values should be viewed as estimates taken from the one test sample; an alternative would be to compute them for each CV hold-out sample and examine their distribution.) However, this false-positive rate may be too high, and if we’re less concerned about missing true positives, we might prefer the other end of the spectrum: the PLS model at a cut-off of 0.7 would capture only 22% of the true positives, but 95% of those classified as positive would truly be so. The nature of our risk function, along with the practical constraints of our model implementation, can lead us to the optimal choice. In the credit example, expected gains and losses in dollars (or euros) could be assumed for Good and Bad credit and computed for each cut-off option, as sketched after the figure below.

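To read Sensitivity and PPV off at a particular cut-off, caret’s confusionMatrix is convenient; the 0.5 cut-off below matches the RF example, though the exact numbers will depend on the fit:

```r
pred_class <- factor(ifelse(probs >= 0.5, "Good", "Bad"),
                     levels = levels(testing$Class))
cm <- confusionMatrix(data = pred_class, reference = testing$Class,
                      positive = "Good")
cm$byClass[c("Sensitivity", "Pos Pred Value")]  # Recall and Precision at this cut-off
```
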
[Figure: Sensitivity and PPV for each model across a range of probability cut-offs]
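
As a sketch of that last idea, the expected return at each cut-off can be computed directly once gains and losses are assumed; the 100-unit gain and 500-unit loss below are made-up numbers, not figures from the article:

```r
gain_good <- 100   # assumed profit from a loan to a truly Good-credit subject
loss_bad  <- -500  # assumed loss from a loan to a truly Bad-credit subject

cutoffs <- seq(0.1, 0.9, by = 0.05)
profit <- sapply(cutoffs, function(cut) {
  approve <- probs >= cut  # extend credit when P(Good) clears the cut-off
  sum(approve & testing$Class == "Good") * gain_good +
    sum(approve & testing$Class == "Bad") * loss_bad
})
cutoffs[which.max(profit)]  # the cut-off with the best expected return
```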

In conclusion, strategically considering Precision and Recall, the predicted probability distributions, and model performance at different cut-offs, together with how the model we develop will be used in practice, will lead to machine learning pipelines that have a real opportunity to impact larger business goals and objectives.

Translated from: https://towardsdatascience.com/machine-learning-model-implementation-precision-recall-and-probability-cut-offs-49429ed644c5
