Calculate AUC in R?

Question:

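Given a vector of scores and a vector of actual class labels, how do you calculate a single-number AUC metric for a binary classifier in R, or in plain English?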

Page 9 of "AUC: a Better Measure..." seems to require knowing the class labels, and here is an example in MATLAB where I don't understand

R(Actual == 1))

because R (not to be confused with the R language) is defined as a vector but used as a function?

For anyone else who doesn't know, apparently AUC is the "Area Under the Receiver Operating Characteristic (http://en.wikipedia.org/wiki/Receiver_operating_characteristic) Curve". – Justin 2011-02-04 21:30:40

Answer

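As others have mentioned, you can compute the AUC with the ROCR package. With ROCR you can also plot the ROC curve, the lift curve, and other model-selection measures.

You can also compute the AUC directly, without any package, by using the fact that the AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative example.

For example, if pos.scores is a vector containing the scores of the positive examples and neg.scores is a vector containing the scores of the negative examples, then the AUC is approximated by: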

> mean(sample(pos.scores,1000,replace=T) > sample(neg.scores,1000,replace=T)) 
[1] 0.7261

This gives an approximation of the AUC. You can also estimate the variance of the AUC by bootstrapping:

> aucs = replicate(1000,mean(sample(pos.scores,1000,replace=T) > sample(neg.scores,1000,replace=T)))
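Instead of sampling, the same probability can be computed exactly from ranks: it is the Mann-Whitney U statistic divided by the number of positive/negative pairs, with tied pairs counting 1/2. The sketch below is a minimal illustration; `auc_rank` is a hypothetical helper name and the example scores are made up.

```r
# Exact AUC from ranks (Mann-Whitney U / (n_pos * n_neg)).
auc_rank <- function(pos.scores, neg.scores) {
  n_pos <- length(pos.scores)
  n_neg <- length(neg.scores)
  # rank() assigns average ranks to ties by default, which gives a tied
  # positive/negative pair a weight of 1/2
  r <- rank(c(pos.scores, neg.scores))
  u <- sum(r[seq_len(n_pos)]) - n_pos * (n_pos + 1) / 2
  u / (n_pos * n_neg)
}

# Made-up scores, for illustration only
pos.scores <- c(0.9, 0.8, 0.4)
neg.scores <- c(0.7, 0.3, 0.4)
auc_rank(pos.scores, neg.scores)   # 7.5 / 9
```

This should agree with `wilcox.test(pos.scores, neg.scores)$statistic / (length(pos.scores) * length(neg.scores))`, since R's W statistic is the same U (with ties, `wilcox.test` warns about the p-value, but the statistic is still computed).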

For my test dataset, your replicated value is very similar to @jonw's (0.8504 vs. your 0.850591), except that I don't need to install pROC. Thanks – Andrew 2011-02-07 16:44:01

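@Andrew @eric This is a terrible answer. You do *not* estimate the variance of the AUC; you only estimate the variance of the resampling process. To convince yourself, try changing the sample size in `sample`: divide it by 10 and the variance is multiplied by 10; multiply it by 10 and the variance is divided by 10. That is certainly not how an estimate of the AUC's variance should behave. – Calimo 2014-02-14 12:25:21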
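Following the comment above, a bootstrap aimed at the sampling variability of the AUC resamples the observations themselves (here within each class, keeping the original class sizes; that stratification is a design choice, not the only option) and recomputes the exact AUC each time. A minimal sketch; the helper names and the simulated data are illustrative only.

```r
# Exact rank-based AUC (ties count 1/2), repeated so this block runs on its own
auc_rank <- function(pos.scores, neg.scores) {
  n_pos <- length(pos.scores)
  r <- rank(c(pos.scores, neg.scores))
  (sum(r[seq_len(n_pos)]) - n_pos * (n_pos + 1) / 2) / (n_pos * length(neg.scores))
}

# Resample observations with replacement within each class, then recompute
# the AUC. Indexing via sample.int() avoids sample()'s surprise on length-1 input.
boot_auc <- function(pos.scores, neg.scores, n_boot = 2000) {
  replicate(n_boot, {
    pos.b <- pos.scores[sample.int(length(pos.scores), replace = TRUE)]
    neg.b <- neg.scores[sample.int(length(neg.scores), replace = TRUE)]
    auc_rank(pos.b, neg.b)
  })
}

set.seed(1)
pos.scores <- rnorm(50, mean = 1)   # simulated positives (illustration only)
neg.scores <- rnorm(80, mean = 0)   # simulated negatives (illustration only)
aucs <- boot_auc(pos.scores, neg.scores)
var(aucs)                           # bootstrap estimate of the AUC's variance
quantile(aucs, c(0.025, 0.975))     # percentile 95% interval
```

Increasing `n_boot` only makes the Monte Carlo estimate of that variance more precise; it does not shrink the variance itself, which is governed by the numbers of positives and negatives.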

Answer

The ROCR package will calculate the AUC, among other statistics:

library(ROCR); pred <- prediction(scores, labels)  # scores: predictions, labels: true classes
auc.tmp <- performance(pred,"auc"); auc <- as.numeric(auc.tmp@y.values)

ROCR plots performance, but I don't see how it computes a "single-number AUC metric" (from the original question). – Andrew 2011-02-07 16:17:32

`auc.tmp` … – Itamar 2011-12-14 09:19:08
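On the question of how a single number comes out of ROC analysis: the AUC is the area under the curve traced by (false positive rate, true positive rate) as the decision threshold sweeps over the scores. The base-R sketch below illustrates that idea with the trapezoidal rule, treating each distinct score as one threshold so that tied scores produce a diagonal segment. It is not ROCR's internal code; `roc_points`, `trapezoid_auc`, and the 0/1 label coding are assumptions for illustration.

```r
# ROC points: one threshold per distinct score, highest first.
# labels are assumed to be coded 1 = positive, 0 = negative.
roc_points <- function(scores, labels) {
  thr <- sort(unique(scores), decreasing = TRUE)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  tpr <- sapply(thr, function(t) sum(scores >= t & labels == 1)) / n_pos
  fpr <- sapply(thr, function(t) sum(scores >= t & labels == 0)) / n_neg
  # prepend the (0, 0) corner of the curve
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}

# Trapezoidal rule over consecutive points
trapezoid_auc <- function(fpr, tpr) {
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

scores <- c(0.1, 0.3, 0.3, 0.9)
labels <- c(0, 0, 1, 1)
pts <- roc_points(scores, labels)
trapezoid_auc(pts$fpr, pts$tpr)   # 0.875
```

The `sapply` loop is quadratic in the number of observations, which is fine for illustration but not for large data; a cumulative-sum version over sorted scores would scale better.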

Answer

With the pROC package you can use the function auc(), as in this example from the help page:

> library(pROC)
> data(aSAH)
>
> # Syntax (response, predictor):
> auc(aSAH$outcome, aSAH$s100b)
Area under the curve: 0.7314

link to pROC

Answer

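I usually use the function ROC from the DiagnosisMed package. I like the graphs it produces. The AUC is returned together with its confidence interval, and it is also shown on the plot.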

ROC(classLabels,scores,Full=TRUE)

As of 2016-07-20, this link https://cran.r-project.org/web/packages/DiagnosisMed/index.html says "Package 'DiagnosisMed' was removed from the CRAN repository." – arun 2016-07-20 20:58:27

I'm sorry about that, too. – 2016-07-21 06:28:05

Answer

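In the same spirit as Erik's answer, you should also be able to compute the AUC directly by comparing all possible pairs of values from pos.scores and neg.scores: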

score.pairs <- merge(pos.scores, neg.scores) 
names(score.pairs) <- c("pos.score", "neg.score") 
sum(score.pairs$pos.score > score.pairs$neg.score)/nrow(score.pairs)

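This is certainly less efficient than the sampling approach or pROC::auc, but it is more stable than the former and requires less installation than the latter.

Related: when I tried this, it gave results similar to pROC's value but not exactly the same (off by about 0.02); the result was closer to the sampling method with a high N. If anyone has an idea why, I'd be interested.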
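One plausible source of the small mismatch with pROC, if your scores contain ties, is that the strict `>` above gives a tied positive/negative pair no credit, while the usual ROC-area convention (which, as far as I know, pROC follows) gives it 1/2. A vectorized sketch with `outer()` that counts ties as 1/2; `pairwise_auc` is a hypothetical name and the scores are made up.

```r
# Compare every positive score with every negative score at once.
# A strictly higher positive counts 1, a tie counts 1/2.
pairwise_auc <- function(pos.scores, neg.scores) {
  gt <- outer(pos.scores, neg.scores, ">")
  eq <- outer(pos.scores, neg.scores, "==")
  mean(gt + 0.5 * eq)
}

pos.scores <- c(0.9, 0.8, 0.4)
neg.scores <- c(0.7, 0.3, 0.4)
mean(outer(pos.scores, neg.scores, ">"))   # strict ">" only: 7 / 9
pairwise_auc(pos.scores, neg.scores)       # ties counted as 1/2: 7.5 / 9
```

Like the merge() version, this builds a table of length(pos.scores) × length(neg.scores) entries, so memory grows with the product of the two class sizes.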

Answer

Without any additional packages:

true_Y = c(1,1,1,1,2,1,2,1,2,2)   # class labels: 1 = positive, 2 = negative
probs = c(1,0.999,0.999,0.973,0.568,0.421,0.382,0.377,0.146,0.11)

getROC_AUC = function(probs, true_Y){
    # order observations by decreasing score
    probsSort = sort(probs, decreasing = TRUE, index.return = TRUE)
    idx = probsSort$ix

    roc_y = true_Y[idx]
    # cumulative false positive rate (x) and true positive rate (y)
    stack_x = cumsum(roc_y == 2)/sum(roc_y == 2)
    stack_y = cumsum(roc_y == 1)/sum(roc_y == 1)

    # sum of rectangles: width = step in FPR, height = TPR at that step
    # (tied scores are not grouped, so ties can make the result order-dependent)
    n = length(roc_y)
    auc = sum((stack_x[2:n] - stack_x[1:(n-1)]) * stack_y[2:n])
    return(list(stack_x=stack_x, stack_y=stack_y, auc=auc))
}

aList = getROC_AUC(probs, true_Y)

stack_x = aList$stack_x
stack_y = aList$stack_y
auc = aList$auc

plot(stack_x, stack_y, type = "l", col = "blue", xlab = "False Positive Rate", ylab = "True Positive Rate", main = "ROC")
axis(1, seq(0.0,1.0,0.1))
axis(2, seq(0.0,1.0,0.1))
abline(h=seq(0.0,1.0,0.1), v=seq(0.0,1.0,0.1), col="gray", lty=3)
legend(0.7, 0.3, sprintf("%3.3f",auc), lty=c(1,1), lwd=c(2.5,2.5), col="blue", title = "AUC")


Answer

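Combining the code from ISL 9.6.3 ROC Curves with @J. Won.'s answer to this question and a few other places, the following plots the ROC curve and prints the AUC in the bottom-right corner of the plot.

Below, probs is a numeric vector of predicted probabilities from a binary classifier, and test$label contains the true labels of the test data.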

require(ROCR) 
require(pROC) 

rocplot <- function(pred, truth, ...) { 
    predob = prediction(pred, truth) 
    perf = performance(predob, "tpr", "fpr") 
    plot(perf, ...) 
    area <- auc(truth, pred) 
    area <- format(round(area, 4), nsmall = 4) 
    text(x=0.8, y=0.1, labels = paste("AUC =", area)) 

    # the reference x=y line 
    segments(x0=0, y0=0, x1=1, y1=1, col="gray", lty=2) 
} 

rocplot(probs, test$label, col="blue")

This gives a plot like this:

Answer

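I found some of the solutions here to be slow and/or confusing (and some of them don't handle ties correctly), so I wrote my own data.table-based function auc_roc() in my R package mltools.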

library(data.table) 
library(mltools) 

preds <- c(.1, .3, .3, .9) 
actuals <- c(0, 0, 1, 1) 

auc_roc(preds, actuals) # 0.875 

auc_roc(preds, actuals, returnDT=TRUE) 
   Pred CountFalse CountTrue CumulativeFPR CumulativeTPR AdditionalArea CumulativeArea
1:  0.9          0         1           0.0           0.5          0.000          0.000
2:  0.3          1         1           0.5           1.0          0.375          0.375
3:  0.1          1         0           1.0           1.0          0.500          0.875
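Reading the returned table, each AdditionalArea entry appears to be the trapezoid between consecutive (FPR, TPR) points, starting from (0, 0); worked through for the three rows above:

$$
\begin{aligned}
A_i &= (\mathrm{FPR}_i - \mathrm{FPR}_{i-1})\,\frac{\mathrm{TPR}_{i-1} + \mathrm{TPR}_i}{2} \\
A_1 &= (0.0 - 0)\cdot\frac{0 + 0.5}{2} = 0 \\
A_2 &= (0.5 - 0.0)\cdot\frac{0.5 + 1.0}{2} = 0.375 \\
A_3 &= (1.0 - 0.5)\cdot\frac{1.0 + 1.0}{2} = 0.5 \\
\mathrm{AUC} &= A_1 + A_2 + A_3 = 0.875
\end{aligned}
$$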

Answer

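The currently top-voted answer is incorrect because it ignores ties: when a positive score and a negative score are equal, that pair should contribute 0.5 to the AUC rather than 0. Below is a corrected version.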

computeAUC <- function(pos.scores, neg.scores, n_sample=100000) {
    # Args:
    # pos.scores: scores of positive observations
    # neg.scores: scores of negative observations
    # n_sample  : number of sampled pairs used to approximate the AUC

    pos.sample <- sample(pos.scores, n_sample, replace=TRUE)
    neg.sample <- sample(neg.scores, n_sample, replace=TRUE)
    # a strictly higher positive counts 1, a tie counts 0.5
    mean(1.0*(pos.sample > neg.sample) + 0.5*(pos.sample==neg.sample))
}
