What is the proper notation for denoting intermediate processing steps in a function?

Question:

Let's say I have defined the following function to read in a set of files:

read_File <- function(file){ 
    # read Excel file 
    df1 <- read.xls(file,sheet=1, pattern="Name", header=T, na.strings=("##"), stringsAsFactors=F) 
    # remove rows where Name is empty 
    df2 <- df1[-which(df1$Name==""),] 
    # remove rows where "Name" is repeated 
    df3 <- df2[-which(df2$Name=="Name"),] 
    # remove all empty columns (anything with an auto-generated colname) 
    df4 <- df3[, -grep("X\\.{0,1}\\d*$", colnames(df3))] 
    row.names(df4) <- NULL 
    df4$FileName <- file 
    return(df4) 
}

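It works fine as is, but it feels like bad form to define df1...df4 to represent the intermediate steps. Is there a better way to do this without compromising readability?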

Answer

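I see no reason to save the intermediate objects separately unless they need to be used more than once. That is not the case in your code, so I would replace all of your df[0-9] with df: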

read_File <- function(file){ 
    # read Excel file 
    df <- read.xls(file,sheet = 1, pattern = "Name", header = T, 
        na.strings = ("##"), stringsAsFactors = F) 
    # remove rows where Name is empty 
    df <- df[-which(df$Name == ""), ] 
    # remove rows where "Name" is repeated 
    df <- df[-which(df$Name == "Name"), ] 
    # remove all empty columns (anything with an auto-generated colname) 
    df <- df[, -grep("X\\.{0,1}\\d*$", colnames(df))] 
    row.names(df) <- NULL 
    df$FileName <- file 
    return(df) 
}

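df3 is not a good descriptive variable name: it tells you nothing more about the variable than df does. Naming variables sequentially by step also creates a maintenance burden. If you need to add a new step in the middle, you have to rename all subsequent objects to keep them consistent, which is both annoying and bug-prone. (Or you end up with something like df2.5, which is ugly and doesn't generalize well.) In general, I think sequentially named variables are almost always bad practice, even when they are separate objects that you need saved.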

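Also, keeping the intermediate objects around is not a good use of memory. In most cases it doesn't matter, but if your data is large, saving all the intermediate steps separately will greatly increase the amount of memory used during processing.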

The comments are excellent, with lots of detail: they tell you everything you need to know about what is going on in the code.

If it were me, I would probably combine a couple of the steps, something like this:

read_File <- function(file){ 
    # read Excel file 
    df <- read.xls(file,sheet = 1, pattern = "Name", header = T, 
        na.strings = ("##"), stringsAsFactors = F) 
    # remove rows where Name is bad: 
    bad_names <- c("", "Name") 
    df <- df[-which(df$Name %in% bad_names), ] 

    # remove all empty columns (anything with an auto-generated colname) 
    df <- df[, -grep("X\\.{0,1}\\d*$", colnames(df))] 
    row.names(df) <- NULL 
    df$FileName <- file 
    return(df) 
}

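Having a bad_names vector of values to omit saves a line and is more parametric: it would be trivial to promote bad_names to a function argument (perhaps with the default value c("", "Name")) so that the user can customize it.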
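As a minimal sketch of that last point, the row-filtering step could be pulled into a helper that takes bad_names as an argument. The helper name drop_bad_names and the sample data frame are hypothetical, not part of the original code. The sketch also uses logical indexing instead of -which(); that is an assumption made here so that a data frame with no bad rows comes back unchanged rather than empty.

```r
# Hypothetical helper: drop rows whose Name is one of `bad_names`.
# The default mirrors the values used in the function above.
drop_bad_names <- function(df, bad_names = c("", "Name")) {
    # Logical indexing keeps every row when nothing matches,
    # whereas df[-which(...), ] would return zero rows in that case.
    df <- df[!(df$Name %in% bad_names), , drop = FALSE]
    row.names(df) <- NULL
    df
}

# Made-up sample data standing in for the sheet returned by read.xls()
sample_df <- data.frame(
    Name  = c("a", "", "Name", "b"),
    Value = c(1, 2, 3, 4),
    stringsAsFactors = FALSE
)

# Default behaviour: drop empty names and repeated header rows
cleaned <- drop_bad_names(sample_df)

# Caller-supplied list of names to drop
custom <- drop_bad_names(sample_df, bad_names = c("", "Name", "b"))
```

Inside read_File, the same argument could be passed straight through, so callers can override the default without editing the function body.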

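Agreed. This is very nice code, and it doesn't need the extra variables. When I'm developing code I do tend to use 'temp' or 'junk' as variable names, but it sometimes bothers me. – Kevin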
