Change a column from birth date to age in R

Problem description:

I'm using data.table for the first time.

My table has a column of about 400,000 birth dates, and I need to convert them from birth dates to ages.

What is the best way to do this?

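From the comments on this blog entry, I found the age_calc function in the eeptools package. It handles edge cases (leap years, etc.), checks its inputs, and looks quite robust.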

library(eeptools) 
x <- as.Date(c("2011-01-01", "1996-02-29")) 
age_calc(x) # default is age in months, measured up to today (enddate = Sys.Date()) 

[1] 46.73333 224.83118

age_calc(x, units = "years") # but you can set it to years 

[1] 3 18

For your data:

yourdata$age <- floor(age_calc(yourdata$birthdate, units = "years")) 

assuming you want age in whole years.
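
If you would rather avoid an extra dependency for whole-year ages, one base-R sketch is to compare calendar fields directly instead of dividing a day count. The function name age_years is hypothetical (it is not part of eeptools), and it assumes that a Feb 29 birthday is reached on Mar 1 in non-leap years. It is vectorized, so it can be applied to a whole column at once.

```r
# Hypothetical helper: completed years between dob and ref (Date vectors),
# computed from calendar fields rather than from a day count.
age_years <- function(dob, ref = Sys.Date()) {
  d <- as.POSIXlt(dob)
  r <- as.POSIXlt(ref)
  years <- r$year - d$year
  # Subtract 1 where the birthday has not yet been reached in the reference
  # year (so a Feb 29 birthday counts from Mar 1 in non-leap years).
  not_yet <- (r$mon < d$mon) | (r$mon == d$mon & r$mday < d$mday)
  years - not_yet
}

age_years(as.Date("1996-02-29"), as.Date(c("2014-02-28", "2014-03-01")))
# [1] 17 18
```

Because no division is involved, an exact birthday always increments the age, which is the property a floored fractional age can miss.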

Assuming you have a data.table, you could do the following:

library(data.table) 
library(lubridate) 
# toy data 
X = data.table(birth=seq(from=as.Date("1970-01-01"), to=as.Date("1980-12-31"), by="year")) 
Sys.Date() 

Option 1: use as.period from the lubridate package

X[, age := as.period(Sys.Date() - birth)][] 
     birth     age 
1: 1970-01-01 44y 0m 327d 0H 0M 0S 
2: 1971-01-01 43y 0m 327d 6H 0M 0S 
3: 1972-01-01 42y 0m 327d 12H 0M 0S 
4: 1973-01-01 41y 0m 326d 18H 0M 0S 
5: 1974-01-01 40y 0m 327d 0H 0M 0S 
6: 1975-01-01 39y 0m 327d 6H 0M 0S 
7: 1976-01-01 38y 0m 327d 12H 0M 0S 
8: 1977-01-01 37y 0m 326d 18H 0M 0S 
9: 1978-01-01 36y 0m 327d 0H 0M 0S 
10: 1979-01-01 35y 0m 327d 6H 0M 0S 
11: 1980-01-01 34y 0m 327d 12H 0M 0S 
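
The 6H/12H/18H remainders above are consistent with the elapsed time being split into years of 365.25 days. The base-R arithmetic below reproduces the leftover days for the first four rows; the reference date 2014-11-24 is an assumption, chosen only because it matches the printed output.

```r
birth <- seq(as.Date("1970-01-01"), by = "year", length.out = 4)
ref <- as.Date("2014-11-24")             # assumed reference date, not from the post
days <- as.numeric(ref - birth)          # elapsed days
whole_years <- floor(days / 365.25)      # whole 365.25-day "years"
leftover <- days - whole_years * 365.25  # remaining days, in quarter-day steps
leftover
# [1] 327.00 327.25 327.50 326.75
```

A quarter of a day is 6 hours, which is where the 0H, 6H, 12H and 18H components come from.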

Option 2: if you don't like the format of Option 1, you can do this:

yr = duration(num = 1, units = "years") 
X[, age := interval(birth, Sys.Date())/yr][] # older lubridate versions used new_interval(), deprecated since 1.5.0 
# you get 
     birth  age 
1: 1970-01-01 44.92603 
2: 1971-01-01 43.92603 
3: 1972-01-01 42.92603 
4: 1973-01-01 41.92329 
5: 1974-01-01 40.92329 
6: 1975-01-01 39.92329 
7: 1976-01-01 38.92329 
8: 1977-01-01 37.92055 
9: 1978-01-01 36.92055 
10: 1979-01-01 35.92055 
11: 1980-01-01 34.92055 

I believe Option 2 is preferable.

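I've been thinking about this and have been unhappy with the two answers so far. I like using lubridate, as @KFB did, but I also want things wrapped up nicely in a function, as in my answer using the eeptools package. So here is a wrapper function that uses the lubridate interval method, with some nice options: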

library(lubridate)  # provides today(), interval() and duration()

#' Calculate age 
#' 
#' By default, calculates the typical "age in years", with a 
#' \code{floor} applied so that you are, e.g., 5 years old from 
#' 5th birthday through the day before your 6th birthday. Set 
#' \code{floor = FALSE} to return decimal ages, and change \code{units} 
#' for units other than years. 
#' @param dob date-of-birth, the day to start calculating age. 
#' @param age.day the date on which age is to be calculated. 
#' @param units unit to measure age in. Defaults to \code{"years"}. Passed to \code{\link{duration}}. 
#' @param floor boolean for whether or not to floor the result. Defaults to \code{TRUE}. 
#' @return Age in \code{units}. Will be an integer if \code{floor = TRUE}. 
#' @examples 
#' my.dob <- as.Date('1983-10-20') 
#' age(my.dob) 
#' age(my.dob, units = "minutes") 
#' age(my.dob, floor = FALSE) 
age <- function(dob, age.day = today(), units = "years", floor = TRUE) { 
    calc.age = interval(dob, age.day)/duration(num = 1, units = units) 
    if (floor) return(as.integer(floor(calc.age))) 
    return(calc.age) 
} 

Example usage:

> my.dob <- as.Date('1983-10-20') 

> age(my.dob) 
[1] 31 

> age(my.dob, floor = FALSE) 
[1] 31.15616 

> age(my.dob, units = "minutes") 
[1] 16375680 

> age(seq(my.dob, length.out = 6, by = "years")) 
[1] 31 30 29 28 27 26 
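
One edge case worth checking with any interval/duration division: assuming a duration year is 365.25 days (which is how current lubridate defines a one-year duration), an exact birthday that follows a non-leap year can floor to one year too few. Plain base-R arithmetic shows the effect:

```r
dob <- as.Date("2011-01-01")
first_birthday <- as.Date("2012-01-01")
days <- as.numeric(first_birthday - dob)  # 365 days, since 2011 is not a leap year
days / 365.25                             # just under 1
floor(days / 365.25)                      # 0, although this is the first birthday
```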

This is the answer I was looking for. ([We meet again](http://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames/24376207#24376207)) – Ben 2017-02-01 21:06:48

Warning message: 'new_interval' is deprecated; use 'interval' instead. Deprecated in version '1.5.0'. – malajisi 2018-02-28 08:35:18

@malajisi Thanks, updated. – Gregor 2018-02-28 14:38:41

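I wasn't happy with any of the responses when it comes to calculating age in months or years while handling leap years, so here is my function, which uses the lubridate package.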

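Basically, it slices the interval between the from and to dates into (at most) yearly chunks, then adjusts each chunk's length according to whether it falls in a leap year. The total age is the sum of the ages of the chunks.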

library(lubridate) 

#' Get Age of Date relative to Another Date 
#' 
#' @param from,to the date or dates to consider 
#' @param units the units to consider 
#' @param floor logical as to whether to floor the result 
#' @param simple logical as to whether to do a simple calculation, a simple calculation doesn't account for leap year. 
#' @author Nicholas Hamilton 
#' @export 
age <- function(from, to = today(), units = "years", floor = FALSE, simple = FALSE) { 

    #Account for Leap Year if Working in Months and Years
    if(!simple && length(grep("^(month|year)",units)) > 0){
        df = data.frame(from,to)
        calc = sapply(1:nrow(df),function(r){

            #Start and Finish Points
            st = df[r,1]; fn = df[r,2]

            #If there is no difference, age is zero
            if(st == fn){ return(0) }

            #If there is a difference, age is not zero and needs to be calculated
            sign = +1 #Age Direction
            if(st > fn){ tmp = st; st = fn; fn = tmp; sign = -1 } #Swap and Change sign

            #Determine the slice-points
            mid = ceiling_date(seq(st,fn,by='year'),'year')

            #Build the sequence
            dates = unique(c(st,mid,fn))
            dates = dates[which(dates >= st & dates <= fn)]

            #Determine the age of the chunks
            chunks = sapply(head(seq_along(dates),-1),function(ix){
                k = 365/(365 + leap_year(dates[ix]))
                k*interval(dates[ix], dates[ix+1])/duration(num = 1, units = units)
            })

            #Sum the Chunks, and account for direction
            sign*sum(chunks)
        })

    #If Simple Calculation or Not Months or Not years
    }else{
        calc = interval(from,to)/duration(num = 1, units = units)
    }

    if (floor) calc = as.integer(floor(calc)) 
    calc 
} 
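
The slicing idea described in this answer can be sketched in base R as below. This is a simplified variant rather than a translation of the function above: it handles a single pair of Dates, splits at each Jan 1, and divides each slice by its own calendar year's length (365 or 366 days) instead of using a lubridate duration. The helper names is_leap() and frac_years() are hypothetical.

```r
# Hypothetical helpers (base R only).
is_leap <- function(y) (y %% 4 == 0 & y %% 100 != 0) | y %% 400 == 0

# Fractional years between two single Dates: split at each Jan 1, then divide
# each slice's length by the length of the calendar year it falls in.
frac_years <- function(from, to) {
  if (from == to) return(0)
  sign <- 1
  if (from > to) {                 # swap so from < to, remembering the direction
    tmp <- from; from <- to; to <- tmp
    sign <- -1
  }
  y0 <- as.integer(format(from, "%Y"))
  y1 <- as.integer(format(to, "%Y"))
  # Jan 1 boundaries after `from` (none if both dates are in the same year)
  cuts <- if (y1 > y0) as.Date(sprintf("%d-01-01", seq(y0 + 1, y1))) else from[0]
  breaks <- unique(c(from, cuts, to))
  starts <- head(breaks, -1)
  ends <- tail(breaks, -1)
  year_len <- ifelse(is_leap(as.integer(format(starts, "%Y"))), 366, 365)
  sign * sum(as.numeric(ends - starts) / year_len)
}

frac_years(as.Date("2011-01-01"), as.Date("2012-01-01"))  # 1
frac_years(as.Date("2012-01-01"), as.Date("2013-01-01"))  # 1 (366/366)
```

With this weighting a full calendar year always counts as exactly 1, but a one-year span that crosses Jan 1 does not: 2012-03-01 to 2013-03-01 gives 306/366 + 59/365, just under 1, so flooring it would miss that birthday. For whole-year ages, comparing month and day directly avoids the issue.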

I like to do this with the lubridate package, borrowing syntax I originally came across in another post.

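The input dates must first be standardized as R Date objects, preferably with lubridate::mdy(), lubridate::ymd(), or a similar function, as applicable. You can then use the interval() function to create an interval describing the time elapsed between the two dates, and use the duration() function to define how that interval should be "diced".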

Below I've summarized the simplest case, calculating age from two dates with the current R syntax.

df$DOB <- mdy(df$DOB) 
df$EndDate <- mdy(df$EndDate) 
df$Calc_Age <- interval(start= df$DOB, end=df$EndDate)/      
        duration(n=1, unit="years") 
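
For comparison, here is a self-contained base-R sketch of the same steps, using made-up sample dates in month/day/year form. as.Date() with an explicit format stands in for mdy(), and dividing the day count by 365.25 stands in for dividing the interval by a one-year duration (assuming lubridate's 365.25-day duration year).

```r
# Made-up sample data in month/day/year form, standing in for df$DOB and df$EndDate
df <- data.frame(DOB = c("10/20/1983", "02/29/1996"),
                 EndDate = c("12/31/2014", "02/28/2014"),
                 stringsAsFactors = FALSE)

# Base-R counterpart of mdy(): parse with an explicit format
df$DOB <- as.Date(df$DOB, format = "%m/%d/%Y")
df$EndDate <- as.Date(df$EndDate, format = "%m/%d/%Y")

# Elapsed days divided by a 365.25-day year; the result is plain numeric
df$Calc_Age <- as.numeric(df$EndDate - df$DOB) / 365.25
floor(df$Calc_Age)
# [1] 31 17
```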

The age can be rounded down to the nearest whole number using base R's floor() function, as follows:

df$Calc_AgeF <- floor(df$Calc_Age) 

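Alternatively, the digits= argument of base R's round() function can be used to round to the nearest value and to specify the exact number of decimal places in the result, like this: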

df$Calc_Age2 <- round(df$Calc_Age, digits = 2) ## 2 decimals 
df$Calc_Age0 <- round(df$Calc_Age, digits = 0) ## nearest integer 
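
For ages in whole years, floor() is usually the safer choice, because round(digits = 0) reports the next age for anyone past the midpoint of their year. R's round() also sends exact halves to the even number (the IEC 60559 convention), so 30.5 becomes 30. A quick check with made-up ages:

```r
ages <- c(30.4, 30.5, 30.6)   # made-up decimal ages
floor(ages)                   # 30 30 30: completed years
round(ages, digits = 0)       # 30 30 31: 30.6 is reported as 31
```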

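It's worth noting that once the input dates have gone through the calculation steps described above (i.e., the interval() and duration() functions), the returned value is numeric and no longer a Date object in R. This matters because lubridate::floor_date() works only on date-time objects, so base R's floor() is the function to apply to the result.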

The syntax above works whether the input dates are stored in a data.table or a data.frame object.