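Modify a field in the non-unique lines of a text file

Question:

I am working on a script that needs to take the duplicated lines in a single text file and change the value of the date field, and only the date field. The field separator is TAB, so...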

# cat enviando4 
1414743351  2014-11-01 09:00:00 
1414743351  2014-10-31 09:15:51 
1414743351  2014-10-30 23:00:00 
1414743351  2014-10-31 09:15:51 
1414743351  2014-10-30 23:00:00 
1414743351  2014-10-31 10:25:00 
1414743351  2014-10-31 09:15:51 
1414743351  2014-11-01 10:25:00

I sort the lines by date:

/bin/sort enviando4 -k2 -t $'\t' -o enviando4

# cat enviando4 
1414743351  2014-10-30 23:00:00 
1414743351  2014-10-30 23:00:00 
1414743351  2014-10-31 09:15:51 
1414743351  2014-10-31 09:15:51 
1414743351  2014-10-31 09:15:51 
1414743351  2014-10-31 10:25:00 
1414743351  2014-11-01 09:00:00 
1414743351  2014-11-01 10:25:00

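Now I need to add at least 4 minutes (never subtract) to every date that has at least one duplicate, so that I end up with only unique dates. It would look like this: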

# cat enviando4 
1414743351  2014-10-30 23:04:00 --> add 4 
1414743351  2014-10-30 23:00:00 --> no change 
1414743351  2014-10-31 09:19:51 --> add 4 
1414743351  2014-10-31 09:23:51 --> add 8 
1414743351  2014-10-31 09:15:51 --> no change 
1414743351  2014-10-31 10:25:00 --> unique, no change 
1414743351  2014-11-01 09:00:00 --> unique, no change 
1414743351  2014-11-01 10:25:00 --> unique, no change

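I also need to verify that these changes do not create new duplicate values. I am stuck here. Thanks.

If you reverse the sort order, you would then need to add the 4 minutes to the *last* duplicate line - easier to do in something like AWK - keep one line in a buffer, check for a change of key, print the held line, ... – Dinesh 2014-10-31 00:25:12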

Answer

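Your task is not difficult. Bash has wonderful date manipulation utilities available (here, GNU date). What you need to do is sort the original list, then read each line of the sorted file, compare the date/time to the previous date/time and, using a counter, add a counter * 4min offset and write the new date/time to your output file. There are many ways to handle the time adjustment. The simplest is to convert the date/time string to seconds since the epoch. Then just add the offset to the duplicate time and convert it back to the desired date/time format.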
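The conversion step on its own can be sketched as below. This is a minimal illustration, assuming GNU date (the -d option and the @seconds syntax are GNU extensions); the sample value and the fixed TZ are only there to make the result reproducible.

```shell
#!/bin/bash
# Minimal sketch of the epoch round trip. Assumes GNU date.
export TZ=UTC                       # fixed zone so the sketch is reproducible

idate="2014-10-31 09:15:51"         # a duplicated date/time from the sample data
tos=240                             # offset in seconds (4 min.)

tse=$(date -d "$idate" +%s)         # date/time string -> seconds since epoch
ntm=$((tse + tos))                  # add the offset
ndt=$(date -d "@${ntm}" +"%F %T")   # seconds since epoch -> "YYYY-MM-DD HH:MM:SS"

printf '%s -> %s\n' "$idate" "$ndt" # 2014-10-31 09:15:51 -> 2014-10-31 09:19:51
```

Working in seconds means that minute, hour and day rollovers (for example 23:58:00 plus 4 minutes) are handled by date rather than by hand.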

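The following example shows one way to do this. Several of the operations could be combined, but I have kept the offset calculation separate to make it more readable. The script takes the input file as its first argument (I have set it to default to dat/env4.dat for my testing; set it to suit your setup). The script then sorts into a temp file, reads the temp file, makes the time adjustment for duplicates, and writes the output to inputfile.out, deleting the temp file before exiting. Let me know if you have any questions: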

#!/bin/bash 

ifn="${1:-dat/env4.dat}"   # set input filename (ifn) and validate 

[ -r "$ifn" ] || { 
    printf "\n Error: input file not readable. Usage: %s [<filename> (dat/env4.dat)]\n\n" "${0//*\//}" >&2 
    exit 1 
} 

## initialize variables 
tfn="/tmp/${ifn//*\//}.tmp"   # set temp filename (tfn) 
ofn="${ifn}.out"     # set output filename (ofn) 
:> "$ofn"       # truncate output file 
pdate=0        # initialize prior date 
cnt=0        # counter variable 
tos=240        # time offset in seconds (4 min.) 
tse=0        # time since epoch in seconds 

sort "$ifn" > "$tfn"    # sort input file into temp file & validate 

[ -r "$tfn" ] || { 
    printf "\n Error: sort failed to produce a tmp file or tmp file not readable\n\n" >&2 
    exit 1 
} 

## read temp file into index/idate and add 4 min to each successive duplicate 
while read -r index idate || [ -n "$idate" ]; do 

    if [ "$pdate" = "$idate" ]; then 
     tse=$(date -d "$idate" +%s) # get time since epoch for idate 
     cnt=$((cnt+1))    # increase counter 
     nos=$((cnt*tos))   # set new time offset (not Nitrous Oxide) 
     ntm=$((tse+nos))   # set new time including offset 
     # write new time to output 
     printf "%s\t%s\n" "$index" "$(date -d "@${ntm}" +"%F %T")" >> "$ofn" 
    else 
     cnt=0; nos=0    # reset counter and new time offset 
     # write output unchanged 
     printf "%s\t%s\n" "$index" "$idate" >> "$ofn" 
    fi 

    pdate="$idate"     # save current date/time as prior date/time 

done <"$tfn" 

[ -r "$tfn" ] && rm "$tfn"   # remove temp file

Input file:

$ cat dat/env4.dat 
1414743351  2014-11-01 09:00:00 
1414743351  2014-10-31 09:15:51 
1414743351  2014-10-30 23:00:00 
1414743351  2014-10-31 09:15:51 
1414743351  2014-10-30 23:00:00 
1414743351  2014-10-31 10:25:00 
1414743351  2014-10-31 09:15:51 
1414743351  2014-11-01 10:25:00

Output file:

$ cat dat/env4.dat.out 
1414743351  2014-10-30 23:00:00 
1414743351  2014-10-30 23:04:00 
1414743351  2014-10-31 09:15:51 
1414743351  2014-10-31 09:19:51 
1414743351  2014-10-31 09:23:51 
1414743351  2014-10-31 10:25:00 
1414743351  2014-11-01 09:00:00 
1414743351  2014-11-01 10:25:00
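The question also asks to verify that the adjustment does not create new duplicates (for instance, if a line with 09:19:51 had already been present in the input). The script above does not check this. One possible check, as a sketch: the function name check_dups is hypothetical, and it assumes TAB-separated fields with the date/time in field 2.

```shell
#!/bin/bash
# check_dups FILE: print every date/time value that occurs more than once
# and return non-zero if any are found.
# Assumes TAB-separated fields with the date/time in field 2.
check_dups() {
    local dups
    dups=$(cut -f2 "$1" | sort | uniq -d)   # uniq -d prints repeated values only
    if [ -n "$dups" ]; then
        printf 'duplicate date/time values remain:\n%s\n' "$dups" >&2
        return 1
    fi
}
```

It could be run after the main script, for example as check_dups dat/env4.dat.out && echo "all dates unique". If it reports duplicates, running the adjustment again on the output file is one way to resolve them.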

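Note: if you want to flip the duplicates so that the larger offset time appears first, you should be able to do that by operating on the output file. Doing it inside the offset while loop would overcomplicate the logic for this problem. If you do want to include the additional code in the offset while loop, the basic approach is to store the prior date and any matching dates in an array, then apply the offsets to the array date/time values and write them out in reverse order. Unset the array each time a new date/time is encountered.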
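The array approach described in the note could look roughly like this. It is a sketch rather than part of the original script: the names flush_group and adjust_reversed are hypothetical, it assumes GNU date, and it expects input that is already sorted, read from stdin.

```shell
#!/bin/bash
# Sketch only: assumes GNU date and input already sorted on the date/time.
# Reads "index<TAB>date time" lines on stdin, writes adjusted lines to stdout.
tos=240          # time offset in seconds (4 min.)
gdate=""         # date/time shared by the buffered group
group=()         # index fields of the lines in the current group

flush_group() {  # write the buffered group, largest offset first
    local i tse
    (( ${#group[@]} )) || return 0          # nothing buffered yet
    tse=$(date -d "$gdate" +%s)
    for (( i = ${#group[@]} - 1; i >= 0; i-- )); do
        if (( i == 0 )); then               # first of the group stays unchanged
            printf '%s\t%s\n' "${group[0]}" "$gdate"
        else
            printf '%s\t%s\n' "${group[i]}" \
                "$(date -d "@$((tse + i * tos))" +'%F %T')"
        fi
    done
    group=()     # empty the array for the next date/time
}

adjust_reversed() {
    local index idate
    while read -r index idate || [ -n "$idate" ]; do
        [ "$idate" = "$gdate" ] || flush_group   # new date/time: write old group
        gdate=$idate
        group+=("$index")
    done
    flush_group  # write the final group
}
```

A possible invocation is sort "$ifn" | adjust_reversed > "$ofn". For three identical date/times this writes the +8 minute line first, then +4, then the unchanged one.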

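Addendum: including the e-mail and an adjustment field

If you are interested in extending the output to include the e-mail at the beginning, and to include the time adjustment between the date portion and the time portion of the new date field, you can do that fairly easily: add the e-mail at the start, then split the returned new date string into a date part and a time part and insert 00:0n:00 between the two in the output. It makes no difference whether you use printf or echo. printf is more flexible, but there are times when echo has its advantages too.

Note: in the code below I form 00:0n:00 (with n being 4, 8, etc.), which assumes at most 2 duplicates of a line. If there are 3 or more, you will have to rework the adjustment logic to form 00:nn:00 when the adjustment is larger than 8 minutes (e.g. 12, 16, 20, ... for the 3rd, 4th, 5th, ... duplicate).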
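One way to lift that limit is to let printf do the zero-padding. The helper below is a sketch: the name fmt_adj is hypothetical, and it assumes the same 4 minutes per duplicate as adjtm in the script.

```shell
#!/bin/bash
# fmt_adj CNT: format the adjustment for the CNT-th duplicate as HH:MM:SS,
# assuming 4 minutes per duplicate. %02d pads 4 and 8 to two digits and
# leaves 12, 16, ... alone; minutes past 59 carry into the hours field.
fmt_adj() {
    local adjtm=4 total
    total=$(( $1 * adjtm ))
    printf '%02d:%02d:00' $(( total / 60 )) $(( total % 60 ))
}

fmt_adj 1; echo    # 00:04:00
fmt_adj 3; echo    # 00:12:00
```

In the loop it would replace the hard-coded string, for example ncmb="$nd1 $(fmt_adj "$cnt") $nd2".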

Let me know if you have further questions.

## beginning part of script unchanged 
# tse=0        # time since epoch in seconds 
email="[email protected]"    # email to output 
adjtm=4        # simple value to provide adjustment in 00:04:00, etc. 

sort "$ifn" > "$tfn"    # sort input file into temp file & validate 

[ -r "$tfn" ] || { 
    printf "\n Error: sort failed to produce a tmp file or tmp file not readable\n\n" >&2 
    exit 1 
} 

## read temp file into index/idate and add 4 min to each successive duplicate 
while read -r index idate || [ -n "$idate" ]; do 

    if [ "$pdate" = "$idate" ]; then 
     tse=$(date -d "$idate" +%s) # get time since epoch for idate 
     cnt=$((cnt+1))    # increase counter 
     adj=$((cnt*adjtm))   # compute 4, 8, ... for 00:0n:00 output 
     nos=$((cnt*tos))   # set new time offset (not Nitrous Oxide) 
     ntm=$((tse+nos))   # set new time including offset 
     ndt="$(date -d "@${ntm}" +"%F %T")" # new date/time value 
     nd1=${ndt% *}    # date portion (first field) of ndt 
     nd2=${ndt#* }    # time portion (second field) of ndt 
     ncmb="$nd1 00:0${adj}:00 $nd2" # new combined "date 00:0n:00 time" string 
     # write new time to output 
     printf "%s\t%s\t%s\n" "$email" "$index" "$ncmb" >> "$ofn" 
    else 
     cnt=0; nos=0    # reset counter and new time offset 
     nd1=${idate% *}    # date portion (first field) of idate 
     nd2=${idate#* }    # time portion (second field) of idate 
     ncmb="$nd1 00:00:00 $nd2" # new combined "date 00:00:00 time" string (no adj) 
     # write output unchanged 
     printf "%s\t%s\t%s\n" "$email" "$index" "$ncmb" >> "$ofn" 
    fi 

    pdate="$idate"     # save current date as prior date 

done <"$tfn" 

[ -r "$tfn" ] && rm "$tfn"   # remove temp file

Output file (with the same input):

$ bash env4-2.sh 
[email protected] 1414743351  2014-10-30 00:00:00 23:00:00 
[email protected] 1414743351  2014-10-30 00:04:00 23:04:00 
[email protected] 1414743351  2014-10-31 00:00:00 09:15:51 
[email protected] 1414743351  2014-10-31 00:04:00 09:19:51 
[email protected] 1414743351  2014-10-31 00:08:00 09:23:51 
[email protected] 1414743351  2014-10-31 00:00:00 10:25:00 
[email protected] 1414743351  2014-11-01 00:00:00 09:00:00 
[email protected] 1414743351  2014-11-01 00:00:00 10:25:00

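Thanks for the script :-). It helped me a lot. I have been trying for a while to adapt the script to my needs and I have run into a problem with: printf "%s\t%s\n" "$index" "$(date -d "@${ntm}" +"%F %T")" >> "$ofn" I use more than two fields. This is the output: [email protected] 186808 2014-11-02 00:04:00 12:06:00. The old date is 2014-11-02 12:06:00, and I get 00:04:00 and 12:06:00 instead of 2014-11-02 12:10:00. I can post my code if necessary. – 2014-11-01 22:51:52

No need to post. All of the changes will be controlled by the read, and then by the concatenation after the read. Before I write the addendum, confirm that basically we just need to disregard the '00:04:00' and treat the date string as '2014-11-02 12:06:00' for purposes of checking duplicates and adjusting the time, out of your total string '[email protected] 186808 2014-11-02 00:04:00 12:06:00' – 2014-11-02 00:17:35

Yes. My original string is: PG 1358 [email protected] 186808 2014-11-02 12:06:00 0 2, and in the end it should look like: PG 1358 [email protected] 186808 2014-11-02 12:10:00 0 2. The line I use to write it back is: printf "%s\t%s\t%s\t%s\t%s\t%s\t%s\n" "$SISTEMA" "$MENSAXE_ID" "$EMAIL" "$NUMERO_DESTINATARIOS" "$(/bin/date -d "@${NTM}" +"%F %T")" "$ESTADO" "$GRUPO" >> pendientes.sorted but I get: pg 1358 [email protected] 186808 2014-11-02 00:04:00 12:06:00 0 2. Your script works really well. Thx – 2014-11-02 00:37:12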
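For a seven-field line like the one in the last comment, the read can name every field and split on TAB only, so that the date/time stays together as one field. The following is a sketch under several assumptions: GNU date, seven non-empty TAB-separated fields in the order of the printf above, and input already sorted on the date/time field. The names adjust_fields, FECHA and NFECHA are hypothetical; the other variable names are taken from the comment.

```shell
#!/bin/bash
# Sketch only: assumes GNU date, seven non-empty TAB-separated fields per
# line, and input on stdin already sorted on the 5th field (the date/time).
adjust_fields() {
    local tos=240 cnt=0 pdate="" tse
    local SISTEMA MENSAXE_ID EMAIL NUMERO_DESTINATARIOS FECHA NFECHA ESTADO GRUPO
    while IFS=$'\t' read -r SISTEMA MENSAXE_ID EMAIL NUMERO_DESTINATARIOS \
                            FECHA ESTADO GRUPO || [ -n "$SISTEMA" ]; do
        if [ "$FECHA" = "$pdate" ]; then
            cnt=$((cnt + 1))                       # n-th duplicate of this date
            tse=$(date -d "$FECHA" +%s)            # seconds since epoch
            NFECHA=$(date -d "@$((tse + cnt * tos))" +'%F %T')
        else
            cnt=0
            NFECHA=$FECHA                          # first occurrence: unchanged
        fi
        pdate=$FECHA
        printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' "$SISTEMA" "$MENSAXE_ID" \
            "$EMAIL" "$NUMERO_DESTINATARIOS" "$NFECHA" "$ESTADO" "$GRUPO"
    done
}
```

A possible invocation is sort -t $'\t' -k5,5 pendientes | adjust_fields > pendientes.sorted, where the input name pendientes is a guess. Because TAB counts as IFS whitespace, read collapses consecutive TABs, so this sketch would misalign lines that contain empty fields.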
