How do I compare files on different columns in Unix?

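Problem description:

I want to compare the file names in Today.txt against Main.txt. If a name matches, print all six columns of the matching Main.txt line to a new file, say matched.txt.

For the names that do not match anything in Main.txt, list the file name and time from Today.txt in a new file, say unmatched.txt.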

Main.txt

date  filename   timestamp space count status 
Nov 4 +CHCK01_20161104.txt 06:39 2.15M 17153 on_time 
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24  On_Time 
Nov 4 AR02_20161104.txt 09:31 0.00M 7  On_Time 
Nov 4 AR01_20161104.txt 09:31 0.04M 433  On_Time 

Today.txt

filename  time 
CHCK01_20161104.txt 06:03 
CHCK05_20161104.txt 11:10 
CHCK09_20161104.txt 21:46 
AR01_20161104.txt 09:36 
AR02_20161104.txt 09:36 
ifs01_20161104.txt 21:16 
TRIPS11_20161104.txt 09:16 

Desired output: matched.txt

Nov 4 +CHCK01_20161104.txt 06:39 2.15M 17153 on_time 
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24  On_Time 
Nov 4 AR02_20161104.txt 09:31 0.00M 7  On_Time 
Nov 4 AR01_20161104.txt 09:31 0.04M 433  On_Time 

unmatched.txt

CHCK05_20161104.txt 11:10 
CHCK09_20161104.txt 21:46 
ifs01_20161104.txt 21:16 

Could you please help me with this?

Thanks a lot!

awk to the rescue!

$ awk 'FNR==1{next} 
     NR==FNR{a[$1]=$2; next} 
     $3 in a{print; delete a[$3]} 
      END{for(k in a) print k,a[k] > "unmatched"}' today main > matched 

$ head *matched 

==> matched <== 
Nov 4 CHCK01_20161104.txt 06:39 2.15M 17153 on_time 
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24  On_Time 
Nov 4 AR02_20161104.txt 09:31 0.00M 7  On_Time 
Nov 4 AR01_20161104.txt 09:31 0.04M 433  On_Time 

==> unmatched <== 
ifs01_20161104.txt 21:16 
CHCK09_20161104.txt 21:46 
CHCK05_20161104.txt 11:10 

For tab-delimited output you can set `-v OFS='\t'` – karakfa
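
A small caveat on that option, as a sketch rather than a definitive recipe: awk applies OFS only when it rebuilds a record or prints comma-separated arguments, so a bare `print` of an unmodified line keeps its original spacing. Assigning a field to itself (`$1 = $1`) is a common way to force the rebuild. The sample line is taken from Main.txt; the variable names `line`, `plain` and `tabbed` are only for illustration.

```shell
line='Nov 4 AR01_20161104.txt 09:31 0.04M 433  On_Time'

# Bare print: the record is untouched, so OFS has no visible effect.
plain=$(printf '%s\n' "$line" | awk -v OFS='\t' '{ print }')

# $1 = $1 makes awk rebuild the record, joining the fields with OFS.
tabbed=$(printf '%s\n' "$line" | awk -v OFS='\t' '{ $1 = $1; print }')

printf '%s\n%s\n' "$plain" "$tabbed"
```

Note that the date occupies two fields (`Nov` and `4`), so the rebuilt line has seven tab-separated fields.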


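I have a question for you. I use a plus (+) sign to mark files that are in the in-progress directory, as in the example: `Nov 4 +CHCK01_20161104.txt 06:39 2.15M 17153 on_time`. In-progress files get the plus (+) sign prepended, while the other files keep the same name in Main.txt. I want both the files with the + sign and the other files in my required output (matched). Please suggest how to compare Main.txt and Today.txt to get matched.txt and unmatched.txt. Thanks a lot! – Barcode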
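
One possible way to handle the `+` prefix, offered as a sketch and not as the answerer's own method: strip a leading `+` from the Main.txt file name for the lookup only, and still print the original line. The sample files are copied from the question; the temporary directory and the names `seen` and `key` are assumptions made for illustration.

```shell
# Work in a scratch directory with the sample data from the question.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat > Main.txt <<'EOF'
date  filename   timestamp space count status
Nov 4 +CHCK01_20161104.txt 06:39 2.15M 17153 on_time
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24  On_Time
Nov 4 AR02_20161104.txt 09:31 0.00M 7  On_Time
Nov 4 AR01_20161104.txt 09:31 0.04M 433  On_Time
EOF

cat > Today.txt <<'EOF'
filename  time
CHCK01_20161104.txt 06:03
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
AR01_20161104.txt 09:36
AR02_20161104.txt 09:36
ifs01_20161104.txt 21:16
TRIPS11_20161104.txt 09:16
EOF

awk '
    FNR == 1  { next }                  # skip the header line of each file
    NR == FNR { seen[$1] = $2; next }   # Today.txt: file name -> time
    {
        key = $3
        sub(/^\+/, "", key)             # drop a leading "+" for the lookup only
        if (key in seen) {
            print                       # original Main.txt line, "+" kept
            delete seen[key]
        }
    }
    END { for (k in seen) print k, seen[k] > "unmatched.txt" }
' Today.txt Main.txt > matched.txt
```

The order of the lines in unmatched.txt is not guaranteed, because awk's `for (k in seen)` iterates in an unspecified order.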

With awk, separately for matched and unmatched:

$ awk 'NR==FNR{a[$1]; next} $3 in a{print > "matched.txt"}' Today.txt Main.txt 
$ cat matched.txt 
Nov 4 CHCK01_20161104.txt 06:39 2.15M 17153 on_time 
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24  On_Time 
Nov 4 AR02_20161104.txt 09:31 0.00M 7  On_Time 
Nov 4 AR01_20161104.txt 09:31 0.04M 433  On_Time 

$ awk 'NR==FNR{a[$3]; next} !($1 in a) && FNR>1{print > "unmatched.txt"}' Main.txt Today.txt 
$ cat unmatched.txt 
CHCK05_20161104.txt 11:10 
CHCK09_20161104.txt 21:46 
ifs01_20161104.txt 21:16 
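  • The logic is similar in both commands: the array a is initialized with the required column of the first file passed to awk
  • Then, depending on whether the file name from the second file should or should not be present in a, the line is printed to the required output file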
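
The two commands can also be folded into a single pass. The following is only a sketch under a few assumptions: Main.txt is small enough to keep in memory, the names carry no `+` prefix (as in the output shown in this answer), and it is acceptable for matched.txt to follow the order of Today.txt. The array name `line` and the scratch directory are illustrative.

```shell
# Work in a scratch directory with sample data (no "+" prefix here).
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat > Main.txt <<'EOF'
date  filename   timestamp space count status
Nov 4 CHCK01_20161104.txt 06:39 2.15M 17153 on_time
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24  On_Time
Nov 4 AR02_20161104.txt 09:31 0.00M 7  On_Time
Nov 4 AR01_20161104.txt 09:31 0.04M 433  On_Time
EOF

cat > Today.txt <<'EOF'
filename  time
CHCK01_20161104.txt 06:03
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
AR01_20161104.txt 09:36
AR02_20161104.txt 09:36
ifs01_20161104.txt 21:16
TRIPS11_20161104.txt 09:16
EOF

awk '
    FNR == 1   { next }                   # skip each header line
    NR == FNR  { line[$3] = $0; next }    # Main.txt: file name -> whole line
    $1 in line { print line[$1] > "matched.txt"; next }
               { print > "unmatched.txt" }
' Main.txt Today.txt
```

Reading Main.txt first means each Today.txt line can be routed to one of the two output files as soon as it is read.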


With a combination of grep and awk:

$ grep -Ff <(awk 'NR>1{print $1}' Today.txt) Main.txt 
Nov 4 CHCK01_20161104.txt 06:39 2.15M 17153 on_time 
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24  On_Time 
Nov 4 AR02_20161104.txt 09:31 0.00M 7  On_Time 
Nov 4 AR01_20161104.txt 09:31 0.04M 433  On_Time 

$ grep -vFf <(awk 'NR>1{print $3}' Main.txt) Today.txt | tail -n+2 
CHCK05_20161104.txt 11:10 
CHCK09_20161104.txt 21:46 
ifs01_20161104.txt 21:16 
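
One thing to keep in mind with this variant: `grep -F` looks for each name anywhere in the line, not only in the file-name column. That lets it match the `+`-prefixed line from the question, but a name that happens to be a substring of another name would match as well. A hedged sketch that tightens this with `-w` (whole-word matching, available in GNU and BSD grep) and uses temporary list files in place of process substitution, so it also runs in a plain POSIX shell. The file names `today_names.txt` and `main_names.txt` are made up for illustration.

```shell
# Work in a scratch directory with the sample data from the question.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat > Main.txt <<'EOF'
date  filename   timestamp space count status
Nov 4 +CHCK01_20161104.txt 06:39 2.15M 17153 on_time
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24  On_Time
Nov 4 AR02_20161104.txt 09:31 0.00M 7  On_Time
Nov 4 AR01_20161104.txt 09:31 0.04M 433  On_Time
EOF

cat > Today.txt <<'EOF'
filename  time
CHCK01_20161104.txt 06:03
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
AR01_20161104.txt 09:36
AR02_20161104.txt 09:36
ifs01_20161104.txt 21:16
TRIPS11_20161104.txt 09:16
EOF

# Names from Today.txt, header skipped.
awk 'NR > 1 { print $1 }' Today.txt > today_names.txt

# Names from Main.txt, header skipped, leading "+" removed.
awk 'NR > 1 { sub(/^\+/, "", $3); print $3 }' Main.txt > main_names.txt

# Main.txt lines that contain one of today's names as a whole word.
grep -Fwf today_names.txt Main.txt > matched.txt

# Today.txt lines (header dropped) whose name is not in Main.txt.
tail -n +2 Today.txt | grep -vFwf main_names.txt > unmatched.txt
```

With `-w` the match must be bounded by non-word characters, and `+` counts as one, so `+CHCK01_20161104.txt` still matches the pattern `CHCK01_20161104.txt`.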

Here is an answer that uses the power of pipes.

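tail -n +2 /tmp/today | while read -r a b; do
    # grep appends any matching line of /tmp/main to /tmp/matched;
    # a non-zero exit status means the name was not found
    if ! grep -F "$a" /tmp/main >> /tmp/matched; then
        echo "$a $b"
    fi
done > /tmp/unmatched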

Explanation

Print /tmp/today except for the first line

tail -n +2 /tmp/today 

Read each line into two variables

while read a b 

Search for $a in /tmp/main with grep and append any match to a file

grep -F "$a" /tmp/main >> /tmp/matched

If grep returns a non-zero status, echo $a and $b

echo "$a $b"

Output:

$ cat /tmp/matched
Nov 4 CHCK01_20161104.txt 06:39 2.15M 17153 on_time 
Nov 4 AR01_20161104.txt 09:31 0.04M 433  On_Time 
Nov 4 AR02_20161104.txt 09:31 0.00M 7  On_Time 
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24  On_Time 
$ cat /tmp/unmatched
CHCK05_20161104.txt 11:10 
CHCK09_20161104.txt 21:46 
ifs01_20161104.txt 21:16 