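How to compare files on different columns in Unix?
Problem description:
I want to compare the file names in Today.txt and Main.txt. If a name matches, print all 6 columns of the matching Main.txt line to a new file, say matched.txt.
For the files that do not match anything in Main.txt, list the filename and time from Today.txt in a new file, say unmatched.txt.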
Main.txt
date filename timestamp space count status
Nov 4 +CHCK01_20161104.txt 06:39 2.15M 17153 on_time
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24 On_Time
Nov 4 AR02_20161104.txt 09:31 0.00M 7 On_Time
Nov 4 AR01_20161104.txt 09:31 0.04M 433 On_Time
Today.txt
filename time
CHCK01_20161104.txt 06:03
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
AR01_20161104.txt 09:36
AR02_20161104.txt 09:36
ifs01_20161104.txt 21:16
TRIPS11_20161104.txt 09:16
Desired output: matched.txt
Nov 4 +CHCK01_20161104.txt 06:39 2.15M 17153 on_time
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24 On_Time
Nov 4 AR02_20161104.txt 09:31 0.00M 7 On_Time
Nov 4 AR01_20161104.txt 09:31 0.04M 433 On_Time
unmatched.txt
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
ifs01_20161104.txt 21:16
Could you please help me with this?
Thanks a lot!
Answer
awk to the rescue!
$ awk 'FNR==1{next}
NR==FNR{a[$1]=$2; next}
$3 in a{print; delete a[$3]}
END{for(k in a) print k,a[k] > "unmatched"}' today main > matched
$ head *matched
==> matched <==
Nov 4 CHCK01_20161104.txt 06:39 2.15M 17153 on_time
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24 On_Time
Nov 4 AR02_20161104.txt 09:31 0.00M 7 On_Time
Nov 4 AR01_20161104.txt 09:31 0.04M 433 On_Time
==> unmatched <==
ifs01_20161104.txt 21:16
CHCK09_20161104.txt 21:46
CHCK05_20161104.txt 11:10
Answer
With awk, run separately for matched and unmatched:
$ awk 'NR==FNR{a[$1]; next} $3 in a{print > "matched.txt"}' Today.txt Main.txt
$ cat matched.txt
Nov 4 CHCK01_20161104.txt 06:39 2.15M 17153 on_time
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24 On_Time
Nov 4 AR02_20161104.txt 09:31 0.00M 7 On_Time
Nov 4 AR01_20161104.txt 09:31 0.04M 433 On_Time
$ awk 'NR==FNR{a[$3]; next} !($1 in a) && FNR>1{print > "unmatched.txt"}' Main.txt Today.txt
$ cat unmatched.txt
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
ifs01_20161104.txt 21:16
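- The logic is the same in both commands: first fill the array a with the required column (the filename) from the first file passed to awk
- Then, for each line of the second file, print it to the required output file depending on whether its filename should or should not be present in a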
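The two steps above can be tried on a cut-down copy of the data. This is only a minimal sketch: the scratch directory and the trimmed sample files are assumptions made for illustration, and the two awk commands are the ones shown in this answer.

```shell
# Work in a scratch directory so no real file is overwritten
# (assumption: mktemp is available on this system).
cd "$(mktemp -d)" || exit 1

# Trimmed sample inputs (illustrative, not the full files from the question).
cat > Today.txt <<'EOF'
filename time
CHCK05_20161104.txt 11:10
AR01_20161104.txt 09:36
EOF

cat > Main.txt <<'EOF'
date filename timestamp space count status
Nov 4 AR01_20161104.txt 09:31 0.04M 433 On_Time
Nov 4 AR02_20161104.txt 09:31 0.00M 7 On_Time
EOF

# NR==FNR is true only while awk reads the first file on the command line,
# so the first block fills the array and the second pattern tests the other file.
awk 'NR==FNR{a[$1]; next} $3 in a{print > "matched.txt"}' Today.txt Main.txt

# Same idea with the roles swapped; FNR>1 skips the header line of Today.txt.
awk 'NR==FNR{a[$3]; next} !($1 in a) && FNR>1{print > "unmatched.txt"}' Main.txt Today.txt

cat matched.txt unmatched.txt
```

Note that the filename is field 3 in Main.txt because the date "Nov 4" occupies two whitespace-separated fields.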
Combining grep and awk:
$ grep -Ff <(awk 'NR>1{print $1}' Today.txt) Main.txt
Nov 4 CHCK01_20161104.txt 06:39 2.15M 17153 on_time
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24 On_Time
Nov 4 AR02_20161104.txt 09:31 0.00M 7 On_Time
Nov 4 AR01_20161104.txt 09:31 0.04M 433 On_Time
$ grep -vFf <(awk 'NR>1{print $3}' Main.txt) Today.txt | tail -n+2
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
ifs01_20161104.txt 21:16
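One caveat with the grep approach: -F matches each name as a fixed string anywhere in the line, so a name that is a suffix of a longer name would also match. A possible tightening, sketched below, is to add -w so that the match must be delimited by non-word characters. This assumes a grep that supports -w (GNU and BSD grep do; it is not required by POSIX). The scratch directory, the file names.txt and the XAR01 line are made up for illustration, and a temporary file is used instead of <( ) so the sketch does not depend on bash.

```shell
# Scratch directory and sample data are illustrative assumptions.
cd "$(mktemp -d)" || exit 1

cat > Today.txt <<'EOF'
filename time
AR01_20161104.txt 09:36
EOF

cat > Main.txt <<'EOF'
date filename timestamp space count status
Nov 4 XAR01_20161104.txt 09:30 0.01M 5 On_Time
Nov 4 AR01_20161104.txt 09:31 0.04M 433 On_Time
EOF

# Collect the filenames from Today.txt, skipping its header line.
awk 'NR>1{print $1}' Today.txt > names.txt

# Plain -F would also print the XAR01 line; -w rejects it because the
# match would be preceded by the word character X.
grep -wFf names.txt Main.txt > matched.txt

cat matched.txt
```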
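Answer
Here is an answer that uses the power of pipes.
tail -n +2 /tmp/today | while read -r a b; do \
if ! grep -F -- "$a" /tmp/main >> /tmp/matched; then \
echo "$a $b"; \
fi; \
done > /tmp/unmatched
Explanation
Print /tmp/today except for its first line
tail -n +2 /tmp/today
Read two variables
while read -r a b
grep for $a as a fixed string in /tmp/main and append any matching lines to a file
grep -F -- "$a" /tmp/main >> /tmp/matched
If grep returns non-zero (no match), echo $a and $b
echo "$a $b"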
Output:
# cat /tmp/matched
Nov 4 CHCK01_20161104.txt 06:39 2.15M 17153 on_time
Nov 4 AR01_20161104.txt 09:31 0.04M 433 On_Time
Nov 4 AR02_20161104.txt 09:31 0.00M 7 On_Time
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24 On_Time
# cat /tmp/unmatched
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
ifs01_20161104.txt 21:16
For tab-separated output you can set -v OFS='\t' – karakfa
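A minimal sketch of what the OFS suggestion could look like when applied to the first awk answer. One detail worth knowing: OFS takes effect for print k, a[k] straight away, but a bare print of an unmodified line keeps its original spacing, so the sketch assigns $1=$1 to make awk rebuild the line with the new separator. The scratch directory and the trimmed sample files are assumptions for illustration.

```shell
# Scratch directory and sample data are illustrative assumptions.
cd "$(mktemp -d)" || exit 1

cat > Today.txt <<'EOF'
filename time
CHCK05_20161104.txt 11:10
AR01_20161104.txt 09:36
EOF

cat > Main.txt <<'EOF'
date filename timestamp space count status
Nov 4 AR01_20161104.txt 09:31 0.04M 433 On_Time
EOF

awk -v OFS='\t' 'FNR==1{next}            # skip the header line of each file
NR==FNR{a[$1]=$2; next}                  # first file: remember filename -> time
$3 in a{delete a[$3]; $1=$1; print > "matched.txt"}   # $1=$1 rebuilds $0 using OFS
END{for(k in a) print k, a[k] > "unmatched.txt"}' Today.txt Main.txt

cat matched.txt unmatched.txt
```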
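I have a question for you. I mark files that are still in the in-progress directory with a plus (+) sign, as in the example: Nov 4 +CHCK01_20161104.txt 06:39 2.15M 17153 on_time. Files in progress get the plus (+) sign prepended, while the other files have the same name in Main.txt as in Today.txt. I want both the files with the + sign and the other files in my desired output (matched). Please suggest how to compare Main.txt and Today.txt to get matched.txt and unmatched.txt. Thanks a lot! – Barcode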
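One possible way to handle the + prefix asked about in the last comment is to strip a leading + from a copy of the filename field before the lookup, while still printing the Main.txt line unchanged. This is a sketch and not part of the original answers; the scratch directory, the trimmed sample data and the variable name key are illustrative assumptions.

```shell
# Scratch directory and sample data are illustrative assumptions.
cd "$(mktemp -d)" || exit 1

cat > Today.txt <<'EOF'
filename time
CHCK01_20161104.txt 06:03
CHCK05_20161104.txt 11:10
AR01_20161104.txt 09:36
EOF

cat > Main.txt <<'EOF'
date filename timestamp space count status
Nov 4 +CHCK01_20161104.txt 06:39 2.15M 17153 on_time
Nov 4 AR01_20161104.txt 09:31 0.04M 433 On_Time
EOF

awk 'FNR==1{next}                        # skip the header line of each file
NR==FNR{a[$1]=$2; next}                  # Today.txt: remember filename -> time
{key=$3; sub(/^[+]/, "", key)}           # Main.txt: lookup key without a leading +
key in a{print > "matched.txt"; delete a[key]}   # the line itself keeps its + sign
END{for(k in a) print k, a[k] > "unmatched.txt"}' Today.txt Main.txt

cat matched.txt unmatched.txt
```

Because only the copy in key is modified, +CHCK01_20161104.txt appears in matched.txt with its + sign, and CHCK01_20161104.txt no longer ends up in unmatched.txt.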