awk command not working correctly (wrong output); is there a sed solution?

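Problem description:

I tried this small script on my data set, and for some reason I am not getting the desired output. Could someone take a look and perhaps work out what is wrong? A sed solution would also be welcome.

Script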

awk -v RS= -F '<connection name="|<hostPort>' ' 
{ 
sub(/".*/, "", $2) 
split($3, tokens, /[:<]/) 
printf "%-6s %s %s\n", $2, tokens[1], tokens[2] 
} 
'

Input

<hostPort>srv1:33333</hostPort> 
<hostPort>srv2:33333</hostPort> 
<connection name="boing_ny__Primary__" transport="tcp"> 
<hostPort>srv1:33333</hostPort> 
<connection name="boing_ny__Backup__" transport="tcp"> 
<hostPort>srv2:33333</hostPort> 
<connection name="boy_ny__Primary__" transport="tcp"> 
<hostPort>srv1:6666</hostPort> 
<connection name="boy_ny__Backup__" transport="tcp"> 
<hostPort>srv2:6666</hostPort> 
<connection name="song_ny__Primary__" transport="tcp"> 
<hostPort>srv1:55555</hostPort> 
<connection name="song_ny__Backup__" transport="tcp"> 
<hostPort>srv2:55555</hostPort> 
<connection name="bob_ny__Primary__" transport="tcp"> 
<hostPort>srv3:33333</hostPort> 
<connection name="bob_ny__Backup__" transport="tcp"> 
<hostPort>srv4:33333</hostPort> 
<hostPort>srv1:4444</hostPort> 
<hostPort>srv2:4444</hostPort> 
<hostPort>srv1:4444</hostPort>

Current output

srv1:33333</hostPort> 
srv2 33333

Expected output

boing_ny__Primary__ srv1 33333 
boing_ny__Backup__ srv2 33333 
boy_ny__Primary__  srv1 6666 
boy_ny__Backup__ srv2 6666 
song_ny__Primary__ srv1 55555 
song_ny__Backup__ srv2 55555 
bob_ny__Primary__ srv3 33333 
bob_ny__Backup__ srv4 33333
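
One possible reading of the current output above: with RS set to the empty string, awk works in paragraph mode and splits records on blank lines, so an input without blank lines becomes a single record and the action runs only once. The sketch below checks this on a hypothetical four-line sample (not the asker's full file); the variable names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample: four lines, no blank line between them.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<hostPort>srv1:33333</hostPort>
<connection name="boing_ny__Primary__" transport="tcp">
<hostPort>srv1:33333</hostPort>
<hostPort>srv1:4444</hostPort>
EOF

# Paragraph mode (RS is empty): records are separated by blank lines.
paragraph_records=$(awk -v RS= 'END { print NR }' "$sample")

# Default mode: one record per line.
line_records=$(awk 'END { print NR }' "$sample")

echo "records with RS= : $paragraph_records"
echo "records by line  : $line_records"
rm -f "$sample"
```

If the first count is 1, the printf in the script runs once, on fields 2 and 3 of the whole file, which would match the two-line output shown.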

Use an XML parser, not awk/sed. – anubhava

Up to you to test/use/discard: awk -v RS='' '{$1=$1; match($0, /connection name="([^"]+).* (.*)/, a); gsub(/:/, " ", a[2]);} length(a[1]){print a[1], a[2]}' –

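Please describe in prose what your small script is meant to do. That will help readers understand the code (or the mistakes in it) and may help you find the problem yourself. Look at your code and its misbehaviour from several angles. It is similar to https://ericlippert.com/2014/03/05/how-to-debug-small-programs/ – Yunnosch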

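Answer

As the comments point out, the right way to do this job is with a proper parser.

As an experiment, this GNU awk command seems to do the job on the input data provided, but it is not guaranteed to be a reliable solution, because the XML data in your file may vary.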

awk '/connection name=/{a=$0;getline; \
print gensub(/(.*connection name=["])(.[^"]*)(["].*)/,"\\2","g",a), \
gensub(/(<.*>)(.[^:]*)([:])(.[^<]*)(<[/].*>)/,"\\2 \\4","g",$0)}' file1

#Output: 
boing_ny__Primary__ srv1 33333 
boing_ny__Backup__ srv2 33333 
boy_ny__Primary__ srv1 6666 
boy_ny__Backup__ srv2 6666 
song_ny__Primary__ srv1 55555 
song_ny__Backup__ srv2 55555 
bob_ny__Primary__ srv3 33333 
bob_ny__Backup__ srv4 33333

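How it works:
When we find a record containing /connection name=/, we store that record ($0) in a variable a and fetch the next line with getline. We then print the results of two sed-like substitutions made with gensub: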

gensub(/(.*connection name=["])(.[^"]*)(["].*)/,"\\2","g",a)
#(.*connection name=["])  all chars up to first "
#(.[^"]*)                 after " and up to the next "
#(["].*)                  after last " up to the end of $0
#"\\2"                    replace with group 2
#"g"                      global replacement
#a                        target = a = previous record

#With a = <connection name="boing_ny__Primary__" transport="tcp"> 
#Above gensub will return group2 = boing_ny__Primary__ 


gensub(/(<.*>)(.[^:]*)([:])(.[^<]*)(<[/].*>)/,"\\2 \\4","g",$0)
#(<.*>)       all chars between < >
#(.[^:]*)     all chars up to :
#([:])        literal :
#(.[^<]*)     the part after : and before <
#(<[/].*>)    the last < > part
#"\\2 \\4"    use group 2 and 4
#"g"          global replacement
#$0           target = $0 current record

#With $0 = <hostPort>srv2:33333</hostPort> 
#Above gensub will return group 2 = srv2 and group 4 = 33333 --> srv2 33333

The general gawk gensub syntax is gensub(regexp, replacement, how [, target]). The function returns the modified string and leaves the target itself unchanged; see the gensub entry in the gawk manual.
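
To make the return-value behaviour concrete, here is a minimal sketch. It assumes GNU awk is installed under the name gawk, and it uses a simpler regex than the answer's (two groups, host and port) purely for illustration.

```shell
#!/bin/sh
# Illustrative input line, taken from the sample data in the question.
line='<hostPort>srv2:33333</hostPort>'

# gensub() is a GNU awk extension, so check for gawk first.
if command -v gawk >/dev/null 2>&1; then
    out=$(printf '%s\n' "$line" | gawk '{
        # how = 1: replace only the first match; target defaults to $0
        pair = gensub(/<hostPort>([^:]*):([^<]*)<\/hostPort>/, "\\1 \\2", 1)
        print pair   # the returned, modified copy
        print $0     # the record itself is left untouched
    }')
    printf '%s\n' "$out"
else
    echo "gawk not found; gensub() is a GNU awk extension" >&2
fi
```

With gawk available, the first printed line is the host and port separated by a space, and the second is the original record, unchanged.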

@theuniverseisflat Updated again. The earlier approach was wrong; it should be fine now. –

+ve, nice explanation :) – RavinderSingh13

Answer

Try:

awk '/connection/{match($0,/"[^"]*/);VAL=substr($0,RSTART+1,RLENGTH-1);next} /hostPort/ && VAL{match($0,/>.*</);print VAL FS substr($0,RSTART+1,RLENGTH-2)}' Input_file

I will add an explanation shortly.

EDIT2: Here is an explanation of the same command.

awk '/connection/{             #### Look for a line that contains the string connection.
         match($0,/"[^"]*/);       #### match() finds the text from the first " up to, but not including, the next ".
         VAL=substr($0,RSTART+1,RLENGTH-1);   #### VAL is the matched text without its leading ". RSTART and RLENGTH are built-in awk variables that match() sets: RSTART is the index where the match starts, RLENGTH is its length.
         next           #### Skip the remaining statements for this line.
       }
    /hostPort/ && VAL{            #### Two conditions: the line contains the string hostPort and VAL is not empty. If both hold, run the actions below.
         match($0,/>.*</);        #### match() finds everything from the first > to the last <, which surrounds the srv value.
         print VAL FS substr($0,RSTART+1,RLENGTH-2) #### Print VAL (set in the previous block), then the matched text without the enclosing > and <.
        }
    ' Input_file              #### Name of the input file.
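
As a hedged variant of the command explained above (not the answerer's own code): the expected output in the question separates host and port with a space and omits hostPort lines that do not directly follow a connection line. The sketch below, run on a hypothetical shortened sample, clears the stored name after each use and splits the value on the colon. The names name and hp are chosen here for illustration.

```shell
#!/bin/sh
# Hypothetical shortened sample with stray hostPort lines at both ends.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<hostPort>srv1:33333</hostPort>
<connection name="boing_ny__Primary__" transport="tcp">
<hostPort>srv1:33333</hostPort>
<connection name="bob_ny__Backup__" transport="tcp">
<hostPort>srv4:33333</hostPort>
<hostPort>srv1:4444</hostPort>
EOF

out=$(awk '
/connection name=/ {
    match($0, /"[^"]*/)                         # opening quote plus the name
    name = substr($0, RSTART + 1, RLENGTH - 1)  # drop the opening quote
    next
}
/hostPort/ && name != "" {
    match($0, />[^<]*</)                        # ">host:port<"
    split(substr($0, RSTART + 1, RLENGTH - 2), hp, ":")
    print name, hp[1], hp[2]
    name = ""                                   # a name labels only the next hostPort
}
' "$sample")

printf '%s\n' "$out"
rm -f "$sample"
```

Resetting name is the design choice that keeps the trailing hostPort lines out of the result; without it they would be printed under the last connection name seen.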

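Please use line breaks to improve readability so that less scrolling is needed. If the only difference between the two versions is the comments, why not delete the first (and second) version? – Yunnosch

@Yunnosch: They are kept separate because the commented version is not meant to be executed; it is only there to aid understanding. I usually say so when I post this kind of answer, but forgot to mention it here. – RavinderSingh13

Ah. I did not realize that awk has no comment feature. That is certainly a good reason. Thanks for the explanation. – Yunnosch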

Answer

Robust: cat input | awk -F'\"|>' '{print $2}' | awk -F'<' '{print $1}' | sed -z 's/_\n/_ /g' | grep -v ^srv | tr ":" " "

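You could cut a lot out of that command. – RavinderSingh13

plz let me know – moni

awk, sed and grep can all read files by themselves, so this is a UUOC (useless use of cat). To begin with, the whole job can be done with awk or sed alone. – RavinderSingh13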
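
Since the question also asks for sed, here is a minimal sed-only sketch of that idea, with no cat and no pipeline. It assumes, as in the sample input, that each connection line is immediately followed by its hostPort line; the sample file below is a hypothetical shortened version.

```shell
#!/bin/sh
# Hypothetical shortened sample with stray hostPort lines at both ends.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<hostPort>srv1:33333</hostPort>
<connection name="boy_ny__Primary__" transport="tcp">
<hostPort>srv1:6666</hostPort>
<connection name="boy_ny__Backup__" transport="tcp">
<hostPort>srv2:6666</hostPort>
<hostPort>srv2:4444</hostPort>
EOF

# On a connection line: reduce the line to the name, append the next
# line with N, then rewrite the appended hostPort element as
# " host port" and print. Other lines are suppressed by -n.
out=$(sed -n '/<connection name="/{
s/.*<connection name="\([^"]*\)".*/\1/
N
s/\n *<hostPort>\([^:]*\):\([^<]*\)<\/hostPort>.*/ \1 \2/p
}' "$sample")

printf '%s\n' "$out"
rm -f "$sample"
```

If a connection line is followed by something other than a hostPort line, the second substitution does not match and nothing is printed for that pair, so this remains a sketch for input shaped like the sample rather than a general XML solution.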

Answer

$ awk -F'[":<>]' '/hostPort/{if (n!="") print n, $3, $4; n=""; next} {n=$3}' file 
boing_ny__Primary__ srv1 33333 
boing_ny__Backup__ srv2 33333 
boy_ny__Primary__ srv1 6666 
boy_ny__Backup__ srv2 6666 
song_ny__Primary__ srv1 55555 
song_ny__Backup__ srv2 55555 
bob_ny__Primary__ srv3 33333 
bob_ny__Backup__ srv4 33333
