Iterating over two external file lists and executing commands in a bash script

Question:

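I am creating a script that takes filelist1 (a list of tar files) and filelist2 (a list of directories). I need to loop through these file lists and mv the first file in filelist1 into the first directory in filelist2. Once it is there, I will extract it and perform other activities on the files in that folder. I am trying to automate this because I will have 130-plus tar files every day, each containing 75 to 200 files that must be processed. Below is my work-in-progress (WIP) script: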

############################################################################# 
############################################################################# 
# 
# Incremental load script v1 
# Created 02/09/2015 NHR 
# 
############################################################################# 
############################################################################# 

# 
# Clean up before running 
# 
# "/u02/hdfs_staging/ios/incremental/TOPACTR_DeltaFiles" 
# 


if [ -f filelist1 ] ; then 
    rm filelist1 
fi 

if [ -f filelist2 ] ; then 
    rm filelist2 
fi 

# 
# Create filelist containing name of files parsed for dir's loaded from kdwxxxx 
# 
for i in *tar 
    do 
     echo "$i" | rev | cut -d"." -f2 | rev >> filelist1 
    done 

# 
# Create work dir's for extracting tar files into for each date 
# 
while IFS= read -r file 
    do 
     [ ! -d "$file" ] && mkdir "$file" 
    done < "/u02/hdfs_staging/ios/incremental/TOPACTR_DeltaFiles/filelist1" 

# 
# Create filelist2 containing name of files parsed to copy 
# tar files to dir's for extraction 
# 
shopt -s nullglob     # Bash extension, so that empty glob matches will work 
    for file in ./*.tar ; do  # Use this, NOT "for file in *" 
     echo "$file" >> filelist2 
    done 

# 
# Copy and Decompress tar files in these new dir's 
# HERE IS WHERE I NEED TO LOOP THROUGH THE FILELIST1 AND FILELIST2 
# AND PERFORM ADDITIONAL COMMANDS 
# 



# 
# Execute hive load to external table script to load incremental files to ios_incremental. 
# The ios_incremental database tables for these files is in place. 
# 


#hive -e CREATE EXTERNAL TABLE $filelist 


# 
# Run hive SQL script to add changed files to ios_staging tables. 
# This will be called from a hql script file and will require variables 
# for each table involved. This view combines record sets from both the 
# Base (base_table) and Change (incremental_table) tables and is reduced 
# only to the most recent records for each unique .id. It is defined as 
# follows: 
# 

#hive -e 
# CREATE VIEW reconcile_view AS 
# SELECT t1.* FROM 
# (SELECT * FROM base_table 
#   UNION ALL 
#   SELECT * FROM incremental_table) t1 
# JOIN 
#  (SELECT id, max(modified_date) max_modified FROM 
#   (SELECT * FROM base_table 
#   UNION ALL 
#   SELECT * FROM incremental_table) t2 
#  GROUP BY id) s 
# ON t1.id = s.id AND t1.modified_date = s.max_modified; 
# 


# 
# Copy updated ios_staging data to update ios_prod db 
# 



# 
# Clean and Archive files to get ready for next incremental load 
#

So, what is your question? – leigero 2015-02-10 17:30:51

Answer

I think what you are looking for is a way to iterate over two lists at the same time.

Here is one way to do it. It assumes the file names contain neither newlines nor colons (the colon is easy to change to some other symbol):

paste -d: filelist1 filelist2 | while IFS=: read -r file1 file2; do 
    some_command "$file1" "$file2" 
    # ... 
done
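To make the delimiter swap concrete, here is a minimal sketch of the same loop using `|` instead of a colon. The sample list contents and the `matched` / `unmatched` names are made up for illustration. Two things worth noting: paste pads the shorter list with empty fields, so an empty field means the lists do not line up, and feeding the loop through process substitution (rather than a pipe) keeps the loop in the current shell so variables set inside it survive.

```shell
#!/usr/bin/env bash
# Sketch under assumptions: sample data is invented, '|' is assumed not to
# appear in any file name.
workdir=$(mktemp -d)
printf '%s\n' 'a:1.tar' 'b.tar' 'c.tar' > "$workdir/filelist1"
printf '%s\n' 'dir_a' 'dir_b' > "$workdir/filelist2"   # one entry short on purpose

matched=()
unmatched=0
while IFS='|' read -r file1 file2; do
    # paste leaves a field empty when one list is shorter than the other
    if [[ -z $file1 || -z $file2 ]]; then
        unmatched=$((unmatched + 1))
        continue
    fi
    matched+=("$file1 => $file2")   # placeholder for the real command
done < <(paste -d'|' "$workdir/filelist1" "$workdir/filelist2")

printf '%s\n' "${matched[@]}"
echo "lines without a partner: $unmatched"
rm -rf "$workdir"
```

A non-whitespace delimiter is the safer choice here: with a tab or space in IFS, read collapses leading empty fields, so a missing first entry would go unnoticed.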

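A more defensive solution is to put the lists into arrays instead of files, and then iterate with a for loop. (I have left out building the arrays; there are plenty of examples on SO):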

for ((i=0;i<${#filearray1[@]};++i)); do 
    file1="${filearray1[i]}" 
    file2="${filearray2[i]}" 
    some_command "$file1" "$file2" 
    # ... 
done
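Since the array setup is left out above, here is one possible sketch of it, assuming bash 4 or later (for `mapfile`) and that the lists already exist as one-entry-per-line files. The sample contents and the `pairs` array are only there to make the example self-contained; they are not part of the original answer.

```shell
#!/usr/bin/env bash
# Sketch under assumptions: requires bash 4+ for mapfile; sample data is invented.
workdir=$(mktemp -d)
printf '%s\n' 'a.tar' 'b.tar' > "$workdir/filelist1"
printf '%s\n' 'dir_a' 'dir_b' > "$workdir/filelist2"

# -t strips the trailing newline from each line as it is stored
mapfile -t filearray1 < "$workdir/filelist1"
mapfile -t filearray2 < "$workdir/filelist2"

# Refuse to continue if the two lists do not have the same length
if (( ${#filearray1[@]} != ${#filearray2[@]} )); then
    echo "list lengths differ" >&2
    exit 1
fi

pairs=()
for ((i = 0; i < ${#filearray1[@]}; ++i)); do
    # Placeholder for the real work on each pair
    pairs+=("${filearray1[i]} -> ${filearray2[i]}")
done

printf '%s\n' "${pairs[@]}"
rm -rf "$workdir"
```

If the lists never need to live on disk, the first array can also be filled straight from a glob, e.g. `filearray1=(./*.tar)` with `nullglob` set, which copes with any file name.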

Answer

Maybe something like this (error checking is conspicuously absent):

exec 3< filelist1 4< filelist2 

while IFS= read -r -u3 tarfile   # IFS= and -r keep whitespace and backslashes intact
do 
    IFS= read -r -u4 destination 
    mv "${tarfile}" "${destination}"/. 
    (cd "${destination}" 
    # ... other stuff 
) # subshell is to avoid having to cd back where you came from 
done 

exec 3<&- 4<&-
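For what the missing error checking might look like, here is a rough sketch of the same two-descriptor idea wrapped in a function. The name `move_and_extract`, the messages, and the choice to skip missing tar files rather than abort are all my assumptions, not something from the question. It expects filelist1 to hold paths to tar files and filelist2 to hold destination directories, one per line.

```shell
#!/usr/bin/env bash
# Sketch only: move_and_extract is a hypothetical helper name.
move_and_extract() {
    local list1=$1 list2=$2 tarfile destination
    local failures=0

    # Both list files must be readable before we start
    [[ -r $list1 && -r $list2 ]] || { echo "cannot read list files" >&2; return 2; }

    while IFS= read -r -u3 tarfile; do
        # Stop if the second list runs out before the first
        if ! IFS= read -r -u4 destination; then
            echo "no destination left for $tarfile" >&2
            return 1
        fi
        if [[ ! -f $tarfile ]]; then
            echo "skipping missing file: $tarfile" >&2
            failures=$((failures + 1))
            continue
        fi
        # Create the directory, move the archive in, then unpack it there
        mkdir -p -- "$destination" &&
            mv -- "$tarfile" "$destination"/ &&
            tar -xf "$destination/${tarfile##*/}" -C "$destination" ||
            failures=$((failures + 1))
    done 3< "$list1" 4< "$list2"

    # Succeed only if every pair was processed cleanly
    (( failures == 0 ))
}
```

Called as `move_and_extract filelist1 filelist2`, it returns non-zero if any pair failed. Attaching the redirections to the loop itself means descriptors 3 and 4 are closed automatically when the loop ends, so the explicit `exec` lines are not needed in this variant.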
