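Iterate through two external file lists and execute commands in a bash script
Problem description:
I am creating a script that works with filelist1 (a list of tar files) and filelist2 (a list of directories). I need to iterate over/read these file lists and mv the first file in filelist1 into the first directory in filelist2. Once it is there, I will extract it and perform other activities on the files in that folder. I am trying to automate this because I will have 130-plus tar files every day, each containing 75 to 200 files that must be processed. Below is my work-in-progress (WIP) script: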
#############################################################################
#############################################################################
#
# Incremental load script v1
# Created 02/09/2015 NHR
#
#############################################################################
#############################################################################
#
# Clean up before running
#
# "/u02/hdfs_staging/ios/incremental/TOPACTR_DeltaFiles"
#
if [ -f filelist1 ] ; then
rm filelist1
fi
if [ -f filelist2 ] ; then
rm filelist2
fi
#
# Create filelist containing name of files parsed for dir's loaded from kdwxxxx
#
for i in *tar
do
echo "$i" | rev | cut -d"." -f2 | rev >> filelist1
done
#
# Create work dir's for extracting tar files into for each date
#
while IFS= read -r file
do
[ ! -d "$file" ] && mkdir "$file"
done < "/u02/hdfs_staging/ios/incremental/TOPACTR_DeltaFiles/filelist1"
#
# Create filelist2 containing name of files parsed to copy
# tar files to dir's for extraction
#
shopt -s nullglob # Bash extension, so that empty glob matches will work
for file in ./*.tar ; do # Use this, NOT "for file in *"
echo "$file" >> filelist2
done
#
# Copy and Decompress tar files in these new dir's
# HERE IS WHERE I NEED TO LOOP THROUGH THE FILELIST1 AND FILELIST2
# AND PERFORM ADDITIONAL COMMANDS
#
#
# Execute hive load to external table script to load incremental files to ios_incremental.
# The ios_incremental database tables for these files is in place.
#
#hive -e CREATE EXTERNAL TABLE $filelist
#
# Run hive SQL script to add changed files to ios_staging tables.
# This will be called from a hql script file and will require variables
# for each table involved. This view combines record sets from both the
# Base (base_table) and Change (incremental_table) tables and is reduced
# only to the most recent records for each unique .id. It is defined as
# follows:
#
#hive -e
# CREATE VIEW reconcile_view AS
# SELECT t1.* FROM
# (SELECT * FROM base_table
# UNION ALL
# SELECT * FROM incremental_table) t1
# JOIN
# (SELECT id, max(modified_date) max_modified FROM
# (SELECT * FROM base_table
# UNION ALL
# SELECT * FROM incremental_table) t2
# GROUP BY id) s
# ON t1.id = s.id AND t1.modified_date = s.max_modified;
#
#
# Copy updated ios_staging data to update ios_prod db
#
#
# Clean and Archive files to get ready for next incremental load
#
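Answer
I think what you are looking for is a way to iterate through two lists simultaneously.
Here is one way to do it. It assumes the file names contain neither newlines nor colons (it is easy to change the colon to some other symbol):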
paste -d: filelist1 filelist2 | while IFS=: read -r file1 file2; do
some_command "$file1" "$file2"
# ...
done
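To see what the loop actually receives, here is a minimal sketch with made-up file and directory names (everything under the temporary directory is sample data, not from the question). It also feeds the loop from a here-document rather than a pipe: in bash, a `while` loop on the receiving end of a pipe normally runs in a subshell, so variables set inside it (such as a counter) are lost once the loop ends.

```bash
#!/usr/bin/env bash
# Sketch only: sample names are invented for illustration.
workdir=$(mktemp -d) || exit 1
printf '%s\n' 'one.tar' 'two words.tar' > "$workdir/filelist1"
printf '%s\n' 'dir_one' 'dir two' > "$workdir/filelist2"

# paste joins line N of each file, separated by the chosen delimiter.
joined=$(paste -d: "$workdir/filelist1" "$workdir/filelist2")
printf '%s\n' "$joined"

# Reading from a here-document keeps the loop in the current shell,
# so "count" is still set after the loop finishes.
count=0
while IFS=: read -r file1 file2; do
    printf 'would move [%s] into [%s]\n' "$file1" "$file2"
    count=$((count + 1))
done <<EOF
$joined
EOF

echo "pairs seen: $count"
rm -rf "$workdir"
```

Spaces in names survive because only the colon is in IFS during the read.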
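A more defensive solution is to put the lists into arrays instead of files and then iterate with a for loop. (I have left out the setup of the arrays; there are plenty of examples on SO):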
for ((i=0;i<${#filearray1[@]};++i)); do
file1="${filearray1[i]}"
file2="${filearray2[i]}"
some_command "$file1" "$file2"
# ...
done
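For the array setup that was left out, one possible sketch uses the `mapfile` builtin, which assumes bash 4 or newer (the stock bash 3.2 on macOS does not have it). The sample names and the length check are my additions for illustration; `filearray1` and `filearray2` match the names used in the loop above.

```bash
#!/usr/bin/env bash
# Sketch only: sample names are invented for illustration.
workdir=$(mktemp -d) || exit 1
printf '%s\n' 'a.tar' 'b c.tar' > "$workdir/filelist1"
printf '%s\n' 'dir_a' 'dir b c' > "$workdir/filelist2"

# mapfile reads one line per array element; -t strips the trailing newline.
mapfile -t filearray1 < "$workdir/filelist1"
mapfile -t filearray2 < "$workdir/filelist2"

# Refuse to pair entries if the two lists differ in length.
if (( ${#filearray1[@]} != ${#filearray2[@]} )); then
    echo "list lengths differ" >&2
    exit 1
fi

pairs=()
for ((i = 0; i < ${#filearray1[@]}; ++i)); do
    # Placeholder for the real work, e.g. mv "${filearray1[i]}" "${filearray2[i]}"/
    pairs+=("${filearray1[i]} -> ${filearray2[i]}")
done

printf '%s\n' "${pairs[@]}"
rm -rf "$workdir"
```

Checking the lengths up front avoids silently pairing a file with an empty directory name when one list is shorter than the other.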
Answer
Maybe something like this (error checking is conspicuously absent):
exec 3< filelist1 4< filelist2
while read -u3 tarfile
do
read -u4 destination
mv "${tarfile}" "${destination}"/.
(cd "${destination}"
# ... other stuff
) # subshell is to avoid having to cd back where you came from
done
exec 3<&- 4<&-
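If some error checking is wanted, the same two-descriptor loop could be sketched roughly as follows. This is only an illustration under assumptions: the tar files and directory names are generated sample data, `tar -xf` stands in for the "other stuff", and `read -r ... <&3` is used in place of `read -u3` so that backslashes in names are kept literally.

```bash
#!/usr/bin/env bash
# Sketch only: builds its own sample tar files in a temporary directory.
set -u
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Create two tiny tar files and the two lists that pair them with directories.
for name in day1 day2; do
    echo "payload for $name" > "$name.txt"
    tar -cf "$name.tar" "$name.txt" && rm "$name.txt"
    echo "$name.tar" >> filelist1
    echo "$name" >> filelist2
done

exec 3< filelist1 4< filelist2
processed=0
while IFS= read -r tarfile <&3; do
    # Stop if the directory list runs out before the tar list does.
    if ! IFS= read -r destination <&4; then
        echo "filelist2 is shorter than filelist1" >&2
        break
    fi
    mkdir -p "$destination" || continue
    mv -- "$tarfile" "$destination"/ || continue
    (
        # The subshell keeps the cd local to this iteration.
        cd "$destination" || exit 1
        tar -xf "$tarfile"
    ) || { echo "failed to extract $tarfile" >&2; continue; }
    processed=$((processed + 1))
done
exec 3<&- 4<&-

echo "processed $processed archives"
cd / && rm -rf "$workdir"
```

The counter only increases when the move and the extraction both succeed, so it can be compared against the length of filelist1 at the end of a run.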
So, what is your question? – leigero 2015-02-10 17:30:51