Hashing files identified by their first 4 bytes

Problem description:

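I want to write a Python script that searches my current directory, identifies JPG files by their header, and then hashes those files. I'm a bit out of my depth. Any suggestions would be appreciated.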

from os import listdir, getcwd 
from os.path import isfile, join, normpath, basename 
import hashlib 

jpgHead = b'\xff\xd8\xff\xe0' 

def get_files(): 
    current_path = normpath(getcwd()) 
    return [join(current_path, f) for f in listdir(current_path) if 
isfile(join(current_path, f))] 

def checkJPG(): 
    checkJPG=checkJPG.read(4) 
    if checkJPG==jpgHead 
    get_hashes() 

def get_hashes(): 
    files = checkJPG() 
    list_of_hashes = [] 
    for each_file in files: 
     hash_md5 = hashlib.md5() 
     with open(each_file, "rb") as f: 
     list_of_hashes.append('Filename: {}\tHash: 
     {}\n'.format(basename(each_file), hash_md5.hexdigest())) 
     return list_of_hashes 

def write_jpgHashes(): 
    hashes=get_hashes() 
    with open('list_of_hashes.txt', 'w') as f: 
     for md5_hash in hashes: 
     f.write(md5_hash) 


if __name__ == '__main__': 

write_jpgHashes() 

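Please read https://stackoverflow.com/help/how-to-ask. You should provide a minimal, complete, and verifiable example so that others can reproduce your problem, and you should ask about one specific issue; asking how to write complete source code is too broad.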

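This code has several problems. Several blocks need to be indented, there is an 'if' statement without a colon, 'get_files' is defined but never called, 'checkJPG' returns 'None' while 'get_hashes' expects it to return the files, and those two functions call each other in an infinite recursive loop. And so on.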

Also, 'checkJPG = checkJPG.read(4)' is strange, and you never actually hash any file data.
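
To illustrate the point about never hashing file data: a `hashlib.md5()` object that is never fed bytes through `update()` always yields the digest of empty input, so every file in the original script would have been given the same hash. A minimal sketch, where the sample bytes are made up purely for illustration:

```python
import hashlib

# A fresh MD5 object with no update() calls hashes zero bytes of input.
empty_digest = hashlib.md5().hexdigest()

# Feeding it data changes the digest; these bytes are arbitrary sample data.
fed = hashlib.md5()
fed.update(b"\xff\xd8\xff\xe0 sample bytes")
fed_digest = fed.hexdigest()

print(empty_digest)                # d41d8cd98f00b204e9800998ecf8427e
print(empty_digest == fed_digest)  # False
```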

I modified some of your functions a bit; give this a try:

from os import listdir, getcwd
from os.path import isfile, join, normpath, basename
import hashlib

jpgHead = b'\xff\xd8\xff\xe0'

def get_files(path=getcwd()):
    current_path = normpath(path)
    return [join(current_path, f) for f in listdir(current_path) if isfile(join(current_path, f))]

def checkJPG(path):
    with open(path, 'rb') as f:
        header = f.read(4)
    return header == jpgHead

def get_hashes():
    list_of_hashes = []
    for each_file in get_files():
        if checkJPG(each_file):
            list_of_hashes.append('Filename: {}\tHash: {}\n'.format(each_file, md5hf(each_file)))
    return list_of_hashes

def md5hf(path):
    #return hashlib.md5(open(path, "rb").read()).hexdigest() ## you can use this line for small files ##
    hash_md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()

def write_jpgHashes():
    hashes = get_hashes()
    with open('list_of_hashes.txt', 'w') as f:
        for md5_hash in hashes:
            f.write(md5_hash)

if __name__ == '__main__':
    write_jpgHashes()

Notes:

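  1. Fixed some syntax and indentation errors
  2. Turned checkJPG into a function that returns a boolean
  3. Added each file's MD5 hash to list_of_hashes in get_hashes
  4. Added the md5hf function, which computes a file's MD5 checksum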
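
One caveat with the fixed 4-byte signature: `b'\xff\xd8\xff\xe0'` matches JFIF-style JPEGs, while JPEGs that begin with an Exif segment start with `\xff\xd8\xff\xe1` and would be skipped. The sketch below is one possible variation, not part of the answer above: it checks only the first three bytes and hashes in chunks. The names `looks_like_jpeg`, `md5_of_file`, and `hash_jpegs` are hypothetical, and the demo files are fabricated sample data rather than real images.

```python
import hashlib
import os
import tempfile

JPEG_PREFIX = b"\xff\xd8\xff"  # SOI marker plus the first byte of the next marker


def looks_like_jpeg(path):
    """Return True if the file starts with the 3-byte JPEG prefix."""
    with open(path, "rb") as f:
        return f.read(len(JPEG_PREFIX)) == JPEG_PREFIX


def md5_of_file(path, chunk_size=4096):
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_jpegs(directory):
    """Map file name -> MD5 hex digest for every JPEG-looking file in directory."""
    result = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and looks_like_jpeg(path):
            result[name] = md5_of_file(path)
    return result


# Demo on throwaway files (fake payloads, not real images).
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "jfif.jpg": b"\xff\xd8\xff\xe0" + b"a" * 10000,
        "exif.jpg": b"\xff\xd8\xff\xe1" + b"b" * 10000,
        "notes.txt": b"plain text",
    }
    for name, data in samples.items():
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(data)
    hashes = hash_jpegs(tmp)

print(sorted(hashes))  # ['exif.jpg', 'jfif.jpg']
```

A three-byte prefix is looser than the original check, so it accepts more JPEG variants at the cost of a slightly higher chance of matching a non-JPEG file; whether that trade-off is acceptable depends on what the directory is expected to contain.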