Calculating a directory's size using Python?

Answer

This walks all sub-directories, summing file sizes:

import os

def get_size(start_path='.'):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(start_path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size

print(get_size())

And a one-liner for fun using os.listdir (does not include sub-directories):

import os 
sum(os.path.getsize(f) for f in os.listdir('.') if os.path.isfile(f))
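One thing to watch with the one-liner: os.listdir returns bare names, so os.path.isfile(f) only works as written when the directory being listed is the current working directory. A minimal sketch for an arbitrary directory follows; the name flat_size is made up for this example and is not part of the original answer.

```python
import os

def flat_size(directory='.'):
    """Sum the sizes of regular files directly inside `directory` (no recursion)."""
    total = 0
    for name in os.listdir(directory):
        # listdir returns bare names, so join them with the directory first
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            total += os.path.getsize(path)
    return total
```

Calling flat_size('/some/dir') then behaves the same no matter which directory the script is started from.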

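Reference:

os.path.getsize - Gives the size in bytes

os.walk

Updated to use os.path.getsize, which is clearer than using the os.stat().st_size method.

Thanks to ghostdog74 for pointing this out!

os.stat - st_size gives the size in bytes. It can also be used to get the file size and other file-related information.

Update 2015

scandir is available and may be faster than the os.walk method. A package is available from PyPI, and os.scandir() is to be included in Python 3.5: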

https://pypi.python.org/pypi/scandir

+7

+1, but the one-liner doesn't return a valid result because it is not recursive – luc 2009-09-08 10:19:12

+2

Yes, it's only meant for the flat-directory case. – monkut 2009-09-08 10:23:08

+31

For real fun you can do a recursive size in one line: sum(os.path.getsize(os.path.join(dirpath, filename)) for dirpath, dirnames, filenames in os.walk(path) for filename in filenames) – driax 2010-08-29 20:02:39

Answer

monkut's answer is good, but it fails on broken symlinks, so you also have to check that the path really exists:

if os.path.exists(fp): 
    total_size += os.stat(fp).st_size

+3

You probably don't want to follow symlinks. You should use 'lstat'. – asmeurer 2014-03-18 20:44:07
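A rough sketch of what the lstat suggestion could look like, assuming the goal is to count a symlink as the link itself rather than its target. The function name get_size_no_follow is invented for this example. Because os.lstat describes the link and not the file it points to, a broken link no longer raises, and the os.path.exists check is not needed.

```python
import os

def get_size_no_follow(start_path='.'):
    """Total size in bytes; symlinks are counted as links, not followed."""
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(start_path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            try:
                # lstat reports on the link itself, so a dangling link is fine
                total_size += os.lstat(fp).st_size
            except OSError:
                # the file vanished or cannot be read; skip it
                continue
    return total_size
```

Note that the size of a symlink reported by lstat is the size of the link entry, which is usually small but not zero.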

Answer

For getting the size of one file, there is os.path.getsize()

>>> import os 
>>> os.path.getsize("/path/file") 
35L

It is reported in bytes.

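Answer

This is a recursive function (it recursively sums up the sizes of all subfolders and their respective files) which returns exactly the same number of bytes as running "du -sb ." on Linux (where the "." means "the current folder"):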

import os

def getFolderSize(folder):
    total_size = os.path.getsize(folder)
    for item in os.listdir(folder):
        itempath = os.path.join(folder, item)
        if os.path.isfile(itempath):
            total_size += os.path.getsize(itempath)
        elif os.path.isdir(itempath):
            total_size += getFolderSize(itempath)
    return total_size

print("Size: " + str(getFolderSize(".")))

+1

Thanks! Very useful. – 2011-02-27 01:59:17

+0

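This function counts the size of symlinks too. If you want to skip symlinks, you have to check that the item is not one: if os.path.isfile(itempath) and not os.path.islink(itempath), and elif os.path.isdir(itempath) and not os.path.islink(itempath). – airween 2015-08-23 19:02:34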

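Answer

The accepted answer doesn't take hard or soft links into account, and would count those files twice. You'd want to keep track of which inodes you've already seen, and not add the size for those files again.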

import os

def get_size(start_path='.'):
    total_size = 0
    seen = {}
    for dirpath, dirnames, filenames in os.walk(start_path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            try:
                stat = os.stat(fp)
            except OSError:
                continue

            try:
                seen[stat.st_ino]
            except KeyError:
                seen[stat.st_ino] = True
            else:
                continue

            total_size += stat.st_size

    return total_size

print(get_size())

+4

Consider using 'os.lstat' (rather than 'os.stat'), which avoids following symlinks: [docs.python.org/2/library/os.html#os.lstat](http://docs.python.org/2/library/os.html#os.lstat) – 2014-01-30 11:22:01
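Putting the inode idea and the lstat suggestion together might look roughly like this. It is only a sketch: the name get_size_unique is invented here, and keying on (st_dev, st_ino) instead of st_ino alone is an addition of this example, made because inode numbers are only unique within one device.

```python
import os

def get_size_unique(start_path='.'):
    """Total size in bytes, counting each (device, inode) pair only once."""
    total_size = 0
    seen = set()
    for dirpath, dirnames, filenames in os.walk(start_path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            try:
                st = os.lstat(fp)  # do not follow symlinks
            except OSError:
                continue
            # inode numbers can repeat across devices, so include st_dev
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            total_size += st.st_size
    return total_size
```

With two hard links to the same 10-byte file in a directory, this reports 10 rather than 20. It relies on the platform filling in meaningful st_ino values.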

Answer

You could do something like this:

import commands 
size = commands.getoutput('du -sh /path/').split()[0]

In this case I have not tested the result before returning it; if you want, you can check it with commands.getstatusoutput.
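The commands module only exists in Python 2. On Python 3 the same idea, including the status check, could be sketched with subprocess as below. The helper name du_human is made up for this example, and it assumes a du binary that accepts -sh is on the PATH.

```python
import subprocess

def du_human(path):
    """Return du's human-readable size for `path`, or None if du fails."""
    try:
        out = subprocess.check_output(['du', '-sh', path])
    except (OSError, subprocess.CalledProcessError):
        # du is not installed, or it exited with a non-zero status
        return None
    # du prints "<size><TAB><path>"; keep only the size column
    return out.decode('utf-8', 'replace').split()[0]
```

check_output raises CalledProcessError on a non-zero exit status, which plays the role of inspecting the status returned by getstatusoutput.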

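Answer

Chris' answer is good, but it could be made more idiomatic by using a set to check for already-seen inodes, which also avoids using exceptions for control flow: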

import os

def directory_size(path):
    total_size = 0
    seen = set()

    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)

            try:
                stat = os.stat(fp)
            except OSError:
                continue

            if stat.st_ino in seen:
                continue

            seen.add(stat.st_ino)

            total_size += stat.st_size

    return total_size  # size in bytes

+2

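Chris' answer also doesn't take symlinks and the sizes of the directories themselves into account. I've edited your answer accordingly; the output of the fixed function is now identical to 'du -sb'. – Creshal 2013-12-11 13:21:23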

Answer

Recursive one-liner:

import os
from functools import partial

def getFolderSize(p):
    prepend = partial(os.path.join, p)
    return sum([(os.path.getsize(f) if os.path.isfile(f) else getFolderSize(f)) for f in map(prepend, os.listdir(p))])

+2

It's not a one-liner, though... – 2013-09-12 12:28:46

+1

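It isn't a one-liner, though. However, it recursively calculates the folder size in bytes (even if the folder has multiple folders inside it) and gives the correct value. – Venkatesh 2014-12-18 19:57:41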

Answer

import os

def get_size(path):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            if os.path.exists(fp):
                total_size += os.path.getsize(fp)

    return total_size  # in bytes

Thanks monkut & troex! This works really well!

+0

This code won't run (typo in the f / fp variables) and is not recursive. – oferlivny 2015-09-16 15:06:01

+0

This won't run. You refer to 'fp' before assigning it. It also returns 'total_size' in bytes, not megabytes. – freethebees 2018-02-27 10:20:32

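Answer

This script tells you which file is the biggest one under the CWD, and also tells you which folder the file is in. This script works for me on Win8 and the Python 3.3.3 shell: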

import os

folder = os.getcwd()

number = 0
string = ""

for root, dirs, files in os.walk(folder):
    for file in files:
        pathname = os.path.join(root, file)
##      print(pathname)
##      print(os.path.getsize(pathname)/1024/1024)
        if number < os.path.getsize(pathname):
            number = os.path.getsize(pathname)
            string = pathname


##      print()


print(string)
print()
print(number)
print("Number in bytes")
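The same search can be wrapped in a function that calls getsize once per file and tolerates files that cannot be read. This is only a sketch; largest_file is a name invented for the example.

```python
import os

def largest_file(folder='.'):
    """Return (path, size_in_bytes) of the largest file under `folder`, or None."""
    best = None
    for root, dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # e.g. a broken symlink or a permission problem
            if best is None or size > best[1]:
                best = (path, size)
    return best
```

It returns None for a tree that contains no readable files, so the caller can tell that case apart from a zero-byte winner.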

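Answer

Some of the approaches suggested so far implement a recursion, others employ a shell or do not produce neatly formatted results. When your code is a one-off for Linux platforms, you can get the usual formatting, recursion included, as a one-liner.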

du.py 
----- 
#!/usr/bin/python3 
import subprocess 

def du(path): 
    """disk usage in human readable format (e.g. '2,1GB')""" 
    return subprocess.check_output(['du','-sh', path]).split()[0].decode('utf-8') 

if __name__ == "__main__": 
    print(du('.'))

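It is simple and efficient, and works for files and multi-level directories. Except for the print in the last line, it will work for current versions of python2 and python3: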

$ chmod 750 du.py 
$ ./du.py 
2,9M

A bit late after 5 years, but because this is still in the hit lists of search engines, it might be of help...

+7

Nb. Linux only. – meawoppl 2015-05-18 17:41:30

+7

Python, being cross-platform in nature, should probably shy away from this – 2015-09-11 23:48:52

+5

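Thanks for these remarks. I added a caveat regarding platform dependency to the answer. However, much Python code is one-off scripting. Such code should not come with functional limitations, lengthy and error-prone passages, or uncommon results in edge cases, just for the sake of portability beyond any need. It is, as always, a trade-off, and it is the developer's responsibility to choose wisely ;) – flaschbier 2015-09-18 04:51:16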

Answer

For the second part of the question:

def human(size):

    B = "B"
    KB = "KB"
    MB = "MB"
    GB = "GB"
    TB = "TB"
    UNITS = [B, KB, MB, GB, TB]
    HUMANFMT = "%f %s"
    HUMANRADIX = 1024.

    for u in UNITS[:-1]:
        if size < HUMANRADIX:
            return HUMANFMT % (size, u)
        size /= HUMANRADIX

    return HUMANFMT % (size, UNITS[-1])
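To see the loop in action, here is a compact variant with a couple of sample calls. The name human_size and the two-decimal format are choices made for this sketch, not part of the answer above.

```python
def human_size(size, radix=1024.0):
    """Format a byte count with two decimals and a B/KB/MB/GB/TB suffix."""
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(size)
    for unit in units[:-1]:
        if size < radix:
            return "%.2f %s" % (size, unit)
        size /= radix  # move on to the next larger unit
    # anything that is still large after GB is reported in TB
    return "%.2f %s" % (size, units[-1])

if __name__ == "__main__":
    print(human_size(1536))       # 1.50 KB
    print(human_size(1024 ** 3))  # 1.00 GB
```

Values of 1024 TB and above are still shown in TB, because TB is the last unit in the list.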

Answer

A one-liner, you say... Here is a one-liner:

sum([sum(map(lambda fname: os.path.getsize(os.path.join(directory, fname)), files)) for directory, folders, files in os.walk(path)])

Although I would probably split it up, and it performs no checks.

To convert to KB, see "Reusable library to get human readable version of file size?".

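Answer

The following script prints the size of every sub-directory of the specified directory. It also tries to benefit (where possible) from caching the calls of a recursive function. If the argument is omitted, the script works in the current directory. The output is sorted by directory size from largest to smallest, so you can adapt it to your needs.

PS: I've used recipe 578019 for showing the directory size in a human-friendly format (http://code.activestate.com/recipes/578019/)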

from __future__ import print_function 
import os 
import sys 
import operator 

def null_decorator(ob): 
    return ob 

if sys.version_info >= (3,2,0): 
    import functools 
    my_cache_decorator = functools.lru_cache(maxsize=4096) 
else: 
    my_cache_decorator = null_decorator 

start_dir = os.path.normpath(os.path.abspath(sys.argv[1])) if len(sys.argv) > 1 else '.' 

@my_cache_decorator
def get_dir_size(start_path = '.'):
    total_size = 0
    if 'scandir' in dir(os):
        # using fast 'os.scandir' method (new in version 3.5)
        for entry in os.scandir(start_path):
            if entry.is_dir(follow_symlinks = False):
                total_size += get_dir_size(entry.path)
            elif entry.is_file(follow_symlinks = False):
                total_size += entry.stat().st_size
    else:
        # using slow, but compatible 'os.listdir' method
        for entry in os.listdir(start_path):
            full_path = os.path.abspath(os.path.join(start_path, entry))
            if os.path.isdir(full_path):
                total_size += get_dir_size(full_path)
            elif os.path.isfile(full_path):
                total_size += os.path.getsize(full_path)
    return total_size

def get_dir_size_walk(start_path = '.'):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(start_path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size

def bytes2human(n, format='%(value).0f%(symbol)s', symbols='customary'): 
    """ 
    (c) http://code.activestate.com/recipes/578019/ 

    Convert n bytes into a human readable string based on format. 
    symbols can be either "customary", "customary_ext", "iec" or "iec_ext", 
    see: http://goo.gl/kTQMs 

     >>> bytes2human(0) 
     '0.0 B' 
     >>> bytes2human(0.9) 
     '0.0 B' 
     >>> bytes2human(1) 
     '1.0 B' 
     >>> bytes2human(1.9) 
     '1.0 B' 
     >>> bytes2human(1024) 
     '1.0 K' 
     >>> bytes2human(1048576) 
     '1.0 M' 
     >>> bytes2human(1099511627776127398123789121) 
     '909.5 Y' 

     >>> bytes2human(9856, symbols="customary") 
     '9.6 K' 
     >>> bytes2human(9856, symbols="customary_ext") 
     '9.6 kilo' 
     >>> bytes2human(9856, symbols="iec") 
     '9.6 Ki' 
     >>> bytes2human(9856, symbols="iec_ext") 
     '9.6 kibi' 

     >>> bytes2human(10000, "%(value).1f %(symbol)s/sec") 
     '9.8 K/sec' 

     >>> # precision can be adjusted by playing with %f operator 
     >>> bytes2human(10000, format="%(value).5f %(symbol)s") 
     '9.76562 K' 
    """ 
    SYMBOLS = {
        'customary'     : ('B', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y'),
        'customary_ext' : ('byte', 'kilo', 'mega', 'giga', 'tera', 'peta', 'exa',
                           'zetta', 'iotta'),
        'iec'           : ('Bi', 'Ki', 'Mi', 'Gi', 'Ti', 'Pi', 'Ei', 'Zi', 'Yi'),
        'iec_ext'       : ('byte', 'kibi', 'mebi', 'gibi', 'tebi', 'pebi', 'exbi',
                           'zebi', 'yobi'),
    }
    n = int(n)
    if n < 0:
        raise ValueError("n < 0")
    symbols = SYMBOLS[symbols]
    prefix = {}
    for i, s in enumerate(symbols[1:]):
        prefix[s] = 1 << (i+1)*10
    for symbol in reversed(symbols[1:]):
        if n >= prefix[symbol]:
            value = float(n)/prefix[symbol]
            return format % locals()
    return format % dict(symbol=symbols[0], value=n)

############################################################ 
### 
### main() 
### 
############################################################ 
if __name__ == '__main__':
    dir_tree = {}
    ### version, that uses 'slow' [os.walk method]
    #get_size = get_dir_size_walk
    ### this recursive version can benefit from caching the function calls (functools.lru_cache)
    get_size = get_dir_size

    for root, dirs, files in os.walk(start_dir):
        for d in dirs:
            dir_path = os.path.join(root, d)
            if os.path.isdir(dir_path):
                dir_tree[dir_path] = get_size(dir_path)

    for d, size in sorted(dir_tree.items(), key=operator.itemgetter(1), reverse=True):
        print('%s\t%s' %(bytes2human(size, format='%(value).2f%(symbol)s'), d))

    print('-' * 80)
    if sys.version_info >= (3,2,0):
        print(get_dir_size.cache_info())

Sample output:

37.61M .\subdir_b 
2.18M .\subdir_a 
2.17M .\subdir_a\subdir_a_2 
4.41K .\subdir_a\subdir_a_1 
---------------------------------------------------------- 
CacheInfo(hits=2, misses=4, maxsize=4096, currsize=4)

EDIT: moved null_decorator above, as user2233949 recommended

+0

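Your script works well, but you need to move the null_decorator function above the 'if sys.version_info >= ...' line. Otherwise you will get a "'null_decorator' is not defined" exception. It works great after that, though. – user2233949 2016-02-02 23:26:53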

+0

@user2233949, thank you! I've corrected the code accordingly. – MaxU 2016-02-02 23:51:04

Answer

Python 3.5 recursive folder size using os.scandir

import os

def folder_size(path='.'):
    total = 0
    for entry in os.scandir(path):
        if entry.is_file():
            total += entry.stat().st_size
        elif entry.is_dir():
            total += folder_size(entry.path)
    return total

+1

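Python 3 one-liner approach if you are not worried about recursion: sum([entry.stat().st_size for entry in os.scandir(file)]). Note that the output is in bytes; divide by 1024 to get KB and by (1024*1024) to get MB. – weiji14 2017-10-31 21:43:08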

+2

@weiji14 Lose the brackets, i.e. 'sum(entry.stat().st_size for entry in os.scandir(file))'. There is no reason to build a list, because 'sum' takes iterators as well. – 2017-11-17 10:16:39
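Spelled out, the non-recursive variant from these two comments might look like the sketch below. The is_file() filter, the with block (which needs Python 3.6 or later) and the helper names flat_size_scandir, to_kb and to_mb are additions made for this example.

```python
import os

def flat_size_scandir(path='.'):
    """Bytes used by regular files directly inside `path` (sub-directories ignored)."""
    # using scandir as a context manager requires Python 3.6+
    with os.scandir(path) as entries:
        return sum(entry.stat().st_size for entry in entries if entry.is_file())

def to_kb(n_bytes):
    """Convert a byte count to kilobytes (1 KB = 1024 bytes)."""
    return n_bytes / 1024

def to_mb(n_bytes):
    """Convert a byte count to megabytes (1 MB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)
```

Without the is_file() filter, the sizes of the sub-directory entries themselves would be added to the total as well.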

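Answer

A little late to the party, but here it is in one line, provided that you have glob2 and humanize installed. Note that in Python 3, the default iglob has a recursive mode. How to modify the code for Python 3 is left as a trivial exercise for the reader.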

>>> import os 
>>> from humanize import naturalsize 
>>> from glob2 import iglob 
>>> naturalsize(sum(os.path.getsize(x) for x in iglob('/var/**')))
'546.2 MB'
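For the Python 3 case, one possible reading of that exercise uses only the standard library's glob module with recursive=True (available since Python 3.5). The function name tree_size and the isfile filter are assumptions of this sketch, and the humanize step is left out so that the block has no third-party dependencies.

```python
import glob
import os

def tree_size(root):
    """Sum the sizes of regular files under `root` using glob's recursive mode."""
    # escape the root so that characters like '[' in it are taken literally
    pattern = os.path.join(glob.escape(root), '**')
    return sum(
        os.path.getsize(p)
        for p in glob.iglob(pattern, recursive=True)
        if os.path.isfile(p)  # '**' also yields directories; skip those
    )
```

As far as I know, by default '**' does not match names that start with a dot, so hidden files and directories are left out of the total.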

Answer

Admittedly, this is kind of hackish and only works on Unix/Linux.

It matches du -sb . because, in effect, this is a Python bash wrapper that runs the du -sb . command.

import subprocess

def system_command(cmd):
    """Function executes cmd parameter as a bash command."""
    p = subprocess.Popen(cmd,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE,
                         shell=True)
    stdout, stderr = p.communicate()
    return stdout, stderr

size = int(system_command('du -sb . ')[0].split()[0])

Answer

I'm using python 2.7.13 with scandir, and here is my one-liner recursive function to get the total size of a folder:

from scandir import scandir

def getTotFldrSize(path):
    return sum([s.stat(follow_symlinks=False).st_size for s in scandir(path) if s.is_file(follow_symlinks=False)]) + \
           sum([getTotFldrSize(s.path) for s in scandir(path) if s.is_dir(follow_symlinks=False)])

>>> print(getTotFldrSize('.'))
1203245680

https://pypi.python.org/pypi/scandir

Answer

Using the sh library: the du module does it:

pip install sh 

import sh 
print(sh.du("-s", ".")) 
91154728  .

If you want to pass an asterisk, use glob as described here.

To convert the value into a human-readable one, use humanize:

pip install humanize 

import humanize 
print(humanize.naturalsize(91157384)) 
91.2 MB

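Answer

When the size of a sub-directory is computed, it should update the size of its parent folder, and this continues until it reaches the root parent.

The following function computes the size of the folder and all of its sub-folders.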

import os

def folder_size(path):
    parent = {}  # path to parent path mapper
    folder_size = {}  # storing the size of directories
    folder = os.path.realpath(path)

    for root, _, filenames in os.walk(folder):
        if root == folder:
            parent[root] = -1  # the root folder will not have any parent
            folder_size[root] = 0.0  # initializing the size to 0

        elif root not in parent:
            immediate_parent_path = os.path.dirname(root)  # extract the immediate parent of the subdirectory
            parent[root] = immediate_parent_path  # store the parent of the subdirectory
            folder_size[root] = 0.0  # initialize the size to 0

        total_size = 0
        for filename in filenames:
            filepath = os.path.join(root, filename)
            total_size += os.stat(filepath).st_size  # computing the size of the files under the directory
        folder_size[root] = total_size  # store the updated size

        temp_path = root  # for subdirectories, we need to update the size of the parent till the root parent
        while parent[temp_path] != -1:
            folder_size[parent[temp_path]] += total_size
            temp_path = parent[temp_path]

    return folder_size[folder]/1000000.0
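The same bookkeeping can arguably be done without a parent map by walking bottom-up, so that every child total is known before its parent is visited. The sketch below is an alternative under that assumption, not the author's method. The name folder_sizes is invented, it returns bytes for every directory rather than megabytes for the root, and it uses os.lstat so symlinks are not followed.

```python
import os

def folder_sizes(path):
    """Map every directory under `path` to its cumulative size in bytes."""
    sizes = {}
    # topdown=False visits children before their parents
    for root, dirnames, filenames in os.walk(path, topdown=False):
        total = 0
        for name in filenames:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                continue  # the file vanished or cannot be read
        for d in dirnames:
            # directories that were not walked (e.g. symlinks) have no entry
            total += sizes.get(os.path.join(root, d), 0)
        sizes[root] = total
    return sizes
```

The total for the top folder is then sizes[path], and dividing it by 1000000.0 gives the same unit as the function above.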

Answer

For what it's worth... the tree command does all of this for free:

tree -h --du /path/to/dir # files and dirs 
tree -h -d --du /path/to/dir # dirs only

I love Python, but by far the simplest solution to the problem requires no new code.
