
The system file API

Python libraries use the system API for fundamental file operations, and I ran into one subtlety there while writing a program to scan for duplicate files that Fslint had merged using links. A restored tarball contained symbolic links that were no longer valid, and those caused problems:

OSError: [Errno 2] No such file or directory:

The problem was using os.stat() to determine how many links there were to a file during a tree walk. That particular call, though, barfs if the name it is given is a symbolic link to a file that doesn’t exist. os.stat() follows the link to find the target’s properties and raises an OSError if the target isn’t there. To get around this problem, the call needs to be os.lstat(), which looks only at the given file name and not at what it is linked to.
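The difference is easy to reproduce with a deliberately broken link. This is only a minimal sketch, assuming a platform where an unprivileged process can create symlinks; the helper name describe() and the file names are made up for illustration.

```python
import os
import stat
import tempfile

def describe(path):
    """Return (is_symlink, link_count, target_exists) for path."""
    info = os.lstat(path)                  # examines the name itself, never the target
    is_link = stat.S_ISLNK(info.st_mode)
    try:
        os.stat(path)                      # follows the link to its target
        target_exists = True
    except FileNotFoundError:              # the Errno 2 flavor of OSError
        target_exists = False
    return is_link, info.st_nlink, target_exists

with tempfile.TemporaryDirectory() as tmp:
    dangling = os.path.join(tmp, "dangling")
    # Point the link at a name that was never created.
    os.symlink(os.path.join(tmp, "missing-target"), dangling)
    result = describe(dangling)
```

Here os.lstat() succeeds on the dangling link while os.stat() raises, so result reports a symlink whose target does not exist.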

This is system dependent, so Python treats os.lstat() as an alias for os.stat() on a system that doesn’t support symbolic links. In Linux, you can find out more by checking the man pages (command ‘man stat’). Also see the Python Docs, section 16.1, “os: Miscellaneous operating system interfaces”, and scroll down to “16.1.5. Files and Directories.”

not following symlinks: If follow_symlinks is False, and the last element of the path to operate on is a symbolic link, the function will operate on the symbolic link itself instead of the file the link points to. (For POSIX systems, Python will call the l… version of the function.)

You can check whether or not follow_symlinks is supported on your platform using os.supports_follow_symlinks. If it is unavailable, using it will raise a NotImplementedError.
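In code, the check described in the docs might look like the sketch below. The wrapper name stat_no_follow() is hypothetical, and the fallback to os.lstat() is my own choice rather than anything the docs prescribe.

```python
import os
import stat
import tempfile

def stat_no_follow(path):
    """Stat the name itself rather than whatever it links to."""
    if os.stat in os.supports_follow_symlinks:
        # The platform accepts follow_symlinks=False for os.stat().
        return os.stat(path, follow_symlinks=False)
    # Assumed fallback: os.lstat() aliases os.stat() where symlinks are unsupported.
    return os.lstat(path)

with tempfile.TemporaryDirectory() as tmp:
    link = os.path.join(tmp, "link")
    os.symlink(os.path.join(tmp, "nowhere"), link)   # dangling on purpose
    link_is_symlink = stat.S_ISLNK(stat_no_follow(link).st_mode)
```

Either branch reports on the link itself, so the dangling target never triggers an error.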

Chasing this particular issue also produced other enlightenment. The Stack Overflow question “Python os.walk and symlinks” provided an example of the modern programming concepts Python offers. It walks a file tree to sum file sizes.

import os
import sys

for subdir, dirs, files in os.walk(sys.argv[1]):
    paths = (os.path.join(subdir, f) for f in files)
    space = sum(os.stat(path).st_size for path in paths if os.path.isfile(path))

which is explained in a comment there:

@Serg It’s not sum, it is the use of iterators that result in space efficiency. You can see that paths is a genexp. So this declarative style ensures that you are dealing with one object at a time while essentially talking about all at the same time. There are no intermediate container data structures. But in this example I used “may be”, as you can see, because there aren’t intermediate lists anyway. But has a side effect, space is changing. – C Panda Apr 7 ’16 at 19:20

Generators and iterators, … Oh, My!

And don’t forget assigning a tuple to multiple variables in a single assignment, as the for loop does with subdir, dirs, and files. Those ideas are what separate Python from Fortran or Basic, the high level programming languages of my youth.

It is a good point. By iterating through a generator, memory is saved because only one file is at bat at a time. A straightforward version that first gathers a list of all the files in the entire tree can take up a lot of memory. For the remove links utility, I’d replace the sum() with a file delete and add a number of links condition to the isfile() condition. That could be interesting to play with.
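As a starting point for that experiment, here is a rough sketch rather than the finished utility. It assumes a POSIX-style filesystem with hard links, the generator name multiply_linked_files() is invented for illustration, and it only reports candidates instead of deleting anything.

```python
import os
import stat
import tempfile

def multiply_linked_files(root):
    """Yield regular files under root that have more than one hard link."""
    for subdir, dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(subdir, name)
            info = os.lstat(path)          # lstat, so dangling symlinks don't raise
            if stat.S_ISREG(info.st_mode) and info.st_nlink > 1:
                yield path                 # one path at a time, no big list

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "a.txt")
    with open(original, "w") as fh:
        fh.write("data")
    os.link(original, os.path.join(tmp, "b.txt"))            # second hard link
    with open(os.path.join(tmp, "single.txt"), "w") as fh:
        fh.write("x")                                        # only one link
    os.symlink(os.path.join(tmp, "gone"), os.path.join(tmp, "dangling"))
    found = sorted(os.path.basename(p) for p in multiply_linked_files(tmp))
```

The caller decides what to do with each yielded path. Swapping the listing for os.remove() would give the delete behavior, though I would want a dry run first.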