So I’m adding more capabilities to my sysinfo.py program. The next thing I want to do is get a JSON result from df, a command whose man page description reads “report file system disk space usage”.
Here is a sample of the output of df for one of my systems:

```
Filesystem                1K-blocks    Used Available Use% Mounted on
/dev/mapper/flapjack-root 959088096 3802732 906566516   1% /
udev                        1011376       4   1011372   1% /dev
tmpfs                        204092     288    203804   1% /run
none                           5120       0      5120   0% /run/lock
none                        1020452       0   1020452   0% /run/shm
/dev/sda1                    233191   50734    170016  23% /boot
```
So I started by writing a little Python program that used the subprocess.check_output() method to capture the output of df.
This went through various iterations and ended up with this single line of Python code, which requires eleven lines of comments to explain it:
```python
#
# this next line of code is pretty tense ... let me explain what
# it does:
# subprocess.check_output(["df"]) runs the df command and returns
# the output as a string
# rstrip() trims off the last whitespace character, which is a '\n'
# split('\n') breaks the string at the newline characters ... the
# result is an array of strings
# the list comprehension then applies shlex.split() to each string,
# breaking each into tokens
# when we're done, we have a two-dimensional array with rows of
# tokens and we're ready to make objects out of them
#
df_array = [shlex.split(x) for x in
            subprocess.check_output(["df"]).rstrip().split('\n')]
```
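To see what the pipeline produces without actually running df, here is the same chain of calls applied to a canned two-line sample (the sample string is my own, not live output):

```python
import shlex

# Canned df output standing in for subprocess.check_output(["df"])
sample = (
    "Filesystem 1K-blocks Used Available Use% Mounted on\n"
    "/dev/sda1 233191 50734 170016 23% /boot\n"
)

# Same pipeline as above, minus the subprocess call:
# rstrip() drops the trailing newline, split('\n') yields one
# string per line, shlex.split() tokenizes each line.
df_array = [shlex.split(x) for x in sample.rstrip().split('\n')]

for row in df_array:
    print(row)
```

The result is a list of token lists: the first row holds the column headers, and each subsequent row holds one filesystem’s fields.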
My original df.py code constructed the JSON result manually, a painfully finicky process. After I got it running I remembered a lesson I learned from my dear friend the late David Nochlin, namely that I should construct an object and then use a rendering library to create the JSON serialization.
So I did some digging around and discovered that the Python json library includes a fairly sensible serialization method that supports pretty-printing of the result. The outcome was a much cleaner piece of code:
```python
# df.py
#
# parse the output of df and create JSON objects for each filesystem.
#
# $Id: df.py,v 1.5 2014/09/03 00:41:31 marc Exp $
#
# now let's parse the output of df to get filesystem information
#
# Filesystem                1K-blocks    Used Available Use% Mounted on
# /dev/mapper/flapjack-root 959088096 3799548 906569700   1% /
# udev                        1011376       4   1011372   1% /dev
# tmpfs                        204092     288    203804   1% /run
# none                           5120       0      5120   0% /run/lock
# none                        1020452       0   1020452   0% /run/shm
# /dev/sda1                    233191   50734    170016  23% /boot

import subprocess
import shlex
import json


def main():
    """Main routine - call the df utility and return a json structure."""
    # this next line of code is pretty tense ... let me explain what
    # it does:
    # subprocess.check_output(["df"]) runs the df command and returns
    # the output as a string
    # rstrip() trims off the last whitespace character, which is a '\n'
    # split('\n') breaks the string at the newline characters ... the
    # result is an array of strings
    # the list comprehension then applies shlex.split() to each string,
    # breaking each into tokens
    # when we're done, we have a two-dimensional array with rows of
    # tokens and we're ready to make objects out of them
    df_array = [shlex.split(x) for x in
                subprocess.check_output(["df"]).rstrip().split('\n')]
    df_json = {}
    df_json["filesystems"] = []
    for row in range(1, len(df_array)):     # skip the header row
        df_json["filesystems"].append(df_to_json(df_array[row]))
    print json.dumps(df_json, sort_keys=True, indent=2)


def df_to_json(tokenList):
    """Take a list of tokens from df and return a python object."""
    # If df's output format changes, we'll be in trouble, of course.
    # the 0 token is the name of the filesystem
    # the 1 token is the size of the filesystem in 1K blocks
    # the 2 token is the amount used of the filesystem
    # the 5 token is the mount point
    fsName = tokenList[0]
    fsSize = tokenList[1]
    fsUsed = tokenList[2]
    fsMountPoint = tokenList[5]
    result = {}
    result["filesystem"] = {}
    result["filesystem"]["name"] = fsName
    result["filesystem"]["size"] = fsSize
    result["filesystem"]["used"] = fsUsed
    result["filesystem"]["mount_point"] = fsMountPoint
    return result


if __name__ == '__main__':
    main()
```
which, in turn, produces a rather nice df output in JSON.
```json
{
  "filesystems": [
    {
      "filesystem": {
        "mount_point": "/",
        "name": "/dev/mapper/flapjack-root",
        "size": "959088096",
        "used": "3802632"
      }
    },
    {
      "filesystem": {
        "mount_point": "/dev",
        "name": "udev",
        "size": "1011376",
        "used": "4"
      }
    },
    {
      "filesystem": {
        "mount_point": "/run",
        "name": "tmpfs",
        "size": "204092",
        "used": "288"
      }
    },
    {
      "filesystem": {
        "mount_point": "/run/lock",
        "name": "none",
        "size": "5120",
        "used": "0"
      }
    },
    {
      "filesystem": {
        "mount_point": "/run/shm",
        "name": "none",
        "size": "1020452",
        "used": "0"
      }
    },
    {
      "filesystem": {
        "mount_point": "/boot",
        "name": "/dev/sda1",
        "size": "233191",
        "used": "50734"
      }
    }
  ]
}
```
Quite a lot of fun, really.
/proc/filesystems seems easier to parse.
Interesting. The /proc/filesystems file on my systems (Ubuntu 12.04 and 14.04) seems just to list the filesystem types supported by my kernel, not the filesystems actually mounted on my machines.
One of the things I want to do is track filesystem space usage to avoid getting stuck when something fills up.
I’ve poked around in the documentation and don’t see anything obvious to list the actual mounted file systems and their sizes. What am I missing?
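One possibility (my suggestion, not something from the post itself): /proc/mounts lists the actual mounts, one per line, and the standard-library os.statvfs() call reports size and usage for any mount point, so the two together can answer the question without shelling out to df at all. A minimal sketch:

```python
import os

def fs_usage(mount_point):
    """Return (size_kb, used_kb) for a mounted filesystem via statvfs."""
    st = os.statvfs(mount_point)
    # f_blocks is the total block count, f_bfree the free block count,
    # both in units of the fundamental block size f_frsize.
    size_kb = st.f_blocks * st.f_frsize // 1024
    used_kb = (st.f_blocks - st.f_bfree) * st.f_frsize // 1024
    return size_kb, used_kb

# /proc/mounts (unlike /proc/filesystems) lists actual mounts;
# each line reads "device mount_point fstype options dump pass".
with open('/proc/mounts') as f:
    mounts = [line.split()[1] for line in f]

size, used = fs_usage('/')
print(size, used)
```

This avoids parsing df’s human-oriented output, though the numbers can differ slightly from df’s because df reserves some blocks for root (f_bavail vs. f_bfree).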