So I’m adding more capabilities to my sysinfo.py program. The next thing that I want to do is get a JSON result from df. This is a utility whose description, from the man page, says “report file system disk space usage”.
Here is a sample of the output of df for one of my systems:
    Filesystem                1K-blocks    Used Available Use% Mounted on
    /dev/mapper/flapjack-root 959088096 3802732 906566516   1% /
    udev                        1011376       4   1011372   1% /dev
    tmpfs                        204092     288    203804   1% /run
    none                           5120       0      5120   0% /run/lock
    none                        1020452       0   1020452   0% /run/shm
    /dev/sda1                    233191   50734    170016  23% /boot
So I started by writing a little Python program that used the subprocess.check_output() function to capture the output of df.
This went through various iterations and ended up as this single line of Python code, which requires eleven lines of comments to explain it:
    #
    # this next line of code is pretty tense ... let me explain what
    # it does:
    # subprocess.check_output(["df"], universal_newlines=True) runs the
    # df command and returns the output as a text string
    # rstrip() trims off the trailing whitespace, which is a '\n'
    # split('\n') breaks the string at the newline characters ... the
    # result is an array of strings
    # the list comprehension then applies shlex.split() to each string,
    # breaking each into tokens
    # when we're done, we have a two-dimensional array with rows of
    # tokens and we're ready to make objects out of them
    #
    df_array = [shlex.split(x) for x in subprocess.check_output(["df"], universal_newlines=True).rstrip().split('\n')]
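To make the steps easier to follow, here is the same pipeline pulled apart into named stages. This is only a sketch, not part of df.py: it is written for Python 3, the tokenize_df_output() helper and the SAMPLE_DF_OUTPUT text are made up for illustration, and it parses canned text so that it runs without calling df.

```python
import shlex

# Canned df-style output (hypothetical sample) so the example does not
# depend on the filesystems of the machine it runs on.
SAMPLE_DF_OUTPUT = (
    "Filesystem 1K-blocks Used Available Use% Mounted on\n"
    "udev 1011376 4 1011372 1% /dev\n"
    "/dev/sda1 233191 50734 170016 23% /boot\n"
)


def tokenize_df_output(text):
    """Split df-style text into a list of rows, each a list of tokens."""
    trimmed = text.rstrip()        # drop the trailing newline
    lines = trimmed.split("\n")    # one string per line of output
    return [shlex.split(line) for line in lines]


rows = tokenize_df_output(SAMPLE_DF_OUTPUT)
header, data = rows[0], rows[1:]
print(header)    # the column titles, split on whitespace
print(data[1])   # the tokens for /dev/sda1
```

Note that the header row splits into seven tokens, because "Mounted on" contains a space, while each data row here has six. That is one reason the header is skipped rather than used as a source of field names.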
My original df.py code constructed the JSON result manually, a painfully finicky process. After I got it running, I remembered a lesson I learned from my dear friend, the late David Nochlin: construct an object first, and then use a rendering library to create the JSON serialization.
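A small sketch of why that lesson matters: JSON assembled by hand breaks as soon as a value contains a quote, whereas a serializer takes care of the escaping. Both helper functions and the sample mount point below are hypothetical, written for Python 3, and the rendering step assumes the standard library's json module.

```python
import json


def render_by_hand(name, mount_point):
    """Hypothetical hand-rolled renderer: plain string concatenation."""
    return '{"name": "' + name + '", "mount_point": "' + mount_point + '"}'


def render_with_library(name, mount_point):
    """Build a dict first, then let json.dumps() do the serialization."""
    return json.dumps({"name": name, "mount_point": mount_point})


# A made-up mount point containing double quotes, to stress the escaping.
tricky = '/mnt/my "backup" disk'

by_hand = render_by_hand("/dev/sdb1", tricky)
by_library = render_with_library("/dev/sdb1", tricky)

# Check whether each result can be parsed back as JSON.
try:
    json.loads(by_hand)
    hand_is_valid = True
except ValueError:
    hand_is_valid = False

print(hand_is_valid)
print(json.loads(by_library)["mount_point"] == tricky)
```

The hand-built string fails to parse because the embedded quotes are not escaped, while the library version round-trips the value unchanged.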
So I did some digging around and discovered that the Python json library includes a fairly sensible serialization function, json.dumps(), that supports pretty-printing of the result. The result was a much cleaner piece of code:
    # df.py
    #
    # parse the output of df and create JSON objects for each filesystem.
    #
    # $Id: df.py,v 1.5 2014/09/03 00:41:31 marc Exp $
    #
    # now let's parse the output of df to get filesystem information
    #
    # Filesystem                1K-blocks    Used Available Use% Mounted on
    # /dev/mapper/flapjack-root 959088096 3799548 906569700   1% /
    # udev                        1011376       4   1011372   1% /dev
    # tmpfs                        204092     288    203804   1% /run
    # none                           5120       0      5120   0% /run/lock
    # none                        1020452       0   1020452   0% /run/shm
    # /dev/sda1                    233191   50734    170016  23% /boot

    import subprocess
    import shlex
    import json

    def main():
        """Main routine - call the df utility and return a json structure."""

        # this next line of code is pretty tense ... let me explain what
        # it does:
        # subprocess.check_output(["df"], universal_newlines=True) runs the
        # df command and returns the output as a text string
        # rstrip() trims off the trailing whitespace, which is a '\n'
        # split('\n') breaks the string at the newline characters ... the
        # result is an array of strings
        # the list comprehension then applies shlex.split() to each string,
        # breaking each into tokens
        # when we're done, we have a two-dimensional array with rows of
        # tokens and we're ready to make objects out of them
        df_array = [shlex.split(x) for x in subprocess.check_output(["df"], universal_newlines=True).rstrip().split('\n')]
        df_num_lines = len(df_array)

        df_json = {}
        df_json["filesystems"] = []
        # row 0 is the header, so start at row 1
        for row in range(1, df_num_lines):
            df_json["filesystems"].append(df_to_json(df_array[row]))
        print(json.dumps(df_json, sort_keys=True, indent=2))
        return

    def df_to_json(tokenList):
        """Take a list of tokens from df and return a python object."""
        # If df's output format changes, we'll be in trouble, of course.
        # the 0 token is the name of the filesystem
        # the 1 token is the size of the filesystem in 1K blocks
        # the 2 token is the amount used of the filesystem
        # the 5 token is the mount point
        result = {}
        fsName = tokenList[0]
        fsSize = tokenList[1]
        fsUsed = tokenList[2]
        fsMountPoint = tokenList[5]
        result["filesystem"] = {}
        result["filesystem"]["name"] = fsName
        result["filesystem"]["size"] = fsSize
        result["filesystem"]["used"] = fsUsed
        result["filesystem"]["mount_point"] = fsMountPoint
        return result

    if __name__ == '__main__':
        main()
which, in turn, produces rather nice df output in JSON:
    {
      "filesystems": [
        {
          "filesystem": {
            "mount_point": "/",
            "name": "/dev/mapper/flapjack-root",
            "size": "959088096",
            "used": "3802632"
          }
        },
        {
          "filesystem": {
            "mount_point": "/dev",
            "name": "udev",
            "size": "1011376",
            "used": "4"
          }
        },
        {
          "filesystem": {
            "mount_point": "/run",
            "name": "tmpfs",
            "size": "204092",
            "used": "288"
          }
        },
        {
          "filesystem": {
            "mount_point": "/run/lock",
            "name": "none",
            "size": "5120",
            "used": "0"
          }
        },
        {
          "filesystem": {
            "mount_point": "/run/shm",
            "name": "none",
            "size": "1020452",
            "used": "0"
          }
        },
        {
          "filesystem": {
            "mount_point": "/boot",
            "name": "/dev/sda1",
            "size": "233191",
            "used": "50734"
          }
        }
      ]
    }
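The sizes in that output are strings, because the tokens go into the object exactly as df printed them. If numbers are more useful downstream, one possible variation looks like the sketch below. It is not the df.py above: parse_df() and df_as_json() are hypothetical names, the code is written for Python 3, and it assumes df accepts the POSIX -P and -k options, which ask for one line per filesystem with sizes in 1024-byte units.

```python
import json
import subprocess


def parse_df(text):
    """Turn df -Pk style text into a dict with integer sizes."""
    filesystems = []
    for line in text.rstrip().split("\n")[1:]:   # skip the header row
        # Split into at most six fields so a mount point containing
        # spaces stays in one piece (assumes the device name has none).
        name, size, used, _avail, _pct, mount_point = line.split(None, 5)
        filesystems.append({
            "filesystem": {
                "name": name,
                "size": int(size),
                "used": int(used),
                "mount_point": mount_point,
            }
        })
    return {"filesystems": filesystems}


def df_as_json():
    """Run df -Pk and render the parsed result as pretty-printed JSON."""
    text = subprocess.check_output(["df", "-Pk"], universal_newlines=True)
    return json.dumps(parse_df(text), sort_keys=True, indent=2)


# Canned input (hypothetical) so the parsing can be tried without df.
sample = (
    "Filesystem 1024-blocks Used Available Capacity Mounted on\n"
    "/dev/sda1 233191 50734 170016 23% /boot\n"
)
print(json.dumps(parse_df(sample), sort_keys=True, indent=2))
```

Keeping the parsing in its own function means it can be exercised with canned text, and only df_as_json() needs a real df on the path. The sort_keys=True argument is also what puts mount_point ahead of name in the rendered output.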
Quite a lot of fun, really.