JSON output from DF
So I'm adding more capabilities to my sysinfo.py program. The next thing that I want to do is get a JSON result from df. This is a command whose description, from the man page, says "report file system disk space usage". Here is a sample of the output of df for one of my systems:
Filesystem                 1K-blocks    Used Available Use% Mounted on
/dev/mapper/flapjack-root  959088096 3802732 906566516   1% /
udev                         1011376       4   1011372   1% /dev
tmpfs                         204092     288    203804   1% /run
none                            5120       0      5120   0% /run/lock
none                         1020452       0   1020452   0% /run/shm
/dev/sda1                     233191   50734    170016  23% /boot
So I started by writing a little Python program that used the subprocess.check_output() method to capture the output of df. This went through various iterations and ended up with this single line of Python code, which requires eleven lines of comments to explain it:
#
# this next line of code is pretty tense ... let me explain what
# it does:
#   subprocess.check_output(["df"]) runs the df command and returns
#   the output as a string
#   rstrip() trims off the last whitespace character, which is a '\n'
#   split('\n') breaks the string at the newline characters ... the
#   result is an array of strings
#   the list comprehension then applies shlex.split() to each string,
#   breaking each into tokens
#   when we're done, we have a two-dimensional array with rows of
#   tokens and we're ready to make objects out of them
#
df_array = [shlex.split(x) for x in
            subprocess.check_output(["df"]).rstrip().split('\n')]
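Incidentally, the same tokenizing can be done with splitlines(), which drops the trailing newline for you and so avoids the rstrip() dance. A sketch, using a canned two-line sample instead of a live df call:

```python
import shlex

# canned stand-in for df's output; a live call would be
# subprocess.check_output(["df"])
sample = ("Filesystem 1K-blocks Used Available Use% Mounted on\n"
          "/dev/sda1 233191 50734 170016 23% /boot\n")

# splitlines() discards the final newline, so no rstrip() is needed
df_array = [shlex.split(line) for line in sample.splitlines()]

print(df_array[1][0])  # first token of the first data row: /dev/sda1
```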
My original df.py code constructed the JSON result manually, a painfully finicky process. After I got it running I remembered a lesson I learned from my dear friend the late David Nochlin, namely that I should construct an object and then use a rendering library to create the JSON serialization. So I did some digging around and discovered that the Python json library includes a fairly sensible serialization method that supports pretty-printing of the result. The result was a much cleaner piece of code:
# df.py
#
# parse the output of df and create JSON objects for each filesystem.
#
# $Id: df.py,v 1.5 2014/09/03 00:41:31 marc Exp $
#
# now let's parse the output of df to get filesystem information
#
# Filesystem                 1K-blocks    Used Available Use% Mounted on
# /dev/mapper/flapjack-root  959088096 3799548 906569700   1% /
# udev                         1011376       4   1011372   1% /dev
# tmpfs                         204092     288    203804   1% /run
# none                            5120       0      5120   0% /run/lock
# none                         1020452       0   1020452   0% /run/shm
# /dev/sda1                     233191   50734    170016  23% /boot

import subprocess
import shlex
import json


def main():
    """Main routine - call the df utility and return a json structure."""
    # this next line of code is pretty tense ... let me explain what
    # it does:
    #   subprocess.check_output(["df"]) runs the df command and returns
    #   the output as a string
    #   rstrip() trims off the last whitespace character, which is a '\n'
    #   split('\n') breaks the string at the newline characters ... the
    #   result is an array of strings
    #   the list comprehension then applies shlex.split() to each string,
    #   breaking each into tokens
    #   when we're done, we have a two-dimensional array with rows of
    #   tokens and we're ready to make objects out of them
    df_array = [shlex.split(x) for x in
                subprocess.check_output(["df"]).rstrip().split('\n')]

    df_json = {}
    df_json["filesystems"] = []
    # skip row 0, which holds df's column headers
    for row in range(1, len(df_array)):
        df_json["filesystems"].append(df_to_json(df_array[row]))
    print json.dumps(df_json, sort_keys=True, indent=2)
    return


def df_to_json(tokenList):
    """Take a list of tokens from df and return a python object."""
    # If df's output format changes, we'll be in trouble, of course.
    #   the 0 token is the name of the filesystem
    #   the 1 token is the size of the filesystem in 1K blocks
    #   the 2 token is the amount used of the filesystem
    #   the 5 token is the mount point
    result = {}
    fsName = tokenList[0]
    fsSize = tokenList[1]
    fsUsed = tokenList[2]
    fsMountPoint = tokenList[5]
    result["filesystem"] = {}
    result["filesystem"]["name"] = fsName
    result["filesystem"]["size"] = fsSize
    result["filesystem"]["used"] = fsUsed
    result["filesystem"]["mount_point"] = fsMountPoint
    return result


if __name__ == '__main__':
    main()
which, in turn, produces a rather nice df output in JSON.
{
  "filesystems": [
    {
      "filesystem": {
        "mount_point": "/",
        "name": "/dev/mapper/flapjack-root",
        "size": "959088096",
        "used": "3802632"
      }
    },
    {
      "filesystem": {
        "mount_point": "/dev",
        "name": "udev",
        "size": "1011376",
        "used": "4"
      }
    },
    {
      "filesystem": {
        "mount_point": "/run",
        "name": "tmpfs",
        "size": "204092",
        "used": "288"
      }
    },
    {
      "filesystem": {
        "mount_point": "/run/lock",
        "name": "none",
        "size": "5120",
        "used": "0"
      }
    },
    {
      "filesystem": {
        "mount_point": "/run/shm",
        "name": "none",
        "size": "1020452",
        "used": "0"
      }
    },
    {
      "filesystem": {
        "mount_point": "/boot",
        "name": "/dev/sda1",
        "size": "233191",
        "used": "50734"
      }
    }
  ]
}
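The sort order and indentation come straight from json.dumps; a minimal illustration of those two keyword arguments, on a tiny object shaped like one entry of the output above:

```python
import json

# one entry shaped like the df output above
fs = {"filesystem": {"name": "/dev/sda1", "mount_point": "/boot"}}

# sort_keys orders the keys alphabetically; indent=2 pretty-prints
# with two-space nesting
pretty = json.dumps(fs, sort_keys=True, indent=2)
print(pretty)
```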
Quite a lot of fun, really.
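For anyone revisiting this on Python 3, here's a sketch of the same pipeline (the function names here are mine, not from the original df.py): splitlines() replaces the rstrip/split dance, and the same token positions build the same structure. It's demonstrated on a canned sample rather than a live df call.

```python
import json
import shlex


def df_to_json(tokens):
    # same token positions as df.py: name, size, used, ..., mount point
    return {"filesystem": {"name": tokens[0],
                           "size": tokens[1],
                           "used": tokens[2],
                           "mount_point": tokens[5]}}


def parse_df(text):
    # tokenize every line, skip the header row, build the same structure
    rows = [shlex.split(line) for line in text.splitlines()]
    return {"filesystems": [df_to_json(r) for r in rows[1:]]}


# canned sample; a live call would be
# subprocess.check_output(["df"], text=True)
sample = ("Filesystem 1K-blocks Used Available Use% Mounted on\n"
          "/dev/sda1 233191 50734 170016 23% /boot\n")

print(json.dumps(parse_df(sample), sort_keys=True, indent=2))
```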