Waiting for the File Server

Well, I now have four different UNIX machines and I've been doing sysadmin tasks on all of them.  As a result I now have four home directories that are out of sync.

How annoying.

Ultimately I plan to set up a file server on one of my machines and serve the same home directory to all of them. Until that is built I need some temporary crutches to tide me over. In particular, I need to find out what is where.

The first thing I did was establish trust among the machines, making flapjack, the oldest, into the 'master' trusted by the others.  This I did by creating an SSH key pair using ssh-keygen on the master and appending the public key to .ssh/authorized_keys on each of the other machines.
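A minimal sketch of that trust setup, written as a dry run that only prints the commands rather than executing them. The key type (ed25519), the key path, the use of ssh-copy-id to append the public key, and the function name trust_commands are all assumptions for illustration; the post itself says only that ssh-keygen was run on the master and the public key placed in .ssh/authorized_keys on the other machines.

```shell
#!/bin/sh
# Dry run: print the commands that would establish key-based trust
# from the master to each listed host. Nothing is executed.
# Assumptions (not from the post): ed25519 key type, the key path,
# and ssh-copy-id as the way to append the public key.
trust_commands() {
    key=$1
    shift
    printf 'ssh-keygen -t ed25519 -f %s\n' "$key"
    for h in "$@"; do
        printf 'ssh-copy-id -i %s.pub %s\n' "$key" "$h"
    done
}

# The master (flapjack) is where these would be run, so it is not listed.
trust_commands '~/.ssh/id_ed25519' frenchtoast pancake waffle
```

Printing the commands first makes it easy to review them before touching authorized_keys on any machine.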

Then I decided to automate the discovery of what directories were on which machine.  This is made easier by my personal trick for organizing files, namely to have a set of top-level subdirectories named org/, people/, and projects/ in my home directory. Each of these has twenty-six subdirectories named a through z, with appropriately named subdirectories under them. This I find helps me put related things together. It is not an alternative to search but rather a complement.
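The layout described above can be sketched in a few lines of shell. The function name make_tree and the use of a throwaway directory from mktemp are assumptions for illustration; only the org/, people/, projects/ and a-through-z structure comes from the text.

```shell
#!/bin/sh
# Create org/, people/, and projects/ under a root directory,
# each with subdirectories a through z.
make_tree() {
    root=$1
    for top in org people projects; do
        for letter in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
            mkdir -p "$root/$top/$letter"
        done
    done
}

# Demo in a throwaway directory so nothing in $HOME is touched.
demo=$(mktemp -d)
make_tree "$demo"
find "$demo" -mindepth 2 -maxdepth 2 -type d | wc -l    # 3 * 26 directories
rm -rf "$demo"
```

A project such as dns would then live at projects/d/dns, which is the three-level shape the sed pattern later in this post relies on.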

Anyway, the result is that I could build a Makefile that automates reaching out to all of my machines and gathering information. Here's the Makefile:


# $Id: Makefile,v 1.7 2014/07/04 18:57:44 marc Exp marc $

FORCE = force

HOSTS = flapjack frenchtoast pancake waffle

FILES = Makefile

checkin: ${FORCE}
	ci -l ${FILES}

uname: ${FORCE}
	for h in ${HOSTS}; \
	    do ssh $$h uname -a \
	        | sed -e 's/^/'$$h': /'; \
	    done

host_find: ${FORCE}
	echo > host_find.txt
	for h in ${HOSTS}; \
	    do ssh $$h find -print \
	        | sed -e 's/^/'$$h': /' \
	        >> host_find.txt; done

clusters.txt: host_find.txt
	sed -e 's|\(/[^/]*/[a-z]/[^/]*\)/.*$$|\1|' host_find.txt \
	    | uniq -c \
	    | grep -v '^ *1 ' \
	    > clusters.txt

force:


Ideally, of course, I'd get the list of host names in the variable HOSTS from my configuration database, but having neglected to build one yet, I am just listing my machines by name there.

The first important target, host_find, uses ssh to reach each of the machines, including the master itself, and runs find there, prefixing the host name to each line so that I can determine which files exist on which machine. This creates a file named host_find.txt, which I can probably dispense with now that the machinery is working.
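To see the prefixing step in isolation, here is a small sketch that applies the same sed substitution to canned input instead of live ssh output. The function name prefix_host and the two sample paths are made up for illustration, and the sketch assumes host names contain no characters that are special to sed (such as / or &).

```shell
#!/bin/sh
# Prefix every line on standard input with "<host>: ", as the
# sed command in the host_find recipe does.
prefix_host() {
    sed -e 's/^/'"$1"': /'
}

# Stand-in for "ssh flapjack find -print": two made-up paths.
printf '%s\n' ./org/g/gnu ./projects/d/dns | prefix_host flapjack
```

Each output line then carries its origin, so the results from every host can be appended to a single file without losing track of where a path came from.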

The second important target, clusters.txt, passes the host_find.txt output through a sed script. The script replaces paths like /org/z/zodiac/blah-blah-blah with /org/z/zodiac, that is, it keeps the top-level directory, the letter, and the name, and discards everything deeper. Then the pipe through uniq -c counts the identical path prefixes; uniq only merges adjacent lines, which is fine here because find lists everything under a directory together. That's fine, but there are lots of empty subdirectories such as /org/f and I don't want them cluttering up my result, so the grep -v '^ *1 ' pipe segment excludes the lines with a count of 1.
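The clustering pipeline can be tried on a handful of lines without any remote hosts. This is a sketch under stated assumptions: the function name cluster and the sample paths are invented, and because it runs in a plain shell rather than inside a Makefile the pattern ends in a single $ instead of $$.

```shell
#!/bin/sh
# Collapse each path to its <top>/<letter>/<name> prefix, count
# adjacent identical lines, and drop lines whose count is 1.
cluster() {
    sed -e 's|\(/[^/]*/[a-z]/[^/]*\)/.*$|\1|' \
        | uniq -c \
        | grep -v '^ *1 '
}

# Made-up sample in the host_find.txt format.
cluster <<'EOF'
flapjack: .
flapjack: ./org
flapjack: ./org/f
flapjack: ./org/z
flapjack: ./org/z/zodiac
flapjack: ./org/z/zodiac/README
flapjack: ./org/z/zodiac/src/main.c
EOF
```

Only the zodiac line survives, with a count of 3: the directory itself plus the two files beneath it. The amount of leading whitespace that uniq -c prints varies between implementations, which is why the grep pattern allows any number of spaces before the 1.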

The result of running that tonight is the following report:


8 flapjack: ./org/c/coursera
351 flapjack: ./org/s/studiopress
3119 flapjack: ./org/g/gnu
1312 flapjack: ./org/f/freedesktop
293 flapjack: ./org/m/minecraft
9 flapjack: ./org/b/brother
2 flapjack: ./org/n/national_center_for_access_to_justice
1168 flapjack: ./org/w/wordpress
4 flapjack: ./projects/c/cron
10 flapjack: ./projects/c/cups
6 flapjack: ./projects/d/dhcp
33 flapjack: ./projects/d/dns
15 flapjack: ./projects/s/sysadmin
5 flapjack: ./projects/f/ftp
3 flapjack: ./projects/p/printcap
8 flapjack: ./projects/p/programming
8 flapjack: ./projects/t/tftpd
35 flapjack: ./projects/n/netboot
7 flapjack: ./projects/l/logrotate
8 flapjack: ./projects/r/rolodex
189 flapjack: ./projects/h/html5reset
6 frenchtoast: ./projects/p/printcap
5 frenchtoast: ./projects/c/cups
380 pancake: ./org/m/minecraft
3 pancake: ./projects/l/logrotate
15 pancake: ./projects/d/dns
9 pancake: ./projects/s/sysadmin
11 waffle: ./projects/s/sysadmin
8 waffle: ./projects/t/tftpd
15 waffle: ./projects/d/dns
3 waffle: ./projects/l/logrotate
375 waffle: ./org/m/minecraft


And ... voila! I have a map that I can use to figure out how to consolidate the many scattered parts of my home directory.
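One way to read such a map is to ask which directories show up on more than one host, since those are the ones that need merging. The following is a sketch, not part of the original Makefile: the function name shared_dirs is hypothetical, and it assumes the "count host: path" line format of the report above and paths without embedded spaces.

```shell
#!/bin/sh
# Read report lines of the form "<count> <host>: <path>" and list
# each path that appears on more than one host, with those hosts.
shared_dirs() {
    awk '{
            host = $2
            sub(/:$/, "", host)          # drop the trailing colon
            hosts[$3] = hosts[$3] " " host
            seen[$3]++
         }
         END {
            for (p in seen)
                if (seen[p] > 1)
                    print p ":" hosts[p]
         }' | sort
}

# A few lines taken from the report above.
shared_dirs <<'EOF'
  10 flapjack: ./projects/c/cups
  33 flapjack: ./projects/d/dns
   5 frenchtoast: ./projects/c/cups
  15 pancake: ./projects/d/dns
  15 waffle: ./projects/d/dns
   4 flapjack: ./projects/c/cron
EOF
```

For this sample, cups is reported for flapjack and frenchtoast and dns for flapjack, pancake, and waffle, while cron is omitted because it exists on one host only.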

[2014-07-04 - updated the Makefile so that it is more friendly to web browsers.]

[2014-07-29 - a friend of mine critiqued my Makefile code and pointed out that gmake has powerful iteration functions of its own, eliminating the need for me to incorporate shell code in my targets. The result is quite elegant, I must say!]


#
# Find out what files exist on all of the hosts on donner.lan
# Started in June 2014 by Marc Donner
#
# $Id: Makefile,v 1.12 2014/07/30 02:07:07 marc Exp $
#

FORCE = force

# This ought to be the result of a call to the CMDB
HOSTS = flapjack frenchtoast pancake waffle

FILES = Makefile host_find.txt clusters.txt

#
# This provides us with the ISO 8601 date (YYYY-MM-DD)
#
DATE := $(shell /bin/date +"%Y-%m-%d")

help: ${FORCE}
	cat Makefile

checkin: ${FORCE}
	ci -l ${FILES}

# A finger exercise to ensure that we can see the base info on the hosts
HOSTS_UNAME := $(HOSTS:%=.%_uname.txt)

uname: ${HOSTS_UNAME}
	cat ${HOSTS_UNAME}

.%_uname.txt: ${FORCE}
	ssh $* uname -a | sed -e 's/^/:'$*': /' > $@

HOSTS_UPTIME := $(HOSTS:%=.%_uptime.txt)

uptime: ${HOSTS_UPTIME}
	cat ${HOSTS_UPTIME}

.%_uptime.txt: ${FORCE}
	ssh $* uptime | sed -e 's/^/:'$*': /' > $@

# Another finger exercise to verify the location of the ssh landing
# point home directory

HOSTS_PWD := $(HOSTS:%=.%_pwd.txt)

pwd: ${HOSTS_PWD}
	cat ${HOSTS_PWD}

.%_pwd.txt: ${FORCE}
	ssh $* pwd | sed -e 's/^/:'$*': /' > $@

# Run find on all of the ${HOSTS} and prefix mark all of the results,
# accumulating them all in host_find.txt

HOSTS_FIND := $(HOSTS:%=.%_find.txt)

find: ${HOSTS_FIND}

.%_find.txt: ${FORCE}
	echo '# ' ${DATE} > $@
	ssh $* find -print | sed -e 's/^/:'$*': /' >> $@

# Get rid of the empty directories and report the number of files in each
# non-empty directory
clusters.txt: ${HOSTS_FIND}
	cat ${HOSTS_FIND} \
	    | sed -e 's|\(/[^/]*/[a-z]/[^/]*\)/.*$$|\1|' \
	    | uniq -c \
	    | grep -v '^ *1 ' \
	    | sort -t ':' -k 3 \
	    > clusters.txt

force:
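The revised Makefile leans on two GNU make features. A substitution reference such as $(HOSTS:%=.%_find.txt) rewrites each word in HOSTS into a per-host file name, and a pattern rule such as .%_find.txt builds each of those files, with $* standing for the part matched by % (the host name) and $@ for the target. As a rough shell analogue of the first feature only (make does this internally; the function name per_host_files is hypothetical):

```shell
#!/bin/sh
# Shell analogue of the make substitution reference
#   $(HOSTS:%=.%_find.txt)
# which rewrites each word in HOSTS by replacing % with that word.
per_host_files() {
    suffix=$1
    shift
    for h in "$@"; do
        printf '.%s_%s.txt\n' "$h" "$suffix"
    done
}

per_host_files find flapjack frenchtoast pancake waffle
```

This prints .flapjack_find.txt, .frenchtoast_find.txt, .pancake_find.txt, and .waffle_find.txt, which are the prerequisites that the find and clusters.txt targets depend on.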
