Waiting for the File Server

Well, I now have four different UNIX machines and I’ve been doing sysadmin tasks on all of them.  As a result I now have four home directories that are out of sync.

How annoying.

Ultimately I plan to create a file server on one of my machines and provide the same home directory on all of them, but I haven’t done that yet, so I need some temporary crutches to tide me over until I get the file server built. In particular, I need to find out what is where.

The first thing I did was establish trust among the machines, making flapjack, the oldest, into the ‘master’ trusted by the others. I did this by generating an SSH key pair with ssh-keygen on the master and adding the public key to .ssh/authorized_keys on each of the other machines.

Then I decided to automate the discovery of what directories were on which machine.  This is made easier because of my personal trick for organizing files, namely to have a set of top level subdirectories named org/, people/, and projects/ in my home directory. Each of these has twenty-six subdirectories named a through z, with appropriately named subdirectories under them. This I find helps me put related things together. It is not an alternative to search but rather a complement.
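A minimal Python sketch of that layout, for illustration only. The function names make_layout and bucket_for are hypothetical, and the sketch assumes that an item is filed under the lowercase first letter of its name.

```python
import string
import tempfile
from pathlib import Path

# The three top-level directories described above.
TOP_LEVEL = ("org", "people", "projects")


def make_layout(home: Path) -> list[Path]:
    """Create org/, people/, and projects/ under home, each with a/ through z/."""
    created = []
    for top in TOP_LEVEL:
        for letter in string.ascii_lowercase:
            bucket = home / top / letter
            bucket.mkdir(parents=True, exist_ok=True)
            created.append(bucket)
    return created


def bucket_for(home: Path, top: str, name: str) -> Path:
    """Return the directory where an item called name would live, e.g. org/g/gnu."""
    return home / top / name[0].lower() / name


if __name__ == "__main__":
    # Work in a throwaway directory rather than a real home directory.
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp)
        print(len(make_layout(home)), "letter directories created")
        print(bucket_for(home, "org", "gnu").relative_to(home))
```

Keeping the letter level fixed at one character is what lets a single pattern pick out every named subdirectory later on.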

Anyway, the result is that I could build a Makefile that automates reaching out to all of my machines and gathering information. Here’s the Makefile:

# $Id: Makefile,v 1.7 2014/07/04 18:57:44 marc Exp marc $

FORCE = force

HOSTS = flapjack frenchtoast pancake waffle

FILES = Makefile

checkin: ${FORCE}
	ci -l ${FILES}

uname: ${FORCE}
	for h in ${HOSTS}; \
	   do ssh $$h uname -a \
	      | sed -e 's/^/'$$h': /'; \
	done

host_find: ${FORCE}
	echo > host_find.txt
	for h in ${HOSTS}; \
		do ssh $$h find -print \
		| sed -e 's/^/'$$h': /' \
		 >> host_find.txt; done

clusters.txt: host_find.txt
	sed -e 's|\(/[^/]*/[a-z]/[^/]*\)/.*$$|\1|' host_find.txt \
	| uniq -c \
	| grep -v '^ *1 ' \
	> clusters.txt

# Empty rule: targets that depend on ${FORCE} are always considered out of date
force:


Ideally, of course, I’d get the list of host names in the variable HOSTS from my configuration database, but having neglected to build one yet, I am just listing my machines by name there.

The first important target, host_find, uses ssh to reach each of the machines, including the master itself, and runs find there, prefixing each line of output with the host name so that I can tell which files exist on which machine. The result lands in a file named host_find.txt, which I can probably dispense with now that the machinery is working.
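As a rough illustration of what the host prefix makes possible, here is a small Python sketch that turns prefixed lines into a map from path to the hosts that have it. It is not part of the Makefile. The function name hosts_by_path is hypothetical, and the sketch assumes every useful line has the shape 'host: ./path'.

```python
from collections import defaultdict


def hosts_by_path(lines):
    """Map each path to the set of hosts that reported it.

    Each input line is assumed to look like 'host: ./some/path', which is
    the shape produced by the sed prefix in the host_find target.
    """
    seen = defaultdict(set)
    for line in lines:
        host, sep, path = line.rstrip("\n").partition(": ")
        if sep and path:  # skip blank or unprefixed lines
            seen[path].add(host)
    return seen


if __name__ == "__main__":
    sample = [
        "",  # the leading blank line written by 'echo > host_find.txt'
        "flapjack: ./projects/d/dns",
        "pancake: ./projects/d/dns",
        "flapjack: ./org/g/gnu",
    ]
    for path, hosts in sorted(hosts_by_path(sample).items()):
        print(path, "->", ", ".join(sorted(hosts)))
```

With the real file, the same function could be fed an open file handle, since it only iterates over lines.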

The second important target, clusters.txt, passes host_find.txt through a sed script. The script replaces paths like /org/z/zodiac/blah-blah-blah with /org/z/zodiac; in other words, it truncates each path just below the named subdirectory. The pipe through uniq -c then counts runs of identical path prefixes. Because find lists the contents of a directory together, identical prefixes end up on adjacent lines, which is all uniq needs. That’s fine, but there are lots of subdirectories like /org/f that are empty, and I don’t want them cluttering up my result, so the grep -v '^ *1 ' pipe segment excludes the lines with a count of 1.
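The same three steps can be sketched in Python for readers who find the sed expression hard to read. This is an illustrative equivalent under stated assumptions, not a replacement for the Makefile. The function name clusters and the sample paths are made up, and itertools.groupby stands in for uniq -c because both count only adjacent duplicates.

```python
import re
from itertools import groupby

# Same idea as the sed expression: keep everything up to and including
# /<top>/<letter>/<name> and drop whatever follows it.
PREFIX = re.compile(r"(/[^/]*/[a-z]/[^/]*)/.*$")


def clusters(lines):
    """Yield (count, prefix) for each run of adjacent lines sharing a prefix.

    Runs of length 1 are dropped, mirroring grep -v '^ *1 '.
    """
    trimmed = (PREFIX.sub(r"\1", line.rstrip("\n")) for line in lines)
    for prefix, run in groupby(trimmed):
        count = sum(1 for _ in run)
        if count > 1:
            yield count, prefix


if __name__ == "__main__":
    sample = [
        "flapjack: ./org/f",
        "flapjack: ./org/g",
        "flapjack: ./org/g/gnu",
        "flapjack: ./org/g/gnu/emacs",
        "flapjack: ./org/g/gnu/emacs/README",
        "waffle: ./projects/d/dns",
        "waffle: ./projects/d/dns/named.conf",
    ]
    for count, prefix in clusters(sample):
        print(f"{count:7d} {prefix}")
```

Note that the count for a directory includes the directory's own line, so a directory holding a single file shows up with a count of 2.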

The result of running that tonight is the following report:

      8 flapjack: ./org/c/coursera
    351 flapjack: ./org/s/studiopress
   3119 flapjack: ./org/g/gnu
   1312 flapjack: ./org/f/freedesktop
    293 flapjack: ./org/m/minecraft
      9 flapjack: ./org/b/brother
      2 flapjack: ./org/n/national_center_for_access_to_justice
   1168 flapjack: ./org/w/wordpress
      4 flapjack: ./projects/c/cron
     10 flapjack: ./projects/c/cups
      6 flapjack: ./projects/d/dhcp
     33 flapjack: ./projects/d/dns
     15 flapjack: ./projects/s/sysadmin
      5 flapjack: ./projects/f/ftp
      3 flapjack: ./projects/p/printcap
      8 flapjack: ./projects/p/programming
      8 flapjack: ./projects/t/tftpd
     35 flapjack: ./projects/n/netboot
      7 flapjack: ./projects/l/logrotate
      8 flapjack: ./projects/r/rolodex
    189 flapjack: ./projects/h/html5reset
      6 frenchtoast: ./projects/p/printcap
      5 frenchtoast: ./projects/c/cups
    380 pancake: ./org/m/minecraft
      3 pancake: ./projects/l/logrotate
     15 pancake: ./projects/d/dns
      9 pancake: ./projects/s/sysadmin
     11 waffle: ./projects/s/sysadmin
      8 waffle: ./projects/t/tftpd
     15 waffle: ./projects/d/dns
      3 waffle: ./projects/l/logrotate
    375 waffle: ./org/m/minecraft

And … voila! I have a map that I can use to figure out how to consolidate the many scattered parts of my home directory.

[2014-07-04 – updated the Makefile so that it is more friendly to web browsers.]

[2014-07-29 – a friend of mine critiqued my Makefile code and pointed out that gmake has powerful iteration functions of its own, eliminating the need for me to incorporate shell code in my targets. The result is quite elegant, I must say!]

# Find out what files exist on all of the hosts on donner.lan
# Started in June 2014 by Marc Donner
# $Id: Makefile,v 1.12 2014/07/30 02:07:07 marc Exp $

FORCE = force

# This ought to be the result of a call to the CMDB
HOSTS = flapjack frenchtoast pancake waffle

FILES = Makefile host_find.txt clusters.txt

# This provides us with the ISO 8601 date (YYYY-MM-DD)
DATE := $(shell /bin/date +"%Y-%m-%d")

help: ${FORCE}
	cat Makefile

checkin: ${FORCE}
	ci -l ${FILES}

# A finger exercise to ensure that we can see the base info on the hosts
HOSTS_UNAME := $(HOSTS:%=.%_uname.txt)

uname: ${HOSTS_UNAME}

.%_uname.txt: ${FORCE}
	ssh $* uname -a | sed -e 's/^/:'$*': /' > $@

HOSTS_UPTIME := $(HOSTS:%=.%_uptime.txt)

uptime: ${HOSTS_UPTIME}

.%_uptime.txt: ${FORCE}
	ssh $* uptime | sed -e 's/^/:'$*': /' > $@

# Another finger exercise to verify the location of the ssh landing
# point home directory

HOSTS_PWD := $(HOSTS:%=.%_pwd.txt)

pwd: ${HOSTS_PWD}
	cat ${HOSTS_PWD}

.%_pwd.txt: ${FORCE}
	ssh $* pwd | sed -e 's/^/:'$*': /' > $@

# Run find on all of the ${HOSTS} and prefix mark all of the results,
# accumulating them all in host_find.txt

HOSTS_FIND := $(HOSTS:%=.%_find.txt)

find: ${HOSTS_FIND}

.%_find.txt: ${FORCE}
	echo '# ' ${DATE} > $@
	ssh $* find -print | sed -e 's/^/:'$*': /' >> $@

# Get rid of the empty directories and report the number of files in each
# non-empty directory
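clusters.txt: ${HOSTS_FIND}
	cat ${HOSTS_FIND} \
	| sed -e 's|\(/[^/]*/[a-z]/[^/]*\)/.*$$|\1|' \
	| uniq -c \
	| grep -v '^ *1 ' \
	| sort -t ':' -k 3 \
	> clusters.txt

# Empty rule: targets that depend on ${FORCE} are always considered out of date
force: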


Two Intel NUC servers running Ubuntu

A week or two ago I took the plunge and ordered a pair of Intel NUC systems. Here’s what happened next as I worked to build a pair of Ubuntu servers out of the hardware:

I ordered the components for two Linux servers from Amazon:

  • Intel NUC D54250WYK [$364.99 each]
  • Crucial M500 240 GB mSATA [$119.99 each]
  • Crucial 16GB Kit [$134.99 each]
  • Cables Unlimited 6-Foot Mickey Mouse Power Cord [$5.99 each]

for a total of $625.96 per machine. Because I have a structured wiring system in my apartment I didn’t bother with the wifi card.

Assembly was fast, taking ten or fifteen minutes to open the bottom cover, snap in the RAM and the SSD, and button the machine up again.

Getting Ubuntu installed was rather more work. I prepared the install media on an iMac:

Download the Ubuntu image from the Ubuntu site.

Prepare a bootable USB stick with the server image (I used diskutil to learn that my USB stick was on /dev/disk4; hdiutil appends .dmg to the output name, which is why dd reads from the .img.dmg file):

  • hdiutil convert -format UDRW -o ubuntu-14.04-server-amd64.img ubuntu-14.04-server-amd64.iso
  • diskutil unmountDisk /dev/disk4
  • sudo dd if=ubuntu-14.04-server-amd64.img.dmg of=/dev/rdisk4 bs=1m
  • diskutil eject /dev/disk4

This then booted on the NUC, and the install went relatively smoothly.

However, after the installation was complete the system would not boot: it did not recognize the SSD as a boot device.

Did a little searching around and learned that I needed to update the BIOS on the NUC. Downloaded the updated firmware from the Intel site, following a YouTube video from Intel, and applied the new firmware.

Redid the install, which ultimately worked, after one more glitch. The second machine went more smoothly.

Two little Linux boxes now working quite nicely – completely silent, 16G of RAM on each, 240G SSD on each.

They are physically tiny … hard to overemphasize how tiny, but really tiny. They sit on top of my Airport Extreme access point and make it look big.

2014 Five Borough Bike Tour – I’m riding

The Five Borough Bike Tour is an annual event in which tens of thousands of New Yorkers ride 40 or 50 miles from lower Manhattan up through the Bronx, Queens, Brooklyn, and over the Verrazano Narrows Bridge to Staten Island.  For the last three years I’ve supported a wonderful organization called Bronxworks (http://bronxworks.org/) that helps families in need in The Bronx.  I ride with a number of friends, some of whom live in the Bronx, and all of whom have adopted this wonderful group.

I rode with the Bronxworks team in 2011 and 2012 but a conflict prevented me from riding in 2013, though I donated to support the rest of the team.  Fortunately for me I will be riding again this year.  If you want to contribute to Bronxworks in support of my ride you may visit my fundraising page http://www.crowdrise.com/BronxWorks2014BikeTour/fundraiser/marcdonner.  If you do so, I will be eternally grateful!


From the Editors: The Invisible Computers

[Originally published in the November/December 2011 issue (Volume 9 number 6) of IEEE Security & Privacy magazine.]

Just over a decade ago, shortly before we launched IEEE Security & Privacy, MIT Press published Donald Norman’s book The Invisible Computer. At the time, conversations about the book focused on the opportunities exposed by his powerful analogies between computers and small electric motors as system components.

Today, almost everything we use has one or more computers, and a surprising number have so many that they require internal networks. For instance, a new automobile has so many computers in it that it has at least two local area networks, separated by a firewall, to connect them, along with interconnects to external systems. There’s probably even a computer in the key!

Medical device makers have also embraced computers as components. Implantable defibrillators and pacemakers have computers and control APIs. If it’s a computer, it must have some test facilities, and these, if misused, could threaten a patient’s health. Doctors who have driven these designs, focused entirely on saving lives, are shocked when asked about safeguards to prevent unauthorized abuse. It’s probably good that their minds don’t go that way, but someone (that’s you) should definitely be thinking that way.

In 2007, the convergence battle in the mobile telephone world was resolved with the iPhone. iPhone’s launch ended the mad competition to add more surfaces and smaller buttons to attach more “features” to each phone. Ever after, a mobile phone would be primarily a piece of software. One button was enough. After that, it was software all the rest of the way down, and control of the technology’s evolution shifted from mechanical to software engineers.

By now, the shape of the computer systems world is beginning to emerge. No longer is the familiar computer body plan of a screen, keyboard, and pointing device recognizable. Now computers lurk inside the most innocuous physical objects, specialized in function but increasingly sophisticated in behavior. Beyond the computer’s presence, however, is the ubiquity of interconnection. The new generation of computers is highly connected, and this is driving a revolution in both security and privacy issues.

It isn’t always obvious what threats to security and privacy this new reality will present. For example, it’s now possible to track stolen cameras using Web-based services that scan published photographs and index them by metadata included in JPEG or TIFF files. Although this is a boon for theft victims, the privacy risks have yet to be understood.

The computer cluster that is a contemporary automobile presents tremendous improvements in safety, performance, and functionality, but it also presents security challenges that are only now being studied and understood. Researchers have identified major vulnerabilities and, encouragingly, report engagement from the automobile industry in acting to mitigate the documented risks.

Security and privacy practitioners and researchers have become comfortable working in the well-lit neighborhood of the standard computer system lamppost. However, the computing world will continue to change rapidly. We should focus more effort on the challenges of the next generations of embedded and interconnected systems.

This is my valedictory editor-in-chief message. I helped George Cybenko, Carl Landwehr, and Fred Schneider launch this magazine and have served as associate EIC ever since. In recent years, my primary work moved into other areas, and lately I have felt that I was gaining more than I was contributing. Thus, at the beginning of 2011, I suggested to EIC John Viega that I would like to step down as associate EIC and give him an opportunity to bring some fresh blood to the team. The two new associate EICs, Shari Lawrence Pfleeger and Jeremy Epstein, are both impressive experts and a wonderful addition. The magazine, and the community it serves, are in excellent hands.

From the Editors: Privacy and the System Life Cycle

[Originally published in the March/April 2011 issue (Volume 9 number 2) of IEEE Security & Privacy magazine.]

Engineering long-lived systems is hard, and adding privacy considerations to such systems makes the work harder.

Who may look at private data that I put online? Certainly I may look at it, plus any person I explicitly authorize. When may the online system’s operators look at it? Certainly when customer service representatives are assisting me in resolving a problem, they might look at the data, though I would expect them to get my permission before doing so. I would also expect my permission to extend only for the duration of the support transaction and to cover just enough data elements to allow the problem’s analysis and resolution.

When may developers responsible for the software’s evolution and maintenance look at my data? Well, pretty much never. The exception is when they’re called in during escalation of a customer service transaction. Yes, that’s right: developers may not, in general, look at private data contained in the systems that they have written and continue to support. In practice, it’s probably infeasible to make developer access impossible, but we should make it highly visible.

Doesn’t the code have a role in this? Of course it does, but the code isn’t generally created by the consumer and isn’t private. Insofar as consumers create code—and they do when they write macros, filters, and configurations for the system—it’s part of this analysis. The system life cycle and privacy implications of user-created code are beyond the current state of the art and merit significant attention in their own right.

So what happens when an online system is forced to migrate data from one version of the software to another version? This happens periodically in the evolution of most long-lived systems, and it often involves a change to the underlying data model. How do software engineers ensure that the migration is executed correctly? They may not spot-check the data, of course, because it’s private. Instead, they build test datasets and run them through the migration system and carefully check the results. But experienced software engineers know very well that test datasets are generally way too clean and don’t exercise the worst of the system. Remember, no system can ever be foolproof because fools are way too clever. So we must develop tests that let us verify that data migration has been executed properly without being able to examine the result and spot-check it by eye. Ouch.

What’s the state of the art with respect to this topic? Our community has produced several documents that represent a start for dealing with private data in computer systems. By and large, these documents focus on foundational issues such as what is and isn’t private data, how to notify consumers that private data will be gathered and held, requirements of laws and regulations governing private data, and protecting private data from unauthorized agents and uses.

Rules and regulations concerning privacy fall along a spectrum. At one end are regulations that attempt to specify behavior to a high level of detail. These rules are well intended, but it’s sometimes unclear to engineers whether compliance is actually possible. At the other end are rules such as HIPAA (Health Insurance Portability and Accountability Act) that simply draw a bright line around a community of data users that comprises doctors, pharmacies, labs, insurers, and their agents and forbid any data flow across that line. HIPAA provides few restrictions on the handling or use of this data within that line. Of course, one irony with HIPAA is that the consumer is outside the line.

Given the current state of engineering systems for online privacy, regulations like HIPAA are probably better than heavy-handed attempts to rush solutions faster than the engineering community can figure out feasibility limits.

This is an important area of work, and some promising research is emerging, such as Craig Gentry’s recent PhD thesis on homomorphic encryption (http://crypto.stanford.edu/craig/craig-thesis.pdf), but full rescue looks to be years off. We welcome reports from practitioners and researchers on approaches to the problem of maintaining data that may not be examined.

From the Editors: Phagocytes in Cyberspace

[Originally published in the March/April 2010 issue (Volume 8 number 2) of IEEE Security & Privacy magazine.]

Let us reflect on the evolution of malware as our industry has progressed during the 30-plus years since computers moved out of the mainframe datacenter cathedrals and into the personal computer bazaars. We might be moving back to cathedrals these days with the expansion of cloud computing, but the personal computer is here to stay in one form or another — whether it’s desktop or laptop or PDA or smartphone, and whether it’s a stand-alone system with fat client software or a network device with thinner clients.

In the early days of computing, malware was transmitted by infected floppy disks. Authors were amateurs, virulence was low, and the risk was relatively minor—mostly an inconvenience. Later, the computing universe got larger and more densely connected as PCs became cheaper and the Internet and the Web made distributing software cheaper and easier. The software industry in turn made the installation of software easier, accommodating the needs of non-hobbyist users who had little tolerance for technical complexity. Malware authors did likewise, though perhaps for different reasons.

If we look at the history of disease, we see similar changes as biological communities evolved. The higher-population densities of towns and cities sped disease propagation. Adding injury to injury, sharing critical resources like water wells and food markets made propagation an easier problem for bacteria to solve, thus creating a challenge for us. The economic benefits of clustering in cities were in increasing tension with the hygiene problems that emerged from higher population density and the speedups in disease propagation that resulted.

Malware Propagation

Today we see a world in which malware has become a lucrative global industry, for both the offense and the defense. Organized criminals tend a complex interdependent ecosystem in which bot herders supervise vast arrays of zombie PCs. These herders pay malware distributors anywhere from a few cents to a dollar or more for each new machine infected. These botnets are hired out by the hour through professionally designed and implemented websites that accept credit cards and offer online support. What does one do with hired botnet hours? Why, one distributes spam for a fee, or attacks the websites of small- and medium-sized businesses to support the income of a protection racket, or distributes malware to accumulate zombies for another botnet. Malware development is so lucrative that the producers have established companies complete with human resources departments and paintball outings for employees.

The number of zombie PCs is huge. Reliable numbers for total zombies aren’t available, but McAfee claimed to have measured in the first quarter of 2009 an increase of 12 million IP addresses behaving like zombies. Zero-day exploits are likewise growing in number. Signature-based antimalware software has fallen further and further behind the bad guys, who use tools that enable them to custom design malware by checking boxes on a GUI. The new malware is polymorphic, allowing hundreds or thousands of versions of a single virus, each with a different signature. Enterprising malware knows how to thwart defensive software. In the early days, it would simply halt the antivirus software. Later, it would uninstall the software. The best modern malware surreptitiously alters the defensive software to blind it to the malware, such as by tampering with the signature files, defeating its responses.

Grandma in Iowa might very well have a PC that’s running zombie software from two different botnets, but she doesn’t notice that her machine is infected or that it has participated in dozens of DDoS attacks and sourced thousands of pieces of spam. The bot software is pretty savvy these days: it lies low when grandma’s using the machine and avoids contending for critical resources so as not to attract grandma’s attention.

Defensive Strategies

Our industry continues to design and implement systems as if each will operate in a malware-free environment forever. A process running an application in a contemporary operating system trusts the services provided to it by the kernel. When developers build distributed systems that orchestrate several processes to cooperate in a larger task, the good ones might cross-authenticate to ensure that they’re talking to the appropriate process, and the better ones might secure the traffic between nodes, but it’s pretty rare for a process to verify that its correspondent is running the right software version, and almost unknown for the process to check on the operating system kernel and the services that it provides.

In the biological world, by contrast, virtually every organism survives with significant numbers of hostile bacteria and viruses in and around its body. Studies show hundreds of distinct bacterial species living on the skin of typical human subjects, and we know that the digestive tract is home to thousands of bacteria, many of which could cause lethal sickness if they were to get out of the gut and into more vulnerable parts of the body. Despite our intimate proximity to dangerous bio-malware, we are generally oblivious. The body keeps the bacteria and viruses in check.

The body has a sophisticated IFF (identify friend or foe) system that helps it distinguish between “thee” and “me” and to attack the “thee.” The odd bacterial or viral illness and even the occasional pandemic represent the exceptions that prove the rule. By and large, we survive as individuals and even thrive in the presence of some pretty bad stuff. Most of our body’s defensive actions take place below the threshold of awareness. Sometimes the basic defenses fail to keep the malware in check, so you develop a fever indicating that something, perhaps an infection, is amiss. If the defenses fail further, you have a funeral.

Maybe it’s time for the good guys (that’s us, if you aren’t following along in the script) to reconsider our defensive strategies. The designers of the Kerberos authentication system explicitly assumed that the bad guys were going to be on the network and set themselves the task of designing an authentication system that didn’t rely on the network’s sterility. Of course, the Kerberos designers’ conception of bad guys was limited to mischievous undergraduates, not organized criminal gangs, but the key insight was correct.

Can we further weaken the trust assumptions underlying our system designs? What would software look like if the applications didn’t trust the file system, or if the file system didn’t trust the operating system? We’ve made some progress on this front, with TPM (trusted platform module) hardware deployed in a number of industries, but we haven’t yet established an adequate level of paranoia in system designers.

The bacteria and viruses that threaten our bodies evolved over time, whereas the malware that threatens our computers has been designed by clever software engineers. Our antimalware defenses don’t adapt to their threat environment locally; at present, they depend on a small number of managers working at antivirus companies. The signature-based antimalware systems are increasingly challenged by scale and quality control problems. (A misbehaving antimalware system is sort of like an immune system under the influence of HIV—a threat that started life as a defensive system.)

Can we build defensive systems that analyze the behavior of malware and react by disabling it? Is there a graduated response mechanism that we can articulate that will allow our defenses to slow malware down while they study it and decide whether to shut it down? Would it be enough to cripple the malware and reduce its virulence?

Work is already under way with some of these assumptions at places like the University of New Mexico and Microsoft Research, but not nearly enough. We’ve clearly reached the end of the line with classical approaches and assumptions. Now is the time for radical thinking.

From the Editors: International Blues

[Originally published in the March/April 2010 issue (Volume 8 number 2) of IEEE Security & Privacy magazine.]

IEEE Security & Privacy could be a lot more international in its focus and content. Reflecting on its content and tone over the past seven years, it’s hard to tell that we think of either privacy or security in a broad international context. There are examples of taking a broader view, but they’re more notable as exceptions than as standards. This is bad for several reasons. First, privacy and security have different levels of importance in different places in the world. Second, by largely ignoring the non-Western world, we risk dangerous blind spots. Third, we might be failing to take simple steps that would make our magazine more valuable worldwide.

Although the purely technical aspects of our work are universal and generic, engineering is all about making trade-offs informed by economic and cultural judgments. Moreover, our subject matter firmly straddles the boundary between technology and policy—something we deliberately set out to do when we created the magazine in 2002/2003. Policy topics are generally more complex and tend to vary across jurisdictions, not to mention industries and institutions. Let’s begin to focus our attention on ensuring that our international relevance increases going forward.

We have seen far too few articles on the challenges of dealing with cybersecurity issues across jurisdictions. Definitions of criminal violations differ across the world—let’s see some examples of issues raised by these distinctions. Cultural standards vary globally, leading to differences in attitudes toward security, privacy, and the role of security services.

Maybe we can’t address generic technical questions yet, so perhaps we should be examining a range of case studies on how these subjects manifest themselves in different countries. After we’ve seen enough case studies, perhaps we’ll be able to abstract away from the details and get our heads around a new set of important questions. How have these variations affected security systems’ design and implementation and operational responses to incidents?

“Made in <insert country here>” has become meaningless as industries have globalized and the movement of physical and virtual goods has become ever easier, making accountability for product quality ever more diffuse—and assurance ever more difficult. Views of personal responsibility toward the community, the employer, the nation, and the world vary widely. An employer’s power to enforce behavior on the part of its employees varies widely across the world, so a vendor might well intend to deliver a high-integrity product, only to be undermined by one or more employees whose cultural views don’t require that they comply. One consequence of this is that products might have “features” that their operators never wanted, features that compromise the security and privacy guarantees that their operators seek to meet.

Can we begin a discussion of techniques for making networks robust in the face of components that are unreliable or even potentially hostile to our usage? Back in the 1980s, the MIT Project Athena folks argued that a security system’s design should presuppose that the network is held by hostile adversaries. Maybe it’s time to go back to that sort of design principle.

This topic isn’t brand new. For example, the United Nations Commission on International Trade Law has been working on cross-border computer crimes, trying to harmonize international agreements on things like rules of evidence, law enforcement cooperation, and definition of crimes. Numerous other international groups are now or can be expected to soon begin working on these and related issues. Cybersecurity is an area in which the balance of power between attackers and defenders is tipping very strongly toward the attackers. This situation presents challenges both to law enforcement and to national security institutions across the world, something that our community should begin to consider and address. S&P has been a leader in discourse throughout its life, and we will adapt ourselves to this emerging trend to best serve our community.

From the Editors: New Models for Old

[Originally published in the July/August 2009 issue (Volume 7 number 4) of IEEE Security & Privacy magazine.]

When faced with a new thing, human beings do something very sensible. They try to harness previous experience and intuition in service of the new thing. How is this new thing like something that I already know and understand?

Trying to model the new thing on some old thing can be efficient, making it easier to reason about the new thing by using analogies adopted from previous experience. The late Claude Shannon did this at least twice in his illustrious career.

The 1930s were an intense time in digital circuits, with engineers busily designing and building ever more complex machines out of electromechanical relays. Design principles for relay systems were vague and imprecise, with engineers employing rules of thumb and heuristics whose efficacy was limited. The result was a world in which tremendous potential was hampered by a real lack of powerful tools for reasoning about the artifacts that engineers were creating.

In 1937, Shannon wrote his master’s dissertation at MIT entitled, “A Symbolic Analysis of Relay and Switching Circuits.” In this paper, which has been called “possibly the most important, and also the most famous, master’s thesis of the [twentieth] century,” he observed that if one limited the interconnection topology very slightly, one could prove that relay circuits obeyed the mathematical rules George Boole formalized in “An Investigation of the Laws of Thought” in 1854. Suddenly, engineers had in their hands powerful tools to help them analyze designs, predict their performance, and determine whether the designs could be made smaller or simpler. It’s because of this work that today we refer to digital circuitry as “logic.”

If he had done no more in his career, Shannon would have been a major contributor, but he couldn’t leave well enough alone. In 1948, he released “A Mathematical Theory of Communication,” a paper that established the field of information theory. The basic concept introduced was that information could be modelled effectively using the mathematics of probability theory, particularly using the specific notations common to thermodynamics. The importance of the information theory work was so great that his earlier work on digital circuit theory has faded to comparative unimportance.

The ability to reuse a model when it fits, even if only approximately, is a powerful tool for speeding the adoption of new technologies. The desktop metaphor is credited with helping the Macintosh rapidly reach a user community that had previously found computing inaccessible, becoming the common metaphor across essentially all computing environments. Although the metaphor has its roots in the work of Douglas Engelbart and was refined at Xerox PARC, it’s forever associated with the Macintosh.

Analogic and metaphoric reasoning doesn’t always work, however. For each of the brilliant examples cited here, there’s at least one counterexample in which such approaches fail.

Some years ago, I led a project at an investment bank to replace its use of microfiche with an online system. In designing the system, we referred to some SEC regulations governing the storage and retention of records by institutions such as ours. The regulations specified that only optical disks were permitted in these record retention systems. The provision of the regulation that gave the engineers working on the design effort the most entertainment was the requirement that they provide a facility for “projecting” images of the stored documents. It was clear from the rule’s wording that its authors had a mental model in which an optical disk was very much like microfiche, containing highly miniaturized photographic images of the documents stored there. In a microfiche system, a document is optically enlarged using what amounts to a slide projector. The intent of the regulation was obviously not that we provide a facility to project retrieved documents on a screen but rather that our system be able to display an essentially unaltered rendition of the original document, allowing investigators to see such documents as they were seen by the bank’s staff when they were first used.

As the system’s designers, we felt compelled to write an extensive interpretive document that extracted the original intent from the regulations and to get the lawyers to sign off on that interpretation. Then we could ensure that each of those more appropriately posed requirements was met and document how that had been done. In this case, we’d inverted the overly specific regulation to get at the true underlying functional requirements. Of course, if the requirements had been written properly to start with, we could have avoided the time-consuming and expensive process of writing the interpretive document and getting it reviewed and approved by the compliance department. Moreover, we would have avoided the risk that the SEC might disagree with our interpretation and restatement of the requirements.

Why is this important? As technical professionals, we often bemoan the challenge of communicating technology’s potential to laypeople and of their often painful errors in attempting to pierce the complexities and grasp the essential concepts and values on offer. This challenge is manifested in rules and regulations written to “fight the last war” and interpreted by auditors, reporters, and analysts who sometimes miss the essential point. Our frustration is that it’s often these laymen, rather than our technical leaders and visionaries, who establish public understanding of our contributions.

As an industry, we’re now faced with a wide range of circumstances in which the security and privacy protection provisions of systems are specified in laws and regulations. For instance, we have regulations like SEC rules, HIPAA, and SOX that enshrine paper-based information storage and retrieval models in their security and control models. If you have a paper record, how do you ensure its immunity from destruction, theft, or alteration? Why, you put it in a room with thick walls and strong locked doors. You check the backgrounds of everyone requesting access to the room, including the executives and the janitors. You implement careful processes to ensure that every transaction involving one of the documents is recorded in a log book somewhere.

Unfortunately, when you replace the file cabinets in the room with racks full of disks connected by networks, you discover that the thick walls are now as effective as a similar volume of air at securing the documents. But a literal audit might well give a clean bill of health to the roomful of disks. It’s secured within a strong wall. The doors are locked. Everyone with access to the keys is known. A+.

What can we, the security and privacy technical community, do to improve things? Rules and regulations are unfortunately static documents that, in a dynamic technology world, will somehow always manage to find themselves out of date. We’re in the midst of a huge society-wide change to move record keeping from paper systems to digital ones. In consequence, a vast number of existing rules can and should be rethought and revised. No better time than now, and no one better to do it than we.

From the Editors: Reading (with) the Enemy

[Originally published in the January/February 2009 issue (Volume 7 number 1) of IEEE Security & Privacy magazine.]

Back in the July/August 2006 issue of IEEE Security & Privacy, the editors of the Book Reviews department wrote an essay entitled “Why We Won’t Review Books by Hackers.” They argued that to review such books would be to “tacitly endorse a convicted criminal who now wants to pass himself off as a consultant.” We published two letters to the editor in the subsequent issue, and that was the end of the topic. Or so you thought.

In this issue, I argue that whether or not S&P reviews them, you should read the writings of bad guys, with the usual caveat that you should do so only if they have something useful to say and are well written. This topic has been debated for many years, and the opposing positions boil down to four basic arguments:

  • The writings of bad guys are morally tainted.
  • We should not reward bad guys for bad behavior.
  • The writings of bad guys provide “how to” information for the next generation of bad guys.
  • The writings of bad guys glamorize bad behavior and should be eschewed along with other attractive nuisances (to steal a term from the legal community).

If the moral taint disqualification fails for Mein Kampf, then there’s no reason we should let it stop us reading the works of lesser criminals. Fundamentally, any writing that gives the good guys an insight into the behavior of the bad guys is useful.

In the case of black hat computer adventurers, there’s no legitimate employment, so a book’s economic importance to the bad guy might be quite significant. On balance, however, this is a red herring. Negligibly few books are so popular that they change the fortunes of their authors. Most books have no more than modest success that, in the best case, produces a few hundred or perhaps a few thousand dollars for the author. This isn’t enough to make a real behavioral difference. Moreover, if a book becomes incredibly successful, it’s likely that the book’s value to society outweighs the harm that comes from rewarding the bad guy. A more subtle argument is that bad guys write books to market their skills for later employment as security experts. This argument is similarly bogus because it’s really “moral taint” in disguise. Without getting into an imponderable debate on ethics, this argument comes down to the assertion that a bad guy can never be reformed and that skills learned from bad behavior should never be used for gain.

The third argument, that bad-guy writing passes evil skills on to future bad guys, falls apart similarly on deeper analysis. It reduces to the old security-through-obscurity chestnut, which our community has been at the forefront of rebutting. Besides, cybercrime is a fast-paced arms race, and most of last week’s tools and techniques are ineffective and irrelevant this week. Of course, the more general techniques that bad guys use to develop attacks are as valuable to defenders as they are to attackers.

The last argument (about attractive nuisance) is an interesting one. The world of cybercriminal-authored books clearly breaks into two parts: those whose authors have been caught and convicted and those whose authors have not. All the bad-guy books I can think of have been written by convicted criminals. Books written by unconvicted criminals lack a certain (to put it delicately) credibility, wouldn’t you say? After all, it’s hard to believe that an uncaught and unconvicted bad guy would reveal all the vulnerabilities he knew. And if you want to trade time in jail and the permanent status of a convicted criminal for the dubious chance at fame that writing a true cybercrime book brings, then you probably already have severe problems.

Most fundamentally, however, the department editors noted that the book they were refusing to review was uninformative and badly written. This makes the book a waste of time by violating my rule that bad-guy books should be “useful and well written” to be worth reading. So if you hear about a good book by a bad guy, by all means read it.

From the Editors: Cyberassault on Estonia

[Originally published in the July/August 2007 issue (Volume 5 number 4) of IEEE Security & Privacy magazine.]

Estonia recently survived a massive distributed denial-of-service (DDoS) attack that came on the heels of the Estonian government’s relocation of a statue commemorating Russia’s 1940s wartime role. This action inflamed the feelings of the substantial Russian population in Estonia, as well as those of various elements in Russia itself.

Purple prose then boiled over worldwide, with apocalyptic announcements that a “cyberwar” had been unleashed on the Estonians. Were the attacks initiated by hot-headed nationalists or by a nation state? Accusations and denials have flown, but no nation state has claimed authorship.

It’s not really difficult to decide if this was cyberwarfare or simple criminality. Current concepts of war require people in uniforms or a public declaration. There’s no evidence that such was the case. In addition, there’s no reason to believe that national resources were required to mount the attack. Michael Lesk’s piece on the Estonia attacks in this issue (see the Digital Protection department on p. 76) includes estimates that, at current botnet leasing prices, the entire attack could have been accomplished for US$100,000, a sum so small that any member of the upper middle class in Russia, or elsewhere, could have sponsored it.

Was there national agency? It’s highly doubtful that Russian President Vladimir Putin or anyone connected to him authorized the attacks. If any Russian leader had anything to say about the Estonians, it was more likely an intemperate outburst like Henry II’s exclamation about Thomas Becket, “Will no one rid me of this troublesome priest?”

We can learn from this, however: security matters, even for trivial computers. A few tens of thousands of even fairly negligible PCs, when attached by broadband connections to the Internet and commanded in concert, can overwhelm all modestly configured systems—and most substantial ones.

Engineering personal systems so that they can’t be turned into zombies is a task that requires real attention. In the meantime, the lack of quality-of-service facilities in our network infrastructure will leave networked services vulnerable to future botnet attacks. Several avenues are available to address the weaknesses in our current systems, and we should be exploring all of them. As with epidemic disease, financial panic, and other mass threats to the common good, we’re jointly and severally at risk and have a definite and legitimate interest in seeing to it that the lower limits of good behavior aren’t violated.

From the Estonia attacks, we’ve also learned that some national military institutions are, at present, hard-pressed to defend their countries’ critical infrastructures and services. Historically, military responses to attacks have involved applying kinetic energy to the attacking forces or to the attackers’ infrastructure. But when the attacking force is tens or hundreds of thousands of civilian PCs hijacked by criminals, what is the appropriate response? Defense is left to the operators of the services and of the infrastructure, with the military relegated to an advisory role—something that both civilians and military must find uncomfortable. Of course, given the murky situations involved in cyberwar, we’ll probably never fully learn what the defense establishments could or did do.

Pundits have dismissed this incident, arguing that this is a cry of “wolf!” that should be ignored (see www.nytimes.com/2007/06/24/weekinreview/24schwartz.html). Although it’s true that we’re unlikely to be blinded to an invasion by the rebooting of our PCs, it’s naïve to suggest that our vulnerability to Internet disruptions has passed its peak. Cyberwar attacks, as demonstrated in 2003 by Slammer, have the potential to disable key infrastructures. To ignore that danger is criminally naïve. Nevertheless, all is not lost.

Events like this have been forecast for several years, and as of the latest reports, there were no surprises in this attack. The mobilization of global expertise to support Estonia’s network defense was heartening and will probably be instructive to study. Planners of information defenses and drafters of future cyberdefense treaties should be contemplating these events very carefully. This wasn’t the first such attack—and it won’t be the last.