How will we build a Third System of education?

I have recently been reading about, as Mike Gancarz puts it in Linux and the Unix Philosophy, “The Three Systems of Man”. This is, to my understanding, a fairly well-documented and often-observed concept in software design, possibly first referenced by Frederick Brooks in The Mythical Man-Month when he coined “the second-system effect”. Gancarz seems to take the concept further, generalizing it to any system built by humans.

Man has the capacity to build only three systems. No matter how hard he may try, no matter how many hours, months, or years for which he may struggle, he eventually realizes that he is incapable of anything more. He simply cannot build a fourth. To believe otherwise is self-delusion.

The First System

Fueled by need, constricted by deadlines, a first system is born out of a creative spark. It’s quick, often dirty, but gets the job done well. Importantly, it inspires others with the possibilities it opens up. The “what if”s elicited by a First System lead to…

The Second System

Encouraged and inspired by the success of the First System, more people want to get on board, offer their own contributions, and add the features they deem necessary. Committees are formed to organize and delegate. Everyone offers their expertise, and everyone believes they have expertise, even when they don’t. The Second System has a marketing team devoted to selling its many features to eagerly awaiting customers, and to appeal to the widest possible customer base, nearly any feature that is thought up is added. In reality, most users end up using only a small fraction of the available features of The Second System; the rest just get in the way. Despite enjoying commercial success, The Second System is usually the worst of the three. By trying to appease everyone (and, more often than not, by not understanding them), the committees in charge have created a mediocre experience. The unnecessary features add so much complexity that bugs are many and fixes take a considerable amount of effort. After some time, some users (and developers) start to recognize The Second System for what it is: bloatware.

The Third System

The Third System is built by people who have been burned by the Second System.

Eventually enough people grow frustrated by the inefficiencies and bloat of The Second System that they rebel against it. They set out to create a new system that contains the essential features and lessons learned in the First and Second Systems, but leaves out the crud that accumulated in the Second System. The construction of a Third System comes about either as a result of observed need, or as an act of rebellion against the Second System. Third Systems challenge the status quo set by Second Systems, and as such there is a natural tendency for those invested in The Second System to criticize, distrust, and fear The Third System and those who advocate for it.

The Interesting History of Unix

Progression from First to Second to Third system always happens in that order, but sometimes a Third System can reset back to First, as is the case with Unix. While Gancarz argues that current commercial Unix is a Second System, the original Unix created by a handful of people at Bell Labs was a Third System. It grew out of the Multics project which was the Second System solution spun from the excitement of the Compatible Time-Sharing System (CTSS), arguably the first timesharing system ever deployed. Multics suffered so much from second-system syndrome that it collapsed under its own weight.

Linux is both a Third and Second system: while it shares many properties of commercial Unix that are Second System-like, it is under active development by people who came aboard as rebels of Unix and who put every effort into eliminating the Second System cruft associated with its commercial cousin.

Is our current Educational Complex a Second System?

I see many signs of the second-system effect in our current educational system. It is designed and controlled by committee, and constructed to meet the needs of a large audience while failing to meet the individual needs of many (most?). Solutions to visible problems are also determined by committee, and patches to those solutions serve to cover up symptoms. Addressing the underlying causes would require asking some very difficult questions about the nature of the system itself, something that those invested in it are not rushing to do.

Building a Third System

What would a Linux-esque approach to education look like? What are the bits that we would like to keep? What are the ugliest pieces that should be discarded first? And how will we weave it all together into a functional, useful system?

But What Does It All Mean?

We are at that point in the semester where we are all feeling a bit overwhelmed. We’re past the introductory material and have covered a lot of new concepts and ideas, there’s the threat of a midterm on the horizon, and to make matters worse the same thing is happening in every other class as well. It is not surprising that this is also the part of the semester in which I get questions like “why are we learning this?”, “how do these assignments teach me about Unix?”, “we’ve covered a lot of commands, a lot of new ideas, done several assignments, what should I focus on?” or “how does it all fit together?”

These are good questions to ask, and a good exercise to answer, both for me, so that I can better shape the course to meet its goals, and for the participants in the course, to solidify the connections between different material and concepts. I will attempt to start that process by providing my own thoughts addressing these questions.

Learning Unix is as much (if not more) about learning a different way of thinking and problem solving with a computer as it is about learning how to use the terminal and acclimating yourself to the different GUI environments. And while we’re on the topic of user interfaces, there are several.

Rule of Diversity: Distrust all claims for “one true way”

Unlike Windows or OS X, each of which provides only a single graphical environment to its users and a single graphical toolkit to its developers, Unix systems have multiple options for both environments (e.g. Ubuntu’s Unity, GNOME Shell, KDE) and toolkits (GTK and Qt are the most common). This confusing jumble isn’t there just to annoy users; in fact, it is the result of one of Unix’s core philosophies: the user has a better idea of what he or she wants to do than the programmer writing an application. As a result, many decisions are pushed closer to the user level. This is sometimes listed as a downside to Unix/Linux, since it increases the perceived complexity of the system to the user, but luckily many distributions have been set up to select a sensible default choice for most people who would rather jump right into using their computer than configure it. Ubuntu, with its Unity interface, is a good example of this. But it’s empowering to be aware that you aren’t stuck with Unity: you can install and use any of the other graphical environments as well, such as GNOME, KDE, LXDE, and more.

Moving on, but keeping The Rule of Diversity in mind, let’s revisit the examples we looked at in class. The first was related to the homework in which we were asked to read in lines in a white-space delimited record format containing fields for “First Name”, “Last Name”, “Amount Owed”, “City”, and “Phone Number”. A short example of such a file is

Bill Bradley 25.20 Blacksburg 951-1001
Charles Cassidy 14.52 Radford 261-1002
David Dell 35.00 Blacksburg 231-1003

We were then asked to write a program that would print out information for people in ‘Blacksburg’ in the order of “Phone Number”, “Last Name”, “First Name”, “Amount Owed”. A straightforward way to solve this using Python is with the following code snippet:

for line in f:
    fields = line.split()
    if fields[3] == 'Blacksburg':
        record = [fields[4], fields[1], fields[0], fields[2]]
        print ', '.join(record)

In class we looked at an alternative solution using list comprehension syntax (here in its lazy form, a generator expression):

from itertools import imap  # Python 2; in Python 3 the built-in map is already lazy

for fields in (r for r in imap(str.split, f) if r[3] == 'Blacksburg'):
    print ', '.join(fields[i] for i in [4,1,0,2])

Both of these examples can be found on GitHub. They both do the same thing. The first takes 5 lines, the second 2. I made use of a few convenience features to make this happen. The first is the imap function (from the itertools module), the iterator version of the map function. The map function is common in many functional programming languages and implements the task of applying a function (in this case str.split) to every element in a list (in this case f, the file object, which yields one line at a time). This is an extremely common task in programming. There is no analog in the C standard library, but luckily the STL algorithms library gives us std::transform for C++, though the syntax isn’t nearly as clean as Python’s.

So the big question is “If I’ve been implementing this idiom all along, without ‘map’, why change now?” The answer is that implementing it without map will almost always take more lines of code, and more lines mean a statistically higher chance of making a mistake. In addition, the implementation would look a lot like any of the other loops that you have written in the same program, and you will find yourself pausing at it to ask yourself “What am I trying to do here?” Once you learn the concept of ‘map’, using it is much more concise. Looking at the call to “map” you know exactly what is going on without having to mentally process a “for” or “while” loop.

This idea is generalized to the concept of list comprehension, which is what we’re doing with the rest of that line (strictly speaking, the parenthesized form used there is a generator expression, the lazy variant of the same syntax). Working with a list of things is really common in programming, and one of the common things we do with lists is to generate new lists that are some filtered and transformed version of the original. List comprehension provides a cleaner syntax for transforming lists (similar to the set-builder notation that you may be familiar with from mathematics) than the traditional “for” or “while” loop would yield. And more importantly, once you get familiar with the syntax, it lets you more quickly recognize what is going on.
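To make the comparison between a hand-written loop and map concrete, here is a minimal Python 3 sketch. The snippets above are Python 2; in Python 3 the built-in map is already lazy, so imap is not needed. The names `SAMPLE`, `with_loop`, and `with_map` are made up for illustration, and a list of strings stands in for the file object f.

```python
# Python 3 sketch; SAMPLE is a stand-in for the file object f used above.
SAMPLE = [
    "Bill Bradley 25.20 Blacksburg 951-1001",
    "Charles Cassidy 14.52 Radford 261-1002",
    "David Dell 35.00 Blacksburg 231-1003",
]


def with_loop(lines):
    """Split each line in an explicit loop, then filter and reorder the fields."""
    out = []
    for line in lines:
        fields = line.split()
        if fields[3] == 'Blacksburg':
            out.append(', '.join([fields[4], fields[1], fields[0], fields[2]]))
    return out


def with_map(lines):
    """Same result: map applies str.split to every line, and the
    comprehension filters on the city field and reorders the rest."""
    return [', '.join(r[i] for i in (4, 1, 0, 2))
            for r in map(str.split, lines)
            if r[3] == 'Blacksburg']


if __name__ == "__main__":
    for row in with_map(SAMPLE):
        print(row)
```

Both functions return the same two rows for the sample data (Bradley and Dell, the two Blacksburg records). Which one reads better is partly a matter of taste, but the second states the filter and the transformation in one expression.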
For example, let’s look at two ways of computing a list of the Pythagorean triples for values 1 through 10:

triples1 = []
for x in xrange(1,11):
    for y in xrange(1,11):
        for z in xrange(1,11):
            if x**2 + y**2 == z**2:
                triples1.append((x,y,z))
print triples1

and now, using list comprehension:

triples2 = [ (x,y,z) for x in xrange(1,11)
                     for y in xrange(1,11)
                     for z in xrange(1,11)
                     if x**2 + y**2 == z**2 ]
print triples2

I’ve broken the list comprehension version across several lines so that it will all fit on the screen, but it could be left on a single line (see the full, working example) and still be just as readable. Right off the bat we can look at it and tell that `triples2` will be a list of tuples containing three values (x,y,z). We had to work our way down through four levels of nesting to figure that out in the first example. And while you may not realize it because you’re so used to doing it, our brains have a much more difficult time following what is going on in a nested loop; it implies a specific hierarchy that is misleading for this problem.

Let’s shift gears just a bit and look at some of the commands I ran at the end of class. I first wanted to count all the lines of code in all of the *.py files in my current directory:

cat *.py | wc -l

Then I wanted to revise that and filter out any blank lines:

cat *.py | sed '/^$/d' | wc -l

And let’s also filter out any lines that only contain a comment:

cat *.py | sed '/^$/d' | sed '/^#.*$/d' | wc -l

(Note: we could have combined the two `sed` commands into one; I separated them to emphasize the idea of using a pipeline to filter data.) Next I wanted to know what modules I was importing:

cat *.py | grep '^import'

Say I wanted to isolate just the names; I could use the `cut` command:

cat *.py | grep '^import' | cut -d ' ' -f 2

If you didn’t know about the `cut` command you could use sed’s `s` command to do a substitution using regular expressions. I will leave the implementation of this as an exercise for the reader.

We notice that there are a few duplicates, so let’s only print out unique names:

cat *.py | grep '^import' | cut -d ' ' -f 2 | sort | uniq

Exercise for the reader: why is the `sort` necessary?

And finally, let’s count the number of unique modules I’m using:

cat *.py | grep '^import' | cut -d ' ' -f 2 | sort | uniq | wc -l

I could have just shown you the final command and said “this prints the number of modules I’m using”, but I wanted to demonstrate the thought process to get there. We started with just a two-command pipeline, and then built up the command one piece at a time. This is a great example of another core Unix philosophy: write simple programs that do one thing and one thing well, and write them with a consistent interface so that they can easily be used together. Now I admit, counting the number of modules this way required us to start up 6 processes. Luckily, process creation on Unix systems is relatively cheap by design. This had the intended consequence of creating an operating environment in which it made sense to build up complex commands from simpler ones, and thereby encouraged the design of simple programs that do one thing and one thing well. We could write a much more efficient program to do this task in C or another compiled language, but the point is, we didn’t have to. As you get more familiar with the simple commands you’ll find that there are many tasks like this that occur too infrequently to justify writing a dedicated program, but can be pieced together quickly with a pipeline.
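For comparison, the same build-it-up-one-stage-at-a-time idea can be sketched in Python 3. This is only an illustration under stated assumptions: each function is a hypothetical stand-in for one command in the pipeline (`grep_import` for `grep '^import'`, `second_field` for `cut -d ' ' -f 2`, `sort_uniq` for `sort | uniq`), and an in-memory list named `SOURCE` plays the role of the output of `cat *.py`. None of these names come from the course material.

```python
def grep_import(lines):
    """Like grep '^import': keep only lines that start with 'import'."""
    return (line for line in lines if line.startswith('import'))


def second_field(lines):
    """Like cut -d ' ' -f 2: yield the second space-separated field.
    As with cut, a line containing no delimiter is passed through whole."""
    for line in lines:
        parts = line.rstrip('\n').split(' ')
        yield parts[1] if len(parts) > 1 else parts[0]


def sort_uniq(items):
    """Like sort | uniq: sorted output with duplicates removed."""
    return sorted(set(items))


# Stand-in for the concatenated contents of the *.py files.
SOURCE = [
    "import os",
    "import sys",
    "",
    "# a comment",
    "import os",
    "from itertools import chain",
    "print(os.getcwd())",
]

# Compose the stages the same way the shell pipeline does, left to right.
modules = sort_uniq(second_field(grep_import(SOURCE)))
print(modules)       # the names, like the pipeline ending in uniq
print(len(modules))  # the count, like appending wc -l
```

For the sample input this prints `['os', 'sys']` and then `2`; the `from itertools import chain` line is skipped because it does not start with `import`, exactly as `grep '^import'` would skip it. The first two stages are generators, so lines flow through them one at a time, much as they do between processes in a shell pipeline, while `sort_uniq` has to see all of its input before producing anything, just as `sort` does.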

So what the heck do these two different topics, list comprehension and command pipelines, have in common? And why are we using Python at all? Well, Unix’s strength is that it provides a huge wealth of excellent tools and supports a large number of programming languages. It does everything an operating system can do to allow you, the developer, to pick the best tool for the job. As we mentioned before, when we’re developing a program the “best tool” usually means the one that will allow us to solve the problem in the fewest lines possible. Python’s syntax is much cleaner than that of C or C++, and its support of convenience features like list comprehension allows us to implement, in one easy-to-understand line, algorithms that might take several loops in a less expressive language.

This has been a rather long post; I hope you’re still with me. To summarize: don’t worry too much about memorizing every single command right away. That will come naturally as you use them more often (and a refresher is always just a quick call to `man`). Instead, shift your thinking to a higher level of abstraction and always ask yourself “what tools do I have available to solve this problem?” and try to pick the “best” one, whatever “best” means in the context you are in. Unix/Linux puts you, the user, and you, the developer, in the driver’s seat: it provides you with a wealth of knobs and buttons to press, but does little to tell you which ones it thinks you *should* press. This can be intimidating, especially coming from a Windows or OS X environment, which tends to make most of the choices for the user. That’s ok, and to be expected. With practice, you will learn to appreciate your newly discovered flexibility and will start having fun!

I want to know what you think! Share your thoughts on what we’ve gone over in class, the assignments we’ve done, and the reading we’ve discussed. How do you see it all fitting together?