And Now You’re an Astronaut: Open Source Treks to The Final Frontier

There have been a couple of blog posts recently referencing the switch NASA made from Windows to Debian 6, a GNU/Linux distribution, as the OS running on the laptops aboard the International Space Station. It’s worth noting that Linux is no stranger to the ISS, as it has been a part of ground control operations since the beginning.

The reasons for the space-side switch are quoted as

…we needed an operating system that was stable and reliable — one that would give us in-house control. So if we needed to patch, adjust, or adapt, we could.

This is satisfying to many Open Source/Linux fans in its own right: a collaborative open source project has once again proved itself more stable and reliable for the (relatively) extraordinary conditions of low Earth orbit than a product produced by a major software giant. Plus one for open source collaboration and peer networks!

But there’s another reason to be excited. And it’s a reason that would not necessarily apply (mostly) to, say, Apple fanatics had NASA decided to switch to OS X instead of Debian. That reason has to do with the collaborative nature of the open source movement, codified in many of the licenses under which open source software is released. Linux, and the GNU tools, which together make up a fully functional operating system, are released under the GNU General Public License. Unlike many licenses used for commercial software, the GPL ensures that software licensed under its terms remains free for users to use, modify, and redistribute. There are certainly some strong criticisms and ongoing debate regarding key aspects of the GPL, especially version 3; the point of contention mostly lies in what is popularly called the “viral” effect of the license: modified and derived work must also be released under the same license. The GPL might not be appropriate for every developer and every project, but it codifies the spirit of open source software in a way that is agreeable to many developers and users.

So what does this all mean in terms of NASA’s move? We already know that they chose GNU/Linux for its reliability and stability over alternatives, but that doesn’t mean it’s completely bug free or will always work perfectly with every piece of hardware. No OS will be, which is, after all, another reason for the switch: at least Debian gives NASA the flexibility of making improvements themselves. And therein lies the reason for excitement. While there is no requirement that NASA redistribute their own modified versions of the software, there is no reason to assume they wouldn’t in most cases, and if they do, it will be redistributed under the same license. It’s certainly realistic to expect they will be directing a lot of attention to making the Linux kernel and the GNU tools packaged with Debian even more stable and reliable, and those improvements will make their way back into the general distributions that we all use. This means better hardware support for all GNU/Linux users in the future!

And of course it works both ways. Any bug fixes you make and redistribute may make their way back to the ISS, transforming humanity’s thirst for exploring “the final frontier” into a truly collaborative and global endeavor.

How will we build a Third System of education?

I have recently been reading about, as Mike Gancarz puts it in Linux and the Unix Philosophy, “The Three Systems of Man”. This is, to my understanding, a fairly well-documented and often-observed concept in software design, possibly first referenced by Frederick Brooks in The Mythical Man-Month when he coined “the second-system effect”. Gancarz seems to take the concept further, generalizing it to any system built by humans.

Man has the capacity to build only three systems. No matter how hard he may try, no matter how many hours, months, or years for which he may struggle, he eventually realizes that he is incapable of anything more. He simply cannot build a fourth. To believe otherwise is self-delusion.

The First System

Fueled by need, constricted by deadlines, a First System is born out of a creative spark. It’s quick, often dirty, but gets the job done well. Importantly, it inspires others with the possibilities it opens up. The “what if”s elicited by a First System lead to…

The Second System

Encouraged and inspired by the success of the First System, more people want to get on board, offer their own contributions, and add features they deem necessary. Committees are formed to organize and delegate. Everyone offers their expertise, and everyone believes they have expertise, even when they don’t. The Second System has a marketing team devoted to selling its many features to eagerly awaiting customers, and to appeal to the widest possible customer base nearly any feature that is thought up gets added. In reality, most users end up using only a small fraction of the available features of The Second System; the rest just get in the way. Despite enjoying commercial success, The Second System is usually the worst of the three. By trying to appease everyone (and, more often than not, by not truly understanding the needs of anyone), the committees in charge have created a mediocre experience. The unnecessary features add so much complexity that bugs are many and fixes take a considerable amount of effort. After some time, some users (and developers) start to recognize The Second System for what it is: bloatware.

The Third System

The Third System is built by people who have been burned by the Second System

Eventually enough people grow frustrated by the inefficiencies and bloat of The Second System that they rebel against it. They set out to create a new system that contains the essential features and lessons learned from the First and Second Systems, but leaves out the crud that accumulated in the Second System. The construction of a Third System comes about either as a result of observed need, or as an act of rebellion against the Second System. Third Systems challenge the status quo set by Second Systems, and as such there is a natural tendency for those invested in The Second System to criticize, distrust, and fear The Third System and those who advocate for it.

The Interesting History of Unix

Progression from First to Second to Third system always happens in that order, but sometimes a Third System can reset back to First, as is the case with Unix. While Gancarz argues that current commercial Unix is a Second System, the original Unix created by a handful of people at Bell Labs was a Third System. It grew out of the Multics project which was the Second System solution spun from the excitement of the Compatible Time-Sharing System (CTSS), arguably the first timesharing system ever deployed. Multics suffered so much from second-system syndrome that it collapsed under its own weight.

Linux is both a Third and a Second System: while it shares many properties of commercial Unix that are Second System-like, it is under active development by people who came aboard as rebels against commercial Unix and who put every effort into eliminating the Second System cruft associated with its commercial cousin.

Is our current Educational Complex a Second System?

I see many signs of the second-system effect in our current educational system. It is designed and controlled by committee, constructed to meet the needs of a large audience while failing to meet the individual needs of many (most?). Solutions to visible problems are also determined by committee, and patches to those solutions serve to cover up symptoms. Addressing the underlying causes would require asking some very difficult questions about the nature of the system itself, something that those invested in it are not rushing to do.

Building a Third System

What would a Linux-esque approach to education look like? What are the bits that we would like to keep? What are the ugliest pieces that should be discarded first? And how will we weave it all together into a functional, useful system?

Blasphemy?: DRMed Games on Linux

The interwebz have been all atwitter the past month or so with Valve’s announcement of the porting of their Steam gaming service to the GNU/Linux platform. Many Linux users were thrilled about the announcement and saw it as a sign that Linux was breaking out of the small niche culture of hackers to more mainstream folk who just want to turn their computer on and play a game. To be fair, Linux is not without a large number of free (both as in beer and as in speech) games already, but the announcement of a major (are they? I actually only heard of Valve and Steam because of the Linux announcement) gaming company moving to the platform was seen by some as legitimizing the OS to the masses. It certainly gives everyone something to talk about.

I consider myself more of a pragmatist when it comes to the philosophical debate surrounding free software (for those familiar, the debate mostly deals with libre software; English has many deficiencies, one of which is the multiple meanings of the word “free”. In general, free software supporters do support the idea of paying for software and believe that people should be able to make money off of the software they write). I think free software is a great ideal to strive for, and certainly for mission-critical software I believe it is important to have the freedom to view and modify the source code. As an example I brought up in vtcli earlier this semester, it is important to have the freedom to confirm that the incognito mode of your web browser really is doing what it says it is and not storing or sharing your browsing information. (As an aside to that, I erroneously claimed that Chrome was open source; it is not. It does, however, theoretically share its code-base with Chromium, which is open source and happens to be the browser I use both in Linux and OS X. I highly encourage any users of Chrome to switch to Chromium for the open-sourced goodness it provides, including the ability to confirm that incognito mode really is incognito.) That being said, if there’s a great game I like, I am not terribly concerned with not being able to look at or distribute the source code, though I certainly would encourage game developers to release their code under one of the many open source licenses.

It is interesting to note that free software evangelist Richard Stallman himself isn’t ALL doom and gloom about the news. Though he certainly isn’t thrilled, and encourages people to try out any of the free games that are available, he does see the move as a possible motivator for some people to ditch their non-free OSes completely if gaming had been the only thing holding them back.

However, if you’re going to use these games, you’re better off using them on GNU/Linux rather than on Microsoft Windows. At least you avoid the harm to your freedom that Windows would do. – Richard Stallman

I installed Steam on my Arch Linux install last week and so far have tried out Bastion, Splice, and World of Goo. All work very well and have been fun (I had played World of Goo before, both on OS X and Android; it is fun on any platform!). Officially, Arch Linux isn’t supported, but after adding a couple of the libraries and font packages mentioned on the wiki everything worked like a charm. One downside that Stallman failed to mention in his response is that it is much easier for me to spend money on games now that I don’t need to switch over to OS X to run them.

The Tides of Change?

As Linus Torvalds has mentioned in several video interviews, probably the main reason Linux has been lagging behind in the desktop market is that it doesn’t come pre-installed on desktop hardware, and the average computer user just isn’t going to put forth the effort to install and configure a different operating system* than the one that came with their new machine. Recently Dell caused a bit of excitement with the release of their Ubuntu-based “XPS 13 Developer Edition” laptop. To be fair, this is not the first machine that Dell has offered with Linux pre-installed, but it does seem to be the first that they’ve tried pushing to the mainstream (or in this case, developer) community (in the past you really had to make an effort to find the Ubuntu option on their ordering form). Dell is also not the only desktop distributor to offer systems with Linux pre-loaded (indeed, many of the others exclusively offer Linux machines), but it is probably the brand with the most name recognition to the general audience. Could this be the beginning of the end of the Microsoft monopoly on the desktop OS market? I am optimistic!

*Be wary of the blog posts and forum comments that recount stories of installing Linux and being frustrated with the difficulty of getting all the necessary drivers for their hardware and using that as an argument that the OS wasn’t “ready” for prime time.  If you have ever installed Windows on a fresh new machine you will be well aware that it can be just as frustrating.  Windows doesn’t “just work” on the machines you buy because it is a superior OS (it isn’t), it works because the system distributors like Dell take the time to make sure that the necessary drivers for the particular hardware in the machine are all included.

But What Does It All Mean?

We are at that point in the semester where we are all feeling a bit overwhelmed. We’re past the introductory material, we’ve covered a lot of new concepts and ideas, there’s the threat of a midterm on the horizon, and to make matters worse the same thing is happening in every other class as well. It is not surprising that this is also the part of the semester in which I get questions like “why are we learning this?”, “how do these assignments teach me about Unix?”, “we’ve covered a lot of commands, a lot of new ideas, and done several assignments, so what should I focus on?”, or “how does it all fit together?” These are good questions to ask, and a good exercise to answer, both for me, so that I can better shape the course to meet its goals, and for the participants in the course, to solidify the connections between different material and concepts. I will attempt to start that process by providing my own thoughts addressing these questions: learning Unix is as much (if not more) about learning a different way of thinking and problem solving with a computer as it is about learning how to use the terminal and acclimating yourself to the different GUI environments. And while we’re on the topic of user interfaces, there are several.

Rule of Diversity: Distrust all claims for “one true way”

Unlike Windows or OS X, which provide only a single graphical environment to their users and a single graphical toolkit to their developers, Unix systems have multiple options for both environments (e.g. Ubuntu’s Unity, GNOME Shell, KDE) and toolkits (GTK and Qt are the most common). This confusing jumble isn’t just there to make things needlessly annoying for users; in fact, it is the result of one of Unix’s core philosophies: the user has a better idea of what he/she wants to do than the programmer writing an application. As a result, many decisions are pushed closer to the user level. This is sometimes listed as a downside to Unix/Linux, since it increases the perceived complexity of the system, but luckily many distributions select a sensible default choice for people who would rather jump right into using their computer than spend time configuring it. Ubuntu, with its Unity interface, is a good example of this. But it’s empowering to be aware that you aren’t stuck with Unity; you can install and use any of the other graphical environments as well, such as GNOME, KDE, LXDE, and more.

Moving on, but keeping The Rule of Diversity in mind, let’s revisit the examples we looked at in class. The first was related to the homework in which we were asked to read in lines in a whitespace-delimited record format containing fields for “First Name”, “Last Name”, “Amount Owed”, “City”, and “Phone Number”. A short example of such a file is

Bill Bradley 25.20 Blacksburg 951-1001
Charles Cassidy 14.52 Radford 261-1002
David Dell 35.00 Blacksburg 231-1003

We were then asked to write a program that would print out information for people in ‘Blacksburg’ in the order “Phone Number”, “Last Name”, “First Name”, “Amount Owed”. A straightforward way to solve this using Python is with the following code snippet:

# assuming f is a file object opened over the records shown above,
# e.g. f = open('records.txt')
for line in f:
    fields = line.split()
    if fields[3] == 'Blacksburg':
        record = [fields[4], fields[1], fields[0], fields[2]]
        print ', '.join(record)
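
Run against the three sample records above, the snippet prints:

951-1001, Bradley, Bill, 25.20
231-1003, Dell, David, 35.00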

In class we looked at an alternative solution using list comprehension:

from itertools import imap  # Python 2's lazy (iterator) version of map

for fields in (r for r in imap(str.split, f) if r[3] == 'Blacksburg'):
    print ', '.join(fields[i] for i in [4, 1, 0, 2])

Both of these examples can be found on github. They both do the same thing. The first takes 5 lines, the second 2. I made use of a few convenience features to make this happen, the first being the imap function, the iterator version of the map function. The map function is common in many functional programming languages and implements a common task: applying a function (in this case str.split) to every element in a list (in this case f, the file object). This is an extremely common task in programming, but there is no direct analog in C. Luckily the STL algorithms library gives us std::transform for C++, though the syntax isn’t nearly as clean as Python’s. So the big question is “If I’ve been implementing this idiom all along, without ‘map’, why change now?” The answer is that implementing it without map is guaranteed to use more lines of code, with which we know there is a statistically higher chance of making a mistake. In addition, the implementation would look a lot like any of the other loops that you have written in the same program, and you will find yourself pausing at it to ask “What am I trying to do here?”. Once you learn the concept of ‘map’, using it is much more concise. Looking at the call to “map” you know exactly what is going on without having to mentally process a “for” or “while” loop.

This idea is generalized in the concept of list comprehension, which is what we’re doing with the rest of that line. Working with a list of things is really common in programming, and one of the common things we do with lists is to generate new lists that are some filtered and transformed version of the original. List comprehension provides a cleaner syntax (similar to the set comprehension you may be familiar with from mathematics) for transforming lists than the traditional “for” or “while” loop would yield. And more importantly, once you get familiar with the syntax, it lets you more quickly recognize what is going on. For example, let’s look at two ways of computing a list of the Pythagorean triples for values 1 through 10:

triples1 = []
for x in xrange(1,11):
    for y in xrange(1,11):
        for z in xrange(1,11):
            if x**2 + y**2 == z**2:
                triples1.append((x,y,z))
print triples1

and now, using list comprehension:

triples2 = [ (x,y,z) for x in xrange(1,11)
                     for y in xrange(1,11)
                     for z in xrange(1,11)
                     if x**2 + y**2 == z**2 ]
print triples2
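
For reference, both versions print the same list of triples over the range 1 through 10:

[(3, 4, 5), (4, 3, 5), (6, 8, 10), (8, 6, 10)]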

I’ve broken the second example across several lines so that it will all fit on the screen, but it could be left on a single line (see the full, working example) and still be just as readable. Right off the bat we can look at the second version and tell that `triples2` will be a list of tuples containing three values (x, y, z). We had to work our way down to five levels of nested blocks to figure that out in the first example. And while you may not realize it because you’re so used to doing it, our brains have a much more difficult time following what is going on in a nested loop; it implies a specific hierarchy that is misleading for this problem.

Let’s shift gears just a bit and look at some of the commands I ran at the end of class. I first wanted to count all the lines of code in all of the *.py files in my current directory:

cat *.py | wc -l

Then I wanted to revise that and filter out any blank lines:

cat *.py | sed '/^$/d' | wc -l

And let’s also filter out any lines that contain only a comment:

cat *.py | sed '/^$/d' | sed '/^#.*$/d' | wc -l

(Note: we could have combined the two `sed` commands into one; I separated them to emphasize the idea of using a pipeline to filter data.) Next I wanted to know what modules I was importing:

cat *.py | grep '^import'

Say I wanted to isolate just the module names; I could use the `cut` command:

cat *.py | grep '^import' | cut -d ' ' -f 2

If you didn’t know about the `cut` command you could use sed’s `s` command to do a substitution using regular expressions. I will leave the implementation of this as an exercise for the reader.
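
For those who want to check their work (skip this if you’d rather solve it yourself), one possible sketch using sed’s `s` command is:

cat *.py | grep '^import' | sed 's/^import *//'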

We notice that there are a few duplicates; let’s only print out unique names:

cat *.py | grep '^import' | cut -d ' ' -f 2 | sort | uniq

Exercise for the reader: why is the `sort` necessary?

And finally, let’s count the number of unique modules I’m using:

cat *.py | grep '^import' | cut -d ' ' -f 2 | sort | uniq | wc -l

I could have just shown you the final command and said “this prints the number of modules I’m using”, but I wanted to demonstrate the thought process of getting there. We started with just a two-command pipeline, and then built up the command one piece at a time. This is a great example of another core Unix philosophy: write simple programs that do one thing and one thing well, and write them with a consistent interface so that they can easily be used together.

Now I admit, counting the number of modules used this way required us to start up 6 processes. Luckily, process creation on Unix systems is relatively cheap by design. This had the intended consequence of creating an operating environment in which it made sense to build up complex commands from simpler ones, and thereby encouraged the design of simple programs that do one thing and one thing well. We could write a much more efficient program to do this task in C or another compiled language, but the point is, we didn’t have to. As you get more familiar with the simple commands, you’ll find that there are many tasks like this that occur too infrequently to justify writing a dedicated program, but can be pieced together quickly with a pipeline.
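
For comparison, here is a minimal sketch of what a dedicated Python version of that last pipeline might look like, assuming the *.py files live in the current directory (the handling of import lines is deliberately simplistic, mirroring the grep/cut approach):

import glob

# collect the names of modules pulled in by lines starting with 'import'
modules = set()
for path in glob.glob('*.py'):
    for line in open(path):
        if line.startswith('import'):
            parts = line.split()
            if len(parts) > 1:
                modules.add(parts[1])
print len(modules)

It works, and it is still short, but notice that the pipeline version was quicker to grow interactively, one filter at a time.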

So what the heck do these two different topics, list comprehension and command pipelines, have in common? And why are we using Python at all? Well, Unix’s strength is that it provides a huge wealth of excellent tools and supports a large number of programming languages. It does everything an operating system can do to allow you, the developer, to pick the best tool for the job. As we mentioned before, when we’re developing a program the “best tool” usually means the one that will allow us to solve the problem in the fewest lines possible. Python’s syntax is much cleaner than that of C or C++, and its support for convenience features like list comprehension allows us to implement algorithms in one easy-to-understand line that might normally take several loops in a less expressive language.

This has been a rather long post; I hope you’re still with me. To summarize: don’t worry too much about memorizing every single command right away, that will come naturally as you use them more often (and a refresher is always just a quick call to `man`). Instead, shift your thinking to a higher level of abstraction and always ask yourself “what tools do I have available to solve this problem?”, then try to pick the “best” one, whatever “best” means in the context you are in. Unix/Linux puts you, the user, and you, the developer, in the driver’s seat; it provides you with a wealth of knobs and buttons to press, but does little to tell you which ones it thinks you *should* press. This can be intimidating, especially coming from a Windows or OS X environment, which tends to make most of the choices for the user. That’s ok, and to be expected. With practice, you will learn to appreciate your newly discovered flexibility and will start having fun!

I want to know what you think! Share your thoughts on what we’ve gone over  in class, the assignments we’ve done, and the reading we’ve discussed.  How do you see it all fitting together?