We are at that point in the semester where we are all feeling a bit overwhelmed. We’re past the introductory material, we’ve covered a lot of new concepts and ideas, there’s the threat of a midterm on the horizon, and to make matters worse the same thing is happening in every other class as well. It is not surprising that this is also the part of the semester in which I get questions like “why are we learning this?”, “how do these assignments teach me about Unix?”, “we’ve covered a lot of commands, a lot of new ideas, done several assignments, what should I focus on?”, or “how does it all fit together?”
These are good questions to ask, and a good exercise to answer, both for me, so that I can better shape the course to meet its goals, and for the participants in the course, to solidify the connections between different material and concepts. I will attempt to start that process by providing my own thoughts addressing these questions.
Learning Unix is as much (if not more) about learning a different way of thinking and problem solving with a computer as it is about learning how to use the terminal and acclimating yourself to the different GUI environments. And while we’re on the topic of user interfaces, there are several.
Unlike Windows or OS X, which each provide only a single graphical environment to their users and a single graphical toolkit to their developers, Unix systems have multiple options for both environments (e.g. Ubuntu’s Unity, GNOME Shell, KDE) and toolkits (GTK and Qt are the most common). This confusing jumble isn’t there just to make things needlessly annoying for users. In fact, it is the result of one of Unix’s core philosophies: the user has a better idea of what he/she wants to do than the programmer writing an application. As a result, many decisions are pushed closer to the user level. This is sometimes listed as a downside to Unix/Linux, since it increases the perceived complexity of the system to the user, but luckily many distributions have been set up to select a sensible default choice for most people who would rather jump right into using their computer than configure it. Ubuntu, with its Unity interface, is a good example of this. But it’s empowering to be aware that you aren’t stuck with Unity: you can install and use any of the other graphical environments as well, such as GNOME, KDE, LXDE, and more.
Moving on, but keeping The Rule of Diversity in mind, let’s revisit the examples we looked at in class. The first was related to the homework in which we were asked to read in lines in a white-space delimited record format containing fields for “First Name”, “Last Name”, “Amount Owed”, and “Phone Number” (the sample records also carry a city just before the phone number). A short example of such a file is:
Bill Bradley 25.20 Blacksburg 951-1001
Charles Cassidy 14.52 Radford 261-1002
David Dell 35.00 Blacksburg 231-1003
We were then asked to write a program that would print out information for people in ‘Blacksburg’ in the order of “Phone Number”, “Last Name”, “First Name”, “Amount Owed”. A straightforward way to solve this using Python is with the following code snippet:
for line in f:
    fields = line.split()
    if fields[3] == 'Blacksburg':
        record = [fields[4], fields[1], fields[0], fields[2]]
        print ', '.join(record)
In class we looked at an alternative solution using list comprehension:
for fields in (r for r in imap(str.split, f) if r[3] == 'Blacksburg'):
    print ', '.join(fields[i] for i in [4,1,0,2])
Both of these examples can be found on github. They both do the same thing. The first takes 5 lines, the second 2. I made use of a few convenience features to make this happen. The first is the imap function (from the itertools module), the iterator version of the map function. The map function is common in many functional programming languages and implements the common task of applying a function (in this case str.split) to every element in a list (in this case f, the file object). This is an extremely common task in programming, yet there is no analog in C. Luckily the STL Algorithms library gives us std::transform for C++, though the syntax isn’t nearly as clean as Python’s.
So the big question is “If I’ve been implementing this idiom all along, without ‘map’, why change now?” The answer is that implementing it without map is guaranteed to use more lines of code, and we know that more lines of code mean a statistically higher chance of making a mistake. In addition, the implementation would look a lot like any of the other loops that you have written in the same program, and you will find yourself pausing at it to ask yourself “What am I trying to do here?” Once you learn the concept of ‘map’, using it is much more concise. Looking at the call to “map” you know exactly what is going on without having to mentally process a “for” or “while” loop.
This idea is generalized to the concept of list comprehension, which is what we’re doing with the rest of that line. Working with a list of things is really common in programming, and one of the common things we do with lists is to generate new lists that are some filtered and transformed version of the original. List comprehension provides a cleaner syntax for transforming lists (similar to the set comprehension that you may be familiar with in mathematics) than the traditional “for” or “while” loop would yield. And more importantly, once you get familiar with the syntax, it lets you more quickly recognize what is going on.
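To make the comparison between an explicit loop, `map`, and a comprehension concrete, here is a minimal, self-contained sketch. It is not the code from class: it assumes Python 3 (where the built-in `map` is already lazy, like Python 2’s `imap`), and the list `lines` is a stand-in for the open file `f`, filled with the sample records from above.

```python
# Assumption: Python 3, where map() returns a lazy iterator (like Python 2's imap).
# `lines` is a stand-in for the file object f, holding the sample records.
lines = [
    "Bill Bradley 25.20 Blacksburg 951-1001",
    "Charles Cassidy 14.52 Radford 261-1002",
    "David Dell 35.00 Blacksburg 231-1003",
]

# Explicit loop: build the list of split records by hand.
split_with_loop = []
for line in lines:
    split_with_loop.append(line.split())

# map: "apply str.split to every element" stated as a single expression.
split_with_map = list(map(str.split, lines))

# List comprehension: filter (the city is field 3) and transform
# (reorder to phone, last name, first name, amount owed) in one step.
blacksburg = [
    ", ".join(r[i] for i in [4, 1, 0, 2])
    for r in map(str.split, lines)
    if r[3] == "Blacksburg"
]

for entry in blacksburg:
    print(entry)
```

The loop and the `map` call build the same list of fields; the comprehension then does the filtering and reordering without a separate accumulator variable.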
For example, let’s look at two ways of computing a list of the Pythagorean triples for values 1 through 10
triples1 = []
for x in xrange(1,11):
    for y in xrange(1,11):
        for z in xrange(1,11):
            if x**2 + y**2 == z**2:
                triples1.append((x,y,z))
and now, using list comprehension:
triples2 = [ (x,y,z) for x in xrange(1,11)
                     for y in xrange(1,11)
                     for z in xrange(1,11)
                     if x**2 + y**2 == z**2 ]
I’ve broken the second example across several lines so that it will all fit on the screen, but it could be left on a single line (see the full, working example) and still be just as readable. Right off the bat we can look at the second version and tell that `triples2` will be a list of tuples containing three values (x,y,z). We had to work our way down through five levels of nested blocks to figure that out in the first example. And while you may not realize it because you’re so used to doing it, our brains have a much more difficult time following what is going on in a nested loop, because it implies a specific hierarchy that is misleading for this problem.
Let’s shift gears just a bit and look at some of the commands I ran at the end of class. I first wanted to count all the lines of code in all of the *.py files in my current directory:
cat *.py | wc -l
Then I wanted to revise that and filter out any blank lines:
cat *.py | sed '/^$/d' | wc -l
And let’s also filter out any lines that only contain a comment:
cat *.py | sed '/^$/d' | sed '/^#.*$/d' | wc -l
(Note: we could have combined the two `sed` commands into one; I separated them to emphasize the idea of using a pipeline to filter data.) Next I wanted to know what modules I was importing:
cat *.py | grep '^import'
Say I wanted to isolate just the names. I could use the `cut` command:
cat *.py | grep '^import' | cut -d ' ' -f 2
If you didn’t know about the `cut` command you could use sed’s `s` command to do a substitution using regular expressions. I will leave the implementation of this as an exercise for the reader.
We notice that there are a few duplicates, so let’s only print out unique names:
cat *.py | grep '^import' | cut -d ' ' -f 2 | sort | uniq
Exercise for the reader: why is the `sort` necessary?
And finally, let’s count the number of unique modules I’m using:
cat *.py | grep '^import' | cut -d ' ' -f 2 | sort | uniq | wc -l
I could have just shown you the final command and said “this prints the number of modules I’m using”, but I wanted to demonstrate the thought process to get there. We started with just a two-command pipeline, and then built up the command one piece at a time. This is a great example of another core Unix philosophy: write simple programs that do one thing and one thing well, and write them with a consistent interface so that they can easily be used together.
Now I admit, counting the number of modules this way required us to start up 6 processes. Luckily process creation on Unix systems is relatively cheap by design. This had the intended consequence of creating an operating environment in which it made sense to build up complex commands from simpler ones, and thereby encouraged the design of simple programs that do one thing and one thing well. We could write a much more efficient program to do this task in C or another compiled language, but the point is, we didn’t have to. As you get more familiar with the simple commands you’ll find that there are many tasks like this that occur too infrequently to justify writing a dedicated program, but can be pieced together quickly with a pipeline.
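The same “small pieces, consistent interface” idea can be sketched inside Python by chaining generators, where each stage consumes lines and yields lines. This is only an illustration: it assumes Python 3, the helper names `grep`, `cut`, and `count_unique` are hypothetical functions written for this example (not library calls), and `source_lines` is made-up input standing in for the output of `cat *.py`.

```python
import re

def grep(pattern, lines):
    """Yield only the lines matching the regular expression (rough analog of grep)."""
    regex = re.compile(pattern)
    return (line for line in lines if regex.search(line))

def cut(lines, delimiter=" ", field=2):
    """Yield one delimiter-separated field per line, 1-indexed.

    Simplification: a line with too few fields yields an empty string,
    which is not exactly how the real cut treats every edge case.
    """
    for line in lines:
        parts = line.split(delimiter)
        yield parts[field - 1] if len(parts) >= field else ""

def count_unique(items):
    """Count distinct items (the job done by sort | uniq | wc -l)."""
    return len(set(items))

# Hypothetical sample input standing in for the output of `cat *.py`.
source_lines = [
    "import sys",
    "import os",
    "",
    "# a comment",
    "import sys",
    "print('hello')",
]

# Each stage feeds the next, just like the stages of the shell pipeline.
module_count = count_unique(cut(grep(r"^import", source_lines)))
print(module_count)
```

Each function knows nothing about the others; they compose only because they all agree to pass along an iterable of strings, which is the role plain text lines play between Unix commands.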
So what the heck do these two different topics, list comprehension and command pipelines, have in common? And why are we using Python at all? Well, Unix’s strength is that it provides a huge wealth of excellent tools and supports a large number of programming languages. It does everything an operating system can do to allow you, the developer, to pick the best tool for the job. As we mentioned before, when we’re developing a program the “best tool” usually means the one that will allow us to solve the problem in the fewest lines possible. Python’s syntax is much cleaner than that of C or C++, and its support of convenience features like list comprehension allows us to implement algorithms that might normally take several loops in a less expressive language in one easy-to-understand line.
This has been a rather long post; I hope you’re still with me. To summarize, don’t worry too much about memorizing every single command right away. That will come naturally as you use them more often (and a refresher is always just a quick call to `man`). Instead, shift your thinking to a higher level of abstraction and always ask yourself “what tools do I have available to solve this problem?”, then try to pick the “best” one, whatever “best” means in the context you are in. Unix/Linux puts you, the user, and you, the developer, in the driver’s seat. It provides you with a wealth of knobs and buttons to press, but does little to tell you which ones it thinks you *should* press. This can be intimidating, especially coming from a Windows or OS X environment, which tends to make most of the choices for the user. That’s ok, and to be expected. With practice, you will learn to appreciate your newly discovered flexibility and will start having fun!
I want to know what you think! Share your thoughts on what we’ve gone over in class, the assignments we’ve done, and the reading we’ve discussed. How do you see it all fitting together?