Structure, Language and Art

In a recent post tylera5 commented that the last time he wrote poetry was in high school, and wasn’t expecting to have to write a poem for a programming course. I got the idea for a poetry assignment from a friend of mine who teaches a biological science course. She found that the challenge of condensing a technical topic into a 17 syllable Haiku really forces one to think critically about the subject and filter through all the information to shake out the key concept. And poems about tech topics are just fun to read!

I think the benefit is even greater for a programming course. As tylera5 mentioned, both poems had a structure, and he had to think a bit about how to put his thoughts into the structure dictated by the poetry form, whether it be the 5/7/5 syllable structure of a Haiku, or the AABBA rhyming scheme of a limerick.

Poetry is the expression of ideas and thoughts through structured language (and the structure can play a larger or lesser role depending on the poet and the type of poetry). Programming is also the expression of ideas and thoughts through structured language. The domain of ideas is often more restricted (though not necessarily; this article and book could be the subject of a full post in its own right) and adherence to structure is more strict, but there is an art to both forms of expression.

Are there artistic and expressive tools in other STEM topics as well?

Response to “Class Material – reposted”

In his post titled Class Material – reposted, zickbe asked a very good question about the content of ECE2524.  This is a question that comes up at least once every semester; to paraphrase: “Since there are modern GUI tools for Linux now, why are we learning all these old command line tools?”  The example given in the post was a simple task of replacing all periods ‘.’ with commas ‘,’ in some text input.  Indeed, many graphical editors do have search and replace functionality that makes this particular task quite easy.  So what’s the point of learning to do it from the command line?

There are two answers to this question, each from a different perspective.

You as the User

The first is probably the perspective you are all thinking about right now: you as a user of a general-purpose operating system, editing files, writing code, surfing the web, etc.  As we have seen already, Unix has a strong tradition as a platform for text manipulation (remember, its first use was as an OS to run a word processing system for the AT&T Bell Labs patent department).  When we store our data in plain text we have a large collection of powerful tools to manipulate and process that data.

Of course, when learning new concepts we start with simple examples.  One of the simplest ways we can manipulate text is with a literal substitution, for example “replace all occurrences of the word ‘cat’ with ‘dog’ “, or “replace all occurrences of ‘.’ with ‘,’ “.  Literal substitutions are used often enough that many graphical tools have implemented the feature into the interface.  Let’s say we have a file myfile.txt and we want to change all occurrences of ‘dog’ to ‘cat’, we could either use the terminal:

sed -i 's/dog/cat/g' myfile.txt

Or we could open myfile.txt in our favorite text editor, choose the menu option for “search and replace”, enter “dog” and “cat” in the appropriate field, click “ok” and we’re done.  For this simple case it seems like it’s hardly worth the brain-space to remember how to use sed.  Let’s kick it up a notch though.  When writing software applications we often have many files associated with one project.  What if we wanted to replace ‘dog’ with ‘cat’ across several files?  Using the GUI we would open each file in succession, click the menu that contained “search and replace” fill in our search and replace words, hit ‘ok’ and then repeat for the remaining files.  This is probably doable for a few files.  What about 100?  1000?

find project/ -name '*.txt' -exec sed -i 's/dog/cat/g' '{}' +

or, equivalently, piping the file list to xargs:

find project/ -name '*.txt' | xargs sed -i 's/dog/cat/g'

The nice thing about this is that the amount of effort we put in is the same no matter how many files we want to process, whether it be 3, 100, 1000 or more.  Try doing 100 text substitutions in a GUI and you’re asking for a repetitive stress injury!

“Ok”, you’re saying to yourself, “but how often am I working with hundreds of files at once? I usually just have one or two files I want to modify, it’s not too bad to navigate the GUI menu a few times to do text substitution.”  Let’s think of some more examples of text manipulation you might want to do. In my previous post I described the process I went through to compile a list of links to last semester’s projects. At one point I wanted to prepend each line with a ‘-’ character to generate a list in Markdown syntax.  I could have just manually added the character to each line, there were only 19, after all, but instead I used a sed command:

sed -r 's/^(.*)$/- \1/g'

It didn’t really save me many keystrokes in this case, but it easily fit into the automated workflow I had set up to convert the list of urls to a nice HTML format suitable for posting on the blog. It’s also a task that would have become quite tedious to do by hand if there were more than the 20 or so items that I had. And if I wanted to do something a bit more complex, like “prepend only the lines containing a url with ‘-’ but leave all others unchanged”:

sed -r 's|^(.*https?://.*)|- \1|g'

Now I can selectively convert lines to a Markdown style list. This is much quicker for even medium sized files than scanning each line by eye to find urls, and then adding a ‘-‘. Can your GUI do that? And of course, if I had a few, or a few hundred files that I wanted to process like this, I could use the same `find … -exec` or `find … | xargs … ` idiom I used above.

Another quick example: you are probably familiar with the two main styles of naming functions with multiple words, CamelCase and underscore_case:

def myHelloFunction():

def my_hello_function():

Which style you use is largely a matter of preference, although sometimes when working on collaborative projects the project will define a particular style that you must adhere to. Let’s say you’ve been using one style for a few projects and then decide you want to switch (or you get a bunch of code from a friend who was using a different style, or… )

sed -r 's/([A-Z])/_\l\1/g; s/^_//'

will convert CamelCase to camel_case. Doing the same automatic formatting in a GUI of your choice is left as an exercise for the reader.  A quick Google search will turn up a sed command to do the reverse transformation.
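If sed isn’t handy, the same idea translates directly to other languages. Here’s a rough sketch in Python’s `re` module (the function names are my own, just for illustration, not any standard library API):

```python
import re

def camel_to_snake(name):
    # Insert '_' before each capital letter, then lowercase it;
    # strip any leading '_' left by an initial capital.
    return re.sub(r'([A-Z])', lambda m: '_' + m.group(1).lower(), name).lstrip('_')

def snake_to_camel(name):
    # The reverse transformation: replace '_x' with uppercase 'X'.
    return re.sub(r'_([a-z])', lambda m: m.group(1).upper(), name)

print(camel_to_snake('myHelloFunction'))   # my_hello_function
print(snake_to_camel('my_hello_function')) # myHelloFunction
```

The regular expressions are the same ones the sed version uses; only the surrounding plumbing changes.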

The take-away from all of this is that while the examples we use in class may be simple enough that a GUI editor happens to have implemented similar functionality, the tools themselves are much more powerful. GUIs are great in that they make it really easy to do the things that the GUI designers planned for. However, they make it difficult or impossible to do things that the designers didn’t plan for.  In the case where you want to perform a text manipulation on a large number of files, or a complex manipulation on one or more files, the command line tools provide a solution where the graphical tools do not.

You as the Developer

But you’re not just any user, are you? You are getting a degree in Computer Systems Engineering, and even if you plan to focus on hardware it is a guarantee that you will be writing software at some point (probably many points).  You may even write some software that needs to do text manipulation.  Perhaps a preprocessor for a compiler, or even your own text editor.  What if you want to build in some functionality to allow the end-user to do some text manipulation?  Maybe a simple text substitution, or perhaps you’re writing an IDE and want to provide a menu option to automatically convert CamelCase to camel_case across a set of project files.  How would you implement this?  For these examples it probably makes sense to use the regular expression library of whichever language you are programming in, but even in that case, the expressions themselves will be the same as in the sed example.  In some cases you may actually want to spawn a child process running one of the sed commands from above directly (maybe you want to run a complex text manipulation on a large number of files that a user selects with a GUI and let the manipulation run in the background while the GUI is free to take additional requests from the user).
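As a minimal sketch of that last idea, here is how a program might hand a substitution off to sed as a child process (this assumes a Unix system with sed on the PATH; a real application would also check the return code and probably run the process asynchronously):

```python
import subprocess

# Run the same 's/dog/cat/g' substitution as a child process,
# feeding it input on stdin and capturing its stdout.
result = subprocess.run(
    ['sed', 's/dog/cat/g'],
    input='my dog is a good dog\n',
    capture_output=True,
    text=True,
)
print(result.stdout, end='')  # my cat is a good cat
```

The GUI front-end and the text-manipulation back-end stay cleanly separated, which is very much in the Unix spirit.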


As you are working with the command line and working through the examples for this class remember to keep in mind the flexibility of the commands you are learning.  In many cases the examples will be so simple that the same functionality has been implemented in any of the popular graphical tools, but the command line version provides much more control and flexibility, as I hope these few examples have demonstrated.  Can you think of any other examples that could be done using command-line text manipulation tools but would be impossible in a general purpose graphical environment?

As I mentioned before, this question comes up every semester.  How could the material in the class be modified to make the power of the tools we learn more apparent?  Should more complex examples be included at the possible expense of clarity? More examples?  Was the explanation I gave here convincing?  If not, please explain why in the comments and I’ll do my best to revise!

What Makes Good Software Good?

The first day of class (ECE2524: Introduction to Unix for Engineers) I asked participants the open-ended question “What makes good software good?” and asked them to answer both “for the developer” and “for the consumer”.

I generated a list of words and phrases for each sub-response and then normalized it based on my own intuition (e.g. I changed “simplicity” to “simple”, “easy to use” to “intuitive”, etc.). I then dumped the list into Wordle to generate these images:

Good Software for the Consumer

Good Software for the Developer

For a future in-class exercise I plan to ask participants to link the common themes that appear in these word clouds back to specific rules mentioned in the reading.

A Matter of Standards

Last night on my bike ride home from PFP class I mentally prepared a “todo list” of things to get done in the couple of hours I’d have  before getting too tired to be productive.  In a classic example of the story of my life, all that mental preparation went out the window when I finally arrived home, checked my email (probably mistake #1, checking email wasn’t on the original todo list) and read a message from a student in the class I’m teaching, ECE2524: Introduction to Unix for Engineers.

On the face of it, the question seemed to be a simple one: “how do I display a certain character on the screen?” Furthermore, they noted that when they compiled their program in Windows, it worked fine and displayed the character they wanted, a block symbol: ▊, but when compiling and running on Linux the character displayed as a question mark ‘?’.

Now, before you get turned off by words like “compile” and “Linux”, let me assure you, this all has a point and it relates to a discussion we had in PFP about “standards for the Ph.D.” plus, it resulted in one of my favorite methods of procrastination, exploring things we take for granted and discovering why we do things the way we do.

After some googling around I came across this excellent post, from which I pulled many of the examples that I use here.

The problem was one of standards, but before we can talk about that we need to know a little bit about the history of how characters are stored and represented on a computer.  Even if you aren’t a computer engineer you probably know that computers don’t work with letters at all, they work with numbers, and you probably know they work with numbers represented in base 2, or binary, where ’10’ represents ‘2’, ’11’ represents ‘3’, ‘100’ is ‘4’ and so on.  And if you didn’t know some or any of that, that’s perfectly ok, because you don’t actually need to know how a computer stores and manipulates information in order to use a computer any more.  But back in the early days of computing, you did.  Also important for the story: back then, the kind of information people needed to represent was much more limited.  Pictures and graphics of any kind were far beyond the capabilities of the hardware; in fact, early computer terminals were just glorified typewriters, only capable of representing letters in the classical Latin alphabet, a-z, A-Z, the numbers 0-9 and, because much of the early development was done in the United States, the punctuation used in the English language.  To represent these letters with numbers a code had to be developed: a one-to-one relationship between a number and a letter.  The code that came into widespread use was called the American Standard Code for Information Interchange, or ASCII.

ASCII chart

This was a nice code for the time: with a total of 128 characters, any one character could be represented with 7 bits (2^7 = 128).  So, for instance, 100 0001 in binary, which is 65 in good ol’ base 10, represents upper case ‘A’, while 110 0001, or 97, represents lower case ‘a’.  For technical reasons it is convenient to store binary data in chunks of bits totaling a power of 2.  7 is not a power of two, but 8 is, and so early computers stored and used information in chunks of 8 bits (today’s modern processors use data in chunks of 32 or 64 bits).
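You can check these values yourself in just about any language; a quick sketch in Python:

```python
# 'A' is stored as 65 (binary 100 0001), 'a' as 97 (binary 110 0001).
print(ord('A'), format(ord('A'), '07b'))  # 65 1000001
print(ord('a'), format(ord('a'), '07b'))  # 97 1100001
# And the mapping works in reverse, from number back to character:
print(chr(65), chr(97))                   # A a
```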

Well, this was all fine and good: we could represent all the printed characters we needed, along with a set of “control” characters that were used for other purposes, such as transmitting data from one location to another.  But soon 128 characters started feeling limited; for one thing, even in English it is sometimes useful to print accented characters, such as the é in résumé.  Well, people noticed that ASCII only used 7 bits, but recognized that information was stored in groups of 8 bits, so there was a whole extra bit that could be used.  People got creative and created extended ASCII, which assigned symbols to the integer range 128-255, thereby making complete use of all 8 bits, but taking care not to change the meaning of the lower 128 codes.  So, for instance, 130 was now used to represent é.

The problem was that even 256 characters are not enough to represent the richness of all human languages around the world, and so as computer use became more prevalent in other parts of the world the upper 128 codes were used to represent different sets of symbols; for instance, computers sold in Israel used 130 to represent the Hebrew letter gimel (ג) instead of the accented é.  At first, everyone was happy.  People could represent all or most symbols needed for their native language (ignoring for the moment Chinese or Japanese, which have thousands of different symbols, with no hope of fitting in an 8-bit code).

Then the unthinkable happened.  The Internet, and more to the point, email, changed the landscape of character representation, because all of a sudden people were trying to send and receive information to and from different parts of the world.  So now, when an American sent their résumé to a colleague in Israel it showed up as rגsumג.  Whoops!

But what to do?  At this point there were hundreds of different “code pages” used to represent a set of 256 characters with 8 bits.  While the lower 128 codes remained mostly consistent between code pages, the upper 128 were a bit of a free-for-all.  It became clear that a new standard was needed for representing characters on computers, one that could be used on any computer to represent any printed character of any human language, including those that could not possibly fit in only 256 codes.

The solution is called Unicode, and it is a fundamentally different way of thinking about character representation.  In ASCII, and all the code pages developed after it, the relationship between a character and how that character was stored in computer memory was exact (even if different people didn’t agree on what that relationship was).  In ASCII, an upper case ‘A’ was stored as 0100 0001, and if you could look at the individual bits physically stored in memory, that is what you would see, end of story.  Unicode relates letters to an abstract concept called a “code point”: a Unicode ‘A’ is represented as U+0041.  A code point does not tell you anything about how a letter is stored in 1s and 0s; instead, U+0041 just means the concept or idea of “upper case A”.  Likewise, U+00E9 means “lower case accented e” (é), and U+05D2 means “the Hebrew letter gimel” (ג).  You can find the Unicode representation of any supported character on the Unicode website, or for quick reference at a variety of online charts, like this one.

But remember, the Unicode representations are associated with the concept of the letter, not how it is stored on a computer.  The relationship between a Unicode value and a storage value is determined by the encoding scheme, the most common being UTF-8.  A neat property of the UTF-8 encoding is that it is backwards compatible with the lower 128 ASCII characters, so if those are the only characters you are using they’ll show up just fine in older software that doesn’t know anything about Unicode and assumes everything is in ASCII.
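A quick Python sketch illustrates both properties, the ASCII compatibility and the multi-byte encoding of higher code points:

```python
# The lower 128 code points encode to single bytes identical to ASCII:
print('A'.encode('utf-8'))  # b'A', one byte, same as ASCII 65
# Higher code points take multiple bytes; U+00E9 and U+05D2 take two each:
print('é'.encode('utf-8'))  # b'\xc3\xa9'
print('ג'.encode('utf-8'))  # b'\xd7\x92'
# The code point itself is independent of how it is encoded:
print(hex(ord('é')))        # 0xe9
```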

I know I’m risking losing my point at this point, but one last thing.  Right click on this webpage and click “View Page Source”.  Near the top of the page you should see something that looks like

<meta charset="UTF-8" />

or the older form:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

This is the line that tells your web browser what encoding scheme is used for the characters on this web page.  “But wait”, you might say, “there’s a self-reference problem: to write out ‘charset=UTF-8’, you need to first pick an encoding to use, so how can you tell the web browser what encoding you’re using without assuming it already knows what encoding you’re using?”  Well, luckily, all the characters needed to write out the first few header lines, including “charset=UTF-8”, happen to be contained in the lower 128 characters of the original ASCII specification, which encodes the same as UTF-8 for that small range.  So web browsers can safely assume UTF-8 until they read a line like <meta charset="UTF-16" />, at which point they will reload the page and switch to the specified encoding scheme.

Ok. So where the heck was I going with this?  Well for one thing, the history of character representation is quite interesting and highlights various aspects of the history of computing, and sheds light on something that we all take for granted now, that I can open up a web page on any computer and be reasonably sure that the symbols used to represent the characters displayed are what the author intended.

But it also highlights the importance of forming good standards, because without them, it is difficult to communicate across boundaries.  Standards don’t need to specify the details of implementation (how a character is stored in computer memory), but at the very least, to be useful and flexible they need to specify a common map between a specific concept (the letter ‘A’ in the Latin alphabet) and some agreed upon label (U+0041).

Currently, we don’t really have a standardized way of talking about a Ph.D.  What is a “qualifier exam”? “prelims”? “proposal”?  All of these could mean something different depending on your department and discipline.  While trying to standardize details such as “how many publications” or “how many years” or “what kind of work” across disciplines would be difficult at best, and nonsensical in many cases, we could start by standardizing the language we use to talk about the various parts of the Ph.D. process that are similar across fields.

And incidentally, this is why I still haven’t finished grading the stack of homeworks I told myself I’d finish last night.

And for what it’s worth, the answer to the student’s question was to use the Unicode representation of the ▊ symbol, which is standardized, not the extended-ASCII representation, which is not a standard way to represent that symbol.
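For what it’s worth, in Python that looks something like this (a small sketch of my own, assuming your terminal is set to a UTF-8 locale):

```python
# U+258A is the LEFT FIVE EIGHTHS BLOCK character; the escape names the
# code point, and the terminal's UTF-8 encoding takes care of the bytes.
block = '\u258a'
print(block)  # ▊
```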


Academic Privilege: Experiences as a white cisgendered gay male atheist Engineer

Wow.  So after skipping out of PFP early on Monday to attend a talk titled “Why are you Atheists so Angry” by Greta Christina, I was going to write a post about what angers me about the current state of academia (for those of you not familiar with Greta’s talk, anger in this context is not a bad thing, it is a powerful motivator for social change).  In the process of confirming the URL to her blog, a curious random happenstance led me to this post from July, 2011, which in turn led here and finally to Of Dogs and Lizards: A Parable of Privilege.

I’m not going to rehash the situation and subsequent discussion that led to the first two links, but if you have time for nothing else, read “Of Dogs and Lizards” immediately after this (or earlier if you find yourself thinking that I shouldn’t be “making a big deal” about this).

This whole sequence of posts was really relevant to me because I had just spent a good deal of time last week discussing the concept of “privilege” with a group of friendly folks.  The parable did a better job of explaining it than I did, I think.

It’s important to understand privilege because it exists at all levels in higher ed, and has a profound effect on the people who don’t have it.  Before I go on: there are many, many kinds of privilege, and many of us have some but not all forms.  There’s white privilege, male privilege, straight privilege, cisgendered privilege, religious (in this country, Christian) privilege and so on and so forth.  Notice I’m not talking about the privilege that comes with having a lot of money (although the previously mentioned kinds of privilege have a huge effect on whether or not someone achieves financial privilege).  I’m talking about unearned privileges.  Privileges granted just by being born a certain way, or adopting a certain religion.

(Electrical) Engineering is a male dominated field, and while there have been many discussions as to why this is (and how to change it), one large reason is that it is not perceived as an inviting environment to women.

As a gay male, I tend to be sensitive to sexist comments made by professors, colleagues, even my adviser.  Not for the same reason a woman would be sensitive to them, although I can empathize, but because they make me feel like an outlier, like I don’t belong.  I really don’t understand, why would we “hire some dancing girls” to celebrate a successful paper submission?  And why would I pick a major based on the ability “to meet women”? And why is talking about how engineers can “pick up girls” such a popular topic (here’s a tip, maybe if you started thinking of women as human beings (editors note: I originally had written “human beans”, which might be the case as well)  and not some kind of alien species that you had to “trick” into talking to you, you’d be more successful).

I wish I could remember some more specific examples from the classroom.  All I can remember is numerous times feeling uncomfortable, both for myself, and for the few women around, after a professor (likely unknowingly) made a sexist comment in class.

Now, if you have read the parable, you’ll understand that I am not accusing the people making these comments of being bad people. They’re just unaware.  They legitimately do not understand why the comments they are making might be offensive to some people.  Because they have privilege.  It’s not a bad thing, or a good thing, it’s just the state of the world that we live in.  But because they have privilege, they also have the privilege of ignoring the people who raise concerns.

I have had good friends suggest that maybe I was just “an angsty gay boy” for feeling uncomfortable about the pervasive heteronormativity I experience in Engineering.  I have been told by colleagues, after raising concern about a sexist remark made by a professor, that “it’s not a big deal, he didn’t mean it that way, don’t worry about it”.  Well, I am worried about it.  And I’m also worried when people tell me not to worry about it.  As you know by now from reading the referenced posts, these responses are a nice way of saying “shut up”.  Subconsciously, that is often done because maybe they see some truth in what I’m saying but don’t want to admit it because they’re uncomfortable facing the fact that they have privilege, or maybe it’s to try and preserve the privilege that they have.

Academe should be an environment that is welcoming and inclusive to ALL people, and I think most of us feel that way.  So please, the next time someone tells you that a comment made them feel uncomfortable, listen to them.  And understand that it might take a while for you to understand WHY a comment that sounds perfectly reasonable to you might make someone else feel uncomfortable.

What privileges do you enjoy that you might not be aware of? And how might they lead you to say things that may make others feel uncomfortable?

What unearned privileges do you *not* have, and have you ever been made to feel uncomfortable, or unsafe as a result?