How will we build a Third System of education?

I have recently been reading about what Mike Gancarz, in Linux and the Unix Philosophy, calls “The Three Systems of Man”. This is, to my understanding, a fairly well-documented and often-observed concept in software design, possibly first referenced by Frederick Brooks in The Mythical Man-Month when he coined “the second-system effect”. Gancarz seems to take the concept further, generalizing it to any system built by humans.

Man has the capacity to build only three systems. No matter how hard he may try, no matter how many hours, months, or years for which he may struggle, he eventually realizes that he is incapable of anything more. He simply cannot build a fourth. To believe otherwise is self-delusion.

The First System

Fueled by need, constricted by deadlines, a First System is born out of a creative spark. It’s quick, often dirty, but gets the job done well. Importantly, it inspires others with the possibilities it opens up. The “what if”s elicited by a First System lead to…

The Second System

Encouraged and inspired by the success of the First System, more people want to get on board, offer their own contributions, and add features they deem necessary. Committees are formed to organize and delegate. Everyone offers their expertise, and everyone believes they have expertise, even when they don’t. The Second System has a marketing team devoted to selling its many features to eagerly awaiting customers, and to appeal to the widest possible customer base nearly any feature that is thought up is added. In reality, most users end up using only a small fraction of the available features of The Second System; the rest just get in the way. Despite enjoying commercial success, The Second System is usually the worst of the three. By trying to appease everyone (and, more often than not, by not understanding them), the committees in charge have created a mediocre experience. The unnecessary features add so much complexity that bugs are many and fixes take a considerable amount of effort. After some time, some users (and developers) start to recognize The Second System for what it is: bloatware.

The Third System

The Third System is built by people who have been burned by the Second System

Eventually enough people grow frustrated by the inefficiencies and bloat of The Second System that they rebel against it. They set out to create a new system that contains the essential features and lessons learned from the First and Second Systems, but leaves out the crud that accumulated in the Second System. The construction of a Third System comes about either as a result of observed need or as an act of rebellion against the Second System. Third Systems challenge the status quo set by Second Systems, and as such there is a natural tendency for those invested in The Second System to criticize, distrust, and fear The Third System and those who advocate for it.

The Interesting History of Unix

Progression from First to Second to Third System always happens in that order, but sometimes a Third System can reset back to First, as is the case with Unix. While Gancarz argues that current commercial Unix is a Second System, the original Unix created by a handful of people at Bell Labs was a Third System. It grew out of the Multics project, which was the Second System solution spun from the excitement of the Compatible Time-Sharing System (CTSS), arguably the first timesharing system ever deployed. Multics suffered so much from second-system syndrome that it collapsed under its own weight.

Linux is both a Third and a Second System: while it shares many Second System-like properties with commercial Unix, it is under active development by people who came aboard as rebels against Unix and who put every effort into eliminating the Second System cruft associated with its commercial cousin.

Is our current Educational Complex a Second System?

I see many signs of the second-system effect in our current educational system. It is designed and controlled by committee, constructed to meet the needs of a large audience while failing to meet the individual needs of many (most?). Solutions to visible problems are also determined by committee, and patches to those solutions serve to cover up symptoms. Addressing the underlying causes would require asking some very difficult questions about the nature of the system itself, something that those invested in it are not rushing to do.

Building a Third System

What would a Linux-esque approach to education look like? What are the bits that we would like to keep? What are the ugliest pieces that should be discarded first? And how will we weave it all together into a functional, useful system?

A Matter of Standards

Last night on my bike ride home from PFP class I mentally prepared a “todo list” of things to get done in the couple of hours I’d have before getting too tired to be productive. In a classic example of the story of my life, all that mental preparation went out the window when I finally arrived home, checked my email (probably mistake #1, checking email wasn’t on the original todo list), and read a message from a student in the class I’m teaching, ECE2524: Introduction to Unix for Engineers.

On the face of it, the question seemed to be a simple one: “how do I display a certain character on the screen?” Furthermore, they noted that when they compiled their program in Windows, it worked fine and displayed the character they wanted, a block symbol: ▊, but when compiling and running on Linux the character displayed as a question mark ‘?’.

Now, before you get turned off by words like “compile” and “Linux”, let me assure you, this all has a point, and it relates to a discussion we had in PFP about “standards for the Ph.D.” Plus, it resulted in one of my favorite methods of procrastination: exploring things we take for granted and discovering why we do things the way we do.

After some googling around I came across this excellent post, from which I pulled many of the examples that I use here.

The problem was one of standards, but before we can talk about that we need to know a little bit about the history of how characters are stored and represented on a computer. Even if you aren’t a computer engineer you probably know that computers don’t work with letters at all, they work with numbers, and you probably know they work with numbers represented in base 2, or binary, where ’10’ represents 2, ’11’ represents 3, ‘100’ is 4 and so on. And if you didn’t know some or any of that, that’s perfectly ok, because you don’t actually need to know how a computer stores and manipulates information in order to use a computer any more. Back in the early days of computing, though, you did.

Also important for the story: back in the early days of computing, the kind of information people needed to represent was much more limited. Pictures and graphics of any kind were far beyond the capabilities of the hardware used to represent information; in fact, early computer terminals were just glorified typewriters, only capable of representing the letters of the classical Latin alphabet, a-z and A-Z, the numbers 0-9 and, because much of the early development was done in the United States, the punctuation used in the English language. To represent these letters with numbers a code had to be developed: a 1-to-1 relationship between a number and a letter. The code that came into widespread use was called the American Standard Code for Information Interchange, or ASCII.

ASCII chart

This was a nice code for the time: with a total of 128 characters, any one character could be represented with 7 binary bits (2^7 = 128). So, for instance, 100 0001 in binary, which is 65 in good ol’ base 10, represents upper case ‘A’, while 110 0001, or 97, represents lower case ‘a’.  For technical reasons it is convenient to store binary data in chunks of bits totaling a power of 2.  7 is not a power of two, but 8 is, and so early computers stored and used information in chunks of 8 bits (today’s modern processors use data in chunks of 32 or 64 bits).
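If you want to poke at this yourself, here is a quick sketch in Python (my choice of language for the examples in this post; ASCII itself doesn’t care what language you use) showing the numbers hiding behind a few characters:

# Each character maps to a number; ASCII fits in 7 bits.
print(ord('A'))                  # 65
print(ord('a'))                  # 97
print(format(ord('A'), '07b'))   # 1000001, the 7-bit pattern for 'A'
print(chr(65))                   # 'A', going the other direction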

Well, this was all fine and good; we could represent all the printed characters we needed, along with a set of “control” characters that were used for other purposes, such as transmitting data from one location to another. But soon 128 characters started feeling limited. For one thing, even in English it is sometimes useful to print accented characters, such as the é in résumé. Well, people noticed that ASCII only used 7 bits, but recognized that information was stored in groups of 8 bits, so there was a whole extra bit that could be used. People got creative and created extended ASCII, which assigned symbols to the integer range 128-255, thereby making complete use of all 8 bits, but took care not to change the meaning of the lower 128 codes; so, for instance, 130 was now used to represent é.

The problem was that even 256 characters are not enough to represent the richness of all human languages around the world, and so as computer use became more prevalent in other parts of the world, the upper 128 codes were used to represent different sets of symbols; for instance, computers sold in Israel used 130 to represent the Hebrew letter gimel (ג) instead of the accented é.  At first, everyone was happy.  People could represent all or most of the symbols needed for their native language (ignoring for the moment Chinese and Japanese, which have thousands of different symbols, with no hope of fitting in an 8-bit code).

Then the unthinkable happened.  The Internet, and more to the point email, changed the landscape of character representation, because all of a sudden people were trying to send and receive information to and from different parts of the world.  So now, when an American sent their résumé to a colleague in Israel, it showed up as rגsumג.  Whoops!
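You can reproduce that garbling today. Here is a small Python sketch; I’m using the cp437 and cp862 code pages as stand-ins for the American and Israeli machines in the story above:

text = "résumé"
data = text.encode('cp437')    # bytes as written on a US machine: é becomes byte 130
print(data.decode('cp862'))    # the same bytes read with the Hebrew code page: rגsumג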

But what to do?  At this point there were hundreds of different “code pages” used to represent a set of 256 characters with 8 bits.  While the lower 128 codes remained mostly consistent between code pages, the upper 128 were a bit of a free-for-all.  It became clear that a new standard was needed for representing characters on computers, one that could be used on any computer to represent any printed character of any human language, including languages that could not easily be represented by only 256 characters.

The solution is called Unicode, and it is a fundamentally different way of thinking about character representation.  In ASCII, and in all the code pages developed after it, the relationship between a character and how that character was stored in computer memory was exact (even if different people didn’t agree on what that relationship was).  In ASCII, an upper case ‘A’ was stored as 0100 0001, and if you could look at the individual bits physically stored in memory, that is what you would see, end of story.  Unicode, by contrast, relates letters to an abstract concept called a “code point”: a Unicode A is represented as U+0041.  A code point does not tell you anything about how a letter is stored in 1s and 0s; instead, U+0041 just means the concept or idea of “upper case A”, likewise U+00E9 means “lower case accented e” (é), and U+05D2 means “the Hebrew letter gimel” (ג).  You can find the Unicode representation for any supported character on the Unicode website, or for quick reference at a variety of online charts, like this one.
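In Python (again, just one convenient way to look at this), the code point is what ord() returns, independent of how the character might be stored on disk or sent over a network:

# ord() gives the abstract code point, not the bytes in memory.
for ch in ('A', 'é', 'ג'):
    print(ch, 'U+%04X' % ord(ch))
# A U+0041
# é U+00E9
# ג U+05D2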

But remember, the Unicode representations are associated with the concept of the letter, not how it is stored on a computer.  The relationship between a Unicode value and a storage value is determined by the encoding scheme, the most common being UTF-8.  A neat property of the UTF-8 encoding is that it is backwards compatible with the lower 128 ASCII characters, and so if those are the only characters you are using they’ll show up just fine in older software that doesn’t know anything about Unicode and assumes everything is in ASCII.
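Continuing the Python sketch, you can see both properties at once: ASCII characters encode to their familiar single bytes, while everything else becomes a multi-byte sequence:

# UTF-8 turns code points into bytes; plain ASCII keeps its old single-byte values.
print('A'.encode('utf-8'))   # b'A'        (still the single byte 65)
print('é'.encode('utf-8'))   # b'\xc3\xa9' (two bytes)
print('ג'.encode('utf-8'))   # b'\xd7\x92' (two bytes)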

I know I’m risking losing my point at this point, but one last thing.  Right click on this webpage and click “View Page Source”.  Near the top of the page you should see something that looks like

<meta charset="UTF-8" />

or

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

This is the line that tells your web browser what encoding scheme is used for the characters on this web page.  But “wait”, you might say, there’s a self-reference problem: to write out “charset=UTF-8”, I need to first pick an encoding to use, so how can I tell the web browser what encoding I’m using without assuming it already knows what encoding I’m using?  Well, luckily, all the characters needed to write out the first few header lines, including “charset=UTF-8”, happen to be contained in the lower 128 characters of the original ASCII specification, which is the same as UTF-8 over that small range.  So web browsers can safely assume UTF-8 until they read a line like <meta charset="UTF-16" /> that declares something else, at which point they will reload the page and switch to the specified encoding scheme.
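A crude sketch of that bootstrap in Python (real browsers are far more careful than this, so treat it purely as an illustration):

import re

# Read the raw bytes, find the declared charset using only ASCII-range characters,
# then decode the whole page with whatever encoding was declared.
raw = b'<html><head><meta charset="UTF-8" /></head><body>r\xc3\xa9sum\xc3\xa9</body></html>'
match = re.search(rb'charset="?([-\w]+)', raw)
encoding = match.group(1).decode('ascii') if match else 'utf-8'
print(raw.decode(encoding))   # the page text, with the é intact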

Ok. So where the heck was I going with this?  Well, for one thing, the history of character representation is quite interesting: it highlights various aspects of the history of computing, and it sheds light on something we all take for granted now, namely that I can open up a web page on any computer and be reasonably sure that the symbols used to represent the characters displayed are what the author intended.

But it also highlights the importance of forming good standards, because without them, it is difficult to communicate across boundaries.  Standards don’t need to specify the details of implementation (how a character is stored in computer memory), but at the very least, to be useful and flexible they need to specify a common map between a specific concept (the letter ‘A’ in the Latin alphabet) and some agreed upon label (U+0041).

Currently, we don’t really have a standardized way of talking about a Ph.D.  What is a “qualifier exam”? “Prelims”? A “proposal”? All of these could mean something different depending on your department and discipline.  While trying to standardize details such as “how many publications”, “how many years”, or “what kind of work” across disciplines would be difficult at best, and nonsensical in many cases, we could start by standardizing the language we use to talk about the parts of the Ph.D. process that are similar across fields.

And incidentally, this is why I still haven’t finished grading the stack of homeworks I told myself I’d finish last night.

And for what it’s worth, the answer to the student’s question was to use the Unicode representation of the ▊ symbol, which is standardized, not the extended-ASCII representation, which is not a standard way to represent that symbol.
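In Python, that fix is a one-liner. I’m assuming the glyph in question is U+258A (the code point I looked up for that block character); the student’s actual program was compiled, but the idea carries over to any language:

# Print the block character by its Unicode code point instead of an extended-ASCII byte,
# so any UTF-8 terminal renders it the same way.
print('\u258a')   # ▊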


The Freedom to be “Technologically Elite”

Also in response to Kim’s recent post.  I think the conversation about access and the issue of digital inclusion is a very important one to have, and we need to continue to be aware of how the tools and technologies we use may include or exclude people.  I would like to talk a bit about the concept of the “technological elite” that Kim brought up.  It’s important to be aware that it is possible for an elite minority to control the tools the majority comes to depend on, but that is not currently the case, and in fact, I would argue that it will only become a reality if the majority allows it to happen.  As Jon Udell mentioned in his conversation, the Internet itself has always been, and continues to be, an inherently distributed technology.  No single organization, whether corporate or governmental, owns or controls it.  There have been attempts, and there will continue to be attempts, to restrict freedoms and tighten control, like the recent attempted SOPA/PIPA legislation, but it is our responsibility to continue to be aware of those attempts and fight them.

Many popular software tools in use, including WordPress, which powers this blog, are free and open source.  This means that anyone can take a look at the source code to learn what is going on behind the scenes and, in many cases, modify and improve the tool for their own or public use.  The language that WordPress is written in, PHP, is not only open source, but there is a plethora of free tutorials online for anyone interested in learning how to program in it.  The database used by WordPress to store and retrieve content, MySQL, is currently open source, though the project itself was originally proprietary.  (Another relational database management system, PostgreSQL, has been open source for its entire lifetime and in many cases can be used as a drop-in replacement for MySQL.)  The majority of servers powering the Internet run some version of the Linux operating system, itself freely available and open source.

Each of these projects, at the various layers that build up to form the tools we use, is generally well documented, with enough information freely available for anyone who wants to become an expert in its use and design.  Now of course, not everyone will become an expert, and the experts on any one project are not necessarily experts on any other.  But specialization has allowed us to advance as a society in a way that would not be possible without it.

And because I love food:

When many of us began specializing in fields that did not involve agriculture and food production, we became dependent on those who did for our very survival.  Yet I can’t remember the last time I’ve heard anyone call farmers members of the “Agricultural Elite”.  Like the Internet tools I’ve mentioned, any of us have the agency to become experts in farming if we so choose.