In his post titled Class Material – reposted, zickbe asked a very good question about the content of ECE2524. It is a question that has come up at least once every semester. To paraphrase: “Since there are modern GUI tools for Linux now, why are we learning all these old command line tools?” The example given in the post was the simple task of replacing all periods ‘.’ with commas ‘,’ in some text input. Indeed, many graphical editors have search and replace functionality that makes this particular task quite easy. So what’s the point of learning to do it from the command line?
There are two answers to this question, each from a different perspective.
You as the User
The first is probably the perspective you are all thinking about right now: you as a user of a general-purpose operating system, editing files, writing code, surfing the web, etc. As we have seen already, Unix has a strong tradition as a platform for text manipulation (remember, its first use was as an OS to run a word processing system for the AT&T Bell Labs patent department). When we store our data in plain text we have a large collection of powerful tools to manipulate and process that data.
Of course, when learning new concepts we start with simple examples. One of the simplest ways we can manipulate text is with a literal substitution, for example “replace all occurrences of the word ‘cat’ with ‘dog’”, or “replace all occurrences of ‘.’ with ‘,’”. Literal substitutions are used often enough that many graphical tools have built the feature into the interface. Let’s say we have a file myfile.txt and we want to change all occurrences of ‘dog’ to ‘cat’. We could use the terminal:
sed -i 's/dog/cat/g' myfile.txt
Or we could open myfile.txt in our favorite text editor, choose the menu option for “search and replace”, enter “dog” and “cat” in the appropriate fields, click “ok” and we’re done. For this simple case it seems like it’s hardly worth the brain-space to remember how to use sed. Let’s kick it up a notch though. When writing software applications we often have many files associated with one project. What if we wanted to replace ‘dog’ with ‘cat’ across several files? Using the GUI we would open each file in succession, click the menu that contains “search and replace”, fill in our search and replace words, hit ‘ok’ and then repeat for the remaining files. This is probably doable for a few files. What about 100? 1000? From the command line it is a single command:
find project/ -name '*.txt' -exec sed -i 's/dog/cat/g' '{}' +
or
find project/ -name '*.txt' -print0 | xargs -0 sed -i 's/dog/cat/g'
The nice thing about this is that the amount of effort we put in is the same no matter how many files we want to process, whether it be 3, 100, 1000 or more. Try doing 100 text substitutions in a GUI and you’re asking for a repetitive stress injury!
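If you want to try the multi-file substitution without risking real files, here is a minimal sketch that builds a scratch directory first. It assumes GNU sed (where -i takes no backup suffix) and a find/xargs pair that supports -print0 and -0. The helper name replace_in_tree and the file names are made up for illustration.

```shell
# Hypothetical helper: replace every 'dog' with 'cat' in all .txt files
# below the directory given as the first argument.
# Assumes GNU sed: -i edits files in place with no backup suffix.
replace_in_tree() {
    # The quoted pattern stops the shell from expanding *.txt before find
    # sees it; -print0 / -0 keep file names containing spaces intact.
    find "$1" -name '*.txt' -print0 | xargs -0 sed -i 's/dog/cat/g'
}

# Scratch directory with a few made-up files to experiment on.
demo=$(mktemp -d)
mkdir -p "$demo/sub dir"
echo 'the dog chased another dog' > "$demo/a.txt"
echo 'hot dog stand' > "$demo/sub dir/b.txt"
echo 'dog (not a .txt file, left alone)' > "$demo/notes.md"

replace_in_tree "$demo"

# Show the results: the two .txt files change, notes.md does not.
cat "$demo/a.txt" "$demo/sub dir/b.txt" "$demo/notes.md"
```

The same function call works whether the directory holds three files or three thousand, which is the point of the comparison with the GUI.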
“Ok”, you’re saying to yourself, “but how often am I working with hundreds of files at once? I usually just have one or two files I want to modify, and it’s not too bad to navigate the GUI menu a few times to do text substitution.” Let’s think of some more examples of text manipulation you might want to do. In my previous post I described the process I went through to compile a list of links to last semester’s projects. At one point I wanted to prepend each line with a ‘-‘ character to generate a list in Markdown syntax. I could have just manually added the character to each line (there were only 19, after all), but instead I used a sed command:
sed -r 's/^(.*)$/- \1/g'
It didn’t really save me many keystrokes in this case, but it easily fit into the automated workflow I had set up to convert the list of urls to a nice HTML format suitable for posting on the blog. It’s also a task that would have become quite tedious to do by hand if there were more than the 20 or so items that I had. And if I wanted to do something a bit more complex, like “prepend only the lines containing a url with ‘-‘ but leave all others unchanged”:
sed -r 's|^(.*https?://.*)|- \1|g'
Now I can selectively convert lines to a Markdown style list. This is much quicker for even medium sized files than scanning each line by eye to find urls, and then adding a ‘-‘. Can your GUI do that? And of course, if I had a few, or a few hundred files that I wanted to process like this, I could use the same `find … -exec` or `find … | xargs … ` idiom I used above.
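To see the selective version in action, here is a small sketch that feeds it three made-up lines. The function name bullet_urls and the example.com addresses are placeholders of mine, not part of the original workflow, and I have spelled the option -E, which GNU sed accepts as a synonym for -r.

```shell
# Hypothetical helper: prefix only the lines that contain an http(s) URL
# with '- ', and pass every other line through unchanged.
# '|' is used as the s-command delimiter so the slashes in 'http://'
# do not need escaping.
bullet_urls() {
    sed -E 's|^(.*https?://.*)$|- \1|'
}

# Three sample lines: a heading, a bare URL, and a sentence with a URL in it.
printf '%s\n' \
    'Projects from last semester' \
    'http://example.com/one' \
    'see https://example.com/two for details' | bullet_urls
```

Only the second and third lines come out with a leading ‘- ‘; the heading has no url, so the pattern never matches it and sed prints it as it was.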
Another quick example: You are probably familiar with the two main styles of naming functions with multiple words, CamelCase and underscore_case:
def myHelloFunction(): pass
def my_hello_function(): pass
Which style you use is largely a matter of preference, although sometimes when working on collaborative projects the project will define a particular style that you must adhere to. Let’s say you’ve been using one style for a few projects and then decide you want to switch (or you get a bunch of code from a friend who was using a different style, or… )
sed -r 's/([A-Z])/_\l\1/g'
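will convert a camelCase name such as myHelloFunction to my_hello_function. (A name that begins with a capital letter, such as CamelCase, comes out with a leading underscore as _camel_case, and the \l escape is specific to GNU sed.) Doing the same automatic formatting in a GUI of your choice is left as an exercise for the reader. A quick Google search will turn up a sed command to do the reverse transformation.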
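For reference, here is one possible pair of commands covering both directions, wrapped in functions so they are easy to try on a line of text. This is a sketch that assumes GNU sed, since \l and \u (lowercase or uppercase the next character) are GNU extensions; the function names to_snake and to_camel are my own.

```shell
# Assumes GNU sed: \l and \u change the case of the next character in the
# replacement and are not available in every sed implementation.

# myHelloFunction -> my_hello_function
to_snake() { sed -E 's/([A-Z])/_\l\1/g'; }

# my_hello_function -> myHelloFunction
to_camel() { sed -E 's/_([a-z])/\u\1/g'; }

echo 'def myHelloFunction(): pass' | to_snake
echo 'def my_hello_function(): pass' | to_camel
```

Both commands are deliberately blunt: they rewrite every match on the line, including ones inside strings and comments, so on real code you would want to review the result or tighten the pattern.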
The take-away from all of this is that while the examples we use in class may be simple enough that a GUI editor happens to have implemented similar functionality, the tools themselves are much more powerful. GUIs are great in that they make it really easy to do the things that the GUI designers planned for. However, they make it difficult or impossible to do things that the designers didn’t plan for. In the case where you want to perform a text manipulation on a large number of files, or a complex manipulation on one or more files, the command line tools provide a solution where the graphical tools do not.
You as the Developer
But you’re not just any user, are you? You are getting a degree in Computer Systems Engineering, and even if you plan to focus on hardware it is a guarantee that you will be writing software at some point (probably many points). You may even write some software that needs to do text manipulation. Perhaps a preprocessor for a compiler, or even your own text editor. What if you want to build in some functionality to allow the end-user to do some text manipulation? Maybe a simple text substitution, or perhaps you’re writing an IDE and want to provide a menu option to automatically convert CamelCase to camel_case across a set of project files. How would you implement this? For these examples it probably makes sense to use the regular expression library of whichever language you are programming in, but even in that case, the expressions themselves will be largely the same as in the sed examples. In some cases you may actually want to spawn a child process running one of the sed commands from above directly (maybe you want to run a complex text manipulation on a large number of files that a user selects with a GUI and let the manipulation run in the background while the GUI is free to take additional requests from the user).
Summary
As you work through the command line examples for this class, keep in mind the flexibility of the commands you are learning. In many cases the examples will be so simple that the same functionality has been implemented in any of the popular graphical tools, but the command line version provides much more control and flexibility, as I hope these few examples have demonstrated. Can you think of any other examples that could be done using command-line text manipulation tools but would be impossible in a general purpose graphical environment?
As I mentioned before, this question comes up every semester. How could the material in the class be modified to make the power of the tools we learn more apparent? Should more complex examples be included at the possible expense of clarity? More examples? Was the explanation I gave here convincing? If not, please explain why in the comments and I’ll do my best to revise!
Learning to do things on the command line is extremely important for a few reasons:
1) You are not always going to have physical access to a computer to use that GUI. It is very common as a developer to remotely access servers for development and debugging. As an example, when I worked at Booz Allen, there was a central server offsite where all the developed tools and proprietary data were stored. It was accessed by members of the team by connecting via SSH and doing everything over the command line, including debugging the production application and developing and running batch scripts to parse remote data. It would not have been feasible to have people go to the computer’s location all the time to do work, nor to have multiple people running full remote desktop to get GUI access.
2) As developers, we are going to need to write complicated scripts, as well as applications that make external calls to the OS. To accomplish either of these, knowledge of command line operations is needed.
3) A lot of tasks are either easier to do from the command line than with a GUI, or too expensive in terms of computing power to do in a GUI, especially in cases with text parsing or stringing together multiple commands. If you need to search through huge amounts of log files, or replace a string of text across a large number of files, which is easier: find/grep/sed, even if the command looks ugly, or trying to manually load up these files into a GUI editor (or even worse, trying to script running a GUI tool and automating the actions)?
Those are great examples, Matt. The remote-access thing in particular is something I’ve been trying to figure out how to integrate into the class. The main problem has been that the server space that I know everyone has access to (cvl.ece.vt.edu) is running quite outdated software (both Python and gcc) and I feel that trying to get everyone to compile newer versions of the software may be more work than it’s worth. Perhaps it would still be beneficial to come up with a few assignments that don’t rely on newer features of Python or gcc and make use of that resource.
I think that a lot of people don’t realize the scope of what projects they’ll be working on in the professional industry. Even coding projects in senior-level classes are just a tiny fragment of the complexity and size of applications that companies in the commercial and defense industry develop. The best analogy I can think of is writing a 5-10 page short story versus working collaboratively on a set of novels.
The way we use our computers at home is very different from what will go on in a development environment. Most everything can be done from a GUI for the average consumer or home user, but that is not practical in an IT or development setting.