KISS

Earlier today, I was sifting through my RAID array's contents, doing a bit of cleaning (because even that is more appealing than studying for finals. I studied a lot today actually. I just get bored easily), and I happened upon my giant folder of wordlists. It's been collecting dust since April, when I last attended a CTF that gave points for password hash cracking, so I decided to decompress/untar it and see what was inside.

Two billion words/permutations. That's what was inside. Granted, these were not in one single file; there were lots of files. I couldn't help but think that there had to be duplicates among these files, because as much as I'd like to believe that I have 2,000,000,000 unique passwords to cram through my graphics cards, I doubted it. So, my goal was to filter out duplicate passwords from all of these files, but it wasn't that easy.

Many of these files were ordered by frequency of use. So, things like password1 were sitting at the top, while Ah4feU!@21kldjf3 was at the bottom. Additionally, if there were duplicates, the "earliest" one would need to be preserved. Going with the password1 example above, given 2 files: one with password1 at the top, and the other with password1 sorted alphabetically (near the middle), the latter had to be removed, with the former preserved.

Looking at all of this, as well as the data set size, I was thinking "Well, in order to work with this in a timely manner, I need to whip up something multithreaded in C++. But the files total over 20GB. If I load them all into memory on my code server, it will run out of RAM and swap. So do I only want to buffer part of each file? Maybe I can run it on my desktop. That has 32GB of RAM. But then how am I preserving ordering?"

My thoughts swam for a good 15 minutes before I decided to ask #vtluug what they thought. I very quickly received the "Use a HashMap!" answer, which was what I was going for. However, I got a bit more feedback that surprised me. Two ideas, actually:

  • Start up a MySQL database and make a basic table with the "word" field as the PK. Then, insert the top word from each file, second word for each file, ... until I was out of words
  • Use simple BASH utils like cut/sort/grep/etc. and chain them together
I nearly immediately dismissed the BASH utils, because I needed a way to tack on a priority/line number to each line, then sort once (with removal of duplicates) based on the word, and once more based on the priority. The MySQL idea seemed interesting, though. Databases are generally well-optimized for parallel access, so I thought "why not?"

And then someone in #vtluug said "why not just use the BASH utils as suggested above? tack on line numbers, sort + duplicate removal, sort on line numbers, remove line numbers, pipe out to file." The prospect of running a few short BASH commands seemed better than installing MySQL, setting it up, etc., all to save space/cracking time, so I looked into it. It turns out that a chain of 5 commands will do exactly what I want:
  • cat -n prepends a line number to each line of a file
  • sed strips the leading spaces that cat -n pads the numbers with
  • sort -u removes duplicates. Combined with the -k option, it sorts on specified columns, so the line numbers can be ignored when deciding what counts as a duplicate.
  • sort -n re-sorts numerically on the line numbers, restoring the original order
  • sed strips the line numbers, leaving each word in its original form
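Strung together on a tiny sample, the chain looks roughly like this (filenames are my own; keeping the earliest duplicate relies on GNU sort's stable merge, which with -u outputs the first line of each run of equal keys):

```shell
# Two sample lists where "password1" appears in both
printf 'password1\nletmein\n' > list1.txt
printf 'aardvark\npassword1\n' > list2.txt

# number | strip padding | dedup on the word (field 2) | restore order | strip numbers
cat list1.txt list2.txt | cat -n | sed 's/^ *//' \
  | sort -u -k2 | sort -n -k1 | sed 's/^[0-9]*\t//'
# → password1, letmein, aardvark (the earliest copy of password1 wins)
```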

So now I've got 6 "subprocesses" gathering words, filtering duplicates, and building properly prioritized wordlists. In that same shell, I've got one more waiting on the completion of the 6 in order to take the resultant 6 lists and merge them properly. Solving this task went from a complex, custom C++ program, to setting up a SQL server... to five simple, proven commands. A task that would probably have consumed a day of my Christmas break turned into 5 minutes of reading man pages and 10 minutes of testing the chain on a small subset of data...
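A scaled-down sketch of that orchestration (two stand-in lists instead of 6, all filenames invented):

```shell
# Stand-in wordlist groups (the real run had 6)
printf 'password1\n123456\n' > group1.txt
printf 'qwerty\npassword1\n' > group2.txt

# The 5-command chain as a function: number, strip padding, dedup on the
# word while keeping the earliest line, restore order, strip the numbers
dedup() {
  cat -n "$1" | sed 's/^ *//' | sort -u -k2 | sort -n -k1 | sed 's/^[0-9]*\t//'
}

# One background pipeline per group...
for i in 1 2; do
  dedup "group$i.txt" > "dedup$i.txt" &
done
wait  # ...then block until they all finish

# Merge the per-group results with the same technique
cat dedup1.txt dedup2.txt > merged.txt
dedup merged.txt > final.txt
```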

Keep It Simple, Stupid.

Scala: First Thoughts

As promised in yesterday's blog post, this post will be about my first thoughts of Scala as a language. I've gotten to around chapter 7 in Programming Scala 2nd ed. and I've read many online debates about the language's good aspects and bad aspects. I've also written a few very basic programs to get a feel for working with basic constructs in the language (flatMap/map/reduce/fold..., Lists, classes, objects, multithreading, etc.), thus I feel like I have seen the language enough to give my impression of it as a beginner.

In three words, I love it. But it could be more "friendly" from the get-go, and I believe that it is a very deliberate language.

By this, I mean two things:

1. The type system is AMAZING. But the first time I saw the flatMap prototype, I recoiled a bit. By recoil, I mean my first reaction to seeing that signature was similar to my reaction to seeing this in C++:
template <template <class, class> class C,
          class T,
          class A,
          class T_return,
          class T_arg>
C<T_return, typename A::rebind<T_return>::other>
map(C<T, A> &c, T_return (*func)(T_arg))
{
    C<T_return, typename A::rebind<T_return>::other> res;
    for (typename C<T, A>::iterator it = c.begin(); it != c.end(); it++) {
        res.push_back(func(*it));
    }
    return res;
}
(Credit for the code to a post located here http://stackoverflow.com/questions/1722726/is-the-scala-2-8-collections-library-a-case-of-the-longest-suicide-note-in-hist)

When I saw that, my first thought was "Did his cat walk all over his keyboard and somehow produce valid C++?" But, after taking a minute to mentally parse it (and after learning what alloc::rebind<T> is), it made perfect sense. I can best liken it to getting into a cold pool; if you jump in, you will be in shock for a short while, but you'll be acclimated and swimming in no time.

2. Scala is a deliberate language. Many of the complaints that I read (that aren't about the weird type signatures) are about the power that it gives the programmer. 

For example, it allows the programmer to use "operator overloading"[1]. This is very much frowned upon by some individuals because it needs to be applied judiciously; liberal use of it can be incredibly detrimental to the readability of a program. Try to mentally parse this:

Date d("14:00:00 12-Dec-2012 GMT");
d = d + 5;

What does that do? Are you adding five days? Five minutes? Why do you allow a raw number to be added to a Date? Now, concepts such as operator overloading aren't all bad. For example, take the following Scala:

val n = new Complex(1, 5)
val x = new Complex(4, 3)
val z = new Complex(-2, -2)
println(n + x - z) // Prints (7 + 10i)

That, in my opinion, reads far more nicely than n.addTo(x).subtractFrom(z). It also makes switching types from Complex to regular ints to whatever a breeze. Whereas in languages that lack operator overloading (see: Java), swapping between say, an int and a BigInteger requires changing every addition/subtraction/multiplication/division/etc. This is no fun.

Personally, I don't have a problem with a language that gives the programmer a lot of flexibility. In my opinion, languages that are very restrictive make it more difficult to write bad code, yes. But at the same time, they make it more difficult to write good code, as well. The opposite applies to languages such as Scala. If used properly, one can create amazingly concise programs in it. If misused, one can make programs that are downright unreadable in it. I'm not saying that Scala being a deliberate language is definitively a good or bad thing; I just believe it's something to be aware of.

--

What do I like about Scala so far?
  • The type system. Thus far, it's the second best I've used (below Haskell's, but I've had far more experience with Haskell than Scala, so it may be the best and I just haven't seen that yet :P). Personally, I'm still a fan of static typing for larger projects. For small, one-off scripts, I think it's nearly useless; I want to write it and move on. If there's a bug, it will normally be caught by me running the program. But for large projects (especially things that will hit those seldom-hit error cases/need to be reliable), static typing is a very convenient safety net. It won't catch logic errors, but it will be able to tell you that you made a typo, or that you're trying to pass in a char for a list, at compile time. 
  • Everything is immutable except when you don't want it to be. This speaks for itself. By default, everything is immutable (thus, threadsafe). If you find that you need to use a mutable container, swapping is very simple. Just be sure that your threads don't access it simultaneously.
  • Concurrency is at the core of the language. I love concurrency.
  • It runs on the JVM and has access to most/all of Java's libraries.
  • Use of "return" is discouraged, and each statement will implicitly "return" a value, as in Ruby. I like this because I'm human. I've forgotten to use return statements in the past. I will have my "result" variable all set-up, but for whatever reason, I'll just forget to write "return result;". Generally, this isn't so bad for statically typed languages that can yell at you at compile time. But for something like Python that will just implicitly return None, it's a bit of a pain. (Thankfully, the exceptions are pretty, so it's relatively easy to trace down)
  • "class" vs "object". There is now a distinct singleton/"static method container" specifier. No more println(SomeClass.getInstance().toString())! Yay!
  • Syntactic sugar is well-applied. See [3] for an example.
  • Constness is just a char away. var is a value that can be changed, val is a constant value. No "const var", no "final var", no "constexpr var", just "val". This is a very welcome change from the C++/Java world. Less typing for me! 
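As a small sketch of the class-vs-object and val/var points above (all names here are my own invention):

```scala
// 'object' declares a singleton directly: no getInstance() ceremony needed
object Greeter {
  val greeting = "Hello"  // val: immutable binding, akin to Java's final
  var count = 0           // var: mutable; the exception rather than the rule

  def greet(name: String) = {
    count += 1
    greeting + ", " + name + "!"  // last expression is the return value
  }
}

// Greeter.greet("world") yields "Hello, world!"
```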
What do I dislike about Scala so far?
  • It runs on the JVM, so program startup could be faster. I'm going to look into using it with nailgun in the future though.
  • scalac takes around 5 seconds (timed very scientifically by "time scalac test.scala") to compile a basic program. Coming from the land of gcc/g++ (which can compile small files in < 2s) and interpreted languages (compiling? what?), this is definitely not good. I've read that there are some automated build tools for it though. So, hopefully that will cut down on the time I spend waiting for it to compile.
  • Understanding some code written in it takes time. Because the goal is to write code quickly, there are some shortcuts one can use.[2] Once one knows these shortcuts, code reads nicely. But, as a beginner, I've yet to learn most of them.
What do I see as a potential pitfall in Scala so far?
Just one thing.

Methods defined as:
def someMethod
{
}

always implicitly return "Unit". Whereas methods defined as

def someMethod = 
{
}

will return the result of the block.

While I don't see it being a major issue, the difference is relatively subtle, and it's a mistake I see myself making at least a few times while I'm learning the language. :P
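To make the difference concrete, a quick sketch (method names are mine):

```scala
object ProcedurePitfall {
  // No '=': procedure syntax. The body's value is thrown away; returns Unit.
  def lengthOops(s: String) { s.length }

  // With '=': the method returns the value of its last expression (an Int).
  def length(s: String) = { s.length }

  def main(args: Array[String]): Unit = {
    println(lengthOops("hello")) // prints "()", the lone Unit value
    println(length("hello"))     // prints "5"
  }
}
```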

... And that about sums up my first thoughts of Scala. I'm definitely going to delve deeper into the language as I get the time, and I already have a few moderately large projects in mind that I can use it for. I'm really looking forward to learning more about the language, and being able to apply it in projects. Until then, though, I have to study for Physics. Final exam in two days! Yay!

As always, thank you for reading! If you have thoughts/feedback/comments/corrections/etc., feel free to add a comment below. :)


[1] I say "operator overloading", though (to my knowledge), it's really not. Scala allows you to define methods made purely out of symbols, whereas operator overloading only allows the redefinition of a small, finite set of operators. Member method calls with a single argument can omit the parens and the '.'. Thus, all of the following statements are valid Scala:
1 + 2 // (1).+(2)
arrlist append 3 // arrlist.append(3)
someList contains "a" // someList.contains("a")

[2] By shortcuts, I mean implicitly defined variables. For example,
val l = List(1, 2, 3, 4)
val s = l.map(_.toString)

The "_" seems to appear out of nowhere until you know what it stands for. Like I said, it's not an issue; it just takes a bit of googling.

[3] Syntactic sugar time! Well, partially sugar, partially just the syntax of the language playing out to look pretty. Pulling from the example above...
val l = List(1, 2, 3, 4)
val s = l.map(_.toString)

the initialization of s can be made into a for loop with yield:

val l = List(1, 2, 3, 4)
val s = for (z <- l) yield z.toString

Apparently (meaning I've read on a few other blogs/tutorials), these two approaches generate identical bytecode. Normally, I'd verify this, but this post has taken over an hour to write, and I want to study for finals.

And finally, this class declaration:
class Complex(rl: Int, imag: Int)
{
    val real = rl
    val imaginary = imag

    def + (cx: Complex) = new Complex(real + cx.real, imaginary + cx.imaginary)

    def - (cx: Complex) = new Complex(real - cx.real, imaginary - cx.imaginary)

    override def toString: String = 
        if(imaginary == 0)
            real.toString
        else if(real == 0)
            imaginary.toString + "i"
        else
            "(" + real.toString + " + " + imaginary.toString + "i)"
}

(Please forgive my poor semantics; I still need to learn the style guidelines for Scala)

As you can see, each of these methods (+, -, toString) consists of only a single expression, so I needn't use braces to surround their bodies. Additionally, because these are single-expression bodies, Scala can infer the return type of each method. I can still write it if I so desire (which I did for toString), but it's optional.

As far as this class' usage, it can be used as you would probably expect. The example above using Complex was actually the main() I used to test this class. Cool stuff.

It’s Christmas time! (Almost)

It's almost Christmas! As an undergraduate college student, that is a very loaded statement. Not only does it mean a 1-month vacation from college, getting to see friends from High School, and generic festive food/eggnog, but it means finals are coming soon, as well. Due to the impending doom that is finals (and the month-long lapse of nothing afterward), I've decided to play with a few recent programming languages, because as much as I love studying, there are times when I'd rather do something else. The primary language that I have chosen to study this holiday season is Scala, with either some Ruby (no, I am not Twitter) or Clojure mixed in. So, the question arises: why Scala?

First, let's look at my criteria for a language:
  • The language needed to have functional aspects. The first functional programming language I had ever worked with was Haskell, and I loved it (Except the finicky IO parts). It was a welcome challenge to think in terms of recursion and immutable state, and the incredible amount that one could do with a single line of code astounded me.
  • The language either needed to be dynamically typed, or have a very out-of-the-way static typing system. Because the following feels cluttered/gets old very quickly:
SomeClass<double> instance = new SomeClass<double>(); // Java
for(std::vector<int>::const_iterator i = begin(vec); i != end(vec); ++i) // C++
// (remedied by auto now, or just a ranged for loop, but still a good example.)
struct timeval* tv = (struct timeval*)malloc(sizeof(struct timeval)); // C

As much as I love repeating myself three times when making one object, I'd rather work in a language where I don't need to do that.

  • The language had to not have excessive boilerplate code. This kind of goes hand-in-hand with the above, but seriously. 
BufferedReader myReader = new BufferedReader(new FileReader("config.txt"));
ArrayList<String> fileLines = new ArrayList<String>();
String prevLine = myReader.readLine();
while (prevLine != null)
{
    fileLines.add(prevLine);
    prevLine = myReader.readLine();
}

vs

File.open('config.txt').readlines # Ruby

This is a language that I have to want to use in my free time. And somehow, I feel as though I'd rather write the latter than the former nine times out of ten.*

  • The language has to be fast enough. I'm not asking for the speed of C, but at the same time, I'm not going to want to wait 10 minutes for the program to give me some kind of response.
  • The language has to support concurrency. I love multithreading my code; watching my CPU hit 100% on all cores for 10 seconds makes me feel warm and fuzzy inside. Watching a single core on my CPU hit 100% for 70 seconds makes me die a little inside. I understand that some things can't be parallelized, but a fair number of problems can. 
  • The language will be used for small-medium sized (500 - 10,000 lines) projects. (Yes, I dislike throwing a number of lines out there to represent complexity, but it's the best thing I can think of for the moment.)
* Please note, this is not to say that I refuse to write Java/C++/C, or that I'm incredibly biased against them. In fact, C++ remains my favorite programming language for lowish-level things. I just wanted this language to be a high-level language with a fair amount of syntactic sugar so I could get stuff done fast. :)

Armed with these aspirations, I decided to check out some languages out there. The first language I looked into was Clojure, a LISP dialect on the JVM. It looked really interesting; I've always wanted to learn a dialect of LISP, and the incredibly powerful macro system that LISP has really made it a strong contender. At first, I was thinking that Clojure was definitely going to be my primary interest this break, but then I saw a Scala code sample on a website. It looked clean, read well, and ran relatively fast. So, I decided to delve more into the language and see what it was like, and after looking at a few code samples and what I could do with the language, I was sold. It (apparently) combines the speed of Java with an amazing static type system, and is all about concurrency. (Also, that Twitter uses the language for most/all of their backend only helped fuel my interest)

So, that's what I'll be doing in my spare time for a while. Thanks to the wonderful things that are libraries, I was able to acquire Programming Scala, 2nd ed yesterday. Said book has gotten really good reviews, so I'm looking forward to reading it. So far, I've made it to chapter 3/35, so I'll reserve my thoughts on the language as a whole until I get further. But I plan to spend most of today/tomorrow studying for finals, so I may get pretty far. Stay tuned!
Posted in Uncategorized

It’s Christmas time! (Almost)

It's almost Christmas! As an undergraduate college student, that is a very loaded statement. Not only does it mean a 1 month vacation from college, getting to see friends from High School, and generic festive food/eggnog, but it means finals are coming soon, as well. Due to the impending doom that is finals (and the month-long lapse of nothing afterward), I've decided to play with a few recent programming languages. Because as much as I love studying, I find times where I'd rather do something else. The primary language that I have chosen to study this holiday season is Scala, with either some Ruby (no, I am not Twitter) or Clojure mixed in. So, the question arises: why Scala?

First, let's look at my criteria for a language:
  • The language needed to have functional aspects. The first functional programming language I had ever worked with was Haskell, and I loved it (Except the finicky IO parts). It was a welcome challenge to think in terms of recursion and immutable state, and the incredible amount that one could do with a single line of code astounded me.
  • The language either needed to be dynamically typed, or have a very out-of-the-way static typing system. Because the following feels cluttered/gets old very quickly:
SomeClass<double> instance = new SomeClass<double>(); // Java
for(std::vector<int>::const_iterator i = vec.begin(); i != vec.end(); ++i) // C++
// (remedied now by auto or a range-based for loop, but still a good example.)
struct timeval* tv = (struct timeval*)malloc(sizeof(struct timeval)); // C

As much as I love repeating myself three times when making one object, I'd rather work in a language where I don't need to do that.

  • The language couldn't have excessive boilerplate code. This goes hand-in-hand with the above, but seriously: 
BufferedReader myReader = new BufferedReader(new FileReader("config.txt"));
ArrayList<String> fileLines = new ArrayList<String>();
String line = myReader.readLine();
while (line != null)
{
    fileLines.add(line);
    line = myReader.readLine();
}

vs

File.open('config.txt').readlines # Ruby

This is a language that I have to want to use in my free time. And somehow, I feel as though I'd rather write the latter than the former nine times out of ten.*

  • The language has to be fast enough. I'm not asking for the speed of C, but at the same time, I'm not going to want to wait for 10 minutes for the program to give me some kind of response
  • The language has to support concurrency. I love multithreading my code; watching my CPU hit 100% on all cores for 10 seconds makes me feel warm and fuzzy inside. Watching a single core on my CPU hit 100% for 70 seconds makes me die a little inside. I understand that some things can't be parallelized, but a fair number of problems can. 
  • The language will be used for small-to-medium sized (500 - 10,000 line) projects. (Yes, I dislike throwing a number of lines out there to represent complexity, but it's the best thing I can think of for the moment.)
* Please note, this is not to say that I refuse to write Java/C++/C, or that I'm incredibly biased against them. In fact, C++ remains my favorite programming language for lowish-level things. I just wanted this language to be a high-level language with a fair amount of syntactic sugar so I could get stuff done fast. :)
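As an aside on the iterator-clutter complaint above: C++11's auto and range-based for loops already trim most of it. Here's a minimal sketch (sumAll is just an illustrative name of mine):

```cpp
#include <vector>

// Sums a vector without ever spelling out std::vector<int>::const_iterator;
// the range-based for loop deduces the element type for us.
int sumAll(const std::vector<int>& vec)
{
    int sum = 0;
    for (auto v : vec)
        sum += v;
    return sum;
}
```

It's still not a Ruby one-liner, but it shows the repetition is a fixable wart rather than something inherent to static typing.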

Armed with these aspirations, I decided to check out some languages out there. The first language I looked into was Clojure, a LISP dialect on the JVM. It looked really interesting; I've always wanted to learn a dialect of LISP, and the incredibly powerful macro system that LISP has really made it a strong contender. At first, I was thinking that Clojure was definitely going to be my primary interest this break, but then I saw a Scala code sample on a website. It looked clean, read well, and ran relatively fast. So, I decided to delve more into the language and see what it was like, and after looking at a few code samples and what I could do with the language, I was sold. It (apparently) combines the speed of Java with an amazing static type system, and is all about concurrency. (Also, the fact that Twitter uses the language for most/all of their backend only helped fuel my interest.)

So, that's what I'll be doing in my spare time for a while. Thanks to the wonderful things that are libraries, I was able to acquire Programming Scala, 2nd ed yesterday. Said book has gotten really good reviews, so I'm looking forward to reading it. So far, I've made it to chapter 3/35, so I'll reserve my thoughts on the language as a whole until I get further. But I plan to spend most of today/tomorrow studying for finals, so I may get pretty far. Stay tuned!
Posted in Uncategorized

Programming on an iPad

(No, that's not a typo in the title. This post is going to legitimately be about programming on an iPad. In fact, I'm writing this article on it at the moment.)

Over the past year, I've been moving more and more to working "in the cloud". It's not been a conscious transition; simply one of convenience. I've just about always had a desktop and one (or more) laptops to keep files synchronized between. Dropbox is an absolutely amazing tool for this, but it's not perfect: it takes a while to sync at times, which is bad if you're developing on one box and trying to compile on another. There's also the issue of IDEs; Visual Studio is nice for Windows C++/C# dev, but isn't available on OS X or Linux; Code::Blocks never worked well for me under Windows, and I was left somewhat underwhelmed on Linux. So, I ultimately ended up trying to learn Vim, and liking it. (Disclaimer: I have yet to put any real effort into EMACS. Vim works well for what I need it to do. If it stops working well, I'll see how EMACS works.)

So, I was using Dropbox to synchronize files and Vim to write most of my code. This seemed a bit nonsensical after a while, because I could just use Vim over SSH (coupled with tmux) to develop. This way, I never had to worry about file synchronization problems; I had a solid, consistent set of tools to code with; and I had my state persist regardless of where I coded from. I could swap from one laptop to my desktop to another laptop without skipping a beat. Not to mention the battery life/heat/time saved by using my desktop for compiling as opposed to a laptop.

With software development removed from the equation, I started using my laptops as SSH clients. Of course, there was some Facebooking/video watching included in that, but so long as I have a mobile device that can use SSH, I can use it for school work. Thus, ultrabooks became really appealing to me; a light, durable computer with excellent battery life (and a processor that can handle light compilation when needed). Then I found Mark O'Connor's blog at yieldthought.com; he had posted a few times on his blog about his experiences programming on an iPad 2.

At first, I was bewildered. I thought "how could someone deal with programming on an iPad, of all devices?" However, it piqued my interest, so I read on. He was using it as a thin client to ssh into a Linode and work on code; he spoke highly of the battery life and flexibility of the iPad. He told of working in fields, working on rooftop terraces, working in a wide variety of places. While I did not have such aspirations, the idea still interested me.

I was able to get my hands on an iPad (with a retina display) to give the idea a shot; worst case, if I hated it, it would make an excellent Christmas present or I could see if I could return/sell it, right? I got the Apple Wireless Keyboard and a Smart Cover alongside it, because the idea of using a software keyboard (and lots of tiny scratches on the screen) annoyed me to no end. I loaded Dropbox, iSSH ($10; SSH app), and Jump ($15; VNC app) on to the iPad and was on my way.

My first impressions of the iPad were a bit rough. I was used to a 15" screen to develop on, so shrinking down to 10" was a bit of an issue. At the same time, the iPad was noticeably lighter than anything else I had used. The lesser weight of the iPad was outweighed (no pun intended) by the smaller screen, but why not tough it out and see how things turned out, right?

After a few days, I decided to swap back to my ultrabook to see if I noticed a difference, and I did. I missed the iPad. Let me say, the ultrabook is no slouch, but it doesn't have true all-day battery life. Generally, it gets 5 hours of battery, so I had to lug the charger around with me. And the screen, while 13", wasn't as easy to read as the iPad's. The weight of the laptop itself wasn't noticeably different, but having to be conscious of the battery was annoying. Also, I missed having everything automatically full-screened... if only because having six windows open simultaneously is more distracting than just having one full-screen window.

Since then, I've continued to use the iPad for software development as a sort-of laptop. The primary reasons that I am not using it day-to-day are few in number, but are rather important:
- I am a college student, which means I need to do online submissions. Thanks to no filesystem access, this is difficult without a third party web app, which I am not incredibly fond of using.
- I am in the Virginia Tech Cyber Security Club. As such, I am frequently in the company of people who enjoy sniffing Bluetooth traffic. As a user of a Bluetooth keyboard, this concerns me.
- Mobile Safari is, as Mark puts it, still a "toy browser"; the Blogger web app was moderately buggy when writing this. In fact, I've had to swap to Chrome on my desktop to finish this up (because shelling out $3 for a decent blogging app doesn't seem worth it)

The iPad, and slate tablets in general, are excellent tools for anyone who does most of their work in the cloud. They get amazing battery life, they're not as pricey as ultrabooks, and they are incredibly thin and light. At the time of writing, iOS is still unable to handle all of my daily needs (granted, this can be remedied by VNCing into a server/desktop), but it is a lot closer than I thought it would be. I would most definitely suggest it (and a read of Mark O'Connor's blog) to anyone who does most of their work in the cloud.

As always, thank you for reading; if you have thoughts, comments, questions, etc. Feel free to post below!
Posted in Uncategorized

C++ "Tagging" + unique_ptr

As promised in my prior blog post, here's a quick blog entry that shows how to "tag" methods in C++, and why that may be useful.

In order to understand this, one must first know that C++ supports overloaded methods. This means that there can exist multiple methods in the same namespace with the same name, so long as their arguments differ. (In classes, there can also be methods overloaded on constness, but that's beyond the scope of this post.) For example, the following declarations make two foo() methods. One takes a char, the other takes a string:
void foo(char);
void foo(std::string);

One other thing to know is that arguments can be explicitly ignored (left unnamed) in a method definition. For example, take the following implementations of foo:

void foo(char c)
{
    std::cout << c;
}

void foo(std::string)
{
    std::cout << "No!" << std::endl;
}


Both of these implementations are valid, and will compile (barring silly typos on my part). The difference is that foo(std::string) does not name its argument, which says the argument is unneeded. The main practical effect is silencing unused-parameter warnings; it may also open the door to some compile-time/link-time optimization around the unused string, but I'm not 100% sure on that, so take it with a grain of salt for the moment. ;)

So what is tagging, then? Tagging is simply making dummy structs/classes and adding them to argument lists, in order to allow the compiler to take different actions based on what tags are passed in. For example, take the following code:

struct tag1 {};
struct tag2 {};

void println(tag1)
{
    std::cout << "Hello, ";
}

void println(tag2)
{
    std::cout << "world!" << std::endl;
}
// ...
println(tag1());
println(tag2());
// prints "Hello, world!"

How is this useful, though? Why not just make printHello() and printWorld()? Read on and see. :)

C++11 introduced a sane version of std::auto_ptr: std::unique_ptr. This is much like QScopedPointer for ye Qt fans; it will automatically delete its contents when the pointer goes out of scope. Additionally, for efficiency reasons, the deleter to be used when the pointer goes out of scope is specified in unique_ptr's template arguments. The catch is that this template argument is a deleter type, not a function pointer value. So, one can't simply write

std::unique_ptr<int, ::free> ptr(malloc(sizeof(int))); // does not compile!

in order to use malloc/free for a variable. Instead, one must write:

template <typename T>
struct CFree
{
    void operator()(T* p) 
    {
        free(p);
    }
};
// ...
std::unique_ptr<int, CFree<int> > ptr(static_cast<int*>(malloc(sizeof(int))));


This gets to be not-so-fun after a short while. A better way is needed to create things that can use custom deleters. Enter: tagging.

So how does tagging help with something like this? Let's build a quick std::unique_ptr factory with a generic deleter class, and we'll see!

struct cfree {};

template <typename T>
void doDelete(cfree, T* mem)
{
    free(mem);
}


template <typename T, class Tag>
struct OurDeleter
{
    void operator()(T* mem)
    {
        doDelete(Tag(), mem);
    }
};

template <typename T, class Tag>
struct UniquePtrFactory
{
    typedef std::unique_ptr<T, OurDeleter<T, Tag> > type;
};


// ...

UniquePtrFactory<int, cfree>::type ptr(static_cast<int*>(malloc(sizeof(int))));

Ok, now I've thrown a lot of code at you, and it's ended up being a lot more code (and a lot more complicated) than the CFree method. What does this buy us? Let's pick the code apart.

First, we'll start off with the last line, where we make our pointer.

UniquePtrFactory<int, cfree>::type

is actually just

std::unique_ptr<T, OurDeleter<T, Tag> >

which, if we do manual template argument substitution, becomes

std::unique_ptr<int, OurDeleter<int, cfree> >

Ok, cool. So the UniquePtrFactory<int, cfree>::type is just a unique_ptr with a deleter OurDeleter<int, cfree>. What does that bit do? Substituting in template args, we get the following for the new OurDeleter struct:


struct OurDeleter
{
    void operator()(int* mem)
    {
        doDelete(cfree(), mem);
    }
};

That's not too bad. The only confusing part here is cfree(), which just constructs the cfree struct in place and passes it to doDelete, which doesn't even use the object. Since cfree is an empty struct whose value is never used, the compiler should be able to optimize its construction away entirely.

So, this is a glorified version of the CFree class above. In fact, it gives us the same result as the CFree class (with maybe one extra level of indirection through a function call, depending on whether the definition of doDelete(cfree, T) is visible at the time). What do tagging and all this machinery even do for us?

Reusability. What happens when one wants a custom deleter that calls delete[]? With the CFree class approach, one would have to write another whole class. With the slightly-more-complicated example above, it's easier and (in my opinion) clearer/more straightforward:

struct arrdelete {};

template <typename T>
void doDelete(arrdelete, T* mem)
{
    delete[] mem;
}

// ...
UniquePtrFactory<int, arrdelete>::type arrai(new int[10]);

And that's how to use tagging to make things like std::unique_ptr easier to use. Another cool application for this is as a general RAII container; if you make the deleter unlock a std::mutex instead, what's to keep you from using it as a guard that automatically unlocks mutexes for you? Because this post has been a bit long already, I'll leave something like that as an exercise to the reader.
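For the curious, here's one possible sketch of that mutex exercise. The munlock tag and criticalSection are names I've made up, and OurDeleter/UniquePtrFactory are restated from above so the snippet stands alone:

```cpp
#include <memory>
#include <mutex>

// Hypothetical tag: "deleting" the pointer means unlocking the mutex it points to.
struct munlock {};

template <typename T>
void doDelete(munlock, T* m)
{
    m->unlock();  // release the lock instead of freeing memory
}

// Same machinery as in the post, repeated here for completeness.
template <typename T, class Tag>
struct OurDeleter
{
    void operator()(T* mem) { doDelete(Tag(), mem); }
};

template <typename T, class Tag>
struct UniquePtrFactory
{
    typedef std::unique_ptr<T, OurDeleter<T, Tag> > type;
};

void criticalSection(std::mutex& m)
{
    m.lock();
    // guard now "owns" the locked mutex...
    UniquePtrFactory<std::mutex, munlock>::type guard(&m);
    // ... do work under the lock ...
}   // ...and unlocks it here, even if the work above throws
```

Of course, the standard library's std::lock_guard does exactly this job with less ceremony; the point is just that the tag scheme generalizes beyond memory deallocation.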

Questions/thoughts? Feel free to post below! Thanks for reading.
Posted in Uncategorized
