This is the first post on data science for this blog. See this post for an idea of what this blog is about.
Programming is immensely important for data analysis. I didn’t start programming until I was in college, but it is now something I do almost every day. There are some things Microsoft Excel is good for, but these days you often need more power.
Technical computing requires some special tools. The following properties are required of a language:
- The ability for interactive sessions (can help speed development)
- Linear algebra operations (matrix, vector operations)
- Quick and powerful graphing
- A large library of built-in functions (statistics, machine learning, etc.)
In this post I will summarize the pros and cons of MATLAB and python for scientific computing. Spoiler alert (or TL;DR for the millennials out there): I see it as unavoidable to have to use both in my work, but I prefer python if I have the choice.
I really cut my teeth in scientific computing using MATLAB. If you really master it, you can do some pretty amazing things. I would say 90% of my research since I got my PhD uses MATLAB.
MATLAB has some definite benefits:
- Very polished IDE. You have a nice interactive workspace and variable viewer in the main window. The editor has really become more modern in past years with nice features like auto-completion and variable highlighting.
- Big set of libraries. There seems to be an ever expanding universe of toolboxes that do low-level tasks for you so you can get to the real science.
- Powerful Graphing. There is a steep learning curve to learn to graph by code, but it is pretty powerful and easy once you get the hang of it.
- Pretty good support at universities. Virginia Tech has a pretty comprehensive license that covers most toolboxes. As an example of university-level support, the Advanced Research Computing center at VT has a parallel MATLAB cluster that’s available to students and faculty.
- Interaction with 3rd party software. I use some software that plugs into MATLAB/Simulink (CarSim, PreScan).
- Simulink. I don’t use it much, but there are some capabilities you do not have with other programming languages.
But there are also drawbacks:
- Closed source. It is not always obvious how MathWorks computes things.
- Expensive. I am lucky that VT is a big university with a big budget. I know other universities and companies have to shell out big bucks to get MATLAB.
- MATLAB is lacking as a modern programming language. MathWorks claims MATLAB is a comprehensive language, but it lacks some features of a modern programming language. I won’t go into much detail here (see footnote 1), but if you go far enough with MATLAB, eventually you will find yourself frustrated with some shortcomings that are not present in other languages (like python).
In comes python, something I have been trying out for the last few months and am slowly transitioning to more and more. Using the NumPy, SciPy, and matplotlib packages, you can do nearly everything in python that you can do in MATLAB.
Python has its strengths:
- Python is free. As in speech and beer, as they like to say.
- Open source. You can inspect most everything you use (you may need to be a python wiz to understand it all though).
- Easy to read. If soccer is the beautiful game, python is the beautiful language. A main design goal of python is readability.
- A powerful, full-featured language. The python standard library will handle almost everything you will encounter: XML parsing, web browsing, file systems, hashing. You can do things in python that take some kludgy Java backdoors in MATLAB.
- Classes and functions can be defined anywhere. Not being able to do this in MATLAB really drives me mad (see footnote 2). It makes good object-oriented (and functional) programming very easy in python.
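As a small illustration of the last two points, here is a minimal sketch of a single script that defines a function and a class right next to the code that uses them, and then leans on the standard library for XML parsing and hashing. The names `rms`, `Signal`, and the tiny XML snippet are made up for this example; they are not from any real project.

```python
import hashlib
import xml.etree.ElementTree as ET


# A function defined at the top of an ordinary script file.
def rms(values):
    """Root-mean-square of a sequence of numbers."""
    return (sum(v * v for v in values) / len(values)) ** 0.5


# A class defined in the same file; no separate file per class is needed.
class Signal:
    def __init__(self, name, samples):
        self.name = name
        self.samples = list(samples)

    def rms(self):
        # Reuse the module-level function defined above.
        return rms(self.samples)


# Script-level code can follow immediately after the definitions.
sig = Signal("speed", [3.0, 4.0])
sig_rms = sig.rms()

# Standard library only: parse a (hypothetical) XML snippet and hash a string.
root = ET.fromstring("<run><channel name='speed'/></run>")
channel_name = root.find("channel").get("name")
digest = hashlib.sha256(channel_name.encode("utf-8")).hexdigest()

print(sig_rms, channel_name, digest[:8])
```

Everything lives in one file and can be run or stepped through in a debugger as a plain script.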
But there are some drawbacks to Python:
- Python is not as neatly packaged as MATLAB. The language installs fine on all operating systems, but then you have to pick an IDE and make sure you have all the packages you need installed (which can be an unruly process).
- Engineers might not be as versed in python as MATLAB. A lot of engineering programs use MATLAB in their coursework, so python could be another skill that needs to be learned.
These drawbacks might seem scary (as they did to me), but can easily be overcome.
There are some portable scientific python installations that make using python more like the all-in-one solution that MATLAB delivers. Portable means that you download a zip, then extract the files to a folder that lives (anywhere) on your computer. This folder contains your IDE and all the required packages for scientific computing without installing anything. This is especially useful for maintaining different versions of python (see footnote 3). My favorite right now is WinPython. It uses the Spyder IDE, which will look sort of familiar to MATLAB users. I have also tried and can recommend Pyzo.
Scientific computing in python looks very similar to MATLAB. My experience is that the MATLAB they teach in school is pretty basic anyway. Most people that suddenly find themselves doing a lot of programming (like I did) will need to learn a lot. Whether they learn MATLAB or python makes little difference. Someone who is already a MATLAB wiz will easily be able to pick up scientific computing in python.
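To give a feel for how close the two are, here is a short sketch of a few common operations in NumPy, with the rough MATLAB equivalent in the comment on each line. The matrix and vector values are arbitrary, and NumPy is assumed to be installed (it ships with the scientific distributions mentioned above).

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # MATLAB: A = [2 1; 1 3];
b = np.array([3.0, 5.0])                 # MATLAB: b = [3; 5];

x = np.linalg.solve(A, b)                # MATLAB: x = A \ b;
At = A.T                                 # MATLAB: At = A';
C = A @ A                                # MATLAB: C = A * A;   (matrix product)
E = A * A                                # MATLAB: E = A .* A;  (element-wise)
t = np.linspace(0, 1, 5)                 # MATLAB: t = linspace(0, 1, 5);

# Indexing is zero-based and uses square brackets.
first = A[0, 0]                          # MATLAB: first = A(1, 1);

print(x, C, E, t, first, sep="\n")
```

The two differences that tend to trip up MATLAB users are visible here: `*` is element-wise in NumPy (the matrix product is `@`), and indexing starts at 0 rather than 1.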
So which one?
I will take the easy road and say: both.
Personally, I do not see how I can completely ditch MATLAB because of my dependency on 3rd party software I use regularly in my work. But, I prefer python. It’s easier to read, it’s easier to write, and it gives me the most power. Being free might also be a big factor for some.
My advice to someone starting out from scratch would be MATLAB. It works 100% out of the box with minimal setup and has a large community for help. I did many a Google search of “how do I do X in matlab” when I was learning. Skill in MATLAB will easily transfer to python. If you go far enough with MATLAB, like I have, you will eventually start to get annoyed with certain limitations. I suspect that the average user might not see any difference between MATLAB and python.
If you have been meaning to ditch Excel and do more work in a programming language, try out either MATLAB or python.
- The folks at Pyzo have a very nice python vs matlab discussion.
- NumPy, the main numerical python library, has a great guide, NumPy for MATLAB users, that gives recipes for doing things in both languages. I used this a lot while I was transitioning.
- I came across a very witty and biting blog, Abandon MATLAB, a little while ago. At first, as an ardent MATLAB user, I was almost offended. About the same time I started to hit the MATLAB ceiling and started to see the error of my ways.
1 My main gripes with MATLAB: only has a global namespace, is not fully introspective, has poor string handling/regexp functions, it’s a pain to define classes (and to a certain extent functions), and the GUI library is limited.
2 In MATLAB, functions cannot be defined in script files. You can define multiple functions only in a file that starts with a function. I have recently resorted to making my main script into a function, but that makes debugging a real pain. Not a problem in python 🙂
3 Python is in a sort of transition right now. The “future” is Python version 3 (currently 3.3.3). There are still some libraries that only work on the older version (Python 2.7), but this is becoming rare. All of the scientific python packages work in Python 3.X. If you have to choose, the conventional wisdom is to use the newest version possible. It might not be uncommon to have to maintain both versions on your machine to run older libraries.