*This is the first post on data science for this blog. See this post for an idea of what this blog is about.*

Programming is immensely important for data analysis. I didn’t start programming until I was in college, but it is now something I do almost every day. There are some things Microsoft Excel is good for, but these days you often need more power.

Technical computing requires some special tools. The following properties are required of a language:

- The ability for interactive sessions (can help speed development)
- Linear algebra operations (matrix, vector operations)
- Quick and powerful graphing
- A large library of built-in functions (statistics, machine learning, etc.)

In this post I will summarize the pros and cons of MATLAB and python for scientific computing. Spoiler alert (or TLDR for the millennials out there): I see it as unavoidable to use both in my work, but I prefer python if I have the choice.

# MATLAB

I really cut my teeth in scientific computing using MATLAB. If you really master it, you can do some pretty amazing things. I would say 90% of my research since I got my PhD uses MATLAB.

MATLAB has some definite benefits:

- **Very polished IDE**. You have a nice interactive workspace and variable viewer in the main window. The editor has really become more modern in past years with nice features like auto-completion and variable highlighting.
- **Big set of libraries**. There seems to be an ever expanding universe of toolboxes that do low-level tasks for you so you can get to the real science.
- **Powerful Graphing**. There is a steep learning curve to learn to graph by code, but it is pretty powerful and easy once you get the hang of it.
- **Pretty good support at Universities**. Virginia Tech has a pretty comprehensive license that covers most toolboxes. The Advanced Research Computing center at VT has a parallel MATLAB cluster that’s available to students and faculty, as an example of university level support.
- **Interaction with 3rd party software**. I use some software that plugs into MATLAB/Simulink (CarSim, PreScan).
- **Simulink**. I don’t use it much, but there are some capabilities you do not have with other programming languages.

But there are also drawbacks:

- **Closed source**. It is not always obvious how MathWorks computes things.
- **Expensive**. I am lucky that VT is a big university with a big budget. I know other universities and companies have to shell out big bucks to get MATLAB.
- **MATLAB is lacking as a modern programming language**. MathWorks claims MATLAB is a comprehensive language, but it lacks some features of a modern programming language. I won’t go into much detail here^{1}, but if you go far enough with MATLAB, eventually you will find yourself frustrated with some shortcomings that are not present in other languages (like python).

# Python

In comes python, something I have been trying out for the last few months and am slowly transitioning to more and more. Using the NumPy, SciPy, and matplotlib packages, you can do in python nearly everything you can do in MATLAB.
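As a rough illustration of two items from the checklist at the top of the post (linear algebra and built-in statistics), here is a minimal sketch using NumPy. The matrix and data values are made up for the example, and the MATLAB equivalents in the comments are only there for orientation.

```python
import numpy as np

# Solve the linear system A x = b (MATLAB: x = A \ b)
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)

# Basic statistics on a made-up data vector
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
mean = data.mean()
# NumPy normalizes by N by default; MATLAB's std normalizes by N-1.
# Pass ddof=1 to data.std() to match MATLAB's default.
std = data.std()

print(x, mean, std)
```

Typed line by line into an interactive session (IPython or the Spyder console), this feels much like working at the MATLAB prompt.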

Python has its strengths:

- **Python is free**. As in speech and beer, as they like to say.
- **Open source**. You can inspect most everything you use (you may need to be a python wiz to understand it all though).
- **Easy to read**. If soccer is the beautiful game, python is the beautiful language. A main design goal of python is readability.
- **A powerful, full-featured language**. The python standard library will handle almost everything you will encounter: XML parsing, web browsing, file systems, hashing. You can do things in python that take some kludgy Java backdoors in MATLAB.
- **Classes and functions can be defined anywhere**. This really drives me mad in MATLAB^{2}. It makes good object oriented (and functional) programming very easy in python.
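To make the last two points concrete, here is a small sketch that uses only the standard library (XML parsing and hashing) and defines a function and a class directly in the script. The XML snippet and the names `Run` and `sha256_hex` are invented for this illustration.

```python
import hashlib
import xml.etree.ElementTree as ET


# A function defined right in the script; no separate file is needed
def sha256_hex(text):
    """Return the SHA-256 digest of a string as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# A class defined in the same script (hypothetical example type)
class Run:
    def __init__(self, name, speed):
        self.name = name
        self.speed = float(speed)


# Made-up XML, parsed with the standard library
xml_text = "<runs><run name='a' speed='12.5'/><run name='b' speed='30'/></runs>"
root = ET.fromstring(xml_text)
runs = [Run(r.get("name"), r.get("speed")) for r in root.findall("run")]

fastest = max(runs, key=lambda r: r.speed)
print(fastest.name, sha256_hex(xml_text)[:8])
```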

But there are some drawbacks to Python:

- **Python is not as neatly packaged as MATLAB**. The language installs fine on all operating systems, but then you have to pick an IDE and make sure you have all the packages you need installed (which can be an unruly process).
- **Engineers might not be as versed in python as MATLAB**. A lot of engineering programs use MATLAB in their coursework, so python could be another skill that needs to be learned.

These drawbacks might seem scary (as they did to me), but can easily be overcome.

There are some portable scientific python installations that make using python more like the all-in-one solution that MATLAB delivers. Portable means that you download a zip, then extract the files to a folder that lives (anywhere) on your computer. This folder contains your IDE and all the required packages for scientific computing without installing anything. This is especially useful for maintaining different versions of python ^{3}. My favorite right now is WinPython. It uses the Spyder IDE, which will look sort of familiar to MATLAB users. I have also tried and can recommend Pyzo.
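Whichever installation you pick, it can help to confirm which interpreter you are running and whether the scientific packages are visible to it. The following is a minimal sketch using only the standard library. The package list is an assumption based on the packages named in this post, so adjust it for your own work.

```python
import importlib.util
import sys

# Packages this post relies on (an assumption; edit for your own setup)
REQUIRED = ["numpy", "scipy", "matplotlib"]


def missing_packages(names):
    """Return the top-level package names the current interpreter cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]


print("Python", ".".join(str(v) for v in sys.version_info[:3]))
print("Missing:", missing_packages(REQUIRED) or "none")
```

Running this from each installation's own console shows quickly which one is missing a package.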

Scientific computing in python looks very similar to MATLAB. My experience is that the MATLAB they teach in school is pretty basic anyway. Most people that suddenly find themselves doing a lot of programming (like I did) will need to learn a lot. Whether they learn MATLAB or python makes little difference. Someone who is already a MATLAB wiz will easily be able to pick up scientific computing in python.
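To give a feel for how close the two are, here is a short sketch of a few NumPy idioms with the MATLAB equivalent in each comment. The values are arbitrary. One assumption to flag: the `@` operator for matrix multiplication needs Python 3.5 or later, and on older versions `A.dot(b)` does the same job.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])        # MATLAB: A = [1 2; 3 4]
b = np.array([1, 1])          # MATLAB: b = [1; 1]

first = A[0, 0]               # MATLAB: A(1,1)  (python indexing starts at 0)
col = A[:, 1]                 # MATLAB: A(:,2)
elementwise = A * A           # MATLAB: A .* A  (* is elementwise on arrays)
product = A @ b               # MATLAB: A * b   (or A.dot(b) before Python 3.5)
t = np.linspace(0, 1, 5)      # MATLAB: linspace(0, 1, 5)

print(first, col, product, t)
```

The two habits that take longest to unlearn are usually the zero-based indexing and the fact that `*` is elementwise on NumPy arrays.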

# So which one?

I will take the easy road and say: **both**.

Personally, I do not see how I can completely ditch MATLAB because of my dependency on 3rd party software I use regularly in my work. But, I prefer python. It’s easier to read, it’s easier to write, and it gives me the most power. Being free might also be a big factor for some.

My advice to someone starting out from scratch would be MATLAB. It works 100% out of the box with minimal setup and has a large community for help. I did many a google search of “how do I do X in matlab” when I was learning. Skill in MATLAB will easily transfer to python. If you go far enough with MATLAB, like I have, you will eventually start to get annoyed with certain limitations. I suspect that the average user might not see any difference between MATLAB and python.

If you have been meaning to ditch Excel and do more work in a programming language, try out either MATLAB or python.

# Further Reading

- The folks at Pyzo have a very nice python vs matlab discussion.
- NumPy, the main numerical python library, has a great NumPy for MATLAB users that gives recipes for doing things in both languages. I used this a lot while I was transitioning.
- I came across a very witty and biting blog, Abandon MATLAB, a little while ago. At first, as an ardent MATLAB user, I was almost offended. About the same time I started to hit the MATLAB ceiling and started to see the error of my ways.

# Footnotes

^{1} My main gripes with MATLAB: only has a global namespace, is not fully introspective, has poor string handling/regexp functions, it’s a pain to define classes (and to a certain extent functions), and the GUI library is limited.

^{2} In MATLAB, functions cannot be defined in script files. You can define multiple functions in a file that starts with a function. I have recently resorted to making my main script into a function, but that makes debugging a real pain. Not a problem in python 🙂

^{3} Python is in a sort of transition right now. The “future” is Python version 3 (currently 3.3.3). There are still some libraries that only work on the older version (Python 2.7), but this is becoming rare. All of the scientific python packages work in Python 3.X. If you have to choose, the conventional wisdom is to use the newest version possible. It might not be uncommon to have to maintain both versions on your machine to run older libraries.

**johnny OntheSpot**

Nice comparison!

Also, it’s “free: as in freedom, not as in beer”.

**Luis Goncalves**

I was also “raised on Matlab”, so to speak, having used it extensively (daily!) for more than a decade.

I agree that Python is much more of a complete language than Matlab is … but there are a lot of little things that are great in Matlab and a pain in Python, and they add up (especially if your focus is scientific computation):

1 – the syntax for mathematics (matrix computation) in Matlab is much cleaner than in Python. I think this is mainly because numpy is an add-on to python (I think of it as an afterthought :), whereas Matlab was designed primarily for matrix computation. For example, if A is a matrix and b a vector of the right size, matrix-vector multiplication is just A*b in Matlab, but has to be written as A.dot(b), or np.matmul(A, b) in Python. Numpy even has two types of matrices/arrays, with different properties and not interchangeable. You end up using np.array because it is more versatile than np.matrix, even though the syntax is clumsier.

2 – plotting in Matlab is much more powerful. Again, this is because plotting is an add-on to Python, whereas it is a core functionality for Matlab. You can easily make 3D plots that you can rotate and spin (continuous rotation) in Matlab. Even just the plot manipulation functions are more useful, such as clicking on a point to zoom in by a factor of 2. This is super useful to me, because I often need to compare “before and after” images. The best way to do that, in Matlab, is to have two figures right on top of each other, turn “zoom on” on both, place the mouse exactly where I want to zoom in, and then 1) click on the first figure, 2) ctrl-tab to switch to the other figure, 3) click on the new figure at the same spot. The mouse hasn’t moved, and I’ve zoomed in both plots exactly the same way, so now ctrl-alt lets me visually alternate between the two figures and the before-and-after is very discernible. Basically, matplotlib is clunky and not as feature rich.

3 – Matlab, for numerical computation, is almost always faster. Some 3rd party modules (like scipy.ndimage.zoom) are incredibly slow. The reason is that all the official packages for Matlab have been optimized for speed, whereas with a 3rd party module in Python there is not as much of a guarantee for that. (Also, Matlab’s vectorized computation is super optimized for speed, so it’s easy to write super fast code.) One amazing example (if you happen to care about dimensionality reduction and data visualization) is the code for t-SNE available online by the author. The Python code is 100 times slower.

4 – Debugging in Matlab is super easy and powerful. In Matlab, a “keyboard” **anywhere** in your code stops the computation and gives you the prompt, as if you had just typed in all the code up to that point. That means you can look at all the variables, change them, plot whatever you want, move up and down the call stack to see other variables, and then single step through the code, or just “return” to run the rest of the code. As far as I know, there’s still nothing like that in Python. There’s more than one “debugger” module, but none of them as powerful or easy to use. Even in iPython (the friendliest of the IDEs) you can’t get back to the “normal” iPython prompt in debug mode.

Because all of the open-source Deep Learning frameworks are in Python, I’ve been using Python exclusively (no Matlab!) for over a year now (to the point that I’m even forgetting some of the Matlab syntax — scary!) — but all the things above are still problems, and frustrate me.

My view of Python is that it was created by a non-mathematician/scientist/engineer (ie, a “pure” CS guy), who realized that a high level, interpreted, prompt-based language can be to C/C++ what C/C++ is to assembly code: it gives you power to do some things (prototyping, say) much more quickly and easily (to see this expressed comically, try “import antigravity” in Python 🙂).

Interestingly, there is a new open-source language from MIT, Julia (julialang.org) which may be *the ultimate* language!

Julia combines the speed of C/C++, with the numerics syntax of Matlab, and the flexibility of Python. It’s a work in progress, but they’ve come a long way!! You can use it with jupyter notebook, and people are contributing a lot of modules. On the main page of the website, they have a small benchmark showing the speed of C, Julia, Python, Matlab, Mathematica, and other languages, showing that Julia approaches the speed of C (and where you can see that Python usually *is* slower than Matlab). The goal of Julia is to be able to go from prototyping (as with Matlab and Python) to production (as in fast C code) within the same language — and they seem to be doing it!!

**Kris** (post author)

Hi Luis,

Thanks for your detailed post. It’s been a few years since I authored this post. Since then, I have switched jobs and have been almost exclusively developing in python for the last year (I don’t even have matlab installed on my personal machine anymore).

To respond to your points:

1 – Agreed, numpy matrices are clunky.

2 – I disagree. In my experience, matplotlib is *better* than matlab plotting in almost every case EXCEPT for 3D plots. I have run into a wall before with matlab plots where I could not get the result I wanted (e.g. setting the axes label a different color than the axes labels). matplotlib has better fine grained control and is more object oriented than matlab (e.g., matplotlib.patches). The situation you give about zooming simultaneously on subplots is trivial in matplotlib (the sharex option synchronizes the x-axis). I had to get a user contributed library to have this functionality in matlab:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 4*np.pi, 0.1)
y1 = np.sin(x)
y2 = np.sin(x - np.pi/4)

# sharex=True links the x-axes, so zooming one subplot zooms the other
fig, ax = plt.subplots(2, 1, sharex=True)
ax[0].plot(x, y1, '.-b')
ax[1].plot(x, y2, '.-r')
plt.show()
```

3 – I also disagree that matlab is always faster. Both matlab and python suffer from being interpreted languages. Both can also benefit from C-compiled code. In general both have “slow” libraries. Your mileage will vary depending on what libraries you use.

4 – I will give you that debugging is a little easier in matlab since it is built into the IDE. But python has the pdb module, which is just as functional as matlab debugging, albeit with a command line interface (C++ programmers might prefer pdb since it has a similar interface to the GNU gdb).
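For readers who have not used pdb, here is a minimal sketch of the closest analogue to MATLAB’s “keyboard”: a call to `pdb.set_trace()` placed where you want execution to stop. The function `mean_of` and its `debug` flag are invented for this illustration. The flag is only there so the script runs unattended by default.

```python
import pdb


def mean_of(values, debug=False):
    """Average a list of numbers, optionally stopping inside the function."""
    total = 0.0
    for v in values:
        total += v
    if debug:
        # Execution pauses here and opens the (Pdb) prompt. From there you can
        # print or change variables (p total), move along the call stack
        # (up / down), single step (n, s), or continue (c).
        pdb.set_trace()
    return total / len(values)


# Runs normally; call mean_of(..., debug=True) to stop before the return
print(mean_of([1.0, 2.0, 3.0]))
```

On Python 3.7 and later, the built-in `breakpoint()` does the same thing as `pdb.set_trace()` without the import.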

Julia is awesome! I dabbled with it a while ago (I guess after I authored the original post). At that time, it was almost unusable because loading modules (plotting, whatever) took FOREVER. It didn’t matter that my code executed at near-C speed if it took 20-30 s at the beginning to load modules. I think there was a fix in the works (something like the module caching python does), but I have not looked into it in a while.

**Luis Goncalves**

This post: http://cyrille.rossant.net/whats-wrong-with-scientific-python/

further expands on the limitations of numpy and matplotlib, and argues perhaps more clearly than I did that Python is a superior general-purpose language, but not so for scientific computation. The post is more than two years old, though.

**Kris** (post author)

Thanks! I learned something new (numpy take() function).