Which Programming Language is Best for Quants?

Posted on July 2, 2011 by



if there is one…


As a professor of computing, I’m sometimes asked “Which programming language is best?” The answer depends significantly on the application, but there are other important factors as well…


If you’re building a company around code, the commitment to a particular programming language is an important business decision. I know of well-run fund management companies that rely on C, R, C++, Java, MATLAB, ML, Python and Perl. I’ll bet there are others that rely on Visual Basic, Mathematica, and assembly language as well.

a checklist…

Here’s a checklist of considerations you might think about when making this decision:

  • Which language do you think in best?

If you’re the lead developer or CTO, at least some consideration should be given to the languages you’re able to develop in most fluently and easily. This will enable you to participate more effectively in setting the direction of your company’s code. A closely related consideration is: Which languages provide the best expressivity for your tasks?  Many quants like to manipulate large matrices, and MATLAB is very good at that, as is Python/NumPy, which has a similar notation for matrices.
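
To give a feel for that matrix notation, here is a minimal NumPy sketch. The returns and weights are made-up numbers chosen only for illustration, and the variable names are my own assumptions, not anything from a particular firm's code.

```python
import numpy as np

# Hypothetical daily returns: 4 days (rows) x 3 assets (columns).
returns = np.array([
    [0.01, 0.02, -0.01],
    [0.00, -0.01, 0.02],
    [0.02, 0.01, 0.00],
    [-0.01, 0.00, 0.01],
])

# Hypothetical portfolio weights that sum to 1.
weights = np.array([0.5, 0.3, 0.2])

# Matrix-vector product: one portfolio return per day.
portfolio_returns = returns @ weights

# Sample covariance between assets (each column is one variable).
cov = np.cov(returns, rowvar=False)

# Quadratic form w' * Sigma * w gives the portfolio variance.
portfolio_variance = weights @ cov @ weights

print(portfolio_returns)
print(portfolio_variance)
```

The whole calculation reads much like the math on a whiteboard, which is the kind of expressivity I have in mind.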

  • Will you be leading a large collaborative team?

In this case you may need to give priority to languages that provide well-defined interface boundaries so that programming tasks can be partitioned easily between developers. Java is probably the best alternative in this case, and C is probably the worst.

  • How important is speed?

If your code must make split-second trading decisions, you’ll probably have to write it in C or C++.  It’s my opinion that a culture of “machismo” has arisen around these languages that leads to an overemphasis on speed, and prevents developers from making thoughtful decisions in this regard. Yes, these are among the fastest languages, but there are costs for development management and systems reliability. C and C++ are among the languages most susceptible to memory leaks and other sorts of memory-related bugs.  These languages are also more difficult to develop in collaboratively.

You can often get the performance of C in other languages through library calls.  MATLAB for instance makes strong use of underlying C libraries.
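
The same idea can be sketched in Python with NumPy. This is an illustration under my own assumptions (the function name and array sizes are hypothetical): the first version loops in the interpreter, the second hands the whole computation to NumPy's compiled code with a single call.

```python
import numpy as np

def dot_pure_python(a, b):
    """Dot product computed element by element in the interpreter."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

# Two arrays of random numbers, seeded so the run is repeatable.
rng = np.random.default_rng(seed=0)
a = rng.standard_normal(100_000)
b = rng.standard_normal(100_000)

slow = dot_pure_python(a, b)  # interpreted loop
fast = np.dot(a, b)           # one library call into compiled code

# Both approaches agree up to floating-point rounding.
print(abs(slow - fast) < 1e-6)
```

If you time the two with the standard `timeit` module, the library call is typically much faster, though the exact ratio depends on your machine and NumPy build.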

  • How well does the language span your company’s tasks?

Some languages, like R, are great for modeling, but don’t have the robustness or speed you might need for execution. Other languages, like Python, are pretty good at both ends of the spectrum.

  • What about multiple languages?

Avoid if at all possible.  It’s best if your developers are all thinking in the same language, and can be re-targeted to different coding tasks within the company.  I once led a project in which our team used four different programming languages.  It was a nightmare.  There are a number of reasons this was a bad choice. Most critically, we created problems for ourselves at the boundaries between the languages.

On the other hand, I can imagine some situations where a combination of two languages might make sense.  I know of a fund, for instance, that does its modeling in R but its execution in C (because it is an HFT shop).

  • How strong is the developer community?

R, Python (NumPy) and MATLAB lead in this arena.  They all have strong communities developing code for numerical analysis and quantitative finance.  R is especially strong in the finance community; many academic researchers do their work in R.  There’s even an annual R in Finance conference that’s worth attending even if you’re not an R user.  The numerical analysis community in Java seems to have evaporated over the last few years.

  • Open source or closed source?

Even though MATLAB is a great choice for quant work, the fact that it is a closed source platform is sometimes problematic.  The developer community is slowed a bit in this environment. And even if you’re using an open platform (like Python) you might consider a proprietary library, e.g., for optimization. Consider the costs of these choices carefully, and I don’t mean just the licensing costs.

We chose Python…  

In the end, our company (Lucena Research, LLC) chose Python.  Here’s why: I came to appreciate the power of MATLAB’s matrix syntax during my time at Cerebellum Capital. It’s very expressive and well suited for quant work. But I didn’t like the closed source model of MATLAB. Python with NumPy offers a very similar syntax plus the power of the active NumPy community. We also discovered a very nice open source toolkit for financial time series called pandas.

Finally, we’re not an HFT shop, so we weren’t constrained by a need for speed.  I should say though that much of Python’s numerical stack (NumPy, for instance) is written in C and Fortran, so we get the best of both worlds.
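
For readers who haven't seen pandas, here is a small sketch of the kind of time series work it makes easy. The tickers and prices are invented for illustration, and this is not code from our own system.

```python
import pandas as pd

# Hypothetical closing prices for two made-up tickers on 5 business days.
dates = pd.date_range("2011-01-03", periods=5, freq="B")
prices = pd.DataFrame(
    {
        "AAA": [100.0, 101.0, 102.0, 101.0, 103.0],
        "BBB": [50.0, 49.5, 50.5, 51.0, 50.0],
    },
    index=dates,
)

# Day-over-day percentage change; the first row has no prior day, so drop it.
daily_returns = prices.pct_change().dropna()

# 3-day moving average of the prices (first two rows are NaN).
rolling_mean = prices.rolling(window=3).mean()

print(daily_returns)
print(rolling_mean)
```

Each operation works on every column at once and keeps the date index aligned, which is most of what you want when handling price histories.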


Posted in: research, technology