Thursday, April 21, 2011

Survival skills for today's analyst

I suffer a little from the age-old affliction of contrarianism. If a software package is used by the majority of the population, I assume it is flawed, highly limited, and its continued use will ultimately result in the downfall of the human race. Conversely, I am always extremely interested in a piece of software that has spread no further than the ivory tower in which it was first conceived.

The most longstanding example of this is my profound preference for the statistical computing language, R, over Microsoft Excel–a program in which I have begrudgingly developed an extremely high level of expertise. As every analyst knows, in the world of statistical software Excel is like McDonald's, Burger King, Pizza Hut, and KFC all rolled into one. It is so prepackaged and devoid of customization, yet so ubiquitous that we cannot do without it. Like the fast-food chains, we loathe Excel because it always produces the same graphs, the same simple statistical analyses. Yet when we find ourselves lost in a strange, unfriendly foreign country we go running back to the grid lines of excel to order a Big Mac. As soon as we enter the jungle, our survival skills are found wanting.

Yes my friends, like it or not, Excel is here to stay although not for lack of alternatives. The fact is that it is the user-friendly nature of this program that has been the key to its success. A friend of mine once put it thus: "Excel has allowed a generation of knowledge workers to survive without being able to program."

In truth, the driving force behind Excel's success is simple: Excel is easy. Oh I know that the die-hards will talk about how it is a superior visual tool, and that spreadsheets allow for increased transparency in financial models. But this argument falls flat on its face when we introduce macros to the equation; if spreadsheets are about transparency then why do we add VBA scripts that the user can neither see nor understand. And if we are happy to use scripts at some level, why on earth do we need to do everything else in a cumbersome visual environment.

Furthermore, the simple ends to which Excel users put their tools is demonstrated by the tiny fraction of users use the (admittedly limited) functionality afforded by VBA. That so many users can get by without loops, functions, or any notion of encapsulation is testament to the primitive uses to which Excel is put: it is just a big button calculator with an autofill feature. Surely there are more skills that we need to survive in the analytical jungle.

Finding an alternative
As I write this, I am sure that I have just alienated the entire community of so called "Power Excel Users". But I am sure that many engineers, economists, and scientists will agree that Excel is too limited to be the only quantitative tool that you have available in your office. The problem is finding software that you can successfully use in an office environment, and that is worth investing the time in learning.

The most important obstacle to overcome is the cost barrier. One of my friends, Rex, works for a major insurance group in their risk division. As far as I can tell, there are few organisations as willing to shell out money on analytical software as an insurance company. As a result, Rex regularly tells me about the wonderful software package that they just bought for $X million. These packages are highly customised and very user friendly (that's why they cost big dollars). The problem is, what happens when the company's systems change, or when you need to solve a new problem? Moreover, how does Rex do his job when he no longer has access to the software (ie, if he moves to another job). The cost of these highly customized packages means that they are not useful tools to acquire for your repertoire. As a rule of thumb, if it costs more than the latest version of Excel then assume that it is not portable: you cannot take it with you.

Enter R
Since I am a contrarian, I am sure that my advice should be taken with a grain (if not a barrel) of salt. However, I believe that there is now a viable alternative to Excel: R. R has been around for a long time, but it has taken a while to gain the following that it so rightfully deserves.

R is completely free and thus available at your fingertips wherever you go. No need to negotiate with the boss about breaking the budget for some fancy new piece of software. Download the binary, install it, and you are good to go. The advantage of this is not just that it is freely available, but that you can rely on it being available.

That just leaves its functionality, and my friends the good news is that R has functionality in spades. Take a quick look at its graphical features and you will see that almost any chart or graph you can dream of can be generated in R. In addition, the R community is continually adding new packages with new functions. In the last few years, the development of these packages has exploded in line with growth in the user base.

Transcending Excel and transitioning to R
Having used R for a reasonable amount of time, I find it hard to see why other analysts struggle day-in day-out with Excel. However, the great barrier to using R is that it is one step closer to all-out coding. Run through an interpreter, R seems strange and frightening to the non-programmer. If you have never learned a programming language, then chances are it will take you some time to shift to R.

Another issue is the need for other people to have the ability to review, check, and edit your work. Unless your boss is up to speed with R or is willing for your work to be checked by another R-literate colleague, you may have to stick with Excel for the moment.

There is, however, great scope for the analyst to grow their organisation into R over time. Whenever you are asked to do a self-contained piece of work independently, try doing it in R. I tend to go overboard and try to create advanced graphics that showcase R's capabilities. The majority of the time, people ask how I made the graph and are then keen to see what else R can do.

Into the jungle
As the old saying goes, "to the man that has only a hammer, every problem looks like a nail". At the moment, there are an awful lot of organisations that are filled with people who only have Excel and every problem sure looks like a spreadsheet.

I believe that analysts that fail to expand their toolkit tend to lose the ability to solve new problems. The generation of knowledge workers who are now in their 40s may have been lucky enough to survive on nothing more than their spreadsheet skills. However, as a twenty-something making my way in the business world, I cannot see how an analyst will be able to survive without some high-powered programming in their utility belt. R may not be enough on its own, but it seems like a good starting point.

Good luck in the analytical jungle.


  1. I agree 50%. Excel is overused in my organization too. Some spreadsheets are incredibly complicated, so that it is near impossible to debug them--a few lines of Python or R would be much easier to maintain. Others try to use excel when a database should be used (V/HLOOKUP ARRGH!).

    If "R seems strange and frightening to the non-programmer", I assure that you that it seems even stranger to programmers! R is one of the nicest statistical DSLs, but it is also an ugly, unpredictable beast that tends to encourage poor coding habits. Code often switches from OO to procedural to functional paradigms in a single script, it has inconsistent naming among packages (and in the R Core too). To get many things done, one often has to inline C code. R is a mess, its packages are more of a mess and most R-code that I see in companies is barf-worthy, bug prone, ad hoc and slow as molasses (having been written by non-programmer Excel converts). I don't agree that we should be convincing non-programmers to try R, they just make a mess. These folks should be encouraged to learn a good SQL database--that would take care of much of the Excel problem. Then, they need a good data analysis platform like SPSS or Stata that will allow them to do proper analyses and make graphs, but won't give them enough rope to hang themselves.

    Face it, most people just aren't up to learning to PROPERLY program. That doesn't mean they shouldn't use better tools, but I don't think R is the right choice for those who aren't motivated to learn to program. R is hard, and it takes at least a few months for an experienced programmer to learn to use it properly. Most folks are not going to invest the time to learn the basics. You can look forward to trying to debug thousands of lines of R spaghetti code in the coming year if you stay on your soapbox.

  2. Al. Thanks very much your comment. I can see that there are strong feelings from both the Excel and Programmer communities. As for the problems associated with R, I understand completely where you are coming from. In particular, I can understand the frustration that you, as an experienced programmer, face in working with the poor R code of the non-programmer.

    However, I think that although far from perfect R may be a stepping stone in the history of programming languages. After a time, programming skills may developed in the wider analytic community and languages like Python will become common tools for analysts. Of course, that seems like a highly optimistic view when you spend your days debugging nightmare code created by business majors.

    Thanks again for your comments.

  3. I must have missed this post of yours. I share every thoughts you have here. I have worked in a research institue for 3 years and am working for a GOVT ministry now. I use R on the daily base. Guess what? The work I have done have ALWAYS stand out because of R. For example, quickness to process complicated data analysis and superior visualization (see some examples from my blog They are the value added service but not a simple line or bar plot produced using M$ Excel.

    I must say that there may not be rooms for analysts without programming skills in 5 years time when any kinds of data are beyond the handle of M$ Excel. Invest in R is a wise move.