Today, in one of my more productive days, I managed to create a sleek R script that plotted several histograms in a lattice, allowing for easy identification of the underlying trend. Although the majority of the time taken consisted of collecting the data and making various adjustments, it took a not inconsiderable amount of work to write the code.

As I was cursing the

Here at least is my contribution to the collective intelligence of

Of course, Rome was not built in a day, and a thorough knowledge of R plotting cannot be built in one. Instead we will progress one layer at a time, adding additional levels of complexity and functionality. We start with a simple script that allows us to plot several graphs at the same time, each with a different value of a key variable. Here is the output:

The first few lines of code create a data frame with 4 variables: x,y,z,a. Three of these variables are randomly generated, with the z variable dependent upon the other 3. Suppose that we are analysing this data set, unaware of the relationship between the x,y and z variables. A preliminary inspection shows that there are only 4 observed values of a. It seems sensible to plot z against y for each of these different a values.

The relevant code to create this plot starts with the function "

filters the data frame to include only those rows where the variable a=l. The next line simply creates a standard R plot of y versus z. Finally, we use the abline function to plot a linear fit to highlight the trend. Too easy.

Now we come to the fun part. Using the

Okay, so the graph does not look like it came from NASA – or to be honest with NASA's ailing reputation maybe it does. But note that once we have created the plotM function, we only have to write 3 lines of code to make 4 separate charts. Moreover, the code would not increase even if we were plotting 100 charts.

Of course, we have not yet even drawn upon any of R's custom plotting packages. In the next edition of this blog we will look at how to use the ggplot2 package to add colour and a wide range of other features to our graphs.

As I was cursing the

*apply*function – not for the last time I am sure – I suddenly realised the insane level of productivity that I have come to see as "par for the course". The level of computational analysis that can be conducted in a few hours with nothing more than a desktop PC, a broadband connection, and copious amounts of caffeine is phenomenal. No longer can a lack of computing power or software be blamed for a lack of productivity growth: rate-limiting factors are now exclusively human.Here at least is my contribution to the collective intelligence of

*biologicals*. I have noticed that the most common reason that people avoid R is that they cannot rapidly make graphs that meet the high standards of their clients. In the next few editions of this blog we will build up a basic repertoire of plotting techniques, focusing on graphics that are sickeningly impressive.Of course, Rome was not built in a day, and a thorough knowledge of R plotting cannot be built in one. Instead we will progress one layer at a time, adding additional levels of complexity and functionality. We start with a simple script that allows us to plot several graphs at the same time, each with a different value of a key variable. Here is the output:

Figure 1: Multiple plots using par |

**The code**# Clear all objects rm(list=ls()) # Create a data set using random variables df<-data.frame(x=rnorm(160),y=runif(160),a=sample(c(1,0,-1,10), replace=TRUE,160)) df$z<-with(df,{3*a*y+x}) # Create a function that plots the value of "z" against the "y" value plotM<-function(l){ df.temp<-df[df$a==l,] plot(df.temp$y,df.temp$z,xlab="Y Value",ylab="Z Value", main=paste("Value of key variable: ",toString(l))) abline(lm(df.temp$z~df.temp$y),col="red") } # Create a grid to plot the different values of "a" par(mfrow=c(2,2)) # Loop through each value of "a" and call the plotM function for (i in c(1,0,-1,10)){ plotM(i) }

**Walk-through**The first few lines of code create a data frame with 4 variables: x,y,z,a. Three of these variables are randomly generated, with the z variable dependent upon the other 3. Suppose that we are analysing this data set, unaware of the relationship between the x,y and z variables. A preliminary inspection shows that there are only 4 observed values of a. It seems sensible to plot z against y for each of these different a values.

The relevant code to create this plot starts with the function "

*plotM*", standing for**plot M**ultiple. This function takes an argument "l" that determines which value of the variable we are plotting. The linedf.temp<-df[df$a==l,]

filters the data frame to include only those rows where the variable a=l. The next line simply creates a standard R plot of y versus z. Finally, we use the abline function to plot a linear fit to highlight the trend. Too easy.

Now we come to the fun part. Using the

*par(mfrow=c(2,2))*we create a 2x2 grid in which to place the next four plots. Whatever plots we now create will be placed sequentially into this grid. Hence we can iterate over a vector containing the values of a, calling*plotM*each time. This gives us our grid.**Comments**Okay, so the graph does not look like it came from NASA – or to be honest with NASA's ailing reputation maybe it does. But note that once we have created the plotM function, we only have to write 3 lines of code to make 4 separate charts. Moreover, the code would not increase even if we were plotting 100 charts.

Of course, we have not yet even drawn upon any of R's custom plotting packages. In the next edition of this blog we will look at how to use the ggplot2 package to add colour and a wide range of other features to our graphs.

hi!

ReplyDeletethanks for this post. this is a great way to learn. just one thing: i tried to run this code on my computer, and the abline came out wrong. in fact, one needs to interchange

lm(df.temp$y~df.temp$z)

with

lm(df.temp$z~df.temp$y)

in order to get the picture you produced.

cheers

flo

Thanks for the comment flo. I have updated the code.

ReplyDeletecool!

ReplyDeletewhen is the next lesson due? can't wait!

:-)

thanks a lot for sharing!

Thanks so much for the nice example.

ReplyDelete