Below are some examples of profvis in use. Keep in mind that R’s sampling profiler is non-deterministic, and that the code in these examples is run and profiled when this knitr document is executed, so the numeric timing values may not exactly match the text.

Example 1

In this first example, we’ll work with a data frame that has 151 columns. One of the columns contains an ID, and the other 150 columns contain numeric values. What we will do is, for each numeric column, take the mean and subtract it from the column, so that the new mean value of the column is zero.

# Generate data
times <- 4e5
cols <- 150
data <- as.data.frame(x = matrix(rnorm(times * cols, mean = 5), ncol = cols))
data <- cbind(id = paste0("g", seq_len(times)), data)

profvis({
  data1 <- data   # Store in another variable for this run

  # Get column means
  means <- apply(data1[, names(data1) != "id"], 2, mean)

  # Subtract mean from each column
  for (i in seq_along(means)) {
    data1[, names(data1) != "id"][, i] <- data1[, names(data1) != "id"][, i] - means[i]
  }
})

Most of the time is spent in the apply call, so that’s the best candidate for a first pass at optimization. We can also see that the apply results in a lot of memory being allocated and deallocated. Memory “churning” is often a cause for performance problems.

Looking at the flame graph, we can see that apply calls as.matrix and aperm. These two functions convert the data frame to a matrix and transpose it – so even before we’ve done any useful computations, we’ve spent a large amount of time transforming the data.

We could try to speed this up in a number of ways. One possibility is that we could simply leave the data in matrix form (instead of putting it in a data frame in line 4). That would remove the need for the as.matrix call, but it would still require aperm to transpose the data. It would also lose the connection of each row to the id column, which is undesirable. In any case, using apply over columns looks like it will be expensive because of the call to aperm.

An obvious alternative is to use the colMeans function. But there’s also another possibility. Data frames are implemented as lists of vectors, where each column is one vector, so we could use lapply or vapply to apply the mean function over each column. Let’s compare the speed of these four different ways of getting column means.

profvis({
  data1 <- data
  # Four different ways of getting column means
  means <- apply(data1[, names(data1) != "id"], 2, mean)
  means <- colMeans(data1[, names(data1) != "id"])
  means <- lapply(data1[, names(data1) != "id"], mean)
  means <- vapply(data1[, names(data1) != "id"], mean, numeric(1))
})

colMeans is about 6x faster than using apply with mean, but it looks like it’s still using as.matrix, which takes a significant amount of time. lapply/vapplyare faster yet – about 10x faster than apply. lapply returns the values in a list, while vapply returns the values in a numeric vector, which is the form that we want, so it looks like vapply is the way to go for this part.

(If you want finer-grained comparisons of code performance, you can use the microbenchmark package. There’s more information about microbenchmark in the profiling chapter of Hadley Wickham’s book, Advanced R.)

You can also see that the faster methods also result in less memory allocation and deallocation. This is not a coincidence – allocating and deallocating memory can be expensive.

Let’s take the original code and replace apply with vapply:

profvis({
  data1 <- data
  means <- vapply(data1[, names(data1) != "id"], mean, numeric(1))

  for (i in seq_along(means)) {
    data1[, names(data1) != "id"][, i] <- data1[, names(data1) != "id"][, i] - means[i]
  }
})

Our code is about 3x faster than the original version. Most of the time is now spent on line 6, and the majority of that is in the [<- function. This is usually called with syntax x[i, j] <- y, which is equivalent to `[<-`(x, i, j, y). In addition to being slow, the code is ugly: on each side of the assignment operator we’re indexing into data1 twice with [.

In this case, it’s useful to take a step back and think about the broader problem. We want to normalize each column. Couldn’t we we apply a function over the columns that does both steps, taking the mean and subtracting it? Because a data frame is a list, and we want to assign a list of values into the data frame, we’ll need to use lapply.

profvis({
  data1 <- data

  # Given a column, normalize values and return them
  col_norm <- function(col) {
    col - mean(col)
  }

  # Apply the normalizer function over all columns except id
  data1[, names(data1) != "id"] <- lapply(data1[, names(data1) != "id"], col_norm)
})

Now we have code that’s not only about 8x faster than our original – it’s shorter and more elegant as well. Not bad! The profiler data helped us to identify performance bottlenecks, and understanding of the underlying data structures allowed us to approach the problem in a more efficient way.

Could we further optimize the code? It seems unlikely, given that all the time is spent in functions that are implemented in C (mean and -). That doesn’t necessarily mean that there’s no room for improvement, but this is a good place to move on to the next example.

Example 2

This example addresses some more advanced issues. This time, it will be hard to directly see the causes of slowness, but we will be able to see some of their side-effects, most notably the side-effects from large amounts of memory allocation.

Suppose you have a data frame that contains a column for which you’d like to take a cumulative sum (and you don’t know about R’s built-in cumsum function). Here’s one way to do it:

profvis({
  data <- data.frame(value = runif(5e4))

  data$sum[1] <- data$value[1]
  for (i in seq(2, nrow(data))) {
    data$sum[i] <- data$sum[i-1] + data$value[i]
  }
})

This takes over 12 seconds to calculate the cumulative sum of 50,000 items. That’s pretty slow for a computer program. Looking at the profvis visualization, we can see a number of notable features:

Almost all the time is spent in one line of code, line 6. Although this is just one line of code, many different functions that are called on that line.

That line also results in a large amount of memory being allocated and deallocated, which is somewhat suprising: just looking at the code, that line appears to just modify the data in-place, but that’s not actually what’s happening internally.

In the flame graph, you’ll see that some of the flame graph blocks have the label $, which means that those samples were spent in the $ function for indexing into an object (in R, the expression x$y is equivalent to `$`(x, "y")).

Because $ is a generic function, it calls the corresponding method for the object, in this case $.data.frame. This function in turn calls [[, which calls [[.data.frame. (Zoom in to see this more clearly.)

Other flame graph cells have the label $<-. The usual syntax for calling this function is x$y <- z; this is equivalent to `$<-`(x, "y", z). (Assignment with indexing, as in x$y[i] <- z is actually a bit more complicated, and it turns out that this is the cause of the excessive memory allocation and deallocation.)

Finally, many of the flame graph cells contain the entire expression from line 6. This can mean one of two things:

  1. R is currently evaluating the expression but is not inside another function call.
  2. R is in another function, but that function does not show up on the stack. (A number of R’s internal functions do not show up in the profiling data. See more about this in the FAQ.)

This profiling data tells us that much of the time is spent in $ and $<-. Maybe avoiding these functions entirely will speed things up. To do that, instead of operating on data frame columns, we can operate on temporary vectors. As it turns out, writing a function that takes a vector as input and returns a vector as output is not only convenient; it provides a natural way of creating temporary variables so that we can avoid calling $ and $<- in a loop.

profvis({
  csum <- function(x) {
    if (length(x) < 2) return(x)

    sum <- x[1]
    for (i in seq(2, length(x))) {
      sum[i] <- sum[i-1] + x[i]
    }
    sum
  }
  data$sum <- csum(data$value)
})

Using this csum function, it takes about 20 ms, which is about 600x as fast as before.

It may appear that no functions are called from line 7, but that’s not quite true: that line also calls [, [<-, -, and +.

The code panel shows that there is still a large amount of memory being allocated in the csum function. In the flame graph. you probably have noticed the gray blocks labeled <GC>. These represent times where R is doing garbage collection – that is, when it is freeing (deallocating) chunks of memory that were allocated but no longer needed. If R is spending a lot of time freeing memory, that suggests that R is also spending a lot of time allocating memory. This is another common source of slowness in R code.

In the csum function, sum starts as a length-1 vector, and then grows, in a loop, to be the same length as x. Every time a vector grows, R allocates a new block of memory for the new, larger vector, and then copies the contents over. The memory allocated for the old vector is no longer needed, and will later be garbage collected.

To avoid all that memory allocation, copying, and garbage collection, we can pre-allocate a correctly-sized vector for sum. For this data, that will result in 49,999 fewer allocations, copies, and deallocations.

profvis({
  csum2 <- function(x) {
    if (length(x) < 2) return(x)

    sum <- numeric(length(x))  # Preallocate
    sum[1] <- x[1]
    for (i in seq(2, length(x))) {
      sum[i] <- sum[i-1] + x[i]
    }
    sum
  }
  data$sum <- csum2(data$value)
})

This version of the code, with csum2, is around 60x faster than our original code, and requires almost no memory allocation. These performance improvements were possible by avoiding calls to $ and $<-, and by avoiding unnecessary memory allocation and copying from growing a vector in a loop.

Example 3 - Profiling a Shiny Application

In addition to R code, you can also profile Shiny applications. To do this, simply execute the runApp() command inside of profvis. For instance, we can run one of shiny’s built-in examples using the runExample command (which is a wrapper for runApp).

library(shiny)
profvis({
  runExample(example = "06_tabsets", display.mode = "normal")
})

Your Shiny application will launch, and after interacting and closing the app a profile will be generated.

The profile for Shiny applications automatically hides functions that Shiny runs behind the scenes. More information is available in the FAQ. The profile automatically colors outputs in blue. In this example, you can identify each time output$plot was called to re-create the plot.

Profiling a Shiny application is particularly helpful for understanding reactive dependencies. For more information, checkout this video.