Async Programming in Shiny with crew and callr

Asynchronous, aka async, programming in Shiny. It’s a subject that sounds daunting to many. With async programming we aim to achieve multiple things: faster applications, better responsiveness and better resource allocation. It all comes down to running tasks in parallel, so that they are executed at the same time instead of one after the other.

There are multiple ways to make your Shiny app asynchronous. In this blog I’ll go over two different packages: crew and callr. Both of these packages are in essence task managers that allow you to spin up background R processes that can execute tasks instead of the main thread that runs your Shiny app. Sounds complicated? Don’t worry, I got you! After reading this blog you will be able to turn your Shiny app into an asynchronous one with just a few lines of code.  

What is async programming?

To understand what asynchronous programming is, it is useful to first take a look at what synchronous programming is.

By default, a Shiny app runs on a single R session. A session in R is single-threaded, and can only do one thing at a time. This means that tasks get put in a queue and are served on a first come, first served basis. If you execute a script, R goes over it line by line until it is finished. This is no different for a Shiny app that runs on an R session: one observer in your app gets invalidated, and once that finishes, the program moves on to the next observer. This is called synchronous programming.
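To make the queueing concrete, here is a minimal sketch in plain R. The slow_task helper is hypothetical and only stands in for any slow piece of work; it is not part of the app in this post.

```r
# Hypothetical stand-in for a slow task (not part of the app)
slow_task <- function(label, seconds = 1) {
  Sys.sleep(seconds)
  paste("finished", label)
}

start <- Sys.time()
first <- slow_task("first")
second <- slow_task("second")  # only starts after the first call has returned
elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))

# elapsed is roughly the sum of both waits: the calls ran one after the other
```

Nothing else can happen in this session while slow_task is sleeping, which is exactly what a Shiny user experiences as a frozen app.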

Obviously, there are smarter ways to execute tasks! We can execute tasks in parallel, or somewhere else. We can let our program continue while it is waiting for results, or instead of using the R session that runs the Shiny app (the main R session), we can use other R sessions to execute tasks (also called worker sessions, or child sessions). This is called asynchronous programming.
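As a first taste, the sketch below hands a slow job to a separate R session with callr (one of the two packages covered later) while the main session carries on. The job itself is a made-up placeholder, so treat this as an illustration of the idea rather than a recipe.

```r
library(callr)

# Start a slow job in a separate R process; r_bg() returns immediately
job <- r_bg(function() {
  Sys.sleep(2)
  "result from the worker session"
})

# The main session is free to do other work in the meantime
main_work <- sum(1:10)
still_running <- job$is_alive()

# Block here only because this is a script; a Shiny app would poll instead
job$wait()
worker_result <- job$get_result()
```

The main session only waits at the very end, and only because a script has nothing better to do. In a Shiny app that waiting is replaced by periodic checks.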

There are many smart solutions to make your Shiny app asynchronous. Looking for an overview? Head over to one of my previous blog posts about asynchronous programming in Shiny.

In this blog you’ll learn more about two of these smart solutions: crew and callr.

Async programming with crew and callr

First, it’s useful to understand what crew and callr actually do. They both use the same underlying principles to manage tasks and collect the results of those tasks. Then you’re going to start with a synchronous base app. The goal is to rewrite this synchronous Shiny app into an asynchronous one with both crew and callr.

Ready? Let’s go! 🚀 

Using task managers to schedule tasks

You have probably heard of task managers in the context of operating systems. Unix-based systems, Windows, they all have something called a task manager. In essence, a task manager takes care of all the processes that are running on a system. You can ask a task manager for the current status of what is running, but you can also manage scheduling, and start or stop processes.

In the context of Shiny, this isn’t any different. If we want to execute tasks in another R session away from the main session that’s running the Shiny app, we need something that is able to help us with managing those R sessions. Both crew and callr are equipped to act as such a task manager. You can ask for a status update, you can start and stop processes, and you can collect the result. All without the need of managing different sessions on your own.
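A small sketch of those task-manager style operations, here with callr: start a process, ask for its status, and stop it. The 60-second sleep is just a placeholder for a long-running job.

```r
library(callr)

# Start a deliberately long job in a background R session
job <- r_bg(function() Sys.sleep(60))

running_before <- job$is_alive()  # status check while the job is busy
worker_pid <- job$get_pid()       # operating-system process id of the worker

job$kill()                        # stop the process early
running_after <- job$is_alive()   # status check after stopping it
```

You never have to launch or clean up an R session by hand; the process object does that for you.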

The base app

First, the synchronous situation. The beautifully designed base app below is a simple Shiny app that retrieves stock data from the Yahoo Finance API. It uses a function called run_task to do so.

The run_task function uses httr to make a GET request to the API. Since the API is very fast, we simulate a long retrieval time by calling Sys.sleep(5), which pauses for 5 seconds. This makes it easier to see what is actually going on.

When the data is retrieved, we parse the JSON content so it ends up in a usable R data frame. With the data, we make a ggplot2 plot that simply displays the movement of the closing price for that stock over time.

The UI part of the app has some basic elements like a picker to select the stock you’re interested in, an input for date ranges, and an action button. There are three output elements: the time, a status message, and the actual plot. The time is there for a reason ⏰. It demonstrates how the R session stops running the clock when we try to get the stock data. After all, this is a synchronous app, and a synchronous app can only do one thing at a time. If the R session is busy retrieving stock data, it cannot update the clock at the same time. Hence, the clock stops ticking for at least 5 seconds. In the eyes of the user, the app becomes unresponsive.

				
library(shiny)
library(ggplot2)
library(httr)
library(jsonlite)

# Function to retrieve stock data
run_task <- function(symbol, start_date, end_date) {
  
  # simulate long retrieval time
  Sys.sleep(5)
  
  # get stock data
  url <- paste0("https://query1.finance.yahoo.com/v8/finance/chart/", symbol, "?period1=", 
                as.numeric(as.POSIXct(start_date)), "&period2=", as.numeric(as.POSIXct(end_date)), 
                "&interval=1d")
  
  response <- GET(url)
  json_data <- fromJSON(content(response, as = "text"))
  prices <- json_data$chart$result$indicators$quote[[1]]$close[[1]]
  dates <- as.Date(as.POSIXct(json_data$chart$result$timestamp[[1]], origin = "1970-01-01"))
  
  stock <- data.frame(Date = dates, Close = prices, stringsAsFactors = FALSE)
  
  ggplot(stock, aes(x = Date, y = Close)) +
    geom_line(color = "steelblue") +
    labs(x = "Date", y = "Closing Price") +
    ggtitle(paste("Stock Data for", symbol)) +
    theme_minimal()
  
}

ui <- fluidPage(
  
  titlePanel("Calling an API synchronously and get AEX stock data 🧊 "),
  sidebarLayout(
    sidebarPanel(
      selectInput("company", "Select Company", choices = c("ADYEN.AS", "ASML.AS", "UNA.AS", "HEIA.AS", "INGA.AS", "RDSA.AS", "PHIA.AS", "DSM.AS", "ABN.AS", "KPN.AS")),
      dateRangeInput("dates", "Select Date Range", start = Sys.Date() - 365, end = Sys.Date()),
      actionButton("task", "Get stock data (5 seconds)")
    ),
    mainPanel(
      textOutput("status"),
      textOutput("time"),
      plotOutput("stock_plot")
    )
  )
  
)

server <- function(input, output, session) {
  
  # reactive values
  reactive_result <- reactiveVal(ggplot())
  reactive_status <- reactiveVal("No task submitted yet")
  
  # outputs
  output$stock_plot <- renderPlot(reactive_result())
  
  output$status <- renderText(reactive_status())
  
  output$time <- renderText({
    invalidateLater(1000, session)
    as.character(Sys.time())
  })
  
  # button to submit a task
  observeEvent(input$task, {
    
    reactive_status("Running 🏃")
    
    reactive_result(run_task(symbol = input$company,
                             start_date = input$dates[1],
                             end_date = input$dates[2]))
    
    reactive_status("Done ✅ ")
    
  })
  
}

shinyApp(ui = ui, server = server)

				
			

crew

Now it’s time to change the synchronous base app to an asynchronous one. Let’s start with crew. Crew is a distributed worker launcher. With crew, you can launch workers. A worker is a non-interactive R session that can run one or more tasks. Under the hood, crew makes use of mirai: a lightweight async evaluation framework.

It’s fairly straightforward to implement crew in your Shiny app. The API is easy to understand, that’s why I love working with crew! You start with creating a crew controller, where you specify how many workers you want to have and how long those workers should stay active. Whenever there is a task that needs to be sent to one of these workers, you push the task to the controller. The controller will send the task to one of the workers. One thing to keep in mind is that these workers don’t know anything about the main R session, so it’s like starting an R session from scratch: you need to pass along the necessary libraries, functions, and variables.

When the task is sent to the controller and the worker, you can monitor the status of the controller and its workers. If the controller is nonempty it means that there are still tasks that are being processed. Whenever a task is ready, you can pop the result. If all results are “popped”, controller$nonempty() will return FALSE.
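Before wiring this into Shiny, it can help to see the push and pop cycle on its own. Below is a minimal sketch outside of Shiny, assuming a recent version of crew. The add_one helper and the task name are made up for illustration, and controller$wait() is only used because a plain script has no event loop to poll with.

```r
library(crew)

# Controller with two local workers that shut down after 10 idle seconds
controller <- crew_controller_local(workers = 2, seconds_idle = 10)
controller$start()

# Hypothetical helper, defined only in the main session
add_one <- function(x) x + 1

# The worker starts from scratch, so the helper and its input travel in `data`
controller$push(
  name = "demo_task",
  command = add_one(value),
  data = list(add_one = add_one, value = 41)
)

controller$wait(mode = "all")        # block until the task has finished
task <- controller$pop()             # one-row data frame describing the task
value <- task$result[[1]]            # the value returned by add_one(value)

still_busy <- controller$nonempty()  # FALSE once every result is popped
controller$terminate()
```

In the Shiny app, the blocking wait() call is replaced by an observer that calls pop() every 100 milliseconds, so the main session never has to sit still.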
				
library(crew) #NEW
library(shiny)
library(ggplot2)
library(httr)
library(jsonlite)

# Function to retrieve stock data
run_task <- function(symbol, start_date, end_date) {
  
  # simulate long retrieval time
  Sys.sleep(5)
  
  # get stock data
  url <- paste0("https://query1.finance.yahoo.com/v8/finance/chart/", symbol, "?period1=", 
                as.numeric(as.POSIXct(start_date)), "&period2=", as.numeric(as.POSIXct(end_date)), 
                "&interval=1d")
  
  response <- GET(url)
  json_data <- fromJSON(content(response, as = "text"))
  prices <- json_data$chart$result$indicators$quote[[1]]$close[[1]]
  dates <- as.Date(as.POSIXct(json_data$chart$result$timestamp[[1]], origin = "1970-01-01"))
  
  stock <- data.frame(Date = dates, Close = prices, stringsAsFactors = FALSE)
  
  ggplot(stock, aes(x = Date, y = Close)) +
    geom_line(color = "steelblue") +
    labs(x = "Date", y = "Closing Price") +
    ggtitle(paste("Stock Data for", symbol)) +
    theme_minimal()
  
}

ui <- fluidPage(
  
  titlePanel("crew: calling an API asynchronously and get AEX stock data 🚀 "), #NEW
  sidebarLayout(
    sidebarPanel(
      selectInput("company", "Select Company", choices = c("ADYEN.AS", "ASML.AS", "UNA.AS", "HEIA.AS", "INGA.AS", "RDSA.AS", "PHIA.AS", "DSM.AS", "ABN.AS", "KPN.AS")),
      dateRangeInput("dates", "Select Date Range", start = Sys.Date() - 365, end = Sys.Date()),
      actionButton("task", "Get stock data (5 seconds)")
    ),
    mainPanel(
      textOutput("status"),
      textOutput("time"),
      plotOutput("stock_plot")
    )
  )
  
)

server <- function(input, output, session) {
  
  # reactive values
  reactive_result <- reactiveVal(ggplot())
  reactive_status <- reactiveVal("No task submitted yet")
  reactive_poll <- reactiveVal(FALSE) #NEW
  
  # outputs
  output$stock_plot <- renderPlot(reactive_result())
  
  output$status <- renderText(reactive_status())
  
  output$time <- renderText({
    invalidateLater(1000, session)
    as.character(Sys.time())
  })
  
  # crew controller #NEW
  controller <- crew_controller_local(workers = 4, seconds_idle = 10)
  controller$start()
  
  # make sure to terminate the controller on stop #NEW
  onStop(function() controller$terminate())
  
  # button to submit a task #NEW
  observeEvent(input$task, {
    
    reactive_status("Running 🏃")
    
    controller$push(
      command = run_task(symbol, start_date, end_date),
      # pass the function to the workers, and arguments needed
      data = list(run_task = run_task,
                  symbol = input$company,
                  start_date = input$dates[1],
                  end_date = input$dates[2]), 
      packages = c("httr", "jsonlite", "ggplot2")
    )
    reactive_poll(TRUE)
    
  })
  
  # event loop to collect finished tasks #NEW
  observe({
    
    req(reactive_poll())
    
    invalidateLater(millis = 100)
    result <- controller$pop()$result
    
    if (!is.null(result)) {
      reactive_result(result[[1]])
      print(controller$summary()) # get a summary of workers
    }
    
    if (isFALSE(controller$nonempty())) {
      reactive_status("Done ✅ ") 
      reactive_poll(FALSE) # stop polling once every result has been popped
    }
    
  })
  
}

shinyApp(ui = ui, server = server)

				
			

callr

On to the next package: callr. As the name suggests, it lets you call R from R. From the main R session, one or more background R processes can be started. Whenever you launch a background R process with callr, you get back a process object with a rich API. You can query this object for the status, the results and much more.

In Shiny, we can create this background R process whenever the action button (input$task) is clicked. Again, we need to realize that this background R session needs information about the packages, functions, and any variables. By using supervise = TRUE, we make sure that the background process is killed when the main R process exits, so no orphaned sessions are left behind.

We store the process object returned by r_bg() in an object called p, and we can query p using the API mentioned earlier. In the app below, we’re using p$is_alive() to retrieve the status of the background process. If it is still running, this will return TRUE. If it is finished, it will return FALSE. When it’s finished, we can get the result with p$get_result().
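The same pattern in isolation looks roughly like the sketch below. The scale_values helper is hypothetical and stands in for run_task, and the while loop plays the role that the polling observer has in the Shiny app.

```r
library(callr)

# Hypothetical helper standing in for run_task()
scale_values <- function(x, factor) x * factor

p <- r_bg(
  func = function(scale_values, x, factor) {
    # Runs in a fresh R session: only the arguments passed in are available
    scale_values(x, factor)
  },
  args = list(scale_values = scale_values, x = c(1, 2, 3), factor = 10),
  supervise = TRUE  # kill the background process if the main session exits
)

# Poll until the process has finished, as the Shiny observer does on a timer
while (p$is_alive()) {
  Sys.sleep(0.1)
}

result <- p$get_result()
```

Note that get_result() throws an error if the background computation failed, so in a real app you may want to wrap it in tryCatch().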
				
library(callr) #NEW
library(shiny)
library(ggplot2)
library(httr)
library(jsonlite)

# Function to retrieve stock data
run_task <- function(symbol, start_date, end_date) {
  
  # simulate long retrieval time
  Sys.sleep(5)
  
  # get stock data
  url <- paste0("https://query1.finance.yahoo.com/v8/finance/chart/", symbol, "?period1=", 
                as.numeric(as.POSIXct(start_date)), "&period2=", as.numeric(as.POSIXct(end_date)), 
                "&interval=1d")
  
  response <- GET(url)
  json_data <- fromJSON(content(response, as = "text"))
  prices <- json_data$chart$result$indicators$quote[[1]]$close[[1]]
  dates <- as.Date(as.POSIXct(json_data$chart$result$timestamp[[1]], origin = "1970-01-01"))
  
  stock <- data.frame(Date = dates, Close = prices, stringsAsFactors = FALSE)
  
  ggplot(stock, aes(x = Date, y = Close)) +
    geom_line(color = "steelblue") +
    labs(x = "Date", y = "Closing Price") +
    ggtitle(paste("Stock Data for", symbol)) +
    theme_minimal()
  
}

ui <- fluidPage(
  
  titlePanel("callr: calling an API asynchronously and get AEX stock data 🚀 "), #NEW
  sidebarLayout(
    sidebarPanel(
      selectInput("company", "Select Company", choices = c("ADYEN.AS", "ASML.AS", "UNA.AS", "HEIA.AS", "INGA.AS", "RDSA.AS", "PHIA.AS", "DSM.AS", "ABN.AS", "KPN.AS")),
      dateRangeInput("dates", "Select Date Range", start = Sys.Date() - 365, end = Sys.Date()),
      actionButton("task", "Get stock data (5 seconds)")
    ),
    mainPanel(
      textOutput("status"),
      textOutput("time"),
      plotOutput("stock_plot")
    )
  )
  
)

server <- function(input, output, session) {
  
  # reactive values
  reactive_result <- reactiveVal(ggplot())
  reactive_status <- reactiveVal("No task submitted yet")
  bg_proc <- reactiveVal(NULL) #NEW
  reactive_poll <- reactiveVal(FALSE) #NEW
  
  # outputs
  output$stock_plot <- renderPlot(reactive_result())
  
  output$status <- renderText(reactive_status())
  
  output$time <- renderText({
    invalidateLater(1000, session)
    as.character(Sys.time())
  })
  
  # button to submit a task
  observeEvent(input$task, {
    
    p <-
      
      r_bg(
        
        func =
          function(run_task, symbol, start_date, end_date) {
            
            library(httr)
            library(jsonlite)
            library(ggplot2)
            
            # the result
            return(run_task(symbol, start_date, end_date))
            
          },
        
        supervise = TRUE, 
        args = list(run_task = run_task,
                    symbol = input$company,
                    start_date = input$dates[1],
                    end_date = input$dates[2])
        
      )
    
    # update reactive vals
    bg_proc(p)
    reactive_poll(TRUE)
    reactive_status("Running 🏃")
    
  })
  
  observe({
    
    req(reactive_poll())
    
    invalidateLater(millis = 1000)
    
    p <- bg_proc()
    
    # once the background job is finished, is_alive() returns FALSE
    if (!p$is_alive()) {
      
      reactive_status("Done ✅ ")
      
      reactive_poll(FALSE)
      bg_proc(NULL)
      
      # update the plot with the result
      reactive_result(p$get_result())
      
    }
    
  })
  
}

shinyApp(ui = ui, server = server)

				
			

Comparison

Comparing crew and callr, you can see that callr requires a little more code than crew. Crew also has a handy packages argument to specify the necessary libraries. With crew you seem to have a bit more control over the workers and when you want to shut them down. Both crew and callr allow for persistent R background sessions though. For crew, there are also plugins available that allow you to use platforms like AWS Batch or Kubernetes. Personally, I find crew a bit more intuitive to use.
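The apps above start a new callr process for every task. For a persistent background session, callr offers r_session. A minimal sketch, not taken from the apps in this post, that shows two separate calls ending up in the same worker process:

```r
library(callr)

# Start one persistent background R session (waits until it is ready)
rs <- r_session$new()

# Two separate calls land in the same process
pid_first <- rs$run(function() Sys.getpid())
pid_second <- rs$run(function() Sys.getpid())

same_worker <- identical(pid_first, pid_second)
separate_from_main <- pid_first != Sys.getpid()

rs$close()
```

Reusing one session saves the startup cost of a fresh R process per task, at the price of having to close the session yourself when you are done.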

Whatever you choose, both will achieve the same thing: tasks are being processed in another R session, while the current R session stays free. And that’s what we’re after!

More examples

As mentioned in the introduction, there are multiple ways to make your app asynchronous. The crew and callr packages are just two methods. Other methods include using future/promises, coro or mirai. And asynchronous processes can be used to call APIs, run calculations and knit Quarto or R Markdown documents. There’s a world of possibilities! In the async_shiny repo we gather all kinds of examples related to async programming in Shiny. After all, it’s easier to get started if there’s a demo available!

Watch "Async Programming in Shiny with crew and callr" on YouTube

Do you want me to take you through the above examples? You can watch “Async Programming in Shiny with crew and callr” on YouTube!
