Tourism forecasting competition data in the Tcomp R package

19 Oct 2016

At a glance:

A new R package `Tcomp` makes data from the 2010 tourism forecasting competition available in a format designed to facilitate the fitting and testing of en masse automated forecasts, consistent with the M1 and M3 forecasting competition data in the `Mcomp` R package.

Tourism competition data

The tourism forecasting competition described in Athanasopoulos et al (2011) was an important investigation into domain-specific time-series forecasting; a different approach from the broader-scope “M” series forecasting competitions which covered multiple areas. Tourism is important to me in my day job and the article referenced above has been influential in how we construct New Zealand’s official tourism forecasts. I have an interest in making the data from that competition as widely available as possible, and this week was able to release the Tcomp R package on CRAN that has a conveniently shaped version of the data from that competition.

The best introduction is in the package vignette.

Only the univariate version of the data is available. This is unfortunate in a slightly counter-intuitive way - one of the more important findings from the competition was that univariate methods of forecasting performed better than more complex methods that relied on relationships between the variables of interest (eg tourism numbers and spend) and other economic explanatory variables (eg economic growth in market countries). So although we know from the competititon that those explanatory variables are of less value than might be hoped, we can’t explore that finding as much as we would like as the explanatory variables aren’t available for re-use. That’s not the fault of the organisers of the competition, just a reality of the availability of data and the conditions under which it was provided to them.

Contents of the Tcomp R package

In the new Tcomp package, the data is stored in a single object, tourism. tourism is a list of class Mcomp that has 1,311 elements. The Mcomp S3 class is inherited from Hyndman’s Mcomp R package. Each element in tourism has the information needed to fit and test a forecast to single univariate time series. For example, the training series is always a ts object called x. To plot the 34th training time series is as simple as:

library(Tcomp)
plot(tourism[[34]]$x)

xonly

We immediately see the strong seasonality that is characteristic of much tourism data.

Because tourism inherits classes from the Mcomp package, methods provided in that package apply to the tourism object and its elements. For example, the plot method superimposes the test dataset on the graphic of the training dataset, so we can see what happened next after the first part of the competition training series #34:

plot(tourism[[34]])

basic

See the helpfiles or the vignette for more details.

Building blocks

Part of the development of the Tcomp package was a moderately rigorous testing and validation process, which included attempting to reproduce the results in Athanasopoulos et al (2011). This was successful for the mean absolute percentage error of the naive forecasts, giving assurance that the data extract-transform-load was succesful (a naive forecast is that the value next period will be the same as last period for non seasonal data, or the the last period one cycle previously for seasonal data). One step to facilitate the testing was the forecast_comp function which is now shipped with the Tcomp package and designed to facilitate rule of thumb comparison of forecast results for a selection of standard forecast methods:

Here is that convenience function in action. It returns a data frame, the first four rows of which are the mean absolute percentage error (MAPE) for different forecast horizons (columns), and the final four rows are mean absolute scaled error (MASE). For both MAPE and MASE, smaller is better.

forecast_comp(tourism[[500]], tests = list(2, 4, 6, 8), plot = TRUE)
##           2     4    6     8
## ARIMA  8.87  1.12 1.27  1.26
## ETS   10.48  4.56 1.04  3.14
## Theta  6.82 11.01 1.59 11.15
## Naive  4.77 10.53 2.43 11.86
## ARIMA  1.17  0.13 0.16  0.15	
## ETS    1.38  0.52 0.13  0.36
## Theta  0.90  1.25 0.19  1.29
## Naive  0.63  1.20 0.30  1.37

t500

Because forecast_comp is a generic function that works with all Mdata S3 class objects, it also works with objects from Hyndman’s Mcomp package:

library(Mcomp)
forecast_comp(M3[[1000]], tests = list(2, 4, 6, 8), plot = TRUE)

M1000

More comprehensive use

The main purpose of large collections of time series data like this is rigorous study of different approaches to forecasting. Here’s a simple taster of what I hope the Tcomp package will be used for. In this section, I assess the relative accuracy of several forecasting methods: ARIMA, ETS, combination of ARIMA and ETS, naive, and Theta, using quarterly tourism data. Here’s the summary results (Theta wins, so long as the forecasting horizon of interest is four or more quarters):

results

Notice how much easier it is to predict four quarters ahead rather than six or eight.

“It’s tough to make predictions, particularly about the future.”

(attributed to Yogi Berra amongst others but it’s not sure who really said this first)

We see that, perhaps a little depressingly, it’s hard to beat the naive forecast that next quarter’s result will be the same as a year ago; or the simple theta method. The naive method is particularly effective for one cycle after the last data point (ie horizon equals four or eight, in this instance). Ho hum, c’est la vie. The average of the ETS and ARIMA methods performs better than either individually, consistent with previous findings about the value of ensemble forecasts. Many questions are begged by this set of results, which could be explored fruitfully with this and other real-life data.

Here’s the code for this mini-competition. It shows how to use the tourism data and make the most of parallel processing to speed things up.

#------setup-------
library(ggplot2)
library(scales)
library(forecastHybrid)
library(tidyr)
library(dplyr)
library(parallel)
library(english)
library(directlabels)

#----------------------helper function for comparing forecasts---------
#' Function to perform five forecasts to all members of a set of series
#' ARIMA, ETS, Theta, Naive and ensemble of ETS and ARIMA
#' @param the_series a list of class Mcomp containing series of class Mdata
#' @param tests a list of horizons over which to return the MASE of the forecast
#' @return a list of results
accuracy_point <- function (the_series, tests = list(the_series$h)) {
   x <- the_series$x
   xx <- the_series$xx
   h <- the_series$h
   
   # determine best value for BoxCox transformation, 
   # but limit it to between 0 and 1
   bc <- BoxCox.lambda(x)
   bc <- min(max(0, bc), 1)
   
   # fit ARIMA and ETS models
   mod1 <- hybridModel(x, models ="ae", 
                       a.args = list(lambda = bc), e.args = list(lambda = bc))
   
   # create forecasts
   fc1 <- forecast(mod1$auto.arima, h = h)
   fc2 <- forecast(mod1$ets, h = h)
   fc3 <- thetaf(x, h = h)
   fc4 <- snaive(x, h = h)
   fc5 <- forecast(mod1, h = h)
   
   # estimate MASE for all the values of tests
   MASEs <- matrix(0, nrow = 5, ncol = length(tests))
   for (j in 1:length(tests)) {
      this_test <- tests[[j]]
      MASEs[, j] <- c(accuracy(fc1, xx, test = this_test)["Test set", "MASE"], 
                      accuracy(fc2, xx, test = this_test)["Test set", "MASE"], 
                      accuracy(fc3, xx, test = this_test)["Test set", "MASE"], 
                      accuracy(fc4, xx, test = this_test)["Test set", "MASE"],
                      accuracy(fc5, xx, test = this_test)["Test set", "MASE"])                                                                                                                                                            
   }
   
   colnames(MASEs) <- gsub(":", "-", as.character(tests))
   rownames(MASEs) <- c("ARIMA", "ETS", "Theta", "Naive", "ARIMA-ETS average")
   return(MASEs)
}

#------------Set up multi core cluster----------------
# set up cluster with number of cores minus one
cores <- detectCores()
cluster <- makePSOCKcluster(max(1, cores - 1))

# set up the functionality on the cluster
clusterEvalQ(cluster, {
   library(Tcomp)
   library(forecastHybrid)
})
clusterExport(cluster, "accuracy_point")


#-----------------Fit to tourism data----------------------
# takes a few minutes on my laptop:
results_point <- parLapply(cluster,
                           subset(tourism, "quarterly"),
                           accuracy_point,
                           tests = list(2, 4, 6, 8))

stopCluster(cluster)

#----------------presenting results-----------
results_point_df <- as.data.frame(do.call("rbind", results_point))
results_point_df$model <- row.names(results_point_df)
results_point_df <- results_point_df %>%
   gather(horizon, MASE, -model) %>%
   group_by(model, horizon) %>%
   summarise(MASE = round(mean(MASE), 2)) %>%
   mutate(horizon = as.numeric(horizon)) %>%
   ungroup() %>%
   mutate(model = gsub("ARIMA-ETS average", "Hybrid", model))

results_point_df %>%
   ggplot(aes(x = horizon, y = MASE, colour = model)) +
   geom_point(size = 2) +
   geom_line() +
   scale_colour_brewer(palette = "Set1") +
   labs(x = "Forecasting horizon",
        y = "Mean Absolute Scaled Error (lower is better)",
        caption = "Source: 2010 tourism competition data in Tcomp R package") +
   ggtitle("It's hard to beat a naive forecast or a Theta forecast",
           subtitle = "Comparison of forecasting methods with tourism competition data")