Forecasting New Zealand elections with R and Stan

Peter Ellis

16 August 2017

Today’s content

  • Election forecasts overview
  • Data acquisition and management via its own R package
  • Statistical modelling
  • Dissemination with Shiny, Jekyll and Markdown

To help pitch, hands up if you’ve…

(these aren’t necessarily sequential)

  • written one or more lines of R code
  • made a plot with ggplot and been comfortable with what aes(...) means
  • hated or loved %>% and have an opinion on mutate
  • built your own R package
  • written a program to fit a statistical model with Stan
  • written an article that went viral and destroyed a statistician’s career after a talk to an R users’ group
  • stood for Parliament

Election forecasts overview

Current forecasts

Whereas six months before the 2014 election

2014

And for differing coalitions…

http://ellisp.github.io/

Choose “Explore coalitions and assumptions” from “NZ Election Forecasts”.

Forecasts over time

A range of possibilities

Getting hold of data

Polling data back to 2002

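library(nzelect)   # provides the polls data frame
library(dplyr)     # filter() and %>%
library(ggplot2)
library(scales)    # percent labels for the y axis

p <- polls %>%
  filter(Party %in% c("Labour", "National", "NZ First", "Green")) %>%
  ggplot(aes(x = MidDate, y = VotingIntention, colour = Pollster)) +
  facet_wrap(~Party, scales = "free_y") +
  geom_line() +
  # mark actual election results with a black "O"
  geom_text(aes(label = ifelse(Pollster == "Election result", "O", "")),
            size = 8, colour = "black") +
  scale_y_continuous(labels = percent)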

On Wikipedia, it looks like this:

[Image: Wikipedia polling table]

We need it to be “tidy”

Wikipedia’s format isn’t bad as things go, but it’s not a robust data model: the parties change each election, pollsters and their clients are wrapped up together, and the dates are messy.

library(nzelect)
library(dplyr)       # sample_n() and %>%
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
sample_n(polls, 10) %>%
  kable(format = 'html') %>%
  kable_styling(font_size = 10)
|      | Pollster       | WikipediaDates                   | StartDate  | EndDate    | MidDate    | Party    | VotingIntention | Client        | ElectionYear |
|------|----------------|----------------------------------|------------|------------|------------|----------|-----------------|---------------|--------------|
| 1780 | Roy Morgan     | 31 August – 13 September 2009    | 2009-08-31 | 2009-09-13 | 2009-09-06 | NZ First | 0.025           | NA            | 2011         |
| 1708 | Roy Morgan     | 20 April – 3 May 2009            | 2009-04-20 | 2009-05-03 | 2009-04-26 | ACT      | 0.020           | NA            | 2011         |
| 1458 | Roy Morgan     | 1–14 September 2008              | 2008-09-01 | 2008-09-14 | 2008-09-07 | National | 0.475           | NA            | 2008         |
| 1728 | Roy Morgan     | 1–14 June 2009                   | 2009-06-01 | 2009-06-14 | 2009-06-07 | Labour   | 0.330           | NA            | 2011         |
| 720  | Roy Morgan     | 30 October–12 November 2006      | 2006-10-30 | 2006-11-12 | 2006-11-05 | Maori    | 0.025           | NA            | 2008         |
| 3455 | Roy Morgan     | 16–29 June 2014                  | 2014-06-14 | 2014-06-29 | 2014-06-22 | NZ First | 0.055           | NA            | 2014         |
| 1259 | Nielsen        | 9–22 April 2008                  | 2008-04-09 | 2008-04-22 | 2008-04-15 | ACT      | 0.000           | Fairfax Media | 2008         |
| 3390 | Colmar Brunton | 17–21 May 2014                   | 2014-05-17 | 2014-05-21 | 2014-05-19 | ACT      | 0.010           | One News      | 2014         |
| 1793 | Roy Morgan     | 21 September – 4 October 2009    | 2009-09-21 | 2009-10-04 | 2009-09-27 | National | 0.575           | NA            | 2011         |
| 749  | Roy Morgan     | 3–21 January 2007                | 2007-01-03 | 2007-01-21 | 2007-01-12 | Green    | 0.075           | NA            | 2008         |

So the first step is to clean it up
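A minimal sketch of the reshaping part of that clean-up, assuming a wide table with one column per party. The `wide` data frame, its pollster names and its numbers are made up for illustration, and this is not the code in ./prep/; it only shows the wide-to-long idea with tidyr.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide table in the style of a Wikipedia polling table:
# one row per poll, one column per party (illustrative values only)
wide <- data.frame(
  Pollster = c("Pollster A", "Pollster B"),
  Dates    = c("1-14 June 2009", "17-21 May 2014"),
  National = c(52, 47),
  Labour   = c(33, 30),
  Green    = c(7.5, 11),
  stringsAsFactors = FALSE
)

# Tidy version: one row per poll-party combination,
# with voting intention as a proportion rather than a percentage
long <- wide %>%
  pivot_longer(cols = c(National, Labour, Green),
               names_to = "Party",
               values_to = "VotingIntention") %>%
  mutate(VotingIntention = VotingIntention / 100)

long
```

With the data in this shape, a party that appears or disappears between elections adds or removes rows rather than columns.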

Reflections on nzelect

  • if you’re not using version control, you’re in trouble
  • separate the grooming and preparation into ./prep/, away from the package source code
  • a project or Git repository is bigger than a single R package (in this context - note this differs from the standard Wickham / RStudio approach)
  • building both R packages via ./integrate.R
  • example grabbing polls via ./prep/download_polls_2017.R
  • sometimes some things are easier and worth doing by hand

Statistical modelling

Three challenges common to both methods

  1. Sampling error from surveys
  2. Other challenges for surveys
    • house effects
    • miscellaneous “total survey error”
  3. People might change their mind
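To give a feel for the first challenge, here is a rough sketch of the sampling error for a single party's share, assuming a simple random sample (real polls are more complicated). The function name `margin_of_error` and the example values of `p` and `n` are made up for illustration.

```r
# Approximate 95% margin of error for a proportion p from a
# simple random sample of size n (hypothetical helper)
margin_of_error <- function(p, n, z = qnorm(0.975)) {
  z * sqrt(p * (1 - p) / n)
}

p <- 0.45   # illustrative party support
n <- 1000   # illustrative sample size

moe <- margin_of_error(p, n)
moe   # roughly 0.03, i.e. about 3 percentage points either way
```

House effects and other survey error come on top of this, which is why the published margin of error understates the real uncertainty.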

Predicting is difficult, particularly about the future

Approach

  1. Probability of party vote on election day as a time series forecast
    • House effects and survey error as nuisance factors
  2. Turn into simulated votes
  3. Simulated votes get converted to seats

https://github.com/ellisp/nz-election-forecast/tree/master/setup
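Steps 2 and 3 can be sketched as below. New Zealand allocates seats with the Sainte-Laguë formula; `sainte_lague()` here is a hypothetical helper written for this illustration, not the code in the repository, and it ignores the 5% threshold and electorate-seat rules. The party names, vote shares and the amount of noise are all made up.

```r
# Sainte-Lague allocation: divide each party's votes by 1, 3, 5, ...
# and give a seat to each of the n_seats largest quotients
sainte_lague <- function(votes, n_seats = 120) {
  divisors  <- seq(1, by = 2, length.out = n_seats)
  quotients <- outer(votes, divisors, "/")   # parties x divisors
  winners   <- order(quotients, decreasing = TRUE)[seq_len(n_seats)]
  # the matrix is stored column by column, so recover the party (row) index
  party_index <- (winners - 1) %% length(votes) + 1
  seats <- tabulate(party_index, nbins = length(votes))
  names(seats) <- names(votes)
  seats
}

# Simulated election-day vote shares (illustrative numbers only)
set.seed(123)
parties     <- c("A", "B", "C", "D")
mean_shares <- c(0.45, 0.35, 0.12, 0.08)

sims <- t(replicate(1000, {
  shares <- pmax(rnorm(4, mean = mean_shares, sd = 0.02), 0)
  names(shares) <- parties
  sainte_lague(shares / sum(shares))
}))

# Proportion of simulations in which party A wins a majority on its own
mean(sims[, "A"] >= 61)
```

Each row of `sims` is one simulated Parliament, so coalition probabilities are just proportions of rows in which the relevant seats add up to a majority.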

Model A - GAMs

[Image: gam]

House effects

https://github.com/ellisp/nz-election-forecast/blob/master/method-gam/estimate-house-effects.R

https://github.com/ellisp/nz-election-forecast/blob/master/method-gam/estimate-house-effects.stan

Election day error

https://github.com/ellisp/nz-election-forecast/blob/master/method-gam/estimate-election-variance.R

Model B - Latent State Space

[Image: ss]

https://github.com/ellisp/nz-election-forecast/tree/master/method-statespace
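As a generic illustration of the latent state space idea (not the Stan model in the repository), the sketch below simulates data the way such a model assumes it is generated: an unobserved "true" level of support drifts as a random walk, and each poll observes it with a pollster-specific house effect plus sampling noise. All names and numbers here are assumptions made up for the example.

```r
set.seed(42)
n_days <- 200

# Latent "true" support for one party: a random walk starting at 45%
latent <- 0.45 + cumsum(rnorm(n_days, mean = 0, sd = 0.002))

# Hypothetical house effects: systematic over- or under-statement by pollster
house_effect <- c("Pollster A" = 0.01, "Pollster B" = -0.015)

# 30 polls on random days, each with 1000 respondents
poll_days <- sort(sample(n_days, 30))
pollster  <- sample(names(house_effect), 30, replace = TRUE)
se        <- sqrt(latent[poll_days] * (1 - latent[poll_days]) / 1000)

polls_sim <- data.frame(
  day      = poll_days,
  pollster = pollster,
  observed = latent[poll_days] + unname(house_effect[pollster]) +
    rnorm(30, mean = 0, sd = se)
)

# In a simulation the latent state is known, so the house effects can be
# checked crudely as the average gap between each pollster and the truth
est <- tapply(polls_sim$observed - latent[polls_sim$day],
              polls_sim$pollster, mean)
est
```

With real polls the latent state is not observed, so the model has to estimate it and the house effects together, anchored on actual election results.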

Dissemination with Shiny, Jekyll and Markdown

The main shiny app

https://github.com/ellisp/nz-election-forecast/tree/master/shiny

The website

https://github.com/ellisp/nz-election-forecast/blob/master/setup/copy-files.R

https://github.com/ellisp/ellisp.github.io/tree/source/elections