Peter's stats stuff

Applying statistics and data science 'in the wild'

I write about applications of data and analytical techniques like statistical modelling and simulation to real-world situations. I show how to access and use data, and provide examples of analytical products and the code that produced them.

Recent posts

State-space modelling of the Australian 2007 federal election

24 June 2017

As part of familiarising myself with the Stan probabilistic programming language, I replicate Simon Jackman's state-space modelling with house effects of the 2007 Australian federal election.

Stats NZ encouraging active sharing for microdata access projects

17 June 2017

Excited that New Zealand's Government Statistician is promoting reproducibility and open access to code, tabular outputs and research products from research with confidentialised microdata.

Global choropleth maps of military expenditure

04 June 2017

For choropleth maps showing the whole world, we don't need to stick to static maps with Mercator projections. I like rotating globes, and interactive slippy maps with tooltips.

Sankey charts for swinging voters

21 May 2017

Sankey charts based on individual-level survey data are a good way of showing change from election to election. I demonstrate this, working through some complications with survey reweighting and missing data along the way, with the New Zealand Election Study for the 2014 and 2011 elections.

Web app for individual party vote from the 2014 New Zealand election study

14 May 2017

Introducing a Shiny web tool for exploring individual characteristics and party vote in the 2014 New Zealand general election.

Modelling individual party vote from the 2014 New Zealand election study

06 May 2017

I work through a fairly complete modelling case study utilising methods for complex surveys, multiple imputation, multilevel models, non-linear relationships and the bootstrap. People who voted for New Zealand First in the 2014 election were more likely to be older, male, born in New Zealand, and to identify as working class.

Luke-warm about micromaps

30 April 2017

Linked micromaps are an ok way of presenting data and are probably the right tool in some circumstances; but they're not as cool as I thought they might be.

More cartograms of New Zealand census data (district and city level)!

25 April 2017

Shapefiles for a cartogram of New Zealand Territorial Authorities (i.e. Districts or Cities), with area proportional to population in 2013, have been added to the nzcensus package on GitHub.

Cartograms of New Zealand census data

23 April 2017

Choropleth maps are a useful way of using fill colour to show densities, proportions and growth rates by political or economic boundaries, but can be visually problematic when large geographic areas represent few people, or small areas (i.e. cities) represent many. One solution is a cartogram, and I have a go at using them to present New Zealand census data in this post and the accompanying Shiny app.

Impact of omitted variables on estimating causal effects - simulations

15 April 2017

It's much more important to get a well-specified model than to worry about propensity score matching versus weighting, or either versus single-stage regression, or increasing the sample size. A regression that includes all 100 "true" explanatory variables with only 500 observations performs better in estimating a treatment effect than any of those methods when only 90 of the 100 variables are observed, even with 100,000 observations.
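
To give a feel for why the omitted variables dominate, here is a minimal sketch in Python with NumPy. It is not the code from the post and it leaves out the propensity score methods entirely; it only compares ordinary least squares with all 100 variables on 500 observations against ordinary least squares with 90 variables on 100,000 observations. The true effect of 1.0, the coefficients 0.2 and 0.3, the probit-style treatment assignment and the function name `estimate_effect` are all assumptions made for illustration.

```python
import numpy as np

TRUE_EFFECT = 1.0      # assumed treatment effect, chosen for illustration
N_CONFOUNDERS = 100    # the "true" explanatory variables


def estimate_effect(n_obs, n_observed_vars, rng):
    """Simulate one data set and return the OLS estimate of the treatment effect."""
    x = rng.normal(size=(n_obs, N_CONFOUNDERS))

    # Every variable nudges both treatment assignment and the outcome,
    # so each one is a confounder (the 0.2 and 0.3 are assumed values).
    latent = 0.2 * x.sum(axis=1) + rng.normal(size=n_obs)
    treatment = (latent > 0).astype(float)
    outcome = TRUE_EFFECT * treatment + 0.3 * x.sum(axis=1) + rng.normal(size=n_obs)

    # The analyst only gets to see the first n_observed_vars confounders.
    design = np.column_stack([np.ones(n_obs), treatment, x[:, :n_observed_vars]])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs[1]  # coefficient on the treatment column


if __name__ == "__main__":
    rng = np.random.default_rng(123)
    full = estimate_effect(500, 100, rng)        # small sample, well specified
    omitted = estimate_effect(100_000, 90, rng)  # large sample, 10 variables missing
    print(f"true effect:               {TRUE_EFFECT:.2f}")
    print(f"n=500, all 100 variables:  {full:.2f}")
    print(f"n=100,000, 90 variables:   {omitted:.2f}")
```

Under these assumptions the small, fully specified regression is noisy but centred on the true effect, while the large regression converges very precisely on the wrong number: extra observations shrink the variance but do nothing about the bias from the ten missing confounders.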