Peter's stats stuff

Applying statistics and data science 'in the wild'

I write about applications of data and analytical techniques like statistical modelling and simulation to real-world situations. I show how to access and use data, and provide examples of analytical products and the code that produced them.

Recent posts

Tourism forecasting competition data in the Tcomp R package

19 October 2016

A new R package, `Tcomp`, makes data from the 2010 tourism forecasting competition available in a format designed for fitting and testing automated forecasts en masse, consistent with the M1 and M3 forecasting competition data in the `Mcomp` R package.

Statistics New Zealand experimental API initiative

15 October 2016

Statistics New Zealand recently launched experimental access to some of its data over the web via an application programming interface; the data can be accessed easily via the equally experimental `statsNZ` R package by Jonathan Marshall.

New Zealand Election Study individual level data

18 September 2016

Individual level data on voting behaviour are freely available from the New Zealand Election Study and everyone should have a go at analysing them!

Why you need version control

16 September 2016

Using version control software is a defining sign of a professional approach to serious data analysis.

Analysing the Modelled Territorial Authority GDP estimates for New Zealand

13 September 2016

My presentation and paper on the development and use of "Modelled Territorial Authority Gross Domestic Product" for New Zealand, as presented to the New Zealand Association of Economists conference in June 2016.

Dual axes time series plots with various more awkward data

28 August 2016

I finish enhancing the dual axes time series plotting function in R so that it copes reasonably well with series that start at different times, have different frequencies, or include negative values.

Dual axes time series plots may be ok sometimes after all

18 August 2016

Dual axis time series charts are often deprecated, but the standard alternatives have weaknesses too. In some circumstances, if done carefully, dual axis time series charts may be ok after all. In particular, you can choose the two vertical scales so that the drawing on the page is equivalent to drawing two indexed series, while retaining the meaningful mapping to the scale of the original variables.
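
As a rough illustration of that last idea (a minimal sketch, not the code from the post): if both series are indexed to their value at a common base observation and would share one index axis, the same picture results from giving each original axis the shared index range multiplied by that series' base value. The function name `dual_axis_limits` and its `base` argument are hypothetical, and the sketch assumes both series are positive at the base observation.

```python
def dual_axis_limits(left, right, base=0):
    """Return ((left_min, left_max), (right_min, right_max)).

    Hypothetical helper: picks limits for two vertical axes so that the
    plotted lines match what two indexed series on one shared axis would
    look like. Assumes left[base] > 0 and right[base] > 0.
    """
    # Index each series so that it equals 1.0 at the base observation.
    left_idx = [v / left[base] for v in left]
    right_idx = [v / right[base] for v in right]

    # The range a single shared index axis would need to show both series.
    lo = min(left_idx + right_idx)
    hi = max(left_idx + right_idx)

    # Convert that shared index range back into each series' own units.
    left_limits = (lo * left[base], hi * left[base])
    right_limits = (lo * right[base], hi * right[base])
    return left_limits, right_limits
```

With limits chosen this way, the two lines cross at the base observation, and a given vertical distance on the page means the same percentage change for either series, while each axis still reads in its original units.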

Elastic net regularization of a model of burned calories

13 August 2016

Elastic net regularization of estimates is a good way of dealing with collinearity and feature selection; this is illustrated with a simple dataset of 30 daily observations from a Fitbit tracker.

nzcensus on GitHub

04 August 2016

Demonstration analysis of area unit demographic data from the `nzcensus` R package on GitHub, which is approaching maturity and readiness for general use.