Peter's stats stuff

Applying statistics and data science 'in the wild'

I write about applications of data and analytical techniques like statistical modelling and simulation to real-world situations. I show how to access and use data, and provide examples of analytical products and the code that produced them.

Recent posts

Do tweeps with more followers follow tweeps with more followers?

24 February 2018

I investigate a question about the Twitter network and find that, in general (until reaching a level of fame few of us aspire to), having more followers oneself is associated with following people who have fewer followers, not more.

Books I liked in 2017

17 February 2018

I outline the stats- and data-related books I most enjoyed reading in 2017.

Visualising an ethnicity statistical classification

10 February 2018

An interactive network graph is a great way to understand a statistical classification standard.

Average spend, activities and length of visit in the NZ International Visitor Survey

03 February 2018

Resolving an apparent conundrum where the mean spend and other value variables seem to be higher for nearly everywhere... an adventure in double counting (individuals contributing to multiple groups' averages).

How to recruit data analysts for the public sector

23 January 2018

Reflections on recruiting data scientists for the public sector, which may be useful as practical guidance.

Some quirks with R and SQL Server

09 December 2017

I note for future use a couple of things to be aware of when using R to schedule the execution of SQL scripts.

Seasonality of plagues

19 November 2017

Playing around with polishing graphics, including an animation, of the seasonality of plague deaths in medieval Europe, early modern Europe, and nineteenth-century India and China.

New Zealand fatal traffic crashes

15 October 2017

I explore half a million rows of disaggregated crash data for New Zealand, and along the way illustrate geo-spatial projections, maps, forecasting with ensembles of methods, a state space model for change over time, and a generalized linear model for understanding interactions in a three-way cross tab.

New Zealand 2017 election results

07 October 2017

New Zealand's election results have been released and were within the range of my probabilistic predictions. The pollsters did a good job.

nzelect 0.4.0 on CRAN with results from 2002 to 2014 and polls up to September 2017

05 October 2017

A new version of the nzelect R package is on CRAN, with election results by voting location back to 2002, and polls up to the latest election. I show how to extract and understand the "special" votes and how they are different to advance voting.