Applying statistics and data science 'in the wild'
I write about applications of data and analytical techniques like statistical modelling and simulation to real-world situations. I show how to access and use data, and provide examples of analytical products and the code that produced them.
I investigate a question about the Twitter network, and find that generally (until reaching a level of fame few of us aspire to), having more followers oneself is associated with following people who have fewer followers, not more.
I outline the stats- and data-related books I most enjoyed reading in 2017.
An interactive network graph is a great way to understand a statistical classification standard.
Resolving an apparent conundrum in which mean spend and other value variables seem to be higher nearly everywhere: an adventure in double counting (individuals contributing to multiple groups' averages).
Reflections on recruiting data scientists for the public sector, which might serve as practical guidance for someone.
I note for future use a couple of things to be aware of in using R to schedule the execution of SQL scripts.
Playing around with polishing graphics, including an animation, of the seasonality of plague deaths in medieval Europe, early modern Europe, and nineteenth-century India and China.
I explore half a million rows of disaggregated crash data for New Zealand, and along the way illustrate geo-spatial projections, maps, forecasting with ensembles of methods, a state space model for change over time, and a generalized linear model for understanding interactions in a three-way cross tab.
New Zealand's election results have been released and were within the range of my probabilistic predictions. The pollsters did a good job.
A new version of the nzelect R package is on CRAN, with election results by voting location back to 2002, and polls up to the latest election. I show how to extract and understand the "special" votes and how they are different to advance voting.