Applying statistics and data science 'in the wild'
I write about applications of data and analytical techniques like statistical modelling and simulation to real-world situations. I show how to access and use data, and provide examples of analytical products and the code that produced them.
I adjust my state-space model of New Zealand voting behaviour to allow one pollster's house effect to change from the time they started including an online sample, and get some interesting results.
I outline how I structure analytical project folder systems, and offer some hints for matching R Markdown documents to a corporate style guide, including adding logos, watermarks, and of course colours and fonts.
New Zealand electoral polls going back 15 years
I introduce a new web app that allows non-specialists to explore voting behaviour in the New Zealand Election Study, and reflect on what I've done so far with that data.
Calculating the Gini coefficient for inequality directly from mean income by decile produces an estimate that is biased slightly downwards. I correct for this and demonstrate on the World Panel Income Distribution data.
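To make the downward bias concrete, here is a minimal sketch in Python. It illustrates the effect only and is not the correction from the post. It assumes simulated log-normal incomes, and the function names `gini` and `decile_means` are hypothetical. Collapsing each decile to its mean discards all within-decile inequality, so the decile-based figure comes out lower.

```python
import random

def gini(values):
    """Unweighted Gini coefficient using the rank-based formula."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n
    ranked_sum = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * ranked_sum / (n * total) - (n + 1) / n

def decile_means(values):
    """Mean of each of ten equal-sized groups of the sorted values."""
    xs = sorted(values)
    size = len(xs) // 10  # assumes the sample size is a multiple of 10
    return [sum(xs[k * size:(k + 1) * size]) / size for k in range(10)]

# Simulated incomes (an assumption for illustration, not real data)
random.seed(42)
incomes = [random.lognormvariate(10, 0.8) for _ in range(10000)]

gini_full = gini(incomes)                   # uses every observation
gini_deciles = gini(decile_means(incomes))  # uses only the ten decile means

print(f"Gini from individual incomes: {gini_full:.3f}")
print(f"Gini from decile means:       {gini_deciles:.3f}")
```

The gap between the two numbers is the within-decile inequality that the grouped calculation cannot see.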
I play around with the sampling distribution of Gini coefficients calculated with weighted data, and verify that the Gini calculation method in a recent Stats NZ working paper is the one in the acid R package.
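As a rough illustration of what a sampling distribution of weighted Gini coefficients looks like, the Python sketch below uses one common Lorenz-curve definition of the weighted Gini. It is not necessarily the method in the Stats NZ paper or the acid package. The simulated incomes, the random weights, and the names `weighted_gini` and `one_sample_gini` are all assumptions made for this example.

```python
import random
import statistics

def weighted_gini(values, weights):
    """Weighted Gini as one minus twice the trapezoid area under the Lorenz curve."""
    pairs = sorted(zip(values, weights))  # order observations by income
    total_w = sum(w for _, w in pairs)
    total_xw = sum(x * w for x, w in pairs)
    area_sum = 0.0
    cum_income = 0.0  # cumulative share of weighted income so far
    for x, w in pairs:
        prev = cum_income
        cum_income += x * w / total_xw
        # population share of this observation times the two Lorenz heights
        area_sum += (w / total_w) * (prev + cum_income)
    return 1.0 - area_sum

random.seed(123)

def one_sample_gini(n=500):
    """Draw one hypothetical weighted sample and return its Gini estimate."""
    incomes = [random.lognormvariate(10, 0.8) for _ in range(n)]
    weights = [random.uniform(0.5, 2.0) for _ in range(n)]  # made-up survey weights
    return weighted_gini(incomes, weights)

# Repeat the sampling many times to approximate the sampling distribution
estimates = [one_sample_gini() for _ in range(200)]
gini_mean = statistics.mean(estimates)
gini_sd = statistics.stdev(estimates)

print(f"Mean of estimates: {gini_mean:.3f}, standard deviation: {gini_sd:.3f}")
```

A useful sanity check on any weighted implementation is that integer weights give the same answer as physically repeating the rows, for example `weighted_gini([1, 5], [3, 1])` against `weighted_gini([1, 1, 1, 5], [1, 1, 1, 1])`.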
I play around with population-weighted income inequality of countries with data from the World Development Indicators, re-creating (with some amendments) some graphics from Branko Milanovic's recent book "Global Inequality".
I explore the demographic characteristics of who voted (and who didn't), out of people on the electoral roll, in the 2014 New Zealand general election. I use multiple imputation and a generalized linear model with a quasibinomial response. The people who vote tend to have characteristics associated with doing OK out of society (owning a home, having a partner, university qualifications, etc.).
I revisit the state space model of Labor party vote leading up to the 2007 Australian election. Re-thinking total survey error in the context of polling data leads to a more stable, less wiggly underlying state of voting intention. Also, vectorization in Stan leads to much faster estimation.
I look at the interaction between deprivation, being Māori, and family violence, combining data from the New Zealand census, the New Zealand index of deprivation, and the Family Violence Death Review Committee.