Peter's stats stuff
I write about applications of data and analytical techniques like statistical modelling and simulation to real-world situations. I show how to access and use data, and provide examples of analytical products and the code that produced them.
http://ellisp.github.io

Luke-warm about micromaps
Linked micromaps are an OK way of presenting data and are probably the right tool in some circumstances, but they're not as cool as I thought they might be.
Sun, 30 Apr 2017 00:00:00 +1200
http://ellisp.github.io/blog/2017/04/30/micromaps

More cartograms of New Zealand census data (district and city level)!
Shapefiles for cartograms by New Zealand Territorial Authority (i.e. District or City), with area proportional to population in 2013, have been added to the nzcensus package on GitHub.
Tue, 25 Apr 2017 00:00:00 +1200
http://ellisp.github.io/blog/2017/04/25/more-cartograms

Cartograms of New Zealand census data
Choropleth maps use fill colour to show densities, proportions and growth rates by political or economic boundaries, but can be visually problematic when large geographic areas represent few people, or small areas (i.e. cities) represent many. One solution is a cartogram, and I have a go at using them to present New Zealand census data in this post and accompanying shiny app.
Sun, 23 Apr 2017 00:00:00 +1200
http://ellisp.github.io/blog/2017/04/23/cartograms

Impact of omitted variables on estimating causal effects - simulations
It's much more important to get a well-specified model than to worry about propensity score matching versus weighting, or either versus single-stage regression, or increasing sample size. A regression that includes all 100 "true" explanatory variables with only 500 observations performs better in estimating a treatment effect than any of those methods when only 90 of the 100 variables are observed, even with 100,000 observations.
Sat, 15 Apr 2017 00:00:00 +1200
http://ellisp.github.io/blog/2017/04/15/propensity-simulations
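
The omitted-variable point in the summary above can be illustrated with a much smaller toy simulation. This is only a sketch under stated assumptions, not the post's code: it uses one confounder rather than 100 variables, and the function names (`ols`, `simulate`), the coefficients and the seed are all invented for illustration. It compares a regression that includes the confounder with one that leaves it out.

```python
import random


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate(n=5000, true_effect=2.0, seed=42):
    """Return the treatment coefficient with and without the confounder."""
    rng = random.Random(seed)
    full, short, y = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                           # confounder
        t = 1.0 if z + rng.gauss(0, 1) > 0 else 0.0   # treatment depends on z
        y.append(true_effect * t + 3.0 * z + rng.gauss(0, 1))
        full.append([1.0, t, z])   # intercept, treatment, confounder
        short.append([1.0, t])     # confounder omitted
    return ols(full, y)[1], ols(short, y)[1]


est_full, est_short = simulate()
print(f"with confounder:    {est_full:.2f}")
print(f"confounder omitted: {est_short:.2f}")
```

In this toy setup the well-specified regression should land near the assumed true effect of 2, while the regression that omits the confounder is biased upwards, and collecting more observations does not remove that bias.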

Exploring propensity score matching and weighting
Compared to the older-style propensity score matching to create a pseudo control sample, it may be better to weight the full data by inverse propensity score because it doesn't discard data. Performing a regression (rather than simple cross tabs) after the weighting or matching is a good idea to handle inevitable imperfections. The whole family of methods doesn't necessarily deliver big gains over more straightforward single-stage regressions. And if you have omitted material variables you're in trouble whatever you do.
Sun, 09 Apr 2017 00:00:00 +1200
http://ellisp.github.io/blog/2017/04/09/propensity-v-regression
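
As a rough illustration of the weighting idea in the summary above, the sketch below estimates a treatment effect by inverse propensity weighting on simulated data. It is not the post's code. Everything here is an assumption made for the example: a single binary confounder `z`, propensities estimated as the observed treated share within each level of `z`, the hypothetical function names, and the chosen effect sizes and seed.

```python
import random


def simulate(n=5000, true_effect=1.0, seed=1):
    """Simulate (z, t, y) where z drives both treatment and outcome."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        p_treat = 0.7 if z else 0.2          # treatment more likely when z = 1
        t = 1 if rng.random() < p_treat else 0
        y = true_effect * t + 2.0 * z + rng.gauss(0, 1)
        data.append((z, t, y))
    return data


def naive_difference(data):
    """Unadjusted difference in mean outcome, treated minus control."""
    treated = [y for _, t, y in data if t]
    control = [y for _, t, y in data if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_estimate(data):
    """Weighted difference in means using inverse propensity weights."""
    # Estimated propensity: share of treated units within each level of z.
    share = {}
    for level in (0, 1):
        group = [t for z, t, _ in data if z == level]
        share[level] = sum(group) / len(group)
    num1 = den1 = num0 = den0 = 0.0
    for z, t, y in data:
        p = share[z]
        if t:
            w = 1.0 / p            # treated units weighted by 1 / p
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - p)    # control units weighted by 1 / (1 - p)
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0


data = simulate()
naive = naive_difference(data)
ipw = ipw_estimate(data)
print(f"naive difference: {naive:.2f}")
print(f"IPW estimate:     {ipw:.2f}")
```

Every observation contributes to the weighted estimate, which is the sense in which weighting "doesn't discard data". The adjustment only works here because `z` is observed; leave it out of the propensity model and the weighted estimate would be as biased as the naive one.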

New Zealand election forecasts
My New Zealand Election Forecasts web page is up, and I have some reflections on election day randomness and on quality control.
Sun, 26 Mar 2017 00:00:00 +1300
http://ellisp.github.io/blog/2017/03/26/election-forecasts

House effects in New Zealand voting intention polls
I use generalized additive models to explore "house effects" (i.e. statistical bias) in polling firms' estimates of vote in previous New Zealand elections.
Tue, 21 Mar 2017 00:00:00 +1300
http://ellisp.github.io/blog/2017/03/21/house-effects

Simulations to explore excessive lagged X variables in time series modelling
Adding lots of lagged explanatory variables to a time series model without enough data points is a trap, and stepwise selection doesn't help. The lasso or other regularization might be a promising alternative.
Sun, 12 Mar 2017 00:00:00 +1300
http://ellisp.github.io/blog/2017/03/12/stepwise-timeseries

New data and functions in nzelect 0.3.0 R package
Version 0.3.0 of the nzelect R package, now on CRAN, includes historical polling data and a few convenience functions.
Sat, 11 Mar 2017 00:00:00 +1300
http://ellisp.github.io/blog/2017/03/11/nzelect-0.3.0

Visualising relationships between children's books
Statistical methods like hierarchical clustering and principal components analysis can help to understand and visualise literary concepts, but they don't replace reading the books and engaging with them in traditional critical ways!
Sat, 04 Mar 2017 00:00:00 +1300
http://ellisp.github.io/blog/2017/03/04/childrens-books