Peter's stats stuff
I write about applications of data and analytical techniques like statistical modelling and simulation to real-world situations. I show how to access and use data, and provide examples of analytical products and the code that produced them.
http://ellisp.github.io

House effects in New Zealand voting intention polls
I use generalized additive models to explore "house effects" (ie statistical bias) in polling firms' estimates of vote in previous New Zealand elections.
Tue, 21 Mar 2017 00:00:00 +1300
http://ellisp.github.io/blog/2017/03/21/house-effects

Simulations to explore excessive lagged X variables in time series modelling
Adding lots of lagged explanatory variables to a time series model without enough data points is a trap, and stepwise-selection doesn't help. The lasso or other regularization might be a promising alternative.
Sun, 12 Mar 2017 00:00:00 +1300
http://ellisp.github.io/blog/2017/03/12/stepwise-timeseries

New data and functions in nzelect 0.3.0 R package
Version 0.3.0 of the nzelect R package, now on CRAN, includes historical polling data and a few convenience functions.
Sat, 11 Mar 2017 00:00:00 +1300
http://ellisp.github.io/blog/2017/03/11/nzelect-0.3.0

Visualising relationships between children's books
Statistical methods like hierarchical clustering and principal components analysis can help understand and visualise literary concepts but don't replace reading the books and engaging with them in traditional critical ways!
Sat, 04 Mar 2017 00:00:00 +1300
http://ellisp.github.io/blog/2017/03/04/childrens-books

Success rates of appeals to the Supreme Court by Circuit
It's important to use the correct denominator when considering performance. While a high percentage (more than 50%) of decisions from the US appeal circuit courts that get all the way to the Supreme Court are overturned, this is only a tiny proportion of total appeals decided by the lower courts.
Sun, 26 Feb 2017 00:00:00 +1300
http://ellisp.github.io/blog/2017/02/26/appeal-circuits

Moving largish data from R to H2O - spam detection with Enron emails
I finally solve my problem of writing large sparse matrices from R into SVMLight format for importing to H2O; and demonstrate application with spam detection trained on the Enron email data comparing a generalized linear model, random forest, gradient boosting machine, and deep neural network.
Sat, 18 Feb 2017 00:00:00 +1300
http://ellisp.github.io/blog/2017/02/18/svmlite

US Presidential inauguration speeches
I do some basic textual analysis and visualization with US Presidential inauguration speeches.
Mon, 23 Jan 2017 00:00:00 +1300
http://ellisp.github.io/blog/2017/01/23/inaugural-speeches

Does seasonally adjusting first help forecasting?
I test some forecasting models on nearly 3,000 seasonal timeseries to see if it's better to seasonally adjust first or to incorporate the seasonality into the model used for forecasting. Turns out it is marginally better to seasonally adjust beforehand when using an ARIMA model and it doesn't matter with exponential smoothing state space models. Automated use of Box-Cox transformations also makes forecasts with these test series slightly worse. The average effects were very small, and dwarfed by different performance on different domains and frequency of data.
Sun, 22 Jan 2017 00:00:00 +1300
http://ellisp.github.io/blog/2017/01/22/forecast-seasadj-lambda

Books I like
My ten recommended books for applied statistics and data science. Then 13 more!
Sat, 14 Jan 2017 00:00:00 +1300
http://ellisp.github.io/blog/2017/01/14/books

Cross-validation of topic modelling
Cross-validation of the "perplexity" from a topic model, to help determine a good number of topics.
Thu, 05 Jan 2017 00:00:00 +1300
