Applying statistics and data science 'in the wild'
I write about applications of data and analytical techniques like statistical modelling and simulation to real-world situations. I show how to access and use data, and provide examples of analytical products and the code that produced them.
Tree-based predictive analytics methods like random forests and extreme gradient boosting may perform poorly with data that lies outside the range of the original training data.
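To make the extrapolation problem concrete, here is a minimal sketch in plain Python rather than R. The toy one-dimensional regression tree (`fit_tree`, `predict`) and the data are hypothetical illustrations, not the code from the post. Because every leaf predicts the mean of the training responses that fell into it, no prediction can exceed the largest response seen in training, however far outside the training range the new input is.

```python
from statistics import mean

def fit_tree(xs, ys, depth=3, min_leaf=2):
    """Fit a tiny 1-D regression tree; returns a float leaf or (threshold, left, right)."""
    if depth == 0 or len(xs) < 2 * min_leaf:
        return mean(ys)
    pairs = sorted(zip(xs, ys))
    best = None
    # Try every split that leaves at least min_leaf points on each side.
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        sse = (sum((y - mean(left)) ** 2 for y in left)
               + sum((y - mean(right)) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, i)
    if best is None:
        return mean(ys)
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    lx, ly = zip(*pairs[:i])
    rx, ry = zip(*pairs[i:])
    return (threshold,
            fit_tree(lx, ly, depth - 1, min_leaf),
            fit_tree(rx, ry, depth - 1, min_leaf))

def predict(tree, x):
    """Walk down the tree until a leaf (a plain number) is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree

# Toy training data: a perfectly linear trend on x = 0..19.
xs = [float(x) for x in range(20)]
ys = [2.0 * x for x in xs]
tree = fit_tree(xs, ys)

inside = predict(tree, 19.0)     # largest x seen in training
outside = predict(tree, 100.0)   # far outside the training range
```

Here `outside` equals `inside`: the tree returns the value of its right-most leaf for any input beyond the training range, instead of continuing the linear trend. Ensembles of trees average or sum such leaf values, so they inherit the same ceiling.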
A quick demonstration of the impact of inevitably random estimates of the parameters and meta-parameters in ARIMA time series modelling.
I check out exponential smoothing state space models for univariate time series as a general family of forecasting models, and in particular the `ets`, `stlm` and `thetaf` functions from Hyndman's forecast R package. For monthly and quarterly seasonal data, `thetaf` seems to be slightly outperformed by its more flexible and general cousins.
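For readers new to the family, a minimal sketch of simple exponential smoothing, its most basic member, may help. It is written in plain Python rather than R, and the function name `simple_exp_smooth` and the fixed `alpha` are assumptions for illustration. Functions like `ets` go well beyond this by estimating the smoothing parameters and choosing among model forms with trend and seasonality, none of which is attempted here.

```python
def simple_exp_smooth(series, alpha):
    """Return the final smoothed level, which is also the flat forecast
    for every future period under simple exponential smoothing.

    alpha is the smoothing weight in [0, 1]; it is fixed by hand here,
    whereas a real forecasting routine would estimate it from the data.
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    level = series[0]  # initialise the level at the first observation
    for y in series[1:]:
        # New level = weighted average of the latest value and the old level.
        level = alpha * y + (1 - alpha) * level
    return level

# Hypothetical quarterly-style data, for illustration only.
observations = [10.0, 12.0, 11.0, 13.0]
forecast = simple_exp_smooth(observations, alpha=0.5)
```

With `alpha` close to 1 the forecast tracks the most recent observation; with `alpha` close to 0 it barely moves from the starting level.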
Adding a (small amount of) polish to a well known chart of seasonal Arctic sea ice declining over the years.
I look more into this business of energy from earthquakes.
I polish up a dramatic pie chart from stuff.co.nz on earthquake energy released in New Zealand over the last few years.
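As background to the two earthquake posts, the sketch below uses the standard Gutenberg-Richter energy relation, log10(E) = 4.8 + 1.5 M with E in joules, to show why a few large earthquakes dominate any chart of total energy released. This is an assumption about the general approach, written in plain Python, and not necessarily the calculation used by stuff.co.nz or in the posts; the function name and the magnitudes are hypothetical.

```python
import math

def energy_joules(magnitude):
    """Approximate radiated seismic energy in joules for a given magnitude,
    using the Gutenberg-Richter relation log10(E) = 4.8 + 1.5 * M."""
    return 10 ** (4.8 + 1.5 * magnitude)

# Hypothetical magnitudes, for illustration only.
magnitudes = [4.0, 5.0, 6.0, 7.0]
energies = [energy_joules(m) for m in magnitudes]

# Share of the total energy contributed by each event.
total = sum(energies)
shares = [e / total for e in energies]

# One whole unit of magnitude multiplies energy by 10 ** 1.5 (about 31.6).
ratio_per_unit = energy_joules(6.0) / energy_joules(5.0)
```

Because each extra unit of magnitude multiplies the energy by about 31.6, the single largest event in `magnitudes` accounts for most of `total`.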
I'm working on a new R package to make it easier to forecast time series with the xgboost machine learning algorithm. So far in tests against large competition data collections (thousands of time series), it performs comparably to the nnetar neural network method, but not as well as more traditional time series methods like auto.arima and theta.
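One common way to hand a time series to a general-purpose learner such as xgboost is to turn it into a table of lagged values, so that each row predicts one observation from the few before it. The sketch below, in plain Python, assumes that approach; it is not taken from the package, and `make_lag_matrix` and `n_lags` are hypothetical names.

```python
def make_lag_matrix(series, n_lags):
    """Turn a univariate series into (features, targets) for supervised learning.

    Each feature row holds the n_lags values immediately before the target,
    ordered oldest first.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    if len(series) <= n_lags:
        raise ValueError("series must be longer than n_lags")
    features = []
    targets = []
    for t in range(n_lags, len(series)):
        features.append(list(series[t - n_lags:t]))  # the preceding n_lags values
        targets.append(series[t])                    # the value to predict
    return features, targets

# Hypothetical series, for illustration only.
X, y = make_lag_matrix([1, 2, 3, 4, 5], n_lags=2)
```

The resulting `X` and `y` could be passed to any regression learner. Forecasting more than one step ahead then means feeding each prediction back in as a lag for the next.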
I have a quick look at the polling data used by the FiveThirtyEight website in predicting the US presidential election results.
A new R package `Tcomp` makes data from the 2010 tourism forecasting competition available in a format designed to facilitate the fitting and testing of en masse automated forecasts, consistent with the M1 and M3 forecasting competition data in the `Mcomp` R package.
Statistics New Zealand recently launched experimental access to some of their data over the web via an application programming interface; it can be accessed easily via the equally experimental statsNZ R package by Jonathan Marshall.