International Household Income Inequality data

30 Jun 2016

At a glance:

I explore the University of Texas Inequality Project's Estimated Household Income Inequality data, which provides modelled estimates of inequality for more than 150 countries from 1963 to 2008.

I’m at the New Zealand Association of Economists annual conference in Auckland. The opening keynote speech was from James K. Galbraith on a global view of inequality. He showed a variety of results from the University of Texas Inequality Project’s Estimated Household Income Inequality dataset, which I hadn’t realised existed before. It’s the result of a patient and painstaking effort to make the most internationally comparable estimate possible of household inequality, and involves modelling when needed to create predicted inequality based on the best indicators available.

The data are also online, with a whole bunch of supporting material. Well done Professor Galbraith and University of Texas! Here’s a taster view.

First, download the data, bring it into R, and tidy it up from its wide format into a more analysis-friendly tidy or normalised form:

library(dplyr)
library(tidyr)
devtools::install_github("hadley/ggplot2") # dev version needed for subtitle and caption
library(ggplot2) 
library(scales)
library(openxlsx)
library(ggseas)


download.file("http://utip.lbj.utexas.edu/data/EHII-UPDATED-10-30-2013.xlsx",
              destfile = "ehii.xlsx", mode = "wb")

ehii <- read.xlsx("ehii.xlsx")[ , -1] # don't need the first column

ehii_tidy <- ehii %>%
   gather(Year, Gini, -Country, -Code) %>%
   mutate(Year = as.numeric(Year))

Let’s take a first look

ggplot(ehii_tidy, aes(x = Year, y = Gini, colour = Country)) +
   geom_line() +
   theme(legend.position = "none")

plain

OK, lots of lovely data, not a terribly attractive plot. Not informative either, having chopped off the legend. We should be able to do better than that.

One thing of interest might be which countries have seen the biggest changes over time. Restricting ourselves to just countries with data in 1963 (to make comparison valid), let’s have a go:

gini-growth

Here’s the code that constructed that plot:

full_countries <- ehii_tidy %>%
   filter(Year = min(Year) & !is.na(Gini))

final_result <- ehii_tidy %>%
   filter(Country %in% full_countries$Country & !is.na(Gini)) %>%
   group_by(Country) %>%
   mutate(Gini_index = Gini / Gini[1] * 100) %>%
   filter(Year == max(Year)) %>%
   mutate(Year = Year + 1,
          label = paste(Code, round(Gini)))

ehii_tidy %>%
   filter(Country %in% full_countries$Country) %>%
   ggplot(aes(x = Year, y = Gini, colour = Country)) +
   stat_index(index.ref = 1, alpha = 0.3) +
   theme(legend.position = "none") +
   geom_text(data = final_result, aes(label = label, y = Gini_index)) +
   ggtitle("Relative changes in inequality since 1963",
           subtitle = "(for countries with data from 1963)") +
   labs(y = "Index of Gini coefficient, set to be 100 in 1963",
        x = "", caption = "UTIP Estimated Household Income Inequality dataset") +
   xlim(1960, 2012) +
   annotate("text", x = 1977, y = 150, 
            label = "Country codes appear at their final data point; 
numbers are the latest available Gini coefficient")

This is obviously just the beginning. The countries have their ISO 3 character codes, which will make it easy to join them with other data for analysis. Maps are an obvious presentation step too, and Galbraith’s team look to make extensive use of the data for this purpose. Looking forward to a closer look when I’ve got more time.