Statistical graphics for communicating

Peter Ellis

June 2016

Today’s content

  • Different purposes of graphics
  • What makes for graphical excellence
  • Improving graphics

Purposes of graphics

Data science workflow

Grolemund and Wickham, http://r4ds.had.co.nz/introduction.html

Different purposes

…exploratory…

…analysis and diagnosis…

…presentation…

Comprehend this:

data(anscombe)
anscombe[ , c(1,5,2,6,3,7,4,8)]
##    x1    y1 x2   y2 x3    y3 x4    y4
## 1  10  8.04 10 9.14 10  7.46  8  6.58
## 2   8  6.95  8 8.14  8  6.77  8  5.76
## 3  13  7.58 13 8.74 13 12.74  8  7.71
## 4   9  8.81  9 8.77  9  7.11  8  8.84
## 5  11  8.33 11 9.26 11  7.81  8  8.47
## 6  14  9.96 14 8.10 14  8.84  8  7.04
## 7   6  7.24  6 6.13  6  6.08  8  5.25
## 8   4  4.26  4 3.10  4  5.39 19 12.50
## 9  12 10.84 12 9.13 12  8.15  8  5.56
## 10  7  4.82  7 7.26  7  6.42  8  7.91
## 11  5  5.68  5 4.74  5  5.73  8  6.89

compared to:
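The four Anscombe data sets have near-identical means and regression lines, which only a plot makes easy to tell apart. A minimal sketch of such a plot, assuming ggplot2 is installed; the object names `anscombe_long` and `p` are illustrative and not from the original slides:

```r
library(ggplot2)

data(anscombe)

# Reshape the wide table into long form: one row per (set, x, y) observation
anscombe_long <- data.frame(
  set = rep(paste("Set", 1:4), each = nrow(anscombe)),
  x   = unlist(anscombe[, paste0("x", 1:4)], use.names = FALSE),
  y   = unlist(anscombe[, paste0("y", 1:4)], use.names = FALSE)
)

# One small panel per data set, each with its own least-squares line
p <- ggplot(anscombe_long, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~set)

print(p)
```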

Put the data in its place

Use during analysis

Present results

Compare to

Dependent variable: MedianIncome

Variable                      Estimate     (Std. error)
MeanBedrooms                   0.012       (0.011)
PropPrivateDwellings           0.650***    (0.111)
PropSeparateHouse             -0.148***    (0.025)
PropMultiPersonHH             -0.082       (0.105)
PropNotOwnedHH                 0.170***    (0.033)
MedianRentHH                   0.0002***   (0.00003)
PropLandlordPublic            -0.014       (0.018)
PropNoMotorVehicle            -0.274***    (0.067)
PropOld                        0.490***    (0.073)
PropAreChildren                0.221***    (0.075)
PropSameResidence5YearsAgo    -0.053*      (0.029)
PropOverseas5YearsAgo         -0.501***    (0.087)
PropMaori                     -0.074***    (0.028)
PropPacific                   -0.249***    (0.041)
PropAsian                     -0.296***    (0.031)
PropNoReligion                -0.137***    (0.035)
PropSmoker                     0.064       (0.068)
PropSeparated                 -0.318***    (0.075)
PropDoctorate                  1.914***    (0.215)
PropPTStudent                  0.334*      (0.184)
PropUnemploymentBenefit       -0.211       (0.172)
PropStudentAllowance          -1.959***    (0.189)
PropFullTimeEmployed           1.674***    (0.056)
PropPartTimeEmployed           0.095       (0.107)
PropUnemployed                 0.058       (0.208)
PropEmployer                   0.876***    (0.073)
PropSelfEmployedNoEmployees   -0.333***    (0.053)
PropTrades                    -0.492***    (0.079)
PropLabourers                 -0.460***    (0.057)
PropAgForFish                  0.029       (0.034)
PropPubAdmin                   0.120**     (0.047)
PropFinServices                0.996***    (0.158)
PropProfServices               1.235***    (0.088)
PropWorked40_49hours           0.277***    (0.060)
PropPublicTransport           -0.005       (0.057)
PropWalkJogBike               -0.076*      (0.043)
PropNoUnpaidActivities        -0.842***    (0.086)
Constant                       8.878***    (0.139)

Observations                   1,785

Note: *p<0.1; **p<0.05; ***p<0.01
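A table like this can usually be redrawn as a coefficient plot, with each estimate shown as a point and its confidence interval as a line. The sketch below is only an illustration of that idea: it fits a stand-in model to the built-in `mtcars` data, not the census model above, and all object names are hypothetical.

```r
library(ggplot2)

# Stand-in data: standardise the predictors so coefficients are comparable
d <- data.frame(mpg = mtcars$mpg,
                scale(mtcars[, c("wt", "hp", "qsec", "drat")]))
mod <- lm(mpg ~ wt + hp + qsec + drat, data = d)

ci <- confint(mod)  # 95% confidence intervals, one row per term
coefs <- data.frame(
  term     = names(coef(mod)),
  estimate = unname(coef(mod)),
  lower    = ci[, 1],
  upper    = ci[, 2]
)
coefs <- coefs[coefs$term != "(Intercept)", ]

# Order terms by estimate so that position on the page carries meaning
coefs$term <- reorder(coefs$term, coefs$estimate)

p <- ggplot(coefs, aes(x = estimate, y = term)) +
  geom_vline(xintercept = 0, colour = "grey60") +
  geom_segment(aes(x = lower, xend = upper, y = term, yend = term)) +
  geom_point() +
  labs(x = "Estimate with 95% confidence interval", y = NULL)

print(p)
```

Intervals that cross the grey zero line play the role of the unstarred rows in the table.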

Illustrate concepts

Graphic excellence

Principles

  • well-designed presentation of interesting data - substance, statistics, and design
  • complex ideas communicated with clarity, precision, and efficiency
  • greatest number of ideas in the shortest time with the least ink in the smallest space
  • nearly always multivariate
  • telling the truth about the data

Adapted from Tufte

Some specifics

  • Comparative
  • Multivariate
  • High data density
  • Reveal interactions and comparisons
  • Nearly all the ink is data ink

Examples

Change this…

…to this:

This is good

But this is better

This is good

But this is better

More detailed examples

Perception of quantity

From best to worst

  1. Position
  2. Length
  3. Area
  4. Volume
  5. Angle and slope
  6. Colour and density

Typical stacked bars…

Orient for easy reading

Sequential colours

Diverging scale

Use position

Much better than

Cluttered

Minimal axis guides

Fade axis title

Remove borders

Remove boxes

Guidelines to back

Background to back

Consistent doc theme

Consistent font

Corporate colours

Direct labels

Much better than:

Principles

  • Remove all unnecessary ink
  • Focus on the data

Example

Original

User-friendly labels

Horizontal text

Meaningful ordering

Better shape and geom

Labels on points

Title and annotation

Another dimension

Better than:

More principles

  • Use order / position on page
  • As multivariate and comparative as possible
  • Choose right geom to make comparison easy
  • Use colour to make comparison easy
  • Avoid strobing and similar unfortunate effects
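Meaningful ordering and horizontal text often come down to two choices: sort the categories by the value being compared, and put the category labels on the vertical axis. A small sketch, again with `mtcars` as stand-in data and illustrative names:

```r
library(ggplot2)

d <- data.frame(car = rownames(mtcars), mpg = mtcars$mpg)

# Sort the factor levels by mpg rather than alphabetically
d$car <- reorder(d$car, d$mpg)

# Categories on the y axis keep the labels horizontal and readable
p <- ggplot(d, aes(x = mpg, y = car)) +
  geom_point() +
  labs(x = "Miles per gallon", y = NULL)

print(p)
```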

Improvement example 3

Difficult

Use cartesian coordinates

Use height

Flip for readability

Sequence

Maximise focus on data

Labels near the data

Use like a table

Better than

More principles

  • Don’t rely on angle and slope - use position instead
  • Minimise non-data ink
  • Subtle colours to focus on the data
  • Make matching data to labels easy

Statistical transformations

Not just this

But this

Or this

Last set of tips

  • Don’t be afraid of using a statistical transform to make the data meaningful
  • Discreet annotations that don’t take the foreground from the data
  • Lots of subtlety in colour (shades of grey) to allow focus on data
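As one possible illustration of a statistical transform, the sketch below turns a raw count into a share of population and adds a smoothed trend, with the original series kept in light grey. It uses the `economics` data set that ships with ggplot2 as a stand-in; the derived column `unemp_share` is the unemployed as a share of the total population, not an official unemployment rate.

```r
library(ggplot2)

econ <- as.data.frame(economics)

# Raw series: number of unemployed (thousands)
p_raw <- ggplot(econ, aes(x = date, y = unemploy)) +
  geom_line()

# Transformed: share of population, which is comparable over time
econ$unemp_share <- econ$unemploy / econ$pop

p_share <- ggplot(econ, aes(x = date, y = unemp_share)) +
  geom_line(colour = "grey70") +                          # data, kept subtle
  geom_smooth(method = "loess", formula = y ~ x,
              span = 0.3, se = FALSE) +                   # smoothed trend
  scale_y_continuous(labels = scales::percent) +
  labs(x = NULL, y = "Unemployed as share of population")

print(p_share)
```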

Final word

  • Comparative
  • Multivariate
  • High data density
  • Reveal interactions and comparisons
  • Nearly all the ink is data ink
  • All attention to the data and to the story!
