Effective Data Visualisation with R
Visual Accuracy and Capacity

Paul Murrell
The University of Auckland

Review

A data visualisation consists of data symbols, guides, and labels.
A data visualisation can help to answer questions.
- An effective data visualisation will pose questions that the visual system is good at answering.
We need to choose a mapping from data values to data symbols.
- An effective data visualisation will have good mappings from data to data symbols.

Visual Features

The combination of geom and aesthetic that we choose maps data values to a visual feature.
Visual features are the properties of data symbols that our visual system identifies very rapidly, in parallel, and without conscious effort.
Our task is to identify a good mapping from data values to visual features.

Effective Data Visualisation

We can select geoms and aesthetics so that data values are mapped to visual features that are perceived more accurately.
We can select geoms and aesthetics so that data values are mapped to visual features that have the capacity to represent all data values.

Visual Accuracy and Capacity

Quantitative accuracy
Quantitative caveats
Qualitative capacity
Visual Congruence

Quantitative Accuracy

For quantitative comparisons, we need to be able to perceive the size of differences between values.
It is important that we can accurately perceive the size of differences and ratios.

Perception is Nonlinear

Stevens’ power law suggests that the perceived magnitude of a stimulus is a power function of its physical magnitude.
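A small base R sketch of what a power function means in practice. The exponents are commonly quoted approximate values for Stevens' law (length about 1.0, area about 0.7, brightness about 0.33). They are an assumption here and do not come from these slides.

```r
# Stevens' power law: perceived magnitude P = k * S^a
# Exponents are commonly quoted approximations (assumed, not from the slides)
exponents <- c(length=1.0, area=0.7, brightness=0.33)

# Perceived magnitude of a stimulus of physical size S (k is a scaling constant)
perceived <- function(S, a, k=1) {
    k * S^a
}

# How much bigger does a stimulus appear when its physical size is doubled?
apparentRatio <- perceived(2, exponents) / perceived(1, exponents)
round(apparentRatio, 2)
# length 2.00, area 1.62, brightness 1.26
```

With an exponent near 1, doubling a length looks like doubling. Doubling an area looks like an increase of only about 60%, and doubling brightness looks like an increase of only about 25%.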

Perception is Nonlinear

For each set of symbols, how much larger (or brighter) are the larger (or brighter) symbols?

Experimental studies have established an agreed ordering of visual dimensions in terms of how accurately they are perceived.

An effective data visualisation will map quantitative data to length or position.

We can accurately perceive distances and ratios when data values are mapped to the length of bars or the position of points.

Accurate Features in {ggplot2}

library(ggplot2)

ggplot(crimeLevelTotal) +
    geom_col(aes(x=total, y=level))

Accurate Features in {ggplot2}

ggplot(crimeLevelTotal) + 
    geom_point(aes(x=total, y=level))

Quantitative Caveats

Unaligned Position and Length

Comparing two positions or lengths is less effective when they are on unaligned scales.
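Stacking shows why lengths become unaligned. This base R sketch uses the crimeLevelTotal totals printed in the Part-to-Whole slides. It computes where each segment of a single stacked bar would start, compared with ordinary bars that all start at zero.

```r
# Totals copied from the crimeLevelTotal tibble printed in these slides
level <- c("Low", "Low-Medium", "Medium", "Medium-High", "High")
total <- c(24242, 21513, 13823, 18315, 7793)

# Aligned bars: every bar starts at zero
alignedStart <- rep(0, length(total))

# One stacked bar: each segment starts where the previous one ends
stackedEnd <- cumsum(total)
stackedStart <- stackedEnd - total

data.frame(level, total, alignedStart, stackedStart, stackedEnd)
```

In the stacked bar, only the first segment shares the zero baseline. Every other segment has to be compared as a length on an unaligned scale.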

Unaligned Length

The Distance Effect

Comparing two positions or lengths is less effective when there is greater separation.

The Distance Effect

Which is higher: Northland or Tasman?

Ordering the bars in a bar plot can help to make comparisons more accurate.

Ordering Bars in {ggplot2}

crimeDistrictTotal$districtOrdered <- 
    with(crimeDistrictTotal, reorder(district, total))

ggplot(crimeDistrictTotal) +
    geom_col(aes(x=total, y=districtOrdered))
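The crimeDistrictTotal data are not shown, so this sketch applies the same reorder() call to the crimeLevelTotal totals instead. reorder() is base R (stats). It sets the factor levels in increasing order of a summary of the second argument, which is the mean by default. ggplot2 draws the first level at the bottom of a discrete y axis, so after reordering the longest bar ends up at the top.

```r
# Totals from the crimeLevelTotal tibble shown in these slides
level <- c("Low", "Low-Medium", "Medium", "Medium-High", "High")
total <- c(24242, 21513, 13823, 18315, 7793)

# Levels are sorted by total, smallest first
levelOrdered <- reorder(level, total)
levels(levelOrdered)
# "High" "Medium" "Medium-High" "Low-Medium" "Low"
```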

Non-Linear Mappings

Non-linear scales can be useful for making differences visible when values span several orders of magnitude.

Non-Linear Mappings

The accuracy of position and length only holds for linear scales.
Using a non-linear scaling restricts these features to just ordinal comparisons.
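A short base R sketch of why differences in position stop tracking differences in value on a log scale. Equal distances on the scale correspond to equal ratios, so the same difference between values can produce very different distances.

```r
# On a log10 scale, the position of a value is proportional to log10(value)
values <- c(10, 100, 1000, 10000)
positions <- log10(values)
diff(positions)   # equal steps, because each value is 10 times the one before

# The same difference of 1000 gives very different distances on the scale
gapSmall <- log10(2000) - log10(1000)     # about 0.30
gapLarge <- log10(11000) - log10(10000)   # about 0.04
c(gapSmall, gapLarge)
```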

Non-Linear Mappings in {ggplot2}

The transform argument to scale_x_continuous() allows non-linear mappings (ggplot2 3.5.0 and later; earlier versions call this argument trans).

ggplot(crimeTypeOrdered) +
    geom_point(aes(x=total, y=type)) +
    scale_x_continuous(transform="log10")

Non-Linear Mappings in {ggplot2}

There are also convenience functions like scale_x_log10().

ggplot(crimeTypeOrdered) +
    geom_point(aes(x=total, y=type)) +
    scale_x_log10()

Ordinal Accuracy

If we only require ordinal comparisons for quantitative data, then even luminance and/or chroma may suffice.
Which level of crime is most common?
Which is least common?

Ordinal Accuracy

It may be necessary to fall back on luminance and/or chroma if position has already been used for other things.
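A ggplot2 sketch of falling back on luminance. Position (x) is already used for the crime level, so the total is shown only by the fill luminance of each tile, with darker meaning more. The data frame re-creates the crimeLevelTotal values printed in the Part-to-Whole slides. The two grey endpoints are an arbitrary choice.

```r
library(ggplot2)

# Re-create crimeLevelTotal from the values printed in these slides
levelNames <- c("Low", "Low-Medium", "Medium", "Medium-High", "High")
crimeLevelTotal <- data.frame(
    level=factor(levelNames, levels=levelNames, ordered=TRUE),
    total=c(24242L, 21513L, 13823L, 18315L, 7793L))

# Position (x) already shows the level, so total falls back to luminance
p <- ggplot(crimeLevelTotal) +
    geom_tile(aes(x=level, y=1, fill=total)) +
    # light grey for small totals, dark grey for large totals
    scale_fill_gradient(low="grey90", high="grey10")
p
```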

Qualitative Capacity

Capacity of Visual Features

For qualitative data, we only need to be able to perceive that two values are different.
Given N different categories, we require N different visual values that are clearly distinguishable from each other.

Colour Capacity

Colour (hue) is limited to between 5 and 7 distinct values.

Colour Capacity

The limit is even lower for chroma and luminance.

Shape Capacity

Shape is also limited to between 5 and 7 distinct values.
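ggplot2's default shape scale uses the scales package's shape_pal(), and that palette contains only six shapes. Asking it for more values produces a warning and missing shapes, so points in the extra categories are not drawn. The minimal sketch below assumes the scales package is installed.

```r
library(scales)

# The default (solid) shape palette used by ggplot2's shape scale
shapes <- shape_pal()(6)
shapes

# A seventh category gets no shape: the palette warns and returns NA
shapes7 <- suppressWarnings(shape_pal()(7))
shapes7
```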

Position Capacity

Position provides the greatest capacity, limited only by the physical space available.

Position Capacity

Using redundant mappings (for example, mapping the same variable to both position and colour) can be even more effective.
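A minimal ggplot2 sketch of a redundant mapping, re-creating crimeLevelTotal from the printed values. The crime level is mapped to both y position and colour, so either feature alone identifies the category.

```r
library(ggplot2)

levelNames <- c("Low", "Low-Medium", "Medium", "Medium-High", "High")
crimeLevelTotal <- data.frame(
    level=factor(levelNames, levels=levelNames, ordered=TRUE),
    total=c(24242L, 21513L, 13823L, 18315L, 7793L))

# level is mapped twice: to vertical position and to colour
p <- ggplot(crimeLevelTotal) +
    geom_point(aes(x=total, y=level, colour=level), size=3)
p
```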

Visual Congruence

congruent /ˈkɒŋɡrʊənt/ (adjective): in agreement or harmony.

Google

Effective Data Visualisation

We can select geoms and aesthetics so that data values are mapped to visual features that are congruent with the data.
Familiar connections between visual features and data values make it easier and faster to comprehend a data visualisation.

Tversky and Morrison (2002)

Part-to-Whole Data

A proportion represents a part of a whole.

crimeLevelTotal

## # A tibble: 5 × 3
##   level       total   prop
##   <ord>       <int>  <dbl>
## 1 Low         24242 0.283 
## 2 Low-Medium  21513 0.251 
## 3 Medium      13823 0.161 
## 4 Medium-High 18315 0.214 
## 5 High         7793 0.0909
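The prop column is each total divided by the overall total. This base R sketch reproduces it from the totals printed above and also converts each proportion to the angle it would occupy in a pie chart.

```r
# Totals from the crimeLevelTotal tibble above
total <- c(Low=24242, "Low-Medium"=21513, Medium=13823,
           "Medium-High"=18315, High=7793)

# Each proportion is a part of the whole, so the proportions sum to 1
prop <- total / sum(total)
round(prop, 3)

# Drawn as a pie chart, each part gets prop * 360 degrees of the circle
angle <- prop * 360
round(angle, 1)
```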

Part-to-Whole Data

An angle of a circle represents a part of a whole.

Part-to-Whole Data

A length of a rectangle can represent a part of a whole.
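A ggplot2 sketch of both part-to-whole encodings, re-creating crimeLevelTotal from the printed values. The pie chart is a single stacked bar bent into a circle with coord_polar(). The rectangle version leaves the same stacked bar straight, so each part is a length along the whole.

```r
library(ggplot2)

levelNames <- c("Low", "Low-Medium", "Medium", "Medium-High", "High")
crimeLevelTotal <- data.frame(
    level=factor(levelNames, levels=levelNames, ordered=TRUE),
    total=c(24242L, 21513L, 13823L, 18315L, 7793L))
crimeLevelTotal$prop <- crimeLevelTotal$total/sum(crimeLevelTotal$total)

# Angle: stack the proportions in one bar, then bend it into a circle
pieChart <- ggplot(crimeLevelTotal) +
    geom_col(aes(x="", y=prop, fill=level)) +
    coord_polar(theta="y")

# Length: the same stacked proportions, left as a single rectangle
barChart <- ggplot(crimeLevelTotal) +
    geom_col(aes(x=prop, y="Total", fill=level))

pieChart
barChart
```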

Cyclic Data

Dates are cyclic: for example, December is followed by January.

head(offencesByMonth)

##   month year Freq
## 1     1 2019    0
## 2     2 2019    0
## 3     3 2019    0
## 4     4 2019    0
## 5     5 2019    0
## 6     6 2019    0

Cyclic Data

Angles around a circle are cyclic.
Note that these are overall reports of crimes (not youth crime).
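A base R sketch of why angle suits months. Placing month m at (m - 1) * 30 degrees puts December next to January, whereas a linear month axis puts them at opposite ends. The circularGap() helper is hypothetical and written only for this illustration.

```r
month <- 1:12

# Place month m at (m - 1) * 30 degrees around a circle (January at 0)
angle <- (month - 1) * 360 / 12

# Separation on a circle is measured the shorter way round
circularGap <- function(a, b) {
    d <- abs(a - b) %% 360
    pmin(d, 360 - d)
}

circularGap(angle[12], angle[1])   # 30: December and January are neighbours
circularGap(angle[3], angle[4])    # 30: the same gap as any consecutive months
abs(month[12] - month[1])          # 11: on a linear axis they are far apart
```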

Negative Data

We have already seen that some visual features have no representation of zero (e.g., hue).
Features that can represent zero may still not be able to represent negative values.
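A ggplot2 sketch of data with negative values. The plotted quantity is hypothetical: each crime level's share of the total minus an equal share of 1/5, computed from the crimeLevelTotal values printed earlier. Bar length from zero can run in both directions. A diverging fill scale (blue below zero, white at zero, red above) gives colour an explicit zero point.

```r
library(ggplot2)

levelNames <- c("Low", "Low-Medium", "Medium", "Medium-High", "High")
crimeLevelTotal <- data.frame(
    level=factor(levelNames, levels=levelNames, ordered=TRUE),
    total=c(24242L, 21513L, 13823L, 18315L, 7793L))

# Hypothetical derived quantity: share of the total minus an equal share (1/5)
crimeLevelTotal$shareDiff <-
    crimeLevelTotal$total/sum(crimeLevelTotal$total) - 1/5

p <- ggplot(crimeLevelTotal) +
    # bars for negative values extend to the left of zero
    geom_col(aes(x=shareDiff, y=level, fill=shareDiff)) +
    # diverging scale: white at zero, blue below, red above
    scale_fill_gradient2(low="blue", mid="white", high="red", midpoint=0)
p
```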

Colour Congruence

Some hues are congruent with particular objects or concepts.
Blue is cold and red is hot.
Bananas are yellow.
Some associations are very culture-specific and/or era-specific.

More is More

A larger length or area should correspond to a larger data value.
A darker colour should correspond to a larger data value.
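A ggplot2 sketch of "more is more" for area. scale_size_area() makes point area proportional to the value and maps zero to zero size, so a larger total is always drawn as a larger point. The crimeLevelTotal values are re-created from the printed tibble, and max_size=10 is an arbitrary choice.

```r
library(ggplot2)

levelNames <- c("Low", "Low-Medium", "Medium", "Medium-High", "High")
crimeLevelTotal <- data.frame(
    level=factor(levelNames, levels=levelNames, ordered=TRUE),
    total=c(24242L, 21513L, 13823L, 18315L, 7793L))

# Point area is proportional to total, so bigger totals get bigger points
p <- ggplot(crimeLevelTotal) +
    geom_point(aes(x=total, y=level, size=total)) +
    scale_size_area(max_size=10)
p
```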

Summary

For quantitative data, accuracy matters.
- position and length are the most accurate,
  area and angle are worse, and
  volume, luminance and chroma are poor.
For qualitative data, capacity matters.
- colour and shape are limited to 5-7 different values.
- position has a large capacity.
Congruent visual features make it easier to comprehend a data visualisation.

Exercises

Exercise

We want to compare the total number of offenders for different ethnic groups.

crimeGroupTotal

## # A tibble: 3 × 4
##   group          total groupFactor    groupNumeric
##   <chr>          <int> <fct>                 <dbl>
## 1 European/Other 31274 European/Other            1
## 2 Māori          41787 Māori                     2
## 3 Pasifika        6993 Pasifika                  3

Exercise

Identify the mappings in the data visualisation below.
Write {ggplot2} code to draw the data visualisation.
Write {ggplot2} code to draw an improved version.

Exercise

We want to compare the number of clean breaks for countries from different hemispheres.

head(RWCperGame, 3)

##    country hemisphere yellowcards redcards cleanbreaks tackles points
## 11 Namibia      South        1.00      0.5        2.50  102.00   9.25
## 14 Romania      North        1.25      0.0        2.75  142.50   8.00
## 3    Chile      South        1.25      0.0        3.75  132.25   6.75
##    conversions offloads tries   runs
## 11        0.50     2.25  0.75  92.00
## 14        0.75     1.50  1.00  81.00
## 3         0.50     5.25  1.00 102.25

Exercise

Identify the mappings in the data visualisation below.
Write {ggplot2} code to draw the data visualisation.
Write {ggplot2} code to draw an improved version.

Exercise

Can you see anything wrong with this data visualisation?
