Effective Data Visualisation with R
Visual Accuracy and Capacity

Paul Murrell
The University of Auckland

Review

  • A data visualisation consists of data symbols, guides, and labels.

  • A data visualisation can help to answer questions.

    • An effective data visualisation will pose questions that the visual system is good at answering.
  • We need to choose a mapping from data values to data symbols.

    • An effective data visualisation will have good mappings from data to data symbols.

Visual Features

  • The combination of geom and aesthetic that we choose maps data values to a visual feature.

  • Visual features are the properties of data symbols that our visual system identifies very rapidly, in parallel, and without conscious effort.

  • Our task is to identify a good mapping from data values to visual features.

Effective Data Visualisation

  • We can select geoms and aesthetics so that data values are mapped to visual features that are perceived more accurately.

  • We can select geoms and aesthetics so that data values are mapped to visual features that have the capacity to represent all data values.

Visual Accuracy and Capacity

  • Quantitative accuracy

  • Quantitative caveats

  • Qualitative capacity

  • Visual congruence

Quantitative Accuracy

  • For quantitative comparisons, we need to be able to perceive the size of differences between values.

  • It is important that we can accurately perceive the size of differences and ratios.

Perception is Nonlinear

  • Stevens' power law suggests that the perceptual response to a stimulus is a power function of its physical magnitude.
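
As a rough illustration of what a power function implies, the sketch below computes the perceived ratio between two stimuli under a power law. The function name perceivedRatio and the exponent values are assumptions chosen for illustration, not measured values.

```r
# Perceived magnitude modelled as a power function of the
# physical magnitude: perceived = physical ^ exponent
# (any scaling constant cancels when taking a ratio).
perceivedRatio <- function(physicalRatio, exponent) {
    physicalRatio ^ exponent
}

# Illustrative exponents only (assumptions, not measurements)
perceivedRatio(2, exponent=1)    # linear response: the ratio is 2
perceivedRatio(2, exponent=0.7)  # compressed response: less than 2
```

With an exponent below 1, a symbol that is physically twice as large is perceived as less than twice as large, which is why the choice of visual feature matters for quantitative comparisons.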

Perception is Nonlinear

  • For each set of symbols, how much larger (or brighter) are the larger (or brighter) symbols?

  • Experimental studies have established an agreed ordering of visual dimensions in terms of accuracy.

An effective data visualisation will map quantitative data to length or position.

  • We can accurately perceive distances and ratios when data values are mapped to the length of bars or the position of points.

Accurate Features in {ggplot2}

ggplot(crimeLevelTotal) + 
    geom_col(aes(x=total, y=level))

Accurate Features in {ggplot2}

ggplot(crimeLevelTotal) + 
    geom_point(aes(x=total, y=level))
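
The two plots above assume a data frame called crimeLevelTotal. The following is a minimal sketch that rebuilds it from the values printed later in these slides, so that both plots can be run. An ordinary data frame stands in for the tibble, and the names barPlot and pointPlot are only for illustration.

```r
library(ggplot2)

# Totals copied from the crimeLevelTotal printout in these slides
levelNames <- c("Low", "Low-Medium", "Medium", "Medium-High", "High")
crimeLevelTotal <- data.frame(
    level=factor(levelNames, levels=levelNames, ordered=TRUE),
    total=c(24242, 21513, 13823, 18315, 7793))

# Data values mapped to the length of bars
barPlot <- ggplot(crimeLevelTotal) +
    geom_col(aes(x=total, y=level))

# Data values mapped to the position of points
pointPlot <- ggplot(crimeLevelTotal) +
    geom_point(aes(x=total, y=level))
```

Printing barPlot or pointPlot draws the corresponding plot.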

Quantitative Caveats

Unaligned Position and Length

  • Comparing two positions or lengths is less effective when they are on unaligned scales.

Unaligned Length

The Distance Effect

  • Comparing two positions or lengths is less effective when there is greater separation.

The Distance Effect

  • Which is higher: Northland or Tasman?

Ordering the bars in a bar plot can help to make comparisons more accurate.

Ordering Bars in {ggplot2}

crimeDistrictTotal$districtOrdered <- 
    with(crimeDistrictTotal, reorder(district, total))
ggplot(crimeDistrictTotal) +
    geom_col(aes(x=total, y=districtOrdered))
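
To see what reorder() does on its own, here is a small sketch using a toy data frame. The totals are made up for illustration and are not the real district totals.

```r
# Hypothetical totals, for illustration only
toy <- data.frame(
    district=c("Northland", "Tasman", "Auckland"),
    total=c(30, 10, 50))

# Without reorder(), factor levels are alphabetical
levels(factor(toy$district))

# reorder() sorts the levels by the value of total (ascending)
toy$districtOrdered <- with(toy, reorder(district, total))
levels(toy$districtOrdered)
```

The reordered levels run from the smallest total to the largest, so bars with similar totals end up next to each other.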

Non-Linear Mappings

  • Non-linear scales can be useful to make differences visible.

Non-Linear Mappings

  • The accuracy of position and length only holds for linear scales.

  • Using a non-linear scaling restricts these features to just ordinal comparisons.
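
A small numeric sketch of why this happens, using a log10 scale as the example of a non-linear scaling. The values are arbitrary.

```r
values <- c(10, 100, 1000)

# Differences between neighbouring values on the original scale
diff(values)

# Distances between the same values on a log10 scale
diff(log10(values))
```

The two gaps are very different on the original scale but identical on the log10 scale, so equal distances in the plot no longer mean equal differences in the data. The order of the values is still preserved.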

Non-Linear Mappings in {ggplot2}

  • The transform argument to scale_x_continuous() allows non-linear mappings (in {ggplot2} versions before 3.5.0 the argument is called trans).

    ggplot(crimeTypeOrdered) +
        geom_point(aes(x=total, y=type)) +
        scale_x_continuous(transform="log10")

Non-Linear Mappings in {ggplot2}

  • There are also convenience functions like scale_x_log10().

    ggplot(crimeTypeOrdered) +
        geom_point(aes(x=total, y=type)) +
        scale_x_log10()

Ordinal Accuracy

  • If we only require ordinal comparisons for quantitative data, then even luminance and/or chroma may suffice.

  • Which level of crime is most common?
    Which is least common?

Ordinal Accuracy

  • It may be necessary to fall back on luminance and/or chroma if position has already been used for other things.
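
One way to sketch an ordinal mapping to luminance is shown below, using base R only. The totals are copied from the crimeLevelTotal printout in these slides; the hue, chroma, and luminance range are arbitrary choices made for illustration.

```r
# Totals copied from the crimeLevelTotal printout in these slides
total <- c("Low"=24242, "Low-Medium"=21513, "Medium"=13823,
           "Medium-High"=18315, "High"=7793)

# Rank the totals (1 = smallest) and give larger totals a lower
# luminance, i.e. a darker colour. Only the order is encoded.
luminance <- seq(90, 30, length.out=length(total))[rank(total)]
fill <- hcl(h=260, c=40, l=luminance)

# The darkest and lightest colours identify the most and
# least common levels
names(total)[which.min(luminance)]
names(total)[which.max(luminance)]
```

Because the colours are based on ranks, they support "which is most common?" questions but say nothing about how much more common one level is than another.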

Qualitative Capacity

Capacity of Visual Features

  • For qualitative data, we only need to be able to perceive that two values are different.

  • Given N different categories, we require N different visual values that are clearly distinguishable from each other.

Colour Capacity

  • Colour (hue) is limited to between 5 and 7 distinct values.

Colour Capacity

  • The limit is even lower for chroma and luminance.

Shape Capacity

  • Shape is also limited to between 5 and 7 distinct values.
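
A quick check of whether a variable fits within this capacity might look like the sketch below. The helper exceedsCapacity is hypothetical, and its default limit of 7 is simply the upper end of the range quoted in these slides.

```r
# Hypothetical helper: does a variable have more categories than
# a visual feature can comfortably show?
exceedsCapacity <- function(x, limit=7) {
    length(unique(x)) > limit
}

# The five crime levels fit within the limit
exceedsCapacity(c("Low", "Low-Medium", "Medium", "Medium-High", "High"))

# The twelve month names (a built-in R constant) do not
exceedsCapacity(month.name)
```

When the check fails for hue or shape, position is the feature to consider instead.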

Position Capacity

  • Position provides the greatest capacity, only limited by the physical space available.

Position Capacity

  • Using redundant mappings can be even more effective.
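
One possible reading of a redundant mapping is to map the same variable to two visual features at once. The sketch below maps the crime level to both position and fill colour; the data frame is rebuilt from the crimeLevelTotal printout in these slides and the name redundantPlot is only for illustration.

```r
library(ggplot2)

# Totals copied from the crimeLevelTotal printout in these slides
levelNames <- c("Low", "Low-Medium", "Medium", "Medium-High", "High")
crimeLevelTotal <- data.frame(
    level=factor(levelNames, levels=levelNames, ordered=TRUE),
    total=c(24242, 21513, 13823, 18315, 7793))

# level is mapped to both y position and fill colour
redundantPlot <- ggplot(crimeLevelTotal) +
    geom_col(aes(x=total, y=level, fill=level))
```

Printing redundantPlot draws the bars, with each level identified by where it sits and by its colour.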

Visual Congruence

congruent
/ˈkɒŋɡrʊənt/

adjective: congruent

    1. in agreement or harmony.

Effective Data Visualisation

  • We can select geoms and aesthetics so that data values are mapped to visual features that are congruent with the data.

  • Familiar connections between visual features and data values make it easier and faster to comprehend a data visualisation.

Part-to-Whole Data

  • A proportion represents a part of a whole.

    crimeLevelTotal
    ## # A tibble: 5 × 3
    ##   level       total   prop
    ##   <ord>       <int>  <dbl>
    ## 1 Low         24242 0.283 
    ## 2 Low-Medium  21513 0.251 
    ## 3 Medium      13823 0.161 
    ## 4 Medium-High 18315 0.214 
    ## 5 High         7793 0.0909

Part-to-Whole Data

  • An angle of a circle represents a part of a whole.

Part-to-Whole Data

  • A length of a rectangle can represent a part of a whole.
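
The sketch below shows how the proportions arise and one way to draw them as lengths within a single rectangle. The totals are copied from the crimeLevelTotal printout in these slides; the name stackedBar is only for illustration.

```r
library(ggplot2)

# Totals copied from the crimeLevelTotal printout in these slides
levelNames <- c("Low", "Low-Medium", "Medium", "Medium-High", "High")
crimeLevelTotal <- data.frame(
    level=factor(levelNames, levels=levelNames, ordered=TRUE),
    total=c(24242, 21513, 13823, 18315, 7793))

# Each proportion is a part of the whole (the overall total),
# so the proportions sum to 1
crimeLevelTotal$prop <- crimeLevelTotal$total / sum(crimeLevelTotal$total)
sum(crimeLevelTotal$prop)

# One stacked bar: the length of each segment shows a proportion
stackedBar <- ggplot(crimeLevelTotal) +
    geom_col(aes(x=prop, y="", fill=level))
```

Because the segments together fill the whole bar, the plot itself conveys that the values are parts of a whole.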

Cyclic Data

  • Dates are cyclic.

    head(offencesByMonth)
    ##   month year Freq
    ## 1     1 2019    0
    ## 2     2 2019    0
    ## 3     3 2019    0
    ## 4     4 2019    0
    ## 5     5 2019    0
    ## 6     6 2019    0

Cyclic Data

  • Angles around a circle are cyclic.

  • Note that these are overall reports of crimes (not youth crime).
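
A small base R sketch of why angles suit cyclic data. The function names monthAngle and angleBetween are made up for this illustration.

```r
# Map month numbers (1 to 12) to angles around a circle
monthAngle <- function(month) {
    2 * pi * (month - 1) / 12
}

# Smallest angle between two months, going either way round
angleBetween <- function(m1, m2) {
    d <- abs(monthAngle(m1) - monthAngle(m2)) %% (2 * pi)
    pmin(d, 2 * pi - d)
}

# On a linear axis, December and January are 11 steps apart ...
abs(12 - 1)

# ... but around a circle they are neighbours, just like
# January and February
angleBetween(12, 1)
angleBetween(1, 2)
```

In {ggplot2}, coord_polar() is one way to draw positions as angles around a circle.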

Negative Data

  • We have already seen that some visual features have no representation of zero (e.g., hue).

  • Features that can represent zero still may not be able to represent negative values.

Colour Congruence

  • Some hues are congruent with particular objects or concepts.

  • Blue is cold and red is hot.

  • Bananas are yellow.

  • Some associations are very culture-specific and/or era-specific.

More is More

  • A larger length or area should correspond to a larger data value.

  • A darker colour should correspond to a larger data value.

Summary

  • For quantitative data, accuracy matters.

    • position and length are the most accurate,
      area and angle are worse, and
      volume, luminance and chroma are poor.
  • For qualitative data, capacity matters.

    • colour and shape are limited to 5-7 different values.

    • position has a large capacity.

  • Congruent visual features make it easier to comprehend a data visualisation.

Exercises

Exercise

  • We want to compare the total number of offenders for different ethnic groups.

    crimeGroupTotal
    ## # A tibble: 3 × 4
    ##   group          total groupFactor    groupNumeric
    ##   <chr>          <int> <fct>                 <dbl>
    ## 1 European/Other 31274 European/Other            1
    ## 2 Māori          41787 Māori                     2
    ## 3 Pasifika        6993 Pasifika                  3

Exercise

  • Identify the mappings in the data visualisation below.

  • Write {ggplot2} code to draw the data visualisation.

  • Write {ggplot2} code to draw an improved version.

Exercise

  • We want to compare the number of clean breaks (cleanbreaks) for countries from different hemispheres.

    head(RWCperGame, 3)
    ##    country hemisphere yellowcards redcards cleanbreaks tackles points
    ## 11 Namibia      South        1.00      0.5        2.50  102.00   9.25
    ## 14 Romania      North        1.25      0.0        2.75  142.50   8.00
    ## 3    Chile      South        1.25      0.0        3.75  132.25   6.75
    ##    conversions offloads tries   runs
    ## 11        0.50     2.25  0.75  92.00
    ## 14        0.75     1.50  1.00  81.00
    ## 3         0.50     5.25  1.00 102.25

Exercise

  • Identify the mappings in the data visualisation below.

  • Write {ggplot2} code to draw the data visualisation.

  • Write {ggplot2} code to draw an improved version.

Exercise

  • Can you see anything wrong with this data visualisation?