Effective Data Visualisation with R
Combining Visual Features

Paul Murrell
The University of Auckland

Review

  • A data visualisation consists of data symbols (and guides) and labels.

  • We create a data symbol by mapping data values to the visual features of a shape.

  • Which visual feature we choose depends on:

    • Whether we have quantitative or qualitative data.
    • The accuracy and capacity of the visual feature.
    • The question we are interested in answering.

  • We have so far focused on a single mapping from data values to a visual feature (and back).

Combining Visual Features

  • All data visualisations employ multiple visual features, but we have so far assumed that those features operate independently.

  • We have also largely assumed that we are only interested in mapping back to the raw data.

  • In this section we will begin to consider more complex mappings between data values and visual features.

Combining Visual Features

  • Describing Mappings

  • Visual Summaries

  • Redundant Mappings

  • Independent Visual Features

  • Visual Features that Interact

  • Other Types of Interaction

Describing Mappings

Rugby World Cup

  • The RWCperGame data frame contains measures of performance at the Rugby World Cup of 2023 for different countries, plus the hemisphere that each country is from.

    head(tibble(RWCperGame))
    ## # A tibble: 6 × 11
    ##   country   hemisphere yellowcards redcards cleanbreaks
    ##   <chr>     <fct>            <dbl>    <dbl>       <dbl>
    ## 1 Namibia   South             1        0.5         2.5 
    ## 2 Romania   North             1.25     0           2.75
    ## 3 Chile     South             1.25     0           3.75
    ## 4 Samoa     South             1.25     0.25        3.75
    ## 5 Australia South             0.5      0           5.25
    ## 6 Georgia   North             0.5      0           5.25
    ## # ℹ 6 more variables: tackles <dbl>, points <dbl>,
    ## #   conversions <dbl>, offloads <dbl>, tries <dbl>,
    ## #   runs <dbl>

A Simple Mapping

  • The simplest case involves mapping one data value from one variable to one visual feature of one data symbol.

  • The number of cleanbreaks for one country maps to the position of one point.

    library(ggplot2)
    # One point per country: cleanbreaks sets the horizontal position.
    ggplot(RWCperGame) +
        geom_point(aes(x=cleanbreaks, y=""),
                   position=position_jitter(width=0, height=.1, seed=123))

  • The simplest case involves mapping one data value from one variable (a) to one visual feature (x) of one data symbol.

  • The simplest case involves mapping back from the visual feature to the raw data.

  • The position of each point allows us to compare the cleanbreaks of different countries.

    • One country makes 2.5 cleanbreaks per game.
    • One country makes 10 more cleanbreaks per game than another.
    • Some countries make twice the number of cleanbreaks compared to others.
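
  • The three comparisons above amount to reading a value, taking a difference, and taking a ratio. A minimal sketch, assuming a stand-in vector (cleanbreaks) that holds only the six values printed by head() earlier, not the full RWCperGame data:

```r
# Stand-in data: the six cleanbreaks values shown by head(RWCperGame).
cleanbreaks <- c(Namibia=2.5, Romania=2.75, Chile=3.75,
                 Samoa=3.75, Australia=5.25, Georgia=5.25)

# Reading one value: cleanbreaks per game for one country.
cleanbreaks["Namibia"]

# Comparing by difference: how many more cleanbreaks per game?
cleanbreaks["Australia"] - cleanbreaks["Namibia"]

# Comparing by ratio: how many times as many cleanbreaks?
cleanbreaks["Australia"] / cleanbreaks["Namibia"]
```

  • Position supports all three readings because horizontal distance on the axis is proportional to the data values.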

Visual Summaries

  • We can also perceive visual summaries from a visual feature:

    • The range of cleanbreaks.
    • The average number of cleanbreaks.
    • The mode of cleanbreaks.

  • We can map back from a visual feature to the raw data.

  • We can also map back from visual summaries to data statistics.

An effective data visualisation will produce visual summaries of the data.

  • An effective visual summary relies on rapid parallel processing of multiple basic data symbols and features.

  • An effective visual summary does not require conscious attention to each individual data symbol.

Visual summaries provide less precision, but more detail than a typical mathematical summary.
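
  • The data statistics that these visual summaries map back to can be computed directly. A minimal sketch, assuming a stand-in vector that holds only the six cleanbreaks values printed by head() earlier:

```r
# Stand-in data: the six cleanbreaks values shown by head(RWCperGame).
cleanbreaks <- c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25)

# The horizontal extent of the points corresponds to the range.
rng <- range(cleanbreaks)
rng

# The visual "centre" of the points corresponds to the mean.
avg <- mean(cleanbreaks)
avg
```

  • Each statistic is a single number, whereas the plot shows the spread and the centre at once, only less precisely.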

Redundant Mappings

  • We can map one data value from one variable to two visual features of one data symbol.

  • The number of cleanbreaks for one country maps to the position and colour of one point.

    # cleanbreaks is mapped twice: to horizontal position and to colour.
    ggplot(RWCperGame) +
        geom_point(aes(x=cleanbreaks, y="", colour=cleanbreaks),
                   position=position_jitter(width=0, height=.1, seed=123))

  • We can map one data value from one variable (a) to two visual features (x and y) of one data symbol.

  • A redundant mapping provides multiple paths from the visual features back to the raw data.

It can be effective to map a data value to multiple visual features.

  • A colour-blind viewer who is unable to map colour back to data values may be able to map another visual feature.
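
  • A variation on the idea, not taken from the slides: a qualitative variable can be mapped redundantly to colour and shape, so that shape still works when colour cannot be read. A minimal sketch, assuming a stand-in data frame (rwc) built from the six rows printed by head() earlier:

```r
library(ggplot2)

# Stand-in data: the six rows shown by head(RWCperGame).
rwc <- data.frame(
    country = c("Namibia", "Romania", "Chile",
                "Samoa", "Australia", "Georgia"),
    hemisphere = factor(c("South", "North", "South",
                          "South", "South", "North")),
    cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25))

# hemisphere is mapped to both colour and shape (a redundant mapping),
# so the two groups stay distinguishable without colour.
p <- ggplot(rwc) +
    geom_point(aes(x=cleanbreaks, y="",
                   colour=hemisphere, shape=hemisphere),
               position=position_jitter(width=0, height=.1, seed=123))
p
```

  • Because both features carry the same variable, {ggplot2} draws a single combined legend rather than two.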

Independent Visual Features

  • We can map data values from two variables to two visual features of a data symbol.

  • The cleanbreaks and the hemisphere from one country map to the position and colour of one point.

    # Two variables, two features: cleanbreaks to position, hemisphere to colour.
    ggplot(RWCperGame) +
        geom_point(aes(x=cleanbreaks, y="", colour=hemisphere),
                   position=position_jitter(width=0, height=.1, seed=123))

  • We can map data values from two variables (a and b) to two visual features (x and y) of one data symbol.

  • Position and colour do not interact.

  • We can perceive the positions of points independently of the colour of the points.

  • This is also true for a combination of position and length.

An effective data visualisation allows each individual visual feature to map back to the corresponding data values.

Visual Features that Interact

  • We can map data values from two variables to two visual features of a data symbol.

  • The cleanbreaks and the tries from one country map to the horizontal and vertical position of one point.

    # Two quantitative variables mapped to horizontal and vertical position.
    ggplot(RWCperGame) +
        geom_point(aes(x=cleanbreaks, y=tries))

  • The horizontal and vertical position interact to produce position in space (2D).

  • Position in space allows us to perceive interactions between the original data values.

  • Position in space provides visual summaries.

    • Clusters or groups of points (proximity).
    • The correlation between cleanbreaks and tries.

Some combinations of visual features can be effective by producing additional visual features.

  • The additional visual features can map back to information about interactions between data variables.
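
  • The visual impression of correlation maps back to a data statistic that cor() can compute. A minimal sketch, assuming stand-in vectors that hold only the six cleanbreaks and tries values printed by head() in this section, so the result describes those six countries and not the full data:

```r
# Stand-in data: six countries, in the order printed by head().
cleanbreaks <- c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25)
tries       <- c(0.75, 1.00, 1.00, 2.75, 2.75, 1.75)

# An upward drift of points from left to right corresponds to a
# positive correlation coefficient.
r <- cor(cleanbreaks, tries)
r
```

  • Neither horizontal nor vertical position alone carries this information; it only appears when the two are combined.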

Youth Crime

  • The crimeEthnicity data frame contains the crime rate and the population for different ethnicities over multiple years.

    head(crimeEthnicity)
    ##   ethnicity year count      prop   yearDate
    ## 1     Māori 2011  5957 46.451965 2011-06-30
    ## 2  Pasifika 2011  1092  8.515284 2011-06-30
    ## 3     Asian 2011   243  1.894885 2011-06-30
    ## 4     MELAA 2011    99  0.771990 2011-06-30
    ## 5     Other 2011    58  0.452277 2011-06-30
    ## 6  European 2011  5375 41.913600 2011-06-30

Visual Features that Interact

  • We can map the pop and rate to the horizontal and vertical length (width and height) of rectangles.

    # pop sets the width and rate sets the height of each rectangle.
    ggplot(crimeGroup) +
        geom_tile(aes(x=year, y=group, width=pop/200000, height=rate/1500))

  • Horizontal and vertical length interact to produce area and shape.

  • In this case, area allows us to perceive count.

  • In this case, the additional shape feature makes it harder to perceive the separate length features.

Some combinations of visual features can be effective by producing additional visual features.

  • Additional visual features map back to data statistics.

  • We may not be able to map back to the raw data.
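
  • Why area can stand in for count: a minimal sketch using made-up values (not the crimeGroup data) and assuming that rate is a count per 10,000 people, which is an assumption because the definition of rate is not given here. The divisors are the ones used in the geom_tile() call above.

```r
# Made-up illustrative values (NOT taken from crimeGroup).
pop   <- c(100000, 400000)      # population of two hypothetical groups
count <- c(600, 1200)           # number of offences in each group
rate  <- 10000 * count / pop    # assumed definition: per 10,000 people

# Width and height scaled as in the geom_tile() call.
width  <- pop / 200000
height <- rate / 1500

# Area is proportional to pop * rate, which is proportional to count,
# so area / count is the same constant for every rectangle.
area <- width * height
area / count
```

  • The same area can come from many different width and height pairs, which is why we may not be able to map back from area to the raw pop and rate values.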

Other Types of Interaction

Accidental Interaction

  • Is there a correlation between country and cleanbreaks?

An effective data visualisation combines visual features that generate additional visual features only on purpose.

  • We want to combine visual features that interact when the interaction is useful.

  • We want to combine visual features that are independent when interaction is not useful.
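
  • One way an accidental interaction can arise, offered as an assumption because the figure for this slide is not reproduced here: if countries are ordered by cleanbreaks on one axis, the points form a diagonal that looks like a correlation. A minimal sketch, assuming a stand-in data frame (rwc) built from the six rows printed by head() earlier:

```r
library(ggplot2)

# Stand-in data: the six rows shown by head(RWCperGame).
rwc <- data.frame(
    country = c("Namibia", "Romania", "Chile",
                "Samoa", "Australia", "Georgia"),
    cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25))

# Ordering the countries by cleanbreaks makes the points climb from
# bottom left to top right, which resembles a positive correlation.
p <- ggplot(rwc) +
    geom_point(aes(x=cleanbreaks, y=reorder(country, cleanbreaks)))
p
```

  • The apparent trend is produced by the ordering alone. Country is qualitative, so its vertical position has no numeric meaning and there is no correlation to perceive.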

Visual Features that Interact

  • The perception of correlation is affected by the amount of white space.

Interaction Between Data Symbols

  • The interaction between visual features can be weakened or broken by the presence of other data symbols.

Proximity

  • Gestalt proximity says that items that are close together are seen as a group.

  • Discretised data values can create columns or rows of data symbols.
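
  • Discretised values can be spotted in the data before plotting. A minimal sketch, assuming a stand-in vector that holds only the six cleanbreaks values printed by head() earlier:

```r
# Stand-in data: the six cleanbreaks values shown by head(RWCperGame).
cleanbreaks <- c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25)

# Every value here is a multiple of 0.25, so only a limited set of
# horizontal positions can occur.
on_grid <- all(cleanbreaks %% 0.25 == 0)
on_grid

# Count how many points share each value; tied values share the same
# horizontal position and so line up in a column.
ties <- table(cleanbreaks)
ties
```

  • Points in the same column are close together, so proximity encourages us to see them as a group even when they have nothing else in common.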

Summary

  • We can perceive visual summaries that map back to data statistics.

  • When we combine visual features, the features may act independently, or the features may interact.

  • Interactions between features can produce additional visual features.

  • We want to select visual features that interact only when we want to produce the additional visual effects.

  • The interaction between visual features can be sensitive to context.

Exercises

Exercise

  • The RWCperGame data frame contains measures of performance at the Rugby World Cup of 2023 for different countries, plus the hemisphere that each country is from.

  • We can also calculate the conversionRate for each country.

    RWCperGame$conversionRate <- 
        RWCperGame$conversions/RWCperGame$tries

Exercise

  • We want to explore the conversionRate:

    • Comparing one country with another.
    • Is there a correlation with the number of tries?

    head(RWCperGame[c("country", "tries", "conversionRate")])
    ##      country tries conversionRate
    ## 11   Namibia  0.75      0.6666667
    ## 14   Romania  1.00      0.7500000
    ## 3      Chile  1.00      0.5000000
    ## 15     Samoa  2.75      0.7272727
    ## 2  Australia  2.75      0.6363636
    ## 7    Georgia  1.75      0.5714286
    cor(RWCperGame$tries, RWCperGame$conversionRate)
    ## [1] 0.3820937

  • Comparing one country with another.

    • Do the visual features interact or are they independent?
    • Are there any additional visual features?
    • Are there any visual summaries?

  • Is there a correlation with the number of tries?

    • Do the visual features interact or are they independent?
    • Are there any additional visual features?
    • Are there any visual summaries?

Exercise

  • Can you write {ggplot2} code to produce the previous two data visualisations?

Exercise

  • Can you identify:

    • A redundant mapping?
    • An interaction between visual features?

New Zealand Listener August 19-25 2023.