Effective Data Visualisation with R
Combining Visual Features

Paul Murrell
The University of Auckland

Review

A data visualisation consists of data symbols (and guides) and labels.
We create a data symbol by mapping data values to the visual features of a shape.
Which visual feature we choose depends on:
- Whether we have quantitative or qualitative data.
- The accuracy and capacity of the visual feature.
- The question we are interested in answering.

We have so far focused on a single mapping from data values to a visual feature (and back).

Combining Visual Features

All data visualisations employ multiple visual features, but we have so far assumed that those features operate independently.
We have also largely assumed that we are only interested in mapping back to the raw data.
In this section we will begin to consider more complex mappings between data values and visual features.

Combining Visual Features

- Describing Mappings
- Visual Summaries
- Redundant Mappings
- Independent Visual Features
- Visual Features that Interact
- Other Types of Interaction

Describing Mappings

Rugby World Cup

The RWCperGame data frame contains measures of performance at the Rugby World Cup of 2023 for different countries, plus the hemisphere that each country is from.

library(tibble)
head(as_tibble(RWCperGame))

## # A tibble: 6 × 11
##   country   hemisphere yellowcards redcards cleanbreaks
##   <chr>     <fct>            <dbl>    <dbl>       <dbl>
## 1 Namibia   South             1        0.5         2.5 
## 2 Romania   North             1.25     0           2.75
## 3 Chile     South             1.25     0           3.75
## 4 Samoa     South             1.25     0.25        3.75
## 5 Australia South             0.5      0           5.25
## 6 Georgia   North             0.5      0           5.25
## # ℹ 6 more variables: tackles <dbl>, points <dbl>,
## #   conversions <dbl>, offloads <dbl>, tries <dbl>,
## #   runs <dbl>

A Simple Mapping

The simplest case involves mapping one data value from one variable to one visual feature of one data symbol.
The number of cleanbreaks for one country maps to the position of one point.

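library(ggplot2)

# y="" puts every point on a single row; the vertical jitter separates overlapping points
ggplot(RWCperGame) + 
    geom_point(aes(x=cleanbreaks, y=""),
               position=position_jitter(width=0, height=.1, seed=123))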

A Simple Mapping

The simplest case involves mapping one data value from one variable (a) to one visual feature (x) of one data symbol.

A Simple Mapping

The simplest case involves mapping back from the visual feature to the raw data.

A Simple Mapping

The position of each point allows us to compare the cleanbreaks of different countries.
- One country makes 2.5 cleanbreaks per game.
- One country makes 10 more cleanbreaks than another.
- Some countries make twice as many cleanbreaks as others.

Visual Summaries

We can also perceive visual summaries from a visual feature:
- The range of cleanbreaks.
- The average number of cleanbreaks.
- The mode of cleanbreaks.
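
Each of these visual summaries has a numerical counterpart. The following is a minimal sketch, assuming only the six cleanbreaks values shown in the head() output earlier (not the full data frame); the density-based estimate of the mode is one possible choice for continuous data, not the only one.

```r
# Cleanbreaks per game for the six countries shown in the head() output
# (a subset only; the full RWCperGame data frame has more rows)
cleanbreaks <- c(Namibia = 2.5, Romania = 2.75, Chile = 3.75,
                 Samoa = 3.75, Australia = 5.25, Georgia = 5.25)

# Range: corresponds to the left-most and right-most points
cbRange <- range(cleanbreaks)

# Average: corresponds to the visual "centre" of the points
cbMean <- mean(cleanbreaks)

# Mode: for continuous data, take the peak of a kernel density estimate,
# which corresponds to where the points are most densely packed
cbDensity <- density(cleanbreaks)
cbMode <- cbDensity$x[which.max(cbDensity$y)]

print(cbRange)
print(cbMean)
print(cbMode)
```

The visual versions of these summaries are perceived at a glance from the plot; the code has to compute each one separately.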

Visual Summaries

We can map back from a visual feature to the raw data.
We can also map back from visual summaries to data statistics.

An effective data visualisation will produce visual summaries of the data.

An effective visual summary relies on rapid parallel processing of multiple basic data symbols and features.
An effective visual summary does not require conscious attention to each individual data symbol.

Visual summaries provide less precision, but more detail, than a typical mathematical summary.

Redundant Mappings

We can map one data value from one variable to two visual features of one data symbol.
The number of cleanbreaks for one country maps to the position and colour of one point.

ggplot(RWCperGame) + 
    geom_point(aes(x=cleanbreaks, y="", colour=cleanbreaks),
               position=position_jitter(width=0, height=.1, seed=123))

Redundant Mappings

We can map one data value from one variable (a) to two visual features (x and y) of one data symbol.

Redundant Mappings

A redundant mapping provides multiple paths from the visual features back to the raw data.

It can be effective to map a data value to multiple visual features.

A colour-blind viewer who is unable to map colour back to data values may be able to map another visual feature.
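
A redundant mapping can also be applied to a qualitative variable. The following is a sketch, not code from the original slides: it assumes a small stand-in data frame (here called rwc, a hypothetical name) built from the six rows shown earlier, and maps hemisphere to both colour and shape.

```r
library(ggplot2)

# Hypothetical stand-in for RWCperGame: only the six rows shown earlier
rwc <- data.frame(
    country = c("Namibia", "Romania", "Chile",
                "Samoa", "Australia", "Georgia"),
    hemisphere = factor(c("South", "North", "South",
                          "South", "South", "North")),
    cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25))

# hemisphere is mapped to BOTH colour and shape (a redundant mapping),
# so a viewer who cannot distinguish the colours can still use shape
p <- ggplot(rwc) +
    geom_point(aes(x=cleanbreaks, y="",
                   colour=hemisphere, shape=hemisphere),
               position=position_jitter(width=0, height=.1, seed=123))

print(p)
```

Because colour and shape are mapped to the same variable, ggplot2 draws a single combined legend.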

Independent Visual Features

We can map data values from two variables to two visual features of a data symbol.
The cleanbreaks and the hemisphere from one country map to the position and colour of one point.

ggplot(RWCperGame) + 
    geom_point(aes(x=cleanbreaks, y="", colour=hemisphere),
               position=position_jitter(width=0, height=.1, seed=123))

Independent Visual Features

We can map data values from two variables (a and b) to two visual features (x and y) of one data symbol.

Independent Visual Features

Position and colour do not interact.
We can perceive the positions of points independently of the colour of the points.

Independent Visual Features

This is also true for a combination of position and length.

An effective data visualisation allows each individual visual feature to map back to the corresponding data values.

Visual Features that Interact

We can map data values from two variables to two visual features of a data symbol.

Visual Features that Interact

The cleanbreaks and the tries from one country map to the horizontal and vertical position of one point.

ggplot(RWCperGame) + 
    geom_point(aes(x=cleanbreaks, y=tries))

Visual Features that Interact

The horizontal and vertical position interact to produce position in space (2D).

Visual Features that Interact

Position in space allows us to perceive interactions between the original data values.

Position in space provides visual summaries.
- Clusters or groups of points (proximity).
- The correlation between cleanbreaks and tries.
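
The visual summary of correlation maps back to a data statistic. The following is a minimal sketch, assuming only the six rows of cleanbreaks and tries shown in the printed output (a subset of the full data), so the value it computes is not the correlation for the full data frame.

```r
library(ggplot2)

# Six rows copied from the printed output (a subset only)
rwc <- data.frame(
    country = c("Namibia", "Romania", "Chile",
                "Samoa", "Australia", "Georgia"),
    cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25),
    tries = c(0.75, 1.00, 1.00, 2.75, 2.75, 1.75))

# Visual summary: the overall drift of the points in 2D space
p <- ggplot(rwc) +
    geom_point(aes(x=cleanbreaks, y=tries))
print(p)

# Data statistic: the correlation coefficient for the same six rows
r <- cor(rwc$cleanbreaks, rwc$tries)
print(r)
```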

Some combinations of visual features can be effective by producing additional visual features.

The additional visual features can map back to information about interactions between data variables.

Youth Crime

The crimeEthnicity data frame contains the crime rate and the population for different ethnicities over multiple years.

head(crimeEthnicity)

##   ethnicity year count      prop   yearDate
## 1     Māori 2011  5957 46.451965 2011-06-30
## 2  Pasifika 2011  1092  8.515284 2011-06-30
## 3     Asian 2011   243  1.894885 2011-06-30
## 4     MELAA 2011    99  0.771990 2011-06-30
## 5     Other 2011    58  0.452277 2011-06-30
## 6  European 2011  5375 41.913600 2011-06-30

Visual Features that Interact

We can map pop (the population) and rate (the crime rate) to the horizontal and vertical length (width and height) of rectangles.

ggplot(crimeGroup) +
    geom_tile(aes(x=year, y=group, width=pop/200000, height=rate/1500))

Horizontal and vertical length interact to produce area and shape.

In this case, area allows us to perceive count.

In this case, the additional shape feature makes it harder to perceive the separate length features.
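
The arithmetic behind the area feature can be sketched as follows. The numbers are hypothetical (they are not taken from crimeGroup), and the sketch assumes that rate is a count per 10,000 people, which the slides do not state.

```r
# Hypothetical values for two groups (NOT from the crimeGroup data)
pop  <- c(A = 100000, B = 400000)  # population
rate <- c(A = 600, B = 150)        # assumed: offences per 10,000 people

# The same scaling as in the geom_tile() call
width  <- pop / 200000
height <- rate / 1500

# Area is width x height, so it is proportional to pop x rate
area <- width * height

# Under the assumed definition of rate, pop x rate gives the count
count <- pop * rate / 10000

print(rbind(width, height, area, count))
```

In this sketch both groups have the same count and therefore the same area, but one rectangle is tall and narrow while the other is short and wide, so the two shapes look quite different.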

Some combinations of visual features can be effective by producing additional visual features.

Additional visual features map back to data statistics.
We may not be able to map back to the raw data.

Other Types of Interaction

Accidental Interaction

Is there a correlation between country and cleanbreaks?

An effective data visualisation combines visual features that generate additional visual features only on purpose.

We want to combine visual features that interact when the interaction is useful.
We want to combine visual features that are independent when interaction is not useful.
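
The slides do not show the code behind the question above. One plausible way that an accidental pattern can arise, sketched below with the six rows shown earlier and hypothetical column names (byValue, alphabetical), is when the order of a qualitative variable happens to follow the values of a quantitative variable.

```r
library(ggplot2)

# Six rows copied from the printed output (a subset only)
rwc <- data.frame(
    country = c("Namibia", "Romania", "Chile",
                "Samoa", "Australia", "Georgia"),
    cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25))

# Countries ordered by cleanbreaks: the points line up along a diagonal,
# which can look like a "correlation" with country
rwc$byValue <- reorder(rwc$country, rwc$cleanbreaks)
pByValue <- ggplot(rwc) +
    geom_point(aes(x=cleanbreaks, y=byValue))

# Countries in alphabetical order: the ordering is unrelated to cleanbreaks
rwc$alphabetical <- factor(rwc$country, levels=sort(rwc$country))
pAlphabetical <- ggplot(rwc) +
    geom_point(aes(x=cleanbreaks, y=alphabetical))

print(pByValue)
print(pAlphabetical)
```

Ordering by value can be useful for reading off rankings; the point of the sketch is only that the diagonal it creates is a product of the ordering, not a relationship between two variables.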

Visual Features that Interact

The perception of correlation is affected by the amount of white space.
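
One way to explore this effect is to draw the same points with different amounts of surrounding white space. The following is a sketch using the six rows shown earlier; the wider axis limits are arbitrary values chosen for illustration.

```r
library(ggplot2)

# Six rows copied from the printed output (a subset only)
rwc <- data.frame(
    cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25),
    tries = c(0.75, 1.00, 1.00, 2.75, 2.75, 1.75))

base <- ggplot(rwc) +
    geom_point(aes(x=cleanbreaks, y=tries))

# Default axis limits: the points fill the plot region
pTight <- base

# Arbitrary, much wider limits: the same points surrounded by white space
pLoose <- base +
    coord_cartesian(xlim=c(-20, 30), ylim=c(-10, 12))

print(pTight)
print(pLoose)

# The data statistic is identical for both plots; only the perception differs
r <- cor(rwc$cleanbreaks, rwc$tries)
```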

Interaction Between Data Symbols

The interaction between visual features can be weakened or broken by the presence of other data symbols.

Proximity

Gestalt proximity says that items that are close together are seen as a group.
Discretised data values can create columns or rows of data symbols.
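
The effect of discretised values can be sketched with simulated data (the data frame d and its columns are hypothetical, not part of the course data): rounding the x values places the points in columns, which proximity then groups together.

```r
library(ggplot2)

# Simulated data, for illustration only
set.seed(123)
d <- data.frame(x = runif(100, 0, 10),
                y = runif(100, 0, 10))

# Discretise x by rounding to the nearest whole number
d$xRounded <- round(d$x)

# Continuous x: no obvious columns of points
pRaw <- ggplot(d) +
    geom_point(aes(x=x, y=y))

# Discretised x: the points stack into vertical columns
pDiscrete <- ggplot(d) +
    geom_point(aes(x=xRounded, y=y))

print(pRaw)
print(pDiscrete)
```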

Summary

We can perceive visual summaries that map back to data statistics.
When we combine visual features, the features may act independently, or the features may interact.
Interactions between features can produce additional visual features.
We want to select visual features that interact only when the additional visual features are useful.
The interaction between visual features can be sensitive to context.

Exercises

Exercise

The RWCperGame data frame contains measures of performance at the Rugby World Cup of 2023 for different countries, plus the hemisphere that each country is from.

We can also calculate the conversionRate for each country.

RWCperGame$conversionRate <- 
    RWCperGame$conversions/RWCperGame$tries

Exercise

We want to explore the conversionRate:
- Comparing one country with another.
- Is there a correlation with the number of tries?

head(RWCperGame[c("country", "tries", "conversionRate")])

##      country tries conversionRate
## 11   Namibia  0.75      0.6666667
## 14   Romania  1.00      0.7500000
## 3      Chile  1.00      0.5000000
## 15     Samoa  2.75      0.7272727
## 2  Australia  2.75      0.6363636
## 7    Georgia  1.75      0.5714286

cor(RWCperGame$tries, RWCperGame$conversionRate)

## [1] 0.3820937

Comparing one country with another.
- Do the visual features interact or are they independent?
- Are there any additional visual features?
- Are there any visual summaries?

Is there a correlation with the number of tries?
- Do the visual features interact or are they independent?
- Are there any additional visual features?
- Are there any visual summaries?

Exercise

Can you write {ggplot2} code to produce the previous two data visualisations?

Exercise

Can you identify:
- A redundant mapping?
- An interaction between visual features?

New Zealand Listener August 19-25 2023.
