Effective Data Visualisation with R
Qualitative Data

Paul Murrell
The University of Auckland

Review

A data visualisation consists of data symbols, guides, and labels.
A data visualisation can help to answer questions.
- An effective data visualisation will pose questions that the visual system is good at answering.
We need to choose a mapping from data values to data symbols.
- An effective data visualisation will have good mappings from data to data symbols.

Visual Features

The combination of geom and aesthetic that we choose maps data values to a visual feature.
Visual features are the properties of data symbols that our visual system identifies very rapidly, in parallel, and without conscious effort.
Our task is to identify a good mapping from data values to visual features.

Effective Visual Features

One idea is to select geoms and aesthetics so that data values are mapped to visual features that have the same type as the data.
For visualising qualitative data, we need to identify which visual features are qualitative visual features.

Qualitative Data

Qualitative features
Ordinal features
Qualitative scales
Colour
Feature mismatch
Gestalt Principles

Qualitative Features

Colour and shape are examples of visual features that can represent qualitative values.
Position is also very effective for qualitative data.

Qualitative Features

By contrast, length, area, and angle are not appropriate for representing qualitative values.

Qualitative Data

The RWCperGame data frame contains measures of performance at the Rugby World Cup of 2023 for different countries, plus the hemisphere that each country is from.

hemisphere is an example of qualitative data.

# A tibble: 6 × 11
  country   hemisphere yellowcards redcards cleanbreaks
  <chr>     <fct>            <dbl>    <dbl>       <dbl>
1 Namibia   South             1        0.5         2.5 
2 Romania   North             1.25     0           2.75
3 Chile     South             1.25     0           3.75
4 Samoa     South             1.25     0.25        3.75
5 Australia South             0.5      0           5.25
6 Georgia   North             0.5      0           5.25
# ℹ 6 more variables: tackles <dbl>, points <dbl>,
#   conversions <dbl>, offloads <dbl>, tries <dbl>,
#   runs <dbl>

Qualitative Data

We map qualitative data to a visual feature in order to be able to identify values that belong to different groups or categories.
If we map hemisphere to a qualitative visual feature, we can answer questions like:
- Do teams from the Northern hemisphere make more clean breaks than teams from the South (on average)?
The crucial visual task is to distinguish data for Northern teams from data for Southern teams.

We can visualise qualitative data using the shape of points.

Do teams from the Northern hemisphere make more clean breaks than teams from the South (on average)?

Qualitative Features in {ggplot2}

geom_point() and the shape aesthetic map data values to shape.

ggplot(RWCperGame) + 
    geom_point(aes(x=cleanbreaks, y="", shape=hemisphere),
               position=position_jitter(width=0, height=.1, seed=123))

We can visualise qualitative data using the position of points.

Do teams from the Northern hemisphere make more clean breaks than teams from the South (on average)?

Qualitative Features in {ggplot2}

The x and y aesthetics map qualitative data values to position for geom_point() and geom_col().

ggplot(RWCperGame) + 
    geom_point(aes(x=cleanbreaks, y=hemisphere),
               position=position_jitter(width=0, height=.1, seed=123))

We can visualise qualitative data using the colour of points.

Do teams from the Northern hemisphere make more clean breaks than teams from the South?

Qualitative Features in {ggplot2}

The colour aesthetic maps data values to colour for geom_point(), geom_col(), and geom_line().

ggplot(RWCperGame) + 
    geom_point(aes(x=cleanbreaks, y="", colour=hemisphere),
               position=position_jitter(width=0, height=.1, seed=123))

Qualitative Features in {ggplot2}

When the geom has an interior, the colour aesthetic maps data values to the border colour and the fill aesthetic maps data values to the fill colour.

ggplot(RWCperGame) + 
    geom_point(aes(x=cleanbreaks, y="", fill=hemisphere), shape=21,
               position=position_jitter(width=0, height=.1, seed=123))

Ordinal Features

Ordinal Data

The crimeLevelTotal data frame contains the total number of offences broken down by level of crime.
Are less severe crimes more common than more severe crimes?

crimeLevelTotal

# A tibble: 5 × 3
  level       total   prop
  <ord>       <int>  <dbl>
1 Low         24242 0.283 
2 Low-Medium  21513 0.251 
3 Medium      13823 0.161 
4 Medium-High 18315 0.214 
5 High         7793 0.0909
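The <ord> column type in the output above indicates that level is stored as an ordered factor, which is how R represents ordinal data. The following is a minimal sketch, assuming we rebuild the table by hand from the printed values; how crimeLevelTotal was originally created is not shown, and the name crimeLevelSketch is made up for this example.

```r
# Rebuild the printed table by hand (an assumption: the original
# construction of crimeLevelTotal is not shown in these slides).
crimeLevels <- c("Low", "Low-Medium", "Medium", "Medium-High", "High")
crimeLevelSketch <- data.frame(
    # ordered=TRUE records that the levels run from Low up to High
    level = factor(crimeLevels, levels=crimeLevels, ordered=TRUE),
    total = c(24242L, 21513L, 13823L, 18315L, 7793L)
)

# The printed prop values are consistent with each total
# divided by the sum of all totals
crimeLevelSketch$prop <- crimeLevelSketch$total / sum(crimeLevelSketch$total)

# An ordered factor supports comparisons between levels
is.ordered(crimeLevelSketch$level)
crimeLevelSketch$level[1] < crimeLevelSketch$level[5]
```

Because the factor is ordered, {ggplot2} can treat level as ordinal rather than nominal when it chooses a default scale.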

We can visualise ordinal data using the colour of bars.

Are less severe crimes more common than more severe crimes?

Ordinal Features

We perceive darker colours as “greater than” lighter colours.

Qualitative Scales

The mapping from data values to a visual feature also depends on the scale of the mapping.
How are data values transformed to values on the visual feature?

The Principle of Unambiguity

The scale should be chosen so that differences in the data are visible in the data visualisation.

Qualitative Scales in {ggplot2}

We can control the set of shapes that data values are mapped to using scale_shape_manual().

ggplot(RWCperGame) + 
    geom_point(aes(x=cleanbreaks, y="", shape=hemisphere),
               position=position_jitter(width=0, height=.1, seed=123)) +
    scale_shape_manual(values=c(16, 4))

Qualitative Scales in {ggplot2}

We can control how qualitative values are mapped to position with scale_y_discrete().

ggplot(RWCperGame) + 
    geom_point(aes(x=cleanbreaks, y=hemisphere),
               position=position_jitter(width=0, height=.1, seed=123)) +
    scale_y_discrete(expand=expansion(mult=c(.2, .2)))

Colour

Qualitative Scales

The mapping from data values to a visual feature also depends on the scale of the mapping.
How are data values transformed to values on the visual feature?

Qualitative Scales

scale_colour_manual() and scale_fill_manual() allow us to select the colours that data values map to.
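A minimal sketch of scale_colour_manual(), assuming only the six rows of RWCperGame that are printed earlier (the full data frame is not available here). The data frame name rwc and the two colour names are arbitrary choices for illustration.

```r
library(ggplot2)

# First six rows of RWCperGame, copied from the printed tibble
rwc <- data.frame(
    country = c("Namibia", "Romania", "Chile", "Samoa", "Australia", "Georgia"),
    hemisphere = factor(c("South", "North", "South", "South", "South", "North")),
    cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25)
)

# A named vector ties each data value to a specific colour;
# the colours themselves are arbitrary choices
p <- ggplot(rwc) +
    geom_point(aes(x=cleanbreaks, y="", colour=hemisphere),
               position=position_jitter(width=0, height=.1, seed=123)) +
    scale_colour_manual(values=c(North="darkorange", South="steelblue"))
p
```

Naming the values means the mapping does not depend on the order of the factor levels.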

Qualitative Scales

However, there are a LOT of colours to choose from!

Colour

We have been talking about colour as if it is a single visual feature, but it can be split into three distinct features: hue, chroma (colourfulness), and luminance (light/dark).
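The three features can be explored one at a time with hcl() from the base {grDevices} package, which builds a colour from a hue, a chroma, and a luminance. This is only a sketch; the specific numbers are arbitrary choices for illustration.

```r
# Vary hue only: same colourfulness and lightness, different hues
hues <- hcl(h=c(0, 120, 240), c=50, l=70)

# Vary chroma only: from grey to a more colourful version of one hue
chromas <- hcl(h=240, c=c(0, 25, 50), l=70)

# Vary luminance only: from darker to lighter versions of one hue
luminances <- hcl(h=240, c=50, l=c(30, 50, 70))

# Draw the swatches: one row per feature, top row is hue
swatches <- c(hues, chromas, luminances)
plot.new()
plot.window(xlim=c(0, 3), ylim=c(0, 3))
rect(xleft=rep(0:2, 3), ybottom=rep(2:0, each=3),
     xright=rep(1:3, 3), ytop=rep(3:1, each=3), col=swatches)
```

The last colour in each row is the same colour (hue 240, chroma 50, luminance 70), so each row shows what changes when just one feature is varied away from it.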

Colour Palettes

Varying hue creates a qualitative colour palette.

ggplot(crimeLevelTotal) +
    geom_col(aes(x=total, y="", fill=level), colour="black") +
    scale_fill_hue()

Colour Palettes

Varying chroma or luminance (or both) creates an ordinal (or sequential) colour palette.

ggplot(crimeLevelTotal) +
    geom_col(aes(x=total, y="", fill=level), colour="black") +
    colorspace::scale_fill_discrete_sequential()

Colour Palettes

A diverging colour palette combines two sequential palettes with a neutral centre.

ggplot(crimeLevelTotal) +
    geom_col(aes(x=total, y="", fill=level), colour="black") +
    colorspace::scale_fill_discrete_diverging()

Selecting a colour palette is a complex task, with competing constraints.
On one hand there is a desire for visual balance in order to avoid emphasising one category over another.

But that does not translate well to greyscale for printing.

There is also a need to account for colour-vision deficiency.

This shows what the previous colour palette might look like to a colour blind viewer.
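The greyscale and colour-vision checks can be approximated in code. A minimal sketch, assuming the {colorspace} package (already used above for sequential palettes) and the {scales} package (installed alongside {ggplot2}). The five-colour hue palette is an assumption, because the palette behind the original figures is not shown.

```r
# The default qualitative palette in {ggplot2} uses evenly spaced hues;
# scales::hue_pal() returns a function that generates those colours
pal <- scales::hue_pal()(5)

# Approximate the palette as it would print in greyscale
greyPal <- colorspace::desaturate(pal)

# Approximate the palette for a deutan (red-green)
# colour-vision deficiency
cvdPal <- colorspace::deutan(pal)

# Show the three versions as rows of swatches
colorspace::swatchplot("Original"=pal, "Greyscale"=greyPal, "Deutan"=cvdPal)
```

colorspace::protan() and colorspace::tritan() simulate other forms of colour-vision deficiency in the same way.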

There are also situations where a deliberate emphasis is desired in order to highlight one particular category.
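One way to create a deliberate emphasis is to give every category the same muted colour except the one of interest. A minimal sketch, assuming the crime level totals printed earlier are rebuilt by hand; the choice of High as the highlighted level, the colours, and the name crimeLevelSketch are arbitrary choices for illustration.

```r
library(ggplot2)

# Totals copied from the printed crimeLevelTotal table
crimeLevels <- c("Low", "Low-Medium", "Medium", "Medium-High", "High")
crimeLevelSketch <- data.frame(
    level = factor(crimeLevels, levels=crimeLevels, ordered=TRUE),
    total = c(24242L, 21513L, 13823L, 18315L, 7793L)
)

# Grey for every level except the one we want to stand out
highlight <- c("Low"="grey80", "Low-Medium"="grey80", "Medium"="grey80",
               "Medium-High"="grey80", "High"="firebrick")

pHighlight <- ggplot(crimeLevelSketch) +
    geom_col(aes(x=total, y="", fill=level), colour="black") +
    scale_fill_manual(values=highlight)
pHighlight
```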

Colour Palettes in {ggplot2}

There are a number of predefined colour scales in {ggplot2}.
The R Journal article Coloring in R’s Blind Spot also offers some advice.
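Two of the predefined scales are sketched below: scale_fill_brewer() with a qualitative ColorBrewer palette and scale_fill_viridis_d() with a discrete viridis palette. The data are the crime level totals printed earlier, rebuilt by hand; the choice of the "Set2" palette and the name crimeLevelSketch are assumptions made for this example.

```r
library(ggplot2)

# Totals copied from the printed crimeLevelTotal table
crimeLevels <- c("Low", "Low-Medium", "Medium", "Medium-High", "High")
crimeLevelSketch <- data.frame(
    level = factor(crimeLevels, levels=crimeLevels, ordered=TRUE),
    total = c(24242L, 21513L, 13823L, 18315L, 7793L)
)

base <- ggplot(crimeLevelSketch) +
    geom_col(aes(x=total, y="", fill=level), colour="black")

# A qualitative ColorBrewer palette: the colours differ mainly in hue
pBrewer <- base + scale_fill_brewer(palette="Set2")

# A discrete viridis palette: luminance changes steadily across
# the levels, which suits ordinal data
pViridis <- base + scale_fill_viridis_d()

pBrewer
pViridis
```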

Feature Mismatch

If the visual feature has a “higher” type than the data, we risk adding superfluous or incorrect information.
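A sketch of a mismatch, assuming only the six printed rows of RWCperGame: hemisphere is qualitative, but size (area) is a quantitative visual feature, so the plot suggests that one hemisphere is "more" than the other. The data frame name rwc is made up for this example, and {ggplot2} is expected to warn that using size for a discrete variable is not advised.

```r
library(ggplot2)

# First six rows of RWCperGame, copied from the printed tibble
rwc <- data.frame(
    country = c("Namibia", "Romania", "Chile", "Samoa", "Australia", "Georgia"),
    hemisphere = factor(c("South", "North", "South", "South", "South", "North")),
    cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25)
)

# Mapping a qualitative variable to size adds an ordering
# that is not present in the data
pMismatch <- ggplot(rwc) +
    geom_point(aes(x=cleanbreaks, y="", size=hemisphere),
               position=position_jitter(width=0, height=.1, seed=123))
pMismatch
```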

Gestalt Principles

An effective qualitative mapping maps different data values to visually different data symbols.

Gestalt Principles

An effective qualitative mapping maps the same data values to visually similar data symbols.

Gestalt Principles

An effective qualitative mapping maps groups of data values to visual groups of data symbols.

Gestalt Principles

The Gestalt psychologists identified several principles of visual grouping.
These are examples of visual processing that happen very rapidly and early in our visual system.

Gestalt Principles: Similarity

Mapping the same qualitative data to the same colour or shape produces visual groups through similarity.

Gestalt Principles: Proximity

Mapping the same qualitative data to the same position produces visual groups through proximity.

Gestalt Principles: Connectivity

Mapping the same qualitative data to the same line produces visual groups through connectivity.
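In {ggplot2}, connectivity can be sketched with geom_line() and the group aesthetic, which draws one line per group. The data are the six rows of crimeGender printed in the exercise section; the name crimeGenderSketch is made up for this example.

```r
library(ggplot2)

# Six rows of crimeGender, copied from the printed output
crimeGenderSketch <- data.frame(
    gender = rep(c("Male", "Female"), 3),
    year = rep(2011:2013, each=2),
    count = c(8808, 4207, 8031, 3513, 6768, 2976)
)

# group=gender connects the points for each gender with its own line
pLines <- ggplot(crimeGenderSketch, aes(x=year, y=count)) +
    geom_line(aes(group=gender)) +
    geom_point()
pLines
```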

Gestalt Principles: Enclosure

Mapping the same qualitative data to the same polygon produces visual groups through enclosure.
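One simple way to sketch enclosure is to draw a bounding rectangle around the points in each group. This is an illustration rather than the method behind the original figure. The data are the six rows of crimeGender printed in the exercise section, and the padding amounts and the names crimeGenderSketch and boxes are arbitrary choices.

```r
library(ggplot2)

# Six rows of crimeGender, copied from the printed output
crimeGenderSketch <- data.frame(
    gender = rep(c("Male", "Female"), 3),
    year = rep(2011:2013, each=2),
    count = c(8808, 4207, 8031, 3513, 6768, 2976)
)

# One padded bounding box per gender
boxes <- do.call(rbind, lapply(
    split(crimeGenderSketch, crimeGenderSketch$gender),
    function(g) {
        data.frame(gender=g$gender[1],
                   xmin=min(g$year) - 0.2, xmax=max(g$year) + 0.2,
                   ymin=min(g$count) - 300, ymax=max(g$count) + 300)
    }))

# Each rectangle encloses the points for one gender
pBoxes <- ggplot(crimeGenderSketch) +
    geom_rect(aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax),
              data=boxes, fill=NA, colour="black") +
    geom_point(aes(x=year, y=count))
pBoxes
```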

Gestalt Principles

Some of the Gestalt principles are stronger than others.

Summary

It makes sense to map qualitative data to qualitative visual features.
- Colour, shape, and position are qualitative visual features.
- Qualitative data can be visualised with:
  - points with different colours, positions, or shapes.
  - bars with different colours or positions.

Summary

Colour, shape, and position are appropriate visual features for representing qualitative data.
Colour can be broken into hue, chroma, and luminance.
Hue is only useful for nominal data.
Chroma and luminance can be used for ordinal data.
Position is a very effective and flexible visual feature.

Summary

The goal when mapping qualitative data is to create visual groups through:
- position (proximity)
- shape and colour (similarity)
- connectivity (lines)
- enclosure (polygons)

Exercise

We want to compare the count of offences for different genders:

Are the counts for Male higher or lower than the counts for Female (on average)?

head(crimeGender)

  gender year count   pop     rate genderFactor   yearDate
1   Male 2011  8808 95040 926.7677         Male 2011-06-30
2 Female 2011  4207 90410 465.3246       Female 2011-06-30
4   Male 2012  8031 95260 843.0611         Male 2012-06-30
5 Female 2012  3513 90050 390.1166       Female 2012-06-30
7   Male 2013  6768 94440 716.6455         Male 2013-06-30
8 Female 2013  2976 89260 333.4080       Female 2013-06-30

Exercise

What type of data is gender?
Which visual features would be appropriate for representing gender?
Write {ggplot2} code to produce data visualisations that allow us to answer the questions of interest.

Exercise

The following code produces a simple bar chart:

ggplot(crimeGenderTotal) +
    geom_col(aes(total, gender))

Modify the code so that the Male bar is light blue and the Female bar is pink.

Exercise

Can you see anything wrong with this data visualisation?
