Effective Data Visualisation with R
Qualitative Data

Paul Murrell
The University of Auckland

Review

  • A data visualisation consists of data symbols, guides, and labels.

  • A data visualisation can help to answer questions.

    • An effective data visualisation will pose questions that the visual system is good at answering.
  • We need to choose a mapping from data values to data symbols.

    • An effective data visualisation will have good mappings from data to data symbols.

Visual Features

  • The combination of geom and aesthetic that we choose maps data values to a visual feature.

  • Visual features are the properties of data symbols that our visual system identifies very rapidly, in parallel, and without conscious effort.

  • Our task is to identify a good mapping from data values to visual features.

Effective Visual Features

  • One idea is to select geoms and aesthetics so that data values are mapped to visual features that have the same type as the data.

  • For visualising qualitative data, we need to identify which visual features are qualitative visual features.

Qualitative Data

  • Qualitative features

  • Ordinal features

  • Qualitative scales

  • Colour

  • Feature mismatch

  • Gestalt Principles

Qualitative Features

Qualitative Features

  • Colour and shape are examples of visual features that can represent qualitative values.

  • Position is also very effective for qualitative data.

Qualitative Features

  • By contrast, length, area, and angle are not appropriate for representing qualitative values.

Qualitative Data

  • The RWCperGame data frame contains per-game measures of performance for different countries at the 2023 Rugby World Cup, plus the hemisphere that each country is from.

  • hemisphere is an example of qualitative data.

    # A tibble: 6 × 11
      country   hemisphere yellowcards redcards cleanbreaks
      <chr>     <fct>            <dbl>    <dbl>       <dbl>
    1 Namibia   South             1        0.5         2.5 
    2 Romania   North             1.25     0           2.75
    3 Chile     South             1.25     0           3.75
    4 Samoa     South             1.25     0.25        3.75
    5 Australia South             0.5      0           5.25
    6 Georgia   North             0.5      0           5.25
    # ℹ 6 more variables: tackles <dbl>, points <dbl>,
    #   conversions <dbl>, offloads <dbl>, tries <dbl>,
    #   runs <dbl>
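To follow along without the full data set, you can rebuild the six rows printed above as a small data frame. This is a minimal sketch for illustration only. It keeps just the five displayed columns, whereas the real RWCperGame has more countries and six more variables.

```r
# Rebuild the six rows of RWCperGame shown above (displayed columns only);
# the real data frame has more rows and more columns
RWCperGame <- data.frame(
  country     = c("Namibia", "Romania", "Chile", "Samoa", "Australia", "Georgia"),
  hemisphere  = factor(c("South", "North", "South", "South", "South", "North")),
  yellowcards = c(1, 1.25, 1.25, 1.25, 0.5, 0.5),
  redcards    = c(0.5, 0, 0, 0.25, 0, 0),
  cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25)
)

# hemisphere is stored as a factor, R's type for qualitative data
levels(RWCperGame$hemisphere)
#> [1] "North" "South"
```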

Qualitative Data

  • We map qualitative data to a visual feature in order to be able to identify values that belong to different groups or categories.

  • If we map hemisphere to a qualitative visual feature, we can answer questions like:

    • Do teams from the Northern hemisphere make more clean breaks than teams from the South (on average)?

    • The crucial visual task is to distinguish data for Northern teams from data for Southern teams.

We can visualise qualitative data using the shape of points.

  • Do teams from the Northern hemisphere make more clean breaks than teams from the South (on average)?

Qualitative Features in {ggplot2}

  • geom_point() and the shape aesthetic map data values to shape.
    library(ggplot2)
    ggplot(RWCperGame) +
        geom_point(aes(x=cleanbreaks, y="", shape=hemisphere),
                   position=position_jitter(width=0, height=.1, seed=123))

We can visualise qualitative data using the position of points.

  • Do teams from the Northern hemisphere make more clean breaks than teams from the South (on average)?

Qualitative Features in {ggplot2}

  • The x and y aesthetics map qualitative data values to position for geom_point() and geom_col().
    ggplot(RWCperGame) +
        geom_point(aes(x=cleanbreaks, y=hemisphere),
                   position=position_jitter(width=0, height=.1, seed=123))

We can visualise qualitative data using the colour of points.

  • Do teams from the Northern hemisphere make more clean breaks than teams from the South (on average)?

Qualitative Features in {ggplot2}

  • The colour aesthetic maps data values to colour for geom_point(), geom_col(), and geom_line().
    ggplot(RWCperGame) +
        geom_point(aes(x=cleanbreaks, y="", colour=hemisphere),
                   position=position_jitter(width=0, height=.1, seed=123))

Qualitative Features in {ggplot2}

  • When the geom has an interior, the colour aesthetic maps data values to the border colour and the fill aesthetic maps data values to the fill colour.
    ggplot(RWCperGame) +
        # shape 21 is a circle with a border (colour) and an interior (fill)
        geom_point(aes(x=cleanbreaks, y="", fill=hemisphere), shape=21,
                   position=position_jitter(width=0, height=.1, seed=123))

Ordinal Features

Ordinal Data

  • The crimeLevelTotal data frame contains the total number of offences broken down by level of crime.

  • Are less severe crimes more common than more severe crimes?

crimeLevelTotal
# A tibble: 5 × 3
  level       total   prop
  <ord>       <int>  <dbl>
1 Low         24242 0.283 
2 Low-Medium  21513 0.251 
3 Medium      13823 0.161 
4 Medium-High 18315 0.214 
5 High         7793 0.0909

We can visualise ordinal data using the colour of bars.

  • Are less severe crimes more common than more severe crimes?

Ordinal Features

  • We perceive darker colours as “greater than” lighter colours.
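Here is a sketch of the colour-of-bars chart, with crimeLevelTotal rebuilt from the printed values. By default, ggplot2 gives an ordered factor a viridis palette, which runs from dark (first level) to light (last level). Setting direction = -1 reverses this so that High is the darkest bar, matching the "darker is greater" reading. Viridis is an assumption here; any palette that varies in luminance would serve.

```r
library(ggplot2)

# Rebuild crimeLevelTotal from the printed values; level is an ordered factor
lvls <- c("Low", "Low-Medium", "Medium", "Medium-High", "High")
crimeLevelTotal <- data.frame(
  level = factor(lvls, levels = lvls, ordered = TRUE),
  total = c(24242L, 21513L, 13823L, 18315L, 7793L)
)

# Map the ordinal variable to fill; direction = -1 reverses viridis
# so the most severe level ("High") gets the darkest colour
p <- ggplot(crimeLevelTotal) +
  geom_col(aes(x = total, y = "", fill = level), colour = "black") +
  scale_fill_viridis_d(direction = -1)
p
```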

Qualitative Scales

Qualitative Scales

  • The mapping from data values to a visual feature also depends on the scale of the mapping.

  • How are data values transformed to values on the visual feature?
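One way to see a scale as a transformation is to call the palette functions from the scales package directly. ggplot2 uses these functions for its default discrete scales. Each one takes the number of levels and returns the visual values those levels are mapped to. This is a small sketch that assumes the scales package is installed (it is a ggplot2 dependency).

```r
library(scales)

# Two levels (e.g. North and South) become two visual values
shape_pal()(2)   # default ggplot2 shapes
#> [1] 16 17
hue_pal()(2)     # default ggplot2 colours
#> [1] "#F8766D" "#00BFC4"
```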

The Principle of Unambiguity

  • The scale should be chosen so that differences in the data are visible in the data visualisation.
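As a counter-example, here is a sketch of a scale that breaks this principle. R's plotting symbols 16 and 19 are both filled circles, and 19 is only slightly larger. A shape scale that uses them leaves North and South teams almost impossible to tell apart. The data frame is a copy of two columns from the six RWCperGame rows shown earlier.

```r
library(ggplot2)

# Two columns from the six printed rows of RWCperGame
RWCperGame <- data.frame(
  hemisphere  = factor(c("South", "North", "South", "South", "South", "North")),
  cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25)
)

# An ambiguous scale: shapes 16 and 19 are both filled circles,
# so the difference between hemispheres is effectively invisible
p_ambiguous <- ggplot(RWCperGame) +
  geom_point(aes(x = cleanbreaks, y = "", shape = hemisphere),
             position = position_jitter(width = 0, height = .1, seed = 123)) +
  scale_shape_manual(values = c(16, 19))
p_ambiguous
```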

Qualitative Scales in {ggplot2}

  • We can control the set of shapes that data values are mapped to using scale_shape_manual().

    ggplot(RWCperGame) +
        geom_point(aes(x=cleanbreaks, y="", shape=hemisphere),
                   position=position_jitter(width=0, height=.1, seed=123)) +
        scale_shape_manual(values=c(16, 4))

Qualitative Scales in {ggplot2}

  • We can control how qualitative values are mapped to position with scale_y_discrete().
    ggplot(RWCperGame) +
        geom_point(aes(x=cleanbreaks, y=hemisphere),
                   position=position_jitter(width=0, height=.1, seed=123)) +
        scale_y_discrete(expand=expansion(mult=c(.2, .2)))

Colour

Qualitative Scales

  • scale_colour_manual() and scale_fill_manual() allow us to select the colours that data values map to.
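A minimal sketch of scale_colour_manual(), using the six RWCperGame rows printed earlier. The two colour names are arbitrary choices for illustration. Naming the values ties each colour to a specific level rather than to the order of the levels.

```r
library(ggplot2)

# Two columns from the six printed rows of RWCperGame
RWCperGame <- data.frame(
  hemisphere  = factor(c("South", "North", "South", "South", "South", "North")),
  cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25)
)

# Named values map each level to a chosen colour
p <- ggplot(RWCperGame) +
  geom_point(aes(x = cleanbreaks, y = "", colour = hemisphere),
             position = position_jitter(width = 0, height = .1, seed = 123)) +
  scale_colour_manual(values = c(North = "darkorange", South = "steelblue"))
p
```

scale_fill_manual() takes the same values argument for geoms that have an interior.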

Qualitative Scales

  • However, there are a LOT of colours to choose from!

Colour

  • We have been talking about colour as if it is a single visual feature, but it can be split into three distinct features: hue, chroma (colourfulness), and luminance (light/dark).
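Base R's hcl() function builds colours directly from these three features, so you can vary each one while holding the other two fixed. The default ggplot2 hue scale works this way: it varies h and keeps c and l constant. The particular h, c, and l values below are illustrative assumptions.

```r
library(ggplot2)

# grDevices::hcl() builds a colour from hue (h, degrees), chroma (c)
# and luminance (l), so each feature can be varied on its own
hues       <- hcl(h = seq(15, 255, length.out = 5), c = 60, l = 65)
chromas    <- hcl(h = 260, c = seq(0, 80, length.out = 5), l = 50)
luminances <- hcl(h = 260, c = 0, l = seq(90, 20, length.out = 5))

# One row of swatches per feature
swatches <- data.frame(
  x       = rep(1:5, times = 3),
  feature = rep(c("hue", "chroma", "luminance"), each = 5),
  colour  = c(hues, chromas, luminances)
)

p <- ggplot(swatches) +
  geom_tile(aes(x = x, y = feature, fill = colour), colour = "white") +
  scale_fill_identity()   # use the colour strings as-is
p
```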

Colour Palettes

  • Varying hue creates a qualitative colour palette.

    ggplot(crimeLevelTotal) +
        geom_col(aes(x=total, y="", fill=level), colour="black") +
        scale_fill_hue()

Colour Palettes

  • Varying chroma or luminance (or both) creates an ordinal (or sequential) colour palette.

    ggplot(crimeLevelTotal) +
        geom_col(aes(x=total, y="", fill=level), colour="black") +
        colorspace::scale_fill_discrete_sequential()

Colour Palettes

  • A diverging colour palette combines two sequential palettes with a neutral centre.

    ggplot(crimeLevelTotal) +
        geom_col(aes(x=total, y="", fill=level), colour="black") +
        colorspace::scale_fill_discrete_diverging()

  • Selecting a colour palette is a complex task, with competing constraints.

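  • On one hand, there is a desire for visual balance, in order to avoid emphasising one category over another.

  • But a balanced palette, in which only the hue varies and luminance is held constant, does not translate well to greyscale for printing: the colours all become similar greys.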

  • There is also a need to account for colour-vision deficiency.

  • This simulation shows what the previous colour palette might look like to a viewer with colour-vision deficiency.

  • There are also situations where a deliberate emphasis is desired in order to highlight one particular category.
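The greyscale and colour-vision constraints can both be checked in code with the colorspace package, which is already used above. This sketch uses ggplot2's default hue palette as a stand-in for a qualitative palette and the "Blues 2" palette as a stand-in for a sequential one. desaturate() removes chroma to approximate a greyscale print. deutan() simulates deuteranopia, and protan() and tritan() cover other deficiencies.

```r
library(colorspace)

# A hue-only palette (ggplot2's default) and a luminance-based one
qualitative <- scales::hue_pal()(5)
sequential  <- sequential_hcl(5, palette = "Blues 2")

# Greyscale check: desaturate() removes all chroma
desaturate(qualitative)   # similar greys: levels are hard to tell apart
desaturate(sequential)    # distinct greys: the order survives

# Colour-vision deficiency check: simulated deuteranopia
deutan(qualitative)
deutan(sequential)

# Compare luminance ranges (L coordinate in polar LUV, i.e. HCL space)
lum <- function(x) coords(methods::as(hex2RGB(x), "polarLUV"))[, "L"]
range(lum(qualitative))
range(lum(sequential))
```

A narrow luminance range is the reason the hue-only palette does not survive greyscale printing.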

Colour Palettes in {ggplot2}
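Besides the colorspace scales, ggplot2 ships ColorBrewer and viridis palette functions. Deliberate emphasis can be built with scale_fill_manual(). The sketch below applies these to the crimeLevelTotal bars, rebuilt from the printed values. The palette names and the highlight colour are illustrative choices, not recommendations.

```r
library(ggplot2)

# Rebuild crimeLevelTotal from the printed values
lvls <- c("Low", "Low-Medium", "Medium", "Medium-High", "High")
crimeLevelTotal <- data.frame(
  level = factor(lvls, levels = lvls, ordered = TRUE),
  total = c(24242L, 21513L, 13823L, 18315L, 7793L)
)
base <- ggplot(crimeLevelTotal) +
  geom_col(aes(x = total, y = "", fill = level), colour = "black")

# ColorBrewer and viridis palettes built into ggplot2
p_qualitative <- base + scale_fill_brewer(palette = "Set2")  # varies hue
p_sequential  <- base + scale_fill_brewer(palette = "Blues") # varies luminance
p_diverging   <- base + scale_fill_brewer(palette = "RdBu")  # two hues, light centre
p_viridis     <- base + scale_fill_viridis_d()               # ordered, colour-vision friendly

# Deliberate emphasis: grey for every level except the one to highlight
fills <- setNames(rep("grey80", length(lvls)), lvls)
fills["High"] <- "firebrick"
p_highlight <- base + scale_fill_manual(values = fills)
p_highlight
```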

Feature Mismatch

Feature Mismatch

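  • If the visual feature has a “higher” type than the data, we risk adding superfluous or incorrect information. For example, mapping the nominal variable hemisphere to lighter and darker shades of one colour suggests that one hemisphere is “greater than” the other.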
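The following sketch shows a feature mismatch using the six RWCperGame rows printed earlier. Size is a quantitative feature, so mapping the nominal hemisphere to size implies that one hemisphere is "more" than the other. ggplot2 itself warns that using size for a discrete variable is not advised. Mapping hemisphere to shape avoids the implied order.

```r
library(ggplot2)

# Two columns from the six printed rows of RWCperGame
RWCperGame <- data.frame(
  hemisphere  = factor(c("South", "North", "South", "South", "South", "North")),
  cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25)
)

# Mismatch: nominal data mapped to a quantitative feature (size);
# building this plot triggers a ggplot2 warning
p_mismatch <- ggplot(RWCperGame) +
  geom_point(aes(x = cleanbreaks, y = "", size = hemisphere),
             position = position_jitter(width = 0, height = .1, seed = 123))

# Match: nominal data mapped to a qualitative feature (shape)
p_match <- ggplot(RWCperGame) +
  geom_point(aes(x = cleanbreaks, y = "", shape = hemisphere),
             position = position_jitter(width = 0, height = .1, seed = 123))
```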

Gestalt Principles

Gestalt Principles

  • An effective qualitative mapping maps different data values to visually different data symbols.

Gestalt Principles

  • An effective qualitative mapping maps the same data values to visually similar data symbols.

Gestalt Principles

  • An effective qualitative mapping maps groups of data values to visual groups of data symbols.

Gestalt Principles

  • The Gestalt psychologists identified several principles of visual grouping.

  • These are examples of visual processing that happen very rapidly and early in our visual system.

Gestalt Principles: Similarity

  • Mapping the same qualitative data to the same colour or shape produces visual groups through similarity.

Gestalt Principles: Proximity

  • Mapping the same qualitative data to the same position produces visual groups through proximity.

Gestalt Principles: Connectivity

  • Mapping the same qualitative data to the same line produces visual groups through connectivity.
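Here is a sketch of grouping by connectivity alone, using the six crimeGender rows printed in the exercise below. Each gender's points are joined by a line, and no colour or shape difference is used. The points for each gender therefore form a visual group only because they are connected.

```r
library(ggplot2)

# Three columns from the six printed rows of crimeGender
crimeGender <- data.frame(
  gender = factor(rep(c("Male", "Female"), times = 3)),
  year   = rep(2011:2013, each = 2),
  count  = c(8808, 4207, 8031, 3513, 6768, 2976)
)

# group = gender draws one line per gender; the lines create the groups
p <- ggplot(crimeGender, aes(x = year, y = count, group = gender)) +
  geom_line() +
  geom_point()
p
```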

Gestalt Principles: Enclosure

  • Mapping the same qualitative data to the same polygon produces visual groups through enclosure.

Gestalt Principles

  • Some of the Gestalt principles produce stronger visual groups than others.

Summary

  • It makes sense to map qualitative data to qualitative visual features.

    • Colour, shape, and position are qualitative visual features.

    • Qualitative data can be visualised with:

      • points with different colours, positions, or shapes.
      • bars with different colours or positions.

Summary

  • Colour, shape, and position are appropriate visual features for representing qualitative data.

  • Colour can be broken into hue, chroma, and luminance.

  • Hue has no natural order, so on its own it is only suitable for nominal data.

  • Chroma and luminance can be used for ordinal data.

  • Position is a very effective and flexible visual feature.

Summary

  • The goal when mapping qualitative data is to create visual groups through:

    • position (proximity)
    • shape and colour (similarity)
    • connectivity (lines)
    • enclosure (polygons)

Exercise

  • We want to compare the count of offences for different genders:

    • Are the counts for Male higher or lower than the counts for Female (on average)?
    head(crimeGender)
      gender year count   pop     rate genderFactor   yearDate
    1   Male 2011  8808 95040 926.7677         Male 2011-06-30
    2 Female 2011  4207 90410 465.3246       Female 2011-06-30
    4   Male 2012  8031 95260 843.0611         Male 2012-06-30
    5 Female 2012  3513 90050 390.1166       Female 2012-06-30
    7   Male 2013  6768 94440 716.6455         Male 2013-06-30
    8 Female 2013  2976 89260 333.4080       Female 2013-06-30

Exercise

  • What type of data is gender?

  • Which visual features would be appropriate for representing gender?

  • Write {ggplot2} code to produce data visualisations that allow us to answer the questions of interest.

Exercise

  • The following code produces a simple bar chart:

    ggplot(crimeGenderTotal) +
        geom_col(aes(total, gender))

  • Modify the code so that the Male bar is light blue and the Female bar is pink.

Exercise

  • Can you see anything wrong with this data visualisation?