Effective Data Visualisation with R
Multiple Visual Features

Paul Murrell
The University of Auckland

Review

  • A data visualisation involves mapping data values to visual features.

  • Which visual feature we choose depends on:

    • Whether we have quantitative or qualitative data.
    • The accuracy and capacity of the visual feature.
    • The question we are interested in answering.
  • We can map back from visual features to the raw data and to data statistics.

  • When we map to more than one visual feature, we need to consider possible interactions between features.

Multiple Visual Features

  • So far, we have only considered mappings that produce simple data symbols:

    • Bars
    • Points
    • Wedges
  • In this section we will begin to consider more complex data symbols.

Multiple Visual Features

  • Multi-Value Data Symbols

  • Visual Shapes

  • Visual Shape Caveats

  • Multivariate Data Symbols

  • Reusing Visual Features

  • Visual Objects

Multi-Value Data Symbols

Youth Crime

  • The crimeGender data frame contains information on the crime rate for each gender across multiple years.

    head(crimeGender)
    ##   gender year count   pop     rate genderFactor   yearDate
    ## 1   Male 2011  8808 95040 926.7677         Male 2011-06-30
    ## 2 Female 2011  4207 90410 465.3246       Female 2011-06-30
    ## 4   Male 2012  8031 95260 843.0611         Male 2012-06-30
    ## 5 Female 2012  3513 90050 390.1166       Female 2012-06-30
    ## 7   Male 2013  6768 94440 716.6455         Male 2013-06-30
    ## 8 Female 2013  2976 89260 333.4080       Female 2013-06-30

Simple Data Symbols

  • We can map one data value from each of two variables to two visual features of one data symbol.

  • The gender and rate values from one year map to the horizontal and vertical position of one point.

  • Two data values map to one symbol.
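
A minimal sketch of this mapping, assuming the six rows shown by head(crimeGender) are enough to illustrate it. The data frame is rebuilt from that output; the names crime2011 and pointPlot are introduced here for illustration only.

```r
library(ggplot2)

# The six rows of crimeGender shown by head() earlier (selected columns).
crimeGender <- data.frame(
    gender = rep(c("Male", "Female"), times = 3),
    year = rep(2011:2013, each = 2),
    rate = c(926.7677, 465.3246, 843.0611, 390.1166, 716.6455, 333.4080)
)

# Keep a single year so that each point carries exactly two data values:
# one gender (horizontal position) and one rate (vertical position).
crime2011 <- subset(crimeGender, year == 2011)

pointPlot <- ggplot(crime2011) +
    geom_point(aes(x = gender, y = rate))
pointPlot
```

Each of the two points is one simple data symbol: its position encodes one gender value and one rate value.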

Multi-Value Data Symbols

  • We can map two data values from each of two variables to two visual features of one data symbol.

  • Two rate values and two genders map to the start and end of one line (for each year).

Multi-Value Data Symbols

  • Two rate values from two genders map to the start and end of one line.

    ggplot(crimeGender) +
        geom_line(aes(x=gender, y=rate, group=year))

  • Four data values map to one data symbol.

Multi-Value Data Symbols

  • The mapping is to the position of the ends of the lines, but additional visual features, the length and angle of the lines, are produced.

Multi-Value Data Symbols

  • The angle of the lines allows us to directly perceive differences in the raw data.
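
The difference that each line's angle encodes can be computed directly. This is a sketch using only the six rows shown by head(crimeGender); the name rateGap is introduced here for illustration.

```r
# The six rows of crimeGender shown by head() earlier (selected columns).
crimeGender <- data.frame(
    gender = rep(c("Male", "Female"), times = 3),
    year = rep(2011:2013, each = 2),
    rate = c(926.7677, 465.3246, 843.0611, 390.1166, 716.6455, 333.4080)
)

# Each line joins the Female rate to the Male rate for one year,
# so its rise (and hence its angle) reflects the Male - Female difference.
female <- crimeGender[crimeGender$gender == "Female", ]
male <- crimeGender[crimeGender$gender == "Male", ]

# Both subsets are in year order, so the rows line up year by year.
rateGap <- data.frame(year = female$year, gap = male$rate - female$rate)
rateGap
# gap is roughly 461.4, 452.9, 383.2 for 2011, 2012, 2013
```

A steeper line corresponds to a larger gap, so the flattening of the lines over these three years matches the shrinking difference between the genders.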

Visual Shapes

Complex Multi-Value Data Symbols

  • We can map multiple values from each of two variables to two visual features of one data symbol.

  • Multiple rate values from multiple years map to the positions of one polyline.

    ggplot(crimeGender) +
        geom_line(aes(x=year, y=rate, group=gender))

  • Multiple data values map to one data symbol.

Complex Multi-Value Data Symbols

  • The mapping is to the points that the polyline passes through, but we no longer have a simple data symbol.

  • Instead we have a visual shape.

Complex Multi-Value Data Symbols

  • The complex data symbol allows us to perceive statistical summaries of the data:

    • minima and maxima (peaks and troughs)
    • trends

  • It is no longer easy to perceive the raw data.

When we map data values to the position of a line, we get additional visual features.

  • With only a few data values, we get length and angle, which map back to differences in the raw data.

  • With many values, we get visual shapes that map back to data statistics.
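
The statistics that a polyline's shape suggests (minima, maxima, trend) can also be computed. The sketch below uses only the three years per gender shown by head(crimeGender), so the numbers are illustrative rather than a summary of the full data; summariseLine and lineSummary are hypothetical names.

```r
# The six rows of crimeGender shown by head() earlier (selected columns).
crimeGender <- data.frame(
    gender = rep(c("Male", "Female"), times = 3),
    year = rep(2011:2013, each = 2),
    rate = c(926.7677, 465.3246, 843.0611, 390.1166, 716.6455, 333.4080)
)

# For one polyline: where the trough and peak fall, and the overall trend
# (the slope of a least-squares line through the points).
summariseLine <- function(d) {
    data.frame(
        gender = d$gender[1],
        minYear = d$year[which.min(d$rate)],
        maxYear = d$year[which.max(d$rate)],
        slope = coef(lm(rate ~ year, data = d))[["year"]]
    )
}

lineSummary <- do.call(
    rbind,
    lapply(split(crimeGender, crimeGender$gender), summariseLine)
)
lineSummary
```

Both slopes are negative, which is the downward trend that the shape of each polyline shows at a glance.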

Visual Shape Caveats

Aspect Ratio

  • The shapes of lines depend on the aspect ratio of a plot.

Aspect Ratio in {ggplot2}

  • The aspect ratio is controlled by theme(aspect.ratio) (coming soon!)

  • The ggthemes::bank_slopes() function calculates an aspect ratio so that the median absolute slope is 45 degrees.
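
A hand-rolled sketch of the banking idea, assuming the data fill the plot panel (the default scale expansion is ignored). It is not the {ggthemes} implementation; it computes a panel height/width ratio from the six rows shown by head(crimeGender) and passes it to theme(aspect.ratio). The names segmentSlopes, bankedRatio, and bankedPlot are introduced here for illustration.

```r
library(ggplot2)

# The six rows of crimeGender shown by head() earlier (selected columns).
crimeGender <- data.frame(
    gender = rep(c("Male", "Female"), times = 3),
    year = rep(2011:2013, each = 2),
    rate = c(926.7677, 465.3246, 843.0611, 390.1166, 716.6455, 333.4080)
)

# Slope (in data units) of every line segment, one polyline per gender.
segmentSlopes <- unlist(lapply(
    split(crimeGender, crimeGender$gender),
    function(d) diff(d$rate) / diff(d$year)
))

# A segment with data slope s is drawn with slope
#   s * (height / width) * (xRange / yRange),
# so this height / width makes the median absolute drawn slope equal to 1
# (45 degrees).
xRange <- diff(range(crimeGender$year))
yRange <- diff(range(crimeGender$rate))
bankedRatio <- yRange / (xRange * median(abs(segmentSlopes)))

bankedPlot <- ggplot(crimeGender) +
    geom_line(aes(x=year, y=rate, group=gender)) +
    theme(aspect.ratio = bankedRatio)
bankedPlot
```

Changing bankedRatio to a much smaller or larger value shows how strongly the perceived steepness of the same data depends on the aspect ratio.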

Accidental Visual Shapes

  • The two lines below appear to converge; we see a curved wedge that narrows.

  • However, in terms of y-positions, the vertical distance between the lines is constant.
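
The effect can be reproduced with invented curves (these are not the lines from the slide). In this sketch the two lines are an exponential curve and the same curve shifted up by exactly 2 units; the names curves, verticalGap, and convergePlot are hypothetical.

```r
library(ggplot2)

# Invented illustration: two curves that differ by a constant vertical offset.
x <- seq(0, 3, length.out = 50)
curves <- data.frame(
    x = rep(x, times = 2),
    y = c(exp(x), exp(x) + 2),
    line = rep(c("lower", "upper"), each = length(x))
)

# The vertical distance between the lines is the same at every x.
verticalGap <- curves$y[curves$line == "upper"] -
    curves$y[curves$line == "lower"]
range(verticalGap)

convergePlot <- ggplot(curves) +
    geom_line(aes(x=x, y=y, group=line))
convergePlot
```

As the curves steepen, the perpendicular distance between them shrinks, so the band appears to narrow even though the y-difference never changes.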

Visual Congruence

  • A connecting line implies continuity (and constant change).

  • This is not always appropriate.

Multivariate Data Symbols

Rugby World Cup

  • The RWCperGame data frame contains measures of performance at the Rugby World Cup of 2023 for different countries, plus the hemisphere that each country is from.

    head(tibble(RWCperGame))
    ## # A tibble: 6 × 11
    ##   country   hemisphere yellowcards redcards cleanbreaks
    ##   <chr>     <fct>            <dbl>    <dbl>       <dbl>
    ## 1 Namibia   South             1        0.5         2.5 
    ## 2 Romania   North             1.25     0           2.75
    ## 3 Chile     South             1.25     0           3.75
    ## 4 Samoa     South             1.25     0.25        3.75
    ## 5 Australia South             0.5      0           5.25
    ## 6 Georgia   North             0.5      0           5.25
    ## # ℹ 6 more variables: tackles <dbl>, points <dbl>,
    ## #   conversions <dbl>, offloads <dbl>, tries <dbl>,
    ## #   runs <dbl>

Multivariate Data Symbols

  • We can map data values from three variables to three visual features of a data symbol.

Multivariate Data Symbols

  • The cleanbreaks, tries, and hemisphere values from one country map to the horizontal position, vertical position, and colour of one point.

Multivariate Data Symbols

  • We can continue this approach to more variables, but we run out of different visual features and it becomes difficult to perceive separate visual features.

Reusing Visual Features

Reusing Visual Features

  • An alternative to mapping n data variables to n different visual features is to map n data variables to the same visual feature.

Avoiding Overplotting

  • Mapping identical values to position results in overplotting.

    ggplot(crimeGender) +
        geom_col(aes(x=year, y=count, fill=gender), position="identity")

  • One option is to stack bars.

  • Both count and gender are mapped to the vertical position of the bars.

    ggplot(crimeGender) +
        geom_col(aes(x=year, y=count, fill=gender), position="stack")

  • Another option is to dodge bars.

  • Both year and gender are mapped to the horizontal position of the bars.

    ggplot(crimeGender) +
        geom_col(aes(x=year, y=count, fill=gender), position="dodge")
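
To see how the position adjustments reuse a visual feature, we can inspect the bar coordinates that {ggplot2} computes. This sketch uses ggplot2::layer_data() on the six rows shown by head(crimeGender); the names stacked and dodged are introduced here for illustration.

```r
library(ggplot2)

# The six rows of crimeGender shown by head() earlier (selected columns).
crimeGender <- data.frame(
    gender = rep(c("Male", "Female"), times = 3),
    year = rep(2011:2013, each = 2),
    count = c(8808, 4207, 8031, 3513, 6768, 2976)
)

# Computed bar coordinates when bars are stacked.
stacked <- layer_data(
    ggplot(crimeGender) +
        geom_col(aes(x=year, y=count, fill=gender), position="stack")
)

# Computed bar coordinates when bars are dodged.
dodged <- layer_data(
    ggplot(crimeGender) +
        geom_col(aes(x=year, y=count, fill=gender), position="dodge")
)

# Stacking: within a year, one bar starts where the other ends (ymin, ymax).
stacked[, c("xmin", "xmax", "ymin", "ymax")]

# Dodging: every bar starts at zero, but xmin and xmax differ within a year.
dodged[, c("xmin", "xmax", "ymin", "ymax")]
```

With stacking, gender shifts the vertical extent of a bar; with dodging, gender shifts its horizontal extent instead.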

Facetting

  • We can map one or two variables to position in space and then make each data symbol a data visualisation.

  • All visual features become available again, including position.

Facetting in {ggplot2}

  • The facet_wrap() and facet_grid() functions generate separate panels for each level of one or more variables.

    ggplot(crimeGender) +
        geom_col(aes(x=year, y=count, fill=gender)) +
        facet_wrap(vars(gender))

Non-Cartesian Coordinates

  • With Cartesian coordinates, we are limited to horizontal and vertical position.

Parallel Coordinates

  • We can map continuous variables from RWCperGame to the positions of a line through parallel coordinates.

Parallel Coordinates

  • The order of the axes can make a big difference.

Small Multiples

  • We can map continuous variables from RWCperGame to the length of lines radiating from the same centre.

Small Multiples

  • The mapping is to the points that the polyline passes through, but we no longer have a simple data symbol.

  • Instead we have a visual shape.

  • The complex data symbol allows us to perceive statistical summaries of the data:

    • Similar shapes suggest high-dimensional groups
    • Similar shapes suggest high-dimensional correlations

  • It is no longer easy to perceive the raw data.
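
As a rough illustration of lines radiating from a common centre, base R's graphics::stars() draws one star per row. This is a different tool from the {ggplot2}-based approach in these slides, swapped in here because it needs no extra packages. The data are the six rows and three numeric columns of RWCperGame shown by head(); RWChead, measures, and spokeLengths are hypothetical names.

```r
# The six rows of RWCperGame shown by head() earlier (first five columns).
RWChead <- data.frame(
    country = c("Namibia", "Romania", "Chile", "Samoa", "Australia", "Georgia"),
    hemisphere = factor(c("South", "North", "South", "South", "South", "North")),
    yellowcards = c(1, 1.25, 1.25, 1.25, 0.5, 0.5),
    redcards = c(0.5, 0, 0, 0.25, 0, 0),
    cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25)
)

measures <- as.matrix(RWChead[, c("yellowcards", "redcards", "cleanbreaks")])
rownames(measures) <- RWChead$country

# Each variable is rescaled to [0, 1] so that spoke lengths are comparable;
# stars() applies the same per-column rescaling when scale = TRUE.
spokeLengths <- apply(measures, 2, function(v) (v - min(v)) / (max(v) - min(v)))

# One star per country, one spoke per variable.
stars(measures, scale = TRUE, labels = rownames(measures))
```

Countries with similar stars have similar values on all three variables, but reading an individual value back from a spoke is hard.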

Visual Objects

Visual Objects

  • We can map data values to the visual features, e.g., size, of data symbols that resemble a familiar object.

Visual Objects

  • We can map data values to the visual features, e.g., count, of data symbols that resemble a familiar object.

  • We map qualitative data to the visual object.

  • We map quantitative data to the visual features of the object.

Visual Objects

  • We can map data values to more abstract features of data symbols that resemble familiar objects

    • Similar shapes suggest high-dimensional groups
    • Similar shapes suggest high-dimensional correlations

  • The visual object allows us to perceive statistical summaries of the data.

  • It is very difficult to perceive the raw data.

Complex Data Symbols in {ggplot2}

Parallel Coordinates in {ggplot2}

  • The ggparcoord() function from the {GGally} package can be used to produce a parallel coordinates plot.

  • The data argument is a data frame.

  • The columns argument selects the variables of data to plot.

  • The scale argument provides options for scaling the parallel axes.

  • The order argument provides ways to automatically select the order of the parallel axes.
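
A minimal sketch of a ggparcoord() call, assuming {GGally} is installed. It uses only the six rows and three numeric columns of RWCperGame shown by head(), so it is an illustration rather than the plot from the slides; RWChead and parPlot are hypothetical names.

```r
library(GGally)

# The six rows of RWCperGame shown by head() earlier (first five columns).
RWChead <- data.frame(
    country = c("Namibia", "Romania", "Chile", "Samoa", "Australia", "Georgia"),
    hemisphere = factor(c("South", "North", "South", "South", "South", "North")),
    yellowcards = c(1, 1.25, 1.25, 1.25, 0.5, 0.5),
    redcards = c(0.5, 0, 0, 0.25, 0, 0),
    cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25)
)

# One polyline per country across three parallel axes.
# scale = "uniminmax" rescales each axis to run from its minimum to its maximum.
parPlot <- ggparcoord(
    data = RWChead,
    columns = 3:5,
    groupColumn = "hemisphere",
    scale = "uniminmax"
)
parPlot
```

The result is a {ggplot2} plot, so further layers, scales, and themes can be added with +.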

Small Multiples in {ggplot2}

  • The geom_starglyph() function from the {gglyph} package can be used to produce small multiples.

  • The cols argument selects the variables to plot.

Chernoff Faces in {ggplot2}

  • The geom_chernoff() function from the {ggChernoff} package can be used to produce Chernoff faces.

  • The smile, brow, nose, and eyes aesthetics can be used to map data values to properties of the faces.

Summary

Summary

  • Multi-value data symbols represent multiple data values in a single data symbol.

    • Multi-value data symbols can generate visual shapes.
  • Multivariate data symbols represent more than two variables in a single data symbol.

    • Multivariate data symbols can generate visual objects.
  • Visual shapes and visual objects allow us to perceive higher-level data statistics.

    • Visual shapes and visual objects make it harder to perceive the raw data.

Exercises

Exercise

  • The crimeDistrict data frame contains the count of offenders per district over years.

    • How do the overall trends differ between districts?

    head(crimeDistrict)
    ##           district year count      pop     rate   yearDate
    ## 1        Northland 2011   599  6829.58 877.0671 2011-06-30
    ## 2        Waitematā 2011  1072 23290.05 460.2824 2011-06-30
    ## 3    Auckland City 2011   765 16867.14 453.5446 2011-06-30
    ## 4 Counties/Manukau 2011  1494 24418.10 611.8412 2011-06-30
    ## 5          Waikato 2011  1055 14869.22 709.5194 2011-06-30
    ## 6    Bay of Plenty 2011  1329 14158.22 938.6773 2011-06-30

  • Does the data visualisation below answer the question?

  • What mappings/visual features are involved?

  • Can you write appropriate {ggplot2} code?
    (The grey lines are an optional challenge.)

Exercise

  • The crimeYear data frame contains the total number of offenders per year.

    • What was the overall trend across years?
    • Which year had the largest change in total?

    head(crimeYear)
    ## # A tibble: 6 × 3
    ##   yearDate   total  year
    ##   <date>     <int> <int>
    ## 1 2011-06-30 13018  2011
    ## 2 2012-06-30 11545  2012
    ## 3 2013-06-30  9747  2013
    ## 4 2014-06-30  7965  2014
    ## 5 2015-06-30  6843  2015
    ## 6 2016-06-30  6253  2016

  • Does the data visualisation below answer the questions?

  • What mappings/visual features are involved?

  • Can you write appropriate {ggplot2} code?

  • Is this a sensible use of visual objects?

    New Zealand Listener October 14-20 2023.