Effective Data Visualisation with R
Graphic Design

Paul Murrell
The University of Auckland

Review

  • A data visualisation consists of data symbols, guides, and labels.

  • We produce data symbols by mapping data values to the visual features of a shape.

  • Which visual feature we choose depends on:

    • Whether we have quantitative or qualitative data.
    • The accuracy and capacity of the visual feature.
    • The question we are interested in answering.

What is Data Visualisation?

  • We have so far looked at the components that make up a data visualisation - data symbols, guides, and labels - in isolation from one another.

  • In this topic, we consider the overall arrangement and layout of a data visualisation.

  • Design ideas are aimed at improving aesthetic appeal.

  • Design ideas also help to produce more consistent, coherent, and comprehensible data visualisations.

Graphic Design

  • CRAP design

  • {ggplot2} themes

  • Fancier Text

  • Clutter

CRAP Design

  • Robin Williams popularised four standard design principles: contrast, repetition, alignment, and proximity.

Contrast

  • The principle of contrast states that different elements of an image should either be obviously similar or obviously different.

  • Contrast should help the viewer to focus on the important elements of the data visualisation.

  • Contrast is related to pre-attentive pop-out.

  • An example of contrast is the use of colourful lines against the dull grey background.

  • This helps the viewer to focus on the main data elements of the plot.

Repetition

  • The principle of repetition states that related elements of an image should have similar visual features.

  • Repetition provides a structure and organisation that helps the viewer to navigate and comprehend an image.

  • Repetition relates to Gestalt grouping.

  • An example of repetition is the use of consistent colours between the lines in the main plot and the lines in the legend.

Alignment

  • The principle of alignment states that every element of an image should be aligned in some way with another element of the image.

  • Alignment helps to make an image look cleaner and more orderly, which again helps with navigation and reduces the cognitive load.

  • Examples of alignment are the left-justification of the plot title relative to the grey background and the legend title relative to the legend background.

Proximity

  • The principle of proximity states that related elements of an image should be placed close together.

  • Proximity relates to Gestalt grouping.

  • Proximity also helps with structure and organisation, particularly to help the viewer to identify elements that belong together as a group.

  • An example of proximity is the placement of the elements of the plot legend.

  • The elements of the legend are close to each other, with an empty margin between them and the main plot.

CRAP Design

  • We can apply each of the CRAP principles of graphic design both to critically assess an existing data visualisation and to suggest ideas for improvement.

Contrast

  • We can increase the contrast between two groups of elements:

    • The title and the lines and labels that represent the data.
    • The background, grid lines, and axis labels.
  • There is now more emphasis on the title and the data.
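  • A minimal sketch of this kind of contrast adjustment in {ggplot2}, using a small hypothetical data frame df (invented for illustration, not the crime data): the title is drawn in bold white and the lines in the default bright colours, while the background, grid lines, and axis labels are muted so that they recede. The particular colours are arbitrary choices.

```r
library(ggplot2)

# Hypothetical data: two groups measured over eleven years
df <- data.frame(year  = rep(2010:2020, 2),
                 count = c(seq(3000, 1500, length.out = 11),
                           seq(1000, 500, length.out = 11)),
                 group = rep(c("A", "B"), each = 11))

p <- ggplot(df, aes(year, count, colour = group)) +
    geom_line(linewidth = 1) +
    labs(title = "Number of Incidents") +
    theme(
        # Low-contrast supporting elements: dark background, dim grid and axis text
        plot.background  = element_rect(fill = "grey10", colour = NA),
        panel.background = element_rect(fill = "grey10", colour = NA),
        panel.grid       = element_line(colour = "grey30", linewidth = 0.2),
        axis.text        = element_text(colour = "grey50"),
        # High-contrast title
        plot.title       = element_text(colour = "white", face = "bold"))
p
```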

Alignment

  • We can align multiple elements with both the left and right ends of the data lines.

  • There is now a simpler and cleaner structure.
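  • One possible way to get this alignment, sketched with hypothetical data: expansion(0) removes the default padding on the x-axis so that the lines start and end exactly at the panel edges, and the title, which by default (plot.title.position = "panel") is left-aligned with the panel, then lines up with the left ends of the lines. The plot.margin values are arbitrary.

```r
library(ggplot2)

# Hypothetical data, for illustration only
df <- data.frame(year  = rep(2010:2020, 2),
                 count = c(seq(3000, 1500, length.out = 11),
                           seq(1000, 500, length.out = 11)),
                 group = rep(c("A", "B"), each = 11))

p <- ggplot(df, aes(year, count, colour = group)) +
    geom_line() +
    labs(title = "Number of Incidents") +
    # No padding: the lines now start and end at the panel edges
    scale_x_continuous(expand = expansion(0)) +
    # "panel" (the default) left-aligns the title with the panel,
    # and therefore with the left ends of the lines
    theme(plot.title.position = "panel",
          # Extra left/right margin so the line ends do not touch the plot edge
          plot.margin = margin(10, 20, 10, 20))
p
```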

Repetition

  • We can repeat the colours in lines and labels.

  • This creates clear groupings of these elements.

Proximity

  • We can increase the proximity of the labels and the lines.

  • This removes the need for a separate legend, further simplifying the plot.
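  • A minimal sketch of direct labelling, again with hypothetical data: each label sits at the end of its line (proximity) and shares the line's colour mapping (repetition), so legend.position = "none" can drop the legend without losing information.

```r
library(ggplot2)

# Hypothetical data, for illustration only
df <- data.frame(year  = rep(2010:2020, 2),
                 count = c(seq(3000, 1500, length.out = 11),
                           seq(1000, 500, length.out = 11)),
                 group = rep(c("A", "B"), each = 11))

# One row per group: the final year, where each label will go
ends <- subset(df, year == max(year))

p <- ggplot(df, aes(year, count, colour = group)) +
    geom_line() +
    # Inherits x, y, and colour from ggplot(), so each label matches its line;
    # hjust/vjust place the text just above and to the left of the line end
    geom_text(data = ends, aes(label = group), hjust = 1, vjust = -0.5) +
    theme(legend.position = "none")
p
```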

{ggplot2} themes

Themes in {ggplot2}

  • Producing a complete data visualisation involves making a lot of design decisions.

  • Part of the convenience of {ggplot2} comes from the fact that it makes a lot of decisions for us.

  • If we want to control the contrast, repetition, alignment, and proximity of visual elements, we need to take control of these decisions.

  • Themes provide a way to modify the default {ggplot2} decisions.

  • {ggplot2} makes a lot of decisions about scales, colours, and labels.

    ggplot(crimeGender) +
        geom_line(aes(yearDate, count, colour=gender)) +
        labs(title="Number of Incidents") 

  • The theme() function allows individual details to be adjusted.

  • Functions like theme_bw() and theme_dark() change a whole host of defaults at once.

    ggplot(crimeGender) +
        geom_line(aes(yearDate, count, colour=gender)) +
        labs(title="Number of Incidents") +
        theme_bw()

The theme() function

  • The arguments to theme() are different theme settings, e.g., plot.title to control the style of the plot title.

  • The value supplied for each setting depends on the type of setting, e.g., plot.title controls a text element so we provide a value using element_text().

  • Functions like element_text() have arguments to control text settings, like family and size.

    ggplot(crimeGender) +
        geom_line(aes(yearDate, count, colour=gender)) +
        labs(title="Number of Incidents") +
        theme(plot.title=element_text(size=20, face="bold"))

  • element_rect() provides settings for borders and backgrounds, like panel.background.

    ggplot(crimeGender) +
        geom_line(aes(yearDate, count, colour=gender)) +
        labs(title="Number of Incidents") +
        theme(panel.background=element_rect(colour="black", fill=NA))

  • element_line() provides settings for lines, like panel.grid.

    ggplot(crimeGender) +
        geom_line(aes(yearDate, count, colour=gender)) +
        labs(title="Number of Incidents") +
        theme(panel.grid=element_line(colour="black", linewidth=1))

  • element_blank() is used to remove a plot element entirely.

    ggplot(crimeGender) +
        geom_line(aes(yearDate, count, colour=gender)) +
        labs(title="Number of Incidents") +
        theme(panel.background=element_blank())

  • Some theme settings are more general and provide defaults for more specific settings.

    • For example, there are axis.title, axis.title.x, and axis.title.x.bottom settings.
  • Some theme settings are neither text, nor lines, nor rectangles.

    • For example, legend.position is a character vector that specifies the location of the plot legend: "left", "right", "bottom", "top", or "none".
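  • A small sketch of how general settings cascade to more specific ones. It uses calc_element(), which {ggplot2} exports, to look up the value a setting finally resolves to; the colours and sizes are arbitrary choices for illustration.

```r
library(ggplot2)

# Start from a complete theme, then set a general and a more specific setting
th <- theme_grey() +
    theme(axis.title      = element_text(colour = "red", size = 14),
          axis.title.x    = element_text(colour = "blue"),
          legend.position = "bottom")

# axis.title.x.bottom is not set, so it inherits colour from axis.title.x
# and, through that, size from the more general axis.title
bottom <- calc_element("axis.title.x.bottom", th)
bottom$colour
bottom$size

# axis.title.y gets its colour from the general axis.title setting
calc_element("axis.title.y", th)$colour

# legend.position is a plain string, not a text, line, or rect element
th$legend.position
```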

The arguments to theme() (read down each column):

line axis.ticks.theta axis.line.x.top legend.position.inside plot.background
rect axis.ticks.r axis.line.x.bottom legend.direction plot.title
text axis.minor.ticks.x.top axis.line.y legend.byrow plot.title.position
title axis.minor.ticks.x.bottom axis.line.y.left legend.justification plot.subtitle
aspect.ratio axis.minor.ticks.y.left axis.line.y.right legend.justification.top plot.caption
axis.title axis.minor.ticks.y.right axis.line.theta legend.justification.bottom plot.caption.position
axis.title.x axis.minor.ticks.theta axis.line.r legend.justification.left plot.tag
axis.title.x.top axis.minor.ticks.r legend.background legend.justification.right plot.tag.position
axis.title.x.bottom axis.ticks.length legend.margin legend.justification.inside plot.tag.location
axis.title.y axis.ticks.length.x legend.spacing legend.location plot.margin
axis.title.y.left axis.ticks.length.x.top legend.spacing.x legend.box strip.background
axis.title.y.right axis.ticks.length.x.bottom legend.spacing.y legend.box.just strip.background.x
axis.text axis.ticks.length.y legend.key legend.box.margin strip.background.y
axis.text.x axis.ticks.length.y.left legend.key.size legend.box.background strip.clip
axis.text.x.top axis.ticks.length.y.right legend.key.height legend.box.spacing strip.placement
axis.text.x.bottom axis.ticks.length.theta legend.key.width panel.background strip.text
axis.text.y axis.ticks.length.r legend.key.spacing panel.border strip.text.x
axis.text.y.left axis.minor.ticks.length legend.key.spacing.x panel.spacing strip.text.x.bottom
axis.text.y.right axis.minor.ticks.length.x legend.key.spacing.y panel.spacing.x strip.text.x.top
axis.text.theta axis.minor.ticks.length.x.top legend.frame panel.spacing.y strip.text.y
axis.text.r axis.minor.ticks.length.x.bottom legend.ticks panel.grid strip.text.y.left
axis.ticks axis.minor.ticks.length.y legend.ticks.length panel.grid.major strip.text.y.right
axis.ticks.x axis.minor.ticks.length.y.left legend.axis.line panel.grid.minor strip.switch.pad.grid
axis.ticks.x.top axis.minor.ticks.length.y.right legend.text panel.grid.major.x strip.switch.pad.wrap
axis.ticks.x.bottom axis.minor.ticks.length.theta legend.text.position panel.grid.major.y complete
axis.ticks.y axis.minor.ticks.length.r legend.title panel.grid.minor.x validate
axis.ticks.y.left axis.line legend.title.position panel.grid.minor.y
axis.ticks.y.right axis.line.x legend.position panel.ontop
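  • This list reflects one version of {ggplot2}, and new settings are added over time. A quick way to see the settings supported by the installed version is to inspect the arguments of theme() with base R's formals() (a sketch).

```r
library(ggplot2)

# Every named argument of theme() is a theme setting,
# apart from "..." and the two flags complete and validate
settings <- setdiff(names(formals(theme)), c("...", "complete", "validate"))
length(settings)

# For example, all of the settings that control the plot title
grep("^plot\\.title", settings, value = TRUE)
```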

Example

  • The code for the modified line plot controls many theme settings.

ggplot(crimeGender) +
    geom_line(aes(yearDate, count, colour=gender), linewidth=1) +
    labs(title="Number of Incidents") +
    scale_x_date(expand=expansion(0)) +
    geom_label(data=subset(crimeGender, year == 2021),
               aes(label=gender, x=yearDate, y=count, colour=gender),
               hjust=1, vjust=-.5, fontface="bold", label.size=NA,
               label.padding=unit(0, "mm"), fill=rgb(.1, .1, .1)) +
    geom_text(data=subset(crimeGender, gender == "Male"),
              aes(label=year, x=yearDate, y=-1000),
              hjust=1, colour="grey40", size=3) +
    theme(plot.background=element_rect(colour=NA, fill=rgb(.1,.1,.1)),
          panel.background=element_rect(colour=NA, 
                                        fill=rgb(.1,.1,.1)),
          panel.grid.major.y=element_line(colour="grey40", 
                                          linewidth=.2),
          panel.grid.major.x=element_blank(),
          panel.grid.minor=element_blank(),
          legend.position="none",
          text=element_text(colour="grey"),
          plot.title=element_text(colour="white", size=16, 
                                  face="bold"),
          axis.title=element_blank(),
          axis.text.y=element_text(colour="grey40", vjust=0),
          axis.text.x=element_blank(),
          axis.ticks=element_blank(),
          plot.margin=unit(c(.5, 1, .5, 1), "cm"))

Fancier Text

Fancier Text in {ggplot2}

  • Standard text labels in R can only have a single font and style.

  • The {ggtext} package provides support for changing the style within a label.

Two things are required:

  • Basic markup (Markdown or HTML/CSS) in the text.

  • element_markdown(), in place of element_text(), within theme().

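    library(ggplot2)
    library(ggtext)  # provides element_markdown()
    # Assumes cols holds the line colours for Female and Male, in that
    # order, e.g., the default palette: cols <- scales::hue_pal()(2)
    ggplot(crimeGender) +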
        geom_line(aes(yearDate, count, colour=gender)) +
        labs(title=paste0('<span style="color: ', cols[2], 
                          '">**Males**</span>',
                          " Responsible for More Offences than ",
                          '<span style="color: ', cols[1], 
                          '">**Females**</span>'),
             caption=paste0('Total number of offences by youth ',
                            'aged between 14 and 16 from 2010 ',
                            'to 2020 in New Zealand.<br>',
                            '**Source:** New Zealand Ministry ',
                            'of Justice *"Youth Justice Indicator ',
                            'Report"*, 2021.')) +
        theme(plot.title=element_markdown(),
              plot.caption=element_markdown(hjust=0),
              legend.position="none")

Clutter

Effective Data Visualisation

  • We have emphasised the importance of mapping data values to visual features that facilitate mapping back to the data values.

Clutter

  • Visual elements that have no connection to the data create clutter.

  • Clutter may distract attention from the data symbols.

  • Clutter may add to the complexity of the data visualisation (the number of elements to fixate on).
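  • As a small illustration with hypothetical data, removing elements that say nothing about the data (the panel fill, the minor and vertical grid lines, the axis ticks) takes only a few theme() settings. Which elements count as clutter is a judgement call; theme_minimal() is a built-in starting point with a similarly sparse look.

```r
library(ggplot2)

# Hypothetical data, for illustration only
df <- data.frame(year  = rep(2010:2020, 2),
                 count = c(seq(3000, 1500, length.out = 11),
                           seq(1000, 500, length.out = 11)),
                 group = rep(c("A", "B"), each = 11))

base <- ggplot(df, aes(year, count, colour = group)) +
    geom_line()

# Strip non-data elements; keep faint horizontal grid lines to help read values
decluttered <- base +
    theme(panel.background   = element_blank(),
          panel.grid.minor   = element_blank(),
          panel.grid.major.x = element_blank(),
          panel.grid.major.y = element_line(colour = "grey85"),
          axis.ticks         = element_blank())
decluttered
```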

  • Not all embellishments are necessarily evil.

An effective data visualisation will have minimal clutter.

  • Tufte calls clutter "chartjunk" and recommends maximising the data-ink ratio (the proportion of a graphic's ink that is devoted to displaying the data).

  • Many other data visualisation guidelines advocate minimising clutter.

  • There are some dissenting voices.

    • There is some evidence that embellishments may improve engagement and recall.

    • There is also some evidence that making the viewer work harder may improve accuracy and retention.

Summary

  • Simple applications of contrast, repetition, alignment, and proximity can have a significant impact on the overall effectiveness of a data visualisation.

    • Reduce cognitive load.
    • Reduce clutter.
    • Direct attention.
  • {ggplot2} themes allow control over many of the details of a data visualisation.

  • The {ggtext} package allows finer control over text labels.

  • Less clutter is generally better.

Exercises

Exercise

  • Can you identify any weaknesses in the data visualisation below in terms of contrast, repetition, alignment, and proximity?

    ggplot(crimeLevel) +
        geom_line(aes(year, count, colour=level))

Exercise

  • Can you create this modified version of the data visualisation?

    HINT: These are mostly just theme() settings.

Exercise

  • Can you see anything wrong with this data visualisation?

    New Zealand Listener December 9-15 2023.