Effective Data Visualisation with R
The Visual System

Paul Murrell
The University of Auckland

Review

  • A data visualisation consists of data symbols, guides, and labels.

  • A data visualisation can help to answer questions.

    • An effective data visualisation will pose questions that the visual system is good at answering.
  • We need to choose a mapping from data values to data symbols.

    • An effective data visualisation will have good mappings from data to data symbols.

What To Draw

  • We want a data visualisation to take advantage of the strengths of the visual system (and avoid the weaknesses of the visual system).

  • In this section we will describe a very simple model of human visual perception and use that to identify some strengths and weaknesses.

  • This will lead to some basic guidelines for creating an effective data visualisation.

The Visual System

  • The eye.

  • A very simple model of the visual system.

  • Attention.

  • Visual illusions.

The eye

The Eye

  • Light enters the eye through the pupil and is focused by the lens onto the retina at the back of the eye.

  • Signals from around 100 million retinal nerve cells are combined into the optic nerve (about 1 million nerve fibres), which connects directly to the brain.

An effective data visualisation is one that we can see.

  • Seems obvious, right?

  • What can we not see in the data visualisation below?

                  mpg cyl disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
    Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4

Overplotting

  • Possible solutions include semitransparency and jittering (adding a small random value).
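Both solutions can be sketched in base R. This is only an illustration, assuming the built-in mtcars data (the two rows shown earlier share the same mpg and disp values) and an arbitrary choice of transparency level.

```r
# Count rows of mtcars that repeat an earlier (disp, mpg) pair;
# each of these is drawn exactly on top of another point.
xy <- mtcars[, c("disp", "mpg")]
n_hidden <- sum(duplicated(xy))

# Semitransparency: overlapping points build up to a darker colour.
# The alpha value of 0.3 is an arbitrary choice for this sketch.
faded <- adjustcolor("black", alpha.f = 0.3)
plot(xy$disp, xy$mpg, pch = 16, col = faded,
     xlab = "disp", ylab = "mpg")

# Jittering: add a small random value so coincident points separate.
set.seed(1)  # only so that the jitter is reproducible
plot(jitter(xy$disp), jitter(xy$mpg), pch = 16, col = faded,
     xlab = "disp (jittered)", ylab = "mpg (jittered)")
```

Jittering changes the plotted positions, so it trades a little accuracy for visibility; semitransparency keeps positions exact but works best when only a few points overlap.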

Visual Impairment

  • Another consideration is that the audience for a data visualisation may have some form of visual impairment.

  • Examples of measures that can be taken include:

    • Providing alternative text
      (plus screen-reader-friendly formats like R Markdown or HTML).

      See Liz Hare’s presentation to R-Ladies New York

    • Selecting colour-blind safe colours
      (we will return to this later).
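As a minimal sketch of the second measure, base R (version 4.0.0 or later) provides the Okabe-Ito palette, which was designed to remain distinguishable under common forms of colour-vision deficiency. The choice of mtcars and of the cyl grouping is just an assumption for illustration. For the first measure, R Markdown documents can usually supply alternative text through the knitr chunk option fig.alt.

```r
# Requires R >= 4.0.0 for palette.colors().
safe <- palette.colors(n = 4, palette = "Okabe-Ito")

# Three groups (4, 6, and 8 cylinders); skip the first colour (black)
# so that the groups stand apart from axes and text.
cyl <- factor(mtcars$cyl)
group_cols <- safe[-1]
cols <- group_cols[as.integer(cyl)]

plot(mtcars$wt, mtcars$mpg, pch = 16, col = cols,
     xlab = "wt", ylab = "mpg")
legend("topright", legend = levels(cyl), pch = 16,
       col = group_cols, title = "cyl")
```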

An effective data visualisation is one that we can see.

  • The viewer needs to be able to map back from the data symbols to the data values.

  • Avoid overplotting.

  • Think about accessibility.

The Eye

  • Cones detect colour and are packed densely at the centre of the retina (the fovea).

  • Rods detect light/dark, are spread elsewhere, less densely.

  • Foveal vision is very detailed (peripheral much less so).

  • We view an image through a series of fixations at specific locations with rapid movements (saccades) between.

  • We only get a detailed view at the fovea for each fixation.

  • The fovea only covers 1-2cm of view (at screen distance).

  • Does the image change when you focus on different corners of the image below (at screen distance)?
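The "1-2cm at screen distance" figure can be checked with a little trigonometry. The sketch below assumes that foveal vision spans roughly 1 to 2 degrees of visual angle and that the screen is about 60cm from the eye; both numbers are assumptions, not values from the slides, and visual_width() is a hypothetical helper.

```r
# Width (in cm) covered by a visual angle (in degrees)
# at a given viewing distance (in cm).
visual_width <- function(angle_deg, distance_cm) {
  2 * distance_cm * tan((angle_deg / 2) * pi / 180)
}

# Assumed foveal span of 1 to 2 degrees, assumed distance of 60cm.
fovea_cm <- visual_width(c(1, 2), distance_cm = 60)
round(fovea_cm, 2)  # roughly 1cm and 2cm
```

A typical plot is many times wider than this, which is why it takes several fixations to read one.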

An effective data visualisation should not contain too much detail or too many separate components.

  • We cannot perceive all of the detail in an image all at once.

A very simple model of the visual system

The Visual System

  • Basic features within an image (colours, borders, orientations) are identified very rapidly, in parallel, without conscious effort, and stored in iconic memory.

  • Iconic memory is very transient; it essentially reflects where we are currently looking.

The Visual System

  • Visual information is held for longer in (short-term, visual) working memory. Basic features are combined and identified as shapes and patterns.

  • The capacity of working memory is severely limited; only between 4 and 7 “items” can be held at once.

The Visual System

  • Prior experience and knowledge are retrieved from long-term memory and merged with the visual information to identify higher-level shapes and meaningful objects. Some objects may be stored as long-term memories.

  • Long-term memory is (remarkably) persistent and limitless.

Iconic Memory

  • A data visualisation can be effective if it makes use of simple features that are identified early in visual processing.

An effective data visualisation should ensure that the important elements are visible to iconic memory.

  • Iconic memory feeds later stages of visual processing.

  • Important elements should employ basic features.

Working Memory

  • For 2015, which ethnic group has the highest count?

An effective data visualisation will not overload working memory.

  • There must not be too many features to remember at once.
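One possible way to reduce the number of items the viewer has to hold in mind is to keep only the largest groups and lump the rest together. The sketch below uses made-up counts (not the data from the slides) and an arbitrary cut-off of four groups.

```r
# Hypothetical counts for six groups (invented for this sketch).
counts <- c(A = 50, B = 30, C = 10, D = 5, E = 3, F = 2)

# Keep the largest groups and lump the remainder into "Other",
# so that the viewer has at most keep + 1 items to track.
keep <- 4
sorted <- sort(counts, decreasing = TRUE)
lumped <- c(sorted[seq_len(keep)],
            Other = sum(sorted[-seq_len(keep)]))

barplot(lumped, ylab = "count")
```

Lumping hides detail, so it only makes sense when the question being asked is about the larger groups.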

Long-Term Memory

  • We can recognise familiar shapes very easily, without requiring any labels.

Long-Term Memory

  • Maps are an example of familiar shapes that can convey information without labels.

    head(offendersByDistrict)
              district     youth     minor     court
    1    Auckland City 0.1992738 0.4175744 0.6432825
    2    Bay of Plenty 0.1954578 0.4119823 0.6669407
    3       Canterbury 0.1643473 0.4109379 0.6946841
    4          Central 0.2057527 0.4226398 0.5997448
    5 Counties/Manukau 0.1988442 0.3835616 0.5984589
    6          Eastern 0.1745014 0.4277066 0.6696937

Long-Term Memory

  • Unfamiliar visual representations may require training (formation of new knowledge).

An effective data visualisation will only make use of existing knowledge.

  • Familiar shapes and structures will be processed more rapidly; otherwise new associations must be created.

  • Watch out for regional and cultural biases!

A Simple Model of the Visual System

Attention

Attention

  • Where we look first, and where we look most, is not random.

  • Bottom-up: some visual differences grab attention very rapidly and without conscious effort.

  • Top-down: goals and tasks direct attention.

    This affects both what we look for and what we see
    (what gets filtered out).

Pre-Attentive Pop Out

  • Some visual differences "pop out" before we consciously attend to them. This is particularly true of differences in colour, but size and shape are also effective.

Pre-Attentive Pop Out

  • Colour is effective for highlighting because it grabs attention and is identified very early in the visual system.

An effective data visualisation should ensure that the important elements are attention-grabbing.

  • Important elements should be a specific colour and/or size and/or shape.
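A minimal sketch of this guideline in base R: draw everything in a muted grey and give only the important element a distinct colour and size. The choice of mtcars, and of the car with the highest mpg as the "important" element, is an assumption for illustration.

```r
# Treat the car with the highest mpg as the element to emphasise.
top <- mtcars$mpg == max(mtcars$mpg)

# Muted grey for context, a saturated colour and larger size for emphasis.
cols <- ifelse(top, "red", "grey70")
sizes <- ifelse(top, 1.5, 1)

plot(mtcars$wt, mtcars$mpg, pch = 16, col = cols, cex = sizes,
     xlab = "wt", ylab = "mpg")
text(mtcars$wt[top], mtcars$mpg[top], rownames(mtcars)[top], pos = 4)
```

The effect relies on the highlight being rare; if many points are coloured differently, none of them pops out.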

Visual illusions

Visual Illusions

  • The visual system is designed for the natural environment.

    • We can cope with a wide range of ambient luminance (inside a dark cave vs. bright sunshine) because our perception of light/dark is relative rather than absolute.

    • We naturally interpret a scene as three dimensional.

  • Data visualisations are artificial images that can confuse and betray our visual system.

Visual Illusions

  • Visual illusions demonstrate that, even if we map identical values to identical visual features, what we perceive may be misleading.
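A simple demonstration of this point, sketched in base R: two squares are filled with exactly the same grey, but one sits on a dark background and the other on a light background. The specific grey levels are arbitrary choices.

```r
# One fill colour for both squares; two different backgrounds.
target <- "grey50"
backgrounds <- c("grey10", "grey90")

plot.new()
plot.window(xlim = c(0, 2), ylim = c(0, 1), asp = 1)

# Left half dark, right half light.
rect(0:1, 0, 1:2, 1, col = backgrounds, border = NA)

# Identical squares in the middle of each half.
rect(c(0.35, 1.35), 0.35, c(0.65, 1.65), 0.65,
     col = target, border = NA)
```

Most viewers see the square on the dark background as lighter than the square on the light background, even though the data-to-colour mapping is identical.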

The Contrast Effect

  • Is the crime rate for 13-year-olds in 2018 the same as the crime rate for 12-year-olds in 2011?

Moire Vibrations

  • Patterns consisting of fine grids or parallel lines can appear to move or vibrate.

  • This means that we should be very cautious about using pattern fills.

Accidental 3D

An effective data visualisation does not contain visual illusions.

  • Some features of the visual system that help us to navigate the real world can hinder our ability to correctly perceive artificial images like data visualisations.

  • An awareness of visual illusions can help us to avoid producing misleading data visualisations.

Perception is Relative

  • Weber’s Law suggests that the smallest difference we can perceive is proportional to the intensity of the stimulus; the more intense the stimuli, the larger a difference must be before we can detect it.

  • For each pair of bars, which bar is bigger?

Reference points, such as grid lines, can help with the perception of differences in positions and lengths.

  • Reference points reduce the absolute size of lengths that are to be compared.
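The arithmetic behind this can be sketched as follows. Weber's Law is often written as ΔI / I = k: what matters is the difference relative to the size of the things being compared. The bar heights and the grid spacing below are made-up values for illustration.

```r
# Two hypothetical bars that differ by 1 unit.
bars <- c(a = 100, b = 101)

# Compared over their full length, the difference is 1 in 100.
rel_full <- unname(diff(bars)) / min(bars)

# Compared against a nearby grid line at 95, the lengths to compare
# are only 5 and 6, so the same difference is 1 in 5.
reference <- 95
above <- bars - reference
rel_ref <- unname(diff(above)) / min(above)

c(full = rel_full, with_reference = rel_ref)

# Draw the bars with grid lines every 5 units.
barplot(bars, ylim = c(0, 110))
abline(h = seq(0, 100, by = 5), col = "grey80")
```

The absolute difference between the bars is unchanged; the grid line just makes it large relative to what the eye is comparing.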

Summary

Summary

  • An effective data visualisation:

    • Makes the data visible to the viewer and takes account of possible visual impairments.
    • Maps data values to basic features.
    • Has a low cognitive load.
    • Does not require training.
    • Directs attention to important elements.
    • Avoids unwanted visual illusions.
  • If in doubt, keep it simple and familiar.

Exercises

Exercise

  • Is the offending rate higher or lower for older children?

Exercise

  • Identify two examples of contrast effects in the data visualisation below.

    How would you fix the problem?

  • What is happening to the difference between the offending rate for 14-year-olds and the offending rate for 16-year-olds over time?

  • In what way(s) is this an (in)effective data visualisation?

    New Zealand Listener January 13-19 2024.