A data visualisation consists of data symbols, guides, and labels.
A data visualisation can help to answer questions.
We need to choose a mapping from data values to data symbols.
We want a data visualisation to take advantage of the strengths of the visual system (and avoid the weaknesses of the visual system).
In this section we will describe a very simple model of human visual perception and use that to identify some strengths and weaknesses.
This will lead to some basic guidelines for creating an effective data visualisation.
The eye.
A very simple model of the visual system.
Attention.
Visual illusions.
The eye
Light enters the eye through the pupil and is focused by the lens onto the retina at the back of the eye.
Signals from around 100 million retinal nerve cells are combined into the optic nerve, a bundle of about 1 million nerve fibres that connects directly to the brain.
An effective data visualisation is one that we can see.
What can we not see in the data visualisation below?
              mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4
Possible solutions include semitransparency and jittering (adding a small random value).
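As a rough illustration of jittering, here is a minimal Python sketch. It assumes a scatterplot of disp against mpg, where the two Mazda rows above would land on exactly the same spot. The function name `jitter`, the jitter amounts, and the seeds are all hypothetical choices for illustration, not part of the original material.

```python
import random

def jitter(values, amount, seed=0):
    """Return a copy of values with uniform noise in [-amount, amount] added."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [v + rng.uniform(-amount, amount) for v in values]

# Both Mazda rows share disp = 160 and mpg = 21, so their points coincide.
disp = [160.0, 160.0]
mpg = [21.0, 21.0]

# Jitter amounts are assumptions: small relative to the scale of each variable.
disp_jittered = jitter(disp, amount=2.0, seed=1)
mpg_jittered = jitter(mpg, amount=0.2, seed=2)

points = list(zip(disp_jittered, mpg_jittered))
print(points)  # two slightly different locations instead of one
```

The jitter should stay small enough that the viewer can still map each symbol back to (approximately) the right data value.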
Another consideration is that the audience for a data visualisation may have some form of visual impairment.
Examples of measures that can be taken include:
Providing alternative text (plus text-reader-friendly formats like R Markdown or HTML). See Liz Hare’s presentation to R-Ladies New York.
Selecting colour-blind safe colours (we will return to this later).
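One possible way to select colour-blind safe colours is to draw from a fixed palette designed for that purpose. The sketch below uses the Okabe-Ito palette, which is a choice made here for illustration rather than one named in the original material. The helper `assign_colours` and the group names are hypothetical.

```python
# Okabe-Ito palette: eight colours intended to remain distinguishable
# under the common forms of colour vision deficiency.
OKABE_ITO = {
    "black": "#000000",
    "orange": "#E69F00",
    "sky blue": "#56B4E9",
    "bluish green": "#009E73",
    "yellow": "#F0E442",
    "blue": "#0072B2",
    "vermillion": "#D55E00",
    "reddish purple": "#CC79A7",
}

def assign_colours(categories):
    """Map each distinct category to one palette colour, in order of appearance."""
    palette = list(OKABE_ITO.values())
    unique = list(dict.fromkeys(categories))  # de-duplicate, keep order
    if len(unique) > len(palette):
        raise ValueError("more categories than colours in the palette")
    return dict(zip(unique, palette))

# Hypothetical group labels, purely for illustration.
colours = assign_colours(["group A", "group B", "group A", "group C"])
print(colours)
```

Refusing to recycle colours when there are too many categories is deliberate: repeating a colour would make two groups indistinguishable for every viewer.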
An effective data visualisation is one that we can see.
The viewer needs to be able to map back from the data symbols to the data values.
Avoid overplotting.
Think about accessibility.
Cones detect colour and are packed densely at the centre of the retina (the fovea).
Rods detect light and dark and are spread less densely across the rest of the retina.
Foveal vision is very detailed (peripheral much less so).
We view an image through a series of fixations at specific locations with rapid movements (saccades) between.
We only get a detailed view at the fovea for each fixation.
The fovea covers only about 1-2 cm of the image (at a typical screen viewing distance).
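The 1-2 cm figure can be checked with a little trigonometry. The sketch below assumes that foveal vision spans roughly 1 to 2 degrees of visual angle and that the screen is about 50 cm from the eye. Both numbers are assumptions made here, not values from the original material.

```python
import math

def extent_on_screen(angle_deg, distance_cm):
    """Width (cm) on a flat screen covered by a visual angle centred on the line of sight."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

VIEWING_DISTANCE_CM = 50.0  # assumed screen distance

for angle in (1.0, 2.0):    # assumed range for the foveal visual angle
    width = extent_on_screen(angle, VIEWING_DISTANCE_CM)
    print(f"{angle:.0f} degree(s) at {VIEWING_DISTANCE_CM:.0f} cm covers about {width:.2f} cm")
```

Under these assumptions the detailed region is roughly 0.9 to 1.7 cm wide, which is in line with the figure quoted above.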
An effective data visualisation should not contain too much detail or too many separate components.
A very simple model of the visual system
Basic features within an image (colours, borders, orientations) are identified very rapidly, in parallel, without conscious effort, and stored in iconic memory.
Iconic memory is very transient; it essentially reflects where we are currently looking.
Visual information is held for longer in (short-term, visual) working memory. Basic features are combined and identified as shapes and patterns.
The capacity of working memory is severely limited; only between 4 and 7 “items” can be held at once.
Prior experience and knowledge are brought in from long-term memory to identify higher-level shapes and meaningful objects. Some objects may be stored as long-term memories.
Long-term memory is (remarkably) persistent and effectively limitless.
An effective data visualisation should ensure that the important elements are visible to iconic memory.
Iconic memory feeds later stages of visual processing.
Important elements should employ basic features.
An effective data visualisation will not overload working memory.
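One way to respect the limit on working memory is to cap the number of distinct groups that the viewer has to keep track of. The sketch below lumps the least frequent categories into a single "Other" group. The function `lump_categories`, the cap of 5 (chosen from within the 4 to 7 range mentioned above), and the example labels are all hypothetical.

```python
from collections import Counter

def lump_categories(labels, max_items=5, other="Other"):
    """Keep the (max_items - 1) most frequent labels; relabel the rest as `other`."""
    counts = Counter(labels)
    if len(counts) <= max_items:
        return list(labels)  # already few enough groups
    keep = {label for label, _ in counts.most_common(max_items - 1)}
    return [label if label in keep else other for label in labels]

# Hypothetical data: seven distinct groups of decreasing size.
labels = ["a"] * 5 + ["b"] * 4 + ["c"] * 3 + ["d"] * 2 + ["e", "f", "g"]
lumped = lump_categories(labels, max_items=5)

print(sorted(set(lumped)))
```

The trade-off is a loss of detail for the smallest groups, so this only makes sense when those groups are not the ones the visualisation is meant to answer questions about.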
We can recognise familiar shapes very easily, without requiring any labels.
          district     youth     minor     court
1    Auckland City 0.1992738 0.4175744 0.6432825
2    Bay of Plenty 0.1954578 0.4119823 0.6669407
3       Canterbury 0.1643473 0.4109379 0.6946841
4          Central 0.2057527 0.4226398 0.5997448
5 Counties/Manukau 0.1988442 0.3835616 0.5984589
6          Eastern 0.1745014 0.4277066 0.6696937
Unfamiliar visual representations may require training (formation of new knowledge).
An effective data visualisation will only make use of existing knowledge.
Familiar shapes and structures will be processed more rapidly; otherwise new associations must be created.
Watch out for regional and cultural biases!
Attention
Where we look first, and where we look most, is not random.
Bottom-up: some visual differences grab attention very rapidly and without conscious effort.
Top-down: goals and tasks direct attention.
This affects both what we look for and what we see (what gets filtered out).
An effective data visualisation should ensure that the important elements are attention-grabbing.
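A simple way to make one element grab attention from the bottom up is to give it a strong colour and mute everything else. The sketch below uses the district names from the table shown earlier. The choice of "Central" as the important element, the two colour values, and the helper `highlight_colours` are assumptions for illustration only.

```python
HIGHLIGHT = "#D55E00"  # saturated colour for the element of interest
MUTED = "#BBBBBB"      # light grey for everything else

def highlight_colours(names, focus):
    """Return one colour per name: HIGHLIGHT for `focus`, MUTED for the rest."""
    if focus not in names:
        raise ValueError(f"{focus!r} is not one of the names")
    return [HIGHLIGHT if name == focus else MUTED for name in names]

districts = [
    "Auckland City", "Bay of Plenty", "Canterbury",
    "Central", "Counties/Manukau", "Eastern",
]

# Hypothetical choice: suppose the question being asked is about Central.
colours = highlight_colours(districts, focus="Central")
print(list(zip(districts, colours)))
```

The resulting list could be passed as the fill colours of a bar plot, one colour per bar.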
Visual illusions
The visual system is designed for the natural environment.
We can cope with a wide range of ambient luminance (inside a dark cave vs. bright sunshine) because our perception of light/dark is relative rather than absolute.
We naturally interpret a scene as three dimensional.
Data visualisations are artificial images that can confuse and betray our visual system.
Visual illusions demonstrate that, even if we map identical values to identical visual features, what we perceive may be misleading.
Patterns consisting of fine grids or parallel lines can appear to move or vibrate.
This means that we should be very cautious about using pattern fills.
An effective data visualisation does not contain visual illusions.
Some features of the visual system that help us to navigate the real world can hinder our ability to correctly perceive artificial images like data visualisations.
An awareness of visual illusions can help us to avoid producing misleading data visualisations.
Weber’s Law suggests that the smallest difference we can perceive is proportional to the intensity of the stimulus; the more intense the stimuli, the larger the difference between them must be before we can detect it.
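Weber's Law can be sketched numerically: the just-noticeable difference is a constant fraction of the stimulus intensity. In the sketch below, the fraction of 0.1 is a placeholder rather than a measured value, and the bar lengths and function names are hypothetical.

```python
def just_noticeable_difference(intensity, weber_fraction=0.1):
    """Smallest detectable change for a stimulus of the given intensity."""
    return weber_fraction * intensity

def is_detectable(a, b, weber_fraction=0.1):
    """True if the difference between a and b is at least the JND of the smaller one."""
    return abs(a - b) >= just_noticeable_difference(min(a, b), weber_fraction)

# Two pairs of bars with the same absolute difference (2 units).
short_pair = (10.0, 12.0)   # difference is 20% of the shorter bar
long_pair = (100.0, 102.0)  # difference is 2% of the shorter bar

print(is_detectable(*short_pair))  # True
print(is_detectable(*long_pair))   # False
```

The same 2-unit difference is easy to see between short bars and hard to see between long ones, which is why comparing long bars that differ only slightly is difficult without a reference point.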
For each pair of bars, which bar is bigger?
Reference points, such as grid lines, can help with the perception of differences in positions and lengths.
Summary
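An effective data visualisation:
Is one that we can see.
Does not contain too much detail or too many separate components.
Makes the important elements visible to iconic memory.
Does not overload working memory.
Only makes use of existing knowledge.
Makes the important elements attention-grabbing.
Does not contain visual illusions.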
If in doubt, keep it simple and familiar.
Exercises
Is the offending rate higher or lower for older children?
Identify two examples of contrast effects in the data visualisation below.
How would you fix the problem?
What is happening to the difference between the offending rate for 14-year-olds and the offending rate for 16-year-olds over time?
In what way(s) is this an (in)effective data visualisation?
In what way(s) is this an (in)effective data visualisation?
New Zealand Listener January 13-19 2024.