A data visualisation consists of data symbols, guides, and labels.
We produce data symbols by mapping data values to the visual features of a shape.
Which visual feature we choose depends on:
We have so far looked at the components that make up a data visualisation - data symbols, guides, and labels - in isolation from one another.
In this topic, we consider the overall arrangement and layout of a data visualisation.
Design ideas are aimed at improving aesthetic appeal.
Design ideas also help to produce more consistent, coherent, and comprehensible data visualisations.
CRAP design
{ggplot2} themes
Fancier Text
Clutter
CRAP Design
Robin Williams popularised four standard design principles: contrast, repetition, alignment, and proximity.
The principle of contrast states that different elements of an image should either be obviously similar or obviously different.
Contrast should help the viewer to focus on the important elements of the data visualisation.
Contrast is related to pre-attentive pop out.
An example of contrast is the use of colourful lines against the dull grey background.
This helps the viewer to focus on the main data elements of the plot.
The principle of repetition states that related elements of an image should have similar visual features.
Repetition provides a structure and organisation that helps the viewer to navigate and comprehend an image.
Repetition relates to Gestalt grouping.
An example of repetition is the use of consistent colours between the lines in the main plot and the lines in the legend.
The principle of alignment states that every element of an image should be aligned in some way with another element of the image.
Alignment helps to make an image look cleaner and more orderly, which again helps with navigation and reduces the cognitive load.
Examples of alignment are the left-justification of the plot title relative to the grey background and the legend title relative to the legend background.
The principle of proximity states that related elements of an image should be placed close together.
Proximity relates to Gestalt grouping.
Proximity also helps with structure and organisation, particularly to help the viewer to identify elements that belong together as a group.
An example of proximity is the placement of the elements of the plot legend.
The elements of the legend are close to each other, with an empty margin between them and the main plot.
We can apply each of the CRAP principles of graphic design to both critically assess an existing data visualisation and to suggest ideas for improvement.
We can increase the contrast between elements:
There is now more emphasis on the title and the data.
We can align multiple elements with both the left and right ends of the data lines.
There is now a simpler and cleaner structure.
We can repeat the colours in lines and labels.
This creates clear groupings of these elements.
We can increase the proximity of the labels and the lines.
This removes the need for a separate legend, further simplifying the plot.
{ggplot2} themes
Producing a complete data visualisation involves making a lot of design decisions.
Part of the convenience of {ggplot2} comes from the fact that it makes a lot of decisions for us.
If we want to control the contrast, repetition, alignment, and proximity of visual elements, we need to take control of these decisions.
Themes provide a way to modify the default {ggplot2} decisions.
{ggplot2} makes a lot of decisions about scales, colours, and labels.
The theme() function allows individual details to be adjusted.
Functions like theme_bw() and theme_dark() change a whole host of defaults at once.
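As a minimal sketch of the difference between the two approaches, the code below applies a complete theme and then adjusts one individual setting with theme(). The toy data frame and its column names are invented for illustration and are not the crime data used elsewhere in this topic.

```r
library(ggplot2)

# Toy data, invented purely for illustration.
toy <- data.frame(
    year=rep(2010:2014, times=2),
    count=c(50, 45, 42, 38, 30, 20, 19, 17, 15, 12),
    group=rep(c("A", "B"), each=5)
)

p <- ggplot(toy) +
    geom_line(aes(year, count, colour=group))

# A complete theme changes many defaults at once.
p_bw <- p + theme_bw()

# theme() then adjusts one individual detail on top of the complete theme.
p_custom <- p_bw + theme(legend.position="bottom")
```

Adding theme() after the complete theme matters: a complete theme added later would replace the individual adjustment.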
The theme() function
The arguments to theme() are different theme settings, e.g., plot.title to control the style of the plot title.
The value supplied for each setting depends on the type of setting, e.g., plot.title controls a text element so we provide a value using element_text().
Functions like element_text() have arguments to control text settings, like family and size.
element_rect() provides settings for borders and backgrounds, like panel.background.
element_line() provides settings for lines, like panel.grid.
element_blank() is used to remove a plot element entirely.
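The pairing of setting types with element functions can be sketched as follows. This is only an illustration: the toy data and the particular colours, sizes, and font family are arbitrary choices, not recommendations.

```r
library(ggplot2)

# Toy data, invented purely for illustration.
toy <- data.frame(x=1:5, y=c(3, 5, 4, 6, 8))

p <- ggplot(toy) +
    geom_line(aes(x, y)) +
    labs(title="A Toy Plot") +
    theme(
        # A text setting takes element_text().
        plot.title=element_text(family="serif", size=16, face="bold"),
        # A rectangle setting takes element_rect().
        panel.background=element_rect(fill="grey95", colour=NA),
        # A line setting takes element_line().
        panel.grid=element_line(colour="white", linewidth=.5),
        # element_blank() removes the element entirely.
        axis.ticks=element_blank()
    )
```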
Some theme settings are more general and provide defaults for more specific settings, e.g., the axis.title, axis.title.x, and axis.title.x.bottom settings.
Some theme settings are neither text, nor lines, nor rectangles, e.g., legend.position is a character vector that specifies the location of the plot legend: "left", "right", "bottom", "top", or "none".
line | axis.ticks.theta | axis.line.x.top | legend.position.inside | plot.background |
rect | axis.ticks.r | axis.line.x.bottom | legend.direction | plot.title |
text | axis.minor.ticks.x.top | axis.line.y | legend.byrow | plot.title.position |
title | axis.minor.ticks.x.bottom | axis.line.y.left | legend.justification | plot.subtitle |
aspect.ratio | axis.minor.ticks.y.left | axis.line.y.right | legend.justification.top | plot.caption |
axis.title | axis.minor.ticks.y.right | axis.line.theta | legend.justification.bottom | plot.caption.position |
axis.title.x | axis.minor.ticks.theta | axis.line.r | legend.justification.left | plot.tag |
axis.title.x.top | axis.minor.ticks.r | legend.background | legend.justification.right | plot.tag.position |
axis.title.x.bottom | axis.ticks.length | legend.margin | legend.justification.inside | plot.tag.location |
axis.title.y | axis.ticks.length.x | legend.spacing | legend.location | plot.margin |
axis.title.y.left | axis.ticks.length.x.top | legend.spacing.x | legend.box | strip.background |
axis.title.y.right | axis.ticks.length.x.bottom | legend.spacing.y | legend.box.just | strip.background.x |
axis.text | axis.ticks.length.y | legend.key | legend.box.margin | strip.background.y |
axis.text.x | axis.ticks.length.y.left | legend.key.size | legend.box.background | strip.clip |
axis.text.x.top | axis.ticks.length.y.right | legend.key.height | legend.box.spacing | strip.placement |
axis.text.x.bottom | axis.ticks.length.theta | legend.key.width | panel.background | strip.text |
axis.text.y | axis.ticks.length.r | legend.key.spacing | panel.border | strip.text.x |
axis.text.y.left | axis.minor.ticks.length | legend.key.spacing.x | panel.spacing | strip.text.x.bottom |
axis.text.y.right | axis.minor.ticks.length.x | legend.key.spacing.y | panel.spacing.x | strip.text.x.top |
axis.text.theta | axis.minor.ticks.length.x.top | legend.frame | panel.spacing.y | strip.text.y |
axis.text.r | axis.minor.ticks.length.x.bottom | legend.ticks | panel.grid | strip.text.y.left |
axis.ticks | axis.minor.ticks.length.y | legend.ticks.length | panel.grid.major | strip.text.y.right |
axis.ticks.x | axis.minor.ticks.length.y.left | legend.axis.line | panel.grid.minor | strip.switch.pad.grid |
axis.ticks.x.top | axis.minor.ticks.length.y.right | legend.text | panel.grid.major.x | strip.switch.pad.wrap |
axis.ticks.x.bottom | axis.minor.ticks.length.theta | legend.text.position | panel.grid.major.y | complete |
axis.ticks.y | axis.minor.ticks.length.r | legend.title | panel.grid.minor.x | validate |
axis.ticks.y.left | axis.line | legend.title.position | panel.grid.minor.y | |
axis.ticks.y.right | axis.line.x | legend.position | panel.ontop |
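The way general settings provide defaults for more specific ones can be checked with calc_element(), which {ggplot2} exports for resolving a setting through its parents. The sketch below assumes the default theme_grey() as a starting point; the colour "red" is an arbitrary choice made only so that the inherited value is easy to spot.

```r
library(ggplot2)

# Set only the general axis.title setting on top of the default theme.
th <- theme_grey() + theme(axis.title=element_text(colour="red"))

# calc_element() resolves a setting through its chain of parents, so the
# more specific setting picks up the colour given to axis.title.
general <- calc_element("axis.title", th)
specific <- calc_element("axis.title.x.bottom", th)

# legend.position takes a plain character value rather than an
# element_*() object.
th_top <- th + theme(legend.position="top")
```

Inspecting general and specific should show the same colour, because axis.title.x.bottom inherits from axis.title.x, which in turn inherits from axis.title.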
The code for the modified line plot controls many theme settings.
library(ggplot2)

# crimeGender is assumed to be a data frame with columns
# year, yearDate, count, and gender.
ggplot(crimeGender) +
    geom_line(aes(yearDate, count, colour=gender), linewidth=1) +
    labs(title="Number of Incidents") +
    scale_x_date(expand=expansion(0)) +
    # Direct labels at the right-hand end of each line replace the legend
    geom_label(data=subset(crimeGender, year == 2021),
               aes(label=gender, x=yearDate, y=count, colour=gender),
               hjust=1, vjust=-.5, fontface="bold", label.size=NA,
               label.padding=unit(0, "mm"), fill=rgb(.1, .1, .1)) +
    # Year labels drawn as text, right-aligned with each year
    geom_text(data=subset(crimeGender, gender == "Male"),
              aes(label=year, x=yearDate, y=-1000),
              hjust=1, colour="grey40", size=3) +
    theme(plot.background=element_rect(colour=NA, fill=rgb(.1, .1, .1)),
          panel.background=element_rect(colour=NA,
                                        fill=rgb(.1, .1, .1)),
          panel.grid.major.y=element_line(colour="grey40",
                                          linewidth=.2),
          panel.grid.major.x=element_blank(),
          panel.grid.minor=element_blank(),
          legend.position="none",
          text=element_text(colour="grey"),
          plot.title=element_text(colour="white", size=16,
                                  face="bold"),
          axis.title=element_blank(),
          axis.text.y=element_text(colour="grey40", vjust=0),
          axis.text.x=element_blank(),
          axis.ticks=element_blank(),
          plot.margin=unit(c(.5, 1, .5, 1), "cm"))
Fancier Text
Standard text labels in R can only have a single font and style.
The {ggtext} package provides support for changing the style within a label.
Two things are required:
basic markup (markdown or HTML/CSS) in the text.
element_markdown() within theme().
library(ggplot2)
library(ggtext)

# cols is assumed to be a character vector of two colours
# (females first, males second) that is defined elsewhere.
ggplot(crimeGender) +
    geom_line(aes(yearDate, count, colour=gender)) +
    labs(title=paste0('<span style="color: ', cols[2],
                      '">**Males**</span>',
                      " Responsible for More Offences than ",
                      '<span style="color: ', cols[1],
                      '">**Females**</span>'),
         caption=paste0('Total number of offences by youth ',
                        'aged between 14 and 16 from 2010 ',
                        'to 2020 in New Zealand.<br>',
                        '**Source:** New Zealand Ministry ',
                        'of Justice *"Youth Justice Indicator ',
                        'Report"*, 2021.')) +
    theme(plot.title=element_markdown(),
          plot.caption=element_markdown(hjust=0),
          legend.position="none")
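For experimenting without the crime data, a self-contained variant might look like the sketch below. The toy data frame, the group names, and the two hex colours are all invented for illustration; only the combination of markup in the label and element_markdown() in theme() comes from the approach described above.

```r
library(ggplot2)
library(ggtext)

# Toy data and colours, invented purely for illustration.
toy <- data.frame(
    year=rep(2010:2014, times=2),
    count=c(50, 45, 42, 38, 30, 20, 19, 17, 15, 12),
    group=rep(c("A", "B"), each=5)
)
cols <- c(A="#1b9e77", B="#d95f02")

p <- ggplot(toy) +
    geom_line(aes(year, count, colour=group)) +
    scale_colour_manual(values=cols) +
    # Markup in the title: a coloured <span> wrapping markdown bold text.
    labs(title=paste0('<span style="color: ', cols["A"],
                      '">**Group A**</span>',
                      " Higher than ",
                      '<span style="color: ', cols["B"],
                      '">**Group B**</span>')) +
    # element_markdown() tells the theme to render the markup.
    theme(plot.title=element_markdown(),
          legend.position="none")
```

Because the title text repeats the line colours, it takes over the job of the legend, which is why the legend is switched off.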
Clutter
We have emphasised the importance of mapping data values to visual features that facilitate mapping back to the data values.
Visual elements that have no connection to the data create clutter.
Clutter may distract attention from the data symbols.
Clutter may add to the complexity of the data visualisation (the number of elements to fixate on).
Not all embellishments are necessarily evil.
An effective data visualisation will have minimal clutter.
Tufte calls clutter chart junk and suggests maximising the data-ink ratio.
Many other data visualisation guidelines advocate minimising clutter.
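In {ggplot2}, one way to reduce clutter is to remove non-data elements through theme settings. The sketch below is only one possible choice: which elements count as clutter is a judgement call (major grid lines, for example, may still be useful guides), and the toy data is invented for illustration.

```r
library(ggplot2)

# Toy data, invented purely for illustration.
toy <- data.frame(x=1:5, y=c(3, 5, 4, 6, 8))

p <- ggplot(toy) +
    geom_line(aes(x, y))

# Start from a sparse complete theme, then drop elements that
# carry no data: minor grid lines and axis tick marks.
p_lean <- p +
    theme_minimal() +
    theme(panel.grid.minor=element_blank(),
          axis.ticks=element_blank())
```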
There are some dissenting voices.
There is some evidence that embellishments may improve engagement and recall.
There is also some evidence that making the viewer work harder may improve accuracy and retention.
Summary
Simple applications of contrast, repetition, alignment, and proximity can have a significant impact on the overall effectiveness of a data visualisation.
{ggplot2} themes allow control over many of the details of a data visualisation.
The {ggtext} package allows finer control over text labels.
Less clutter is generally better.
Exercises
Can you identify any problems in the design of this visualisation?
Can you suggest any ways to improve it?
Figure 1 from “Entropy Ordered Shapes as Bivariate Glyphs”, Halliman et al (2023).
Can you identify any weaknesses in the data visualisation below in terms of contrast, repetition, alignment, and proximity?
Can you create this modified version of the data visualisation?
HINT: These are mostly just theme() settings.
Can you see anything wrong with this data visualisation?
New Zealand Listener December 9-15 2023.