A data visualisation consists of data symbols, guides, and labels.
A data visualisation can help to answer questions.
We need to choose a mapping from data values to data symbols.
The combination of geom and aesthetic that we choose maps data values to a visual feature.
Visual features are the properties of data symbols that our visual system identifies very rapidly, in parallel, and without conscious effort.
Our task is to identify a good mapping from data values to visual features.
One idea is to select geoms and aesthetics so that data values are mapped to visual features that have the same type as the data.
For visualising qualitative data, we need to identify which visual features are qualitative visual features.
Qualitative features
Ordinal features
Qualitative scales
Colour
Feature mismatch
Gestalt Principles
Qualitative Features
Colour and shape are examples of visual features that can represent qualitative values.
Position is also very effective for qualitative data.
By contrast, length, area, and angle are not appropriate for representing qualitative values.
The RWCperGame data frame contains measures of performance at the 2023 Rugby World Cup for different countries, plus the hemisphere that each country is from.
The hemisphere variable is an example of qualitative data.
# A tibble: 6 × 11
  country   hemisphere yellowcards redcards cleanbreaks
  <chr>     <fct>            <dbl>    <dbl>       <dbl>
1 Namibia   South             1        0.5         2.5
2 Romania   North             1.25     0           2.75
3 Chile     South             1.25     0           3.75
4 Samoa     South             1.25     0.25        3.75
5 Australia South             0.5      0           5.25
6 Georgia   North             0.5      0           5.25
# ℹ 6 more variables: tackles <dbl>, points <dbl>,
#   conversions <dbl>, offloads <dbl>, tries <dbl>,
#   runs <dbl>
We map qualitative data to a visual feature in order to be able to identify values that belong to different groups or categories.
If we map hemisphere to a qualitative visual feature, we can answer questions like:
Do teams from the Northern hemisphere make more clean breaks than teams from the South (on average)?
The crucial visual task is to distinguish data for Northern teams from data for Southern teams.
We can visualise qualitative data using the shape of points.
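As a minimal sketch of this mapping in {ggplot2}: the full RWCperGame data frame is not reproduced here, so the code builds a small stand-in (rwc, a hypothetical name) from the six rows printed above. The choice of country for the y-axis is also just an assumption for illustration.

```r
library(ggplot2)

# Stand-in for RWCperGame: only the six rows printed above (an assumption,
# because the full data frame is not available here).
rwc <- data.frame(
  country = c("Namibia", "Romania", "Chile", "Samoa", "Australia", "Georgia"),
  hemisphere = factor(c("South", "North", "South", "South", "South", "North")),
  cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25)
)

# Map hemisphere (qualitative) to the shape of the points.
shapePlot <- ggplot(rwc) +
  geom_point(aes(x = cleanbreaks, y = country, shape = hemisphere), size = 3)
shapePlot
```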
Do teams from the Northern hemisphere make more clean breaks than teams from the South (on average)?
geom_point() and the shape aesthetic map data values to shape.
We can visualise qualitative data using the position of points.
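A position mapping might be sketched as follows, again using a hypothetical stand-in (rwc) built from the six printed rows of RWCperGame. Here hemisphere is mapped to the y position, so each group sits on its own horizontal line.

```r
library(ggplot2)

# Stand-in for RWCperGame: only the six printed rows (an assumption).
rwc <- data.frame(
  country = c("Namibia", "Romania", "Chile", "Samoa", "Australia", "Georgia"),
  hemisphere = factor(c("South", "North", "South", "South", "South", "North")),
  cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25)
)

# Map hemisphere (qualitative) to vertical position.
positionPlot <- ggplot(rwc) +
  geom_point(aes(x = cleanbreaks, y = hemisphere))
positionPlot
```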
Do teams from the Northern hemisphere make more clean breaks than teams from the South (on average)?
The x and y aesthetics map qualitative data values to position for geom_point() and geom_col().
We can visualise qualitative data using the colour of points.
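A colour mapping could look like the sketch below. As before, rwc is a hypothetical stand-in built from the six printed rows of RWCperGame, and the colours are whatever {ggplot2} picks by default.

```r
library(ggplot2)

# Stand-in for RWCperGame: only the six printed rows (an assumption).
rwc <- data.frame(
  country = c("Namibia", "Romania", "Chile", "Samoa", "Australia", "Georgia"),
  hemisphere = factor(c("South", "North", "South", "South", "South", "North")),
  cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25)
)

# Map hemisphere (qualitative) to the colour of the points.
colourPlot <- ggplot(rwc) +
  geom_point(aes(x = cleanbreaks, y = country, colour = hemisphere), size = 3)
colourPlot
```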
Do teams from the Northern hemisphere make more clean breaks than teams from the South?
The colour aesthetic maps data values to colour for geom_point(), geom_col(), and geom_line().
For filled geoms such as geom_col(), the colour aesthetic maps data values to the border colour and the fill aesthetic maps data values to the fill colour.
Ordinal Features
The crimeLevelTotal data frame contains the total number of offences broken down by level of crime.
Are less severe crimes more common than more severe crimes?
# A tibble: 5 × 3
  level       total   prop
  <ord>       <int>  <dbl>
1 Low         24242 0.283
2 Low-Medium  21513 0.251
3 Medium      13823 0.161
4 Medium-High 18315 0.214
5 High         7793 0.0909
We can visualise ordinal data using the colour of bars.
Are less severe crimes more common than more severe crimes?
We perceive darker colours as “greater than” lighter colours.
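One way to sketch this is to rebuild crimeLevelTotal from the rows printed above, with level as an ordered factor, and map it to the fill of the bars. The "Blues" palette is our own choice (not necessarily the one used for the original figure); it runs from light to dark, so more severe levels get darker bars.

```r
library(ggplot2)

# Rebuilt from the printed tibble; level is an ordered factor.
severity <- c("Low", "Low-Medium", "Medium", "Medium-High", "High")
crimeLevelTotal <- data.frame(
  level = factor(severity, levels = severity, ordered = TRUE),
  total = c(24242L, 21513L, 13823L, 18315L, 7793L)
)

# Map level to both position (x) and fill colour; the sequential palette
# assigns lighter blues to lower levels and darker blues to higher levels.
ordinalPlot <- ggplot(crimeLevelTotal) +
  geom_col(aes(x = level, y = total, fill = level)) +
  scale_fill_brewer(palette = "Blues")
ordinalPlot
```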
Qualitative Scales
The mapping from data values to a visual feature also depends on the scale of the mapping.
How are data values transformed to values on the visual feature?
The scale should be chosen so that differences in the data are visible in the data visualisation.
We can control the set of shapes that data values are mapped to using scale_shape_manual().
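For example, the sketch below picks an open circle (shape 1) for North and a filled circle (shape 16) for South. The data frame rwc is a hypothetical stand-in built from the six printed rows of RWCperGame, and the particular shapes are only an illustration.

```r
library(ggplot2)

# Stand-in for RWCperGame: only the six printed rows (an assumption).
rwc <- data.frame(
  country = c("Namibia", "Romania", "Chile", "Samoa", "Australia", "Georgia"),
  hemisphere = factor(c("South", "North", "South", "South", "South", "North")),
  cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25)
)

# Choose the shape for each data value explicitly (named by factor level).
manualShapePlot <- ggplot(rwc) +
  geom_point(aes(x = cleanbreaks, y = country, shape = hemisphere), size = 3) +
  scale_shape_manual(values = c(North = 1, South = 16))
manualShapePlot
```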
Similarly, scale_y_discrete() controls how qualitative data values are mapped to positions on the y-axis.
Colour
scale_colour_manual() and scale_fill_manual() allow us to select the colours that data values map to.
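A minimal sketch, assuming a stand-in data frame (rwc, a hypothetical name) built from the six printed rows of RWCperGame; the two colour names are arbitrary choices for illustration.

```r
library(ggplot2)

# Stand-in for RWCperGame: only the six printed rows (an assumption).
rwc <- data.frame(
  country = c("Namibia", "Romania", "Chile", "Samoa", "Australia", "Georgia"),
  hemisphere = factor(c("South", "North", "South", "South", "South", "North")),
  cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25)
)

# Choose the colour for each data value explicitly (named by factor level).
manualColourPlot <- ggplot(rwc) +
  geom_point(aes(x = cleanbreaks, y = country, colour = hemisphere), size = 3) +
  scale_colour_manual(values = c(North = "darkorange", South = "steelblue"))
manualColourPlot
```

scale_fill_manual() is used in the same way when the fill aesthetic is mapped, for example with geom_col().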
However, there are a LOT of colours to choose from!
We have been talking about colour as if it is a single visual feature, but it can be split into three distinct features: hue, chroma (colourfulness), and luminance (light/dark).
Varying hue creates a qualitative colour palette.
Varying chroma or luminance (or both) creates an ordinal (or sequential) colour palette.
A diverging colour palette combines two sequential palettes with a neutral centre.
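The three kinds of palette can be sketched with the hcl() function from base R, which builds colours directly from hue, chroma, and luminance. The specific numbers below are arbitrary illustrations, not recommended palettes.

```r
# Qualitative: vary hue, hold chroma and luminance fixed.
qualitative <- hcl(h = c(0, 90, 180, 270), c = 50, l = 70)

# Sequential: hold hue fixed, vary luminance from light to dark.
sequential <- hcl(h = 260, c = 50, l = seq(90, 30, length.out = 4))

# Diverging: two sequential ramps (blue and red) joined by a neutral,
# zero-chroma centre.
diverging <- c(
  hcl(h = 260, c = c(80, 40), l = c(40, 70)),
  hcl(h = 0, c = 0, l = 95),
  hcl(h = 10, c = c(40, 80), l = c(70, 40))
)

# Each palette is a character vector of hex colour codes.
print(qualitative)
print(sequential)
print(diverging)
```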
Selecting a colour palette is a complex task, with competing constraints.
On one hand there is a desire for visual balance in order to avoid emphasising one category over another.
But that does not translate well to greyscale for printing.
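To see why, the rough sketch below converts two small palettes to grey levels. The 0.299/0.587/0.114 weights are a common approximation to a greyscale conversion (an assumption; a real printer may behave differently), and the palettes themselves are arbitrary examples.

```r
# Balanced qualitative palette: hue varies, chroma and luminance are fixed.
balanced <- hcl(h = c(0, 120, 240), c = 50, l = 70)

# Sequential palette: hue is fixed, luminance varies.
sequential <- hcl(h = 260, c = 50, l = c(90, 60, 30))

# Approximate grey level (0 to 255) as a weighted sum of the red, green,
# and blue channels of each colour.
greyLevel <- function(cols) {
  channels <- col2rgb(cols)  # 3 x n matrix with rows red, green, blue
  round(colSums(channels * c(0.299, 0.587, 0.114)))
}

greyLevel(balanced)    # similar greys: categories are hard to tell apart
greyLevel(sequential)  # well-separated greys: categories remain distinct
```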
There is also a need to account for colour-vision deficiency.
This shows what the previous colour palette might look like to a colour blind viewer.
There are also situations where a deliberate emphasis is desired in order to highlight one particular category.
There are a number of predefined colour scales in {ggplot2}.
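Two of these predefined scales are sketched below: a ColorBrewer qualitative palette via scale_colour_brewer() and a viridis palette via scale_colour_viridis_d(). The data frame rwc is a hypothetical stand-in built from the six printed rows of RWCperGame, and "Dark2" is just one possible palette name.

```r
library(ggplot2)

# Stand-in for RWCperGame: only the six printed rows (an assumption).
rwc <- data.frame(
  country = c("Namibia", "Romania", "Chile", "Samoa", "Australia", "Georgia"),
  hemisphere = factor(c("South", "North", "South", "South", "South", "North")),
  cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25)
)

basePlot <- ggplot(rwc) +
  geom_point(aes(x = cleanbreaks, y = country, colour = hemisphere), size = 3)

# A qualitative ColorBrewer palette.
brewerPlot <- basePlot + scale_colour_brewer(palette = "Dark2")

# A discrete viridis palette.
viridisPlot <- basePlot + scale_colour_viridis_d()

brewerPlot
viridisPlot
```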
The R Journal article "Coloring in R’s Blind Spot" also offers some advice.
Feature Mismatch
If the visual feature has a “higher” type than the data, we risk adding superfluous or incorrect information.
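A hypothetical example of such a mismatch is mapping hemisphere, which is qualitative, to the size of points, which reads as a quantity. In the sketch below (using a stand-in data frame, rwc, built from the six printed rows of RWCperGame), one hemisphere ends up looking "bigger" than the other, which is information that is not in the data. {ggplot2} itself warns that using size for a discrete variable is not advised.

```r
library(ggplot2)

# Stand-in for RWCperGame: only the six printed rows (an assumption).
rwc <- data.frame(
  country = c("Namibia", "Romania", "Chile", "Samoa", "Australia", "Georgia"),
  hemisphere = factor(c("South", "North", "South", "South", "South", "North")),
  cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25)
)

# Mismatch: a qualitative variable mapped to a quantitative visual feature.
mismatchPlot <- ggplot(rwc) +
  geom_point(aes(x = cleanbreaks, y = country, size = hemisphere))
mismatchPlot
```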
Gestalt Principles
An effective qualitative mapping maps different data values to visually different data symbols.
An effective qualitative mapping maps the same data values to visually similar data symbols.
An effective qualitative mapping maps groups of data values to visual groups of data symbols.
The Gestalt psychologists identified several principles of visual grouping.
These are examples of visual processing that happen very rapidly and early in our visual system.
Mapping the same qualitative data to the same colour or shape produces visual groups through similarity.
Some of the Gestalt laws are stronger than others.
It makes sense to map qualitative data to qualitative visual features.
Colour, shape, and position are qualitative visual features.
Qualitative data can be visualised with:
Colour, shape, and position are appropriate visual features for representing qualitative data.
Colour can be broken into hue, chroma, and luminance.
Hue is only useful for nominal data.
Chroma and luminance can be used for ordinal data.
Position is a very effective and flexible visual feature.
The goal when mapping qualitative data is to create visual groups through:
We want to compare the count of offences for different genders:
Are the counts for Male higher or lower than the counts for Female (on average)?
  gender year count   pop     rate genderFactor   yearDate
1   Male 2011  8808 95040 926.7677         Male 2011-06-30
2 Female 2011  4207 90410 465.3246       Female 2011-06-30
4   Male 2012  8031 95260 843.0611         Male 2012-06-30
5 Female 2012  3513 90050 390.1166       Female 2012-06-30
7   Male 2013  6768 94440 716.6455         Male 2013-06-30
8 Female 2013  2976 89260 333.4080       Female 2013-06-30
What type of data is gender?
Which visual features would be appropriate for representing gender?
Write {ggplot2} code to produce data visualisations that allow us to answer the questions of interest.
The following code produces a simple bar chart
Modify the code so that the Male bar is light blue and the Female bar is pink.
Can you see anything wrong with this data visualisation?