A data visualisation consists of data symbols, guides, and labels.
A data visualisation can help to answer questions.
We need to choose a mapping from data values to data symbols.
The combination of geom and aesthetic that we choose maps data values to a visual feature.
Visual features are the properties of data symbols that our visual system identifies very rapidly, in parallel, and without conscious effort.
Our task is to identify a good mapping from data values to visual features.
We can select geoms and aesthetics so that data values are mapped to visual features that are perceived more accurately.
We can select geoms and aesthetics so that data values are mapped to visual features that have the capacity to represent all data values.
Quantitative Accuracy
Quantitative Caveats
Qualitative Capacity
Visual Congruence
Quantitative Accuracy
For quantitative comparisons, we need to be able to perceive the size of differences between values.
It is important that we can accurately perceive the size of differences and ratios.
Experimental studies have established an agreed ordering of visual dimensions in terms of accuracy.
An effective data visualisation will map quantitative data to length or position.
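As a minimal sketch of these two mappings, the code below draws the crime-level totals (copied from the tibble printed in the Visual Congruence part) once as bar lengths and once as point positions. The object names `crime`, `p_length`, and `p_position` are illustrative assumptions, not names from the original material.

```r
library(ggplot2)

# Crime-level totals copied from the tibble printed in this section
levels_vec <- c("Low", "Low-Medium", "Medium", "Medium-High", "High")
crime <- data.frame(
  level = factor(levels_vec, levels = levels_vec, ordered = TRUE),
  total = c(24242, 21513, 13823, 18315, 7793)
)

# Length: each total is the length of a bar that starts at zero
p_length <- ggplot(crime, aes(x = level, y = total)) +
  geom_col()

# Position: each total is the position of a point on a common y scale
p_position <- ggplot(crime, aes(x = level, y = total)) +
  geom_point()
```

Printing `p_length` or `p_position` draws the plot. Both use the same aesthetics; only the geom changes which visual feature carries the value.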
Quantitative Caveats
Comparing two positions or lengths is less effective when they are on unaligned scales.
Ordering the bars in a bar plot can help to make comparisons more accurate.
Non-linear scales can be useful to make differences visible.
The accuracy of position and length only holds for linear scales.
Using a non-linear scaling restricts these features to just ordinal comparisons.
The transform argument to scale_x_continuous() allows non-linear mappings. There are also convenience functions like scale_x_log10().
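The two routes to a log scale can be sketched as follows. The data frame `df` holds made-up values chosen only to span several orders of magnitude, and the sketch assumes a ggplot2 release (3.5.0 or later) in which the argument is named `transform`; older releases call it `trans`.

```r
library(ggplot2)

# Made-up values spanning several orders of magnitude (illustration only)
df <- data.frame(x = c(1, 10, 100, 1000, 10000), y = 1:5)

base <- ggplot(df, aes(x = x, y = y)) +
  geom_point()

# Explicit transform argument (assumes ggplot2 >= 3.5.0; earlier: `trans`)
p_transform <- base + scale_x_continuous(transform = "log10")

# Convenience function that applies the same log10 transformation
p_log10 <- base + scale_x_log10()
```

On the log scale, equal distances along the x-axis correspond to equal ratios rather than equal differences, which is why position then supports only ordinal comparisons of the raw values.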
If we only require ordinal comparisons for quantitative data, then even luminance and/or chroma may suffice.
Which level of crime is most common? Which is least common?
It may be necessary to fall back on luminance and/or chroma if position has already been used for other things.
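One way to sketch an ordinal reading from luminance is to map the crime-level totals to a light-to-dark grey fill. The choice of `geom_tile()`, the constant `y = 1`, and the grey end points are assumptions made for this illustration, not the original figure.

```r
library(ggplot2)

# Crime-level totals copied from the tibble printed in this section
levels_vec <- c("Low", "Low-Medium", "Medium", "Medium-High", "High")
crime <- data.frame(
  level = factor(levels_vec, levels = levels_vec, ordered = TRUE),
  total = c(24242, 21513, 13823, 18315, 7793)
)

# Luminance carries the total: light grey = small, dark grey = large
p_lum <- ggplot(crime, aes(x = level, y = 1, fill = total)) +
  geom_tile() +
  scale_fill_gradient(low = "grey90", high = "grey10")
```

The darkest and lightest tiles answer the "most common" and "least common" questions, but the size of the difference between two tiles is hard to judge from the greys alone.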
Qualitative Capacity
For qualitative data, we only need to be able to perceive that two values are different.
Given N different categories, we require N different visual values that are clearly distinguishable from each other.
Visual Congruence
congruent /ˈkɒŋɡrʊənt/ (adjective): 1. in agreement or harmony.
We can select geoms and aesthetics so that data values are mapped to visual features that are congruent with the data.
Familiar connections between visual features and data values make it easier and faster to comprehend a data visualisation.
A proportion represents a part of a whole.
## # A tibble: 5 × 3
## level total prop
## <ord> <int> <dbl>
## 1 Low 24242 0.283
## 2 Low-Medium 21513 0.251
## 3 Medium 13823 0.161
## 4 Medium-High 18315 0.214
## 5 High 7793 0.0909
An angle of a circle represents a part of a whole.
A length of a rectangle can represent a part of a whole.
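Both part-of-a-whole encodings can be sketched from the crime-level totals above. The stacked bar uses segment length; the same bar drawn in polar coordinates uses angle. The object names and the empty-string x value are illustrative assumptions.

```r
library(ggplot2)

# Crime-level totals copied from the tibble printed above
levels_vec <- c("Low", "Low-Medium", "Medium", "Medium-High", "High")
crime <- data.frame(
  level = factor(levels_vec, levels = levels_vec, ordered = TRUE),
  total = c(24242, 21513, 13823, 18315, 7793)
)

# Proportion of the overall total in each level (sums to 1)
crime$prop <- crime$total / sum(crime$total)

# Length of a rectangle: one stacked bar whose segments add up to the whole
p_bar <- ggplot(crime, aes(x = "", y = prop, fill = level)) +
  geom_col()

# Angle of a circle: the same stacked bar drawn in polar coordinates
p_pie <- p_bar + coord_polar(theta = "y")
```

In both plots the whole is visible as a complete shape (a full bar or a full circle), which is what makes these features congruent with proportions.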
Dates are cyclic.
## month year Freq
## 1 1 2019 0
## 2 2 2019 0
## 3 3 2019 0
## 4 4 2019 0
## 5 5 2019 0
## 6 6 2019 0
Angles around a circle are cyclic.
Note that these are overall reports of crimes (not youth crime).
We have already seen that some visual features have no representation of zero (e.g., hue).
Features that can represent zero still may not be able to represent negative values.
Some hues are congruent with particular objects or concepts.
Blue is cold and red is hot.
Bananas are yellow.
Some associations are very culture-specific and/or era-specific.
A larger length or area should correspond to a larger data value.
A darker colour should correspond to a larger data value.
Summary
For quantitative data, accuracy matters.
For qualitative data, capacity matters.
Colour and shape are limited to 5-7 different values.
Position has a large capacity.
Congruent visual features make it easier to comprehend a data visualisation.
Exercises
We want to compare the total of offenders for different ethnic groups.
## # A tibble: 3 × 4
## group total groupFactor groupNumeric
## <chr> <int> <fct> <dbl>
## 1 European/Other 31274 European/Other 1
## 2 Māori 41787 Māori 2
## 3 Pasifika 6993 Pasifika 3
Identify the mappings in the data visualisation below.
Write {ggplot2} code to draw the data visualisation.
Write {ggplot2} code to draw an improved version.
We want to compare the number of cleanbreaks for countries from different hemispheres.
## country hemisphere yellowcards redcards cleanbreaks tackles points
## 11 Namibia South 1.00 0.5 2.50 102.00 9.25
## 14 Romania North 1.25 0.0 2.75 142.50 8.00
## 3 Chile South 1.25 0.0 3.75 132.25 6.75
## conversions offloads tries runs
## 11 0.50 2.25 0.75 92.00
## 14 0.75 1.50 1.00 81.00
## 3 0.50 5.25 1.00 102.25
Identify the mappings in the data visualisation below.
Write {ggplot2} code to draw the data visualisation.
Write {ggplot2} code to draw an improved version.
Can you see anything wrong with this data visualisation?