A data visualisation involves mapping data values to visual features.
Which visual feature we choose depends on:
We can map back from visual features to the raw data and to data statistics.
When we map to more than one visual feature, we need to consider possible interactions between features.
So far we have only considered mappings that produce simple data symbols.
In this section we will begin to consider more complex data symbols.
Multi-Value Data Symbols
Visual Shapes
Visual Shape Caveats
Multivariate Data Symbols
Reusing Visual Features
Visual Objects
Multi-Value Data Symbols
The crimeGender data frame contains information on the crime rate for each gender across multiple years.
## gender year count pop rate genderFactor yearDate
## 1 Male 2011 8808 95040 926.7677 Male 2011-06-30
## 2 Female 2011 4207 90410 465.3246 Female 2011-06-30
## 4 Male 2012 8031 95260 843.0611 Male 2012-06-30
## 5 Female 2012 3513 90050 390.1166 Female 2012-06-30
## 7 Male 2013 6768 94440 716.6455 Male 2013-06-30
## 8 Female 2013 2976 89260 333.4080 Female 2013-06-30
We can map one data value from each of two variables to two visual features of one data symbol.
The gender and rate values from one year map to the horizontal and vertical position of one point.
Two data values map to one symbol.
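A minimal {ggplot2} sketch of this point mapping. It assumes only the columns visible in the crimeGender printout above and rebuilds a small data frame from those six printed rows, so it stands in for, rather than reproduces, the full data set and the lecture's figure.

```r
library(ggplot2)

# Six rows copied from the crimeGender printout (2011-2013 only)
crimeGender <- data.frame(
  gender = rep(c("Male", "Female"), times = 3),
  year   = rep(2011:2013, each = 2),
  rate   = c(926.7677, 465.3246, 843.0611, 390.1166, 716.6455, 333.4080)
)

# One point per row: gender -> horizontal position, rate -> vertical position
p <- ggplot(crimeGender, aes(x = gender, y = rate)) +
  geom_point()
print(p)
```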
We can map two data values from each of two variables to two visual features of one data symbol.
Two rate values and two genders map to the start and end of one line (for each year).
Four data values map to one data symbol.
The mapping is to the position of the ends of the lines, but additional visual features, the length and angle of the lines, are produced.
The angle of the lines allows us to directly perceive differences in the raw data.
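The same six printed rows can sketch the one-line-per-year version (a slope graph). Here group = year tells geom_line() which two rows to join. The vector rateGap is a hypothetical name for the Male minus Female rate difference that each line's angle encodes. This is an illustration under those assumptions, not the lecture's own code.

```r
library(ggplot2)

# Six rows copied from the crimeGender printout (2011-2013 only)
crimeGender <- data.frame(
  gender = rep(c("Male", "Female"), times = 3),
  year   = rep(2011:2013, each = 2),
  rate   = c(926.7677, 465.3246, 843.0611, 390.1166, 716.6455, 333.4080)
)

# One line per year: its two end points are the (gender, rate) pairs
p <- ggplot(crimeGender, aes(x = gender, y = rate, group = year)) +
  geom_line() +
  geom_point()
print(p)

# The raw difference each line's angle lets us perceive (Male - Female)
rateGap <- with(crimeGender, rate[gender == "Male"] - rate[gender == "Female"])
rateGap
```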
Visual Shapes
We can map multiple values from each of two variables to two visual features of one data symbol.
Multiple rate values from multiple years map to the positions of one polyline.
Multiple data values map to one data symbol.
The mapping is to the points that the polyline passes through, but we no longer have a simple data symbol.
Instead we have a visual shape.
The complex data symbol allows us to perceive statistical summaries of the data.
It is no longer easy to perceive the raw data.
When we map data values to the position of a line, we get additional visual features.
With only a few data values, we get length and angle, which map back to differences in the raw data.
With many values, we get visual shapes that map back to data statistics.
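A sketch of the polyline version, again using only the six printed crimeGender rows (the full data covers more years). Mapping gender to colour also groups the rows, so geom_line() draws one polyline per gender. The hypothetical trend vector is one example of a statistical summary that the shape conveys: the overall change from the first to the last year.

```r
library(ggplot2)

# Six rows copied from the crimeGender printout (2011-2013 only)
crimeGender <- data.frame(
  gender = rep(c("Male", "Female"), times = 3),
  year   = rep(2011:2013, each = 2),
  rate   = c(926.7677, 465.3246, 843.0611, 390.1166, 716.6455, 333.4080)
)

# Multiple (year, rate) pairs -> the vertices of one polyline per gender
p <- ggplot(crimeGender, aes(x = year, y = rate, colour = gender)) +
  geom_line()
print(p)

# Overall change (last year minus first year) for each gender;
# rows are already in year order within each gender
trend <- tapply(crimeGender$rate, crimeGender$gender,
                function(r) r[length(r)] - r[1])
trend
```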
Visual Shape Caveats
The shapes of lines are dependent on the aspect ratio of a plot.
The aspect ratio is controlled by the aspect.ratio argument of theme() (coming soon!).
The ggthemes::bank_slopes() function calculates an aspect ratio so that the median absolute slope is 45 degrees.
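A sketch of banking to 45 degrees by hand, so the arithmetic is visible. The helper bank45() is hypothetical and not part of {ggthemes}; it follows the median-absolute-slope rule described above, whereas ggthemes::bank_slopes() is the packaged function (its exact return convention is not reproduced here). theme(aspect.ratio) sets panel height divided by panel width. The data are the six printed crimeGender rows.

```r
library(ggplot2)

# Six rows copied from the crimeGender printout (2011-2013 only)
crimeGender <- data.frame(
  gender = rep(c("Male", "Female"), times = 3),
  year   = rep(2011:2013, each = 2),
  rate   = c(926.7677, 465.3246, 843.0611, 390.1166, 716.6455, 333.4080)
)

# Hypothetical helper: aspect ratio (height / width) that makes the median
# absolute on-screen segment slope equal to 1, i.e. 45 degrees.
# ggplot2's default expansion stretches both continuous axes by the same
# proportion, so it does not change this ratio.
bank45 <- function(x, y, group) {
  slopes <- unlist(lapply(split(seq_along(x), group), function(i) {
    i <- i[order(x[i])]        # sort each line by x
    diff(y[i]) / diff(x[i])    # slope of each segment in data units
  }))
  rx <- diff(range(x))
  ry <- diff(range(y))
  ry / (rx * median(abs(slopes)))
}

ratio <- with(crimeGender, bank45(year, rate, gender))

p <- ggplot(crimeGender, aes(x = year, y = rate, colour = gender)) +
  geom_line() +
  theme(aspect.ratio = ratio)
print(p)
```

With these six rows the ratio comes out at roughly 3.7: in a square panel the median segment would sit at only about 15 degrees, so banking asks for a tall, narrow panel.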
The two lines below appear to converge; we see a curved wedge that narrows.
However, in terms of y-positions, the vertical distance between the lines is constant.
A connecting line implies continuity (and constant change).
This is not always appropriate.
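A small invented example (not the data behind the lecture's figure) that reproduces the converging-lines effect: two curves a constant vertical distance apart. The hypothetical vectors vertGap and perpGap compare the vertical gap with the approximate perpendicular gap between the curves, in data units. A common explanation of the illusion is that the eye judges the perpendicular gap, which shrinks as the curves steepen; on screen it also depends on the aspect ratio.

```r
library(ggplot2)

# Invented illustration: y = x^2 and y = x^2 + 20
x  <- seq(0, 10, by = 0.5)
df <- data.frame(
  x    = rep(x, times = 2),
  y    = c(x^2, x^2 + 20),
  line = rep(c("lower", "upper"), each = length(x))
)

# The vertical distance between the curves is the same everywhere...
vertGap <- df$y[df$line == "upper"] - df$y[df$line == "lower"]

# ...but the perpendicular distance, approximately gap / sqrt(1 + slope^2)
# with slope 2x for x^2, shrinks as the curves get steeper
perpGap <- 20 / sqrt(1 + (2 * x)^2)

p <- ggplot(df, aes(x = x, y = y, group = line)) +
  geom_line()
print(p)
```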
Multivariate Data Symbols
The RWCperGame data frame contains measures of performance at the 2023 Rugby World Cup for different countries, plus the hemisphere that each country is from.
## # A tibble: 6 × 11
## country hemisphere yellowcards redcards cleanbreaks
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 Namibia South 1 0.5 2.5
## 2 Romania North 1.25 0 2.75
## 3 Chile South 1.25 0 3.75
## 4 Samoa South 1.25 0.25 3.75
## 5 Australia South 0.5 0 5.25
## 6 Georgia North 0.5 0 5.25
## # ℹ 6 more variables: tackles <dbl>, points <dbl>,
## # conversions <dbl>, offloads <dbl>, tries <dbl>,
## # runs <dbl>
We can map data values from three variables to three visual features of one data symbol.
The cleanbreaks, tries, and hemisphere values from one country map to the horizontal position, vertical position, and colour of one point.
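A sketch of the three-variable mapping. The RWCperGame printout shows only six rows and does not show the tries column, so this rebuilds those six rows (as the hypothetical rwc) and uses yellowcards in place of tries; with the full data the y aesthetic would be tries.

```r
library(ggplot2)

# Six rows and five columns copied from the RWCperGame printout
rwc <- data.frame(
  country     = c("Namibia", "Romania", "Chile", "Samoa", "Australia", "Georgia"),
  hemisphere  = factor(c("South", "North", "South", "South", "South", "North")),
  yellowcards = c(1, 1.25, 1.25, 1.25, 0.5, 0.5),
  redcards    = c(0.5, 0, 0, 0.25, 0, 0),
  cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25)
)

# Three variables -> horizontal position, vertical position and colour
# of one point per country (yellowcards stands in for tries)
p <- ggplot(rwc, aes(x = cleanbreaks, y = yellowcards, colour = hemisphere)) +
  geom_point()
print(p)
```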
Reusing Visual Features
An alternative to mapping n data variables to n different visual features is to map n data variables to the same visual feature.
Mapping identical values to position results in overplotting.
One option is to stack bars.
Both year and gender are mapped to the vertical position of the bars.
One option is to dodge bars.
Both year and gender are mapped to the horizontal position of the bars.
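A sketch of both options with geom_col(), using the count values from the six printed crimeGender rows. Putting year on the horizontal axis and mapping gender to fill is an assumption about the layout; the original figures may arrange the axes differently. The hypothetical totals vector is the height of each stacked bar.

```r
library(ggplot2)

# Six rows copied from the crimeGender printout (2011-2013 only)
crimeGender <- data.frame(
  gender = rep(c("Male", "Female"), times = 3),
  year   = rep(2011:2013, each = 2),
  count  = c(8808, 4207, 8031, 3513, 6768, 2976)
)

# Stacked: the gender counts for each year are piled on top of each other
stacked <- ggplot(crimeGender, aes(x = factor(year), y = count, fill = gender)) +
  geom_col(position = "stack")

# Dodged: the gender bars for each year sit side by side
dodged <- ggplot(crimeGender, aes(x = factor(year), y = count, fill = gender)) +
  geom_col(position = "dodge")

print(stacked)
print(dodged)

# Height of each stacked bar (sum over genders within a year)
totals <- tapply(crimeGender$count, crimeGender$year, sum)
totals
```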
We can map one or two variables to position in space and then make each data symbol a data visualisation.
All visual features become available again, including position.
The facet_wrap() and facet_grid() functions generate separate panels for each level of one or more variables.
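A minimal faceting sketch with the six printed crimeGender rows: facet_wrap(~ gender) draws one panel per gender, and inside each panel all visual features, including position, are available again. Replacing it with facet_grid(rows = vars(gender)) would stack the same panels vertically.

```r
library(ggplot2)

# Six rows copied from the crimeGender printout (2011-2013 only)
crimeGender <- data.frame(
  gender = rep(c("Male", "Female"), times = 3),
  year   = rep(2011:2013, each = 2),
  rate   = c(926.7677, 465.3246, 843.0611, 390.1166, 716.6455, 333.4080)
)

# gender -> which panel; year and rate -> position within each panel
p <- ggplot(crimeGender, aes(x = year, y = rate)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ gender)
print(p)
```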
With Cartesian coordinates, we are limited to horizontal and vertical position.
We can map continuous variables from RWCperGame to the positions of a line through parallel coordinates.
The order of the axes can make a big difference.
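To show what a parallel coordinates plot does, here is a hand-built sketch with plain {ggplot2}: each variable is rescaled to [0, 1] and becomes one vertical axis, and each country becomes one polyline across the axes. It uses only the three numeric columns visible in the RWCperGame printout (six countries); rwc, rescale01() and long are hypothetical names. The last step redraws the same data with the axes in reverse order via scale_x_discrete(limits = ...).

```r
library(ggplot2)

# Six rows copied from the RWCperGame printout
rwc <- data.frame(
  country     = c("Namibia", "Romania", "Chile", "Samoa", "Australia", "Georgia"),
  hemisphere  = factor(c("South", "North", "South", "South", "South", "North")),
  yellowcards = c(1, 1.25, 1.25, 1.25, 0.5, 0.5),
  redcards    = c(0.5, 0, 0, 0.25, 0, 0),
  cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25)
)
vars <- c("cleanbreaks", "yellowcards", "redcards")

# Rescale each variable to [0, 1] so the parallel axes share one scale
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))
scaled <- as.data.frame(lapply(rwc[vars], rescale01))

# Long format: one row per (country, variable) pair
long <- data.frame(
  country    = rep(rwc$country, times = length(vars)),
  hemisphere = rep(rwc$hemisphere, times = length(vars)),
  variable   = factor(rep(vars, each = nrow(rwc)), levels = vars),
  value      = unlist(scaled, use.names = FALSE)
)

# variable -> horizontal position (one axis each), value -> vertical position;
# group = country joins each country's values into one polyline
p <- ggplot(long, aes(x = variable, y = value, group = country, colour = hemisphere)) +
  geom_line()
print(p)

# Same data, axes in the opposite order: different pairs become adjacent
p2 <- p + scale_x_discrete(limits = rev(vars))
print(p2)
```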
We can map continuous variables from RWCperGame to the length of lines radiating from the same centre.
The mapping is to the points that the polyline passes through, but we no longer have a simple data symbol.
Instead we have a visual shape.
The complex data symbol allows us to perceive statistical summaries of the data.
It is no longer easy to perceive the raw data.
Visual Objects
We can map data values to the visual features, e.g., size, of data symbols that resemble a familiar object.
We can map data values to the visual features, e.g., count, of data symbols that resemble a familiar object.
We can map data values to more abstract features of data symbols that resemble familiar objects.
The visual object allows us to perceive statistical summaries of the data.
It is very difficult to perceive the raw data.
Complex Data Symbols in {ggplot2}
The ggparcoord() function from the {GGally} package can be used to produce a parallel coordinates plot.
The data argument is a data frame.
The columns argument selects the variables of data to plot.
The scale argument provides options for scaling the parallel axes.
The order argument provides ways to automatically select the order of the parallel axes.
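A sketch using GGally::ggparcoord() with the arguments described above, applied to the six printed RWCperGame rows (rebuilt as the hypothetical rwc). Columns are given by index; scale = "uniminmax" rescales each axis to [0, 1]; order is left at its default, so the axes follow columns. Treat it as an illustration rather than the lecture's own code.

```r
library(ggplot2)
library(GGally)

# Six rows copied from the RWCperGame printout
rwc <- data.frame(
  country     = c("Namibia", "Romania", "Chile", "Samoa", "Australia", "Georgia"),
  hemisphere  = factor(c("South", "North", "South", "South", "South", "North")),
  yellowcards = c(1, 1.25, 1.25, 1.25, 0.5, 0.5),
  redcards    = c(0.5, 0, 0, 0.25, 0, 0),
  cleanbreaks = c(2.5, 2.75, 3.75, 3.75, 5.25, 5.25)
)

p <- ggparcoord(
  data        = rwc,
  columns     = c(5, 3, 4),    # cleanbreaks, yellowcards, redcards (axis order)
  groupColumn = 2,             # colour each line by hemisphere
  scale       = "uniminmax"    # each axis rescaled to [0, 1]
)
print(p)
```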
The geom_starglyph() function from the {gglyph} package can be used to produce small multiples.
The cols argument selects the variables to plot.
The geom_chernoff() function from the {ggChernoff} package can be used to produce Chernoff faces.
The smile, brow, nose, and eyes aesthetics can be used to map data values to properties of the faces.
Summary
Multi-value data symbols represent multiple data values in a single data symbol.
Multivariate data symbols represent more than two variables in a single data symbol.
Visual shapes and visual objects allow us to perceive higher-level data statistics.
Exercises
The crimeDistrict data frame contains the count of offenders per district over years.
…districts?
## district year count pop rate yearDate
## 1 Northland 2011 599 6829.58 877.0671 2011-06-30
## 2 Waitematā 2011 1072 23290.05 460.2824 2011-06-30
## 3 Auckland City 2011 765 16867.14 453.5446 2011-06-30
## 4 Counties/Manukau 2011 1494 24418.10 611.8412 2011-06-30
## 5 Waikato 2011 1055 14869.22 709.5194 2011-06-30
## 6 Bay of Plenty 2011 1329 14158.22 938.6773 2011-06-30
Does the data visualisation below answer the questions?
What mappings/visual features are involved?
Can you write appropriate {ggplot2} code?
(The grey lines are an optional challenge.)
The crimeYear data frame contains the total number of offenders per year.
…years?
Which year had the largest change in total?
## # A tibble: 6 × 3
## yearDate total year
## <date> <int> <int>
## 1 2011-06-30 13018 2011
## 2 2012-06-30 11545 2012
## 3 2013-06-30 9747 2013
## 4 2014-06-30 7965 2014
## 5 2015-06-30 6843 2015
## 6 2016-06-30 6253 2016
Does the data visualisation below answer the questions?
What mappings/visual features are involved?
Can you write appropriate {ggplot2} code?
Does the data visualisation below answer the questions?
What mappings/visual features are involved?
Can you write appropriate {ggplot2} code?
Does the data visualisation below answer the questions?
What mappings/visual features are involved?
Can you write appropriate {ggplot2} code?
Does the data visualisation below answer the questions?
What mappings/visual features are involved?
Can you write appropriate {ggplot2} code?
Is this a sensible use of visual objects?
New Zealand Listener October 14-20 2023.