Effective Data Visualisation with R
Text

Paul Murrell
The University of Auckland

Review

  • A data visualisation consists of data symbols, guides, and labels.

  • We produce data symbols by mapping data values to the visual features of a shape.

  • Which visual feature we choose depends on:

    • Whether we have quantitative or qualitative data.
    • The accuracy and capacity of the visual feature.
    • The question we are interested in answering.

Text

  • We can map data values to text data symbols and text on guides (axes and legends).

  • We can map metadata and data statistics to text labels (titles, captions, and annotations).

Text

  • The importance of text

  • Text data symbols

  • Text features

  • Guides

  • Labels

The Importance of Text

  • Without text, it is difficult to extract any useful information from a data visualisation.

The Importance of Text

  • Viewers pay a lot of attention to the text on a data visualisation.

The Visual System

  • Text is initially processed as visual information, but is then processed as verbal information.

  • Text consists of one or more familiar visual objects.

A Simple Model of the Visual System

Text Data Symbols

Youth Crime

  • The crimeDistrictTotal data frame contains the total number of offenders in each district.

    head(crimeDistrictTotal)
    ## # A tibble: 6 × 3
    ##   district         total avgPop
    ##   <chr>            <int>  <dbl>
    ## 1 Auckland City     5583 17438.
    ## 2 Bay of Plenty     9369 15192.
    ## 3 Canterbury        8976 23550.
    ## 4 Central           8700 15491.
    ## 5 Counties/Manukau 10229 26388.
    ## 6 Eastern           6221  9382.

Text as a Data Symbol

  • We can map each total value to the shape of a text data symbol.

Text as a Data Symbol

  • We can map back from text data symbols to the raw data.

Text as a Data Symbol

  • Text is appropriate for representing quantitative data.

    • Text has excellent accuracy for single data values.

Text as a Data Symbol

  • Differences and ratios can be calculated, but that involves much greater cognitive load.
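
To make the extra work concrete, here is a minimal sketch of the arithmetic a viewer has to do mentally when comparing two text data symbols. The two totals are copied from the head(crimeDistrictTotal) output shown earlier; the variable names are only illustrative.

```r
# Totals copied from the head(crimeDistrictTotal) output shown earlier.
totals <- c("Auckland City" = 5583, "Counties/Manukau" = 10229)

# With text data symbols the viewer must do this arithmetic themselves;
# a bar chart would show both comparisons directly as lengths.
difference <- totals[["Counties/Manukau"]] - totals[["Auckland City"]]
ratio <- totals[["Counties/Manukau"]] / totals[["Auckland City"]]

difference        # 4646
round(ratio, 2)   # 1.83
```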

Text as a Data Symbol

  • Text is appropriate for representing qualitative data.

    • Text has excellent capacity.

  • Text can express zero and negative values.

  • Text has spectacular congruence.

Text as a Data Symbol

  • Text data symbols are terrible for visual summaries/data statistics.

Text Data Symbols in {ggplot2}

  • We can map data values to text with geom_text().

    ggplot(crimeDistrictTotal) +
        geom_text(aes(x="", y=district, label=total), 
                  size=4, hjust=1, vjust=0)

     

    • size is in millimetres.
    • hjust is horizontal justification
      (0 = left, 1 = right).
    • vjust is vertical justification
      (0 = bottom, 1 = top).
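
Because size is in millimetres rather than points, it can help to convert from a familiar point size. The sketch below assumes a small stand-in data frame, crimeHead, built from the six rows printed earlier (not the full crimeDistrictTotal), and uses the .pt constant that {ggplot2} exports (points per millimetre).

```r
library(ggplot2)

# Stand-in data: the six rows printed by head(crimeDistrictTotal).
crimeHead <- data.frame(
    district = c("Auckland City", "Bay of Plenty", "Canterbury",
                 "Central", "Counties/Manukau", "Eastern"),
    total = c(5583, 9369, 8976, 8700, 10229, 6221))

# .pt is the number of points in one millimetre, so dividing a
# point size by .pt gives the equivalent size in millimetres.
sizeMM <- 12 / .pt

p <- ggplot(crimeHead) +
    geom_text(aes(x = "", y = district, label = total),
              size = sizeMM,  # roughly 12pt text
              hjust = 1,      # right-justify so the digits line up
              vjust = 0.5)    # centre vertically on each district
```

Printing p draws the plot. Right justification is a design choice here: it aligns units, tens, and hundreds so that the totals are easier to compare down the column.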

Text Features

  • We can also map data values to other visual features of text data symbols: position, size, angle, and colour.
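
As a rough sketch of mapping data values to several text features at once, the example below uses a stand-in data frame, crimeHead, built from the six rows printed earlier (avgPop values are rounded as printed). The threshold of 9000 is an arbitrary choice for illustration only.

```r
library(ggplot2)

# Stand-in data: the six rows printed by head(crimeDistrictTotal).
crimeHead <- data.frame(
    district = c("Auckland City", "Bay of Plenty", "Canterbury",
                 "Central", "Counties/Manukau", "Eastern"),
    total = c(5583, 9369, 8976, 8700, 10229, 6221),
    avgPop = c(17438, 15192, 23550, 15491, 26388, 9382))

# district -> the text itself, avgPop and total -> position,
# total -> size, and an (arbitrary) threshold on total -> colour.
p <- ggplot(crimeHead) +
    geom_text(aes(x = avgPop, y = total, label = district,
                  size = total, colour = total > 9000))
```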

Text Features

  • A text data symbol often involves a redundant mapping.
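
One common form of redundant mapping is a bar chart with value labels: the same variable drives both the length of the bar and the text drawn on it. The sketch below assumes a stand-in data frame, crimeHead, built from the six rows printed earlier.

```r
library(ggplot2)

# Stand-in data: the six rows printed by head(crimeDistrictTotal).
crimeHead <- data.frame(
    district = c("Auckland City", "Bay of Plenty", "Canterbury",
                 "Central", "Counties/Manukau", "Eastern"),
    total = c(5583, 9369, 8976, 8700, 10229, 6221))

# total is mapped twice: to the length of each bar (good for
# comparisons) and to a text label (good for exact values).
p <- ggplot(crimeHead, aes(x = total, y = district)) +
    geom_col() +
    geom_text(aes(label = total), hjust = 1.1, colour = "white")
```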

Fonts

  • The shape of text is also affected by the font.

Text Features in {ggplot2}

  • The font family describes an overall style.

  • The fontface describes whether the text is bold, italic, or plain.

    ggplot(pets) + 
        geom_col(aes(pet, count)) +
        geom_text(aes(pet, 150, label=pet),
                  colour="white", size=c(24, 30), vjust=0,
                  family=c("Impact", "TeX Gyre Chorus"), 
                  fontface=c("bold", "italic")) 

Text Features in {ggplot2}

  • The font families "sans", "serif", and "mono" are always available.

  • Selecting a custom font family is easy as long as you are using the right graphics device:

    • Cairo-based devices.
    • {ragg} devices.
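
A minimal sketch of choosing the graphics device explicitly, assuming {ragg} may or may not be installed and falling back to a Cairo-based PNG device otherwise. The pets data frame here is a hypothetical stand-in (its values are invented), and the example sticks to the "serif" family, which is always available; a custom family name would go in the same place.

```r
library(ggplot2)

# Hypothetical stand-in for the pets data frame (values invented).
pets <- data.frame(pet = c("cat", "dog"), count = c(120, 180))

p <- ggplot(pets) +
    geom_col(aes(pet, count)) +
    geom_text(aes(pet, count / 2, label = pet),
              colour = "white", size = 10,
              family = "serif", fontface = "italic")

outfile <- file.path(tempdir(), "pets.png")

# Open a device that handles fonts well: {ragg} if it is installed,
# otherwise the Cairo-based version of the standard png() device.
if (requireNamespace("ragg", quietly = TRUE)) {
    ragg::agg_png(outfile, width = 5, height = 4, units = "in", res = 150)
} else {
    png(outfile, width = 5, height = 4, units = "in", res = 150,
        type = "cairo")
}
print(p)
invisible(dev.off())
```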

Guides

  • Guides are visual representations of scales, such as axes and legends.

Guides

  • Data values are mapped to data symbols such as lines to create tick marks, grid lines, and legend keys.

Guides

  • Data values are mapped to text data symbols to create tick labels and legend labels.

Guides in {ggplot2}

  • Axes and legends are generated automatically.

  • Scale functions such as scale_x_continuous() and scale_colour_manual() allow control of the details.

    • breaks specifies the tick mark locations.
    • labels specifies the tick labels, either explicitly or with a function like scales::label_comma().
    • guide takes a function like guide_axis() or guide_legend() to control the placement and layout of the axis or legend.
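
It may help to see that the labels argument accepts a function: scales::label_comma() returns a formatting function, which the scale then applies to its breaks. The sketch below calls that function directly; the names fmt, yScale, and xScale are only illustrative.

```r
library(ggplot2)
library(scales)

# label_comma() returns a function that formats numbers as text.
fmt <- label_comma()
fmt(c(200, 1000, 1400))   # "200" "1,000" "1,400"

# The same function can be handed to a scale, which applies it
# to the tick mark locations given by breaks.
yScale <- scale_y_continuous(breaks = seq(200, 1400, 200), labels = fmt)

# guide controls how the axis is drawn, e.g. rotated tick labels.
xScale <- scale_x_continuous(guide = guide_axis(angle = 45))
```

Either scale object can be added to a plot with +, in the same way as a geom.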

Guides in {ggplot2}

    ggplot(crimeDistrict) +
        geom_line(aes(x=year, y=count, colour=district)) +
        scale_y_continuous(breaks=seq(200, 1400, 200),
                           labels=scales::label_comma())

Guides in {ggplot2}

    ggplot(crimeDistrict) +
        geom_line(aes(x=year, y=count, colour=district)) +
        scale_colour_discrete(guide=guide_legend(ncol=2))

Direct Labelling

  • Direct labelling draws text data symbols in proximity to other data symbols rather than on a separate guide.

Direct Labelling in {ggplot2}

    ggplot(crimeDistrict) +
        geom_text(data=subset(crimeDistrict, year == 2011),
                  aes(x=year - .1, y=count, label=district, colour=district),
                  hjust=1) +
        geom_line(aes(x=year, y=count, colour=district))

{ggrepel}

    library(ggrepel)  # provides geom_text_repel()

    ggplot(crimeDistrict) +
        geom_text_repel(data=subset(crimeDistrict, year == 2011),
                        aes(x=year - .1, y=count, label=district, colour=district),
                        hjust=1, direction="y") +
        geom_line(aes(x=year, y=count, colour=district))

{directlabels}

    library(directlabels)  # provides direct.label()

    gg <- ggplot(crimeDistrict) +
        geom_line(aes(x=year, y=count, colour=district))
    direct.label(gg, "first.points")

Labels

  • Labels include titles, captions, and annotations.

Plot Title

  • A plot title may describe the overall purpose of the plot or state its overall message.

Captions

  • A caption may provide a more detailed description of the plot and information about the data source.

Guide Titles

  • Axis titles and guide titles describe the variables that are being mapped and can include metadata such as the units of measurement.

Annotations

  • An annotation may be used to describe an important feature within a plot.

Labels

  • We can map back from labels to metadata.

  • Labels are excellent for visual summaries/data statistics.

  • Labels and data symbols often work to support each other.

Labels

  • We can map back from labels to data statistics.

Labels in {ggplot2}

  • The labs() function can be used to specify the plot title, subtitle, and caption.

    labels <-
        labs(title=paste("Youth Crime has Declined",
                         "in all Districts over the last Decade"),
             subtitle="(with small uptick in the last two years)",
             caption=paste('Number of youth offenders aged between',
                           '14 and 16 from 2010 to 2020 in',
                           'New Zealand.\nSource: New Zealand',
                           'Ministry of Justice "Youth Justice',
                           'Indicator Report", 2021.'))

Labels in {ggplot2}

  • xlab() and ylab() set the axis titles.

  • Use NULL to remove the title.

    ylab <- ylab("Number of Offenders (per 10,000 population)")

  • annotate() adds a geom whose properties are given as fixed values rather than mapped from a data frame.

    annotation <- annotate("text", label="COVID ", x=covid$x, 
                           y=Inf, hjust=1, vjust=1.5) 
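
Label components like these are ordinary objects that can be stored and then added to a plot with +. The sketch below shows one way that might look. The data frames crimeDemo and covidDemo are hypothetical stand-ins with invented values (the real crimeDistrict and covid objects are not shown here), and the title and caption text are placeholders.

```r
library(ggplot2)

# Hypothetical stand-in data; the values are invented for illustration.
crimeDemo <- data.frame(year = rep(2018:2020, times = 2),
                        count = c(900, 850, 870, 600, 560, 590),
                        district = rep(c("A", "B"), each = 3))
covidDemo <- data.frame(x = 2020)

# Each label component is built separately ...
plotLabels <- labs(title = "Placeholder title",
                   caption = "Placeholder caption (invented data)")
yTitle <- ylab("Number of Offenders")
note <- annotate("text", label = "COVID ", x = covidDemo$x,
                 y = Inf, hjust = 1, vjust = 1.5)

# ... and then added to the plot like any other layer or scale.
p <- ggplot(crimeDemo) +
    geom_line(aes(x = year, y = count, colour = district)) +
    plotLabels +
    yTitle +
    xlab(NULL) +   # NULL removes the x-axis title
    note
```

Setting y=Inf places the annotation at the top edge of the panel, and vjust=1.5 then nudges the text down inside it.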

Attention

  • Titles and/or captions may provide top-down goals and tasks that direct attention.

Summary

  • Text is an essential component of any data visualisation.

  • We can use text data symbols to represent data values with great accuracy, but only for a small number of data values.

  • We can use text labels to represent much more complex and abstract information, including metadata and data statistics.

  • Labels are a very effective way to direct attention.

Exercises

Exercise

  • Identify the role of each text element within this data visualisation.

  • What {ggplot2} functions would you use to draw each text element?

  • Can you see anything wrong with this data visualisation?

Based on a Business Insider data visualisation; data from USA Facts

Exercise

  • Can you see anything wrong with this data visualisation?

    New Zealand Listener April 27-May 3 2024.