14 Composing Encodings

Figure 14.1 shows a line plot of the rate of youth crime in New Zealand for different ethnic groups and a pie chart showing the proportions of different types of crime.¹ There is a problem with this figure because the line plot and the pie chart share the same colour palette even though they are visualising different variables (ethnicity and crime type). For example, the ethnic group European/Other is encoded as the orange colour of a line and the crime type Theft is also encoded as an orange colour, in this case the colour of a wedge.

The conflicting encodings in Figure 14.1 create a confusing data visualisation because the same visual feature decodes to multiple data values (Figure 14.2). There is nothing wrong with the encodings in the line plot by itself or with the encodings in the pie chart by itself, but the overall combination of elements within Figure 14.1 is ineffective.

Figure 14.2: If we encode more than one data value to the same visual feature, it is unclear which is the correct decoding.
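The conflict can be stated as a simple check: a colour is ambiguous when it decodes to values of more than one variable. The following is a minimal sketch of that check, not a description of how Figure 14.1 was produced. The function name `find_conflicts` and the placeholder entries "Group B" and "Type B" are hypothetical; only the orange collision between European/Other and Theft comes from the text.

```python
from collections import defaultdict


def find_conflicts(encodings):
    """Return the colours that decode to values of more than one variable.

    `encodings` maps a variable name to a {data value: colour} dictionary.
    """
    users = defaultdict(list)
    for variable, mapping in encodings.items():
        for value, colour in mapping.items():
            users[colour].append((variable, value))
    return {
        colour: pairs
        for colour, pairs in users.items()
        if len({variable for variable, _ in pairs}) > 1
    }


# Only the orange collision is taken from the text; "Group B" and "Type B"
# are placeholders invented for this sketch.
figure_encodings = {
    "ethnicity": {"European/Other": "orange", "Group B": "blue"},
    "crime type": {"Theft": "orange", "Type B": "blue"},
}

print(find_conflicts(figure_encodings))
```

An empty result would indicate that no colour is shared between the two variables, which is what we want when two plots appear side by side.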

The problem in Figure 14.1 involves the coordination, or lack thereof, between multiple elements of a data visualisation. In this chapter we look at the arrangement and coordination of the elements of a data visualisation.²

14.1 Encoding structure

Figure 14.3 shows a scatter plot of the number of clean breaks and the number of tries for countries at the 2023 Rugby World Cup (Table 9.1). This data visualisation is similar to Figure 9.4, but with an additional encoding of the hemisphere for each country as the colour of the data points.

The legend in Figure 14.3 encodes the colour encoding: which colour is used to encode each hemisphere. In Section 13.9, we described how the proximity of the labels “South” and “North” to the coloured points in the legend allows us to associate the colours with the different hemispheres. The different values of hemisphere are encoded as the vertical positions of the points and the text labels; both the orange point and the label “South” have the same vertical position.

Figure 14.3: A scatter plot of the number of times a team breaks through the opposition defence and the number of tries that a team scores (both are per-game averages) for teams at the 2023 Rugby World Cup.

What about the horizontal position of the points and the text in the legend? These do not encode hemisphere values because they are the same for both North and South. What about the position of the legend title, “hemisphere”?

All of the elements of the legend are positioned together and to the right of the main plot. Their proximity to each other (and separation from the main plot) means that we see the legend as a separate group (Section 2.7). This creates a visual organisation for the plot, which helps the viewer to navigate and comprehend the different elements of the plot.³ This is analogous to reading a body of text, which is split into paragraphs and sections with headings to help the reader navigate the text.

One way to express this is that the positions of the elements of the legend encode the organisation of the overall data visualisation (Figure 14.4). Our visual system decodes the legend as a separate element based on the proximity of the text and points in the legend.

Figure 14.4: We can encode the structure of a data visualisation as position so that elements that are part of the same structure are close to each other.

One way in which a data visualisation can be effective is if it provides information about how to read the plot as well as information about the data itself.

Another way to say it is that each element of the data visualisation in Figure 14.3 belongs to one of two groups: the main plot or the legend. There is a qualitative value—plot or legend—associated with each element of the data visualisation. These qualitative values are encoded as the position of the elements within the data visualisation: the plot is on the left and the legend is on the right.

Although we are no longer talking about encoding data values as data symbols, we are still talking about encoding. We are just encoding and decoding a different sort of information—the organisation of a data visualisation.
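As a rough illustration of decoding structure from position, we can group elements whenever they lie close to one another. This is only a sketch of the idea, not a model of the visual system: the coordinates, the element names, and the distance threshold are all hypothetical values chosen so that the legend sits well apart from the main plot.

```python
import math


def group_by_proximity(elements, threshold):
    """Single-linkage grouping: elements within `threshold` share a group."""
    names = list(elements)
    group_of = {name: index for index, name in enumerate(names)}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(elements[a], elements[b]) <= threshold:
                # Merge the group containing b into the group containing a.
                old, new = group_of[b], group_of[a]
                for name in names:
                    if group_of[name] == old:
                        group_of[name] = new
    groups = {}
    for name, label in group_of.items():
        groups.setdefault(label, []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical (x, y) positions in arbitrary units.
elements = {
    "data point 1": (2.0, 3.0),
    "data point 2": (3.0, 4.0),
    "data point 3": (4.0, 3.5),
    "legend title": (9.0, 5.0),
    "South point": (9.0, 4.0),
    "South label": (9.6, 4.0),
    "North point": (9.0, 3.0),
    "North label": (9.6, 3.0),
}

for group in group_by_proximity(elements, threshold=1.5):
    print(group)
```

With these invented positions the elements fall into two groups, one for the main plot and one for the legend, which mirrors the qualitative value (plot or legend) described above.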

Figure 14.5 shows a variation of Figure 14.3, with exactly the same plot elements, but in a different arrangement. There is greater separation of the legend from the main plot. The legend has also been moved so that its top is aligned with the top of the main plot. Similarly, the axis titles have been moved so that they align with the top and right sides of the plot, and the y-axis label has been rotated to horizontal. More subtly, the right edge of the y-axis label is aligned with the right edge of the y-axis tick labels and the points in the legend are aligned with the left edge of the legend title.

Figure 14.5: A scatter plot of the number of times a team breaks through the opposition defence and the number of tries that a team scores (both are per-game averages) for teams at the 2023 Rugby World Cup.

All of these changes are aimed at making the data visualisation more orderly (Section 2.8).⁴ The legend is more obviously a separate element from the main plot and the axis titles more obviously connect to the main plot. A more orderly arrangement of the visual elements will provide fewer distractions and lead to a more effective data visualisation because the important differences in the data will be more readily perceived (Section 2.5).

The strong alignment and regular patterns within facetted plots like Figure 11.4 are another effective example of keeping everything else the same in order to champion the changes in the data. Conversely, another weakness of placing data symbols on maps, like in Figure 12.16 (b), is that the lack of regular positioning and alignment makes it harder to attend to the important differences between data symbols.

14.2 Unintentional encodings

One important idea in this chapter is that we can encode more than just data values or data summaries as visual representations. For example, the previous section showed that we can encode different groups of elements within a data visualisation using position—elements that are positioned close together and/or aligned with each other form visual groups.

Another way to say this is that we decode the differences in the positions of different elements to identify distinct groups of elements.

Yet another way to say this is that we decode visual differences regardless of whether they reflect differences in data values.

In other words, all visual differences in a data visualisation matter.

For example, consider the x-axis title in Figure 14.3. Although it has a similar horizontal position to the rest of the x-axis, it has a unique vertical position within the data visualisation. Similarly, the horizontal position of the y-axis title is unique, as are the horizontal and vertical positions of the legend. Because these positions are unique, we are tempted to decode those differences: we seek the information that is encoded in those differences. Put simply, the differences make the data visualisation more complex.

Figure 14.5 is a more orderly version of Figure 14.3 and, as a consequence, there are fewer visual differences, so there is less to decode. The data visualisation is less complex.

Figure 13.9 provides another simple example of unintentional encoding. In this bar plot there are text labels at different angles—the y-axis title is rotated 90 degrees and the x-axis tick labels are rotated 45 degrees, but these differences in angle do not reflect any differences in the data. Keeping all text in a data visualisation horizontal removes this source of distraction.

Figure 14.6 shows a more problematic example.⁵ This bar plot shows the number of ex-smokers in New Zealand over time, with each bar resembling a cigarette that is slowly burning down (the grey area represents ash at the end of the cigarette). The number of ex-smokers is encoded as the full height of each bar, so we can decode the number of ex-smokers. However, the grey area and the white area on each bar also change each year and those changes are not associated with any data value. In other words, there are differences that we decode as a change in something, but nothing is really changing. This is at best confusing and could even lead to misinterpretation.⁶

Figure 14.6: A bar plot of the number of ex-smokers in New Zealand over time.

A choropleth map like Figure 12.14 provides a more subtle example of an unintentional encoding. Each region in the choropleth map is different in multiple ways. Each region has its own colour, which reflects the data values that are explicitly encoded and allows a decoding of the crime rate for each region. Each region also has its own position and shape, which reflect the geography of the physical world and allow a decoding of the region itself. However, each region also has its own size, which again reflects the geography of the physical world, but is less useful for decoding the region itself and permits an unintentional decoding of larger data values from larger regions. That is a problem either way: it may confuse the decoding of crime rate from the colour of each region, or it may provide a distracting decoding of region size.

14.3 Consistent encodings

Figure 14.7 shows a variation on Figure 14.3 with a modification to the legend. In this plot, the hemisphere of a country is encoded using orange and green within the main plot, but then hemisphere is encoded using light blue and brown in the legend. This is clearly an ineffective data visualisation. It is possible to decode that there are two groups of data symbols in the plot, but it is impossible to decode which of those two groups is North and which is South.⁷ The point of Figure 14.7 is merely to highlight the critical importance of maintaining consistent encodings between different elements of a plot.⁸

Figure 14.7: A scatter plot of the number of times a team breaks through the opposition defence and the number of tries that a team scores (both are per-game averages) for teams at the 2023 Rugby World Cup.

Figure 14.8 demonstrates that we can harness consistent encodings to actually improve upon the original Figure 14.3. In Figure 14.8, the encoding of hemisphere as the colour of points is not only consistent between the main plot and the legend, but it has been extended to the colour of the legend text as well. The visual grouping of elements that relate to the same hemisphere is now stronger both within the legend and across the entire data visualisation (Section 2.7).

Figure 14.8: A scatter plot of the number of times a team breaks through the opposition defence and the number of tries that a team scores (both are per-game averages) for teams at the 2023 Rugby World Cup.
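One way to guarantee this kind of consistency is to define the colour encoding once and derive every element from it. The sketch below assumes nothing about the software used to draw Figure 14.8; the names `HEMISPHERE_COLOURS`, `point_style`, and `legend_entry` are hypothetical, and the styles are plain dictionaries rather than calls to a real plotting library.

```python
# A single mapping is the only source of hemisphere colours
# (orange for South and green for North, as in Figure 14.3).
HEMISPHERE_COLOURS = {"South": "orange", "North": "green"}


def point_style(hemisphere):
    """Style for a data symbol in the main plot."""
    return {"shape": "circle", "colour": HEMISPHERE_COLOURS[hemisphere]}


def legend_entry(hemisphere):
    """Legend key and label that reuse the main plot's encoding."""
    return {
        # The key is the same symbol, in the same colour, as the data points.
        "key": point_style(hemisphere),
        # The label text takes the same colour, as in Figure 14.8.
        "label": {"text": hemisphere, "colour": HEMISPHERE_COLOURS[hemisphere]},
    }


for hemisphere in HEMISPHERE_COLOURS:
    print(point_style(hemisphere), legend_entry(hemisphere))
```

Because the legend is computed from the same mapping as the points, a mismatch like the one in Figure 14.7 cannot arise by accident.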

Another consistency in Figure 14.8 (and the original Figure 14.3) is the use of the same data symbol in both the main plot and the legend. Circular data symbols are used in both cases. That consistency, like the use of consistent colours, might again appear obvious, but it is not always true. For example, the data symbols in Figure 14.9 are circles, but the legend is a rectangular colour gradient.

Figure 14.9 shows a correlation matrix of seven performance measures for teams in the 2023 Rugby World Cup (Table 9.1). Variable names are encoded as the horizontal and vertical positions of circles and the correlations between variables are encoded as the colour of the circles. We can see that many measures have quite a strong positive correlation (dark red circles), but one measure (the number of tackles per game) is negatively correlated with all other measures (though often only very weakly).

Figure 14.9: Correlations between pairs of measures of performance for teams in the 2023 Rugby World Cup.

Figure 14.10 shows a variation of Figure 14.9 with a legend that not only shares the same colour encoding, but also shares the same data symbol. It is much easier to compare the colour of one of the circular data symbols with the colours of the circles in the legend because they are all circles.

Figure 14.10: Correlations between pairs of measures of performance for teams in the 2023 Rugby World Cup.

Figures 6.15 and 6.16 are other examples of this consistent encoding between legends and the main plot. The legends in those cases are drawn so that they mirror the positions of the bars in the main plot as well as the colours of the bars in the main plot.

14.4 Inconsistent encodings

Figure 14.11 shows a bar plot of the impact of a proposed tax bill during the second Trump administration.⁹ Each bar represents the predicted change in annual income for a particular income bracket. Lower incomes are to the left and higher incomes are to the right. There are labels to describe each income bracket and there are labels that describe each change in income. Where the change in income is large, the label for change in income is written within the bar (at the top of the bar), but for smaller changes in income, the bar is too small to fit the label, so the label is placed just above the bar. For the two leftmost bars, the change in income is actually negative and the change in income label is placed below the bar. The labels that describe the income brackets are mostly below the bars, but for the two leftmost bars the labels for the income brackets are placed above the bars.

Figure 14.11: A bar plot of the distributional effects of Trump’s “Big, Beautiful Bill”.

In summary, there are two labels for each bar, one below the bar and one above (or at the top of) the bar. This consistent positioning of the labels encourages the viewer to decode two groups of labels: one group above the bars and one group below the bars. However, this will mislead us into comparing, for example, “-1K” (a change in income) for the leftmost bar with “4.3M” (an income bracket) for the rightmost bar. The correct comparison is between “-1K”, which is the change in income for the lowest income bracket, and “389.3K”, which is the change in income for the highest income bracket.¹⁰

Just as it makes no sense to change the encoding of data values within a data visualisation (Figure 14.7) or across multiple data visualisations within the same document (Figure 14.1), it makes no sense to change the encoding of structure within a data visualisation. In this case, all labels above the bars should belong to the same group of labels (changes in income) and all labels below the bars should belong to the same group of labels (income brackets).
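The rule in the previous paragraph can be sketched as a placement function that decides the side of each label from its kind, never from the sign or size of the bar. This is an illustrative sketch only, not how Figure 14.11 was drawn. The function `place_labels` and the bracket label "lowest bracket" are hypothetical; the labels “-1K”, “389.3K”, and “4.3M” are the ones quoted above, with changes expressed in thousands.

```python
def place_labels(bars):
    """Put every change label above its bar and every bracket label below."""
    placed = []
    for bar in bars:
        top = max(bar["change"], 0)      # upper edge of the bar
        bottom = min(bar["change"], 0)   # lower edge of the bar
        placed.append({
            "bracket": {"text": bar["bracket"], "side": "below", "anchor": bottom},
            "change": {"text": bar["change_label"], "side": "above", "anchor": top},
        })
    return placed


bars = [
    # "lowest bracket" is a placeholder; the text does not quote this label.
    {"bracket": "lowest bracket", "change": -1.0, "change_label": "-1K"},
    {"bracket": "4.3M", "change": 389.3, "change_label": "389.3K"},
]

for labels in place_labels(bars):
    print(labels)
```

For the negative bar, the change label sits above the zero line and the bracket label sits below the bottom of the bar, so each side of the bars holds only one kind of label.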

14.5 Encoding importance

Figure 14.12 shows a variation on Figure 14.8 with many elements of the plot drawn in light grey. The effect of this change is to create two visual groups, one lighter and less saturated than the other. This emphasises the main data symbols in the main plot and in the legend, and de-emphasises the axes and labels. Our attention is drawn to the darker and more saturated elements first (Section 2.6), with the other elements in the background, which we can focus on if we need the additional information.¹¹

Figure 14.12: A scatter plot of the number of times a team breaks through the opposition defence and the number of tries that a team scores (both are per-game averages) for teams at the 2023 Rugby World Cup.

By comparison, in Figure 14.8 there is a similar visual impact from all elements of the plot, so it is less clear where to look first. All of the elements of the data visualisation compete for our attention. Figure 14.13 emphasises this point by making all axis and legend text black and adding black grid lines. The data symbols are now lost amongst the non-data elements of the plot.¹²

Figure 14.13: A scatter plot of the number of times a team breaks through the opposition defence and the number of tries that a team scores (both are per-game averages) for teams at the 2023 Rugby World Cup.

The use of colour in Figure 14.12 is another example of encoding (qualitative) non-data values as visual features. In this case, we are encoding the importance of different visual elements as visual features (Figure 14.14). We decode the differences in colour to a more important group of elements and a less important group of elements.

Figure 14.14: We can encode the importance of elements within a data visualisation as visual features so that elements that are more important are more visible.
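The encoding of importance can be written down just like an encoding of data values: each element has a qualitative role, and the role determines its colour. The sketch below is one possible way to express this, under assumptions: the list of elements and their roles is our own reading of Figure 14.12, and the names `ELEMENT_ROLES` and `colour_for` are hypothetical.

```python
# Assumed roles for the elements of the plot: data symbols are important,
# axes and labels provide background context.
ELEMENT_ROLES = {
    "data points": "data",
    "legend keys": "data",
    "axis lines": "context",
    "tick labels": "context",
    "axis titles": "context",
}


def colour_for(element, data_colour):
    """Keep the data colour for data elements; draw context in light grey."""
    if ELEMENT_ROLES[element] == "data":
        return data_colour
    return "lightgrey"


for element in ELEMENT_ROLES:
    print(element, colour_for(element, data_colour="orange"))
```

The same approach extends to highlighting specific points, as in Figure 14.15, by adding a third role for the highlighted data symbols.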

Figure 14.15 shows a similar application of importance. In this case, we are highlighting two data points (New Zealand and South Africa). The distinct colour and larger size of the two points encodes the importance of those points and draws attention to those points, with everything else as background context.

Figure 14.15: A scatter plot of the number of times a team breaks through the opposition defence and the number of tries that a team scores (both are per-game averages) for teams at the 2023 Rugby World Cup.

14.6 Unimportant encodings

When we encode importance, we are relying on basic properties of the visual system to draw attention to specific elements of a plot (Section 2.6). As we saw in Section 2.10, those basic properties of the visual system can also work against us. Our attention will be drawn to large and bright elements of a data visualisation automatically.

For example, in Figure 14.16, our eye is drawn to the tiger’s eye, which distracts us from the more important heights of the bars. There are clear overlaps here with the ideas of clutter and chart junk (Section 12.6).

Figure 14.16: A bar plot of the estimated population of Bengal Tigers in Bhutan. The background image in this plot distracts attention from the most important data symbols.

14.7 Summary

Précis

Data visualisations can be very effective for communicating information.

However, a data visualisation that is effective for communicating one type of information may be ineffective for communicating another type of information.

The goal of this book is to explain why some data visualisations are more effective than others at communicating different types of information—how data visualisation works.

We will focus on how information can be encoded to create a visual representation. We will characterise a data visualisation in terms of the encodings that it uses to convert data values into data symbols.

The effectiveness of an encoding will depend on how well we can decode the information that we want from a visual representation. We will judge a data visualisation in terms of how well data values can be recovered from the data symbols.

There are features of the human visual system that mean that we can decode some information extremely rapidly and without effort:

A very large amount of basic information is gathered at once about simple visual features like positions, lengths, and colours.
Large, bright, colourful items automatically attract attention.
We automatically identify groups of items within an image based on similarity of basic visual features like position and colour, plus connecting lines and enclosing borders.

On the other hand, there are limitations of the visual system that suggest encodings that we should avoid:

Detailed information is only available at the centre of the visual field.
Visual memory is extremely limited.

These features suggest that encoding data values as basic visual features and generating simple, orderly data visualisations will lead to rapid and effortless decoding of information.

A simple encoding of data values to data symbols involves encoding each data value to a separate data symbol. This allows the viewer to decode and compare individual data values from the data symbols.

A simple encoding of data values to data symbols also involves encoding each data value as a basic visual feature of the data symbol, e.g., position, length, area, angle, colour, or pattern.

Position, length, area, and angle are appropriate for encoding quantitative data because we can decode numeric values from these visual features. We can decode position and length more accurately than area and angle.

Position, colour, and pattern are appropriate for encoding qualitative data because we can decode groups from these visual features. We can represent a large number of categories if we use position, but only a few categories if we use colours and patterns.

Encoding data values as the position of data symbols is very effective for decoding of both quantitative and qualitative information. However …

For quantitative values, what we can accurately decode are comparisons between quantitative values, not absolute quantitative values.
The decoding is most accurate for positions that share a common baseline.
Encoding identical data values as the positions of data symbols means that the data symbols overlap, which compromises our ability to decode data values from the data symbols.
We can encode one set of data values as horizontal positions and another set of data values as vertical positions because we can decode horizontal and vertical positions separately.
Decoding quantitative data values from the positions of data symbols is only accurate if the encoding is linear.

Encoding data values as the length of data symbols is very effective for decoding quantitative information. However …

Comparisons between lengths are more difficult if the lengths are far apart, especially if there are distractors in between.
Comparisons between lengths are more difficult if the lengths do not have a common baseline.
Comparisons between lengths are easier for shorter lengths.

Colour is really three visual features: hue, chroma, and luminance.

Hue is excellent for encoding nominal data values, though it has a limited capacity.

Chroma and luminance can be used to encode ordinal data values (as well as nominal data values), but they have even lower capacity.

When we encode data values as colours there are several caveats:

The decoding of data values from colours is affected by surrounding colours and the size of the data symbol.
Approximately 10% of viewers are unable to differentiate between red and green hues with similar chroma and luminance.

Selecting which colours should be used to encode data values is difficult to get right and a good solution often involves varying all of hue, chroma, and luminance at once.

Consequently, it is usually a good idea to make use of pre-existing colour palettes that have been carefully designed to avoid most problems.

The effectiveness of a data visualisation may depend on more than just the accuracy and capacity of visual features.

Some visual features have an implicit decoding—we can decode information from the visual feature without any explicit encoding of information—for example, we can implicitly decode a ratio of 2 from two lines where one is twice the length of another.

A congruent encoding is one where data values are explicitly encoded in a way that is consistent with an implicit decoding of the visual feature.

A data visualisation will be more effective if it is visually congruent, for example, data symbols are larger for larger data values or data symbols only change if the data values change.

A dissonant encoding is one where data values are explicitly encoded in a way that is inconsistent with an implicit decoding.

A data visualisation will be less effective if it is visually dissonant.

A data summary transforms raw data values to descriptive statistics such as measures of central tendency, measures of variability, or simple tables of counts.

Some data visualisations, like box plots and histograms, are effective because they encode data summaries to visual features, rather than encoding raw data values to visual features.

Encoding data summaries makes it easy to decode and compare data summaries.

It is also sometimes possible to perform visual summaries. In this case, we encode raw data values to visual features, but our visual system allows us to decode data summaries, for example, the average position of many individual data points.

A box plot that encodes data summaries to visual features is more effective for perceiving data summaries than a dot plot that relies on visual summaries. However, a dot plot is more effective for perceiving raw data values.

When encoding data summaries, care must be taken to use data summaries that appropriately summarise the data values.

Almost all data visualisations involve combinations of encodings. More than one set of data values is encoded as more than one visual feature of data symbols.

The encodings involved in a bar plot—quantitative data values encoded as lengths and qualitative values encoded as position—are effective because we are able to perceive some combinations of visual features, such as length, position, and colour independently, which means that we can effectively decode both position and length from a bar plot.

A scatter plot is effective for perceiving relationships between variables because the encodings of quantitative data values to both horizontal and vertical positions interact to produce position in space and our visual system is capable of producing useful visual summaries from position in space, such as correlation.

Independence between visual features is useful when we want to decode separate data values. Interaction between visual features is useful when we want to produce an emergent feature that allows us to decode data summaries.

Conversely, independence between visual features is of no help if what we want is to decode a data summary from an emergent feature. Furthermore, interaction between visual features is unhelpful, or even misleading, if we cannot decode any meaningful information from the emergent feature that results from the interaction.

Encoding multiple rows of data values to a single data symbol produces a visual shape, like a line on a line plot.

The main benefit of a data symbol that is a visual shape is that data summaries, such as modes, skewness, local maxima and minima, and trends over time, can be decoded from a visual shape. On the downside, decoding raw data values from a visual shape may be harder, compared to decoding raw data values from a simple data symbol like a bar.

One danger with visual shapes is that they depend on aspect ratio and scale. The same encodings can be made to decode to different data summaries, so we are responsible for selecting an appropriate aspect ratio and scale.

Another danger is that a visual shape may not necessarily convey a useful data summary. We need to only create visual shapes in a purposeful manner and avoid accidentally creating visual shapes that may confuse or mislead.

When we visualise multiple variables at once, it is not effective to encode each different variable to a different visual feature because that usually forces us to make use of a visual feature that is either inappropriate or inaccurate.

An alternative is to reuse the same visual feature for multiple different variables. In particular, we can reuse position for more than just two variables, for example, to create a scatter plot matrix or a facetted plot. This at least allows us to accurately decode individual variables.

Another alternative is to use 3D coordinates or non-cartesian coordinates, for example 3D plots or parallel coordinates plots. These generate visual shapes that allow us to decode multivariate data summaries, like multivariate clusters and multivariate correlations. However, in order to gain these multivariate data summaries, we typically have to sacrifice the ability to accurately decode individual data values.

No matter which approach we use, there is still a limit to how much information can effectively be displayed at once within a static data visualisation. This is one area where dynamic and interactive graphics can be useful because the viewer is able to rapidly switch between multiple views of the data.

Visual objects are relatively complex data symbols that have a learned decoding.

Encoding data values as visual objects is effective because the learned decoding provides a pre-existing decoding from the data symbol to data values. This can support an explicit encoding and can remove the need for axes and legends to explain the decoding.

However, visual objects may be distracting if they are not related to any encoding of data values and the increased complexity can make decoding more difficult.

It is also important not to create a conflict between an explicit encoding and a learned decoding. That will result in confusion or incorrect decoding of the data symbols.

Data values can be encoded as the visual features of text, such as position and colour, just like for other data symbols.

The pattern of text—the characters used in text—is particularly important because we have a learned decoding of text. We can read data values from text.

We can use text to encode any type of data, we can represent a very large number of different categories, and we can decode individual data values from text extremely accurately.

On the other hand, decoding a large number of data values from text is very slow and we cannot decode visual summaries from text.

The text elements within axes and legends are essential components of a data visualisation because they provide the only precise decoding of absolute data values. This is what allows the relative decoding of the main data symbols within a data visualisation.

Titles and annotations are very important components of a data visualisation because they are the only way to encode complex and higher-level information, such as metadata and the overall subject matter of a data visualisation.

Text is very valuable in a data visualisation for accurately encoding a small number of data values, or to express complex information. But text is not appropriate for representing a large number of raw data values.

Just as we can encode data values to visual features, in order to communicate information about the data, we can encode non-data information to visual features, in order to communicate important aspects of the structure of a data visualisation.

The structure or organisation of a data visualisation, or the importance of specific elements, can be effectively encoded as the position or colour of different elements of the data visualisation.

Communicating structure and importance helps the viewer to navigate within the data visualisation, which makes it easier to decode information from the data symbols within a data visualisation.

Cabouat, Anne-Flore, Lorenzo Ciccione, Samuel Huron, Tobias Isenberg, and Petra Isenberg. 2025. “Bridging Educational Theories of Cognitive Load to Visualization Design and Evaluation.” In 2025 IEEE VIS Workshop on Visualization Education, Literacy, and Activities (EduVIS), 33–64. Los Alamitos, CA, USA: IEEE Computer Society. https://doi.org/10.1109/EduVIS69391.2025.00009.

Tufte, Edward R. 1983. The Visual Display of Quantitative Information. Cheshire, Connecticut: Graphics Press.

Williams, Robin. 2014. The Non-Designer’s Design Book. 4th ed. Berkeley, CA: Peachpit Press.

¹ This data visualisation was based on figures from the Youth Justice Indicators Report from 2021.
² We might describe the topic of this chapter as the overall design of a data visualisation. Much of what we discuss will have a basis in the properties of the visual system, but much will also echo Robin Williams’ CRAP design principles (Williams 2014).
³ Cabouat et al. (2025) report that “if a learner found a visualization more readable, they felt it required less mental effort to parse relevant information from it for learning.”
⁴ Alignment is also one of the CRAP design principles (Williams 2014).
⁵ Figure 14.6 is based on a data visualisation in the New Zealand Listener January 11-17 2025.
⁶ There is a connection between unintentional encodings and chart junk (Section 12.6). Figure 14.6 also contains a little easter-egg bonus problem: the x-axis is uneven, with a seven-year jump followed by two five-year jumps.
⁷ This inconsistent encoding could also be described as a dissonant encoding (Section 7.5).
⁸ A consistent encoding aligns with the CRAP design principle of repetition (Williams 2014).
⁹ This data visualisation is based on a plot from the Popular Information web site by Judd Legum. The data are from the Penn Wharton Budget Model.
¹⁰ This is precisely the mistake that was made when this data visualisation was discussed on The Majority Report (~6:00).
¹¹ The idea of making a clear visual distinction between different elements of a data visualisation mirrors the CRAP design principle of contrast.
¹² This also relates to Tufte’s data-ink ratio (Tufte 1983).