Topic 3 Data visualisation

Looking at the data is a key step in data exploration, in the analysis itself and in the communication of results. R has a range of powerful tools to create graphs quickly, and then to develop them into a publication-ready format where helpful.

For an introduction to the Grammar of Graphics and ggplot2, the package that we will be using for visualisation, watch this video:


3.1 ggplot2

We use the ggplot2-package because it offers a consistent way to create anything from simple exploratory plots to complex data visualisation. Each graph command needs certain parts:

  • a call to the ggplot-function and the data as the first argument: ggplot(gapminder,
  • a mapping of the aesthetics, i.e. of variables to visual elements. This uses the aes-function that is given to ggplot as the second argument: aes(x=infant_mortality, y=fertility, col=continent)) (Note that two closing brackets are needed as this also completes the ggplot function call)
  • a geometry, i.e. a type of chart, that is added with a plus-symbol and the function call, e.g., + geom_point()
  • optional elements such as labels that are included again with plus-symbols and function calls, e.g., + labs(title = "Association of infant mortality and fertility", subtitle="2010 data from gapminder.org")
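
Put together, the four parts listed above form one complete command. The following is a minimal sketch that assumes the gapminder data comes from the dslabs package and plots all years at once, which is why the subtitle is shortened; the object name p is only used for illustration.

```r
library(dslabs)   # assumed source of the gapminder data
library(ggplot2)

p <- ggplot(gapminder,                               # 1. ggplot() call with the data
            aes(x = infant_mortality, y = fertility,
                col = continent)) +                  # 2. mapping of variables to aesthetics
  geom_point() +                                     # 3. geometry
  labs(title = "Association of infant mortality and fertility",
       subtitle = "Data from gapminder.org")         # 4. optional labels

p  # printing the object draws the chart
```

Storing the plot in an object is optional; running the ggplot() command on its own draws the chart straight away.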

Multiple geometries can be layered on top of each other, for example to add trend lines to scatterplots. In that case, aes() functions can be included in the geom_x()-functions to make some of the mappings specific to certain geometries. This is done in the example below to colour the points by continent without applying that aesthetic to the line - if it were included in the main aes()-function, the plot would contain a separate coloured line for each continent.

Note: Line breaks are completely up to you and can be used to make the code readable, as long as the command does not appear complete too early. To keep it simple, within a command that creates a ggplot-chart there should always be an open bracket, a comma or a + before a line break.
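
As a small illustration of this line-break rule, the sketch below uses R's built-in mtcars data, chosen here only to keep the example self-contained. The object names are hypothetical.

```r
library(ggplot2)

# Works: each line ends with a +, an open bracket or a comma,
# so R knows that the command continues on the next line
p_ok <- ggplot(mtcars,
               aes(x = wt, y = mpg)) +
  geom_point()

# Does not work: the first line is already a complete command, so R
# runs it on its own and then fails on the line that starts with +
# p_broken <- ggplot(mtcars, aes(x = wt, y = mpg))
#   + geom_point()
```

The second version is commented out so that the block runs without an error; removing the comment markers shows the problem.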

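# Load the packages (pacman installs any that are missing)
pacman::p_load(dslabs, dplyr, ggplot2)

# Keep only the observations for 2010
gapminder2010 <- gapminder %>% filter(year == 2010)

ggplot(gapminder2010, aes(x = infant_mortality, y = fertility)) +
  geom_point(aes(col = continent)) +  # colour is mapped for the points only
  geom_smooth() +                     # so there is one trend line for all continents
  ggtitle("Association of infant mortality and fertility",
          subtitle = "2010 data from gapminder.org")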
## Warning: Removed 7 rows containing non-finite values (stat_smooth).
## Warning: Removed 7 rows containing missing values (geom_point).
Figure 3.1: A simple ggplot example
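
For comparison with Figure 3.1, here is a sketch of the alternative mentioned earlier, with the colour mapping moved into the main aes() call. It assumes the same dslabs data and uses base R subsetting instead of dplyr only to keep the block short; p_by_continent is an illustrative name.

```r
library(dslabs)
library(ggplot2)

gapminder2010 <- gapminder[gapminder$year == 2010, ]

# col = continent now sits in the main aes(), so every layer inherits it:
# the points are coloured AND geom_smooth() fits one line per continent
p_by_continent <- ggplot(gapminder2010,
                         aes(x = infant_mortality, y = fertility,
                             col = continent)) +
  geom_point() +
  geom_smooth()
```

Printing p_by_continent draws the chart. Which version is preferable depends on whether the overall trend or the differences between continents should be emphasised.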

3.2 esquisse: using ggplot2 with your mouse

The esquisse package provides an RStudio add-in that lets you create ggplot2-charts interactively, without having to know the code in advance. You can check out the Getting Started guide to see what that would look like. To try it out, run install.packages("esquisse"), load the data you want to use, and type esquisse::esquisser() into your Console.

Try not to rely on esquisse instead of ggplot2 - the aim in this module is to learn how to write code that is reproducible, not how to click boxes. However, esquisse shows you the code it generates (check Export & Code at the bottom right), so it can be very helpful when you are getting started, or when you are not quite sure how to do something.

3.3 Opinionated visualisation

It is easy to find examples of misleading charts that should never have been published. However, even when all the elements of a chart are legitimately presented, the same data can still be used to suggest radically different conclusions. See the following example and follow the source link to read more about “opinionated” data visualisations. Apart from being critical when looking at charts, the take-away message is to be aware of the power of small design choices - a power you should wield deliberately when creating your own charts.

Figure 3.2: Same data, two messages (Source: infoworld.com, https://www.infoworld.com/article/3088166/why-how-to-lie-with-statistics-did-us-a-disservice.html)
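
To get a feel for how much a single design choice matters, the sketch below draws the same made-up numbers twice and changes only the range of the y-axis. The data frame sales and its values are invented for this illustration and have nothing to do with the chart in Figure 3.2.

```r
library(ggplot2)

# Hypothetical data, made up purely for this illustration
sales <- data.frame(year = 2015:2020,
                    revenue = c(100, 101, 103, 104, 106, 108))

base <- ggplot(sales, aes(x = year, y = revenue)) +
  geom_line()

# Message 1: "revenue is flat" - the y-axis starts at zero
p_flat <- base + coord_cartesian(ylim = c(0, 120))

# Message 2: "revenue is soaring" - the y-axis is zoomed in on the data
p_steep <- base + coord_cartesian(ylim = c(99, 109))
```

Printing p_flat and p_steep one after the other shows the two versions. Neither chart misrepresents the numbers, yet they invite different readings, so the choice of axis range should be made on purpose.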

3.4 Further resources

  • The R Graphics Cookbook by Winston Chang is available online with 150 “recipes” that cover everything from basic exploratory charts to colour-coded maps.
  • The BBC graphics team has published their own R Cookbook with many tips for making charts that convey a clear message, as well as some custom functions for making clean publication-ready charts.
  • Irizarry’s Introduction to Data Science has a good chapter on data visualisation principles.
  • I am a big fan of the gapminder bubble chart. In the video (from about 5:00), I show how to create a static version, but R can also create the dynamic version that shows global development over time. For that, you can check out this code.
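
As a rough starting point for a static bubble chart, the sketch below plots one year of the dslabs gapminder data. It is not necessarily the code from the video; the choice of year (2010), the derived column gdp_per_capita and the styling are assumptions made for this illustration.

```r
library(dslabs)
library(ggplot2)

# One year of data; GDP per capita is derived from the gdp and population columns
bubbles <- gapminder[gapminder$year == 2010, ]
bubbles$gdp_per_capita <- bubbles$gdp / bubbles$population

p_bubble <- ggplot(bubbles,
                   aes(x = gdp_per_capita, y = life_expectancy,
                       size = population, col = continent)) +
  geom_point(alpha = 0.6) +   # semi-transparent so overlapping bubbles stay visible
  scale_x_log10() +           # incomes span several orders of magnitude
  labs(x = "GDP per capita (log scale)", y = "Life expectancy (years)",
       title = "Life expectancy and income, 2010")
```

Printing p_bubble draws the chart; countries with missing GDP or population values are dropped with a warning.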