7 Writing Reports with R Markdown

R Markdown is a tool for creating documents that combine text, R code, and the results of that R code. It simplifies the process of incorporating graphs and other data outputs into a document, removing the need for separate R and word processing operations. It allows for the automation of data retrieval and updating, making it useful for maintaining up-to-date financial reports, among other applications. With R Markdown, you can produce documents in various formats, including HTML, PDF, and Word, directly from your R code. Markdown facilitates the formatting of text in a plain text syntax, while embedded R code chunks ensure the reproducibility of analysis and reports.

7.1 Creating an R Markdown Document

Here is a step-by-step guide to creating a new R Markdown document in RStudio:

  1. Click the plus-sign icon in the top-left corner, then select R Markdown...
  2. In the dialog box that appears, select Document and choose PDF, then click OK.

Figure 7.1: New R Markdown

  3. You should now see a file populated with text and code. Save this file by clicking File -> Save As... and select an appropriate folder.
  4. To generate a document from your R Markdown file, click the Knit button (or use the shortcut Ctrl+Shift+K on Windows and Linux, or Cmd+Shift+K on macOS).
  5. Lastly, the drop-down menu next to the Knit button lets you export your file in different formats, such as HTML or Word, in addition to PDF.
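Knitting can also be triggered from the R console rather than the Knit button. The following is a minimal sketch, assuming the rmarkdown package and Pandoc (bundled with RStudio) are installed; the file contents and the name rmd_file are illustrative only. It writes a tiny R Markdown file to a temporary location and renders it to HTML with rmarkdown::render():

```r
library(rmarkdown)

fence <- strrep("`", 3)  # chunk delimiter, built here to avoid nested fences

# Write a minimal R Markdown file to a temporary location
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Demo Report\"",
  "output: html_document",
  "---",
  "",
  "Summary of the built-in cars data:",
  "",
  paste0(fence, "{r cars}"),
  "summary(cars)",
  fence
), rmd_file)

# Render to HTML; render() returns the path of the output file
if (pandoc_available()) {
  out_file <- render(rmd_file, output_format = "html_document", quiet = TRUE)
  print(file.exists(out_file))
}
```

Changing output_format to "word_document" or "pdf_document" should produce the other formats offered in the Knit menu; PDF output additionally requires a LaTeX installation.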

The R Markdown template includes:

  • A YAML header, enclosed by ---, which holds the document’s metadata, such as the title, author, date, and output format.
  • Examples of Markdown syntax, demonstrating how to use it.
  • Examples of R code chunks, showing how to write and utilize them in your document.

The R code chunks are enclosed by ```{r} at the beginning and ``` at the end, such as:

```{r cars}
summary(cars)
```

Anything written within these markers is evaluated as R code. On the other hand, anything outside these markers is considered text, formatted using Markdown syntax, and, for mathematical expressions, LaTeX syntax.

7.2 YAML Header

The YAML header at the top of the R Markdown document, enclosed in ---, specifies high-level metadata and options that influence the whole document. It might look like this:

---
title: "My Document"
author: "Your Name"
date: "1/1/2023"
output: html_document
---

In this YAML header, the title, author, and date fields define the title, author, and date of the document. The output field specifies the output format of the document (which can be html_document, pdf_document, or word_document, among others).
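To see how these fields are interpreted, the header can be read back into R. This is a small sketch, assuming the rmarkdown package is installed; it writes the example header to a temporary file (rmd_file is an illustrative name) and parses it with rmarkdown::yaml_front_matter(), which returns the metadata as a named list:

```r
library(rmarkdown)

# Write the example header (plus one line of body text) to a temporary file
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"My Document\"",
  "author: \"Your Name\"",
  "date: \"1/1/2023\"",
  "output: html_document",
  "---",
  "",
  "Body text goes here."
), rmd_file)

# Parse the YAML header into a named list
meta <- yaml_front_matter(rmd_file)
meta$title
meta$output
names(meta)
```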

7.3 Markdown Syntax

Markdown is a user-friendly markup language that enables the addition of formatting elements to plain text documents. The following are some fundamental syntax elements:

  • Headers: Use # for headers. For instance, # Header 1 creates a primary header, ## Header 2 a secondary header, ### Header 3 a tertiary header, and so forth.
  • Bold: To make text bold, encapsulate it with **text** or __text__.
  • Italic: To italicize text, use *text* or _text_.
  • Lists: For ordered lists, use 1., and for unordered lists, use - or *.
  • Links: Links can be inserted using [Link text](url).
  • Images: To add images, use ![alt text](url) for online images or ![alt text](path) for local images, where path is the folder path to an image saved on your computer.

7.4 R Chunks

In R Markdown, you can embed chunks of R code. These chunks begin with ```{r} and end with ```. The code contained in these chunks is executed when the document is rendered, and the output (e.g., plots, tables) is inserted into the final document.

Following the r in the chunk declaration, you can include a variety of options in a comma-separated list to control chunk behavior. For instance, ```{r, echo = FALSE} runs the code in the chunk and includes its output in the document, but the code itself is not printed in the rendered document.

Here are some of the most commonly used chunk options:

  • echo: If set to FALSE, the code chunk will not be shown in the final output. The default is TRUE.
  • eval: If set to FALSE, the code chunk will not be executed. The default is TRUE.
  • include: If set to FALSE, neither the code nor its results are included in the final document. The default is TRUE.
  • message: If set to FALSE, suppresses all messages in the output. The default is TRUE.
  • warning: If set to FALSE, suppresses all warnings in the output. The default is TRUE.
  • fig.cap: Adds a caption to graphical results. For instance, fig.cap="My Plot Caption".
  • fig.align: Aligns the plot in the document. For example, fig.align='center' aligns the plot to the center.
  • out.width: Controls the width of the plot output. For example, out.width="50%" will make the plot take up 50% of the text width.
  • collapse: If TRUE, all the code and results in the chunk are rendered as a single block. If FALSE, each line of code and its results are rendered separately. The default is FALSE.
  • results: Controls how the text output of a chunk is displayed in the final document. results='markup' (the default) marks the output up as a verbatim block of plain text, which is useful when the text includes special characters. results='asis' writes the output into the document unmodified, so that it is processed like the surrounding Markdown; this is particularly useful when the R output is itself written in Markdown syntax, such as a table. results='hold' collects all text output and displays it after the chunk's complete code, and results='hide' conceals the text output. results='verbatim' likewise displays the output as plain text.
  • fig.path: Specifies the path prefix (typically a directory) under which the figures produced by the chunk are saved.
  • fig.width and fig.height: Specifies the width and height of the plot, in inches. For example, fig.width=6, fig.height=4 will make the plot 6x4 inches.
  • dpi: Specifies the resolution of the plot in dots per inch. For example, dpi = 300 will generate a high-resolution image.
  • error: If TRUE, errors are displayed in the output and do not stop the knitting process. If FALSE, any error that occurs in the chunk stops the knitting process. The default in R Markdown is FALSE.

Here’s an example:

```{r, echo=FALSE, fig.cap="Title", out.width = "50%", fig.align='center', dpi = 300}
plot(cars)
```

This chunk will create a plot, add a caption to it, set the width of the plot to 50% of the text width, align the plot to the center of the document, and output the plot with a resolution of 300 DPI. The actual R code will not be displayed in the final document.
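The effect of a chunk option can be checked without building a full report. The following is a minimal sketch, assuming the knitr package is installed; it knits a two-chunk document held in a character vector (doc is an illustrative name) and prints the resulting Markdown, so that the difference between the default and echo = FALSE is visible:

```r
library(knitr)

fence <- strrep("`", 3)  # chunk delimiter, built here to avoid nested fences

doc <- c(
  "Default options (code and result are shown):",
  "",
  paste0(fence, "{r}"),
  "1 + 1",
  fence,
  "",
  "With echo = FALSE (only the result is shown):",
  "",
  paste0(fence, "{r, echo = FALSE}"),
  "2 + 3",
  fence
)

# knit(text = ...) returns the knitted Markdown as a character string
md <- knit(text = doc, quiet = TRUE)
cat(md)
```

In the printed Markdown, the code 1 + 1 appears together with its result, whereas for the second chunk only the result appears.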

Instead of specifying options for each code chunk, you can modify the default settings for all code chunks in your document using the knitr::opts_chunk$set() function. For instance, I often include the following code at the start of an R Markdown document, right after the YAML header:

```{r}
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE, 
                      fig.align = "center", out.width = "60%")
```

The aforementioned code modifies the default settings for all chunks in the document, as described below:

  • echo = FALSE: Each chunk’s code will be omitted from the final document, a sensible practice for official documents, as recipients don’t require visibility of code used for graph creation.
  • message = FALSE: All messages generated by code chunks will be muted.
  • warning = FALSE: Warnings produced by code chunks will be silenced.
  • fig.align = "center": All generated figures will be centrally aligned.
  • out.width = "60%": The width of any generated figures will be set to 60% of the text width.
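The defaults can also be inspected and restored, which helps when experimenting in the console. This is a short sketch, assuming the knitr package is installed; the variable names (old_opts, echo_before, and so on) are illustrative:

```r
library(knitr)

# Remember the current defaults so they can be put back afterwards
old_opts <- opts_chunk$get()
echo_before <- opts_chunk$get("echo")

# Change the defaults, as in the setup chunk above
opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE,
               fig.align = "center", out.width = "60%")
echo_after <- opts_chunk$get("echo")
width_after <- opts_chunk$get("out.width")
print(c(echo_before = echo_before, echo_after = echo_after))
print(width_after)

# Put the previous defaults back
opts_chunk$restore(old_opts)
```

Options given in an individual chunk header still take precedence over these defaults, so a single chunk can opt back in with echo = TRUE.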

7.5 Embedding R Variables into Text

A key strength of R Markdown is the ability to incorporate R variables directly within the Markdown text. This enables a dynamic text where the values are updated as the variables change. You can accomplish this by using the `r variable` syntax. Furthermore, you can format these numbers for enhanced readability.

To insert the value of an R variable into your text, you encase the variable name in backticks and prepend it with r. Here’s an illustration:

# R variable defined inside R chunk
my_var <- 123234.53983

To refer to this variable in your Markdown text, you can write the following text (outside of an R chunk):

The total amount is `r my_var` USD.

The output will be: “The total amount is \(1.2323454 \times 10^{5}\) USD.”

That’s because when the R Markdown document is knitted, `r my_var` will be replaced by the current value of my_var in your R environment, dynamically embedding the value of my_var into your text. By default, large numbers such as this one are printed in scientific notation.

Additionally, you can format numbers for better readability by avoiding scientific notation, rounding, and adding a comma as a thousands separator. To do this, you can use the formatC() function in R as follows:

# R variable with formatting, defined inside R chunk
my_var_formatted <- formatC(my_var, format = "f", digits = 2, big.mark = ",")

Then, in your text:

The total amount is `r my_var_formatted` USD.

The output will be: “The total amount is 123,234.54 USD.”

In this case, format = "f" ensures fixed decimal notation, digits = 2 makes sure there are always two decimal places, and big.mark = "," adds a comma as the thousands separator.

By properly formatting your numbers in your R Markdown documents, you enhance their clarity and make your work more professional and easier to read.
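When many numbers need the same treatment, the formatC() call can be wrapped in a small helper and used directly inside inline code. The following is a sketch, assuming the knitr package is installed; fmt_amount() is a hypothetical helper defined here for illustration, not a function from any package:

```r
library(knitr)

# Hypothetical helper (not part of any package): fixed notation,
# two decimals by default, comma as thousands separator
fmt_amount <- function(x, digits = 2) {
  formatC(x, format = "f", digits = digits, big.mark = ",")
}

my_var <- 123234.53983
fmt_amount(my_var)                # "123,234.54"
fmt_amount(c(0.5, 1234567.891))   # "0.50" "1,234,567.89"

# Inline code can call the helper directly; knit() substitutes the result
sentence <- knit(text = "The total amount is `r fmt_amount(my_var)` USD.",
                 quiet = TRUE)
sentence
```

Calling the helper inline avoids creating a separate formatted copy of every variable.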

7.6 LaTeX Syntax for Math

LaTeX is a high-quality typesetting system that is widely used for scientific and academic papers, particularly in mathematics and engineering. LaTeX provides a robust way to typeset mathematical symbols and equations. Thankfully, R Markdown supports LaTeX notation for mathematical formulas, which is rendered as typeset math in the output document.

In R Markdown, you can include mathematical notation within the text by wrapping it with dollar signs ($). For example, $a^2 + b^2 = c^2$ will be rendered as \(a^2 + b^2 = c^2\).

Here are some basic LaTeX commands for mathematical symbols:

  • Subscripts: To create a subscript, use the underscore (_). For example, $a_i$ is rendered as \(a_i\).
  • Superscripts: To create a superscript (useful for exponents), use the caret (^). For example, $e^x$ is rendered as \(e^x\).
  • Greek letters: Use a backslash (\) followed by the name of the letter. For example, $\alpha$ is rendered as \(\alpha\), $\beta$ as \(\beta\), and so on.
  • Sums and integrals: Use \sum for summation and \int for integration. For example, $\sum_{i=1}^n i^2$ is rendered as \(\sum_{i=1}^n i^2\) and $\int_a^b f(x) dx$ is rendered as \(\int_a^b f(x) dx\).
  • Fractions: Use \frac{numerator}{denominator} to create a fraction. For example, $\frac{a}{b}$ is rendered as \(\frac{a}{b}\).
  • Square roots: Use \sqrt for square roots. For example, $\sqrt{a}$ is rendered as \(\sqrt{a}\).

If you want to display an equation on its own line, you can use double dollar signs ($$). For example:

$$
\% \Delta Y_t 
\equiv 100 \left( \frac{Y_t - Y_{t-1}}{Y_{t-1}}\right)   \%
\approx 100 \left( \ln Y_t - \ln Y_{t-1} \right) \%
$$

This will be rendered as: \[ \% \Delta Y_t \equiv 100 \left(\frac{Y_t - Y_{t-1}}{Y_{t-1}}\right) \% \approx 100 \left( \ln Y_t - \ln Y_{t-1} \right) \% \tag{7.1} \]
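Multi-step derivations can be aligned across lines with the aligned environment inside the double dollar signs. As an illustrative sketch, the following source spells out why the log difference in the equation above approximates the growth rate, using the standard approximation \(\ln(1 + x) \approx x\) for small \(x\):

```latex
$$
\begin{aligned}
\ln Y_t - \ln Y_{t-1}
  &= \ln \left( \frac{Y_t}{Y_{t-1}} \right) \\
  &= \ln \left( 1 + \frac{Y_t - Y_{t-1}}{Y_{t-1}} \right) \\
  &\approx \frac{Y_t - Y_{t-1}}{Y_{t-1}}
\end{aligned}
$$
```

Each & marks the alignment point and each \\ ends a line, so the equality and approximation signs line up vertically in the rendered output.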

LaTeX and R Markdown together make it easy to include mathematical notation in your reports. With practice, you can write complex mathematical expressions and equations using LaTeX in your R Markdown documents.

7.7 Printing Tables

The kable() function from the knitr package and the kableExtra package are great tools for creating professionally formatted tables in your R Markdown documents. Printing data directly without any formatting is usually not advisable: it looks unprofessional and can be hard to read and interpret. By contrast, these tools allow you to control the appearance of your tables, leading to better readability and aesthetics.

You’ll first need to install and load the necessary packages. You can do so by executing install.packages(c("knitr", "kableExtra")) in your console and then loading the two packages at the beginning of your code:

library("knitr")
library("kableExtra")

Let’s assume we have a simple dataframe df that we want to print:

df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(24, 30, 18),
  Gender = c("Female", "Male", "Male")
)
df
##      Name Age Gender
## 1   Alice  24 Female
## 2     Bob  30   Male
## 3 Charlie  18   Male

You can create a basic table using the kable function from the knitr package:

kable(df)
| Name    | Age | Gender |
|:--------|----:|:-------|
| Alice   |  24 | Female |
| Bob     |  30 | Male   |
| Charlie |  18 | Male   |

This will generate a simple, well-formatted table. However, you can further customize the table’s appearance using functions from the kableExtra package:

df %>%
  kable() %>%
  kable_styling("striped", full_width = FALSE)
| Name    | Age | Gender |
|:--------|----:|:-------|
| Alice   |  24 | Female |
| Bob     |  30 | Male   |
| Charlie |  18 | Male   |

This code generates a striped table, which alternates row colors for easier reading. The full_width = FALSE argument ensures the table only takes up as much width as necessary.

Adding a caption to your table is straightforward. Simply provide the caption argument to the kable function:

df %>%
  kable(caption = "A Table of Sample Data") %>%
  kable_styling("striped", full_width = FALSE)
Table 7.1: A Table of Sample Data

| Name    | Age | Gender |
|:--------|----:|:-------|
| Alice   |  24 | Female |
| Bob     |  30 | Male   |
| Charlie |  18 | Male   |

This code generates the same striped table, but now with a caption: “A Table of Sample Data.”

These are just the basics. Both kable() and kableExtra provide numerous options for customizing your tables. I encourage you to explore their documentation and experiment with different settings.
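As one example of such options, kable() accepts col.names and align arguments. The sketch below assumes a reasonably recent knitr version (for format = "pipe"); the new column labels are made up for illustration. The plain-text pipe format is forced here only so that the result can be inspected in the console; inside an R Markdown document the format argument is normally omitted.

```r
library(knitr)

df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(24, 30, 18),
  Gender = c("Female", "Male", "Male")
)

# Rename the columns and set the alignment: left, right, left
tbl <- kable(
  df,
  format = "pipe",
  col.names = c("First name", "Age (years)", "Gender"),
  align = c("l", "r", "l")
)
print(tbl)
```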

7.8 Summary and Resources

R Markdown provides a powerful framework for dynamically generating reports in R. The “dynamic” part of “dynamically generating reports” means that the document updates automatically when your data changes. By understanding and effectively using Markdown syntax, R code chunks, chunk options, and YAML headers, you can create sophisticated, reproducible documents with ease, like the document you are currently reading.

For an in-depth understanding of R Markdown, you may want to delve into R Markdown: The Definitive Guide, an extensive resource on the topic. Additionally, DataCamp’s course Reporting with R Markdown provides practical lessons on how to create compelling reports using this tool.