2 R Markdown components

Generally, R Markdown files consist of two parts. The first is the YAML front matter that contains the documents meta information and rendering options. It is located at the top of the R Markdown document. The second part is the main body that contains R code chunks and prose in Markdown format.

2.1 YAML front matter

YAML is a human-readable and easy to write language to define data structures.

You don’t need to familiarize yourself intimately with YAML to use R Markdown or papaja and I suggest you try to just use it. The following is the default YAML front matter; it starts and ends with ---:

---
title             : "The title"
shorttitle        : "Title"

author: 
  - name          : "First Author"
    affiliation   : "1"
    corresponding : yes    # Define only one corresponding author
    address       : "Postal address"
    email         : "my@email.com"
    role:         # Contributorship roles (e.g., CRediT, https://casrai.org/credit/)
      - Conceptualization
      - Writing - Original Draft Preparation
      - Writing - Review & Editing
  - name          : "Ernst-August Doelle"
    affiliation   : "1,2"
    role:
      - Writing - Review & Editing

affiliation:
  - id            : "1"
    institution   : "Wilhelm-Wundt-University"
  - id            : "2"
    institution   : "Konstanz Business School"

authornote: |
  Add complete departmental affiliations for each author here. Each new line herein must be indented, like this line.

  Enter author note here.

abstract: |
  One or two sentences providing a **basic introduction** to the field,  comprehensible to a scientist in any discipline.
  
  Two to three sentences of **more detailed background**, comprehensible  to scientists in related disciplines.
  
  One sentence clearly stating the **general problem** being addressed by  this particular study.
  
  One sentence summarizing the main result (with the words "**here we show**" or their equivalent).
  
  Two or three sentences explaining what the **main result** reveals in direct comparison to what was thought to be the case previously, or how the  main result adds to previous knowledge.
  
  One or two sentences to put the results into a more **general context**.
  
  Two or three sentences to provide a **broader perspective**, readily comprehensible to a scientist in any discipline.
  
  <!-- https://tinyurl.com/ybremelq -->
  
keywords          : "keywords"
wordcount         : "X"

bibliography      : ["r-references.bib"]

floatsintext      : no
figurelist        : no
tablelist         : no
footnotelist      : no
linenumbers       : yes
mask              : no
draft             : no

documentclass     : "apa6"
classoption       : "man"
output            : papaja::apa6_pdf
---

2.1.1 Manuscript metadata

Most of the information contained in the YAML front matter is information about the document.

YAML field Metadata
author list of author information (e.g., name and affiliation;
start each new author with -)
affiliation list of institutional information (id and institution)
authornote aka. acknowledgements; automatically contains corresponding author line
keywords article keywords
wordcount article word count
note text to add above author note on the title page
(e.g. “Preprint submitted for publication”)

title is the full name of the manuscript that is printed on the title page and at the beginning of the introduction. shorttitle is a short variant of the title and is printed in the running head on every page.

author is a structured list. Each author’s information is given in a separate element that is indented and starts with -. A full name and an affiliation symbol need to be provided for every author. The symbol is printed as superscript following the author’s name on the manuscripts title page and can be a combination of multiple symbols if the author has multiple affiliations (e.g., 1,2 yields Ernst-August Doelle1,2). Exactly one author has to be declared as corresponding author by adding corresponding: yes and the fields address and email to the author’s information. The latter two fields are used to automatically generate the last sentence of the author note that—as required by the APA guidelines—provides the corresponding author’s contact information. role is a list of contributorship roles that can be used to provide details on each authors contributions beyond the mere rank order of authors on the paper. It is recommended that you specify roles from one of the established contributorship role taxonomies (e.g., CRediT or CRO). Each author’s list of roles is used to automatically generate a paragraph in the author note that lists each authors contributions. Authors who contributed equally to the manuscript and share first authorship can be declared as such by setting equal_contrib: yes. When equal contributions are declared a superscript is added to the author names and a corresponding explanation is added to the author note.

For convenience, you can generate the YAML-code for the author-metadata with most of the above-mentioned information using the web application tenzing (Holcombe, Kovacs, Aust, & Aczel, 2020). Simply collect the metadata in an Excel or GoogleSheet, upload it to the web application and copy the generated YAML into the front-matter of your document.

affiliation is another structured list that contains the institutions’ information; again each new element is indented and starts with -. id is printed as superscript in front of the institution name below the author names on the title page (e.g., 2 yields 2 Konstanz Business School).

authornote and abstract contain the text for the corresponding sections of the manuscript. The | indicate that all following indented lines, including line breaks and new paragraphs, belong to this field. Thus, abstract and authornote can be structured into multiple paragraphs as shown above. keywords and wordcount1 are self-explanatory and the corresponding information is printed below the abstract on the second page of the manuscript.

bibliography contains a list of one or more Bib(La)TeX files that contain the references cited in the manuscript. Additional files can be added by appending them to the list separated by commata, e.g. bibliography: ["my_bib1.bib", "my_bib2.bib"]. By default, papaja assumes that you will cite R and all R packages used for the analysis and that the corresponding bibliographic information is contained in r-references.bib. See Citations for instructions on how to automate the citation of R packages.

note is an add additional metadata field that are not part of the default YAML front matter. It’s contents is printed on title page in between author information and the author note; it can be used to add remarks, such as “Preprint submitted to peer-review on June, 29th, 2016.”

2.1.2 Rendering options

The first set of rendering options breaks with the guiding principal that papaja is convertible because they have no effect on Microsoft Word documents. We decided to implement these options nonetheless because they are required by some journals and can be used for submissions where (initially) PDF/LaTeX documents are accepted.

YAML field Effect
mask Omit identifying information from the title page
appendix List of R Markdown files to include as appendices
numbersections Number sections headings
disambiguate_authors Disambiguate author names in citations by adding given names
annotate_references Add annotation prefix to bibliography entries, e.g. * for meta-analytic references
linenumbers* Add line numbers in the margins
draft* Add “DRAFT” watermark to every page
floatsintext* Place figures and tables in the text rather than at the end
figurelist*
tablelist*
footnotelist*
Create lists of figure captions, table captions, or footnotes
classoptions* control the style of the document
(e.g., man or doc, see apa6 \(\LaTeX\) class options)

* Only available for PDF documents

floatsintext indicates whether figures and tables are to be positioned at the end of the document—as required by APA guidelines—or included in the body of the document. If figurelist, tablelist, or footnotelist are set to yes lists of figure captions, table captions, or footnotes are created and placed after the reference section. linenumbers indicates whether lines should be continuously numbered throughout the manuscript. draft indicates whether a “DRAFT” watermark should be added to every page. classoptions are pased to the underlying apa6 \(\LaTeX\) document class and define the layout and style of the manuscript. For example, setting classoptions to man produces the default APA manuscript style, doc produces a more typical document (single-spaced, single-column).

The remaining rendering options affect both PDF and Microsoft Word documents. mask can be set to yes to prepare the manuscript for double-blind peer review by removing author names, affiliations, and author note from the title page. There are several additional options that are no not listed in the vanilla document template. To add one or more appendices to the manuscript, a list of R Markdown files can be passed to appendix. For details see Appendices. numbersections controls whether section headers are numbered or not. The remaining to options control formatting of citations and references. As per APA style, citations in papaja documents will be disambiguated by adding given names if the bibliography file contains multiple authors with the same family but different given names. This author name disambiguation can deactivated by setting disambiguate_authors: no, for details see Author name disambiguation. When reporting meta-analyses, APA guidelines require that studies included in the meta analysis are included in the reference section and preceeded by an asterisk (p. 138, American Psychological Association, 2010). To do this, set annotate_references: yes and add * (or other indicators) to the annote field of the corresponding bibliography entries, for details see Meta-analysis references.

The output field determines the type of document that is created and can be either papaja::apa6_pdf or papaja::apa6_word. Additional arguments can be passed to these functions, e.g. to delete the LaTeX source file after the PDF has been rendered (see ?apa6_pdf for details):

output: 
  papaja::apa6_pdf:
    keep_tex: FALSE
  • Extend bookdown::pdf_document2() and bookdown::word_document2()
    • Options of these functions should also work for papaja documents

2.1.2.1 Additional rendering options

Any additional rendering options supported by rmarkdown::pdf_document() or rmarkdown::word_document() can be added to the YAML front matter. For example, a different citation style can be used by setting csl to the location of a CSL style definition file. For an overview refer see the list of default variables, language variables, and LaTeX variables supported by pandoc and the filter modes supported by pandoc-citeproc.

papaja provides one additional rendering option that is not included in the default YAML header. replace_ampersands indicates whether to replace & by and in all in-text citations as required by the APA citation style. The option defaults to yes if no custom CSL file is specified but to no otherwise. To replace ampersands in in-text citations with a custom CSL file add replace_ampersands: yes to the YAML front matter.

2.2 Body

The body of R Markdown documents consists of prose written in Markdown interspersed with R code.

2.2.1 Markdown

A general principle in typesetting is to separate content and style. Separation is commonly achieved through the use of a markup language, which is a system of document annotations. These annotations declare portions of text as, for example, title, section headings, or list items but, crucially, they are agnostic to what this means visually (e.g., **text** instead of text). Separation of content and style is useful because it enables swift changes to an entire documents’ style or structure and facilitates applying a common style across multiple documents. Microsoft Word implements this strategy in their so-called styles, which are collections of formatting instructions that can be assigned to portions of the text.

papaja uses the Markdown syntax to separate content and style. Markdown was desigend to be an easy-to-read and -write formatting syntax. The key design goal was to create a syntax that is readable as-is. The following is an excerpt from the APA example manuscript written in Markdown.

# Methods
## Participants
Younger adults (14 women, 10 men, $M_{age} = 19.5$ years, age
range: 18–22 years) were recruited with flyers posted on the
Boston College campus. <!–– TODO: Add flyer to appendix! ––>

Older adults (15 women, nine men, $M_{age} = 76.1$ years, age
range: 68–84 years) were recruited through the Harvard
Cooperative on Aging (see Table 1, for demographics and test
scores).[^p]

[^p]: Analyses of covariance were conducted with these
covariates,with no resulting influences of these variables on
the pattern or magnitude of the results.

Without any prior knowledge of Markdown it should be easy to guess what most of this syntax does: # and ## denote first and second order section headings (and so on), <!-- and --> surround comments that will not be displayed in the rendered document, and [^p] adds a reference to a footnote that is written out following [^p]:. The equations surrounded by $ may be somewhat less intuitive. In R Markdown, equations are written in LaTeX’s powerful mathematics syntax (also see Equations). For an overview of the Markdown syntax see Markdown text formatting.

2.2.2 R code

There are two ways to include R code in R Markdown documents. The primary method is to use so-called code chunks.

```{r, code-chunk, echo = FALSE}
age_mean <- mean(demographics$age)
```

Code chunks are surrounded by ``` and should be preceeded and followed by an empty line. The behavior of code chunks can be controlled by passing comma-separated options surrounded by braces to the chunk. The first option is compulsory and specifies what langauge the code chunk contains and usually be r, but other languages, such as python, stan, or SQL are supported. RStudio’s knitr language engines documentation provides an overview of all supported languages.

The second option of a code chunk is its label or name, here code-chunk. It is good practice to provide a meaningful name for each chunk because chunk names

  1. briefly describe the following codes purpose and, hence, are a means of structuring and documenting the code
  2. serve as meaningful names for graphic files that the chunks produce
  3. are required to control when the code of a cached chunk has to be rerun because a preceeding chunk has changed (see Caching expensive computations).

By default, papaja sets the chunk options echo = FALSE and message = FALSE for all chunks. echo = FALSE means that the code chunk is run but hides the code and message = FALSE hides any messages generated by the code so that they are not included in the manuscript. Both settings can be overwritten if desired. See the knitr chunk options and package options for a overview of available settings.

R code can also be inserted into in a line of text, for example `r mean(demographics$age)`. Writing R code inline is a convenient way of reporting results of computations in the running text. For example, the mean and standard deviation of participants’ age are stored in variables named age_mean and age_sd. These numbers can be reported by printing the variables in inline R code.

Participants mean age was `r age_mean` years ($SD = `r age_sd`$).

As in any other R script, R code is executed from top to bottom. Thus, any variables that are called in code chunks or inline R code must be defined above.

Reporting numerical values frequently requires some formatting, such as rounding or filling with trailing zeros (e.g., 24.10). In the section Numerical values, we provide an overview of papaja functions that can be used to easily format simple numbers or results from statistical analyses according to APA guidelines.


  1. We hope to automate the word count in the future. For now, the current word count is printed to the R Markdown pane in RStudio or the console. See Counting words for details.↩︎

Comments and Questions


Icons by Icons8