Writing your manuscript in R Markdown

I really don’t like the Microsoft Office Suite. With the exception of PowerPoint (which I can almost always get to do what I want), the remainder of the suite, especially Word and Excel (NO, THAT COLUMN IS NOT A DATE, and even if it is, DO NOT REFORMAT IT), can be quite frustrating to work with.

For the last 6-7 years, I was beholden to doing most of my writing inside the Microsoft environment. While it got the job done, necessary tasks for scientific manuscripts (e.g., inserting citations and building bibliographies) were time-consuming and cumbersome. Once I started getting more savvy in R and started using Markdown to keep a daily log/journal of my programming, I pillaged the web for ways to write inside an R environment to help streamline my workflow, keeping all of my code, data, figures, and writing in the same space. So far, it has worked great for me, and I was able to write my PhD dissertation and two published manuscripts using RMarkdown. This post is a sort of distillation of what I have learned, and what my workflow currently looks like.

Keep in mind, it works for me, but it may not work for you. You’ll have to try it out and see what does!

Additionally, folks are free to use the RMarkdown manuscript template that I have created for their own use.

Why RMarkdown?

The more I needed to write, the more frustrated I got with Word. Pagination. Inserting figures. Dealing with formatting. When presented with a “page layout” view that I think most of us love, it’s easy to get lost in how your writing “looks” rather than what you are writing. This is particularly true for me as I tend to enjoy visual art/design quite a bit.

Writing in Markdown changed all of that. The ability to write in your text editor of choice strips away all of the nonsense of formatting and lets you deal with how your work looks on the backend. Editors such as Atom have “Zen” plugins that hide all the crap you don’t need to see (Git panes, project folders) and let you focus on what you need to get out of your head.

By far the greatest advantage to using a markdown-based writing workflow is dealing with references. Provided you have a well-curated citation manager (Zotero, Papers, Mendeley), providing in-text citations is incredibly easy, and compiling your formatted reference list involves writing literally 12 characters.

As for why to choose RMarkdown: R and RStudio have all of the tools and packages you need to make the setup and transition to writing in markdown easy.

The workflow

Every time I start a new project in R, I do so in a skeleton directory that contains standardized folders to keep my business organized. An ongoing project looks something like this:

my_r_project
  - project.Rproj
  - analysis_archive
  - code
    * data_cleanup.Rmd
    * data_analysis.Rmd
  - data
    - tabular_data
      * mydata.csv
    - spatial_data
  - data_archive
  - meeting_notes
  - model_output
  - model_plots
  - ms_body
    - ms_figs
    * manuscript_draft.Rmd
  - project_notes.Rmd
  - project_readme.Rmd

This directory structure lets me keep all things pertinent to a project in a single folder. Important to us right now is the ms_body folder. Here, I keep all writing files (e.g., manuscript_draft.Rmd) and the manuscript figures that get inserted into my Rmd documents.
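If you'd rather not build these folders by hand every time, a few lines of base R can do it for you. This is only a sketch: the function name create_project_skeleton is made up for illustration, and the folder names simply mirror the example layout above, so swap in whatever suits your own projects.

```r
# Hypothetical helper (not from any package): builds the example folder layout.
create_project_skeleton <- function(root) {
  # Folder names copied from the example project above; edit to taste.
  folders <- c(
    "analysis_archive",
    "code",
    file.path("data", "tabular_data"),
    file.path("data", "spatial_data"),
    "data_archive",
    "meeting_notes",
    "model_output",
    "model_plots",
    file.path("ms_body", "ms_figs")
  )
  paths <- file.path(root, folders)
  for (path in paths) {
    # recursive = TRUE also creates parent folders such as data/ and ms_body/
    dir.create(path, recursive = TRUE, showWarnings = FALSE)
  }
  # Return the created paths invisibly so the call can be piped or inspected.
  invisible(paths)
}
```

Calling create_project_skeleton("my_r_project") from your working directory should give you the empty folders; the .Rproj and .Rmd files still get created from RStudio as usual.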

Generally, the process of writing a manuscript is pretty straightforward. Once you have drafts completed, you are able to compile your RMarkdown document (called “knitting,” using the R package knitr) into a PDF, Word, or HTML file.

Formatting your RMarkdown manuscript

I won’t go into too much detail of how to format your writing using Markdown, but rest assured it’s extremely easy. Check out the following:

https://rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf

First, create your manuscript RMarkdown document in your R project by selecting “File -> New File -> RMarkdown.” Before you edit anything, save it immediately to your manuscripts folder, whatever that may be called (in my case, ms_body). You can enter a file title and your name as the author, and select PDF as the default output format. Don’t worry, all of this can be changed later.

When you create this, the first thing that pops up is an RMarkdown template file:

[Screenshot: the default RMarkdown template file]

This file shows you generally how you can structure an RMarkdown file. We can get rid of all of this except the top portion that is delimited by three dashes (---). This section is called the YAML header (YAML originally stood for “Yet Another Markup Language”; officially it is now “YAML Ain’t Markup Language”).

What’s in your YAML?

Generally, the YAML header contains a bunch of settings to help control how your finished product is rendered, where your bibliography file lives, what citation style you’re using, etc., etc. Below is the header from one of my current manuscripts.

---
title: |
  **Patterns of agricultural intensification drive bumble bee occurrence in the North American Midwest**  
author:
- Jeremy Hemberger
- Michael Crossley
- Claudio Gratton
date: "`r format(Sys.time(), '%B %d, %Y')`"
output:
  word_document: default
  pdf_document:
    fig_caption: yes
    keep_tex: yes
    latex_engine: pdflatex
  html_document:
    df_print: paged
fontfamily: mathpazo
fontsize: 11pt
geometry: margin = 1in
header-includes:
- \usepackage{setspace}\doublespacing
- \usepackage[left]{lineno}
- \linenumbers
- \usepackage{dcolumn}
- \usepackage{caption}
- \usepackage{float}
- \usepackage{afterpage}
- \usepackage{siunitx}
- \usepackage{amsmath}
keywords: Agricultural intensification, Bumble bees, bee decline, agroecosystems
bibliography: ./library.bib
csl: ./nature-conservation.csl
abstract: |
  Recently documented insect declines suggest that agricultural expansion and productivity-focused intensification are a putative driver for the observed declines. However, this hypothesis has yet to be formally tested due to a lack of historical data on agricultural land-use change and insect taxa. Using large, spatiotemporal sets of bumble bee and agricultural records, we show that agricultural intensification, specifically an increase in cropland extent and insecticides, as well as a decrease in crop richness and perennial habitat, are associated with a decrease in the occurrence of bumble bee species in the agriculturally intensive Midwest. Yet, we found positive associations between bumble bees and crop richness and pasture, even in areas where agriculture is spatially extensive. Our findings suggest that insect conservation and agricultural production may be compatible, with increases in on-farm and landscape-level crop diversity, pasture, and reduction in insecticides predicted to have positive effects on bumble bees.  
---

In this, I am telling R a bunch of information including:

  • Title: what the manuscript title is
  • Author: Who wrote the damn thing
  • Output: the format(s) the RMarkdown file gets rendered to. You can also include template Word files to tell R how you want your document rendered if you choose to export it into Word format.
  • fontfamily: what font I am using
  • fontsize: you guessed it, font size!
  • geometry: formatting of the page itself, in this case the margins
  • header-includes: these are a bunch of LaTeX packages that I tell R to use to help format various aspects of the paper (they only affect PDF output). For a basic RMarkdown manuscript, this can be ignored. I’m just really picky.
  • keywords: if your template uses them, keywords for the manuscript
  • bibliography: this one is important. It tells R where your bibliography file, called a BibTeX file, lives in your project. In this case, I am telling it to look in the current folder (the “./”) for a file called library.bib.
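  • link-citations: optional, and not in the header above. Setting it to yes makes each in-text citation a hyperlink to its entry in the reference list.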
  • csl: this is also important. It is the path to the file that specifies which citation style you wish to use. You can download these from: https://www.zotero.org/styles
  • abstract: here’s our first bit of writing! An option the YAML gives you is to include your abstract here. It isn’t necessary, and you could simply include it as part of your RMarkdown text, but here it is, block indented following a bar ("|"), which tells YAML to treat the indented lines as a single block of text.

To start off, I would suggest making your YAML simple and including just the title, authors, and output (a simple output: pdf_document will do).
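If you like scripting your setup, here is a rough sketch that writes a starter manuscript file with exactly that minimal header. The function write_starter_manuscript and its arguments are hypothetical names of my own choosing, and the sketch assumes your title contains no double quotes, since those would need escaping in YAML.

```r
# Hypothetical helper: writes a bare-bones manuscript file with a minimal YAML header.
# Assumes `title` contains no double quotes (they would need escaping in YAML).
write_starter_manuscript <- function(path, title, authors) {
  lines <- c(
    "---",
    paste0("title: \"", title, "\""),
    "author:",
    paste0("- ", authors),      # one "- Name" line per author
    "output: pdf_document",     # the simplest PDF output setting
    "---",
    "",
    "# Introduction",
    ""
  )
  writeLines(lines, path)
  invisible(path)
}
```

For example, write_starter_manuscript("ms_body/manuscript_draft.Rmd", "Working title", c("First Author", "Second Author")) would drop a file you can open and start writing in. Everything else in the header can be added later as you need it.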

Writing

After you get your YAML constructed, it’s time to get writing! There are many ways to format different manuscript sections, but I stick to using headings, which in Markdown look like this:

# Introduction
## Subsection level 1
### Subsection level 2
#### Subsection level 3

Renders to…

Introduction

Subsection level 1

Subsection level 2

Subsection level 3

etc.

Citations

This is typically where Microsoft Word underperforms the most when writing. Fortunately, it is super easy to add citations when writing in Markdown using BibTeX.

Most citation managers organize a BibTeX file for your library and store it along with your PDFs. This file contains all of the necessary information for building a citation in a standardized fashion. Each item in a BibTeX library contains a citation key, essentially a short code that you can insert into your writing to tell R where to include a particular citation, formatted to your choosing, when you knit your document. This key is what you’ll need to find. It’s usually automatically generated and stored within the paper metadata in your reference manager.

For example, my short codes contain the author’s last name and date of publication (e.g., Hemberger.2020). When writing, you insert a citation by prefixing its key with an @ symbol and wrapping it in square brackets for end-of-sentence citations (e.g., [@Hemberger.2020]). You can also use a key in context by simply using the @ symbol without brackets (e.g., “@Hemberger.2020 found that…”).
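Mistyped keys are the most common way this goes wrong for me, so it can help to check them before knitting. Below is a rough base R sketch that compares the keys cited in your text against the keys defined in your .bib file. The function find_missing_keys is a hypothetical helper, and its pattern matching is deliberately simple: it assumes keys use only letters, digits, and the characters . _ : -, and it will also pick up any other @ in your text (email addresses, for instance).

```r
# Hypothetical helper: returns citation keys used in the text but absent from the .bib file.
# Assumes keys contain only letters, digits, and . _ : - (a simplification).
find_missing_keys <- function(rmd_lines, bib_lines) {
  # Keys cited in the text: "@" followed by the allowed key characters.
  hits <- unlist(regmatches(
    rmd_lines,
    gregexpr("@[A-Za-z0-9_][A-Za-z0-9_.:-]*", rmd_lines)
  ))
  # Drop the "@" and any trailing punctuation picked up at the end of a sentence.
  cited <- unique(sub("[.:-]+$", "", sub("^@", "", hits)))

  # Keys defined in the .bib file: the text between "{" and "," on "@type{key," lines.
  entry_pattern <- "^[[:space:]]*@[[:alnum:]]+[[:space:]]*\\{[[:space:]]*([^,]+),.*$"
  entry_lines <- grep(entry_pattern, bib_lines, value = TRUE)
  defined <- trimws(sub(entry_pattern, "\\1", entry_lines))

  setdiff(cited, defined)
}
```

In practice you would feed it readLines("ms_body/manuscript_draft.Rmd") and readLines("ms_body/library.bib") (paths from my example project) and fix whatever keys it returns.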

To include all of your citations as a formatted reference section, you simply need to write # References as the last heading of your paper. R (with Pandoc doing the work on the backend) will automatically collect your references and place them at the end of the document, in the format of your choosing (the .csl file you specified in your YAML header).

The best part about this is that when you inevitably get rejected from your first journal of choice, reformatting your references is as simple as downloading the appropriate .csl file and changing the citation style specification in your YAML.

Inserting images

Inserting figures is also super easy. To do so, use the following markdown code:

![Image legend](path/to/image.png)

I typically write my figure captions using standard markdown formatting below the image. Additionally, I usually wrap my images with \newpage and \clearpage, which is some LaTeX that creates page breaks around my images/captions so they each live on their own page. I also use a width parameter to control the rough size of the image on the page. Usually, I keep it at 100% of the width so the figure is as big as possible. Altogether, a figure and caption in my RMarkdown document will look like this:

\newpage
![](../ms_figs/fig_s2.png){ width=100% }
**Figure S2:** Interaction plots for other study species. Each line represents the expected trend (with 95% confidence interval) of probability of occurrence over time in a county given: a value of proportion cropland (panel columns: mean ± 1 standard deviation) and number of crops (line color and type: mean ± 1 standard deviation) in that county.
\clearpage

Knitting your document

Once you are ready to compile your markdown code into your output of choice, you need simply click the “knit” button in the code pane and wait for the magic to happen. Any errors will be visible in the R Markdown tab in your console pane (including missing references, LaTeX errors, etc.).
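If you prefer the console to the button, the same thing can be done with rmarkdown::render(). Here is a small sketch wrapped in a hypothetical function, knit_manuscript. It assumes the rmarkdown package is installed (plus a LaTeX distribution for PDF output), and the default path is just the one from my example project.

```r
# Hypothetical wrapper around rmarkdown::render(); nothing is knit until you call it.
# Assumes the rmarkdown package is installed, plus a LaTeX distribution for PDF output.
knit_manuscript <- function(input = "ms_body/manuscript_draft.Rmd",
                            output_format = "pdf_document") {
  # output_format can name a single format, or "all" to render every
  # format listed under `output:` in the YAML header.
  rmarkdown::render(input = input, output_format = output_format)
}
```

Running knit_manuscript() should produce the PDF next to the .Rmd file, while knit_manuscript(output_format = "word_document") gives you a Word file to send to collaborators who live in Word.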
