Chris Grieser

Researcher in sociology & software developer

Comprehensive Academic Workflow from Reading to Writing in Markdown

Last update: 20. December 2021

Introduction

This is an overview of a comprehensive academic workflow, from getting annotations into Obsidian to writing a publication with Pandoc citations, also in Obsidian. There is already a detailed description of how to achieve this with Zotero and MDnotes.

The workflow I describe here deals with much the same problems, but takes a different direction. The goal of this post is, however, not to provide a step-by-step guide for implementing a workflow with certain apps or plugins. Rather, this post discusses a general workflow for academic work that can be implemented with a variety of tools. One goal is to explain which tasks you should be using software tools for, and how to structure those tasks so that everything works with minimal friction. The other goal is to provide an overview of tools that can be used to accomplish those tasks. The overall intention is to give readers the autonomy to choose their own set of tools and customize their individual workflow.

First, I will outline the academic workflow from reading to writing in general and how it is implemented by the average user and by the advanced Zotero user. Then, I will briefly outline my implementation and discuss why I do not use Zotero for it. (Disclaimer: I haven’t used Zotero 6.) I will end with an extensive overview of several tools (apps, plugins) that together provide comparable features.

The Academic Workflow from Reading to Writing

The generalized version

This basic academic workflow looks something like this: You find an article on a website, and save it in your reference library (i.e., reference manager). The library entries should be linked to the PDF file of the paper, from which we want to extract notes to our knowledge base (i.e., Obsidian). Ideally, we want to keep the link between library entry, knowledge base, and PDF.

Later on, you use your knowledge base to write an article of your own. When you cite a paper you have read, you basically create a link between the draft and the entry in your reference library. During the final compilation of the finished draft, the citations are used to add the bibliography to the manuscript. Ideally, you also use a template to save the time of creating a new layout every time you write a paper.

abstract academic workflow

The simple Word-Zotero version

However, the outlined workflow is not only very generalized, but also quite idealized. The “average” researcher, mostly far from being a power user, will often have a much simpler version. Let’s imagine such a less tech-savvy academic, who still uses Microsoft Word. In addition, that person uses a reference manager (let’s say Zotero), together with its built-in plugin for Word citations and the Zotero browser extension, because those two tools are quite easy to use and are “officially” provided/advertised by Zotero. In their case, the workflow from reading to writing should look something like the graph to the left.

Apart from the lack of a dedicated note-taking system, quotes and information from PDFs are mostly copy-pasted, and the unorganized notes (e.g. a long list of bullet points in a separate document 🥶) will contain author and year of the reference. While this is an “implicit” link to the entry in the reference library, this is far from perfect (e.g., ambiguity when there are two papers by the same author in the same year).

Another thing to note is that the missing link between the unorganized scrawl and the PDF makes going back to the context of a quote rather tedious: our “average” researcher will open up Zotero, search for the respective entry, and then open the PDF. Also, working on multiple devices gets quite complicated, since Zotero and the PDFs may each sync between devices on their own, but keeping the link between them is definitely a struggle without further plugins.

Nevertheless, this workflow does accomplish some things well: using the first-party Zotero tools, articles are easily saved and citations are easily inserted. Compiling the final document is particularly easy, since the Zotero-Word workflow lets you export the draft as a PDF with just a few clicks.

unorganized academic workflow

The Zotero-ZotFile-BetterBibTeX-MDNotes-Obsidian-Pandoc version

Now many people here, and especially power users, dislike Word and would prefer to write in Markdown. As (on average) technologically more adept users, they want to automate tedious tasks like copy-pasting quotes. In addition, they use a note-taking system like Obsidian. Following the much-mentioned Zotero-ZotFile-MDNotes workflow and adding Pandoc, their system is more sophisticated and should look like this.

Compared to the simpler Word-Zotero-Workflow, this has many advantages:

However, there are also some disadvantages:

The Zotero-MDNotes-Obsidian Workflow. (The graph is unfortunately mirrored, since mermaid.js weighs stuff differently here.)

Decomposing the workflow into subtasks

As I stated before, my solution to several problems is/was to leave Zotero. Leaving Zotero can be explained by the fact that for a lot of functionality, you do not actually use Zotero itself, but rather the layer on top of it created by its plugins.

So, by working directly with the .bib file, you would practically “cut out the middleman” (Zotero) in this scenario. But as the aforementioned tools form an integrated ecosystem around Zotero, replacing the weaker parts (BetterBibTeX and Zotero) means that you must also look for replacements for the better parts (MDNotes, Zotero Browser Extension).

To approach this challenge, it is necessary to decompose the generalized workflow into subtasks, for which you can find individual tools. The goal is to find a set of tools that works with little friction, but also does not lock you into one ecosystem, as any lock-in would leave you with sub-optimal solutions to certain subtasks.

The BibTeX Library is the central point of interaction – not Zotero.

The way I see it, the generalized workflow from reading to writing includes the following subtasks:

  1. Organizing references – usually solved by a reference management app
  2. Saving references – a.k.a. “get the paper from the browser page into the reference manager without entering everything manually”.
  3. Linking PDFs, reference entries, and citations – ideally in the most future-proof way
  4. Extracting Annotations/Notes from PDFs – the effort of copy-pasting only facilitates unorganized scrawls instead of organized notes.
  5. Citation Picker for automatic citations – we really do not want to type out everything or switch between reference manager and writing app.
  6. Bibliography Creation – we want to automatically generate the bibliography found at the end of a paper.
  7. Compiling Markdown with Citations – basically Pandoc with --citeproc, but we would very much like a simpler solution than a CLI.

Overview: tools for the individual sub-tasks

Instead of simply presenting my set of plugins/tools, I will rather lay out an overview of tools for every subtask. The main reasoning behind this is that I use the launcher app Alfred for the most part, and Alfred is unfortunately a Mac-only app. In addition, I think everyone would get far better results by customizing the tool set to their own individual needs anyway. For those interested, I will conclude this post with a brief outline of my own workflow.

1. Organizing references

Reference managers are the most straightforward part, since basically all researchers should already be familiar with them. For this reason, I will not repeat the overviews of reference management apps already done elsewhere, but focus on reference managers that work with the .bib files needed by Pandoc. The reason is that Pandoc is pretty much the only compiling tool we have when we want to convert Markdown with citations to Word or PDF.

I will also forgo the solutions that only export to the .bib format. Without live-syncing of the library with the BibTeX file (as BetterBibTeX does), the .bib does not reflect the current state of your library, which makes it impossible to use a citation picker with it (point #5). And if you use a citation picker that does not work on .bib files (e.g. Zotero’s Word plugin), you won’t be able to use Markdown/Pandoc (point #7). So a workflow with automatic citations and bibliography in Markdown more or less requires a reference manager working on BibTeX (or at least one live-syncing to a .bib, although this comes with pitfalls like BetterBibTeX’s syncing issues).

Regarding reference managers working on BibTeX files, the field is actually rather small. To my knowledge there are:

Note that both BibDesk and JabRef can automatically rename and move PDFs. This means either of them is basically equivalent to Zotero + BetterBibTeX + ZotFile.

2. Saving References

There are also tons of other solutions, but again, I will limit the list to the ones that can be used with the BibTeX format.

curl -sLH "Accept: application/x-bibtex" 
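
A minimal sketch of the same idea in Python, assuming the request goes to the doi.org resolver, which supports content negotiation for BibTeX. The function names are placeholders of mine and not part of any tool mentioned here.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def bibtex_request(doi: str) -> Request:
    """Build a request asking the DOI resolver for BibTeX instead of HTML."""
    # Assumption: the DOI is resolved via https://doi.org/.
    url = "https://doi.org/" + quote(doi, safe="/")
    return Request(url, headers={"Accept": "application/x-bibtex"})


def fetch_bibtex(doi: str, timeout: float = 10.0) -> str:
    """Return the BibTeX entry for a DOI (needs network access)."""
    with urlopen(bibtex_request(doi), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_bibtex` with the DOI of a paper should give you an entry that can be appended to the .bib file, provided the registration agency behind that DOI serves BibTeX.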

3. Linking PDFs, Reference Entries, and Citations

The simple solution for this is to use author & year manually. The proper solution, which avoids complications (e.g. multiple publications by the same author in the same year), is to either use some sort of hyperlink or a unique identifier like the citekey.
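
To illustrate why the citekey works as a link: because it is unique within the library, the citekeys found in a note or draft identify the referenced entries unambiguously. Here is a rough sketch, assuming Pandoc-style citations (`[@citekey]`) and a simplified pattern that does not cover Pandoc’s full citation grammar. The citekeys are made up.

```python
import re

# Simplified pattern: "@" not preceded by a word character (to skip e-mail
# addresses), then a key of word characters with optional internal punctuation.
CITEKEY = re.compile(
    r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]+(?:[:.#$%&+?<>~/-][A-Za-z0-9_]+)*)"
)


def cited_keys(markdown: str) -> list[str]:
    """Return the citekeys used in a Markdown text, in order, without duplicates."""
    seen = {}
    for key in CITEKEY.findall(markdown):
        seen.setdefault(key, None)
    return list(seen)


note = "As argued in [@grieser2021, p. 4; @doe2020a], but see @smith1999. Mail me@example.org."
print(cited_keys(note))  # prints ['grieser2021', 'doe2020a', 'smith1999']
```

Each key found this way can be looked up in the .bib file or used as the file name of the corresponding literature note, so the link survives renaming or moving the PDF.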

4. Extracting Annotations/Notes From PDFs

As this step does not require the BibTeX-format, there are far more options here. The only requirement for the apps and plugins listed below was that they can export the annotations as Markdown.

5. Citation Picker for Automatic Citations

Citation pickers should be compared based on two criteria: (1) from which library format (Zotero, BibTeX/.bib, etc.) you can call the picker, and (2) into which app you can insert the citations.

http://127.0.0.1:23119/better-bibtex/cayw
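
For context, this is the local endpoint of BetterBibTeX’s “cite as you write” picker, which only responds while Zotero with BetterBibTeX is running. A small sketch of how an editor script could call it, assuming the `format=pandoc` query parameter is used to get Pandoc-style citations back. The function names are my own.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

CAYW_ENDPOINT = "http://127.0.0.1:23119/better-bibtex/cayw"


def cayw_url(fmt: str = "pandoc") -> str:
    """Build the picker URL for the requested citation format."""
    return f"{CAYW_ENDPOINT}?{urlencode({'format': fmt})}"


def pick_citation(fmt: str = "pandoc", timeout: float = 300.0) -> str:
    """Open the Zotero picker and return the chosen citation as text.

    Blocks until an entry is picked; fails if Zotero is not running.
    """
    with urlopen(cayw_url(fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")
```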

6. Bibliography Creation

The options here are fairly limited, since the only method of getting citations from a Markdown document into a compiled Word or PDF file is Pandoc. The alternative would be the word processor plugins from reference managers like Zotero, which require writing in Word, OpenOffice, or Google Docs (which we didn’t want to do in the first place).

However, there are two very noteworthy tools that approach bibliography creation from a totally different angle, namely the two Pandoc filters url2cite and Manubot-Pandoc-Filter (part of the Manubot Project). Instead of saving an article in the reference library and then using a citation picker to cite it in the writing app, you use an identifier like the DOI or URL with a pseudo-Pandoc-Markdown syntax to insert the citation directly into the draft. The promo screenshot from Manubot to the right probably explains the approach better than words can.

Showcase manubot

manubot/url2cite as a shortcut to get citations into your manuscript

Regarding the generalized academic workflow, this direct-identifier approach basically works by skipping reference managers entirely, and with them subtask #1 (organizing references), #2 (saving references), #3 (linking references), and #5 (citation picking).

However, there is also the drawback that you use reference managers for a reason: organizing references for later use. When you want to use a reference again two papers later, a reference manager is still the way to go. Furthermore, identifiers in themselves aren’t very useful for disciplines where author identities matter (mostly social sciences & humanities). Nevertheless, the extreme ease of use of this approach makes it too tempting not to use it at least for one-off citations that you are certain you will only need for one paper.

7. Compiling Markdown with Citations (Pandoc)

As mentioned before, there is no way around Pandoc for compiling Markdown documents with citations. There are, however, multiple methods of actually using Pandoc, many of which are more user-friendly than using the Terminal.

pandoc "path/to/input.md" -o "path/to/input.docx" --citeproc --bibliography "path/to/library.bib" --csl "path/to/citation_style.csl"
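
If you do stay on the command line, it helps to wrap the command once so you never have to type it out again. Here is a minimal sketch in Python, assuming pandoc (version 2.11 or later, which introduced `--citeproc`) is on your PATH. The function names and file names are placeholders.

```python
import subprocess
from pathlib import Path


def pandoc_command(source, bibliography, csl, target_ext=".docx"):
    """Build the pandoc call for a Markdown draft with citations."""
    source = Path(source)
    # The output file gets the same name as the draft, with a new extension.
    output = source.with_suffix(target_ext)
    return [
        "pandoc", str(source),
        "-o", str(output),
        "--citeproc",
        "--bibliography", str(bibliography),
        "--csl", str(csl),
    ]


def compile_markdown(source, bibliography, csl, target_ext=".docx"):
    """Run pandoc and raise an error if the compilation fails."""
    subprocess.run(pandoc_command(source, bibliography, csl, target_ext), check=True)
```

With this, `compile_markdown("draft.md", "library.bib", "apa.csl")` would write `draft.docx` next to the draft. Passing the arguments as a list avoids quoting problems with spaces in file paths.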

8. Bonus: organizing longform writing

Not strictly part of the workflow discussed here, but a closely related issue is how to organize the longform writing process itself. The basic approach (“dump everything into one document”) is of course unsatisfactory. However, there are several very different solutions to this:

My own Implementation of the Workflow

This leaves me to conclude with one last mermaid diagram, depicting my very own workflow. As all the tools used have been mentioned in the sections before, little additional explanation should be necessary.

my own workflow