Pandoc is a document converter that handles many types of files.
Conversion example
Format conversion is a chore that often takes a lot of time. I recently wanted to convert a long MSWord document into a version of Markdown that I could use to update some software documentation.
Pandoc can convert many document types into many others. Here we’ll see how to convert to Markdown using a Docker version of the software, so that we don’t even need to install Pandoc. For more on Docker see Docker – Beginner for Biologists. I used the Docker image called pandoc/latex.
First, download the docker image: docker pull pandoc/latex
Then we can create an alias as suggested by the Docker page:
alias pandock=\
'docker run --rm -v "$(pwd):/data" -u $(id -u):$(id -g) pandoc/latex'
so now we can simply use the alias name.
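Aliases are normally not expanded inside non-interactive shell scripts, so a function form of the same command can be handy there. The following is only a minimal sketch: it reuses the name pandock and the exact docker invocation from the alias, and passes any arguments through to Pandoc inside the container.

```shell
# Function form of the pandock alias (a sketch, not the form suggested by the Docker page).
# It mounts the current directory at /data and runs as the current user,
# exactly as the alias does, then forwards all arguments to pandoc.
pandock() {
  docker run --rm -v "$(pwd):/data" -u "$(id -u):$(id -g)" pandoc/latex "$@"
}
```

Running `pandock --version` afterwards is a quick way to check that the image was pulled and that the wrapper works.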
To convert a “generic” MSWord file into a Markdown file it would be as simple as:
pandock -s example.docx -t markdown -o
However, my documents had a few images, and I wanted to convert to a Markdown version that is “GitHub friendly” (called gfm). In addition, I wanted to make sure that no lines were hard-wrapped, so I added the option --wrap=none. The final command looked like this:
pandock -s my.docx --wrap=none -t gfm --extract-media=images -o
In this process the MSWord file my.docx gets converted to a Markdown file without wrapping lines at the default width of 72 characters, while the images are extracted into a directory called images.
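If several Word files need the same treatment, the command can be wrapped in a small loop. This is only a sketch under stated assumptions: the function name docx_to_gfm and the output names (`<name>.md` and `<name>_images`, derived from each input file name) are hypothetical choices for illustration, and the files are assumed to sit in the current directory, since that is what gets mounted into the container.

```shell
# Sketch: convert each .docx given as an argument to GitHub-flavored Markdown.
# Output names are assumptions made for this example, derived from the input name.
docx_to_gfm() {
  for docx in "$@"; do
    base="${docx%.docx}"   # strip the .docx extension
    docker run --rm -v "$(pwd):/data" -u "$(id -u):$(id -g)" pandoc/latex \
      -s "$docx" --wrap=none -t gfm \
      --extract-media="${base}_images" -o "${base}.md"
  done
}
```

For example, `docx_to_gfm my.docx other.docx` would run the conversion once per file, keeping the extracted images of each document in its own directory.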
While it was not perfect, this was very useful for creating the final documentation, which is now here: htcondor_biochem_v1.5.5
Other examples
These online examples were useful in crafting the final command:
– Convert Docx To Markdown With Pandoc – [Archived]
– Convert Word documents to Markdown, HTML or any other format – [Archived]
The latter document shows the broad scope of Pandoc, with examples converting from:
– Markdown to HTML
– HTML to Markdown
– Word to Markdown
– Word to HTML
– Markdown to PDF
– Markdown to plain text
There are 39 more examples on the Pandoc demos page.
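A few of the conversions listed above can be sketched with the same container. The helper names and the derived output file names below are hypothetical, chosen only for illustration. The PDF case relies on the LaTeX engine bundled in the pandoc/latex image, which may not cover every document.

```shell
# Function form of the pandock alias so the helpers below also work in scripts.
pandock() {
  docker run --rm -v "$(pwd):/data" -u "$(id -u):$(id -g)" pandoc/latex "$@"
}

# Hypothetical helpers: each takes one input file in the current directory
# and writes the output next to it, swapping the extension.
md_to_html()  { pandock -s "$1" -t html -o "${1%.md}.html"; }
html_to_md()  { pandock -s "$1" -t markdown -o "${1%.html}.md"; }
md_to_pdf()   { pandock "$1" -o "${1%.md}.pdf"; }            # PDF is produced via LaTeX
md_to_plain() { pandock "$1" -t plain -o "${1%.md}.txt"; }
```

For instance, `md_to_html notes.md` would write notes.html in the current directory.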