the anthony dobranski blog

Pandoc makes for easy document conversion


NaNoWriMo 2015, Day 01: by the site’s count, 1966 words, seventy more than my goal. It went smoothly, without many breaks. When I was within 300 words of the goal I was more interested in using them to flesh out what I had already written than to push forward. This kind of internal editing is what they all tell you not to do, but it helped me find places I was repeating myself and places where I hadn’t gotten the point across.

==

For my serial novel, I developed an idiosyncratic workflow for each week’s chapter:

  • I write plain text files using Markdown syntax, which I discussed here
  • I send my editors files in Word docx format
  • My final version goes out in HTML

It works for the project. I like to compose drafts in plain text, with no settings to change or distracting questions from the software. Using Markdown syntax, I tag words or lines for later formatting. It’s also easier to write on the go: Markdown-aware text editors are nimbler on phones and tablets than full word processors.

In Word docx format, my editors can use the Track Changes feature, letting me accept the edits into the final text with a click. Since the final destination is a WordPress website (and an ebook), it “goes to press” as HTML.

Thanks to a nimble command-line document converter called Pandoc, I can get clean, trustworthy conversions between different formats. I can use each app for what it does best and maintain a smooth process.

Pandoc is quick and it writes plain files, making the switch between formats hassle-free. It runs from the command line (the Command Prompt on Windows, Terminal on a Mac), ideally from the folder where your files reside.

Pandoc has a quirky syntax, but it’s very well documented. The simplest command is to “output” (-o) a copy of an existing file in a different format:

pandoc -o output.format input.format

You can use additional options to specify how Pandoc converts files, but usually the file extensions give enough information. Take my workflow as an example. To Word docx, from Markdown text:

pandoc -o chapterXdraft.docx chapterXdraft.md

After edits in Word, it goes out in HTML:

pandoc -o chapterX.html chapterX.docx 
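
If the file extensions ever leave Pandoc guessing, the formats can be named outright with its -f (from) and -t (to) options. Here is a minimal sketch of the two steps above wrapped in shell functions. The function names are hypothetical, made up for illustration, and this is not necessarily how I run it week to week:

```shell
#!/bin/sh
# Hypothetical helpers: the function names are placeholders, not part of Pandoc.

# Markdown draft -> Word file for the editors.
# -f/--from and -t/--to name the formats explicitly instead of
# relying on the file extensions.
draft_to_docx() {
    pandoc -f markdown -t docx -o "$1.docx" "$1.md"
}

# Edited Word file -> HTML for the site.
docx_to_html() {
    pandoc -f docx -t html -o "$1.html" "$1.docx"
}

# Usage, run from the folder that holds the files:
#   draft_to_docx chapterXdraft
#   docx_to_html chapterX
```

Each function takes the file name without its extension, so the same name serves for both the input and the output.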

By default, Pandoc copies text-level formatting (headings, emphasis, lists, links) but not document-level formatting (margins, fonts, styles). When I make a Word docx from a Markdown text, I assign the Pandoc-created Word file to a manuscript-format template before sending it. When that becomes HTML, the lack of internal formatting makes it easier to paste into the site’s own style and layout.
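
An alternative to applying the template by hand in Word is to let Pandoc borrow the styles from an existing Word file as it converts. The sketch below assumes a recent Pandoc (2.0 or later), where the option is called --reference-doc. The function name and the template file name are placeholders, and this is a variation on the workflow described above, not the workflow itself:

```shell
#!/bin/sh
# Hypothetical helper: the function name is a placeholder.
# $1 = draft name without extension
# $2 = a Word file whose styles the output should borrow
draft_to_styled_docx() {
    # --reference-doc is the option name in Pandoc 2.0 and later;
    # older releases called it --reference-docx.
    pandoc --reference-doc="$2" -o "$1.docx" "$1.md"
}

# Usage (manuscript-template.docx is an assumed file name):
#   draft_to_styled_docx chapterXdraft manuscript-template.docx
```

Pandoc takes only the styles from the reference file, not its text, so any correctly styled Word document can serve as the template.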

Pandoc will make ebooks, but its simple output limits its use, and it doesn’t produce Kindle formats at this writing. If you want a professional-looking ebook, use tools like Calibre or Sigil, and bone up on your HTML. But if you want to make a prototype, or to read documents conveniently on a smartphone, Pandoc is quick and easy:

pandoc -o chapterX.epub chapterX.docx 
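
An ebook normally carries a title in its metadata, and a reading app will usually show it in the library list. If the Word file doesn’t supply one, Pandoc’s --metadata option can set it on the command line. A small sketch, with a hypothetical function name and an example title:

```shell
#!/bin/sh
# Hypothetical helper: the function name is a placeholder.
# $1 = chapter name without extension
# $2 = title to store in the ebook's metadata
make_epub() {
    pandoc --metadata title="$2" -o "$1.epub" "$1.docx"
}

# Usage:
#   make_epub chapterX "Chapter X"
```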

I hope you can make use of Pandoc’s speed and versatility in your own work.

Author: Anthony Dobranski

I'm a fiction writer, mostly.
