Is there a way to get a word count of natural language words in Markdown (or better, Pandoc Markdown), via the command line? It's possible to just use wc to get a very rough estimate, but wc is naive, and counts anything surrounded by white space as a word. This includes things like header formatting, bullet points, and URLs in links.
The ideal would be to strip all Markdown formatting (including Pandoc citations, if possible) and then pass the result through wc, but I can't find a way to do that, as the pandoc plaintext output format still includes a lot of Markdown styling.
Best Answer
There is a new Lua filter for that: https://pandoc.org/lua-filters.html#counting-words-in-a-document
Save the following code as wordcount.lua and call pandoc like this:
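For anyone who would rather not use Lua, the same idea can be sketched in Python against pandoc's JSON output. This is not the filter from the linked documentation, only a rough illustration of the approach: walk the parsed document and count text nodes instead of counting raw Markdown tokens. It assumes the JSON AST shape pandoc emits with -t json (elements as objects with a "t" tag and "c" contents); the script name wordcount.py and the function count_words are made up for this example.

```python
import json
import string
import sys


def count_words(node):
    """Recursively count words in a fragment of a pandoc JSON AST."""
    if isinstance(node, list):
        return sum(count_words(child) for child in node)
    if not isinstance(node, dict):
        # Bare strings and numbers (URLs, header levels, ids) are not words.
        return 0
    tag = node.get("t")
    if tag == "Str":
        # Skip tokens made up entirely of ASCII punctuation, e.g. a lone "-".
        text = node["c"]
        return 0 if all(ch in string.punctuation for ch in text) else 1
    if tag in ("Code", "CodeBlock"):
        # Contents are [attributes, text]; count whitespace-separated tokens.
        return len(node["c"][1].split())
    # Any other element: descend into whatever it contains.
    return count_words(list(node.values()))


def main(path):
    with open(path, encoding="utf-8") as handle:
        doc = json.load(handle)
    # Only the body is counted; metadata (title, author, ...) is skipped.
    print(count_words(doc["blocks"]), "words in body")


# Only run when a JSON file is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Assuming the script is saved as wordcount.py, something like pandoc -t json myfile.md -o myfile.json followed by python wordcount.py myfile.json should print the count. Link URLs and header markers are not counted because they never appear as Str nodes. Citations are only reduced to their rendered text if pandoc has actually processed them; otherwise each citation key is likely to be counted as a word.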