Markdown – How to Convert Files to Dokuwiki on PC

dokuwiki, file conversion, markdown

I'm looking for a tool or script that runs on a PC and converts Markdown files to DokuWiki format.

This is so that I can use MarkdownPad on a PC to create initial drafts of documents, and then convert them to Dokuwiki format, to upload to a Dokuwiki installation that I have no control over. (This means that the Markdown plugin is no use to me.)

I could spend time writing a Python script to do the conversion myself, but I'd like to avoid spending time on this, if such a thing exists already.

The Markdown tags I'd like to have supported/converted are:

Heading levels 1 – 5
Bold, italic, underline, fixed width font
Numbered and unnumbered lists
Hyperlinks
Horizontal rules

Does such a tool exist, or is there a good starting point available?

Things I've found and considered

I initially thought that txt2tags would be helpful, but although it can write both Markdown and DokuWiki, it is very tied to its own specific input format.
I've also seen Markdown2Dokuwiki, and although I'd certainly be willing to use a sed script, even on a PC, this only supports a tiny, tiny part of Markdown's syntax.
python-markdown2 also sounded promising, but it only writes out HTML.
pandoc – but it doesn't support Dokuwiki output
MultiMarkdown – does not appear to support Dokuwiki output

Best Answer

Stop-Press - August 2014

As of Pandoc 1.13, Pandoc includes my implementation of a DokuWiki writer, and it implements many more features than this script. So this script is now pretty much redundant.
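
With Pandoc 1.13 or later, the conversion can therefore be done in one step with Pandoc's `dokuwiki` output format. The following is a minimal Python 3 sketch, assuming `pandoc` is on the PATH. The helper names `build_pandoc_command` and `convert_to_dokuwiki` and the file names are made up for illustration and are not part of the original script.

```python
import shutil
import subprocess


def build_pandoc_command(source, target):
    """Return the pandoc argument list for a Markdown -> DokuWiki conversion."""
    return ["pandoc", "--from=markdown", "--to=dokuwiki",
            "--output=" + target, source]


def convert_to_dokuwiki(source, target):
    """Run pandoc; raises CalledProcessError if pandoc reports a failure."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on the PATH")
    subprocess.run(build_pandoc_command(source, target), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_pandoc_command("draft.md", "draft.txt")))
```

Passing the arguments as a list avoids the quoting problems that file names with spaces cause when a command is built as a single string.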

Having originally said I didn't want to write a Python script to do the conversion, I ended up doing just that.

The real time-saving step was to use Pandoc to parse the Markdown text and write out a JSON representation of the document. That JSON was then fairly easy to parse and write out in DokuWiki format.

Below is the script, which implements the bits of Markdown and DokuWiki that I cared about - and a few more. (I've not uploaded the corresponding test suite that I wrote)

Requirements to use it:

Python (I was using 2.7 on Windows)
Pandoc installed, and pandoc.exe in your PATH (or edit the script to put in the full path to Pandoc instead)

I hope this saves someone else some time too...

Edit 2: 2013-06-26: I've now put this code into GitHub, at https://github.com/claremacrae/markdown_to_dokuwiki.py. Note that the code there adds support for more formats, and also contains a test suite.

Edit 1: adjusted to add code for parsing code samples in Markdown's backtick style:

# -*- coding: latin-1 -*-

import sys
import os
import json

__doc__ = """This script will read a text file in Markdown format,
and convert it to DokuWiki format.

The basic approach is to run pandoc to convert the markdown to JSON,
and then to parse the JSON output, and convert it to dokuwiki, which
is written to standard output

Requirements:
 - pandoc is in the user's PATH
"""

# TODOs
# underlined, fixed-width
# Code quotes

list_depth = 0
list_depth_increment = 2

def process_list( list_marker, value ):
    global list_depth
    list_depth += list_depth_increment
    result = ""
    for item in value:
        result += '\n' + list_depth * unicode( ' ' ) + list_marker + process_container( item )
    list_depth -= list_depth_increment
    if list_depth == 0:
        result += '\n'
    return result

def process_container( container ):
    if isinstance( container, dict ):
        assert( len(container) == 1 )
        key = container.keys()[ 0 ]
        value = container.values()[ 0 ]
        if key == 'Para':
            return process_container( value ) + '\n\n'
        if key == 'Str':
            return value
        elif key == 'Header':
            level = value[0]
            marker = ( 7 - level ) * unicode( '=' )
            return marker + unicode(' ') + process_container( value[1] ) + unicode(' ') + marker + unicode('\n\n')
        elif key == 'Strong':
            return unicode('**') + process_container( value ) + unicode('**')
        elif key == 'Emph':
            return unicode('//') + process_container( value ) + unicode('//')
        elif key == 'Code':
            return unicode("''") + value[1] + unicode("''")
        elif key == "Link":
            url = value[1][0]
            return unicode('[[') + url + unicode('|') + process_container( value[0] ) + unicode(']]')
        elif key == "BulletList":
            return process_list( unicode( '* ' ), value)
        elif key == "OrderedList":
            return process_list( unicode( '- ' ), value[1])
        elif key == "Plain":
            return process_container( value )
        elif key == "BlockQuote":
            # Blockquotes are not converted here (DokuWiki's '>' quoting is not
            # implemented), so we just spit out the unmodified text
            return '\n' + process_container( value ) + '\n'

        #elif key == 'Code':
        #    return unicode("''") + process_container( value ) + unicode("''")
        else:
            return unicode("unknown map key: ") + key + unicode( " value: " ) + str( value )

    if isinstance( container, list ):
        result = unicode("")
        for value in container:
            result += process_container( value )
        return result

    if isinstance( container, unicode ):
        if container == unicode( "Space" ):
            return unicode( " " )
        elif container == unicode( "HorizontalRule" ):
            return unicode( "----\n\n" )

    return unicode("unknown") + str( container )

def process_pandoc_json( data ):
    # Pandoc's JSON output is a two-element list: [ metadata, blocks ]
    assert( len(data) == 2 )
    result = unicode('')
    for values in data[1]:
        result += process_container( values )
    print result

def convert_file( filename ):
    # Use pandoc to parse the input file, and write it out as json
    tempfile = "temp_script_output.json"
    command = "pandoc --to=json \"%s\" --output=%s" % ( filename, tempfile )
    #print command
    os.system( command )

    # Read the whole file, in case the JSON spans more than one line
    with open( tempfile, 'r' ) as input_file:
        input_text = input_file.read()

    ## Parse the data
    data = json.loads( input_text )
    process_pandoc_json( data )

def main( files ):
    for filename in files:
        convert_file( filename )

if __name__ == "__main__":
    files = sys.argv[1:]

    if len( files ) == 0:
        sys.stderr.write( "Supply one or more filenames to convert on the command line\n" )
        return_code = 1
    else:
        main( files )
        return_code = 0

    sys.exit( return_code )
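
To make the structure the script walks easier to see, here is a much smaller Python 3 sketch of the same idea. The `sample` data is hand-written to match the shape the script expects (a two-element list of metadata and blocks, single-key dicts for constructors, bare strings such as "Space"); it was not captured from pandoc, and newer pandoc versions emit a different JSON layout. The function name `to_dokuwiki` is hypothetical.

```python
def to_dokuwiki(node):
    """Convert one node of the old-style pandoc JSON tree to DokuWiki text."""
    if isinstance(node, list):
        return "".join(to_dokuwiki(child) for child in node)
    if isinstance(node, str):
        # Bare strings are constructors without arguments, such as "Space".
        return {"Space": " ", "HorizontalRule": "----\n\n"}.get(node, "")
    # Anything else is a single-key dict: {constructor name: arguments}.
    (key, value), = node.items()
    if key == "Str":
        return value
    if key == "Para":
        return to_dokuwiki(value) + "\n\n"
    if key == "Strong":
        return "**" + to_dokuwiki(value) + "**"
    if key == "Emph":
        return "//" + to_dokuwiki(value) + "//"
    if key == "Header":
        level, inlines = value
        marker = "=" * (7 - level)  # level 1 gets six '=' characters
        return marker + " " + to_dokuwiki(inlines) + " " + marker + "\n\n"
    return ""  # constructors not covered by this sketch are dropped


# Hand-written sample in the assumed shape: [metadata, list of blocks].
sample = [
    {},
    [
        {"Header": [1, [{"Str": "Title"}]]},
        {"Para": [{"Str": "Some"}, "Space", {"Strong": [{"Str": "bold"}]},
                  "Space", {"Str": "text."}]},
    ],
]

metadata, blocks = sample
print(to_dokuwiki(blocks))
```

For the sample above this prints `====== Title ======`, a blank line, and then `Some **bold** text.`. Unlike the full script, the sketch silently drops constructors it does not know about instead of reporting them.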

Related Solutions

Correctly sizing PNG images in markdown with pandoc for html/pdf/docx

Examples: dpi, width, height.

If you give it the dpi information:

Add the --dpi option as stated to override the default.

If most of your pictures share a common height or width, the sizing is easy to correct by setting just that one dimension.

For example, change the image line to one of:

![my caption](./figures/myimage.png){ width=250px }

![my caption](./figures/myimage.png){ height=256px }

Or do this in straight HTML markup:

<img src="./figures/myimage.png" alt="my caption" style="width: 250px;"/>

<img src="./figures/myimage.png" alt="my caption" style="height: 256px;"/>

and the aspect ratio will be preserved.

Reference: Pandoc Readme

For HTML and EPUB, all attributes except width and height (but including srcset and sizes) are passed through as is. The other writers ignore attributes that are not supported by their output format.

The width and height attributes on images are treated specially. When used without a unit, the unit is assumed to be pixels. However, any of the following unit identifiers can be used: px, cm, mm, in, inch and %.

Dimensions are converted to inches for output in page-based formats like LaTeX. Dimensions are converted to pixels for output in HTML-like formats. Use the --dpi option to specify the number of pixels per inch. The default is 96dpi.

The % unit is generally relative to some available space. For example, an image written as ![](file.jpg){ width=50% } will render to <img href="file.jpg" style="width: 50%;" /> (HTML), \includegraphics[width=0.5\textwidth]{file.jpg} (LaTeX), or \externalfigure[file.jpg][width=0.5\textwidth] (ConTeXt).

Some output formats have a notion of a class (ConTeXt) or a unique identifier (LaTeX \caption), or both (HTML).

When no width or height attributes are specified, the fallback is to look at the image resolution and the dpi metadata embedded in the image file.
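
The unit handling quoted above comes down to simple arithmetic. The following Python 3 sketch illustrates that arithmetic only; it is not Pandoc's implementation, and the function name `to_pixels` and the table `INCHES_PER_UNIT` are made up for this example. It assumes the default of 96 dpi unless another value is passed.

```python
# How many inches one unit represents; pixels then follow from the dpi value.
INCHES_PER_UNIT = {"in": 1.0, "inch": 1.0, "cm": 1 / 2.54, "mm": 1 / 25.4}


def to_pixels(dimension, dpi=96):
    """Convert a dimension such as '2in' or '250px' to pixels.

    A bare number is treated as pixels. '%' is rejected because it is
    relative to the available space and has no fixed pixel value.
    """
    text = dimension.strip()
    if text.endswith("%"):
        raise ValueError("percentages depend on the available space")
    for unit in ("inch", "in", "cm", "mm", "px"):
        if text.endswith(unit):
            number = float(text[: -len(unit)])
            if unit == "px":
                return number
            return number * INCHES_PER_UNIT[unit] * dpi
    return float(text)  # no unit: assumed to be pixels


if __name__ == "__main__":
    for value in ("250px", "1in", "256"):
        print(value, "->", to_pixels(value))
```

With a different `--dpi` value the same physical size maps to a different pixel count, for example `to_pixels("2in", dpi=300)` gives 600 pixels.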
