UPDATE 2023-03-27: This page is obsolete, as it refers to a prior version of this blog. However, it may be of historical interest.
As noted in a previous post, I am a big fan of the Markdown text-to-HTML conversion tool. However, nothing’s perfect. I already discussed a bug involving link ids, and I subsequently found one other reason to patch Markdown, for sites like mine that generate both HTML 4.01 Strict and XML pages (an Atom feed in my case).
The issue is that Markdown assumes by default that it is generating XHTML rather than HTML, and therefore uses the XHTML syntax for empty elements, in which an empty element such as BR, HR, or IMG must either have an end tag or end its start tag with a ’/’. (Note that some people recommend putting a space before the ’/’ in order to satisfy both HTML and XHTML requirements, but this apparently opens a can of worms and hence should be avoided in my opinion.)
In my case, however, I’m generating “real” HTML (as opposed to XHTML sent with a text/html content type), so I need to change this behavior. The standard way to do this in vanilla Markdown is to set the configurable variable $g_empty_element_suffix to the proper suffix for HTML. This fixes HTML pages, but unfortunately then breaks XML pages, in particular the Atom feed generated by Blosxom.
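To make the trade-off concrete, here is a small Python sketch (not Markdown’s own Perl code) of what a single global suffix does to XML output. The helper names render_fragment and is_well_formed_xml are hypothetical, and the two suffix strings are assumed values for Markdown’s default and for the HTML alternative. The fragment is assembled from parts rather than written out literally.

```python
import xml.etree.ElementTree as ET

XHTML_SUFFIX = " />"  # assumed: Markdown's default empty-element suffix
HTML_SUFFIX = ">"     # assumed: the suffix one would set for HTML 4.01


def render_fragment(suffix: str) -> str:
    """Build a paragraph containing one empty BR element closed with `suffix`."""
    return "<p>line one<" + "br" + suffix + "line two</p>"


def is_well_formed_xml(fragment: str) -> bool:
    """Return True if an XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    # Report, for each suffix, whether the resulting fragment parses as XML.
    for suffix in (XHTML_SUFFIX, HTML_SUFFIX):
        print(repr(suffix), is_well_formed_xml(render_fragment(suffix)))
```

With the HTML suffix the BR element is never closed, so an XML parser rejects the fragment; that is the kind of breakage an Atom feed would see.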
I therefore decided to patch Markdown to be more intelligent about setting the empty element suffix. The strategy I decided upon was to set $g_empty_element_suffix in the start subroutine based on the current Blosxom flavour: If the flavour is “html” then the suffix is set to the HTML form, otherwise it is left at the default value. (Note that the patch won’t work if you’re really using XHTML under the “html” flavour; this case is hard to code for because there’s no easy way to tell that XHTML is intended rather than HTML.)
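The flavour-based strategy can be sketched as follows. This is a Python illustration of the logic only, not the actual Perl patch: suffix_for_flavour and empty_element are hypothetical names, “atom” is a made-up flavour name standing in for any non-HTML flavour, and the suffix strings are assumed values for the default and HTML forms.

```python
DEFAULT_SUFFIX = " />"  # assumed default (XHTML/XML form)
HTML_SUFFIX = ">"       # assumed HTML form


def suffix_for_flavour(flavour: str) -> str:
    """Pick the empty-element suffix from the current flavour.

    Only the "html" flavour gets the HTML form; every other flavour
    (for example a feed flavour) keeps the default.
    """
    return HTML_SUFFIX if flavour == "html" else DEFAULT_SUFFIX


def empty_element(name: str, flavour: str) -> str:
    """Render an empty element such as BR or HR for the given flavour."""
    return "<" + name + suffix_for_flavour(flavour)


if __name__ == "__main__":
    # "atom" is a hypothetical flavour name used only for illustration.
    for flavour in ("html", "atom"):
        print(flavour, empty_element("hr", flavour))
```

Like the patch, this sketch has no way to detect a site that serves real XHTML under the “html” flavour, and would give it the HTML suffix.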
I’ve created two patches for this problem: a Markdown 1.0 patch and a Markdown 1.0.1 patch. The code is identical; the only difference is where the patch gets applied in terms of line numbers.
UPDATED: For some reason my preferred news aggregators (NewsFire and NetNewsWire) have an unfortunate habit of taking example HTML/XHTML code snippets (enclosed in a CODE element and escaped using character entities) and interpreting them as actual tags. For that reason I’ve updated this post to remove examples of empty element syntax until I figure out exactly what’s going on and can work around the problem.