Converting markdown to LaTeX with md2tex
This post is part of a series: Making an integrated book/website about music coding using MkDocs, LaTeX, and Python
Markdown is a friendly format for writing readable plain text with rich text features. While there are different interpretations of how the markdown syntax should be defined, it’s widely used in technical writing for a reason: Markdown is easy to learn, yet expressive enough to represent many features of more verbose markup languages like HTML. In MkDocs, the conversion from markdown to HTML is handled by the excellent Python-Markdown library.
But what about converting markdown to LaTeX? There are several tools for this, including the Swiss army knife of document converters, pandoc. When I tried pandoc on the markdown sources for my book, it produced a lot of extraneous TeX commands. I would have to clean those up to get a usable TeX file, which was not ideal. I probably should have explored pandoc’s config options at this stage, but instead I looked for other tools that were more focused.
Enter md2tex
I eventually found a great Python CLI tool called md2tex, which I forked and extended with a few features. It does exactly what I need: it converts the most common features of markdown to TeX.
Tools like pandoc and Python-Markdown convert the input to an intermediate representation called an abstract syntax tree (AST) before rendering the final output.
md2tex is simpler: It uses Python regular expressions to directly find markdown patterns and replace them with the matching LaTeX code. While this approach only works one way (from markdown to TeX), it lets us focus on supporting any markdown features that have a reasonable LaTeX equivalent.
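To make the regex idea concrete, here is a minimal sketch of the technique. It is not md2tex's actual code: the rule table and the function name tiny_convert are made up for illustration, and only headings, bold text, and links are covered.

```python
import re

# Hypothetical rule table: (compiled pattern, replacement template).
# Order matters, because each rule runs over the output of the previous one.
RULES = [
    (re.compile(r"^# (.+)$", re.MULTILINE), r"\\section{\1}"),
    (re.compile(r"\*\*(.+?)\*\*"), r"\\textbf{\1}"),
    (re.compile(r"\[([^\]]+)\]\(([^)]+)\)"), r"\\href{\2}{\1}"),
]


def tiny_convert(markdown: str) -> str:
    """Apply every find-and-replace rule in turn to the whole document."""
    for pattern, replacement in RULES:
        markdown = pattern.sub(replacement, markdown)
    return markdown
```

A real converter also has to escape characters that are special in LaTeX (such as % and &), which this sketch leaves out.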
From CLI to module
md2tex was initially made for CLI usage, but I needed to use it as a module in another Python script. With a small refactoring, I moved the main conversion functionality to a single convert function which can be imported elsewhere.
from md2tex import convert
md = """
# Sample document
Paragraph text.
- Here is a list item with some **bold text**.
- Another list item with [a link](https://example.com).
"""
latex = convert(md)
print(latex)

\section{Sample document}
Paragraph text.
\begin{itemize}
\item Here is a list item with some \textbf{bold text}.
\item Another list item with \href{https://example.com}{a link}.
\end{itemize}

Improving md2tex
I needed some additional functionality that md2tex didn’t provide out of the box. The codebase of md2tex is well organised and a pleasure to work with, so I began to add the functionality that I needed and fix a few bugs. The main features I added are introduced below.
Definition lists
In the book, I sometimes needed to define a set of terms in audio synthesis or explain the purpose of a handful of SuperCollider classes. For that purpose, Markdown has definition lists, which correspond rather nicely to LaTeX’s description environment (and HTML’s Description List elements).
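As a rough illustration of how such a mapping could be done with regular expressions, here is a small sketch. The function name and both patterns are assumptions of mine, not the code in md2tex, and only single-line definitions are handled.

```python
import re

# One term line followed by one ": definition" line (hypothetical pattern).
ENTRY = re.compile(
    r"^(?P<term>[^\n:][^\n]*)\n: +(?P<definition>[^\n]+)$", re.MULTILINE
)
# A run of one or more consecutive entries.
BLOCK = re.compile(r"(?:^[^\n:][^\n]*\n: +[^\n]+\n?)+", re.MULTILINE)


def definition_lists_to_latex(markdown: str) -> str:
    """Wrap each run of term/definition pairs in a description environment."""

    def render(block: re.Match) -> str:
        items = [
            f"\\item[{m['term']}] {m['definition']}"
            for m in ENTRY.finditer(block.group(0))
        ]
        return "\\begin{description}\n" + "\n".join(items) + "\n\\end{description}\n"

    return BLOCK.sub(render, markdown)
```

Passing a function to sub, rather than a template string, keeps the backslashes in the generated LaTeX from being interpreted as regex escapes.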
scide
: SuperCollider's IDE with integrated docs.
sclang
: The standard SuperCollider interpreter.
scsynth
: The SuperCollider sound server.

\begin{description}
\item[scide] SuperCollider's IDE with integrated docs.
\item[sclang] The standard SuperCollider interpreter.
\item[scsynth] The SuperCollider sound server.
\end{description}

Source code
As the book is about coding and the first edition contains 347 code examples, it was important to have useful and well organised source code displays. I updated md2tex to add the following features:
- Fenced code blocks in markdown should be converted to minted environments in LaTeX.
- Every block of source code should have a descriptive title to help the reader understand its purpose. In the PDF version, the title becomes a caption for the code block.
- Coding language specification in fenced code blocks should be passed along to the minted environments, so that the syntax highlighter knows which language syntax to expect.
- We should be able to specify lines to stand out (confusingly also called “highlighted”), so that the reader can focus on the relevant parts of a longer code example.
- Inline code strings in markdown should be converted to LaTeX’s \mintinline command.
```sc title="Some cool sounds" hl-lines="1"
{ Pulse.ar }.play;
{ PinkNoise.ar }.play;
```

\begin{listing}[H]
\begin{minted}[highlightlines={1}]{sc}
{ Pulse.ar }.play;
{ PinkNoise.ar }.play;
\end{minted}
\caption{Some cool sounds}
\end{listing}

In the final rendering from LaTeX to PDF, having SuperCollider’s somewhat esoteric syntax highlighted properly required a bit of effort. I chose the LaTeX package minted over another popular choice called listings because this would provide better support for SuperCollider syntax. I’ll explain this in the next blog post.
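The fenced-block conversion described in this section could be sketched roughly as follows. This is an illustration under my own assumptions rather than the code in md2tex: the function name, the attribute parsing, and the space-separated form of hl-lines are all hypothetical.

```python
import re

TICKS = "`" * 3  # built this way to keep a literal fence out of the example

# Opening fence with language and attributes, the code, and the closing fence.
FENCE = re.compile(
    rf"^{TICKS}(?P<lang>[\w+-]+)(?P<attrs>[^\n]*)\n(?P<code>.*?)\n{TICKS}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)
# key="value" pairs on the opening fence line.
ATTR = re.compile(r'([\w-]+)="([^"]*)"')


def fences_to_minted(markdown: str) -> str:
    """Turn fenced code blocks into minted environments inside listing floats."""

    def render(match: re.Match) -> str:
        attrs = dict(ATTR.findall(match["attrs"]))
        options = ""
        if "hl-lines" in attrs:
            # Assumed input form: space-separated numbers; minted wants commas.
            lines = ",".join(attrs["hl-lines"].split())
            options = f"[highlightlines={{{lines}}}]"
        parts = [
            "\\begin{listing}[H]",
            f"\\begin{{minted}}{options}{{{match['lang']}}}",
            match["code"],
            "\\end{minted}",
        ]
        if "title" in attrs:
            parts.append(f"\\caption{{{attrs['title']}}}")
        parts.append("\\end{listing}")
        return "\n".join(parts)

    return FENCE.sub(render, markdown)
```

Fed the fenced SuperCollider block from the example in this section, the sketch produces the same listing environment, with the title as caption and line 1 highlighted.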
Citations
As an academic textbook, my book contains citations and references to other sources. The plugin called mkdocs-bibtex conveniently supported rendering citations based on the same bibliography data format as LaTeX: BibTeX (note that this plugin has since been discontinued). Most citation managers like Zotero and Mendeley support exporting bibliographic data to BibTeX and related data formats. My setup uses a BibTeX variant called BibLaTeX. I know, it may be a bit confusing if you are new to LaTeX and citation processing. If you want to know more about the different citation packages in LaTeX, be my guest.
There is no standard syntax for citations in markdown. But pandoc defines one that is widely used, including by mkdocs-bibtex. As in LaTeX, each source is identified with a unique citation key. In pandoc’s syntax the key is prefixed with @, as in @eskildsen2025. The same key, without the @, is used in LaTeX, so translating between the formats is not too complicated:
SuperCollider uses unit generators [@eskildsen2025, p. 60].

SuperCollider uses unit generators \parencite[p. 60]{eskildsen2025}.

If you choose the APA citation style, this will render in the text body as: “SuperCollider uses unit generators (Eskildsen, 2025, p. 60).” A reference will be included in the reference list with the corresponding bibliographic data, formatted as specified in the APA citation style guide.
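A minimal sketch of that translation, assuming a single citation per bracket and an optional locator after the comma. The pattern and the function name are hypothetical and not taken from md2tex.

```python
import re

# [@key] or [@key, locator]; several keys in one bracket are not handled.
CITATION = re.compile(r"\[@(?P<key>[\w:.-]+)(?:,\s*(?P<locator>[^\]]+))?\]")


def citations_to_biblatex(markdown: str) -> str:
    """Rewrite pandoc-style citations as \\parencite commands."""

    def render(match: re.Match) -> str:
        locator = match["locator"]
        # The locator becomes the optional postnote argument of \parencite.
        postnote = f"[{locator.strip()}]" if locator else ""
        return f"\\parencite{postnote}{{{match['key']}}}"

    return CITATION.sub(render, markdown)
```

Pandoc's syntax also allows several sources in one bracket, separated by semicolons, which would need a slightly richer pattern.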
Math
My book is not heavy on math, since the target audience is music students in the humanities. But it is useful to be able to show simple equations.
Math notation is tricky to represent in plain text, but this is one of the areas where LaTeX really shines. Its math notation has been adopted in several markdown rendering frameworks, e.g. GitHub.
We can include LaTeX equations directly in the markdown files, surrounded by $$ at the block level and $ inline. With the help of KaTeX, a JavaScript library, those equations are rendered with nice math formatting in the browser. And for the LaTeX version, since the equations are already written in LaTeX, the math itself needs no conversion. Only the block-level $$ delimiters are replaced with \[ and \].
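For block equations, the only visible change in the LaTeX output is the pair of delimiters. A small sketch of that swap, with a hypothetical function name and the assumption that each $$ sits on a line of its own:

```python
import re

# A block equation: "$$" alone on a line, the body, then "$$" alone on a line.
DISPLAY_MATH = re.compile(
    r"^\$\$[ \t]*\n(?P<body>.*?)\n\$\$[ \t]*$", re.MULTILINE | re.DOTALL
)


def display_math_to_latex(markdown: str) -> str:
    """Swap the $$ delimiters for \\[ and \\], leaving the math untouched."""
    return DISPLAY_MATH.sub(lambda m: "\\[\n" + m["body"] + "\n\\]", markdown)
```

Inline math written with single $ signs is already valid LaTeX, so the sketch leaves it alone.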
We calculate the density of grains with this formula:
$$
\text{\small density} = \text{\small trigger frequency} \times \text{\small grain duration}
$$

We calculate the density of grains with this formula:
\[
\text{\small density} = \text{\small trigger frequency} \times \text{\small grain duration}
\]

Handling custom features outside of md2tex
While md2tex is useful, it is not the right tool for absolutely all aspects of converting my book to LaTeX. For instance, it does not process audio files for the simple reason that LaTeX does not have a mechanism for embedding audio into a PDF document. It does not know how to convert mermaid diagrams to LaTeX. It also is not designed to iterate over a bunch of markdown files and organise the converted output into a full LaTeX book.
If my contributions to md2tex were to be useful to anyone else, I should probably try to avoid scope creep. So, instead of making md2tex do everything, I created a Python script to take care of the rest of the conversion process. As I added features and worked with the output, this script quickly turned into a somewhat messy piece of code. It is far from ideal, but it keeps md2tex reasonably simple and solves a lot of problems:
- Parses the markdown files in order.
- Preprocesses the markdown files with custom handling of some non-standard markdown syntax.
- Caches generated diagrams and graphics.
- Postprocesses the LaTeX output, handling the integration between the PDF and web edition.
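The overall shape of such a wrapper could look something like the sketch below. It is an outline under my own assumptions, not the actual script: preprocess and postprocess are empty placeholders, the converter is passed in as a parameter, and caching of diagrams is left out.

```python
from pathlib import Path
from typing import Callable, Iterable


def preprocess(markdown: str) -> str:
    """Placeholder for custom handling of non-standard markdown syntax."""
    return markdown


def postprocess(latex: str) -> str:
    """Placeholder for tweaks that tie the PDF and web editions together."""
    return latex


def build_book(sources: Iterable[Path], convert: Callable[[str], str]) -> str:
    """Convert each markdown file in the given order and join the results."""
    chapters = []
    for path in sources:
        markdown = preprocess(path.read_text(encoding="utf-8"))
        chapters.append(postprocess(convert(markdown)))
    return "\n\n".join(chapters)
```

With the convert function imported from md2tex, a call could be build_book(sorted(Path("docs").glob("*.md")), convert), where the docs directory is only an example location.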
I’ll cover some of this in future blog posts.
This post is part of a series: Making an integrated book/website about music coding using MkDocs, LaTeX, and Python
- Part 1: An integrated book and website
- Part 2: Converting markdown to LaTeX with md2tex (this post)
- Part 3: Making a pygments lexer to syntax highlight SuperCollider code
- Part 4: New MkDocs plugins for embedding and visualizing audio