To become familiar with Rust I implemented a static site generator as my first project. Below is a set of notes which I would’ve found helpful when I started. Code is available here.

Static sites

I’ve always liked the simplicity of static websites, where all pages are generated ahead of time instead of waiting on a back-end server to generate them on demand. Static sites are also a perfect fit for content delivery networks (CDNs), which enable great performance.

Static site generators usually accept a simple text format like Markdown as input to generate pages. A simple input format eliminates most of the tool/framework lock-in and substantially increases the probability of your content still being usable 10 or 20 years down the line. This is probably why note-taking software like Obsidian has become so popular.

Writing your own generator seems like a good starter project since it has a well-defined outcome and lies in the Goldilocks zone in terms of size - big enough to expose you to most of the language features, but not so big that it overwhelms you or takes months to finish. Plus it allows you to include only the site functionality you want and learn a bit about web development along the way.

The plan

Write a static site generator from scratch in Rust for a blog that I might use $n \geq 1$ times. It should:

- be simple to modify
- produce a fast, minimal site
- support code syntax highlighting and $\LaTeX$ equations

Your own site generator

To generate a static site we:

1. Parse Markdown files and transform them into HTML with a library like pulldown-cmark
2. Insert the resulting HTML into templates with a templating library; tera is a popular choice
3. Create the site index and copy over assets
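
As a rough illustration of the templating and index steps above, here is a minimal std-only sketch. It does not use tera: `render_template` and `render_index` are hypothetical helpers, and the `{{ name }}` placeholder syntax is only assumed to resemble what a real templating library offers.

```rust
use std::collections::HashMap;

/// Replace each `{{ key }}` placeholder in `template` with its value from `vars`.
/// Unknown placeholders are left untouched. (Hypothetical stand-in for a real
/// templating library.)
fn render_template(template: &str, vars: &HashMap<&str, String>) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find("}}") {
            Some(end) => {
                let key = after[..end].trim();
                match vars.get(key) {
                    Some(value) => out.push_str(value),
                    None => {
                        // Keep the placeholder as-is if we have no value for it.
                        out.push_str("{{");
                        out.push_str(&after[..end]);
                        out.push_str("}}");
                    }
                }
                rest = &after[end + 2..];
            }
            None => {
                // Unterminated placeholder: copy the remainder verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

/// Build a simple index list from (title, slug) pairs.
/// Assumes titles and slugs are already HTML-safe.
fn render_index(posts: &[(&str, &str)]) -> String {
    let items: String = posts
        .iter()
        .map(|(title, slug)| format!("<li><a href=\"{}.html\">{}</a></li>", slug, title))
        .collect();
    format!("<ul>{}</ul>", items)
}
```

The template is scanned in a single pass, so text inserted for one placeholder is never re-scanned for further placeholders.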

To make the site pretty we add a style.css, a font or two and we’re done with the basics.

Syntax highlighting

By default pulldown-cmark detects code blocks, but it won’t highlight the syntax. Luckily this is simple to adjust: pulldown-cmark emits events during parsing, one of which marks a code block. Once we hit this event we can override the default behaviour by feeding the code block’s text into a highlighter and outputting colorised HTML. We can also set the code block font to JetBrains Mono, which looks better and supports ligatures.

This gives us nice code snippets:

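// Imports assumed from the type names used below
use std::collections::HashMap;
use itertools::MultiPeek;
use pulldown_cmark::Event;

pub struct EventIterator<'a, I: Iterator<Item = Event<'a>>> {
    parser: MultiPeek<I>,
    has_katex: bool,
    image_scale: HashMap<String, f64>,
}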
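
The event interception described above can be sketched without the real library. In the sketch below, `Event` is a simplified stand-in and not pulldown-cmark’s actual event type, and `highlight` is a toy keyword highlighter standing in for a real highlighting library; all names are hypothetical.

```rust
/// Simplified stand-in for a Markdown parser's event stream
/// (hypothetical; not pulldown-cmark's real `Event`).
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Text(String),
    CodeBlock { lang: String, code: String },
    Html(String),
}

/// Escape the characters that are unsafe inside HTML text.
fn escape_html(s: &str) -> String {
    s.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

/// Append the pending word to `out`, wrapping it in a span if it is a keyword.
fn push_word(word: &mut String, out: &mut String) {
    const KEYWORDS: [&str; 4] = ["fn", "let", "pub", "struct"];
    if KEYWORDS.contains(&word.as_str()) {
        out.push_str(&format!("<span class=\"kw\">{}</span>", word));
    } else {
        out.push_str(&escape_html(word.as_str()));
    }
    word.clear();
}

/// Toy highlighter: marks a handful of Rust keywords, escapes everything else.
fn highlight(code: &str) -> String {
    let mut out = String::new();
    let mut word = String::new();
    for ch in code.chars() {
        if ch.is_alphanumeric() || ch == '_' {
            word.push(ch);
        } else {
            push_word(&mut word, &mut out);
            out.push_str(&escape_html(&ch.to_string()));
        }
    }
    push_word(&mut word, &mut out);
    out
}

/// Replace every code block event with pre-rendered, highlighted HTML and
/// pass all other events through unchanged.
fn highlight_code_blocks(events: Vec<Event>) -> Vec<Event> {
    events
        .into_iter()
        .map(|event| match event {
            Event::CodeBlock { lang, code } => Event::Html(format!(
                "<pre><code class=\"language-{}\">{}</code></pre>",
                lang,
                highlight(&code)
            )),
            other => other,
        })
        .collect()
}
```

The same shape applies to any event-based parser: match on the event of interest, emit raw HTML in its place, and leave the rest of the stream alone.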

Images

pulldown-cmark parses embedded images just fine, but it’s worthwhile to add some extra logic to make the embedding process smoother and improve site performance:

- By default the image caption won’t be transferred to HTML even though it is captured as part of the Image event. As with syntax highlighting, we can intercept the parsing event and output adjusted HTML that includes the caption
- When writing a post I want to paste images from the clipboard and not worry about resizing them manually. So I adjusted the image parser to look for an extra tag after the image. For example, ![cool](images/backflip.jpg){width=50%} would scale the image to half its width, which makes life easier and saves a lot of bandwidth
- To further reduce bandwidth I’ve added conversion of PNG and JPEG images to WebP
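
Parsing the `{width=50%}` tag could look roughly like the sketch below. The function names are hypothetical, and two details are assumptions of the sketch rather than something the post specifies: the width is read as a percentage in the range (0, 100], and both dimensions are scaled by the same factor.

```rust
/// Parse a trailing tag such as `{width=50%}` into a scale factor (0.5).
/// Returns `None` if the tag is malformed or the percentage is out of range.
fn parse_width_tag(tag: &str) -> Option<f64> {
    let inner = tag.trim().strip_prefix('{')?.strip_suffix('}')?;
    let value = inner.trim().strip_prefix("width=")?.strip_suffix('%')?;
    let percent: f64 = value.trim().parse().ok()?;
    // Assumption: only downscaling (or keeping the size) is allowed.
    if percent > 0.0 && percent <= 100.0 {
        Some(percent / 100.0)
    } else {
        None
    }
}

/// Scale both dimensions by `scale`, keeping each side at least 1 pixel.
fn scaled_dimensions(width: u32, height: u32, scale: f64) -> (u32, u32) {
    let scale_side = |side: u32| ((side as f64) * scale).round().max(1.0) as u32;
    (scale_side(width), scale_side(height))
}
```

A scale factor stored per image path is then enough for the resizing step to compute the target dimensions.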

With these changes, the ![cool](images/backflip.jpg){width=50%} example above works as intended.

Math equations

To render math equations we can use KaTeX. But unlike with code blocks, pulldown-cmark won’t automatically detect or render math due to a lack of standardisation ¹, so we have to roll our own.

To delimit math blocks in Markdown I use single dollar signs for inline math $ x^2 $ and double dollar signs for display mode math $$ x^2 $$. We can then detect math blocks by checking each paragraph for $.
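
A minimal sketch of this detection step, assuming a paragraph arrives as one string: the `Segment` type and `split_math` function are hypothetical names, and a `$` with no closing partner is simply treated as plain text, which is one of several possible ways to resolve the ambiguity.

```rust
/// A piece of a paragraph: plain text, inline math, or display math.
#[derive(Debug, PartialEq)]
enum Segment {
    Text(String),
    Inline(String),
    Display(String),
}

/// Split a paragraph on `$ ... $` (inline) and `$$ ... $$` (display) delimiters.
fn split_math(input: &str) -> Vec<Segment> {
    let mut segments = Vec::new();
    let mut rest = input;
    while let Some(start) = rest.find('$') {
        let display = rest[start..].starts_with("$$");
        let delim = if display { "$$" } else { "$" };
        let body_start = start + delim.len();
        match rest[body_start..].find(delim) {
            Some(len) => {
                if start > 0 {
                    segments.push(Segment::Text(rest[..start].to_string()));
                }
                let body = rest[body_start..body_start + len].trim().to_string();
                segments.push(if display {
                    Segment::Display(body)
                } else {
                    Segment::Inline(body)
                });
                rest = &rest[body_start + len + delim.len()..];
            }
            // No closing delimiter: leave the remainder as plain text.
            None => break,
        }
    }
    if !rest.is_empty() {
        segments.push(Segment::Text(rest.to_string()));
    }
    segments
}
```

The math segments can then be handed to the renderer, while text segments continue through the normal Markdown pipeline.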

To render the equations we could leave the math blocks alone and include KaTeX’s auto-render JavaScript in our site. But in the spirit of static generation, we can render the equations ahead of time with the KaTeX Rust bindings and ship the necessary fonts and CSS with our site instead of fetching them from a CDN.

And now we can do this:

$\int_{-\infty}^{\infty} e^{-x^2} dx = \sqrt{\pi}$

Other things

Self-hosting your assets like fonts and icons may improve performance since it avoids going to a third-party CDN. Besides, having a fully self-contained site is nice.

Modern font formats like WOFF2 take less space and are now widely supported, making them an easy choice. Font file size can be further reduced via subsetting. ² Using a variable font keeps the codebase cleaner as you don’t have to store separate files for italic, bold, regular etc. Fontshare is a good resource.

End result

Overall I’m glad I chose this as a starter project, and I’m left quite impressed with the quality of Rust tooling and documentation. Performance isn’t bad either - generating this site took 145 ms on an AMD Ryzen 5 5600X and Google PageSpeed seems happy too!

Footnotes

¹ Markdown itself has no proper specification. CommonMark does, but its spec doesn’t mention math once. Unfortunately, the popular $ delimiter allows for ambiguities, which is a no-no for a spec.

For pulldown-cmark there’s lots of discussion on the issue going back to 2015 (!). However, there’s good progress on an extension.

² This article on subsetting is great. However, there don’t seem to be great Rust libraries to do it seamlessly and it might break things if you’re not careful, all to save what is likely less than 50 kB.