
Static site generator in Rust

Programming | Rust

2022-12-27


To become familiar with Rust I implemented a static site generator as my first project. Below is a set of notes which I would’ve found helpful when I started. Code is available here.

Static sites

I’ve always liked the simplicity of static websites, where all pages are generated ahead of time instead of waiting on a back-end server to generate them on demand. Static sites are also a perfect fit for content delivery networks (CDNs), which enable great performance.

Static site generators usually accept a simple text format like markdown as input to generate pages. A simple input format eliminates most of the tool/framework lock-in and substantially increases the probability of your content being usable 10 or 20 years down the line. This is probably why note-taking software like Obsidian has become so popular.

Writing your own generator seems like a good starter project. It has a well-defined outcome and lies in the Goldilocks zone in terms of size: big enough to expose you to most of the language features, but not so big that it overwhelms you or takes months to finish. Plus, it lets you include only the site functionality you want and learn a bit about web development along the way.

The plan

Write a static site generator from scratch in Rust for a blog that I might use $n \geq 1$ times. It should

  1. be simple to modify
  2. produce a fast, minimal site
  3. support code syntax highlighting and LaTeX equations

Your own site generator

To generate a static site we:

  1. Parse markdown files and transform them into HTML with a library like pulldown-cmark
  2. Insert them into HTML templates with a templating library; tera is a popular choice
  3. Create the site index and copy over assets
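In step 2, tera does the heavy lifting (template inheritance, loops, filters). To show what the step boils down to, here is a toy, std-only stand-in. The `render` function below is hypothetical and swaps `{{ key }}` placeholders for values from a map. It is not tera's API and does no HTML escaping, so treat it as an illustration only.

```rust
use std::collections::HashMap;

/// Hypothetical toy templating: replaces each `{{ key }}` in `template`
/// with `vars[key]`. Unknown keys are left as-is so they're easy to spot.
pub fn render(template: &str, vars: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        // Copy everything before the placeholder verbatim.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find("}}") {
            Some(end) => {
                let key = after[..end].trim();
                match vars.get(key) {
                    Some(value) => out.push_str(value),
                    // Unknown key: keep the whole `{{ ... }}` placeholder.
                    None => out.push_str(&rest[start..start + 2 + end + 2]),
                }
                rest = &after[end + 2..];
            }
            None => {
                // Unclosed `{{`: emit the remainder untouched.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

A page would then be produced by passing the rendered markdown HTML as one of the values, e.g. under a `body` key.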

To make the site pretty, we add a style.css and a font or two, and we’re done with the basics.

Syntax highlighting

By default, pulldown-cmark detects code blocks, but it won’t highlight the syntax. Luckily this is simple to adjust. pulldown-cmark emits a stream of events during parsing, and the start of a code block is one of them. Once we hit this event, we can override the default behaviour by feeding the code block's text into a highlighter and outputting colorised HTML. We can also set the code block font to JetBrains Mono, which looks better and supports ligatures.

This gives us nice code snippets:

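use std::collections::HashMap;
use itertools::MultiPeek;
use pulldown_cmark::Event;

// Wraps pulldown-cmark's event stream; MultiPeek allows looking several events ahead.
pub struct EventIterator<'a, I: Iterator<Item = Event<'a>>> {
    parser: MultiPeek<I>,
    has_katex: bool,
    image_scale: HashMap<String, f64>,
}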
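The code block override described above can be sketched roughly as follows. To keep the sketch self-contained, it uses a hypothetical, trimmed-down `Event` enum rather than pulldown-cmark's real one. The real enum signals code blocks with `Event::Start(Tag::CodeBlock(..))` and a matching `Event::End(..)`, and carries `CowStr`s. The `highlight` function is a placeholder that only HTML-escapes the code; a real generator would call a highlighting library there.

```rust
/// Hypothetical simplified stand-in for pulldown-cmark's `Event`.
#[derive(Debug, PartialEq)]
pub enum Event {
    StartCodeBlock(String), // language tag, e.g. "rust"
    Text(String),
    EndCodeBlock,
    Html(String),
}

/// Placeholder highlighter: escapes the code and tags it with its language.
fn highlight(code: &str, lang: &str) -> String {
    let escaped = code
        .replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;");
    format!("<pre><code class=\"language-{}\">{}</code></pre>", lang, escaped)
}

/// Collapses each StartCodeBlock..EndCodeBlock run into one Html event;
/// all other events pass through unchanged.
pub fn highlight_code_blocks<I: IntoIterator<Item = Event>>(events: I) -> Vec<Event> {
    let mut out = Vec::new();
    let mut in_block: Option<(String, String)> = None; // (language, accumulated code)
    for event in events {
        match event {
            Event::StartCodeBlock(lang) => in_block = Some((lang, String::new())),
            Event::Text(t) => match in_block.as_mut() {
                Some((_, code)) => code.push_str(&t), // inside a code block: buffer it
                None => out.push(Event::Text(t)),
            },
            Event::EndCodeBlock => {
                // A stray end without a start is simply dropped.
                if let Some((lang, code)) = in_block.take() {
                    out.push(Event::Html(highlight(&code, &lang)));
                }
            }
            other => out.push(other),
        }
    }
    out
}
```

Buffering until the end event matters because the parser may split a block's contents across several text events, and a highlighter needs the whole snippet at once.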

Images

pulldown-cmark parses embedded images just fine, but it’s worthwhile to add some extra logic to make the embedding process smoother and improve site performance:

Now the above command works:

cool

Math equations

To render math equations, we can use KaTeX. But unlike code blocks, pulldown-cmark won’t automatically detect or render math due to a lack of standardisation [1], so we have to roll our own.

To delineate math blocks in markdown I use single dollar signs for inline math $ x^2 $ and double for display mode math $$ x^2 $$. We can then detect math blocks by checking each paragraph for $.
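A minimal sketch of that detection step, assuming the `$`/`$$` convention above, splits a paragraph into text, inline math and display math. The `Segment` type and `split_math` function are hypothetical names. The sketch does not handle escaped or literal dollar signs, which is exactly the ambiguity discussed in footnote 1.

```rust
/// One piece of a paragraph after splitting on `$` delimiters.
#[derive(Debug, PartialEq)]
pub enum Segment<'a> {
    Text(&'a str),
    Inline(&'a str),  // between single `$ ... $`
    Display(&'a str), // between double `$$ ... $$`
}

/// Splits `par` on `$`/`$$`. An opening delimiter with no matching close
/// is left as plain text. Literal `$` (e.g. prices) will be misread.
pub fn split_math(par: &str) -> Vec<Segment<'_>> {
    let mut out = Vec::new();
    let mut rest = par;
    while let Some(start) = rest.find('$') {
        let (delim, is_display) = if rest[start..].starts_with("$$") {
            ("$$", true)
        } else {
            ("$", false)
        };
        let body_start = start + delim.len();
        match rest[body_start..].find(delim) {
            Some(len) => {
                if start > 0 {
                    out.push(Segment::Text(&rest[..start]));
                }
                let body = rest[body_start..body_start + len].trim();
                out.push(if is_display { Segment::Display(body) } else { Segment::Inline(body) });
                rest = &rest[body_start + len + delim.len()..];
            }
            None => break, // no closing delimiter: the remainder stays text
        }
    }
    if !rest.is_empty() {
        out.push(Segment::Text(rest));
    }
    out
}
```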

To render the equations, we could leave the math blocks alone and include KaTeX’s auto-render JavaScript in our site. But in the spirit of static generation, we can render the equations ahead of time with KaTeX’s Rust bindings and self-host the necessary fonts and CSS instead of loading them from a CDN.
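Here is a sketch of the build-time rendering step under a few assumptions. The renderer is passed in as a closure, so the sketch doesn't depend on any particular binding's API. With the `katex` crate, that closure might wrap `katex::render_with_opts` using options built with display mode on; check the crate's docs for the exact builder calls. The hypothetical `needs_katex_css` flag records whether any math was rendered, so a template could link the self-hosted stylesheet only on pages that need it. If rendering fails, the raw TeX is kept.

```rust
/// Output of the hypothetical pre-render step.
pub struct Prerendered {
    pub html: String,
    /// True if any math rendered successfully on this page.
    pub needs_katex_css: bool,
}

/// Renders `$$ ... $$` paragraphs at build time via `render(tex, display_mode)`.
/// `render` stands in for a call into KaTeX bindings (an assumption).
/// Assumes non-math paragraph text is already HTML-safe.
pub fn prerender_display_math<F>(paragraphs: &[&str], render: F) -> Prerendered
where
    F: Fn(&str, bool) -> Result<String, String>,
{
    let mut html = String::new();
    let mut needs_katex_css = false;
    for par in paragraphs {
        let t = par.trim();
        let is_display = t.len() >= 4 && t.starts_with("$$") && t.ends_with("$$");
        if is_display {
            if let Ok(rendered) = render(&t[2..t.len() - 2], true) {
                html.push_str(&rendered);
                needs_katex_css = true;
                continue;
            }
            // On a render error, fall through and keep the TeX source as text.
        }
        html.push_str("<p>");
        html.push_str(t);
        html.push_str("</p>");
    }
    Prerendered { html, needs_katex_css }
}
```

The site then ships plain HTML plus KaTeX's CSS and fonts, and no JavaScript runs in the reader's browser to typeset math.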

And now we can do this:

$$ \int_{-\infty}^{\infty} e^{-x^2} \, dx = \sqrt{\pi} $$

Other things

Self-hosting assets like fonts and icons may improve performance, since the browser doesn’t have to connect to a third-party CDN. Besides, having a fully self-contained site is nice.

Modern font formats like WOFF2 take less space and are now widely supported, making them an easy choice. File size can be reduced further via subsetting [2]. Using a variable font keeps the codebase cleaner, as you don’t have to store separate files for regular, bold, italic, etc. Fontshare is a good resource.
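Subsetting requires knowing which characters the site actually uses. Here is a std-only sketch of that first step; the helper names are hypothetical. It gathers every distinct printable character across the generated pages and formats them as `U+XXXX` code points. That is the kind of list a subsetting tool such as fontTools' `pyftsubset` takes via its `--unicodes` option, but check that tool's docs for the exact syntax.

```rust
use std::collections::BTreeSet;

/// Collects every distinct printable character across `pages`.
/// BTreeSet keeps them sorted by code point, so the output is stable.
pub fn used_chars<'a, I>(pages: I) -> BTreeSet<char>
where
    I: IntoIterator<Item = &'a str>,
{
    pages
        .into_iter()
        .flat_map(|page| page.chars())
        .filter(|c| !c.is_control()) // skip newlines, tabs, etc.
        .collect()
}

/// Formats the set as comma-separated `U+XXXX` code points.
pub fn unicode_list(chars: &BTreeSet<char>) -> String {
    chars
        .iter()
        .map(|&c| format!("U+{:04X}", c as u32))
        .collect::<Vec<_>>()
        .join(",")
}
```

Running this over the rendered output rather than the markdown source is safer, because rendering can introduce characters that aren't in the source.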

End result

Overall I’m glad I chose this as a starter project, and I’m quite impressed with the quality of Rust tooling and documentation. Performance isn’t bad either: generating this site took 145 ms on an AMD Ryzen 5 5600X, and Google PageSpeed seems happy too!

pagespeed.web.dev score

Footnotes


1

While Markdown itself has no formal specification, CommonMark does, but that spec doesn’t mention math once. Unfortunately, the popular $ delimiter allows for ambiguities (e.g. a sentence mentioning prices like $5 and $10), which is a no-no for a spec.

For pulldown-cmark there’s lots of discussion on the issue going back to 2015 (!). However, there’s good progress on an extension.

2

This article on subsetting is great. However, there don’t seem to be great Rust libraries to do it seamlessly, and it might break things if you’re not careful, all to save what is likely under 50 KB.