To become familiar with Rust I implemented a static site generator as my first project. Below is a set of notes which I would’ve found helpful when I started. Code is available here.
Static sites
I’ve always liked the simplicity of static websites where all pages are generated ahead of time instead of waiting on a back-end server to generate pages on demand. Static sites are also a perfect fit for content delivery networks (CDNs), which enable great performance.
Static site generators usually accept a simple text format like markdown as input to generate pages. A simple input format eliminates most of the tool/framework lock-in and substantially increases the probability of your content being usable 10 or 20 years down the line. This is probably why note-taking software like Obsidian has become so popular.
Writing your own generator seems like a good starter project since it has a well-defined outcome and lies in the Goldilocks zone in terms of size - big enough to expose you to most of the language features, but not so big that it overwhelms you and takes months to finish. Plus it allows you to include only the site functionality you want and learn a bit about web development along the way.
The plan
Write a static site generator from scratch in Rust for a blog that I might use at times. It should
- be simple to modify
- produce a fast, minimal site
- support code syntax highlighting and equations
Your own site generator
To generate a static site we:
- Parse markdown files and transform them into html with a library like pulldown-cmark
- Insert them into html templates with a templating library; tera is a popular choice
- Create site index, copy over assets
To make the site pretty we add a style.css, a font or two and we’re done with the basics.
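The templating and index steps above can be sketched with the standard library alone. This is only an illustration: plain string replacement stands in for a real templating library like tera, and the `Post` struct, its field names and the `{{ title }}` / `{{ content }}` placeholders are assumptions made for the example, not taken from the actual code.

```rust
/// One rendered post. The struct and its field names are hypothetical.
struct Post {
    title: String,
    slug: String,
    body_html: String,
}

/// Fill a page template by plain string substitution.
/// This stands in for a real templating library such as tera and
/// does no HTML escaping of the title.
fn render_page(template: &str, post: &Post) -> String {
    template
        .replace("{{ title }}", &post.title)
        .replace("{{ content }}", &post.body_html)
}

/// Build the site index as an HTML list with one link per post.
fn render_index(posts: &[Post]) -> String {
    let mut html = String::from("<ul>\n");
    for post in posts {
        html.push_str(&format!(
            "  <li><a href=\"{}.html\">{}</a></li>\n",
            post.slug, post.title
        ));
    }
    html.push_str("</ul>\n");
    html
}
```

In a real generator `body_html` would come from the markdown parser, and each rendered page would be written to `<slug>.html` next to the copied assets.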
Syntax highlighting
By default pulldown-cmark detects code blocks; however, it won’t highlight the syntax. Luckily this is simple to adjust: pulldown-cmark outputs events during parsing, one of which is a code block. Once we hit this event we can override the default behaviour by feeding the code block string into a highlighter and outputting colorised html. We can also set the code block font to JetBrains Mono which looks better and supports ligatures.
This gives us nice code snippets:
pub struct EventIterator<'a, I: Iterator<Item = Event<'a>>> {
    parser: MultiPeek<I>,
    has_katex: bool,
    image_scale: HashMap<String, f64>,
}
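The idea of overriding the code block event can be sketched as follows. To keep the example dependency-free, the `Event` enum here is a simplified stand-in and not pulldown-cmark's real `Event` type, and `highlight` is a placeholder that only escapes the code instead of colouring it. Both names are assumptions for illustration.

```rust
/// Simplified stand-in for a markdown parser's event stream.
/// This is NOT pulldown-cmark's real `Event` type.
#[derive(Debug, PartialEq)]
enum Event {
    StartCodeBlock(String), // language tag, e.g. "rust"
    EndCodeBlock,
    Text(String),
    Html(String),
}

/// Placeholder highlighter: escapes the code and wraps it in <pre><code>.
/// A real highlighter would emit coloured <span> elements instead.
fn highlight(lang: &str, code: &str) -> String {
    let escaped = code
        .replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;");
    format!("<pre><code class=\"language-{}\">{}</code></pre>", lang, escaped)
}

/// Replace every code block (start, text, end) with a single Html event.
/// All other events pass through unchanged.
fn highlight_code_blocks(events: impl Iterator<Item = Event>) -> Vec<Event> {
    let mut out = Vec::new();
    // Some((lang, code)) while we are inside a code block.
    let mut current: Option<(String, String)> = None;
    for event in events {
        match event {
            Event::StartCodeBlock(lang) => current = Some((lang, String::new())),
            Event::Text(text) => match current.as_mut() {
                // Inside a code block: collect the text instead of emitting it.
                Some((_, code)) => code.push_str(&text),
                None => out.push(Event::Text(text)),
            },
            Event::EndCodeBlock => {
                if let Some((lang, code)) = current.take() {
                    out.push(Event::Html(highlight(&lang, &code)));
                }
            }
            other => out.push(other),
        }
    }
    out
}
```

The same pattern of buffering events between a start and an end marker and then emitting one html event also fits the image handling described in the next section.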
Images
pulldown-cmark parses embedded images just fine, but it’s worthwhile to add some extra logic to make the embedding process smoother and improve site performance:
- By default the image caption won’t be transferred to html even though the caption is captured as a part of the Image event. Like with syntax highlighting we can capture the parsing event and output adjusted html which includes the caption
- When writing a post I want to paste images from the clipboard and not worry about resizing them manually. So I adjusted the image parser to look for an extra tag after the image. For example, ![cool](images/backflip.jpg){width=50%} would resize the image by half which makes life easier and saves a lot of bandwidth
- To further reduce bandwidth I’ve added conversion of png and jpeg to webp
Now the above command works:
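For illustration, here is a minimal sketch of how a `{width=50%}` tag following an image could be parsed into a scale factor. The function names are hypothetical, and the sketch assumes the tag always has the exact form `{width=NN%}` and that the height is scaled by the same factor. The real parser may differ.

```rust
/// Parse a leading `{width=NN%}` tag and return the scale factor
/// (0.5 for 50%) together with the text that follows the tag.
/// Returns None if the text does not start with a valid tag.
fn parse_width_tag(text: &str) -> Option<(f64, &str)> {
    let rest = text.strip_prefix("{width=")?;
    let (value, rest) = rest.split_once("%}")?;
    let percent: f64 = value.trim().parse().ok()?;
    // Reject zero, negative, NaN and anything above 100%.
    if percent > 0.0 && percent <= 100.0 {
        Some((percent / 100.0, rest))
    } else {
        None
    }
}

/// Scale pixel dimensions by the same factor on both sides,
/// keeping at least one pixel per side.
fn scaled_size(width: u32, height: u32, scale: f64) -> (u32, u32) {
    let scale_side = |side: u32| ((side as f64 * scale).round() as u32).max(1);
    (scale_side(width), scale_side(height))
}
```

The parsed factor could then be stored per image path, for example in a map like the `image_scale` field shown earlier, and applied when the image is resized and converted.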
Math equations
To render math equations we can use katex. But unlike code blocks pulldown-cmark won’t automatically detect or render math due to lack of standardisation [1], so we have to roll our own.
To delineate math blocks in markdown I use single dollar signs for inline math $ x^2 $ and double for display mode math $$ x^2 $$. We can then detect math blocks by checking each paragraph for $.
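A minimal sketch of this delimiter scan is shown below. It splits a paragraph into plain text, inline math and display math. The `Segment` enum and `split_math` function are hypothetical names, and the sketch deliberately ignores escaping, so it is a simplified illustration rather than the actual implementation.

```rust
/// A piece of a paragraph after scanning for math delimiters.
#[derive(Debug, PartialEq)]
enum Segment<'a> {
    Text(&'a str),
    InlineMath(&'a str),
    DisplayMath(&'a str),
}

/// Split a paragraph into plain text and math segments.
/// `$$ ... $$` is display math, `$ ... $` is inline math.
/// An opening delimiter without a closing one is kept as plain text.
fn split_math(paragraph: &str) -> Vec<Segment<'_>> {
    let mut segments = Vec::new();
    let mut rest = paragraph;
    while let Some(start) = rest.find('$') {
        // Check for the double delimiter first so `$$` is not read as empty inline math.
        let delim = if rest[start..].starts_with("$$") { "$$" } else { "$" };
        let body_start = start + delim.len();
        let body_len = match rest[body_start..].find(delim) {
            Some(len) => len,
            None => break, // no closing delimiter: the remainder stays text
        };
        if start > 0 {
            segments.push(Segment::Text(&rest[..start]));
        }
        let body = rest[body_start..body_start + body_len].trim();
        segments.push(if delim == "$$" {
            Segment::DisplayMath(body)
        } else {
            Segment::InlineMath(body)
        });
        rest = &rest[body_start + body_len + delim.len()..];
    }
    if !rest.is_empty() {
        segments.push(Segment::Text(rest));
    }
    segments
}
```

Note that a sentence containing two ordinary dollar amounts would be treated as inline math by this scan, which is the kind of ambiguity the $ delimiter is known for.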
To render the equations we could leave the math blocks alone and include the katex auto-render javascript in our site. But in the spirit of static generation, we can render the equations ahead of time with the katex Rust bindings and include the necessary fonts/css in our site instead of going to their CDN.
And now we can do this:
Other things
Self-hosting your assets like fonts and icons may improve performance since it avoids going to a third-party CDN. Besides, having a fully self-contained site is nice.
Modern font formats like WOFF2 take less space and nowadays have good browser support, making them an easy choice. Font file size can be further reduced via subsetting [2]. Using a variable font keeps the codebase cleaner as you don’t have to store separate files for italic, bold, regular etc. Fontshare is a good resource.
End result
Overall I’m glad I chose this as a starter project and I’m left quite impressed with the quality of Rust tooling and documentation. Performance isn’t bad either - generating this site took 145ms on AMD Ryzen 5600x and Google PageSpeed seems happy too!
Footnotes
1. While Markdown is not a proper specification, CommonMark is, but the spec doesn’t mention math once. Unfortunately, the popular $ delimiter allows for ambiguities which is a no-no for a spec. For pulldown-cmark there’s lots of discussion on the issue going back to 2015 (!). However, there’s good progress on an extension.
2. This article on subsetting is great. However, there don’t seem to be great Rust libraries to do it seamlessly and it might break things if you’re not careful, all to save what is likely less than ~50kb.