This summer I worked on a side project which is currently ~5,500 lines of code, written in Go. I was the sole author, besides libraries I imported.
The project started out as a way to process information about my notes in Obsidian, an app where you create a "vault" of markdown notes by creating links to attachments and other notes.
At some point I decided to just develop the core features needed to maintain a vault for markdown notes-- tracking links between notes, providing an API to efficiently resolve links (which include shorthand links which resolve to the shortest-path matching note from the root), rename and move files. I now have the core features for the backend of a markdown-based note-taking app, though I don't have a front end.
To demonstrate some capabilities of this project, I wrote a program which uses it to export a markdown vault to an HTML static site-- which is what made this site.
Support for inline Latex
Support for [[wikilinks]]
Frontend
One of my goals was to have low asymptotic complexity for all operations. The most common operation in a collection of markdown notes is resolving "short-links": links in the form "path/to/file.md"
resolve to the file closest to the root directory whose filepath has suffix "path/to/file.md"
. Doing graph-search on the entire vault, in the worst case, requires visiting each file in the vault. The number of files in a vault could be exponential in the length of the path if each folder has multiple subfolders.
My goal for this operation was to have complexity cost proportional to the length of the short-path provided (the number of "/"
in the path). To meet this goal, I designed a data-structure which I call "augmented suffix trees".
First I implemented a suffix tree: a tree whose root represents the empty suffix, ""
, and the children of each node corresponds to adding onto the suffix we are searching for. I split file-paths by the delimiter, "/", so "path/to/file.md"
becomes ["path", "to", "file.md"]
. If this path was in the suffix tree, you would access it by going to the "file.md"
child of the root node, then the "to"
child of that node, and finally the "path"
child of that node.
Since I needed the closest match, I could use the suffix tree to get all paths with matching suffix, then return the shortest one (in terms of number of "/"
) to get the file closest to the root with matching suffix. This was inefficient, but provided a starting point for building augmented suffix trees.
I built onto the suffix tree code to make them "augmented" for some associative function f which takes in (k1, v1), (k2, v2): two key-value pairs. The reduce value of a node is equivalent to f(f((k1, v1), (k2, v2))...(kn, vn)) where (k1, v1)... (kn, vn) are the key-value pairs for each of its children. (I got this idea from my functional programming class, 15210, where we used augmented treaps).
The reduce value is cached so that you can access a node's reduce value in O(1). To remain accurate the reduce value must be recomputed each time the reduce value of a child changes-- basically, when you insert/delete a child or change the value stored at the node (in my case, whether there's a filepath terminating at that node).
My current implementation of augmented suffix trees uses an O(# children) approach to recompute the reduce val of a node. This allows me to store the children of the node in a 1-layer hash-map for O(1) access when traversing a level in the tree. To more efficiently recompute the reduce-val, I could store a node's children in a balanced binary search tree, so I would only need to call f O(log(# children)) times-- though this would increase the cost of inserting / deleting a child to O (log(# children)).
Using my augmented suffix tree, which can store arbitrary values for arbitrary keys and take any associative f (plus an identity value), I created an augmented suffix tree whose reduce value is the key of the closest matching filepath. To make my f associative I created a tie-breaking mechanism: if two children have matches the same distance from the root, favor the child whose key is lexicographically first.
This augmented suffix tree became a core data structure in my vault, allowing me to efficiently resolve short paths, and is also a standalone & tested package (using the slower suffix tree to model correct behavior) ready to be used for future projects.
Exported from a markdown vault by EKB's vaultparser