Knowledge management and graphs

These are some notes about my journey so far with using graphs for knowledge management, and an idea for a Memex-like universal graph knowledge management system.

My knowledge management journey so far

I have always been fascinated with knowledge management and the organisation of different artefacts such as songs in my music collection or pages on my websites. From my early days with WordPress I always lamented the distinction between ‘categories’ and ‘tags’ and the purposeful splitting of capabilities across two different features or tools, in place of a single unified approach.

At their most basic, tag systems can be modelled as a bipartite graph. Every vertex in the graph is either an ‘object’ of interest, such as an entry in a blog, or a ‘tag’, a text label that can be associated with objects. Objects and tags form a many-to-many relationship, and objects are associated with each other through the tags rather than directly. Usually tags cannot be associated with each other or structured in any way, although more complex taxonomic or categorical systems allow the tags to be assembled into a hierarchy, such that if an object has a tag which itself has a parent tag, then that object is transitively tagged with that parent tag as well.
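As a minimal sketch of this model in Python: objects and tags are the two sides of the bipartite graph, and an optional parent relation between tags gives the transitive tagging described above. The class name, method names and example data are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


class TagGraph:
    """Bipartite object/tag graph with an optional tag hierarchy (illustrative only)."""

    def __init__(self):
        self.tags_of = defaultdict(set)  # object -> tags attached directly
        self.parent_of = {}              # tag -> parent tag, if any

    def tag(self, obj, tag):
        self.tags_of[obj].add(tag)

    def set_parent(self, tag, parent):
        self.parent_of[tag] = parent

    def all_tags(self, obj):
        """Direct tags plus every ancestor reached through the hierarchy."""
        result = set()
        for tag in self.tags_of.get(obj, ()):
            # Walk up the hierarchy; stop at the root or at a tag already seen.
            while tag is not None and tag not in result:
                result.add(tag)
                tag = self.parent_of.get(tag)
        return result

    def related(self, obj):
        """Objects that share at least one (transitive) tag with obj."""
        mine = self.all_tags(obj)
        return {other for other in self.tags_of
                if other != obj and mine & self.all_tags(other)}


g = TagGraph()
g.tag("fondue recipe", "cheese")
g.tag("knife skills", "cooking")
g.set_parent("cheese", "food")
g.set_parent("cooking", "food")

print(sorted(g.all_tags("fondue recipe")))  # ['cheese', 'food']
print(sorted(g.related("fondue recipe")))   # ['knife skills']
```

The two objects never link to each other directly; they are related only because both are transitively tagged with the hypothetical parent tag.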

Category systems differ from most tag systems by allowing the user to construct a hierarchy of categories with transitive membership, such that if some category B is a subcategory of another category A, then if an object is categorised as B it will also be transitively categorised as A. Some category systems are many-to-one, and only allow an object to be a member of a single category.

I used to make heavy use of del.icio.us¹ many years ago and I was fascinated by the idea of ‘folksonomy’: an organically evolving, user-constructed classification system. As far as I remember, del.icio.us’ tag system was flat, and tags could not be associated with each other. However, I did wonder what the site would look like if users could construct their own tag hierarchies and share these with other people. For example, I may tag some links with ‘cooking’ and ‘cheese’ and assemble these under a super-tag ‘food’. I could then publish my taxonomy, and other people could explore their own links and tags through this hierarchy.

A couple of years ago I discovered the zettelkasten system of note-taking and I adore it. I would summarise the two key properties of a zettelkasten as short notes, and links. I have yet to dive deeper into the short notes aspect, but the emphasis on links as the primary organisation system has been revolutionary for me. Links subsume both tags and categories in their entirety, because arbitrary linking is much more generic than these other systems. If I want to tag some articles as ‘food’, I can just create a ‘food’ page and then link to this ‘tag-page’ from the other pages. This is exactly how I organise this website, using a set of pages which can freely link to each other, and then creating some empty pages to serve as categories or tags. This allows me to organise pages and tags hierarchically and to break out of the hierarchy and link between pages when necessary.
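To illustrate how plain links can stand in for tags and categories, here is a small sketch in Python. The page names are made up, and this is not how this website is actually implemented. A ‘tag-page’ is just an ordinary page, and the pages ‘tagged’ with it are its backlinks.

```python
from collections import defaultdict

# Hypothetical pages and their outgoing links. 'cheese' and 'food' are
# near-empty pages that exist mainly to be linked to.
links = {
    "fondue-recipe": {"cheese", "alps-trip"},  # a tag-like link and a free link
    "alps-trip": set(),
    "cheese": {"food"},                        # hierarchy: cheese sits under food
    "food": set(),
}


def backlinks(links):
    """Invert the link graph: for each page, which pages link to it?"""
    incoming = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            incoming[target].add(source)
    return incoming


incoming = backlinks(links)
print(sorted(incoming["cheese"]))  # ['fondue-recipe']
print(sorted(incoming["food"]))    # ['cheese']
```

The same structure holds tagging (fondue-recipe to cheese), hierarchy (cheese to food) and free association (fondue-recipe to alps-trip) without distinguishing between them.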

I use Zotero for handling my growing collection of bookmarks, and I freely combine academic and non-academic material. I do not worry about navigating this network of resources because I can lean on my tags, and because Zotero’s search also looks inside abstracts. Zotero suffers from the same problem as WordPress, with a mix of a hierarchical ‘collections’ system and a flat ‘tags’ system. I previously organised my Zotero library using a hierarchy of collections, but I found this frustrating when I wanted to add something from my browser (via the browser extension) to multiple collections. Now I use tags, but because I can’t build a hierarchy of tags my tagging is hit-and-miss: an item might call for a list of tags like ‘policy, public policy, public health policy, public health’, and I don’t always apply every relevant tag because I can’t remember them all! Fortunately I don’t think this has caused too much pain so far. Zotero does support ‘relating’ items to each other, so I could hack together a zettelkasten-inspired link structure, but I worry about how this would interact with other features like search, so for now I am sticking with tags.

On the desktop, files are largely bound to the hierarchical system of the directory structure, which permits a file to live in only a single category. Symbolic links can be used to break out of this and effectively place a file in multiple directories. There are also tag-based overlays on the filesystem such as TMSU², and file indexing systems which allow searching for files by metadata or contents. However, I have again found that the restrictive hierarchy isn’t too painful, as I have managed to develop a good hierarchy to position the many files I work with most often.

Despite the lack of pain with current approaches, I can’t help but wonder what I and other computer users might be missing without a more flexible system, and what the world (or at least our relationships with computers, data, and knowledge) would look like if link-based or zettelkasten-like approaches were more common. My friend Dale has a philosophy that users of products can become blind to the limitations of those products, and that we don’t always know how much better our situation could be until we’ve tried something new or different. This was very much my experience with true wireless earbuds. I was quite happy with my regular wired in-ear headphones, even though I would get some cable noise, and even though it would take a minute or so to thread the cable down my jacket so that I wasn’t worried about catching it on anything. In the context of the wired headphones themselves, these were just minor inconveniences, but when I tried some true wireless earbuds, this completely recontextualised those issues. Suddenly, it felt so much easier and more convenient to use my headphones when I didn’t have to deal with a cable, even if this only saved me a few seconds per day, and as a result I make much more use of the wireless headphones than I ever did with the wired ones.

Content-aware linking

In addition to my website I make heavy use of link-based organisation for my PhD research, where I use a Foam workspace³ to organise and connect my notes. With research it is necessary to make use of many different types of data and documents, and I wonder how links can be overlaid on top of different resources. An initial idea would be a tool like TMSU which allows linking from one file to another via an overlay database, but I think the real benefit would come from a content-aware link overlay system.

In Foam, links are embedded in Markdown documents, such that I can write a sentence like Potatoes are a kind of <<vegetable>>⁴, and the bracketed word is interpreted as a reference to a ‘vegetable’ file, which links these two files together. Because the link is embedded directly in the document from which it originates, it contextualises the link, and promotes the creation of links as part of the natural writing process. This is one of the characteristics of wikilinks⁵. Foam even allows linking to particular sections of a target Markdown document⁶, which with sufficiently small sections or files (i.e. fully embracing the zettelkasten approach) results in links which are contextualised at both ends.
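A rough sketch of how such embedded links could be pulled out of a document, written in Python. It uses the double angle bracket stand-in from this article rather than Foam’s real syntax, and it assumes a `target#section` form for section links. This is not Foam’s parser, and the function name is hypothetical.

```python
import re

# Matches <<target>> or <<target#section>>: the angle-bracket stand-in used
# in this article, not Foam's own syntax.
WIKILINK = re.compile(r"<<([^<>#]+)(?:#([^<>]+))?>>")


def extract_links(text):
    """Return (target, section) pairs; section is None when no section is given."""
    pairs = []
    for match in WIKILINK.finditer(text):
        target = match.group(1).strip()
        section = match.group(2).strip() if match.group(2) else None
        pairs.append((target, section))
    return pairs


print(extract_links("Potatoes are a kind of <<vegetable>>."))
# [('vegetable', None)]
print(extract_links("Energy is produced in the <<cell#Mitochondria>>."))
# [('cell', 'Mitochondria')]
```

Because the link is found inside the sentence, the surrounding text can be stored with it as the context at the source end, and the section name gives the context at the target end.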

I wonder if such a content-aware linking mechanism could be moved into an overlay system like TMSU, and made to support different file types. For example, could I select a paragraph of text in a PDF, and then link this to a particular area of an image file? This could allow me to associate research papers with an interactive map of the world based on the geographic origin of their datasets, or to link facts from a biology textbook to a diagram of a cell.

Such a system would confer huge benefits, by allowing any information in any file of any type to be freely associated with any other information in any other file, something which is not currently possible. Transcribers could link each sentence or word to the relevant section of a recording; qualitative researchers could associate codes across images, video, text, …; students could attach notes to any Web page or e-book, and link together concepts across different sources; music collectors could build graphs of bands and band members, which venues they played concerts at and when. Some of these capabilities are already enabled through bespoke pieces of software, but if we had a suitable graph filesystem and applications which supported linking into a file (to a subset of its content), we would get all of these capabilities, and more, for free.
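One way to picture the data such an overlay would need to hold is sketched below in Python. Every link joins two ‘anchors’, and each anchor names a file plus a type-specific description of a part of that file. All of the class names, fields and file paths are hypothetical, and a real system would need many more anchor types than the three shown.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TextQuote:
    """A passage in a text-like file (PDF, Markdown, e-book), identified by its text."""
    path: str
    quote: str


@dataclass(frozen=True)
class ImageRegion:
    """A rectangular area of an image, in pixels."""
    path: str
    x: int
    y: int
    width: int
    height: int


@dataclass(frozen=True)
class TimeRange:
    """A span of an audio or video recording, in seconds."""
    path: str
    start_s: float
    end_s: float


Anchor = Union[TextQuote, ImageRegion, TimeRange]


@dataclass(frozen=True)
class Link:
    source: Anchor
    target: Anchor
    label: str = ""


class LinkOverlay:
    """Stores links outside the files themselves, in the spirit of a TMSU-like overlay."""

    def __init__(self):
        self._links = []

    def add(self, source, target, label=""):
        link = Link(source, target, label)
        self._links.append(link)
        return link

    def touching(self, path):
        """Every link with an anchor in the given file, at either end."""
        return [l for l in self._links if path in (l.source.path, l.target.path)]


overlay = LinkOverlay()
overlay.add(TextQuote("biology.pdf", "The mitochondrion produces ATP"),
            ImageRegion("cell-diagram.png", x=120, y=80, width=60, height=40),
            label="illustrated by")
overlay.add(TextQuote("transcript.md", "So, to begin with"),
            TimeRange("interview.ogg", start_s=0.0, end_s=2.5),
            label="spoken at")

print(len(overlay.touching("cell-diagram.png")))  # 1
```

The overlay itself is simple. The hard part, which this sketch skips entirely, is getting each viewer or editor to create and display anchors for its own file type.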

If the underlying database allowed arbitrary data to be stored in addition to links to files or Web pages, then it would be possible to store metadata in the graph, which opens up further possibilities. For example, Zotero could store item metadata in the graph database and my address book could store contacts there too. I could then store contact information for academic colleagues and have this link to and from the papers they have authored.

Further, if the graph database can be queried, even more potential is unlocked. Perhaps I want to prepare a meal for my friends who are coming over, but some of them have food allergies. I could first query my address book for a list of all unacceptable foods based on who is coming to dinner, and then I could query my recipe collection to exclude all recipes which include any of the forbidden ingredients.
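The dinner example can be sketched as two queries over a toy graph of (subject, predicate, object) triples, here in Python. The people, recipes and predicate names are invented for illustration, and a real graph database would offer a proper query language rather than these helper functions.

```python
# A toy graph: address book facts and recipe facts share one store.
# All names and predicates here are made up.
triples = {
    ("alice", "allergic_to", "peanuts"),
    ("bob", "allergic_to", "shellfish"),
    ("carol", "allergic_to", "gluten"),
    ("satay", "contains", "peanuts"),
    ("satay", "contains", "chicken"),
    ("paella", "contains", "shellfish"),
    ("paella", "contains", "rice"),
    ("risotto", "contains", "rice"),
    ("risotto", "contains", "mushrooms"),
}


def objects(subject, predicate):
    """Everything the subject points at via the given predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def safe_recipes(guests, recipes):
    # Query 1: collect every food that any guest cannot eat.
    forbidden = set().union(*(objects(g, "allergic_to") for g in guests))
    # Query 2: keep only recipes with no forbidden ingredient.
    return sorted(r for r in recipes if not objects(r, "contains") & forbidden)


print(safe_recipes(["alice", "bob"], ["satay", "paella", "risotto"]))
# ['risotto']
```

The useful property is that the address book and the recipe collection do not need to know about each other. They only need to write into the same graph.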

Unfortunately the main difficulty in building such a system is likely to be implementing support for these links in all of the different applications which are used to view and edit different file types.

There is also the need to determine how links should behave when the underlying content is changed, and to develop a method of linking for each file type which is not overly brittle. For example, if I wanted to link to this paragraph, which of these should I store: the paragraph number; the line number; some or all of the text in the paragraph; or a hash of the paragraph text? If the paragraph changes, how does the link remain intact and point to the same paragraph? If the paragraph changes too much, is it the same paragraph? Fortunately this issue could be solved through existing version control mechanisms, and then it would be possible for a user to (perhaps manually) update their links to point to the latest version of each file, when they have time to do so.
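To make the trade-off concrete, here is a small Python sketch comparing two of those options: storing a hash of the paragraph, and storing its text and re-finding it by similarity. The function names and the 0.6 similarity threshold are arbitrary assumptions. It uses `difflib` from the standard library for the fuzzy comparison.

```python
import difflib
import hashlib


def paragraph_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def resolve_by_hash(paragraphs, digest):
    """Exact match only: survives the paragraph moving, but not any edit to it."""
    for index, paragraph in enumerate(paragraphs):
        if paragraph_hash(paragraph) == digest:
            return index
    return None


def resolve_by_quote(paragraphs, quote, threshold=0.6):
    """Pick the most similar paragraph, or None if nothing is similar enough."""
    best_index, best_score = None, 0.0
    for index, paragraph in enumerate(paragraphs):
        score = difflib.SequenceMatcher(None, quote, paragraph).ratio()
        if score > best_score:
            best_index, best_score = index, score
    return best_index if best_score >= threshold else None


old = ["Tags are flat labels.",
       "Links can point to any paragraph in any file."]
# The document is edited: a paragraph is inserted and the target is reworded.
new = ["A new introduction.",
       "Tags are flat labels.",
       "Links can point to any paragraph in any document."]

print(resolve_by_hash(new, paragraph_hash(old[1])))  # None
print(resolve_by_quote(new, old[1]))                 # 2
```

A stored paragraph number would now point at the wrong paragraph, and the hash finds nothing, while the stored text still finds its target. The threshold is where the ‘is it still the same paragraph?’ question turns into a number, and the right value is far from obvious.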

References and footnotes

¹ A snapshot is available at the Internet Archive at https://web.archive.org/web/20080731101324/http://del.icio.us/.
² https://tmsu.org/
³ https://foambubble.github.io/foam/
⁴ Foam defaults to parsing double square brackets as wikilinks, and I have used the same syntax for this website. Due to limitations in my own code for parsing out wikilinks I cannot write an example with square brackets or it will get parsed as an actual link! Therefore in these examples I use double angle brackets as an alternative.
⁵ https://en.wikipedia.org/wiki/Hyperlink#Wikis
⁶ https://foambubble.github.io/foam/wikilinks#support-for-sections