As part of HoloREA I’ve been working on a simple storage abstraction that functions like a graph database: records have stable IDs while their content changes over time, and the basic means of arranging data is as blobs (nodes) with ‘edges’ between them.
You can check out the code here.
This is basically the framework that naturally emerged when attempting to use the hdk methods directly and build the simplest system possible. Some of the features it offers are subtle, but we believe they are important to the semantic integrity of data in Holochain applications.
In Holochain, new data can be provided to update entries over time. hApps need to be aware of this linkage between records in order to determine what is the ‘same’ record and what is something new. More tangibly: suppose I have a blog post in one network linking to several categories stored in another network. If the blog post’s ID changes as it is updated over time, it becomes impossible for the ‘categories’ network to determine how many blog posts are filed under each category.
By adding stable IDs to records (referred to as ‘base entries’ in the codebase) and linking all references to this ‘base’ record rather than the entry content hash, we:
- Ensure that external networks are able to reason about the logical linkages between their data and our own
- Ensure that the content of an entry is efficiently retrievable (1 hop / link)
- Ensure that the content of an entry can be retrieved by following the entry update metadata as normal
- Ensure that lower-level Holochain tooling is able to reason about our records and their history in a way that preserves meaning
- Allow for linear lookups to read referenced data addresses (1 read / link), following the HDK links API
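To make the base entry idea concrete, here is a minimal in-memory sketch of the pattern. It does not use the HDK at all: `Store`, `address_of` and every method name are hypothetical, and a `u64` hash stands in for a real entry address. It only illustrates that links point at a stable ID, and that the current content is one hop away from that ID.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy stand-in for a content hash; not a real Holochain address.
type Address = u64;

/// Derive a toy content address from a string.
fn address_of(content: &str) -> Address {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical in-memory model of the pattern (no HDK calls involved).
#[derive(Default)]
struct Store {
    /// Content-addressed entries: content hash -> content.
    entries: HashMap<Address, String>,
    /// One pointer per base entry: stable ID -> current content hash.
    latest: HashMap<Address, Address>,
    /// Edges between records, keyed and valued by stable IDs only.
    links: HashMap<Address, Vec<Address>>,
}

impl Store {
    /// Store new content and return the stable ID of its base entry.
    fn create(&mut self, content: &str) -> Address {
        let entry = address_of(content);
        // The base ID is derived once from the initial content and never changes.
        let base = address_of(&format!("base:{}", entry));
        self.entries.insert(entry, content.to_string());
        self.latest.insert(base, entry);
        base
    }

    /// Point an existing base entry at new content; false if the ID is unknown.
    fn update(&mut self, base: Address, content: &str) -> bool {
        if !self.latest.contains_key(&base) {
            return false;
        }
        let entry = address_of(content);
        // Older versions stay in `entries`, so history is not lost.
        self.entries.insert(entry, content.to_string());
        self.latest.insert(base, entry);
        true
    }

    /// Resolve a stable ID to its current content in one hop.
    fn read(&self, base: Address) -> Option<&str> {
        let entry = self.latest.get(&base)?;
        self.entries.get(entry).map(|s| s.as_str())
    }

    /// Record an edge between two records using their stable IDs.
    fn link(&mut self, from: Address, to: Address) {
        self.links.entry(from).or_default().push(to);
    }

    /// List the stable IDs that `from` links to (empty if there are none).
    fn linked(&self, from: Address) -> &[Address] {
        self.links.get(&from).map(|v| v.as_slice()).unwrap_or(&[])
    }
}
```

In this model a ‘category’ record can `link` to the ID returned by `create` for a blog post, and that link keeps resolving to the newest content after any number of `update` calls. Note that creating two records with identical initial content yields the same base ID here, which is one reason the uniqueness advice further down matters.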
Ideological considerations:
This is an opinionated implementation and something that should be carefully considered before adopting. Treating multiple versions of something as the same record is an ideological decision.
Implementation considerations:
When implementing records it is advised that every newly created record be unique. That is, some content that makes the entry unique, for example a creation timestamp, should be stored as part of the entry data. This is important when considering deletions and resolving conflicts: usually, two records created at separate moments in time should not be considered the ‘same’ record.
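As a rough illustration of why the timestamp matters, the sketch below hashes a hypothetical `NoteEntry` struct with Rust’s standard hasher as a stand-in for content addressing. The type, its field names and `content_address` are assumptions made for the example, not part of the HoloREA code or the HDK.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical entry type; the field names are illustrative only.
#[derive(Hash)]
struct NoteEntry {
    body: String,
    /// Creation time in milliseconds, stored purely to make the entry unique.
    created_at_ms: u64,
}

/// Toy content address: equal field values always give the same address.
fn content_address(entry: &NoteEntry) -> u64 {
    let mut hasher = DefaultHasher::new();
    entry.hash(&mut hasher);
    hasher.finish()
}

/// Build an entry stamped with the given creation time.
fn new_note(body: &str, created_at_ms: u64) -> NoteEntry {
    NoteEntry {
        body: body.to_string(),
        created_at_ms,
    }
}
```

Two notes with the same body but different `created_at_ms` values get different addresses, so they remain separate records. Two notes with identical fields collapse to a single address, which is exactly the situation the advice above is meant to avoid.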
It is as yet unknown how Holochain core will function in regard to deletion conflicts when re-adding identical data to that of a previously created record, especially as a result of network partitions.
Another consideration relates to deletions, and how your app handles access to archival information. You may wish to remove the records but leave the links themselves preserved (so that a UI can explore deleted data on demand), or you may wish to wipe the links. In either case, you may also want to provide API access for loading deleted and archived versions of content from the DHT.
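The two deletion options can be sketched with a small in-memory model. Everything here (`Archive`, `LinkPolicy`, the method names, integer IDs) is hypothetical and only illustrates the choice; it says nothing about how Holochain itself stores deletions.

```rust
use std::collections::HashMap;

/// What to do with a record's outgoing links when it is deleted.
enum LinkPolicy {
    Preserve,
    Wipe,
}

/// Hypothetical store that keeps deleted content available for archival reads.
#[derive(Default)]
struct Archive {
    live: HashMap<u64, String>,
    deleted: HashMap<u64, String>,
    links: HashMap<u64, Vec<u64>>,
}

impl Archive {
    /// Add a live record together with its outgoing links.
    fn insert(&mut self, id: u64, content: &str, links: Vec<u64>) {
        self.live.insert(id, content.to_string());
        self.links.insert(id, links);
    }

    /// Move a record to the archive; returns false if it was not live.
    fn delete(&mut self, id: u64, policy: LinkPolicy) -> bool {
        match self.live.remove(&id) {
            Some(content) => {
                self.deleted.insert(id, content);
                if let LinkPolicy::Wipe = policy {
                    self.links.remove(&id);
                }
                true
            }
            None => false,
        }
    }

    /// Normal read path: only live records are visible.
    fn read(&self, id: u64) -> Option<&str> {
        self.live.get(&id).map(|s| s.as_str())
    }

    /// Archival read path: load the content of a deleted record.
    fn read_archived(&self, id: u64) -> Option<&str> {
        self.deleted.get(&id).map(|s| s.as_str())
    }

    /// Outgoing links of a record, whether it is live or deleted.
    fn links_of(&self, id: u64) -> &[u64] {
        self.links.get(&id).map(|v| v.as_slice()).unwrap_or(&[])
    }
}
```

With `LinkPolicy::Preserve` a UI can still walk from a deleted record to whatever it referenced; with `LinkPolicy::Wipe` only the archived content remains reachable through `read_archived`.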
Indexing and more complex lookups between records are left to other libraries, probably with things like Willem’s pluggable Collections API. There may be an avenue towards implementing this feature as a plug-in zome, such that it can be modularly attached to HoloREA record storage zomes and handle the queries between entries.
Where to from here?
There are still a couple of things to clean up around link handling and the way deletions are managed, but the basic pattern is established. I’m eager for community feedback on this!
The other thing I’m wondering is whether there is potential here for adoption by the HDK team as a companion library for hdk::utils, or whether it should be left as a third-party project. I suppose the answer to that question depends on how opinionated or idiomatic others consider the design to be.