Graph database abstraction

pospi · August 16, 2019, 6:55am

As part of HoloREA I’ve been working on a simple storage abstraction that functions like a graph database: records have stable IDs while their content changes over time, and the basic means of arranging data is as nodes (blobs of data) with ‘edges’ between them.

You can check out the code here.

This is basically the framework that naturally emerged when attempting to use the hdk methods directly and build the simplest system possible. Some of the features it offers are subtle, but we believe they are important to the semantic integrity of data in Holochain applications.

In Holochain, new data can be provided to update entries over time. hApps need to be aware of this linkage between records in order to determine what is the ‘same’ record and what is something new. More tangibly: if I have a blog post in one network linking to several categories stored in another network, and the blog post’s ID changes as it is updated over time, then it becomes impossible for the ‘categories’ network to determine how many blog posts are filed under each category.

By adding stable IDs to records (referred to as ‘base entries’ in the codebase) and linking all references to this ‘base’ record rather than the entry content hash, we:

Ensure that external networks are able to reason about the logical linkages between their data and our own
Ensure that the content of an entry is efficiently retrievable (1 hop / link)
Ensure that the content of an entry can be retrieved by following the entry update metadata as normal
Ensure that lower-level Holochain tooling is able to reason about our records and their history in a way that preserves meaning
Allow for linear lookups to read referenced data addresses (1 read / link), following the HDK links API
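To make the pattern above concrete, here is a minimal in-memory sketch. It does not use the HDK or the HoloREA library; `Store`, `address_of`, and the choice to derive the base entry from the initial content address are all assumptions made for illustration, and `DefaultHasher` only stands in for Holochain's content hashing.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for a content address (hypothetical; real addresses are cryptographic hashes).
type Address = u64;

/// Hash a value together with a tag so that entries and base entries never share an address space.
fn address_of<T: Hash>(tag: &str, value: &T) -> Address {
    let mut hasher = DefaultHasher::new();
    tag.hash(&mut hasher);
    value.hash(&mut hasher);
    hasher.finish()
}

#[derive(Default)]
struct Store {
    /// Content-addressed entry bodies: the address changes whenever the content changes.
    entries: HashMap<Address, String>,
    /// One link per base entry, pointing at the current content entry.
    latest: HashMap<Address, Address>,
}

impl Store {
    /// Create a record and return its stable ID (the 'base entry' address).
    fn create(&mut self, content: &str) -> Address {
        let entry_addr = address_of("entry", &content);
        self.entries.insert(entry_addr, content.to_string());
        // Assumption: the base entry wraps the address of the initial content and never changes.
        let base_addr = address_of("base", &entry_addr);
        self.latest.insert(base_addr, entry_addr);
        base_addr
    }

    /// Store new content and repoint the base entry's link; the stable ID is unchanged.
    fn update(&mut self, base: Address, content: &str) -> Option<Address> {
        if !self.latest.contains_key(&base) {
            return None;
        }
        let entry_addr = address_of("entry", &content);
        self.entries.insert(entry_addr, content.to_string());
        self.latest.insert(base, entry_addr);
        Some(entry_addr)
    }

    /// Resolve a stable ID to the current content by following a single link.
    fn read(&self, base: Address) -> Option<&str> {
        let entry_addr = self.latest.get(&base)?;
        self.entries.get(entry_addr).map(String::as_str)
    }
}
```

An external network would only ever store the value returned by `create`, so its references keep resolving after any number of `update` calls.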

Ideological considerations:

This is an opinionated implementation and something that should be carefully considered before adopting. Conceptually considering multiple versions of something as the same record is an ideological decision.

Implementation considerations:

When implementing records it is advised that every newly created record be unique. That is, some content should be stored as part of the entry data in order to make it unique, for example a creation timestamp. This is important when considering deletions and resolving conflicts: usually, two records created at separate moments in time should not be considered the ‘same’ record.
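A rough illustration of why this matters in a content-addressed store: identical entry data always hashes to the same address, so two logically separate records collapse into one unless something distinguishes them. The `PostEntry` struct, its `created_at_ms` field, and `content_address` are hypothetical names for this sketch, and `DefaultHasher` again stands in for real content hashing.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical entry type; `created_at_ms` exists only to make each new record unique.
#[derive(Hash)]
struct PostEntry {
    title: String,
    body: String,
    created_at_ms: u64,
}

/// Stand-in for content addressing: equal data always yields an equal address.
fn content_address<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Build an entry with the creation time supplied by the caller.
fn new_post(title: &str, body: &str, created_at_ms: u64) -> PostEntry {
    PostEntry {
        title: title.to_string(),
        body: body.to_string(),
        created_at_ms,
    }
}
```

Two posts with the same title and body but different `created_at_ms` values get different addresses, so deleting one does not affect the other.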

It is as yet unknown how Holochain core will function in regard to deletion conflicts when re-adding identical data to that of a previously created record, especially as a result of network partitions.

Another consideration relates to deletions, and how your app handles access to archival information. You may wish to remove the records but leave the links themselves preserved (so that UI can explore deleted data on demand), or you may wish to wipe the links. In either case, you may also want to provide API access for loading deleted and archived versions of content from the DHT.
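The two deletion strategies can be compared with a small in-memory sketch. Everything here (`Graph`, `DeletePolicy`, the `archived` map) is a hypothetical model for discussion, not the library's API or Holochain's actual deletion behaviour.

```rust
use std::collections::{HashMap, HashSet};

/// What to do with a record's links when the record is deleted.
#[derive(Clone, Copy, PartialEq)]
enum DeletePolicy {
    /// Keep links so a UI can still discover deleted data on demand.
    KeepLinks,
    /// Remove every link that touches the deleted record.
    WipeLinks,
}

#[derive(Default)]
struct Graph {
    /// Records visible through normal reads.
    live: HashMap<u32, String>,
    /// Deleted records, still loadable through an explicit archival read.
    archived: HashMap<u32, String>,
    /// Directed edges between record IDs, stored as (from, to).
    links: HashSet<(u32, u32)>,
}

impl Graph {
    /// Move a record to the archive and apply the chosen link policy.
    fn delete(&mut self, id: u32, policy: DeletePolicy) {
        if let Some(content) = self.live.remove(&id) {
            self.archived.insert(id, content);
        }
        if policy == DeletePolicy::WipeLinks {
            self.links.retain(|&(from, to)| from != id && to != id);
        }
    }

    /// Normal read: deleted records are not returned.
    fn read(&self, id: u32) -> Option<&str> {
        self.live.get(&id).map(String::as_str)
    }

    /// Archival read: explicitly load a deleted record.
    fn read_archived(&self, id: u32) -> Option<&str> {
        self.archived.get(&id).map(String::as_str)
    }

    /// IDs linked from `id`, sorted for stable output.
    fn linked_from(&self, id: u32) -> Vec<u32> {
        let mut targets: Vec<u32> = self
            .links
            .iter()
            .filter(|link| link.0 == id)
            .map(|link| link.1)
            .collect();
        targets.sort();
        targets
    }
}
```

With `KeepLinks`, a category still lists a deleted post's ID and the app can choose to call `read_archived`; with `WipeLinks`, the post disappears from traversal but remains loadable by anyone who already knows its ID.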

Indexing and more complex lookups between records are left as an exercise for other libraries, probably with things like Willem’s pluggable Collections API. There may be an avenue towards implementing this feature as a plug-in zome, such that it can be modularly attached to HoloREA record storage zomes and handle the queries between entries.

Where to from here?

There are still a couple of things to clean up around link handling and the way deletions are managed, but the basic pattern is established. I’m eager for community feedback on this!

The other thing I’m wondering is whether there is potential here for adoption by the HDK team as a companion library for hdk::utils, or whether it should be left as a third-party project. I suppose the answer to that question depends on how opinionated or idiomatic others view the design to be?

pospi · August 16, 2019, 6:59am

Fighting new user link limits, but you can see some usage examples (for records and links, respectively) here:

Documentation still to come…

Brooks · August 16, 2019, 2:31pm

I think you can send a message to @Holochain and request moderator access, maybe that will get rid of your link limits?

pospi · September 27, 2019, 2:48am

I’ve recently put together some documentation that covers the high-level approach, concepts and terminology involved in this library for those who want to know more about the patterns I’m playing with without necessarily diving into the gory details. Enjoy!

Very much looking forward to reflections and possible code review on this (ping @zippy @lucksus @thedavidmeister @wollum) and other potential data architectures.

lucksus · September 28, 2019, 10:00pm

Oh, cool! I just gave it a quick glance and what I saw looks great. Thanks for putting that together, @pospi. Will give it a more thorough review when I have some space. I can definitely see how some if not all of those functions could make their way into the HDK…

thedavidmeister · October 1, 2019, 1:07pm

@pospi sweet! i don’t think i’ll have much useful to say until i try to build something on top of it (which i will try to try to do soon)

dellams · September 5, 2020, 12:02am

I have started building something similar as part of my HoloNET HDK/ODK, so it would be great to see where you are with it so we can compare notes and see what can be shared and reused.

dellams · September 5, 2020, 12:03am

What projects is this being used in? Are there any examples? Thanks

dellams · September 5, 2020, 12:25am

Great. Do you have any example code I can see using this library? Thanks. Great work and is the direction I have ended up going too without even realising it till now! Lol