Holochain Collections

A drop-in zome library that allows you to take advantage of best practices for storing large amounts of data on the DHT. Holochain Collections GitHub repo.

How is this different to the FileStorage Zome?

@dellams this module is about methods of indexing and querying large sets of information. IIRC the collections API @wollum created is simply an indexing system with callbacks, defined by the implementer, which determine how the indexing will be performed. Take ordering of posts by date as an example. Do you create a private (local) order, based on the time each agent observed each post? Do you create a deterministic total ordering and use validation rules to enforce that newly added entries are located in the correct place? Or do you try something even more complicated, like probabilistic orderings that sort entries based on an aggregate observed viewing time by other agents? Do you shard your indexes to get better lookup speeds? If so, what strategy do you use?
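To make the "indexing system with callbacks" idea concrete, here is a minimal in-memory sketch in Rust. It is not the Holochain Collections API: `Post`, `Index`, `key_fn` and `ids` are hypothetical names invented for illustration, and a real zome would store the index on the DHT rather than in a `BTreeMap`. The only point it shows is that the implementer supplies a callback, and that callback alone decides the ordering.

```rust
use std::collections::BTreeMap;

/// A post as seen by the index. Hypothetical type, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Post {
    pub id: u32,
    /// Timestamp claimed by the author of the post.
    pub created_at: u64,
}

/// An index whose ordering is decided by an implementer-supplied callback.
pub struct Index<K: Ord + 'static> {
    key_fn: Box<dyn Fn(&Post) -> K>,
    /// Sorted by (callback key, post id); the id breaks ties between equal keys.
    entries: BTreeMap<(K, u32), Post>,
}

impl<K: Ord + 'static> Index<K> {
    pub fn new(key_fn: impl Fn(&Post) -> K + 'static) -> Self {
        Index {
            key_fn: Box::new(key_fn),
            entries: BTreeMap::new(),
        }
    }

    /// Insert a post at the position the callback assigns to it.
    pub fn insert(&mut self, post: Post) {
        let key = (self.key_fn)(&post);
        self.entries.insert((key, post.id), post);
    }

    /// Post ids in index order.
    pub fn ids(&self) -> Vec<u32> {
        self.entries.values().map(|p| p.id).collect()
    }
}
```

Passing `|p: &Post| p.created_at` gives an oldest-first index, while `|p: &Post| std::cmp::Reverse(p.created_at)` gives newest-first. A local "time observed" ordering or a sharded index would need a different key, and probably different storage, which is exactly the design space the questions above are pointing at.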

…many things to consider…

I’ve been working on a spike for chat where, rather than all messages linking to a channel (which is not going to scale well), they link to each other in a kind of lattice. This is inspired by IOTA and OByte. I’m calling it a DagList.

It allows for pagination and partial ordering, and it entirely prevents hotspots, which is pretty cool. The downside is that unless you traverse the entire DAG you only probably get all the entries. I think this is an acceptable compromise, though, especially since you can fall back to a full traversal if you definitely need everything.
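As a rough illustration of the idea (not the spike's actual code, and not the design from the linked description), here is a minimal in-memory sketch in Rust. `DagList::append`, `tips`, `page` and `text` are hypothetical names. The assumptions are that each new message links to the tips its author has observed, and that a page is read by walking backwards from the current tips. A real implementation would store these links on the DHT, where an agent may not see every tip, which is where the "only probably" caveat comes from.

```rust
use std::collections::{HashSet, VecDeque};

/// In-memory stand-in for a DAG of messages. Hypothetical, for illustration only.
pub struct DagList {
    messages: Vec<String>,
    /// parents[i] holds the ids of the older messages that message i links to.
    parents: Vec<Vec<usize>>,
    /// Messages that no other message links to yet.
    tips: Vec<usize>,
}

impl DagList {
    pub fn new() -> Self {
        DagList { messages: Vec::new(), parents: Vec::new(), tips: Vec::new() }
    }

    /// Append a message that links to the tips its author observed.
    /// Assumes every id in `observed_tips` was returned by an earlier `append`.
    pub fn append(&mut self, text: &str, observed_tips: &[usize]) -> usize {
        let id = self.messages.len();
        self.messages.push(text.to_string());
        self.parents.push(observed_tips.to_vec());
        // Linked-to messages stop being tips; the new message becomes one.
        self.tips.retain(|tip| !observed_tips.contains(tip));
        self.tips.push(id);
        id
    }

    pub fn tips(&self) -> &[usize] {
        &self.tips
    }

    pub fn text(&self, id: usize) -> &str {
        &self.messages[id]
    }

    /// Walk backwards from the tips, breadth-first, returning at most `limit` ids.
    pub fn page(&self, limit: usize) -> Vec<usize> {
        let mut seen = HashSet::new();
        let mut queue: VecDeque<usize> = self.tips.iter().copied().collect();
        let mut out = Vec::new();
        while let Some(id) = queue.pop_front() {
            if out.len() == limit {
                break;
            }
            if !seen.insert(id) {
                continue; // already reached through another path
            }
            out.push(id);
            queue.extend(self.parents[id].iter().copied());
        }
        out
    }
}
```

No single node collects every link in this sketch: each message carries only the handful of links to the tips it saw, which is the hotspot-avoidance property described above. Calling `page` with a limit at least as large as the number of messages amounts to the full traversal fallback.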

Description here: https://holo.hackmd.io/_ttHYAdhSO-hczIihpGIlQ?both
Spike here: https://github.com/holochain/basic-chat/pull/6 (it's a bit of a mess)

It would be awesome to abstract this into Holochain Collections so others can re-use it. I don’t have the time right now, but would anyone be willing to have a crack?

Tagging people who might be interested
@nickmitchell @pospi @thedavidmeister

@wollum i also don’t have time right now, but sounds great as a crate

i also like the idea of agent-centric total ordering, which is something we could live with for chat but which Obyte can’t get away with (due to the financial requirement of preventing double spending)

this would basically make everyone their own “witness” in Obyte-speak, which provides a local total ordering

I saw this and never pinged back (and I wish there was a way to do this as a side-thread), but just wanted to say that this is one of the more exciting things I’ve seen about hApp development in a while and I look forward to playing with it! 🙂

@wollum I’ve been looking through this code a little more and umm… how far did you get with this? At first glance it looks like a pretty ripe and functional API that could be adopted easily.

Are you just looking for someone to clean it up and pull it out into its own module? If so, I’d be happy to do that. If I’m reading this right, it basically means we can efficiently index things (with pagination) off single root identifiers (tables in the code). Each type of index the app developer wishes to create would need to be initialised and appended to as a separate root “table”. So, we now have a reasonably straightforward way to create compound indexes over entry data:

  • App developer decides on search indexes to build over the entry data
  • Upon writing (or modifying) each new entry that is added to the DHT,
    • The entry data is first written to its own location
    • For each search index, a predicate function is run which determines whether the entry is to be included as part of that index
      • If the predicate returns true, we add_content_dag() for the index’s table.
      • If the predicate returns false, we do nothing for this index.
  • Querying the entries from the DHT is then simply a matter of calling get_content_dag() for the appropriate table of the search index being queried, providing since & limit parameters for filtering.
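The steps above could look roughly like this as a minimal in-memory sketch in Rust. `add_content_dag` and `get_content_dag` are the names used in the spike, but the signatures here are guesses, and `Store`, `IndexDef` and `Post` are hypothetical types. Treating `since` as a timestamp lower bound is also an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical entry type, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Post {
    pub id: u32,
    pub author: String,
    pub timestamp: u64,
}

/// One search index: a root "table" plus the predicate that selects its entries.
pub struct IndexDef {
    pub table: &'static str,
    pub predicate: fn(&Post) -> bool,
}

/// In-memory stand-in for the DHT.
#[derive(Default)]
pub struct Store {
    entries: HashMap<u32, Post>,
    /// Table name -> ids of the entries appended to that table, in insertion order.
    tables: HashMap<String, Vec<u32>>,
}

impl Store {
    /// Stand-in for the spike's add_content_dag(); the real signature may differ.
    pub fn add_content_dag(&mut self, table: &str, id: u32) {
        self.tables.entry(table.to_string()).or_default().push(id);
    }

    /// Write the entry to its own location, then run each index's predicate.
    pub fn write(&mut self, post: Post, indexes: &[IndexDef]) {
        self.entries.insert(post.id, post.clone());
        for index in indexes {
            if (index.predicate)(&post) {
                self.add_content_dag(index.table, post.id);
            }
            // A false predicate means this index ignores the entry.
        }
    }

    /// Stand-in for get_content_dag(): up to `limit` entries of `table`
    /// whose timestamp is at least `since`.
    pub fn get_content_dag(&self, table: &str, since: u64, limit: usize) -> Vec<&Post> {
        self.tables
            .get(table)
            .into_iter()
            .flatten()
            .filter_map(|id| self.entries.get(id))
            .filter(|post| post.timestamp >= since)
            .take(limit)
            .collect()
    }
}
```

An app would define its indexes once, for example `IndexDef { table: "posts_by_alice", predicate: |p| p.author == "alice" }`, and pass the same slice to every `write`. Entries that were modified or removed are not handled in this sketch.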

The author stuff could use some explanation: is it a higher-order utility construct that could be split out? Or are there particular reasons that author is inherent in the core data model (perhaps conflict disambiguation / multi-agent linearity)?

The other big thing I can see missing here is item removal: how does that work?
