A billion links (to/from one entry): how would Holochain handle that?

nmrshll · January 13, 2020, 4:16pm

What would happen if I were to try to get, iterate over, or aggregate an absurdly large number of links pointing to one entity?

If I’m not mistaken, it’s not possible to return or process a billion results in one go, either because of hardware limits or because the software (e.g. the Wasm runtime) is configured with tighter limits so that those hardware limits are never hit in the first place.
That leaves only a few ways this could be handled:

Best case, Holochain can store this just fine, but can only retrieve part of the results at once. In that case, which part is returned by default? Are there ways to filter and query results beyond LinkMatches, e.g. the way WHERE clauses and fine-tuned multi-column indexes work in most SQL databases?
Okay case, Holochain can still store this just fine, and the HDK errors cleanly when there are too many query results. It’s then up to the hApp developer to layer data structures on top of Holochain to get index-like query features while staying under the limit. In that case, what is that limit?
Meh case, querying too many links can cause panics or similar failures.
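For the "okay case", here is a minimal sketch of what erroring cleanly could look like, in Rust since that is what zomes are written in. `collect_bounded` and `TooManyResults` are hypothetical names, not part of the HDK, and the limit is an arbitrary parameter. The point is that results are consumed lazily and the call fails as soon as the limit is exceeded, instead of trying to allocate the whole result set.

```rust
/// Hypothetical error for a query whose results exceed the allowed limit.
#[derive(Debug, PartialEq)]
pub struct TooManyResults {
    pub limit: usize,
}

/// Consumes `results` lazily and keeps at most `limit` items.
/// As soon as one more item shows up, it stops and returns an error
/// instead of growing the Vec any further.
pub fn collect_bounded<T, I>(results: I, limit: usize) -> Result<Vec<T>, TooManyResults>
where
    I: IntoIterator<Item = T>,
{
    let mut out = Vec::new();
    for item in results {
        if out.len() == limit {
            // One item past the limit: bail out before allocating more.
            return Err(TooManyResults { limit });
        }
        out.push(item);
    }
    Ok(out)
}
```

With a lazy range standing in for a billion links, `collect_bounded(0u64..1_000_000_000, 1_000)` returns `Err(TooManyResults { limit: 1_000 })` after reading only 1,001 items.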

Does the actual Holochain code resemble one of these cases? And are there any further recommendations on how to handle links at this kind of scale?

Thanks in advance!

Alexr1239 · January 14, 2020, 1:33am

I’m curious as well

freesig · January 14, 2020, 8:26am

I think that with a big enough list (although a billion really isn’t that big) you could run out of memory in the Wasm runtime. That would probably show up as an allocation error, because the Vec holding the results would fail to allocate. This kind of thing would probably be better done in chunks, like you say.
I’m only guessing though.
This is probably premature, as no one has a use case that hits this yet, but I don’t see any reason why it couldn’t be built.
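A rough estimate supports the out-of-memory guess. The sketch below assumes, hypothetically, that each returned link costs at least 32 bytes (say, one raw SHA-256 digest for the target address), ignoring tags, link types and allocator overhead, and compares that with the hard ceiling of wasm32 linear memory: 65,536 pages of 64 KiB, i.e. 4 GiB.

```rust
/// Assumed minimum cost of one returned link: a single 32-byte digest.
/// Real link results also carry tags, types and container overhead.
const BYTES_PER_LINK: u64 = 32;

/// Maximum wasm32 linear memory: 65,536 pages of 64 KiB = 4 GiB.
const WASM32_MAX_MEMORY: u64 = 65_536 * 64 * 1024;

/// Lower bound on the memory needed to hold `n` links at once.
fn min_bytes_for_links(n: u64) -> u64 {
    n * BYTES_PER_LINK
}

/// Most links that could fit if nothing else lived in memory.
fn max_links_in_wasm32() -> u64 {
    WASM32_MAX_MEMORY / BYTES_PER_LINK
}
```

At 32 bytes per link, a billion links need 32 GB, roughly 7.5 times the whole wasm32 address space; at most 134,217,728 such links would fit even with nothing else in memory.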

pauldaoust · January 15, 2020, 4:12pm

There’s also a conversation going on about links right now. It gets a mention in this PR, but unfortunately most of the discussion is happening in dev team chat and meetings, for efficiency’s sake. The gist is that, yes, we need pagination, which requires sorting and filtering (ideally at the source, meaning the node holding the base and its links), and it’s being worked on.
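A minimal sketch of source-side pagination, under stated assumptions: the node holding the base keeps its links in a sorted index (a `BTreeMap` from tag to target stands in for it here), tags are unique and sortable, and the caller passes back the last tag it saw as a cursor. `get_links_page` and `LinkPage` are hypothetical names, not the HDK’s API, and not necessarily how the work mentioned above will look.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// One page of results plus the cursor for the next request.
#[derive(Debug, PartialEq)]
pub struct LinkPage {
    /// (tag, target) pairs in tag order.
    pub links: Vec<(String, String)>,
    /// Tag of the last link returned; `None` when nothing is left.
    pub next_cursor: Option<String>,
}

/// Hypothetical source-side query over `index` (unique tag -> target).
/// Returns up to `page_size` links whose tag starts with `prefix` and
/// sorts strictly after `cursor`.
pub fn get_links_page(
    index: &BTreeMap<String, String>,
    prefix: &str,
    cursor: Option<&str>,
    page_size: usize,
) -> LinkPage {
    // Seek directly to the first candidate instead of scanning from the start.
    let start = match cursor {
        Some(c) => Bound::Excluded(c.to_string()),
        None => Bound::Included(prefix.to_string()),
    };
    // Tags sharing a prefix are contiguous in sorted order, so stop at the first miss.
    let mut iter = index
        .range((start, Bound::Unbounded))
        .take_while(|(tag, _)| tag.starts_with(prefix));
    let links: Vec<(String, String)> = iter
        .by_ref()
        .take(page_size)
        .map(|(t, v)| (t.clone(), v.clone()))
        .collect();
    // Look at one more item to learn whether another page exists.
    let next_cursor = if iter.next().is_some() {
        links.last().map(|(t, _)| t.clone())
    } else {
        None
    };
    LinkPage { links, next_cursor }
}
```

Because the index is already sorted, each page is a range scan starting just after the cursor, so serving a page reads at most `page_size + 1` matching links. Filtering here is only by tag prefix; richer filters would need more indexes, much like the multi-column indexes asked about in the question.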

The alternative is to break up responsibility for the links using structures such as Bucket Sets.
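A minimal sketch of the bucketing idea, assuming each link is attached to one of N bucket anchors chosen by hashing its target, so that no single base (and no single set of nodes holding it) ends up with all the links. `bucket_anchor`, `all_bucket_anchors` and the `collection/bucket-N` naming are hypothetical, and FNV-1a is only an illustrative choice: any hash works as long as every agent computes the same value, which Rust’s `DefaultHasher` does not guarantee across releases.

```rust
/// FNV-1a (64-bit): a small, stable hash, so every agent derives the
/// same bucket for the same key.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical naming scheme: which of `bucket_count` anchors a link
/// to `target` should hang from.
pub fn bucket_anchor(collection: &str, target: &str, bucket_count: u64) -> String {
    assert!(bucket_count > 0, "need at least one bucket");
    let bucket = fnv1a_64(target.as_bytes()) % bucket_count;
    format!("{}/bucket-{}", collection, bucket)
}

/// Every anchor a reader must query to see the whole collection.
pub fn all_bucket_anchors(collection: &str, bucket_count: u64) -> Vec<String> {
    (0..bucket_count)
        .map(|b| format!("{}/bucket-{}", collection, b))
        .collect()
}
```

Writers call `bucket_anchor` to decide where a link goes; readers query every anchor from `all_bucket_anchors`, possibly in parallel, and merge the results. The bucket count has to be fixed up front, since changing it moves most targets to different buckets.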

nmrshll · January 15, 2020, 6:00pm

Thanks, that’s a lead in the right direction.

We had started investigating bucket sets, collections, and other data structures we could layer on top of Holochain to make storage and queries scalable, or at least practical, for large collections, so that seems to be the way to go.