It's technically possible for a node that is an authority over some DHT region to run a "get" for an entry in that region while not yet having the entry locally, because data propagation hasn't completed yet. But according to the documentation, both get options guarantee that in such a case the node won't run a network query. How can I make sure the get goes to the network in that case?
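For reference, here's roughly what I mean by the two get options. This is only a sketch assuming the HDK's GetOptions::latest() / GetOptions::content() constructors; exact names and return types depend on the HDK version.

```rust
use hdk::prelude::*;

// Minimal sketch of the two get strategies (names assume the HDK's
// GetOptions::latest() / GetOptions::content() constructors; they may
// differ in other HDK versions). As I read the docs, neither of these
// will hit the network when the calling node considers itself an
// authority for entry_hash: it just answers from what it holds locally.
fn fetch_entry(entry_hash: EntryHash) -> ExternResult<Option<Record>> {
    // "Latest": ask for the most up-to-date view of the entry and its metadata.
    let latest = get(entry_hash.clone(), GetOptions::latest())?;

    // "Content": settle for whatever content is already available (e.g. from cache).
    let content = get(entry_hash, GetOptions::content())?;

    Ok(latest.or(content))
}
```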
Here’s an example where I think that would be useful.
Suppose we have a DNA that stores 90 entries with nodes A and B. Partition size = 1, so every node is the single authority on a certain DHT shard.
Node C joins the network and rebalancing starts, because C should now be responsible for some DHT shard with 30 entries (I assume the DHT contents are split equally, for simplicity). Data propagation begins at time T1, and node C is supposed to have received all 30 entries by some future time T2.
Out of those 30 entries, there are 2 that node C needs ASAP to render its UI properly. So before T2 happens, node C makes a get request to retrieve those 2 entries. (Let's assume they're available through a hard-coded path, so C already knows where to look.) But since data propagation hasn't finished, and since the get options guarantee that a node won't go to the network for entries it's an authority for, node C won't get these entries before T2.
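To make the hard-coded path part concrete, here's roughly what node C's lookup could look like. It's only a sketch: the ui_config path and the LinkTypes::UiConfig link type are made-up names, and the exact Path / get_links signatures depend on the HDK version.

```rust
use hdk::prelude::*;

// Hypothetical lookup node C runs right after joining. The path string and
// LinkTypes::UiConfig are made up; a real app would define the link type in
// its integrity zome.
fn fetch_ui_config_links() -> ExternResult<Vec<Link>> {
    // Hard-coded, well-known path that every agent in the DNA can derive.
    let base = Path::from("ui_config").path_entry_hash()?;

    // Because C considers itself an authority over this address, this call
    // resolves locally and simply returns an empty Vec until gossip catches up.
    get_links(base, LinkTypes::UiConfig, None)
}
```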
In this example, p3 presents a problem that would only get worse at scale, because:
there’s no guarantee about the order of events due to eventual consistency;
there’s no priority for different DHT entries;
there’s no way for a developer to manually intervene in the data propagation process to prioritize certain data.
So in the worst-case scenario we'll be waiting up to (T2 - T1) for data we need ASAP, without any control over it. Delays like that would make for a poor UX.
I see two potential ways out of this, but both require changes in Holochain:
allow developers to manually intervene in the data propagation process by providing an option to force gets to go to the network (see the sketch after these two options). I do realize that the whole point of these smart get options is to reduce the network load hApps would create on devices, and that allowing developers to intervene would go against those efforts. But I also feel that developers should be held responsible for the choices they make, and if used wisely this mechanism would help improve the UX.
allow developers to define a priority level for their entries for data propagation purposes. A node would still have to sit and wait for network gossip, but at least the chances of the right data arriving sooner rather than later would increase. Developers would get more control without being allowed to mess with data propagation directly, which could be a win/win. But it also feels like a fairly fundamental change to the entire Holochain data model, so I would expect it to be difficult and time-consuming.
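To illustrate the first option, this is roughly the kind of knob I have in mind. It's purely hypothetical: nothing like GetStrategy::ForceNetwork exists in the HDK today, I'm only sketching what the developer-facing side could look like.

```rust
use hdk::prelude::*;

// Purely hypothetical API sketch: a strategy that bypasses the "I'm an
// authority, so I must already have it" assumption and always queries other
// authorities on the network. GetStrategy::ForceNetwork does not exist today.
fn fetch_urgent_entry(entry_hash: EntryHash) -> ExternResult<Option<Record>> {
    // The developer explicitly accepts the extra network load in exchange for
    // not having to wait for gossip to deliver the entry.
    get(entry_hash, GetOptions { strategy: GetStrategy::ForceNetwork })
}
```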
Does anyone see any solutions that can be implemented right now? Do the presented options make sense?
They make total sense @e-nastasia, and your analysis of what is happening is accurate, as always.
However, this is really a temporary mechanism until sharding gets fully implemented. Right now there is no sharding, which means that all data gets gossiped to all nodes. Which means that every node is an authority (which just means that it can answer get requests) for every piece of data. Which means that when doing a get, we assume that the data is already on the node locally.
When sharding gets implemented, the most normal case for a get request will be to go to the network, because you will be holding only so many entries, and you will delegate most of your requests to the appropriate authorities in the network.
I think the assumption that, because the DHT partition size is 1, all nodes should have all the data is incorrect. What about when I first join a DHT? It could take several minutes before my node is in sync with the rest of the DHT, which means I have to wait some unknown amount of time before I can start making requests, or implement some weird polling behaviour to keep checking whether I have the data I am looking for.
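For what it's worth, the "weird polling" workaround ends up looking something like this on the caller's side. This is a generic sketch not tied to any particular client library; fetch stands in for whatever zome call retrieves the data.

```rust
use std::{thread, time::Duration};

// Generic retry helper: keep calling `fetch` (e.g. a zome call made from the
// client) until it returns data or we give up. This is the kind of polling
// every app has to reimplement as long as gets never go to the network.
fn poll_until_found<T>(
    mut fetch: impl FnMut() -> Option<T>,
    interval: Duration,
    max_attempts: usize,
) -> Option<T> {
    for _ in 0..max_attempts {
        if let Some(value) = fetch() {
            return Some(value);
        }
        thread::sleep(interval);
    }
    None
}
```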
I'd also like to extend this question to get_links calls; it would be great if we could specify network options when making get_links calls. Although maybe there is some technical blocker here I am unaware of.
I am confused. From all the documentation I understood that a node doesn't hold an entire copy of the DHT (with partition size = 1, at least); instead it's an authority for a certain part of the DHT. Specifically, there's this line in the core concepts from which I made this assumption:
When an agent wants to publish or query a piece of data, they need to figure out who is most likely to be an authority for its address.
And even if that's the case, when a new node joins the DHT the problem would still be present: the new node would consider itself an authority too, so it won't go to the network for a get request and will have to wait for the data to come in.
I would expect the same to happen once the sharding gets implemented.
So my question could be better rephrased as: how to enforce network gets for nodes that are authorities for the requested data?
Ohhh oh oh, so sorry. I was confused by the word "partition", which I thought meant the set of nodes that can reach one another.
You are both referring more to the storage arc size. This refers roughly to the number of entries that a particular node is supposed to hold. Note that this number can change dynamically depending on the situation.
I understand the problem you are referring to now. I don't know the full behavior that get will end up having. I know that they wanted to have like 5 parallel gets and aggregate the results, which would solve the problem.
I think that the problem with the solutions you are outlining is: how do you differentiate between a non-existent entry and an entry you haven’t received yet through gossip? If you are the authority for the entry that’s being requested, which nodes should you ask for that entry? In that case, you are “the network” to go to.
Yeah, my bad, I think it would've been easier to understand if I had talked about the replication factor. I confused those two concepts in my head %)
I know that they wanted to have like 5 parallel gets and aggregate the results, which would solve the problem.
Do you mean that entries included in the storage arc for a certain node would be requested in multiple parallel requests instead of a single big one?
I think that the problem with the solutions you are outlining is: how do you differentiate between a non-existent entry and an entry you haven’t received yet through gossip?
Oh riiight, this again!
But in some cases I would argue that you can be fairly certain about the existence of some entries, so if you don't have them, it's purely a data propagation issue. In my example, some entries will always be created by the DNA creator during the creation step, so every other node can safely assume that if they've been invited to the DNA, those entries exist.
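For example (a rough sketch with made-up names: Config, EntryTypes::Config and LinkTypes::Config are assumed to be defined in the integrity zome), the DNA creator could commit those well-known entries in its init callback and link them from a hard-coded path, so any node invited later can treat a missing result as "not propagated yet" rather than "doesn't exist":

```rust
use hdk::prelude::*;

// Rough sketch: the DNA creator commits a well-known Config entry and links
// it from a hard-coded path. Config, EntryTypes::Config and LinkTypes::Config
// are assumed to exist in the integrity zome. In a real app this would also
// be gated so only the creator/progenitor runs it (e.g. by comparing
// agent_info() against a pubkey in the DNA properties); omitted for brevity.
#[hdk_extern]
pub fn init(_: ()) -> ExternResult<InitCallbackResult> {
    let config_hash = create_entry(&EntryTypes::Config(Config::default()))?;
    let anchor = Path::from("config").path_entry_hash()?;
    create_link(anchor, config_hash, LinkTypes::Config, ())?;
    Ok(InitCallbackResult::Pass)
}
```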