Relying on links to retrieve latest entry version

e-nastasia · April 1, 2020, 10:27am

This is a follow-up question about the pattern @guillemcordoba explained to me in relation to the Leap app. There is already a WIP PR and a GitHub issue describing the overall idea.

In this pattern, we split a single Entry for the Course into two parts:

CourseAnchor, which is never changed after creation and serves as a reliable address for referencing the course it represents;
CourseData, which stores all the data for the course and is frequently updated. To be discoverable, each CourseData is linked from the CourseAnchor.

I only recently realized that this whole pattern is based on the assumption that we can always easily retrieve the latest link CourseAnchor -> CourseData and thus save on all the lookups from the original CourseData to the latest one.

However, in Holochain we can’t be sure that the order in which actions happen will be preserved, and we operate under the assumption that it won’t be.

Question: how can we then be sure that we can get the latest link from CourseAnchor -> CourseData if link creation follows all the same data propagation and eventual consistency rules?

If we can’t be sure about that, then we might simply be moving the expensive lookup for the latest version from the entry itself (looking for the latest Course) to the entry’s links (looking for the latest CourseAnchor -> CourseData link).

guillemcordoba · April 1, 2020, 11:29am

Hi @e-nastasia, you’re exactly right, and this is why it’s important to also update the CourseData entry as well as adding a link. The update chain will always lead to the latest update, so what we basically do is follow the link from the anchor to the entry, then check whether the entry is marked as updated and, if so, follow the update chain.

Here the anchor basically helped us to jump ahead in the update chain as close as possible to the latest update, and also holds the links for the course.
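
A minimal in-memory sketch of that lookup, to make the idea concrete. This is not HDK code: the dicts stand in for DHT lookups, and `links`, `updated_to`, and `resolve_latest` are hypothetical names chosen for illustration.

```python
# Toy model, not the Holochain HDK: plain dicts stand in for DHT lookups.
# links:      anchor address -> CourseData addresses linked from it
# updated_to: entry address  -> address of the entry that replaced it, if any
links = {"anchor": ["data_v1", "data_v3"]}
updated_to = {"data_v1": "data_v2", "data_v2": "data_v3", "data_v3": "data_v4"}


def resolve_latest(anchor):
    """Jump ahead via a link, then walk the update chain to its end."""
    # Any linked entry works as a starting point; a more recent one
    # just means fewer hops along the update chain.
    current = links[anchor][-1]
    hops = 0
    while current in updated_to:   # entry is marked as updated: follow it
        current = updated_to[current]
        hops += 1
    return current, hops


print(resolve_latest("anchor"))  # ('data_v4', 1)
```

Starting from `data_v1` instead would reach the same entry, but in three hops rather than one.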

Although, what we could also do is attach a version number to the link tag from the anchor to the data, since we already trust the course teacher, who is the only one able to create the link. Then we always follow the link with the highest version number. We still have to follow the update chain in case the anchor metadata didn’t contain the latest link but the course data did contain a new update.

Also, a consideration for lookup times: following an update chain is quite expensive (you have to do N get_entries, where N is the number of updates), while getting all links with the anchor as base is always 1 query in the DHT. So attaching semantic data to the tags that lets us navigate the entry graph better is very useful in a lot of cases.
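
A rough way to picture the difference, as a toy cost model rather than HDK code. The function names and the `(version, target)` tuple layout for tagged links are assumptions made for this sketch.

```python
# Toy cost model, not HDK code: count simulated DHT round trips.
def lookups_via_update_chain(num_updates):
    """Walking from the original entry costs one get per update."""
    return num_updates


def latest_via_tagged_links(tagged_links):
    """One link query returns every (version, target) pair for the anchor;
    picking the highest version is then a local operation."""
    queries = 1
    version, target = max(tagged_links)
    return queries, target


tagged = [(1, "data_v1"), (3, "data_v3"), (2, "data_v2")]  # arrival order is arbitrary
print(lookups_via_update_chain(50))     # 50
print(latest_via_tagged_links(tagged))  # (1, 'data_v3')
```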

Connoropolous · April 1, 2020, 3:52pm

I still think it was a mistake to make entries update-able at all.

e-nastasia · April 1, 2020, 5:12pm

Guillem, thank you for your detailed response!

“Also, a consideration for lookup times: following an update chain is quite expensive (you have to do N get_entries, where N is the number of updates), while getting all links with the anchor as base is always 1 query in the DHT. So attaching semantic data to the tags that lets us navigate the entry graph better is very useful in a lot of cases.”

That was an extremely important point to clarify for me, thank you!

I realized that my understanding of the situation is a bit different:
The current implementation in the PR always calls create when performing a course update, creating a CourseData entry that is a different entry from the Holochain perspective. It then also creates a new link from CourseAnchor to this CourseData entry. If it weren’t for this link, it wouldn’t be possible to see that CourseData N-1 and CourseData N are in fact different versions of the same course.

And this is how I understand the model you’re describing:
We call update on the latest CourseData entry instead of create. Due to data immutability, we’re still creating a new entry, but the difference is that this entry is the next version from the Holochain perspective, not just a separate standalone entry. We then also create a CourseAnchor -> CourseData link for this updated entry, and we can add a version to it as a tag.
When we want to get the latest course data, we take the CourseAnchor, follow its link with the highest version tag, and from there also verify whether this entry is in fact the latest version or a newer one exists. This handles the case where the link to the latest version hasn’t propagated yet.
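
The model above can be sketched end to end with a toy in-memory store. Nothing here is the Holochain API: `ToyDht`, its methods, and the `link_arrived` flag (which simulates a link that has not propagated yet) are all hypothetical.

```python
# Toy model, not the Holochain HDK: one object stands in for what a peer sees.
class ToyDht:
    def __init__(self):
        self.entries = {}      # address -> content
        self.updated_to = {}   # old address -> new address (the update chain)
        self.links = {}        # anchor -> list of (version tag, target address)

    def create_course(self, anchor, address, content):
        self.entries[address] = content
        self.links[anchor] = [(1, address)]

    def update_course(self, anchor, old, new, content, version, link_arrived=True):
        self.entries[new] = content
        self.updated_to[old] = new            # an "update", not a bare "create"
        if link_arrived:                      # the link may lag behind the entry
            self.links[anchor].append((version, new))

    def latest(self, anchor):
        _, current = max(self.links[anchor])  # link with the highest version tag
        while current in self.updated_to:     # verify: does a newer version exist?
            current = self.updated_to[current]
        return self.entries[current]


dht = ToyDht()
dht.create_course("anchor", "a1", "Intro v1")
dht.update_course("anchor", "a1", "a2", "Intro v2", version=2)
dht.update_course("anchor", "a2", "a3", "Intro v3", version=3, link_arrived=False)
print(dht.latest("anchor"))  # Intro v3
```

The highest-tagged link visible here points at `a2`, and the single update-chain hop from there is what recovers `a3`.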

Do I understand you correctly?

e-nastasia · April 1, 2020, 5:43pm

Could you clarify what you mean by that? What would the alternative solution be?

guillemcordoba · April 1, 2020, 5:59pm

Omg yes. Exactly right, down to the comma. You have the correct assumptions and mental models there.

e-nastasia · April 2, 2020, 10:45am

Great to know! In this case this question is resolved and I’ll update the PR soon with fixes.

jakintosh · July 14, 2021, 4:28pm

I found this thread after designing a very similar pattern abstractly on paper, and now I’m trying to see if the approach is plausible in holochain.

The idea was:

Create a new data object that is composed of content and some metadata.
Create a second object that acts as “version control” for the first object, and link to it from the first object’s metadata.
This version control object stays the same for all future versions of the content, so any version of the content on the DHT can point back to the same versioning object and jump to the latest version (or perhaps use other versioning functionality), avoiding a traversal of a potentially very long chain of updates.

As an aside: I see that this was essentially the same solution that @e-nastasia came up with, but that @guillemcordoba recommended using “Update” instead of linking together new “Creates”. So I’m wondering: What are the benefits of this? I would prefer keeping them as new separate entries, so that they may potentially be “split/merged”, or things that don’t necessarily line up with what “Update” might mean, but I’m wondering if there are other benefits I don’t know about.

Anyway, while trying to make sense of how this would work in holochain, I was running into questions about how links would work, and I have almost this exact same question:

However, I’m also wondering about how many links can we feasibly attach to an entry? For example, if I created a link from the version object to the new content object with a “latest” tag and then deleted the old one every time there was a new version, 50 updates would have 99 links on it: 50 creates and 49 deletes, with only one being still valid. Is this right? At what point would I really start to have a problem here with scaling? Perhaps the versioning object itself could get “full” and spawn a fresh object? But then we’d potentially have the link traversal problem again, though at a smaller scale.
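
The count in that example can be checked with a few lines, assuming (as the question does) that every link create and every link delete stays attached to the versioning object as its own record. `link_actions` is a hypothetical helper name.

```python
def link_actions(num_versions):
    """Count link records if each new version adds a "latest" link
    and deletes the previous one."""
    creates = num_versions
    deletes = max(num_versions - 1, 0)   # the first version has nothing to delete
    return creates, deletes, creates + deletes


print(link_actions(50))  # (50, 49, 99)
```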

I guess to condense my question, I’d ask: How many links can I add to an entry before I should worry about performance or scaling, for both fetching from the DHT and also querying those links? And maybe an additional one would be: Where is the best place to learn the most about the limitations and concerns of using links in holochain (i.e. how can I help myself)?

guillemcordoba · July 14, 2021, 4:55pm

Hi @jakintosh, I’m getting excited just by reading your comment. I love discussing different design patterns.

So first a disclaimer: things have changed a bit in RSM, so these past discussions may be somewhat outdated. Here you can see what design considerations are different.

I would like to know a bit more about your use case in order to come up with an optimal solution. In RSM an entry can have multiple updates with no problem. I think of the underlying primitives of links, updates and deletes as tools that I can use to build meaningful constructs. For example, in one application, deleting a note may mean it’s no longer visible to anyone, while in another context it can mean clearing the notification as active but still being able to access it in the trash.

You are also on the right track when it comes to links: it’s better not to have too many links attached to the same entry, for the sake of gossip performance and storage. So you may want to use patterns like paths to shard your links if you are using anchors. In this case it seems it’s more about version control, so maybe a compromise of creating a new link every 5 updates is a good pattern? We’ll see where we land eventually as far as RSM features go; maybe there will be a facility to help redirect long update chains.
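
To make the "link every 5 updates" compromise concrete, here is a small sketch of the trade-off. It assumes the original entry (update index 0) always gets a link and that the reader follows the most recent link, then walks the update chain for the rest; `should_link` and `read_cost` are hypothetical names.

```python
def should_link(update_index, every=5):
    """Index 0 is the original entry; it and every `every`-th update get a link."""
    return update_index % every == 0


def read_cost(total_updates, every=5):
    """Return (links on the versioning object, update-chain hops to the latest)."""
    links = total_updates // every + 1
    hops = total_updates % every
    return links, hops


print(read_cost(50))  # (11, 0)
print(read_cost(48))  # (10, 3)
```

With this scheme the number of links grows five times more slowly than the number of updates, and a reader never walks more than four hops along the update chain.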

Lastly I think the best place to meet and discuss these kinds of problems is in Holochain In Action: a weekly meeting we have with community hApp devs. Here is a really (really really) good one with kizuna, in which they go over the design patterns they’ve used, and here is the link for you to apply and join the sessions.

jakintosh · July 15, 2021, 7:14pm

So actually, the more I dug into what I need, the more I saw overlap with your _prtcl; I was getting really hung up on the idea of branches (which _prtcl seems to call “perspectives”).

The basic idea of my use case is a distributed knowledge graph like wikipedia (you may have seen my post on the wikinodes thread), where anyone can create, link to, transclude, or edit a node and its links. My first draft is focused on text, but the goal is for it to be “data agnostic”, as long as the data defines a “diff” protocol (that allows forward/backward construction of a state given an ordered set of changes); then these individual nodes on the graph could be pretty much anything, open to everyone to modify, fork, and merge. However, before fleshing out the details, I had assumed some kind of sequential global state (oops), and so that’s where I suddenly got blindsided by the architectural need to support multiple perspectives. This is also the point where I realized that I was probably reinventing a wheel, and dug a bit deeper and rediscovered _prtcl.
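
For text, such a “diff” protocol could look roughly like the sketch below. It is only an illustration of forward/backward reconstruction from an ordered set of changes, not _prtcl’s format or anything from the thread: the `Change` shape and the function names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    pos: int        # index where the change applies
    removed: str    # text taken out at pos
    inserted: str   # text put in at pos


def apply_change(state, change):
    end = change.pos + len(change.removed)
    if state[change.pos:end] != change.removed:
        raise ValueError("change does not fit this state")
    return state[:change.pos] + change.inserted + state[end:]


def invert(change):
    # Recording the removed text is what makes a change reversible.
    return Change(change.pos, change.inserted, change.removed)


def forward(state, changes):
    for change in changes:
        state = apply_change(state, change)
    return state


def backward(state, changes):
    for change in reversed(changes):
        state = apply_change(state, invert(change))
    return state


changes = [Change(0, "", "Hello"), Change(5, "", " world"), Change(0, "Hello", "Goodbye")]
print(forward("", changes))                 # Goodbye world
print(backward("Goodbye world", changes))   # prints an empty line
```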

Anyway, I’m sure you can see here where the overlap is better than I can, and I think it might be getting outside the original topic of this thread now that I’ve gotten past some of the ideas around “links and versioning”. However, I would be curious to chat more with you about _prtcl to see what your goals are for it, and to see if we can minimize the reinvention of the same wheel. Ultimately, my goal is to build a user-facing application that adheres to a standardized protocol, not for me to design the protocol itself (though I’d love to help define use cases to make one more robust!)

bierlingm · July 15, 2021, 10:14pm

Have you looked into @lucksus’s AD4M project yet as a way to build that?

jakintosh · July 15, 2021, 10:38pm

I have looked into AD4M a bit! Right now I’m focused on getting a v1 (v0?) out the door, and my approach has been to take a deep look at all of the crazy cool things that people are working on and let that inform the way I build. So, handling data in a way that is simpatico with holochain, and hopefully likewise with _prtcl, even though I probably won’t integrate directly with either for first release. It seems like what AD4M is trying to achieve is a bit beyond what I can handle at the moment, but it’s something that I resonate with.

Honestly, it might be time for me to just make a project page on the forum; I’ve been working for almost 6 months, and lurking on the forums here for 4.

jakintosh · July 19, 2021, 10:54pm

I made a new thread to continue the specifics of my use case and proposed design here: Versioned and Authored Content Deltas