Inquiry: Entry and link type namespacing

pauldaoust · December 5, 2019, 9:33pm

@pospi the conversation about third-party zome/crate reuse has just come to life again between me and @freesig, who just debugged his first nasty namespacing issue with entry types. The problem: entry/link type definitions can clash with each other, between two zomes for sure, but even within one zome if you’ve got entry/link definitions going on in two separate crates.

We’re really concerned about the implications for third-party code reuse, and are thinking about creating a PR that automatically applies namespacing for all entry and link type names, something like zome_name:crate_name:entry_type_name. How would this impact your architectural decisions, either with Holochain or any libraries you’re creating?
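As a rough illustration of the proposed scheme, here is a minimal Rust sketch of building and splitting a zome_name:crate_name:entry_type_name string. The function names `namespaced_type` and `split_namespaced` are hypothetical and are not part of the HDK; the only thing taken from the proposal is the three-part, colon-separated format.

```rust
/// Hypothetical helper (not an HDK function): joins the three parts of the
/// proposed namespace into `zome_name:crate_name:entry_type_name`.
pub fn namespaced_type(zome_name: &str, crate_name: &str, entry_type_name: &str) -> String {
    [zome_name, crate_name, entry_type_name].join(":")
}

/// Hypothetical inverse: splits a namespaced name back into its three parts.
/// Returns `None` when the name has fewer than three colon-separated parts.
pub fn split_namespaced(name: &str) -> Option<(&str, &str, &str)> {
    let mut parts = name.splitn(3, ':');
    Some((parts.next()?, parts.next()?, parts.next()?))
}
```

Under this sketch, two crates that both define "anchor" inside the same zome would end up with different full names, because the crate name is part of the string.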

freesig · December 5, 2019, 10:49pm

One of the issues I’ve experienced and am worried about, for example:

Anchors crate defines an entry called “anchor”.
The shipping crate also defines an entry called “anchor”.
I want to make a boats zome and I add both these crates to my zome.

I now have a conflict. I don’t get any warning and it’s really hard to debug. I also cannot fix this bug without getting the library authors to update their crates.

This conflict could also happen across two zomes.
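To make the scenario above concrete, here is a toy Rust model of a registry keyed only by entry type name. It is a sketch under assumptions: `EntryDef` and `FlatRegistry` are invented for illustration and do not reflect how Holochain core actually stores definitions or what it does on a clash. It only shows why a flat name space lets one definition shadow another, and what an explicit error could look like.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an entry type definition (not the real HDK type).
#[derive(Debug, Clone, PartialEq)]
pub struct EntryDef {
    pub name: String,
    /// The crate that supplied this definition.
    pub defined_by: String,
}

/// Toy registry keyed only by the bare entry type name.
#[derive(Default)]
pub struct FlatRegistry {
    defs: HashMap<String, EntryDef>,
}

impl FlatRegistry {
    /// Inserts without checking: a later definition silently replaces an
    /// earlier one that has the same name.
    pub fn register_unchecked(&mut self, def: EntryDef) {
        self.defs.insert(def.name.clone(), def);
    }

    /// Inserts with a check: returns an error naming both crates when the
    /// name is already taken.
    pub fn register_checked(&mut self, def: EntryDef) -> Result<(), String> {
        if let Some(existing) = self.defs.get(&def.name) {
            return Err(format!(
                "entry type \"{}\" is defined by both {} and {}",
                def.name, existing.defined_by, def.defined_by
            ));
        }
        self.defs.insert(def.name.clone(), def);
        Ok(())
    }

    /// Looks up a definition by its bare name.
    pub fn get(&self, name: &str) -> Option<&EntryDef> {
        self.defs.get(name)
    }
}
```

With `register_unchecked`, registering "anchor" from the anchors crate and then from the shipping crate leaves only the shipping definition behind, with no warning. With `register_checked`, the second registration fails with a message that names both crates.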

A few questions come to mind.

Should crates be able to commit entries internally? If I think about the “other” world it would be strange / impossible for a crate to be able to write to my database directly.
How can I specify an entry from another zome if we enforce namespacing? Do we need another entry type like External?
Is there another solution other than namespacing?

I’m hesitant to make this just a best practice, I think we should try and enforce it somehow because in the scenario above I have no control over the practices of the library authors.

Would love some input from @artbrock @thedavidmeister @wollum or anyone else.

pospi · December 6, 2019, 1:07am

It’s a tough decision to make, I can see arguments both ways but I have definitely been caught out by a conflict. IIRC multiple entry type defs with the same name, in different zomes, but in the same DNA… it caused core to either hang or crash.

I don’t believe this should be allowed (and the namespacing is a good preventative measure from this perspective). In our case it ended up being a code smell: within the “planning” DNA, Intent and Commitment zomes both had bindings to Process in “observation”. Really what should be happening is that Process remote entry stubs are defined in their own separate zome within “planning”, rather than being defined in either of the other two places (which currently, they are). So, an error that explicitly prevents this gets my support.

On the other side of the discussion you have utility libraries that consumers of mixin zomes need to consume (see holochain_anchors and the anchors zome). If namespacing is to include crate_name, then some of these utility libraries may have to have curried APIs or be made otherwise configurable with the name of the mixin zome crate they target. Perhaps this small amount of additional complexity is acceptable… you could always generate the library crate code via a macro.

In many cases this limitation will be OK: polymorphism in mixin zome logic is mostly achieved via the helper library layer, not the mixin zome layer. But there might be cases where one helper library wants to be able to plug different storage backends with slightly varied data layouts. I’m not sure what sorts of requirements might drive that or whether it would even be needed; but it does need to be called out that this degree of namespacing prevents such functionality. As per above, I’m OK with this.

Should crates be able to commit entries internally?

I think they have to be able to. You mean zome crates? I spent some time early on trying to create an architecture where the business logic was separate from the hdk entries & links APIs, but it was a completely non-intuitive way to have to think and I wasn’t sure inverting the logic really made sense or was even possible. The whole thing with a Holochain app is basically implementing business logic on top of storage primitives, after all.

How can I specify an entry from another zome if we enforce namespacing? Do we need another entry type like External?

Some native support for this would be very cool, because managing remote base entries yourself involves a lot of moving pieces. But I have noted that there is a lot of additional complexity involved with external links, and network versioning issues to consider.

thedavidmeister · December 7, 2019, 9:26am

wouldn’t namespaces be the solution here?

freesig · December 9, 2019, 11:26am

I think so. The hard part is how do you link to an entry in another zome. Currently you need to redefine it in your zome. Namespacing would prevent this from working but it feels like you shouldn’t need to redefine an entry to link to it. I think maybe we could have something like Entry::External.

thedavidmeister · December 9, 2019, 3:48pm

there’s more than names to consider when wanting to link across zomes as link validation depends on base validation

pauldaoust · December 9, 2019, 5:42pm

heh, @freesig and I were just walking through this very issue on Thursday… to me it would seem that when including a crate into your zome, you should have the privilege of namespacing its entries and functions however you like. Functions are easy: you can only define zome functions inside the mod marked with #[zome], and you can choose to expose as many or as few of the lib crate’s functions as zome functions as you like, with whatever name you like.

Entry/link types are harder, especially because (a) that mixin crate’s helper functions need to know what namespace you’ve given everything, and (b) you can’t opt in/out of this or that entry type definition cuz you don’t know which ones are needed internally. That’s the risk of allowing a mixin crate to write its own data, I guess. To those with better Rust chops than I have (which is probably everyone here), can anyone think of a tidy way to do this, with or without macros? And as @freesig said, maybe allowing a mixin crate to write its own data is irregular compared to other stacks (app + relational DB) where the data layer is separate?

Good quote. I think this is a useful framing that might help guide the outcome of this discussion.

@freesig did you determine if this was actually true vs me just talking through my hat?

@thedavidmeister without knowing what the “more [things] to consider” are, this feels important to dig into. Could you elaborate? I’m picturing two scenarios:

link validity depends on validity of base, doesn’t depend on knowledge of the base’s content
link validity depends on validity + content of base, which would probably require some cross-zome code sharing (struct def + deserialisation at least) to get working

thedavidmeister · December 11, 2019, 11:55am

validation needs to be deterministic and reproducible by everyone, if you need to do validation across zomes, and end-users are deciding what those zomes are at runtime, then different people will have different opinions on validity

also the validation logic is happening “over there” which needs to be handled, probably at the subconscious layer, so it doesn’t just fail when the current zome fails to find the base entries

pauldaoust · December 11, 2019, 3:54pm

I thought it’s the developer who chooses which zomes go into the DNA, and the user can only make choices about whether to swap one bridged DNA for another (and then only when the developer has specified that a bridge dependency can be satisfied with a trait rather than a specific DNA hash)? To clarify, here we’re talking about entries and links that all live on the same DHT but whose types are defined in separate modules.

thedavidmeister · December 11, 2019, 4:22pm

@pauldaoust if there’s any difference at all then the validation could be different though right?

also i didn’t realise we were talking about everything in the same DHT so maybe what i’m saying is not relevant

pospi · December 14, 2019, 1:38am

@freesig @pauldaoust

The hard part is how do you link to an entry in another zome. Currently you need to redefine it in your zome.

I don’t think that’s accurate. AFAIK linking entries in different zomes (but same DNA) works just the same as it does when they’re in the same zome. No additional complexity involved.

@thedavidmeister @pauldaoust yes let’s scrap those last couple of comments because this entire conversation is specifically about multiple zomes inside the same DNA, as far as I know (;

thedavidmeister · December 14, 2019, 11:23am

mhmm, i was thinking of something else

pauldaoust · December 16, 2019, 10:19pm

Ah, never mind then; I told @freesig that I thought this was the case but I didn’t actually test it — it was based on my misreading of the source code that defines the link_entries action.

So if you can link to a foreign-ly defined entry type, and if we’re talking about namespacing all zomes’ entry types, as @freesig said this creates problems, because you won’t know the namespace of the other zome’s stuff at authorship time — it’s gotta be applied at compile time, I think. Either it’s based on the zome’s hash or the DNA developer (not the zome developer) gets to name it. But the dependent zome has to define its dependencies and give them internal handles that are then satisfied whenever link!() and call() are called. I see parallels with hApp bundles — the manifest file defines a bridge dependency, then assigns the handle that the dependent DNA is expecting.

Unlike bridges between DNAs, I imagine dependencies between zomes in a DNA will be static — no runtime dependency creation. Should be simple enough to put this into app.json for the time being, something like

{
  // ...
  "zomes": [
    {
      "id": "alpha",
      "location": "zomes/alpha",
      // ... here's where all the stuff generated by the zome gets mixed in --
      // name/description, code, function exports
    },
    {
      "id": "thing_that_depends_on_alpha",
      "dependencies": {
        "internal_alias_given_by_zome_author": "alpha"
      }
      // ... zome-generated stuff; in the code, you always refer to alpha
      // by the internal alias you've given it, the same way you `call()` a
      // bridged DNA not by the instance ID but by the bridge name.
    }
  ]
}

More thoughts on this:

For zome calls, the namespace resolution from caller’s internal handle to callee’s DNA-defined name can happen at dispatch time.
But we also care about entry type names in dependencies (for the sake of link definitions). I don’t know where that’s actually used — is it merely for the sake of generating the link type definitions in the zome’s block in dna.json? If that’s the case, maybe the aliases defined in app.json simply replace the link type defs in zome.json? Is there any base/target type checking that happens subconsciously at validation time, or does it just check that the base and target exist? It doesn’t look like base and target are actually passed into the validation function, despite what the documentation says.
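The alias idea could be sketched roughly like this in Rust. Everything here is an assumption for illustration: `ZomeDeps` is a made-up type standing in for the "dependencies" object from the manifest example, and prefixing entry type names with the resolved zome name is just one possible convention, not something the conductor does today.

```rust
use std::collections::HashMap;

/// Hypothetical per-zome dependency table: maps the alias chosen by the zome
/// author to the zome id chosen by the DNA developer.
pub struct ZomeDeps {
    aliases: HashMap<String, String>,
}

impl ZomeDeps {
    /// Builds the table from (alias, zome id) pairs.
    pub fn new(pairs: &[(&str, &str)]) -> Self {
        let aliases = pairs
            .iter()
            .map(|(alias, zome_id)| (alias.to_string(), zome_id.to_string()))
            .collect();
        Self { aliases }
    }

    /// Resolves an internal alias to the DNA-level zome id, as might happen
    /// when a zome call is dispatched.
    pub fn resolve(&self, alias: &str) -> Result<&str, String> {
        self.aliases
            .get(alias)
            .map(String::as_str)
            .ok_or_else(|| format!("unknown zome alias: {}", alias))
    }

    /// Assumed convention: prefix the entry type name with the resolved
    /// zome id, giving a DNA-wide name for use in link definitions.
    pub fn entry_type(&self, alias: &str, entry_type_name: &str) -> Result<String, String> {
        Ok(format!("{}:{}", self.resolve(alias)?, entry_type_name))
    }
}
```

The dependent zome only ever mentions its own alias; which real zome that alias points at is decided by whoever assembles the DNA.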

thedavidmeister · December 17, 2019, 11:14am

@pauldaoust the base was definitely a dependent validation at one point (maybe this has changed, i did not check the code), in that case the problem isn’t that you need the base or target for the current validation but that you need the base to have already been validated from the perspective of the current zome before the link validation will even start

pauldaoust · December 17, 2019, 5:20pm

@thedavidmeister okay, cool, so at the subconscious layer, link validation just makes sure that the dependencies (base and target) exist and are validated, but doesn’t check any of their content against the link’s constraints.

Re: validation dependencies, I can see a future where app authors will want to either (a) check that dependencies are valid, or (b) check that they’re valid and use their content in validating the current entry on which they depend. (In the above case that would look like “base and target are both of the expected entry types”; could be handled by the HDK.)
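A small Rust sketch of the two options, (a) and (b), might look like the following. The types `Dep` and `LinkCheck` and the function `validate_link` are hypothetical and do not correspond to the HDK's validation API; option (b) is reduced here to the "base and target are both of the expected entry types" case mentioned above.

```rust
/// Hypothetical view of a link dependency (base or target) at validation time.
#[derive(Debug, Clone, PartialEq)]
pub struct Dep {
    pub entry_type: String,
    /// Whether the dependency itself has already passed validation.
    pub is_valid: bool,
}

/// The two kinds of check discussed: (a) validity only, or (b) validity plus
/// a look at the dependency's content (here, just its entry type).
#[derive(Debug, PartialEq)]
pub enum LinkCheck {
    ValidityOnly,
    ValidityAndTypes { base_type: String, target_type: String },
}

/// Validates a link against its base and target under the chosen check.
pub fn validate_link(base: &Dep, target: &Dep, check: &LinkCheck) -> Result<(), String> {
    // Both options require the dependencies themselves to be valid first.
    if !base.is_valid || !target.is_valid {
        return Err("base and target must both be valid".to_string());
    }
    match check {
        LinkCheck::ValidityOnly => Ok(()),
        LinkCheck::ValidityAndTypes { base_type, target_type } => {
            if base.entry_type == *base_type && target.entry_type == *target_type {
                Ok(())
            } else {
                Err(format!(
                    "expected {} -> {}, found {} -> {}",
                    base_type, target_type, base.entry_type, target.entry_type
                ))
            }
        }
    }
}
```

Option (b) needs the validating zome to know something about the other zome's types, which is where the naming question comes back in.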

pospi · December 19, 2019, 4:37am

you won’t know the namespace of the other zome’s stuff at authorship time

Yes, I agree this is a significant barrier to this being workable. There needs to be some way to refer to entries between zomes at compile time. That’s why I advocated for zome_name:entry_type_name rather than zome_name:crate_name:entry_type_name, because the former two can be controlled predictably within a single project (read: DNA). It creates a DNA-global namespace for zome names, but I think that’s OK, and if we create some patterns for macro-driven helpers that can inject the name of the zome, we have the best of all worlds.

@pauldaoust’s proposal for defining zome aliases in the DNA manifest file gets my support as a more considered solution that effectively does the above but with a little more structure.
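One way a macro-driven helper could inject the zome name is sketched below using plain `macro_rules!`. The macro `zome_entry_types!` and the module names are invented for this example; the zome and entry names echo the Intent / Commitment / Process case described earlier in the thread, and the zome_name:entry_type_name format is the one argued for above.

```rust
/// Hypothetical helper macro: generates string constants for entry type
/// names, each prefixed at compile time with the given zome name.
macro_rules! zome_entry_types {
    ($zome:literal, { $($konst:ident => $entry:literal),* $(,)? }) => {
        $(pub const $konst: &str = concat!($zome, ":", $entry);)*
    };
}

/// Entry type names as the "intent" zome would see them.
pub mod intent_zome {
    zome_entry_types!("intent", { PROCESS => "process", INTENT => "intent" });
}

/// Entry type names as the "commitment" zome would see them.
pub mod commitment_zome {
    zome_entry_types!("commitment", { PROCESS => "process" });
}
```

Both zomes can declare a "process" stub without clashing, because the generated constants are "intent:process" and "commitment:process" respectively, and the zome name is fixed at compile time.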

pauldaoust · October 1, 2020, 8:04pm

Update on this subject: The new Holochain RSM records the zome ID and entry type ID of each committed entry, as well as the zome ID of each committed link (links no longer have types). That effectively prevents namespace clashes between entry types from two zomes, which are supposed to be nicely encapsulated black boxes from a modularity perspective.

Additionally, because you don’t have to define types for your links anymore, that means you don’t have the problem of needing to refer to another zome’s entry types using a reliable handle.

So between these two things, the problem is effectively gone.
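To show why a (zome ID, entry type ID) pair avoids the clash, here is a minimal Rust sketch. The struct `EntryTypeKey`, its `u8` fields, and the function `register_all` are assumptions made for illustration and are not the actual Holochain RSM types.

```rust
use std::collections::HashMap;

/// Hypothetical composite key: an entry type is identified by the zome it
/// belongs to plus its id within that zome, not by a global name.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct EntryTypeKey {
    pub zome_id: u8,
    pub entry_type_id: u8,
}

/// Registers (zome id, entry type id, human-readable name) triples and
/// returns the resulting table. Names may repeat across zomes freely.
pub fn register_all(defs: &[(u8, u8, &str)]) -> HashMap<EntryTypeKey, String> {
    defs.iter()
        .map(|&(zome_id, entry_type_id, name)| {
            (EntryTypeKey { zome_id, entry_type_id }, name.to_string())
        })
        .collect()
}
```

Two zomes can each have an entry type called "anchor" and both survive in the table, since the keys differ in `zome_id` even when the name and the per-zome id are the same.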

guillemcordoba · October 1, 2020, 8:33pm

It makes my brain reaaaally happy when by removing an abstraction problems disappear, and a new axis in some weird dimensional mental model appears (more flexibility without link types). In short: yeyy!!

pauldaoust · October 7, 2020, 5:54pm

@guillemcordoba could you unpack why typeless links are so exciting to you as an app developer? I can make guesses, but I don’t think I’ve got the insight that comes from having worked with both Redux and RSM. (And TBH it surprised me that we removed link types.)

thedavidmeister · November 27, 2020, 1:34am

@pauldaoust just speaking for myself here, but rust works very well on raw Vec<u8> binary data, trying to squeeze certain abstractions into ‘slots’ or ‘strings’ is just more difficult than a plain old vector sometimes