Naming conventions & module patterns for Rust crates?

pospi · November 4, 2019, 7:09am

Getting closer to thinking seriously about how to package things for other developers to consume when developing Holochain apps. At a high level, what I can see is needed is a naming convention to make locating Holochain zome packages on crates.io easier. Has anyone else thought about what an appropriate prefix might be?

holochain_zome_?
holo_zome_?
happ_?

Something else?

Beyond that, the next question is to figure out what should be exported by each crate. Just throwing some thoughts out there based on patterns I can see becoming necessary for HoloREA, but at this stage I think it makes sense for the main crate entrypoint to export the full zome definition. This makes importing an unmodified “mixin” zome into your project essentially a one-liner.

From there it is probably a case of exporting zome function handler callbacks and entry / link type definitions as individual functions in sub-modules, so that they can be selectively imported into consuming zome handlers for use-cases where domain-specific business logic wants to manage the behaviour of library zomes.

DHT datatypes and structs I am less sure about, given that in our architecture those are shared code (because they are shared between zomes in many cases). But many developers will probably want to export those from their zome crates as well, given that other zomes will need to import them in order to decode payloads requested via call between DNAs.

Any strong opinions on this @pauldaoust @thedavidmeister @wollum?

thedavidmeister · November 4, 2019, 11:08am

using rust crates as a native way to achieve “mixins” is totally something we wanted to achieve by adopting rust

i’m very glad we’re getting to the point that we can start realising some wins in this area

i’ll mention a few things for context tho:

crates that reference github checkouts can’t be published, so make sure to sweep your projects as the old hc scaffolding did exactly this
if we’re getting to this stage i’d like to chase up whether we can hit stable rust versions for zome development soonish, even if core binary builds need nightly it would be great to get HDK and downstream WASM on stable
yes it makes sense to split your datatypes and structs and impls into a dedicated crate, we’ve found this can be critical for interop in certain contexts due to the way that the rust compiler works
there’s some scripts you can use that publish things to crates.io in a release hook if you’re using holonix for releases

pauldaoust · November 4, 2019, 8:11pm

My gut feeling is that hc_zome_ is nice and easy to type, and doesn’t seem to clash with any existing namespaces (there’s a bit of space occupied by hc, mostly by crypto libs). It (or its longer variant holochain_zome_) also feels the most accurate — holo implies Holo Hosting to me, and happ implies entire DNAs or constellations of DNAs.

As for what to export… my advice will be limited by how little I know about the Rust module system. But yeah, I like the idea of a crate’s entry point being the full zome definition, so you can drop the zome into your DNA. Are there any technical barriers to further breaking it up into submodules in the way you describe for picking and choosing of functionality, but in a crate-ish fashion?

Here are the types of not-quite-full-zome modularity I’d like to see:

Helper libraries — modules that implement universal functionality and let you add app-specific functionality via inversion of control. Example: a lib that implements the basics of a certificate-based permissions system — authorities’ signatures, expiry time, subject’s public key, etc — but let you add your own conditions like the actual resource and the specific privileges being granted.
Struct definitions that allow zomes/DNAs to (de-)serialise each other’s data. (What’s the compilation and runtime cost of including them in each zome lib? I suspect that the generated serde stuff will exist in every compiled zome; is that correct?)
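
As an illustration of the helper-library idea above, here is a minimal sketch of inversion of control in plain Rust. The library owns the universal certificate fields and the generic checks, and the consuming app plugs in its own grant type. Every name here (`Certificate`, `GrantPolicy`, `is_authorised`, `FolderGrant`) is hypothetical, and signature verification is reduced to a simple count.

```rust
/// Universal part of a certificate, owned by the helper library.
/// All names in this sketch are illustrative, not from any real crate.
pub struct Certificate<G> {
    pub subject_pubkey: String,
    pub authority_signatures: Vec<String>,
    pub expires_at: u64, // seconds since the Unix epoch
    pub grant: G,        // app-specific payload supplied by the consumer
}

/// The hook a consuming app implements to add its own conditions.
pub trait GrantPolicy {
    type Request;
    fn permits(&self, request: &Self::Request) -> bool;
}

/// Library-owned check: generic rules first, then the app's own policy.
/// A real library would verify each signature cryptographically;
/// here we only count them.
pub fn is_authorised<G: GrantPolicy>(
    cert: &Certificate<G>,
    request: &G::Request,
    now: u64,
    required_signatures: usize,
) -> bool {
    cert.expires_at > now
        && cert.authority_signatures.len() >= required_signatures
        && cert.grant.permits(request)
}

// --- consumer side: the actual resource and privileges being granted ---

pub struct FolderGrant {
    pub folder: String,
    pub can_write: bool,
}

pub struct FolderRequest {
    pub folder: String,
    pub write: bool,
}

impl GrantPolicy for FolderGrant {
    type Request = FolderRequest;

    fn permits(&self, request: &FolderRequest) -> bool {
        // Same folder, and writing only if the grant allows it.
        self.folder == request.folder && (self.can_write || !request.write)
    }
}
```

The library never needs to know what a "folder" is; it only calls `permits` once its own checks have passed.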

This touches on the bigger topic of code reuse and dependency management. What exactly is our story about zomes and what they’re for? Basic units of code reuse, or means of encapsulation and data hiding, or both? And DNAs — they’re for core vs accessory functionality, but also for partitioning privileged groups from each other. I’ve got more thoughts, but I don’t wanna hijack this thread. Just bringing it up in order to say that the intended use cases for the constructs known as zomes and DNAs will inform our patterns for crates.

pospi · November 7, 2019, 3:45am

I’ve logged an issue for this in HoloREA, if anyone wants to help out by being our first implementor and assisting with defining the interface contracts or writing developer documentation.

github.com/holo-rea/holo-rea

Restructure zome modules as appropriate for third-party integration

Opened 03:44AM 07 Nov 2019 UTC by pospi · closed 04:44AM 23 Jan 2020 UTC · label: enhancement

There's a bunch of naming conventions to align on and standardise first, see https://forum.holochain.org/t/naming-conventions-module-patterns-for-rust-crates/1294/2

Then it will be a case of changing the names of the zome crates & wiring up dependencies; and shuffling API handler function names around to make them saner for importing, probably looking at standardised sub-module names. The module structure implemented in those modules already converted can be used as a pattern to follow.

- [ ] observation DNA
  - [x] economic event
  - [x] economic resource
  - [x] process
  - [ ] fulfillment
  - [ ] satisfaction
- [ ] planning DNA
  - [ ] commitment
  - [ ] intent
  - [ ] fulfillment
  - [ ] satisfaction
- [ ] specification DNA
  - [ ] process specification
  - [ ] resource specification
  - [ ] unit
- [ ] proposal DNA
  - [ ] proposal
  - [ ] proposedTo
  - [ ] proposedIntent

---------------

*(note: these items are worth splitting out to their own issues):*

Step 2 involves the `identifiers` constants present in each module. Many `_defs` and `_storage` crates depend on common entry & link type identifiers that are shared in foreign key entries split between DNAs. Resolving this unnecessary coupling involves splitting **remote key index** entry & link type definitions out into separate zomes for better compartmentalisation, which is mainly a case of changing `to!` links to `from!` ones. (eg. `Process` record address stubs are defined in the `commitment` zome but they also link to the `intent` zome, so should be in their own separate `process` zome).

We also need to address how to make this more "mixin-y", probably by genericising this `process` zome into a `links` or `anchors` zome that can be generically called from both `commitment` and `intent`. Otherwise it's non-trivial to configure a generic `process` link registration zome to link to both `Commitment` and `Intent` entry types...

Remember to create documentation for implementors in the ecosystem wiki and iterate it as we go.
The rest is probably a dance with implementors to see what subsets of functionality they need to include in their projects, and how we can make integrating those features as little effort as possible. "Entire zome" and "just datatypes and API handler interface" is certainly necessary, and a good place to start.

pospi · November 7, 2019, 3:51am

I just wanted to flag the above with @pauldaoust as there is a conversation in another thread that has me concerned about the feasibility of mixin zomes - see How to bidirectionally replicate data between DNAs? - #13 by pospi

Might be best to continue any conversations about “driving one zome’s entries from within another zome” here, as I think it’s more related to this than it is to bidirectional data replication

pauldaoust · December 5, 2019, 9:39pm

A post was split to a new topic: Inquiry: Entry and link type namespacing

pospi · December 14, 2019, 1:11am

@philipbeadle @dhtnetwork I just wanted to make sure the two of you were aware of this conversation, in light of the background chatter RE the “anchors” mixin zome. I’m a stickler for sensible naming (sorry / not sorry) and it would be really good to see the hApps team leading by example on crate naming conventions.

(I would also like to see an announcement thread for that project on the forum somewhere as I believe it’s an important effort which should be gaining more community involvement in its development!)

dhtnetwork · December 14, 2019, 1:19am

Hi @pospi duly noted. If we set up proper standards, it will be much easier down the road.

The Anchors work is still very early, thus we have not announced it. At the right time, we’ll need to post information about the anchors crate.

Thanks for the message. We can start by posting a thread in the technical discussion area. I’m not sure if I would classify the crate as a project just yet.

pospi · December 14, 2019, 1:26am

There are some optimisation things I wanted to draw out that will impact these architectures (ping @wollum).

Looking at the new app spec it appears as though we have some WASM optimisation steps being run now. I’m curious how much optimisation this will perform in regard to eliminating dead code; and how much of a module gets packed into resulting bytecode.

The reason to ask is that it makes more sense to me for the entire zome to be packaged as a single crate rather than having… probably 3 crates at a minimum (eg. hc_zome_anchors, hc_zome_anchors_lib & hc_zome_anchors_structs). Why 3 crates? Because in order to achieve an optimised build for all cases, you need to account for:

What data structures and code are needed to define the internals of the zome itself?
What helper functions are needed within other zomes to drive the internals of the mixin zome?
What data structures are needed to access the mixin zome’s records from an external DNA?

In each case, you want minimal necessary bytecode to be generated. So the more tangible question is- if my crate structure looks similar to this…

pub mod zome { /* ... */ }
pub mod helpers { /* ... */ }
pub mod api { /* ... */ }

(Where zome contains the zome, entry, link & function definitions; helpers contains the utility functions to be used by other zomes within the same DNA; and api contains the Serde structs needed to query and decode the zome’s records from an external DNA.)

…can I get an optimised WASM bundle such that if I’m compiling an external DNA which only requires stuff inside api, the code within helpers and zome is stripped?

Or would it just be generally more idiomatic and better to release them as separate crates? (note: I haven’t thought much about how such packaging would affect circular references between crates…)
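
One single-crate option worth weighing here is Cargo features, which exclude whole modules at compile time rather than relying on the optimiser to strip them afterwards. The sketch below is only an illustration: the feature names `helpers` and `zome` are assumptions that would need declaring under `[features]` in Cargo.toml, and `AnchorResponse` is a made-up type. It does not answer how much dead code the WASM optimisation step removes.

```rust
// lib.rs of a single hypothetical `hc_zome_anchors` crate.
// The feature names `helpers` and `zome` are assumptions for this sketch.

/// Always compiled: plain data an external DNA needs to decode records.
pub mod api {
    #[derive(Debug, Clone, PartialEq)]
    pub struct AnchorResponse {
        pub anchor_type: String,
        pub anchor_text: String,
    }
}

/// Compiled only when a consumer enables the `helpers` feature,
/// i.e. for zomes in the same DNA that drive the mixin directly.
#[cfg(feature = "helpers")]
pub mod helpers {
    use super::api::AnchorResponse;

    pub fn describe(anchor: &AnchorResponse) -> String {
        format!("{}/{}", anchor.anchor_type, anchor.anchor_text)
    }
}

/// Compiled only when building the zome itself.
#[cfg(feature = "zome")]
pub mod zome {
    // zome, entry, link & function definitions would live here
}
```

An external DNA would then depend on the crate with default features disabled and only ever see `api`. Whether this is nicer than separate crates is a judgment call; features keep one version number, but separate crates make the dependency graph explicit.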

pospi · January 8, 2020, 6:36am

I’m now experimenting with a setup whereby I want to include some logic from a standalone zome (meaning, the zome is also meant to run by itself, without edits) into a third-party zome for extending.

It looks like I might be encountering some of the rust crate packaging restrictions @thedavidmeister has been referring to above. I tried for a simple re-export to see if it would work, and got a “provides no linkable target” error when attempting to compile. So I guess the first question is: am I hitting this error because basically a crate that exports a zome def can’t be used as a library? I am presuming “yes”.

If that’s accurate, then my previous post can’t work. The zome def and helpers would need to be in separate crates, and I would need to import only the helpers into the third-party zome, and basically redefine the entire #[zome] by drawing in those helpers.

Is that essentially what needs to happen? It makes extending zomes necessitate a fair bit of boilerplate for simple changes, but I can also see how that’s unavoidable since the expanded code for a zome is probably full of stuff that would make composition of such definitions difficult or impossible.

So here is my revised crate structure for the necessitated flexibility I can see needed at this stage (given a mixin zome named “foo”):

hc_zome_foo exports the full #[zome] def. Can’t be imported by any other zomes.
hc_zome_foo_defs exports the entry type definitions as normal functions. The #[entry_def] macros would be added in the hc_zome_foo crate, inside #[zome]. This enables third party devs to use the zome’s entries in custom zomes.
hc_zome_foo_lib exports the functions to be used for #[zome_fn(...)], #[validate_agent], #[init] & other zome callbacks; as well as any other utility helper methods that act on the zome entries / links.
- The reason I think all of these belong in the same (separate) crate is that often third-party customisation will want to leave some functionality “as standard”, whilst also extending others. Having the handler functions separately means they can be imported into a custom zome and bound to #[zome_fn(...)] macros to provide standard behaviour. I think there’s also merit to the idea that any other zome in the local DNA may want to call into another zome’s API without going through call, for security reasons.
- That said, maybe it’s a use-case dependent thing. I’m sure there will be zomes that aren’t intended to be used in this way but that will want to be interfaced with via higher-order helper methods which are of a different nature to the API handler functions. For these cases the zome handler functions are just bloat in the WASM code… but if the compiler strips dead code, this doesn’t matter. Anybody have strong opinions? Maybe there could just be crates suffixed onto this base name- hc_zome_foo_lib_api_handlers, hc_zome_foo_lib_helpers etc…
hc_zome_foo_internals would provide the internal struct definitions needed inside the local DNA of the zome. These would be required by hc_zome_foo_defs but also by hc_zome_foo_lib and thus any DNA-local zomes wishing to interact with the mixin zome’s entries.
hc_zome_foo_rpc would define external data structures needed to send data into the zome and decode results from it. Ideally these can be split from hc_zome_foo_internals to keep the crate smaller. Note that in many cases hc_zome_foo_internals will want to import hc_zome_foo_rpc in order to define the relationships between external data structures and internal storage (pretty sure impl rules allow this?)
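
To make the dependency direction concrete, here is a rough sketch in which each module stands in for one of the crates above. The struct and function names (`CreateRequest`, `Response`, `Entry`, `handle_create`) are invented for illustration and no HDK macros are involved. On the "impl rules" question: because `Entry` is local to the internals crate, implementing the standard `From` trait on it for a type from the `_rpc` crate is permitted by Rust's orphan rule.

```rust
// Each module below stands in for one of the proposed crates.

pub mod hc_zome_foo_rpc {
    /// External shape sent into the zome by callers.
    #[derive(Debug, Clone, PartialEq)]
    pub struct CreateRequest {
        pub title: String,
        pub note: Option<String>,
    }

    /// External shape returned to callers.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Response {
        pub id: String,
        pub title: String,
    }
}

pub mod hc_zome_foo_internals {
    use super::hc_zome_foo_rpc::CreateRequest;

    /// Internal storage shape, private to the local DNA.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Entry {
        pub title: String,
        pub note: String,
    }

    // `Entry` is defined in this crate, so this impl of a foreign trait
    // with a foreign source type satisfies the orphan rule.
    impl From<CreateRequest> for Entry {
        fn from(req: CreateRequest) -> Self {
            Entry {
                title: req.title,
                note: req.note.unwrap_or_default(),
            }
        }
    }
}

pub mod hc_zome_foo_lib {
    use super::hc_zome_foo_internals::Entry;
    use super::hc_zome_foo_rpc::{CreateRequest, Response};

    /// A standard handler as a plain function, so that a custom zome
    /// can bind it to its own zome function or wrap it with extra logic.
    pub fn handle_create(req: CreateRequest) -> Response {
        let entry: Entry = req.into();
        // A real handler would commit `entry` here; the id is a placeholder.
        Response {
            id: format!("id-of-{}", entry.title),
            title: entry.title,
        }
    }
}
```

The `hc_zome_foo` crate itself would then be little more than the zome definition wiring these functions to callbacks.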

How does everyone feel about these new names and the progression of this fact-finding? Is this starting to sound good?

pauldaoust · January 8, 2020, 6:49pm

Hey @pospi you’re running into the modularity issues @freesig and I have been trying to work through. The big issue is, as you’ve discovered, that you can’t do #[zome] twice in a zome. And that has consequences for defining zome callbacks and entry/link types.

I for one would like to give my approval to pretty much all of what you’re proposing. You’ve got explicit separation between:

structs that are part of the public contract (_rpc)
structs that are part of the internal guts of the module (_internals)
domain logic (_lib)
entry/link type definitions (_defs)
zome function definitions

These are the primitives that I’ve identified too. @freesig and I have been talking about seeing a Holochain DNA as an implementation of a database (the ‘porcelain’ of your data model, to use Git’s lingo) on top of the database primitives that the graph DHT provides (the ‘plumbing’). Therefore it’s not ugly or leaky to allow the logic in _lib to directly perform source chain and DHT access commands.

Some further thoughts:

1. It does make sense to allow a consumer of your module to selectively expose logic from your _lib as part of their zome’s public API vs simply call it from their logic, but it doesn’t make sense to allow them to selectively include entry/link type definitions — this causes them to break through your abstractions and think about which defs they need to include. Therefore, I think the HDK needs a macro called #[entry_defs] which consumes a function with a signature of () -> Vec<ValidatingEntryType>, which would allow a dev to include all of your defs in one go.
2. Further to point 1, I also think that _defs and _lib are mutually dependent, so it may never make sense to split them into separate libs. Helper functions expect the entry types to be defined and usable; entry types are unnecessary without helper functions to make use of them. I’m happy to be told otherwise though.
3. We’ve still got a namespacing issue, so it may be important to allow the consumer of your module to provide a prefix that gets tacked onto all your entry/link type names. In this case, #[entry_defs] would consume a function of type (String) -> Vec<ValidatingEntryType>, where the string is the prefix. However, that means that the helper functions need to know what you’ve named those types, and I don’t have a clue how to do that in Rust. Passing the namespace name into every helper function that needs it would be an obvious way to do it, but that’s fiddly. Can macros help us out?
4. Similar to my leaky-abstraction concerns with selectively including entry/link type defs, consumers of your module shouldn’t need to worry about whether your init and validate_agent functions are important to the functioning of your module. Should there simply be the expectation that a consumer of a module should always call your functions from their callbacks of the same name, or is there a more macro-friendly way to enforce this?
5. Currently structs defined in _rpc will have a hard dependency on Holochain guts because in order to be useful they need to use macros in holochain_json_derive. Is this a concern? Should the core devs look into reducing the size and scope of holochain_json_derive? (For all I know it’s already pretty small, but I have a feeling it ain’t.)
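
For reference, the "obvious but fiddly" prefix approach might look roughly like the sketch below. `EntryTypeDef` is a stand-in for the HDK's `ValidatingEntryType` (only the name is modelled), the proposed #[entry_defs] macro does not exist, and the entry type names are made up.

```rust
/// Stand-in for the HDK's entry type definition; only the name matters here.
#[derive(Debug, Clone, PartialEq)]
pub struct EntryTypeDef {
    pub name: String,
}

/// What a module author might export so that a consumer can register
/// every definition in one go, under a prefix of their choosing.
pub fn entry_defs(prefix: &str) -> Vec<EntryTypeDef> {
    ["anchor", "anchor_root"]
        .iter()
        .map(|base| EntryTypeDef {
            name: format!("{}{}", prefix, base),
        })
        .collect()
}

/// The fiddly part: any helper that touches an entry type has to be
/// handed the same prefix so that it can reconstruct the name.
pub fn anchor_entry_type(prefix: &str) -> String {
    format!("{}anchor", prefix)
}
```

Nothing stops a caller from passing one prefix to `entry_defs` and a different one to a helper, which is the weakness a macro-based or struct-based approach would aim to remove.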

pospi · January 12, 2020, 4:16am

Great to see convergence as always! I think this basically means I can keep going ahead with what I’m doing and see how the refactoring pans out.

I think we might have a use-case for “this causes them to break through your abstractions and think about which defs they need to include” - implementing custom resource attributes for a domain-specific resource type. You want selectivity here because you don’t want the entry and response type defs from the core Holo-REA EconomicResource used in your industry-specific resource API; you want BoxedBeef resources and a bunch of supply chain identifiers specific to boxed beef. But you do still want the standard EconomicEvent and other entry types to be used. Broadly agree that #[entry_defs] might be a useful macro to have, though.

Will see what happens RE _defs and _lib. I still think the split is useful since only the hosting zome wants _defs— “DNA-local” zomes can’t redefine the same entry types, in fact that’s an error.

RE namespacing issue, helper functions knowing type names & macros: yes, macros can help us out. I also think this discussion starts to make #[entry_defs] less of a good idea, because it also leads to making selective use of entry/link type definitions more of a common practise.

The straightforward way to handle the namespacing would be for the DNA developer to define their own entry & link type names as appropriate to their use-case; and then to invoke a compiler macro that accepts the names of the namespaced entry & link types and generates an appropriate set of _lib helper functions as a result. I think this is one of the options we spoke about in the zome link namespacing discussion.
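
A minimal sketch of that idea using a declarative macro, assuming the `_lib` crate exports something like the hypothetical `declare_foo_helpers!` below (in a real crate it would also need `#[macro_export]`). The type names passed in are invented for the example.

```rust
/// Hypothetical macro a `_lib` crate could export. It takes the entry and
/// link type names chosen by the DNA developer and generates a module of
/// helpers with those names baked in at compile time.
macro_rules! declare_foo_helpers {
    ($entry_type:literal, $link_type:literal) => {
        pub mod foo_helpers {
            pub const ENTRY_TYPE: &str = $entry_type;
            pub const LINK_TYPE: &str = $link_type;

            /// Example helper that no longer needs a namespace argument.
            pub fn describe_link(base: &str, target: &str) -> String {
                format!("{} --{}--> {}", base, LINK_TYPE, target)
            }
        }
    };
}

// In the DNA developer's zome: choose the names once, get helpers back.
declare_foo_helpers!("myapp_foo", "myapp_foo_children");
```

Because the names are literals expanded at compile time, the helpers cannot drift out of sync with the definitions the way a runtime prefix argument could.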

Fine with the structs in _rpc and their dependencies- it doesn’t result in any code overhead since they’re just shared dependencies that the other companion crates are going to need in any case. Also a proc macro crate may not have any overhead since it’s a compile-time bit of code?

pauldaoust · January 14, 2020, 8:47pm

Thanks for being a guinea pig for this emerging pattern.

Let me see if I understand — by ‘entry’ type, do you mean a default resource type that isn’t all that flexible but can be used out-of-the-box by people with basic needs? What does a response type look like? Sounds like in a custom implementation an industry would want to implement events and define their own resource types; is that right?

My naïve thought is that this could be broken up into different _defs and _internals crates that could be included separately in a zome — one set for required bits, and one set for optional bits. Is that workable?

Oh, of course — I forgot about the case of self-contained zome modules and the zomes that want to interact with them.

Okay, cool — that was the idea I had, but wasn’t sure if it would be ergonomic for module authors and consumers. Would love for it to be automagical — that is, the helper-function-compiling macro could see what the module consumer already named the entry types and adjust accordingly. No idea if that’s possible, but what about this as a convention?

The module creator defines a struct that contains all the entry type definitions a module can produce and needs to work with:

struct MyModuleEntryTypeDefinitionMapping {
    required_entry_def_1: String,
    required_entry_def_2: String,
    optional_entry_def_1: Option<String>
}

The module consumer creates an instance of this struct.
The function that returns the Vec<ValidatingEntryType> receives this struct as an input parameter and uses it in its entry definitions. Maybe this could be a macro that generates the macro-tagged entry type definition functions; I dunno.
The macro that compiles the helper functions also receives this struct, and magic happens somewhere in here — either in the macro or in the generated functions — to honour the names and the optional-ness of each entry type.

This way, as long as the module creator makes good on the promises implied in this convention, we make sure that required entry types are defined, names are chosen for all entry types that are used, and helper functions know which entry types weren’t defined — all at compile time.
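
A rough sketch of how that convention could look without any macros. `EntryTypeDef` is again a stand-in for `ValidatingEntryType`, and `optional_feature_entry_type` is an invented helper. Note the limits of this version: the compiler forces the consumer to supply the required names, but the optional entry type is only checked at run time, so a macro would be needed to get the fully compile-time behaviour described above.

```rust
/// Names chosen by the module consumer (the struct from the post above,
/// with public fields so that a consumer crate can construct it).
pub struct MyModuleEntryTypeDefinitionMapping {
    pub required_entry_def_1: String,
    pub required_entry_def_2: String,
    pub optional_entry_def_1: Option<String>,
}

/// Stand-in for the HDK's entry type definition; only the name is modelled.
#[derive(Debug, PartialEq)]
pub struct EntryTypeDef {
    pub name: String,
}

/// Module-author function: builds the definitions from the mapping.
pub fn entry_defs(mapping: &MyModuleEntryTypeDefinitionMapping) -> Vec<EntryTypeDef> {
    let mut defs = vec![
        EntryTypeDef { name: mapping.required_entry_def_1.clone() },
        EntryTypeDef { name: mapping.required_entry_def_2.clone() },
    ];
    // The optional type is only defined if the consumer named it.
    if let Some(name) = &mapping.optional_entry_def_1 {
        defs.push(EntryTypeDef { name: name.clone() });
    }
    defs
}

/// A helper that depends on the optional type reports its absence
/// instead of guessing a name.
pub fn optional_feature_entry_type(
    mapping: &MyModuleEntryTypeDefinitionMapping,
) -> Result<&str, &'static str> {
    mapping
        .optional_entry_def_1
        .as_deref()
        .ok_or("optional_entry_def_1 was not defined by the consumer")
}
```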

That’s true of zomes that import your module, but @freesig and I are also talking about the desire to support clients (UIs and middleware) written in Rust; it’d be nice for them to be able to use the same RPC structs too. This is a real scenario: both @freesig and @simwilso are doing it. The problem is that, not only are you dragging a bunch of Holochain into your client, but holochain_json_derive and its dependencies target a pretty specific version of Rust, so you have to toe the line if you want to use such a _rpc crate.

pauldaoust · January 14, 2020, 8:54pm

I just had a thought: Why not call _internals _structs? Feels like a more descriptive name. Then _rpc could be _structs_rpc.

pospi · January 16, 2020, 1:12am

I ended up calling them _structs_rpc and _structs_internal, respectively. I found myself asking “which structs, again?” without the additional context.

Overall this pattern is looking pretty useful (see the rea_* crate groupings under /lib), and has the side-effect of making my code a lot cleaner, and thus the dependencies between modules a lot clearer.

Something I am noticing is that the entry & link type IDs are referenced by zome defs & _structs_internal of other zomes, which makes that import quite weighty just to share constants. There’s also an argument to be made that internal entry / link type defs shouldn’t be shareable between zomes, but I haven’t followed through my lingering dependencies to find out whether or not that’s the case yet. (I suspect it might be, when looking at the defs in eg. EconomicEvent and noticing those to! links could just as easily be from! links in a separate zome module.)

So, if that becomes relevant I’ll bring it up again but I suspect the cross-dependency issue will go away with further modularisation.

I’d like to propose renaming _structs_internal to _storage; and _structs_rpc to _rpc. They are shorter names, and I also believe they add more context- structs are still talking about the things (and types of things) rather than what the packaged behaviour is for. The rest of the architecture makes more sense to me this way- for example, it seems obvious that library methods for manipulating a zome would depend on that zome’s storage backend and I/O interface.

pospi · January 16, 2020, 1:33am

Ok the more I think about it, you want to share these constants between zomes. Either way I cut it I am defining “an event links to a fulfillment” or “a fulfillment is linked to by an event”… the identifiers are needed on both sides of the relationship. I don’t want to have to redefine the same foreign key entry types in every pair of DNAs, I just want to import from a shared module. I think that ends up as _storage_consts in the new naming system?
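
Roughly what I have in mind, with modules standing in for crates. The constant values and the `LinkDef` tuple are placeholders rather than the real Holo-REA identifiers or HDK link definitions.

```rust
/// Stand-in for a shared `_storage_consts` crate: identifiers only.
pub mod storage_consts {
    pub const EVENT_ENTRY_TYPE: &str = "event";
    pub const FULFILLMENT_ENTRY_TYPE: &str = "fulfillment";
    pub const EVENT_FULFILLS_LINK_TYPE: &str = "event_fulfills";
}

/// (base entry type, link type, target entry type)
pub type LinkDef = (&'static str, &'static str, &'static str);

/// Stand-in for the zome on the "from" side of the relationship.
pub mod event_zome {
    use super::storage_consts::*;
    use super::LinkDef;

    /// "an event links to a fulfillment"
    pub fn outgoing_link() -> LinkDef {
        (EVENT_ENTRY_TYPE, EVENT_FULFILLS_LINK_TYPE, FULFILLMENT_ENTRY_TYPE)
    }
}

/// Stand-in for the zome on the "to" side of the relationship.
pub mod fulfillment_zome {
    use super::storage_consts::*;
    use super::LinkDef;

    /// "a fulfillment is linked to by an event"
    pub fn incoming_link() -> LinkDef {
        (EVENT_ENTRY_TYPE, EVENT_FULFILLS_LINK_TYPE, FULFILLMENT_ENTRY_TYPE)
    }
}
```

Both sides import the same identifiers, so neither has to pull in the other's storage structs just to agree on names.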

I also like this module split because it means that things like globally unique network IDs, DNA hashes and zome names end up in separate modules that help to make the power structures and relational underpinnings of a wider hApp system evident…

pauldaoust · January 17, 2020, 4:23pm

yeah, that sounds good to me too — both the new names and the separation of _storage_consts from _storage. Ideally I’d like to see a function that lets a zome interrogate another zome (at compile time) for its entry/link type names, which would work well with built-in namespacing. But that has to be part of a larger design conversation.