How to bidirectionally replicate data between DNAs?

I notice you get a "Cyclic dependency in bridge configuration" error when you attempt to have DNAs talking to each other. Are there any plans to work towards lifting this limitation?

There are some use-cases where it feels useful to be able to send messages bidirectionally between two different DNAs, and currently this limitation prevents such an arrangement.

Some more context on this: what I'm actually trying to achieve here is bidirectional data propagation between DNAs, so perhaps the signalling API can be used to implement this. More in holo-rea#57.

Short version: I think it could be a good solution, provided signals emitted in one DNA are received by all connected DNAs running in the conductor; and no bridging is needed to enable this data to propagate.

Do you sacrifice data validity doing it that way?

You mean with signals vs with call? I don't think so- either way, the payload has to validate against a Serde schema in order to be parseable; and the incoming data goes through the same validation steps regardless of how it enters the zome (that's just up to how you architect your code).
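To make that concrete, here is a minimal sketch in Python (not HDK code). `EventPayload`, `parse_payload`, `handle_event`, `on_zome_call` and `on_signal` are all hypothetical names; the dataclass only stands in for a Serde-derived struct. The point is that both entry channels funnel into the same parse and validation steps.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EventPayload:
    """Hypothetical record shape, standing in for a Serde-derived struct."""
    resource_id: str
    quantity: int


def parse_payload(raw: str) -> EventPayload:
    """Parse JSON and check it against the schema; raise ValueError otherwise."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict) or set(data) != {"resource_id", "quantity"}:
        raise ValueError("unexpected fields")
    if not isinstance(data["resource_id"], str) or not isinstance(data["quantity"], int):
        raise ValueError("wrong field types")
    return EventPayload(**data)


def handle_event(payload: EventPayload) -> str:
    """Shared validation and handling, independent of the entry channel."""
    if payload.quantity <= 0:
        raise ValueError("quantity must be positive")
    return f"recorded {payload.quantity} of {payload.resource_id}"


def on_zome_call(raw: str) -> str:
    """Entry point for a direct call (hypothetical)."""
    return handle_event(parse_payload(raw))


def on_signal(raw: str) -> str:
    """Entry point for a received signal (hypothetical)."""
    return handle_event(parse_payload(raw))
```

Whether the data arrives through `on_zome_call` or `on_signal`, a bad payload is rejected at the same place.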

@pospi I can't help but notice you created a cyclic forum / GH issue dependency there with cross-linking :smiley: life imitates art!

I don't know the reason behind us preventing cyclical bridge dependencies. Ashanti might know about this; I know she was working on bridge config sanitisation a couple months ago. She's not on the forum though; will ask on MM.

So far signals are only something you explicitly trigger in a DNA for consumption by the client app. I don't know what the long-range plan for signals is, but I believe we're also looking at having DHT events trigger signals to subscribers, and perhaps signal propagation between DNAs as well. Don't know why you couldn't; you can have calling dependencies between zomes in a DNA.

(It occurs to me that the client could also do ad-hoc bridging between DNAs, and signals could help facilitate this.)

I got more information on this from @zippy:

@lucksus did most of the work on this. I think the cyclic dependency check has to do with the order of instantiating the DNAs and what gets called during that instantiation to establish the bridging. Technically two DNAs should be able to depend on each other because the bridging is just about a capability token having been requested and granted. It may be that the current implementation of the conductor can't handle doing that in two phases during installation. Let's check in with @lucksus when he gets back.
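A rough sketch of what "two phases" could mean, in plain Python. This is not how the conductor is implemented; `install`, the instance dictionaries and the `cap:...` token strings are all made up for illustration. Phase 1 instantiates every DNA without touching any other DNA, and phase 2 hands out tokens once everything exists, so a cycle in the bridge list is no longer an ordering problem.

```python
def install(bridges):
    """Install DNAs in two phases.

    `bridges` is a list of (caller, callee) name pairs; cycles are allowed.
    Returns a dict of hypothetical instance records keyed by DNA name.
    """
    names = sorted({name for pair in bridges for name in pair})

    # Phase 1: instantiate every DNA on its own, with no bridge wiring yet.
    instances = {name: {"name": name, "tokens": {}} for name in names}

    # Phase 2: all instances exist, so each caller can request a token
    # from its callee regardless of the order DNAs were created in.
    for caller, callee in bridges:
        instances[caller]["tokens"][callee] = f"cap:{callee}->{caller}"

    return instances


# A mutually dependent pair installs without any ordering constraint.
installed = install([("A", "B"), ("B", "A")])
```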

I'm also going to tag @zippy to draw him in to the emerging broader discussion here.

Is developing a "signal passing" system design worth pursuing as the preferred way to manage inter-DNA communication? On face value it feels as though it may simplify a lot of the complex link management logic that I've had to implement between records kept in different network spaces. And there are still limits to what I can do there- so far all the event flows have been in one direction but that won't hold for every feature I need.

A programming paradigm based on signal processing also feels like more of a "Ceptr-like" workflow :wink:

Jamison Day would be another good guy to bring into this conversation, but sadly he's not on the forum (yet). When we were discussing this in the internal MM, he was concerned that allowing bidirectional bridges would create circular dependencies that (if not managed correctly) could trigger an infinite loop of new entries being created in response to other entries being created. To my mind, though, that risk already exists between zomes and even within a zome. It does require discipline to prevent this sort of thing.

I'm also pretty fond of the programming paradigm based on signals, and YES, it does feel very Ceptr-like. Very much 'receptive capacity'/yin/etc and all that. Also makes me think of functional reactive programming, which is touted as a way to get a grip on knock-on effects because it's all about one-way data binding (although you can definitely create infinite causative loops there too).

Hmmmm, signal-passing as a way of propagating data across bridges. This is an interesting line of inquiry. Here's what popped up for me:

A client and a DNA have a conversation with each other with a pair of channels. One of them, zome calls, is active/direct/yang: the client directs the DNA to do something.

The other, signals, are passive/diffuse/yin: the DNA is informing rather than directing. There's no dependency on the client; in fact, the DNA doesn't even know/care if a client is listening.

So now we've got this nice duality, yin/yang, passive/active, diffuse/direct, a nice normalisation that creates a control flow without circular dependencies.

Hmmmmm... what other thing acts as a 'client' that can make zome calls?

Well how about other DNAs via bridging?

We probably want to avoid explicit circular dependencies among DNAs too. So what could we use to get two-way communication between DNAs and still keep the dependency graph clean?

:thinking:

I don't think there is a way to do it with the dependency graph clean- as you said, these things require discipline.

The way I would implement it for bidirectional functionality is to have a control flow as follows (indentation shows caller ⇒ callee relationships):

UI call to DNA A
    DNA A gateway
        DNA A handler
        update signal
            DNA B receiver
                DNA B handler

UI call to DNA B
    DNA B gateway
        DNA B handler
        update signal
            DNA A receiver
                DNA A handler

Essentially, separating out the handler logic from the various interpreters ensures you avoid any infinite loops.
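The same control flow as a small Python sketch, assuming a hypothetical `Dna` class with `gateway`, `receiver` and `handler` methods (none of these are Holochain APIs). Only the gateway emits an update signal; the receiver calls the handler and stops, which is what breaks the loop.

```python
class Dna:
    """Toy model of a DNA with separate entry points around one handler."""

    def __init__(self, name):
        self.name = name
        self.entries = []
        self.peers = []  # DNAs that receive this DNA's update signals

    def handler(self, record):
        """Core logic: store the record. Never emits anything itself."""
        self.entries.append(record)

    def gateway(self, record):
        """Entry point for UI calls: handle, then signal the peers."""
        self.handler(record)
        for peer in self.peers:
            peer.receiver(record)

    def receiver(self, record):
        """Entry point for incoming signals: handle only, no re-emit."""
        self.handler(record)


dna_a, dna_b = Dna("A"), Dna("B")
dna_a.peers.append(dna_b)
dna_b.peers.append(dna_a)

dna_a.gateway("event-1")  # UI call to DNA A, propagated once to DNA B
dna_b.gateway("event-2")  # UI call to DNA B, propagated once to DNA A
```

If `receiver` called `gateway` instead of `handler`, the two DNAs would keep signalling each other forever.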

When processing operations within the same DNA, different combinations are possible since you can create zome A records from within another zome B and vice versa:

UI call to zome A
    zome A gateway
        zome A handler
        zome B handler
        update signal
            (for 3rd-party use only, nothing listening)

UI call to zome B
    zome B gateway
        zome A handler
        zome B handler
        update signal
            (for 3rd-party use only, nothing listening)

Would be interested in others' thoughts on these approaches and how the logic might differ when intra-DHT vs inter-DHT; specifically with regard to validation. (I think this might be where bidirectional call is needed, if receiving networks care about the integrity of external data linking to them.)

I think there is parity between inter-DNA messaging and client messaging- could it just be the same API, with explicit grants between zomes to filter the message types they are listening for?
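Something like this is what I have in mind, sketched in Python with made-up names (`Zome`, `grant`, `deliver`). It is only an illustration of explicit grants filtering message types per sender, not an existing API.

```python
class Zome:
    """Toy receiver that only accepts message types it has granted."""

    def __init__(self):
        self.grants = {}  # sender name -> set of accepted message types
        self.inbox = []

    def grant(self, sender, message_types):
        """Explicitly allow `sender` to deliver the given message types."""
        self.grants.setdefault(sender, set()).update(message_types)

    def deliver(self, sender, message_type, payload):
        """Accept the message only if a matching grant exists."""
        if message_type in self.grants.get(sender, set()):
            self.inbox.append((sender, message_type, payload))
            return True
        return False


planning = Zome()
planning.grant("observation", {"event_created"})
planning.deliver("observation", "event_created", {"id": 1})  # accepted
planning.deliver("observation", "event_deleted", {"id": 1})  # filtered out
planning.deliver("ui", "event_created", {"id": 2})           # no grant at all
```

A client UI and another DNA would both just be senders here, which is the parity I mean.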

Yeah, isn't this risk also there when connecting say three DNAs, cyclic entry creation that could spin out lots of entries by dependency with poor code? So not a viable route anyway?

There's been some really good in-depth discussion on this between @pauldaoust & myself for those who are watching along. Requirements and use-cases crystallizing nicely.

@ViktorZaunders yeah, that's exactly the concern that one of our team members raised. Reasoning about two mutually interdependent DNAs is moderately easy, and so is reasoning about three if you wrote them all. But once you have an ecosystem of developers and their DNAs, you could run into circles with unintended consequences.

@pospi

you can?! Wow, I didn't realise that; it feels like leaky encapsulation to me. Wonder if that was just an oversight and it's gonna get closed in the future...

Back in my programming days, I used to fret and obsess about circular dependencies. In fact, they were impossible cuz I was writing in C#. Made for some convoluted ways of getting the dependency graph untangled; dependency injection was usually my go-to.

Thinking about it, ambient signal emission doesn't prevent this either. DNA X could emit signal A that causes DNA Y to take action and emit signal B that DNA X receives, causing it to emit signal A again. It was just my proposal for a way to un-circular-ize hard DNA dependencies in a tidy way. Ambient/declarative/passive signals allow a DNA to be ignorant of its consumers, whereas direct/imperative/active calling/message-passing introduces tight coupling because it requires the caller to know how its callees work.
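Here is that ping-pong as a toy Python simulation. `run` and the `reactions` table are hypothetical; the step limit exists only so the demonstration terminates, it is not a proposed fix.

```python
from collections import deque


def run(reactions, first_signal, max_steps=10):
    """Deliver signals until the queue drains or `max_steps` is reached.

    `reactions` maps a received signal name to the signal emitted in
    response (missing key means the receiver emits nothing).
    Returns (delivered signals, True if cut off with work still queued).
    """
    queue = deque([first_signal])
    delivered = []
    while queue and len(delivered) < max_steps:
        signal = queue.popleft()
        delivered.append(signal)
        follow_up = reactions.get(signal)
        if follow_up is not None:
            queue.append(follow_up)
    return delivered, bool(queue)


# DNA Y answers A with B, and DNA X answers B with A: this never settles.
looping = run({"A": "B", "B": "A"}, "A", max_steps=6)

# Remove X's reaction to B and the exchange ends after two deliveries.
settled = run({"A": "B"}, "A", max_steps=6)
```

Neither DNA has a declared dependency on the other, yet the cycle is still there in the signal traffic.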

I'm chewing over your thoughts on signalling, message passing, and validation @pospi. Raises interesting thoughts. Re: validation, I see the value. It could be ensured just as easily with emit as with call, I think, because my conductor can provide assurances that the right DNA emitted that signal.

Re:

Definitely see what you're saying there. Looking for the base abstraction, call could certainly be seen as a special case of send with a method name and tuple of arguments as its message.
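In toy Python terms (hypothetical `Zome.send` and `call`, nothing to do with the real HDK signatures), that layering looks like this:

```python
class Zome:
    """Toy zome whose only primitive is receiving a message."""

    def __init__(self):
        # Hypothetical function table for illustration.
        self.functions = {"add": lambda a, b: a + b}

    def send(self, message):
        """Base abstraction: interpret a (method name, arguments) message."""
        name, args = message
        return self.functions[name](*args)


def call(zome, name, *args):
    """Convenience layer: a function call is just a structured message."""
    return zome.send((name, args))


total = call(Zome(), "add", 2, 3)
```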

And within one conductor, that's pretty much how it works: the conductor creates a special public grant type for intra-DNA zome calls, inter-DNA zome calls, and UI calls, then gives the token to the callers. It is nice to have a special affordance for function calls layered on top of message passing though, wouldn't ya say? :wink:

There's even a pattern for doing it between agents: Alice shares the cap token with Bob, which he then passes back to her whenever he wants to "call a zome function in her running instance" (which actually just looks like him sending her a message consisting of function name, parameters, and cap token). Alice checks the function he wants to call against the conditions of the grant represented by his token, then calls the function for him if it all checks out. Eventually I think this might have its own convenience function in the HDK.
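A minimal Python sketch of the Alice and Bob pattern. The `Agent` class, its `grant` and `receive` methods, and the two example functions are all hypothetical; real capability grants carry more conditions than a set of function names.

```python
import secrets


class Agent:
    """Toy agent that executes functions on behalf of token holders."""

    def __init__(self):
        self.grants = {}  # token -> set of function names it unlocks
        # Hypothetical zome functions for illustration.
        self.functions = {
            "get_profile": lambda: "alice's profile",
            "delete_all": lambda: "everything deleted",
        }

    def grant(self, allowed_functions):
        """Create a capability token covering the given functions."""
        token = secrets.token_hex(16)
        self.grants[token] = set(allowed_functions)
        return token

    def receive(self, message):
        """Handle a message of function name, parameters and cap token."""
        name = message["function"]
        if name not in self.grants.get(message["token"], set()):
            raise PermissionError(f"token does not grant {name!r}")
        return self.functions[name](*message["params"])


alice = Agent()
bobs_token = alice.grant({"get_profile"})  # Alice shares this with Bob

# Bob "calls a function" by sending the token back inside a message.
reply = alice.receive({"function": "get_profile", "params": [], "token": bobs_token})
```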

I hope not, because I think it's a really good time-saver for zome mixins. Think of them as stateful additions to custom business logic... it's way easier if they can be plugged in to define all the record types, and can be driven by that business logic. I think it's actually necessary to have that feature to be able to use zomes as functional mixins correctly- otherwise you need to expose all zome functionality over the RPC gateway, which means that external clients would be able to manipulate the zome state without restriction. Unsure if I'm explaining that properly but hope it makes sense...

Oh, hmmm, I see your point: you're saying that if you want to use a zome to add functionality to another zome's stuff (e.g., zome B could hang links on zome A's entries, and you want to make it generic so that zome A could be anything) either they need to be able to privately talk to each other without exposing their guts to the UI (that is, some sort of exposure level other than hc_public), or they need to be able to access each other's entries directly. Is that about right?

I've always seen zomes as basic units of encapsulation that shouldn't be able to access each other's data. This is less important when one developer creates all the zomes in a DNA, but it becomes a big deal when devs start plugging third-party zomes into their own DNAs. You've got this issue of zomes serving two related but distinct purposes: encapsulation and modularity.

I only half know what I'm saying; it's hard to talk about it with concrete examples. Sounds like you've got some though; what sorts of mixin-style zomes have you built that depend on access to other zomes' entry types?

Interesting that you say 'functional' mixins, since data hiding isn't really a thing in functional programming; that's more of a leftover from OOP days.

@freesig I guess this answers our question about whether entry types are namespaced by zome name or hash! Looks like all data lives in a common pool.

Yep, we are on the same page 100%. So here is my use-case, without which doing REA accounting on Holochain for any non-trivial app would be much more involved:

  • HoloREA has an "observation" DNA that holds Process, EconomicEvent and EconomicResource records (plus a few others), each in separate zomes.
  • Business logic in a client app wants to constrain the way that particular events are entered, such that only certain resource types are allowed. Or pick whatever use case here, I think it's safe to say that businesses will want particular business processes to be carried out without allowing any random arbitrary process to be defined by their users.

I think that's it. You can see the same pattern in operation currently between the EconomicEvent and EconomicResource zomes- resources aren't manipulable directly, only via events. But I wanted to distinctly separate the two, to essentially make people context-switch between thinking about logging events vs thinking about querying resources.

So you could maybe combine both of those into one zome & call it ProvenanceTrail or something, ok, don't like it but I could live with it.

But what happens when a business wants to do their custom integration over the top? That would mean they have to put both those zomes, along with their custom zome, all together in order for it to work.

Basically it feels to me like such a change would force you to build monolithic zomes rather than being able to neatly separate them.

I think this probably also has implications for zome traits- can't do those as easily if you're forced to combine them with other namespaces all the time.

What do you think? Am I making sense or missing something?

data hiding isn't really a thing in functional programming

Of course it is! What do you think closures are for? In fact, they are better at data hiding than objects are... access specifiers can be bypassed by reflective capabilities a lot of the time... no "backdoor" into a closure's state from within the language...

ahhhhhhh the map gets even clearer. So I was picturing the observation layer as something that simply provided an API to create the three building blocks (resources, events, agents) without any higher knowledge of why it's being asked to create. Which is sort of true, but sort of not, in the sense that it should only create these things in response to legitimate business rules that come from the outside.

In the smaller scale example (EconomicEvent and EconomicResource), I take this to mean that only the EconomicEvent zome should be allowed to call EconomicResource's functions, because only it knows how to do it properly. And in the larger scale example, only the custom rules for a business know how to properly call any of the observation layer's zome functions. Is that right?

The question that comes up for me right away is, how does the observation DNA determine whether it's being called legitimately or not? I don't know. There's no way to say hc_public(but_only_for_these_approved_UIs_and_DNAs). I've got thoughts involving dependency injection floating around in my head, but you couldn't do that cross-DNA... you could do it cross-zome, though, and of course inside a zome.

okay, fair enough :smiley: I wasn't thinking of closures when I wrote that up. I was thinking about that immutable, category-theory style that FP encourages: data structures are just dumb objects with no internal state, and state mutation involves mapping/reducing/filtering on data structures and the monads that hold them, to produce new data structures.

Broadly yes! I think we're still understanding each other.

Well, this is what I'm solving for. By not exposing the observation API endpoints in a custom integration, and only allowing its data stores to be manipulated via some proxy zome (the "business rules", in this case), you prevent the observation API from being able to be called incorrectly (indeed, at all).
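Roughly, in Python pseudo-architecture (every name here is hypothetical: `create_event`, `log_harvest`, `EXPOSED`, `rpc` and the allowed resource types), the proxy arrangement looks like this. The observation handler is reachable from the business-rules handler inside the same DNA, but it is simply absent from the externally exposed function table.

```python
ALLOWED_RESOURCE_TYPES = {"apples", "pears"}  # hypothetical business rule


def create_event(store, resource_type, quantity):
    """Observation-layer handler: records whatever it is asked to."""
    store.append((resource_type, quantity))


def log_harvest(store, resource_type, quantity):
    """Business-rules handler: the only route into `create_event`."""
    if resource_type not in ALLOWED_RESOURCE_TYPES:
        raise ValueError(f"resource type {resource_type!r} not allowed")
    create_event(store, resource_type, quantity)


# Only the proxy is exposed externally; `create_event` is not listed.
EXPOSED = {"log_harvest": log_harvest}


def rpc(store, function_name, *args):
    """Stand-in for the external gateway: dispatch exposed functions only."""
    if function_name not in EXPOSED:
        raise PermissionError(f"{function_name!r} is not exposed")
    return EXPOSED[function_name](store, *args)


events = []
rpc(events, "log_harvest", "apples", 3)  # goes through the business rules
```

This only works if handlers in one zome can create entries defined in another, which is the requirement I keep coming back to.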

FWIW in a standard install, the observation API is precisely as you have described- it provides facilities to create building blocks without any higher knowledge of why it's being asked to create. It's only when people need custom constraints and logic that it becomes necessary.

Anyway, I think we have veered off topic somewhat. Can we basically agree that there is a need for handlers in one zome to be able to manipulate entries defined in another? I feel as though that is a hard requirement in order for "mixin zomes" to be able to function as intended.

Ha ha, sorry, I just wanted to take the opportunity to understand the need more concretely :wink: so you're saying that the observation API endpoints are not hc_public, correct? Oh, and I forgot to ask: what's the definition of 'manipulating another zome's entries' in the context of this discussion?

Umm no, I don't think that's anything to do with it. If the API endpoints weren't public and you were expecting capability tokens between DNAs to restrict functionality for you, you'd be out of luck. The agent could just take the cap token and use it against the RPC gateway of the "restricted" zome to pass in whatever parameters they wish. Real security of this sort is only possible if there is no way to call into the zome externally except via the proxy zome.

@pospi look what I just discovered when I was browsing through the Guidebook! From Emitting Signals:

Future additions will be:

  • Signal signature description in the DNA: ADR 13 describes signals as statically defined properties of a DNA, which would enable conductor-level binding/connecting of signals with slots (i.e. zome functions), similar to bridges but with looser coupling.

Reading ADR 13, we find:

Finally, just as you can call any function using core_api::call(), you can register a listener with core_api::listen() and unregister a listener with core_api::unlisten().

This suggests that your desire, to see signals emitted in one DNA received by all connected DNAs in the same conductor, is on the map!
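To picture the listen/unlisten idea, here is a toy Python signal bus. It borrows only the method names `listen` and `unlisten` from the ADR quote; the `SignalBus` class, its `emit` method and the signal name are assumptions for illustration, not the planned core API.

```python
class SignalBus:
    """Toy conductor-level registry connecting signals to slots."""

    def __init__(self):
        self._listeners = {}  # signal name -> list of slot callables

    def listen(self, signal_name, slot):
        """Register `slot` to be called when `signal_name` is emitted."""
        self._listeners.setdefault(signal_name, []).append(slot)

    def unlisten(self, signal_name, slot):
        """Unregister `slot`; silently ignore slots that were never added."""
        slots = self._listeners.get(signal_name, [])
        if slot in slots:
            slots.remove(slot)

    def emit(self, signal_name, payload):
        """Deliver `payload` to every slot currently listening."""
        for slot in list(self._listeners.get(signal_name, [])):
            slot(payload)


bus = SignalBus()
received_by_dna_b = []
slot = received_by_dna_b.append  # stands in for a zome function in DNA B

bus.listen("event_created", slot)
bus.emit("event_created", {"id": 1})  # DNA A emits, DNA B's slot runs
bus.unlisten("event_created", slot)
bus.emit("event_created", {"id": 2})  # nobody is listening any more
```

The emitter never names its consumers, which is the looser coupling the Guidebook passage describes.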
