How to bidirectionally replicate data between DNAs?

pauldaoust · September 12, 2019, 8:39pm

Jamison Day would be another good guy to bring into this conversation, but sadly he’s not on the forum (yet). When we were discussing this in the internal MM, he was concerned that allowing bidirectional bridges would create circular dependencies that (if not managed correctly) could trigger an infinite loop of new entries being created in response to other entries being created. To my mind, though, that risk already exists between zomes and even within a zome. It does require discipline to prevent this sort of thing.

I’m also pretty fond of the programming paradigm based on signals, and YES, it does feel very Ceptr-like. Very much ‘receptive capacity’/yin/etc and all that. Also makes me think of functional reactive programming, which is touted as a way to get a grip on knock-on effects because it’s all about one-way data binding (although you can definitely create infinite causative loops there too).

Hmmmm, signal-passing as a way of propagating data across bridges. This is an interesting line of inquiry. Here’s what popped up for me:

A client and a DNA have a conversation with each other over a pair of channels. One of them, zome calls, is active/direct/yang: the client directs the DNA to do something.

The other, signals, is passive/diffuse/yin: the DNA is informing rather than directing. There’s no dependency on the client; in fact, the DNA doesn’t even know/care if a client is listening.

So now we’ve got this nice duality, yin/yang, passive/active, diffuse/direct, a nice normalisation that creates a control flow without circular dependencies.

Hmmmmm… what other thing acts as a ‘client’ that can make zome calls?

Well how about other DNAs via bridging?

We probably want to avoid explicit circular dependencies among DNAs too. So what could we use to get two-way communication between DNAs and still keep the dependency graph clean?

pospi · September 18, 2019, 6:28am

I don’t think there is a way to do it with the dependency graph clean- as you said, these things require discipline.

The way I would implement it for bidirectional functionality is to have a control flow as follows (indentation shows caller ⇒ callee relationships):

UI call to DNA A
    DNA A gateway
        DNA A handler
        update signal
            DNA B receiver
                DNA B handler

UI call to DNA B
    DNA B gateway
        DNA B handler
        update signal
            DNA A receiver
                DNA A handler

Essentially, separating out the handler logic from the various interpreters ensures you avoid any infinite loops.
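Here is a rough Python sketch of the inter-DNA flow above. All names (`Dna`, `gateway`, `receiver`, `handler`) are hypothetical and none of this is Holochain API. It only illustrates the idea that the gateway emits an update signal while the receiver calls the handler directly, so a replicated update never triggers another signal.

```python
class Dna:
    """Hypothetical stand-in for a DNA; not the Holochain API."""

    def __init__(self, name):
        self.name = name
        self.records = []      # state written by the handler
        self.listeners = []    # receivers of other DNAs bound to our signal

    def handler(self, record):
        # Pure business logic: store the record, emit nothing.
        self.records.append(record)

    def gateway(self, record):
        # Entry point for UI calls: run the handler, then emit a signal.
        self.handler(record)
        for listener in self.listeners:
            listener(record)

    def receiver(self, record):
        # Entry point for signals from the other DNA: handler only,
        # no signal, so the update cannot bounce back.
        self.handler(record)


dna_a, dna_b = Dna("A"), Dna("B")
dna_a.listeners.append(dna_b.receiver)
dna_b.listeners.append(dna_a.receiver)

dna_a.gateway("from-ui-to-a")
dna_b.gateway("from-ui-to-b")
```

After both UI calls, each DNA holds both records exactly once, and the flow terminates because no receiver ever emits.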

When processing operations within the same DNA, different combinations are possible since you can create zome A records from within another zome B and vice versa:

UI call to zome A
    zome A gateway
        zome A handler
        zome B handler
        update signal
            (for 3rd-party use only, nothing listening)

UI call to zome B
    zome B gateway
        zome A handler
        zome B handler
        update signal
            (for 3rd-party use only, nothing listening)
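The intra-DNA variant could look roughly like this in Python. The function names and the shared `store` dict are assumptions made for illustration only; the point is that each gateway calls both handlers directly, and the signal is recorded purely for outside observers.

```python
signals = []  # emitted update signals; nothing inside the DNA listens


def zome_a_handler(store, value):
    store.setdefault("a", []).append(value)


def zome_b_handler(store, value):
    store.setdefault("b", []).append(value)


def emit_update(kind, value):
    # Recorded for third parties only.
    signals.append((kind, value))


def zome_a_gateway(store, value):
    zome_a_handler(store, value)
    zome_b_handler(store, value)
    emit_update("a_updated", value)


def zome_b_gateway(store, value):
    zome_a_handler(store, value)
    zome_b_handler(store, value)
    emit_update("b_updated", value)


store = {}
zome_a_gateway(store, 1)
zome_b_gateway(store, 2)
```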

Would be interested in others’ thoughts on these approaches and how the logic might differ when intra-DHT vs inter-DHT, specifically with regard to validation. (I think this might be where a bidirectional call is needed, if receiving networks care about the integrity of external data linking to them.)

I think there is parity between inter-DNA messaging and client messaging- could it just be the same API, with explicit grants between zomes to filter the message types they are listening for?

ViktorZaunders · September 18, 2019, 1:15pm

Yeah, isn’t this risk also there when connecting, say, three DNAs: cyclic entry creation that could spin out lots of entries through dependencies in poorly written code? So not a viable route anyway?

pospi · September 27, 2019, 2:42am

There’s been some really good in-depth discussion on this between @pauldaoust & myself for those who are watching along. Requirements and use-cases are crystallizing nicely.

pauldaoust · September 30, 2019, 7:39pm

@ViktorZaunders yeah, that’s exactly the concern that one of our team members raised. Reasoning about two mutually interdependent DNAs is moderately easy, and so is reasoning about three if you wrote them all. But once you have an ecosystem of developers and their DNAs, you could run into circles with unintended consequences.

@pospi, on creating zome A records from within zome B:

you can?! Wow, I didn’t realise that — it feels like leaky encapsulation to me. Wonder if that was just an oversight and it’s gonna get closed in the future…

Back in my programming days, I used to fret and obsess about circular dependencies. In fact, they were impossible cuz I was writing in C#. Made for some convoluted ways of getting the dependency graph untangled; dependency injection was usually my go-to.

Thinking about it, ambient signal emission doesn’t prevent this either. DNA X could emit signal A that causes DNA Y to take action and emit signal B that DNA X receives, causing it to emit signal A again. It was just my proposal for a way to un-circular-ize hard DNA dependencies in a tidy way. Ambient/declarative/passive signals allow a DNA to be ignorant of its consumers, whereas direct/imperative/active calling/message-passing introduces tight coupling because it requires the caller to know how its callees work.
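To make that ping-pong concrete, here is a toy Python simulation. The `reactions` table and the delivery budget are inventions for this sketch; the budget exists only so the example terminates, which the real loop would not do on its own.

```python
def run(max_deliveries):
    """Deliver signals between two hypothetical DNAs until the budget runs out."""
    # Each DNA reacts to the signal it receives by emitting its own.
    reactions = {"A": "B", "B": "A"}   # signal received -> signal emitted
    queue = ["A"]                      # DNA X emits signal A once
    delivered = []
    while queue and len(delivered) < max_deliveries:
        signal = queue.pop(0)
        delivered.append(signal)
        queue.append(reactions[signal])  # the receiver always responds
    return delivered, bool(queue)


delivered, still_pending = run(max_deliveries=6)
```

When the budget is exhausted there is still a signal waiting in the queue, which is the cycle described above: loose coupling through signals removes the hard dependency but not the loop.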

I’m chewing over your thoughts on signalling, message passing, and validation @pospi. Raises interesting thoughts. Re: validation, I see the value. It could be ensured just as easily with emit as with call, I think, because my conductor can provide assurances that the right DNA emitted that signal.

Re: parity between inter-DNA messaging and client messaging:

Definitely see what you’re saying there. Looking for the base abstraction, call could certainly be seen as a special case of send with a method name and tuple of arguments as its message.

And within one conductor, that’s pretty much how it works — the conductor creates a special public grant type for intra-DNA zome calls, inter-DNA zome calls, and UI calls, then gives the token to the callers. It is nice to have a special affordance for function calls layered on top of message passing though, wouldn’t ya say?

There’s even a pattern for doing it between agents — Alice shares the cap token with Bob, which he then passes back to her whenever he wants to “call a zome function in her running instance” (which actually just looks like him sending her a message consisting of function name, parameters, and cap token). Alice checks the function he wants to call against the conditions of the grant represented by his token, then calls the function for him if it all checks out. Eventually I think this might have its own convenience function in the HDK.
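A very rough Python sketch of that agent-to-agent pattern follows. The `Agent` class, the message shape, and the `get_profile` function are all assumptions for illustration, not HDK calls.

```python
import secrets


class Agent:
    """Hypothetical agent; names and message shape are assumptions."""

    def __init__(self):
        self.grants = {}  # token -> set of function names the token may call
        self.functions = {"get_profile": lambda name: {"name": name}}

    def grant(self, allowed_functions):
        token = secrets.token_hex(16)
        self.grants[token] = set(allowed_functions)
        return token

    def receive(self, message):
        # A "remote zome call" is just a message: function, params, token.
        allowed = self.grants.get(message["cap_token"], set())
        if message["function"] not in allowed:
            return {"error": "unauthorized"}
        return {"ok": self.functions[message["function"]](*message["params"])}


alice = Agent()
token = alice.grant(["get_profile"])          # Alice shares this with Bob

reply = alice.receive(
    {"function": "get_profile", "params": ["alice"], "cap_token": token}
)
denied = alice.receive(
    {"function": "get_profile", "params": ["alice"], "cap_token": "bogus"}
)
```

Bob's message with the shared token gets a result, while the same message with an unknown token is refused before any function runs.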

pospi · October 17, 2019, 5:53am

I hope not, because I think it’s a really good time-saver for zome mixins. Think of them as stateful additions to custom business logic… it’s way easier if they can be plugged in to define all the record types, and can be driven by that business logic. I think it’s actually necessary to have that feature to be able to use zomes as functional mixins correctly- otherwise you need to expose all zome functionality over the RPC gateway, which means that external clients would be able to manipulate the zome state without restriction. Unsure if I’m explaining that properly but hope it makes sense…

pauldaoust · October 17, 2019, 2:58pm

Oh, hmmm, I see your point — you’re saying that if you want to use a zome to add functionality to another zome’s stuff (e.g., zome B could hang links on zome A’s entries, and you want to make it generic so that zome A could be anything) either they need to be able to privately talk to each other without exposing their guts to the UI (that is, some sort of exposure level other than hc_public), or they need to be able to access each other’s entries directly. Is that about right?

I’ve always seen zomes as basic units of encapsulation that shouldn’t be able to access each other’s data. This is less important when one developer creates all the zomes in a DNA, but it becomes a big deal when devs start plugging third-party zomes into their own DNAs. You’ve got this issue of zomes serving two related but distinct purposes: encapsulation and modularity.

I only half know what I’m saying; it’s hard to talk about it with concrete examples. Sounds like you’ve got some though; what sorts of mixin-style zomes have you built that depend on access to other zomes’ entry types?

Interesting that you say ‘functional’ mixins, since data hiding isn’t really a thing in functional programming; that’s more of a leftover from OOP days.

@freesig I guess this answers our question about whether entry types are namespaced by zome name or hash! Looks like all data lives in a common pool.

pospi · October 28, 2019, 9:28am

Yep, we are on the same page 100%. So here is my use-case, without which doing REA accounting on Holochain for any non-trivial app would be much more involved:

HoloREA has an “observation” DNA that holds Process, EconomicEvent and EconomicResource records (plus a few others), each in separate zomes.
Business logic in a client app wants to constrain the way that particular events are entered, such that only certain resource types are allowed. Or pick whatever use case here, I think it’s safe to say that businesses will want particular business processes to be carried out without allowing any random arbitrary process to be defined by their users.

I think that’s it. You can see the same pattern in operation currently between the EconomicEvent and EconomicResource zomes: resources can’t be manipulated directly, only via events. But I wanted to distinctly separate the two, to essentially make people context-switch between thinking about logging events vs thinking about querying resources.

So you could maybe combine both of those into one zome & call it ProvenanceTrail or something, ok, don’t like it but I could live with it.

But what happens when a business wants to do their custom integration over the top? That would mean they have to put both those zomes, along with their custom zome, all together in order for it to work.

Basically it feels to me like such a change would force you to build monolithic zomes rather than being able to neatly separate them.

I think this probably also has implications for zome traits- can’t do those as easily if you’re forced to combine them with other namespaces all the time.

What do you think? Am I making sense or missing something?

“data hiding isn’t really a thing in functional programming”

Of course it is! What do you think closures are for? In fact, they are better at data hiding than objects are… access specifiers can be bypassed by reflective capabilities a lot of the time… no “backdoor” into a closure’s state from within the language…
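For anyone following along, a closure-based counter shows the idea. This is a sketch in Python for illustration only; note that CPython does expose closure cells through introspection (`__closure__`), so the "no backdoor" point applies to languages without that kind of reflection.

```python
def make_counter():
    count = 0  # reachable only through the functions returned below

    def increment():
        nonlocal count
        count += 1
        return count

    def current():
        return count

    return increment, current


increment, current = make_counter()
increment()
increment()
```

Callers can only change `count` through `increment`; there is no attribute or field to assign to.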

pauldaoust · November 4, 2019, 10:51pm

ahhhhhhh the map gets even clearer. So I was picturing the observation layer as something that simply provided an API to create the three building blocks (resources, events, agents) without any higher knowledge of why it’s being asked to create. Which is sort of true, but sort of not, in the sense that it should only create these things in response to legitimate business rules that come from the outside.

In the smaller scale example (EconomicEvent and EconomicResource), I take this to mean that only the EconomicEvent zome should be allowed to call EconomicResource's functions, because only it knows how to do it properly. And in the larger scale example, only the custom rules for a business know how to properly call any of the observation layer’s zome functions — is that right?

The question that comes up for me right away is, how does the observation DNA determine whether it’s being called legitimately or not? I don’t know. There’s no way to say hc_public(but_only_for_these_approved_UIs_and_DNAs). I’ve got thoughts involving dependency injection floating around in my head, but you couldn’t do that cross-DNA… you could do it cross-zome, though, and of course inside a zome.

okay, fair enough I wasn’t thinking of closures when I wrote that up. I was thinking about that immutable, category-theory style that FP encourages — data structures are just dumb objects with no internal state, and state mutation involves mapping/reducing/filtering on data structures and the monads that hold them, to produce new data structures.

pospi · November 5, 2019, 11:56pm

Broadly yes! I think we’re still understanding each other.

Well, this is what I’m solving for. By not exposing the observation API endpoints in a custom integration, and only allowing its data stores to be manipulated via some proxy zome (the “business rules”, in this case), you prevent the observation API from being able to be called incorrectly (indeed, at all).
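Roughly, in Python terms (class names, the allowed resource types, and `log_event` are made up for this sketch, and Python itself does not enforce the privacy; in the real setup the observation functions would simply not be exposed):

```python
class ObservationZome:
    """Hypothetical stand-in for the observation handlers (no public API)."""

    def __init__(self):
        self.events = []

    def create_event(self, resource_type, quantity):
        self.events.append((resource_type, quantity))


class BusinessRulesZome:
    """The only zome with exposed functions in this custom integration."""

    ALLOWED_RESOURCE_TYPES = {"apples", "pears"}   # illustrative rule

    def __init__(self, observation):
        self._observation = observation

    def log_event(self, resource_type, quantity):
        # Exposed endpoint: enforce the rule, then delegate to the handler.
        if resource_type not in self.ALLOWED_RESOURCE_TYPES:
            raise ValueError(f"resource type not allowed: {resource_type}")
        self._observation.create_event(resource_type, quantity)


observation = ObservationZome()
rules = BusinessRulesZome(observation)
rules.log_event("apples", 3)

try:
    rules.log_event("plutonium", 1)
    rejected = False
except ValueError:
    rejected = True
```

The observation store only ever sees events that passed the business rule, because the proxy is the single way in.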

FWIW in a standard install, the observation API is precisely as you have described- it provides facilities to create building blocks without any higher knowledge of why it’s being asked to create. It’s only when people need custom constraints and logic that it becomes necessary.

Anyway, I think we have veered off topic somewhat. Can we basically agree that there is a need for handlers in one zome to be able to manipulate entries defined in another? I feel as though that is a hard requirement in order for “mixin zomes” to be able to function as intended.

pauldaoust · November 6, 2019, 4:23am

Ha ha, sorry, I just wanted to take the opportunity to understand the need more concretely. So you’re saying that the observation API endpoints are not hc_public, correct? Oh, and I forgot to ask: what’s the definition of ‘manipulating another zome’s entries’ in the context of this discussion?

pospi · November 7, 2019, 4:06am

Umm no, I don’t think that’s anything to do with it. If the API endpoints weren’t public and you were expecting capability tokens between DNAs to restrict functionality for you, you’d be out of luck. The agent could just take the cap token and use it against the RPC gateway of the “restricted” zome to pass in whatever parameters they wish. Real security of this sort is only possible if there is no way to call into the zome externally except via the proxy zome.

pauldaoust · November 26, 2019, 8:47pm

@pospi look what I just discovered when I was browsing through the Guidebook! From Emitting Signals:

Future additions will be:

Signal signature description in the DNA: ADR 13 describes signals as statically defined properties of a DNA which would enable conductor level binding/connecting of signals with slots (i.e. zome functions) similar to bridges but with looser coupling.

Reading ADR 13, we find:

Finally, just as you can call any function using the core_api::call(), you can register a listener with core_api::listen() and you can unregister a listener with core_api::unlisten().

This suggests that your desire — to see signals emitted in one DNA to be received by all connected DNAs in the same conductor — is on the map!
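Purely as a thought experiment, conductor-level signal binding might behave something like the Python sketch below. The `Conductor` class and its method names only echo the listen/unlisten wording of the ADR; they are not the real conductor or core_api, whose eventual shape I can't vouch for.

```python
from collections import defaultdict


class Conductor:
    """Hypothetical signal router; not the real conductor or core_api."""

    def __init__(self):
        self._slots = defaultdict(list)   # signal name -> bound callables

    def listen(self, signal_name, slot):
        self._slots[signal_name].append(slot)

    def unlisten(self, signal_name, slot):
        self._slots[signal_name].remove(slot)

    def emit(self, signal_name, payload):
        # The emitter never learns who, if anyone, received the signal.
        for slot in list(self._slots[signal_name]):
            slot(payload)


received = []


def on_entry_created(payload):
    # Stand-in for a zome function in another DNA acting as the slot.
    received.append(payload)


conductor = Conductor()
conductor.listen("entry_created", on_entry_created)
conductor.emit("entry_created", {"id": 1})
conductor.unlisten("entry_created", on_entry_created)
conductor.emit("entry_created", {"id": 2})
```

Only the first signal reaches the slot; after `unlisten` the emitter keeps emitting into the void without caring.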

pospi · November 28, 2019, 8:06am

Hey I have a friend who wants to contribute some tax accounting standards stuff to HoloREA’s ecosystem… she has dived in to Corda a bit so I was hoping I could share your doc with her in order to get her up to speed, provided I tell her not to circulate? How do you feel about that kind of thing per that document specifically?

pauldaoust · November 28, 2019, 9:52pm

I think that’s fine, esp for the sake of HoloREA

pospi · December 2, 2019, 10:04pm

https://miro.com/app/board/o9J_kx0H2NA=/

pospi · December 2, 2019, 10:30pm

The craziest validation I’ve seen so far- https://github.com/holo-rea/holo-rea/issues/93
