Threading & Thread safety

I’m working on a model to track unspent transactions in a native credit currency. I would prefer being able to save the balance state, but should I take a threading model into account?

If there are simultaneous RPC calls that result in (new) transactions, would I have to reconstruct the total balance state on every request from the local chain? (that’s @guillemcordoba 's approach in the Mutual Credit repo I saw)

use std::collections::HashMap;

// keep track of all unspent transactions.
// TODO: should this be a global mutable variable,
// can I safely use get_entry() on every request,
// or should I reconstruct this structure from the local chain?
#[derive(Serialize, Deserialize, Debug, Eq, PartialEq, Clone, TreeEntry)]
pub struct UTXO {
    outputs: HashMap<PubKey, ReferencesMap<Transaction>>,
    balances: Reference1<Balances>,
}

Do you perhaps know about this @pauldaoust?

I feel like there are a couple of threads in here, and I need to check whether I’m on the right track.

When you say ‘being able to save the balance state’, do you mean keep it in some sort of persistent memory so that the agent doesn’t have to keep recalculating it every time it needs it?

In this scenario, it sounds like your big concern is correctness of balances vs multiple function calls changing state in parallel. Is that true?

Because I might miss your response, I’m going to give a pre-answer assuming that I’m right :slight_smile: It’s best to look at a source chain as a journal of local state changes from which you can reconstruct a full state at any point (like a write-ahead log in a relational database). This should mean that you can cache the reconstructed state in-memory (in your case the balance) and just keep applying the changes to the state as they come in and are committed.
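In case a concrete sketch helps, here’s that idea in plain Rust (the ChainEvent and Balance types are made up for illustration, not HDK types): rebuilding the full state is a fold over the journal, and keeping a cache fresh is just applying the same function to each new event as it’s committed.

// A minimal sketch of the 'journal replay' idea: the source chain is the log,
// and the balance is a pure fold over it.

#[derive(Debug, Clone)]
enum ChainEvent {
    Credit(u64),
    Debit(u64),
}

#[derive(Debug, Default, Clone)]
struct Balance(i64);

// Apply one journal entry to the cached state...
fn apply(mut balance: Balance, event: &ChainEvent) -> Balance {
    match event {
        ChainEvent::Credit(amount) => balance.0 += *amount as i64,
        ChainEvent::Debit(amount) => balance.0 -= *amount as i64,
    }
    balance
}

// ...and rebuilding from scratch is just folding over the whole journal.
fn replay(events: &[ChainEvent]) -> Balance {
    events.iter().fold(Balance::default(), apply)
}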

As you’ve probably already discovered, there’s no persistent memory between zome calls; the only way you can persist anything is as a source chain entry, or in your front-end (e.g., GUI) somewhere. Either of those might be viable options…

Could you tell me more about the UTXO model you’re using? I’m only familiar with it in either a blockchain context, where there’s only one ledger to worry about, or an R3 Corda context, where each chain of UTXOs has its own history and is passed around peer-to-peer with each transaction (and validator pools are used to prevent double-spending).

As for having three simultaneous RPC calls, I presume you mean that they’re all called at a certain point and could see the same source chain state, which means that whichever call writes first will ‘win’ and the other two calls will see an incorrect state. Is that what you’re concerned about? If so, I don’t know how Holochain currently handles it, but with some refactors we’re working on, each zome call will run in its own transaction space. If call A starts, then call B starts, then call A modifies the source chain and finishes, then call B tries to do the same thing, call B will return an error, similar to trying to write to an SQL table that has a write lock on it. This prevents inconsistent state from concurrently calling functions that write to the source chain.

Hope this helps; ask as many questions and make as many corrections as you like!


Hi Paul,

Thank you for your extensive reply! You’ve addressed my concerns. I’m indeed thinking about writing the balance state to prevent walking the chain on every request/transaction. For that to work, I guess I need parallelism guarantees. What I get from your answer is that there is work underway to implement some form of atomic write locking.

Is there already some form of API for it that I can take into account? Or documentation?

Is it possible to get the length of the chain without iterating over every entry? With that number a preliminary write lock could be implemented until the refactoring you speak about is available.

Edit: I realized I can commit the chain entry index along with the Balance state and use hdk::api::query with a pagination offset from there.
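Something like this is what I have in mind (a rough sketch against the old Rust HDK; the CachedBalance type and the “cached_balance” entry name are made up, and I’d still have to double-check the exact query/commit_entry signatures):

// Rough sketch against the old Rust HDK (holochain-rust); exact paths and
// signatures may differ. Assumes the usual crate-root imports and derive
// macros (hdk, serde_derive, holochain_json_derive) are in place, and that
// Entry and ZomeApiResult are imported from the hdk crate.

#[derive(Serialize, Deserialize, Debug, Clone, DefaultJson)]
pub struct CachedBalance {
    pub balance: i64,
    pub txs_applied: u32, // number of transaction entries already folded in
}

pub fn refresh_cached_balance(cached: CachedBalance) -> ZomeApiResult<CachedBalance> {
    // Only fetch transaction entries committed after the ones already applied,
    // using query's (entry_type, start, limit) pagination; limit 0 = no limit.
    let new_txs = hdk::api::query("transaction".into(), cached.txs_applied, 0)?;

    let mut balance = cached.balance;
    for _address in &new_txs {
        // get_entry(address), deserialize the Transaction, and apply it to `balance` here.
    }

    let updated = CachedBalance {
        balance,
        txs_applied: cached.txs_applied + new_txs.len() as u32,
    };
    // Persist the refreshed snapshot so the next call can start from here.
    hdk::api::commit_entry(&Entry::App("cached_balance".into(), updated.clone().into()))?;
    Ok(updated)
}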


Yes, you’d need atomic write locking on your source chain. But it wasn’t available until we announced the new version of Holochain today. (Note: no official releases, just something you can (easily) build yourself if you clone the repo.)

Here’s how it works with the new Holochain RSM. On every RPC call into your DNA, the runtime loads up a snapshot of your local state (source chain). Inside the RPC’d function body, any time you commit something to your chain, it stages it to a ‘workspace’ and advances the snapshotted source chain’s tip. Once the function completes, it runs all the subconscious stuff (mostly validation) and tries to write the staged changes. But if the source chain tip has advanced compared to the snapshot, that means that another function (running in parallel) has already modified local state. At this point the losing function fails and discards all of its writes, returning an error to the RPC caller.

So it’s not exactly a lock, but the end effect is the same. I guess the big difference is that it’s a ‘late rollback’ – the function has to finish executing before it finds out that it lost the race against another function.
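If it helps to see the shape of it, here’s a conceptual sketch of that check in plain Rust. This isn’t the actual RSM code, and the types are invented; it just illustrates the ‘compare the tip, then either flush or fail’ pattern:

// Conceptual sketch of the optimistic 'workspace' commit described above.
// Not actual Holochain code; just the pattern it implements.

#[derive(Clone, PartialEq, Eq)]
struct HeaderHash(String);

struct SourceChain {
    tip: HeaderHash,
    entries: Vec<String>,
}

struct Workspace {
    snapshot_tip: HeaderHash, // chain tip as seen when the zome call started
    staged: Vec<String>,      // entries committed during the call, not yet flushed
}

#[derive(Debug)]
enum CommitError {
    // Another call advanced the chain tip while this one was running.
    HeadMoved,
}

fn flush(chain: &mut SourceChain, ws: Workspace) -> Result<(), CommitError> {
    // The 'as-at' check: only write if nobody else has advanced the tip.
    if chain.tip != ws.snapshot_tip {
        return Err(CommitError::HeadMoved); // losing call discards all its writes
    }
    for entry in ws.staged {
        chain.tip = HeaderHash(format!("hash-of:{}", entry));
        chain.entries.push(entry);
    }
    Ok(())
}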

No API or documentation, really, just built-in functionality.

Unfortunately not in the old one; glad to hear you’ve found a solution… however, I encourage you to start taking a look at refactoring on RSM. Some hApp devs have been working with it for a couple months already, and they say it’s cut down their LOC count by 60%!

Thanks for your swift response after the blog post of the RSM announcement. Does this mean that the call will error to the client caller and that the client will have to retry? Or can the “locking error” be caught in the Rust runtime so the operation can transparently be retried after refreshing the snapshot?

That’s right; the call will pass the error back to the client caller. I wonder if it’d be possible for the client to specify call semantics like “Retry on write failure”… possibly an improvement for the future? I feel like any client-side frameworks with one-way data bindings will want to know that the call failed, so they can go around the loop to get the new internal state that was updated by some other function and try again based on that. I don’t know enough about all those modern newfangled client-side frameworks that people are using these days :smiley:

This sounds quite problematic to me; has it been tested what kind of write throughput the system can handle before running into write errors? From the perspective of an agent serving multiple web browser users…

  • can I (still?) rely on Holochain being able to handle hundreds of users performing write actions every second, or should we now rely on queueing middleware to keep retrying until writes succeed?
  • Or can Holochain now really be compiled to client-side WASM so every web browser user can actually run an agent?
  • What kind of write volume should we start accounting for?

Hi @ldwm, love this discussion. In which scenario do you see an agent in Holochain serving multiple web browser users? In Holo, every user has a different Cell (a combination of AgentPubKey + DnaHash, previously known as an instance), so two users talking to the same conductor won’t conflict when committing in parallel, since they each have their own source chain.

Tell me if I’m off and I misunderstood your question.

I’m considering the case where I would run a Holochain node that accepts connections from visitors with web browsers that do not run Holochain. The protocol exposed through Holochain is “permissionless”, and so every “user” can send transactions to the node. This seems to me to be the lowest barrier of use possible. I would prefer casual “users” to not have to run a node, unless it could be done transparently in a browser.

Hmm, okay, I see. I think they are not designing for this case; they are designing for the case in which every user has a separate identity (and therefore instance), as in Holo. So yes, I think that would be problematic.


Thanks for the quick reply! I know that is the design with regard to Holo, but I was under the impression that at some point web browsers would also be targeted, so ordinary visitors could also have an identity and chain, which is somewhat how Nano (the currency) works. Hope it’s still on the agenda or I’ll have to move to a more classical blockchain structure… :sweat_smile:

Mmm, if you are talking about non-signed-in users, that’s a good question I don’t know the answer to… Signed-in users will run fine, though.

@ldwm hm, I’m chewing on the conversation so far… First of all, I might’ve misspoken re: races ending in the loser reporting an error to the caller. This is from Art’s Unpacking The New Holochain article published today:

We also guarantee that if multiple processes are attempting to write to the source chain, only the one to finish first will succeed, requiring the others to retry their validation on top of the newly updated local state.

(emphasis mine) I read this as saying that it will auto-retry the function call using the same params and a new state snapshot, but it’s not 100% clear to me.

While I agree with you that one instance serving multiple agents would be the “lowest barrier of use possible” (at least if you wanted to run your own infrastructure rather than using Holo Host), I also agree with Guillem that it’s going against the grain of Holochain’s design so there’s probably no plan to optimise for this use case. In the scenario you’re describing, the web host’s instance would still show up as one ‘user’ among many in Holochain’s one-user-one-instance ontology, and that user would speak to the rest of the app’s DHT on behalf of all its website visitors. Kinda like this:

It’s unconventional, but it’s been done before. As with the project described in the link (humm.earth), you’d need to layer another app-level concept of user-hood over top of Holochain’s primitive of representing a person with one or more agent keys.
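If it helps, that layering could look something like this as data (purely hypothetical, not how humm.earth actually models it; just to show the shape of app-level users sitting on top of one hosting agent):

// Hypothetical sketch of layering app-level 'users' over a single hosting agent.
// Not from humm.earth or the HDK; just one way to model it as entries.

pub struct WebUser {
    pub username: String,
    // Public half of a key pair generated in the visitor's browser; the host's
    // cell records it and verifies signatures made with the private half.
    pub signing_pub_key: String,
}

pub struct SignedAction {
    pub author_username: String, // points at a WebUser entry
    pub payload: Vec<u8>,        // the app-level action being taken
    pub signature: Vec<u8>,      // produced in the browser with the user's private key
}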

With a Holo-hosted hApp, the HoloPort magically runs an instance for each web user, preserving the Holochain-ish ontology:

When the conductor eventually is compiled to WASM and runs in the browser, those little pink cells will move into the browsers, but the ontology of user agency isn’t changed – just the places where the cells are executing. At that point the HoloPorts will still be necessary, but only as machines that store the data that the browser-hosted cells produce. Hope that makes sense?

* Note: for non-signed-in users, hosts will run a read-only cell instance that can’t write data to its source chain. This is to allow web crawlers to crawl the app, or visitors to see what it’s like before signing up, etc. As soon as agency (the power to take action) is involved, that’s when the user has to sign up (which actually looks like creating a key pair in their browser, though they don’t know that).


Thanks again for your extensive and swift reply! I was hoping that automatic retry was being considered, so there is hope! And you are definitely right that “deferred agency” is going against the grain and requires some duplication in signing and key infrastructure (like a JavaScript HD wallet), but luckily it’s not too complex for my use case and well worth the effort of providing lowest-barrier access while running my own infrastructure.

I’ve posted a request for clarification on the atomic write issue in the Typeform sheet for the upcoming AMA and might still ask somewhere else. Please let me know when you know more! :slight_smile:

I just got an official answer from @maackle :

Currently it errors out to the caller. In the future we might add the ability to request a retry upon failure, but for more stringent transactions like in holofuel, we never want to retry if the chain has grown, so we just default to that most strict case for now

So I have to revise my response to “possibly but not yet”. Whoops, spoke too soon!


I should also revise my response. What I described is our immediate-term plan. In current reality, we haven’t yet gracefully handled this exception, so the conductor crashes if the source chain changed during a zome call. It is a simple matter to turn this into an error that gets sent out to the caller, but we haven’t prioritized that yet. So for now, please avoid this case :slight_smile:

Or help us fix it: Source chain "as-at" collision check should return nice error, not panic · Issue #361 · holochain/holochain · GitHub


(PS it’s super easy to grok the Holochain RSM codebase now!)

Thank you all for your responses! It’s clear now. So if I wanted to continue with my use case, I would either:

  • have to patch the conductor to retry the request
  • write client middleware to retry the request
  • test the throughput and hope the node is so fast I would not run into parallel writes
  • wait until the HC engine can compile as a browser client
  • depend on a HoloPort to run agents on behalf of web browser clients

Is there a software-only variant available that functions like the HoloPort? A variant of the conductor that maintains an agent/chain on behalf of every user?

Edit: like the HoloPort NixOS linux image for example?

Edit2: I see here that the infrastructure for the use case is actually available. The only thing needed seems to be the Holo OS that’s specifically configured to only run my app and doesn’t expect payment.


Hey there. Off work today but I’ll try to answer tomorrow! I think yes, you’d have to implement some sort of retry logic. I think the new Holochain will be mostly fast enough (I was reading an internal discussion on benchmarking yesterday, and the new Holochain was doing in 14 seconds what the old Holochain was doing in 1 hour – though I don’t know if it was pushing concurrency; it might’ve been sequential writes.) I think you will need to think about concurrency and guard against race conditions though, regardless of how fast Holochain is.

To my mind the best place to do this is in your middleware (which I assume is running on the server). And the UI should probably also have an awareness of its internal state possibly getting out of date and needing a refresh from the server, and still possibly being out of date (exactly like when you git pull && git push – even then you’re not 100% guaranteed that someone else hasn’t advanced the branch tip in the split-second it took to sync). Either that, or your model shouldn’t care about it. (A good example of this is a chat app where the sequence of Alice and Bob’s messages on a single source chain don’t matter too much.)
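If you do put the retry in middleware, the general shape might be something like this (plain Rust, with the actual conductor call abstracted behind a closure, since the exact client API isn’t the point here):

// Generic retry-on-write-collision loop for middleware. The real zome call is
// hidden behind `call`; refreshing local state before each retry is the
// important part, since the chain has moved underneath the failed attempt.

#[derive(Debug)]
enum CallError {
    // The source chain advanced during the call (the 'as-at' / head-moved failure).
    ChainHeadMoved,
    Other(String),
}

fn call_with_retry<T>(
    max_attempts: u32,
    mut refresh_state: impl FnMut(),
    mut call: impl FnMut() -> Result<T, CallError>,
) -> Result<T, CallError> {
    let mut attempt = 0;
    loop {
        match call() {
            Ok(value) => return Ok(value),
            Err(CallError::ChainHeadMoved) if attempt + 1 < max_attempts => {
                attempt += 1;
                // Pull updated state (e.g. the new cached balance) so the retried
                // call is computed against the chain as it now stands.
                refresh_state();
            }
            Err(e) => return Err(e),
        }
    }
}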

Anyhow, I hope this helps – this is partly just me thinking out loud and discovering things as I go along :slight_smile:

This is an interesting inquiry – I would like to see some software that allows a standalone server to serve multiple users with their DNAs hosted in-browser. The gossip would be proxied to the DHT through the server (although WebRTC is an interesting possibility), and the source chains would be persisted to the server’s storage (although localStorage is an interesting, albeit very un-robust, possibility). At this point you’ve replicated what Holo Host will look like in beta, but you’d be able to run it yourself.