Elemental Chat's weird bugs, eventual consistency, and you

pauldaoust · December 23, 2020, 4:57am

So, as I wrote in the (currently not-yet-published) Dev Pulse this week, Elemental Chat has some weird bugs that only an engineer can love. I am an engineer and I love these bugs, because (a) they tell me that Holochain is working exactly as it should, and (b) they expose some interesting properties of Holochain’s eventually consistent DHT that are worth unpacking.

Here’s how the bug manifests:

1. Alice sends a message.
2. Alice’s message appears in Bob’s UI.
3. Alice’s message disappears from Bob’s UI.
4. Alice’s message reappears in Bob’s UI.

What’s happening here? Let’s take a look using a good ol’ sequence diagram (inputs and outputs have been simplified):

sequenceDiagram
    participant Alice UI
    participant Alice conductor
    participant DHT
    participant Bob conductor
    participant Bob UI
    Alice UI->>Alice conductor: create_message("Hello Bob!", "Welcome Channel")
    activate Alice conductor
    Alice conductor->>Alice conductor: create_entry("Hello Bob!") (write to source chain)
    Alice conductor->>Alice conductor: create_link("Welcome Channel", "Hello Bob!") (write to source chain)
    Alice conductor-->>Alice UI: Ok(message_data)
    deactivate Alice conductor
    Alice conductor-xDHT: publish message and link
    Alice UI->>Alice conductor: signal_chatters(message_data, "Welcome Channel")
    activate Alice conductor
    Alice conductor->>DHT: get recently active chatters
    activate DHT
    DHT-->>Alice conductor: Bob
    deactivate DHT
    Alice conductor-xBob conductor: remote_signal("Hello Bob!", "Welcome Channel")
    activate Bob conductor
    Bob conductor-xBob UI: emit_signal("Hello Bob!", "Welcome Channel")
    deactivate Bob conductor
    Bob UI->>Bob UI: (Add "Hello Bob!" to message timeline)
    Alice conductor-->>Alice UI: Ok(1 sent)
    deactivate Alice conductor
    Bob UI->>Bob UI: (polling interval)
    Bob UI->>Bob conductor: list_messages("Welcome Channel")
    activate Bob conductor
    Bob conductor->>Bob conductor: (look in DHT shard, see nothing)
    Bob conductor-->>Bob UI: Ok([])
    deactivate Bob conductor
    Bob UI->>Bob UI: (re-render empty message timeline)
    DHT-xBob conductor: gossip message and link
    Bob UI->>Bob UI: (polling interval)
    Bob UI->>Bob conductor: list_messages("Welcome Channel")
    activate Bob conductor
    Bob conductor->>Bob conductor: (look in DHT shard, see new message)
    Bob conductor-->>Bob UI: Ok(["Hello Bob!"])
    deactivate Bob conductor
    Bob UI->>Bob UI: (re-render message timeline with "Hello Bob!")
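To make the disappearing act concrete, here's a minimal TypeScript sketch of Bob's UI state as the diagram describes it. It is not Elemental Chat's actual UI code; `Message`, `NaiveTimeline`, and the hash strings are hypothetical. The key assumption is that a signal appends to the timeline, while each poll replaces the timeline with whatever Bob's DHT shard currently holds.

```typescript
// Toy model of Bob's UI as drawn in the diagram (hypothetical, not the real app):
// a signal appends a message; each poll replaces the timeline with
// whatever Bob's conductor can currently see in its DHT shard.
type Message = { hash: string; text: string };

class NaiveTimeline {
  messages: Message[] = [];

  // Remote signal from Alice's conductor arrives.
  onSignal(msg: Message): void {
    this.messages.push(msg);
  }

  // Polling interval fires with the result of list_messages.
  onPoll(fromDht: Message[]): void {
    this.messages = fromDht.slice(); // throws away anything learned via signal
  }
}

const bob = new NaiveTimeline();
const hello: Message = { hash: "hash-1", text: "Hello Bob!" };
const timelineSizes: number[] = [];

bob.onSignal(hello); // signal beats gossip
timelineSizes.push(bob.messages.length);
bob.onPoll([]); // Bob's shard hasn't received the gossip yet
timelineSizes.push(bob.messages.length);
bob.onPoll([hello]); // gossip has arrived
timelineSizes.push(bob.messages.length);

console.log(timelineSizes.join(" -> ")); // prints "1 -> 0 -> 1"
```

The three sizes line up with the bug report: appear, disappear, reappear.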

Questions

Can you see what’s going on here? Where is the message getting lost?
If you have UI dev experience, how do you think this design was arrived at?
What do you think would be a good design to make sure the messages Bob’s already received don’t disappear?
An eventually consistent DHT combined with live signals means there are certain times when, from Bob’s perspective, the message doesn’t actually exist but he knows about it anyway. What sorts of UX patterns might communicate the almost-but-not-yet nature of some of this data?
Would Alice appreciate knowing that Bob has seen her message? What would that sort of design look like, in terms of both DNA and UX?

abrahampalmer · December 23, 2020, 1:32pm

I like the idea of the UI keeping a cache of everything it has created and sent to its conductor, retrieved, or received via a signal. Since we have all those nice hashes, it seems like we could benefit from keeping the UI in an assumed-success mode and not bothering the user with the details. Updates can arrive that don’t change any of the currently displayed React components; no news is good news. If we need to add some extra state indication for critical processes, that still seems preferable to an item just vanishing and then reappearing.
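A minimal sketch of that cache, assuming each message has a stable hash that is the same whether it arrives via a signal, via `list_messages`, or from the UI's own create call. `MessageCache` and its methods are hypothetical names, not part of any Holochain client library. The important design choice is that poll results are merged into the cache rather than replacing it.

```typescript
// Hypothetical UI-side cache keyed by entry hash. Assumes a message's hash is
// the same whether it came from our own create call, a signal, or a poll.
type Message = { hash: string; text: string; timestamp: number };

class MessageCache {
  private byHash = new Map<string, Message>();

  // Anything created, received via signal, or fetched is added once.
  add(msg: Message): void {
    if (!this.byHash.has(msg.hash)) {
      this.byHash.set(msg.hash, msg);
    }
  }

  // Poll results are merged in, never used to replace the cache, so an
  // empty or partial view of the DHT can't make a known message vanish.
  mergePoll(fromDht: Message[]): void {
    for (const msg of fromDht) {
      this.add(msg);
    }
  }

  // Ordered view for rendering.
  timeline(): Message[] {
    return Array.from(this.byHash.values()).sort((a, b) => a.timestamp - b.timestamp);
  }
}

const cache = new MessageCache();
const hello: Message = { hash: "hash-1", text: "Hello Bob!", timestamp: 1 };
cache.add(hello); // arrived via signal
cache.mergePoll([]); // stale DHT view: nothing is removed
cache.mergePoll([hello]); // gossip arrived: deduplicated by hash
console.log(cache.timeline().length); // prints 1
```

The trade-off is that a message which never makes it to the DHT would stay on screen indefinitely, which is where some extra state indication comes in.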

pauldaoust · December 23, 2020, 4:12pm

I love this exploration @abrahampalmer, and a similar thought occurred to me – they’re just hashes; why can’t we aggregate them? If something does fail, we can throw it away later. As a guy who’s passionate about UX, I like the idea of “extra state indication” rather than making things just disappear.
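One way to get that “extra state indication” is to track a status per hash instead of adding and removing entries. This is a hypothetical sketch: the status names, `StatusTimeline`, and the `maxWaitMs` timeout are illustrative assumptions, not anything Elemental Chat does today.

```typescript
// Hypothetical per-message status instead of add/remove.
type Status = "unconfirmed" | "confirmed" | "failed";
type TimelineEntry = { hash: string; text: string; firstSeen: number; status: Status };

class StatusTimeline {
  private entries = new Map<string, TimelineEntry>();

  // Heard about it via signal: show it, flagged as not yet seen on the DHT.
  onSignal(hash: string, text: string, now: number): void {
    if (!this.entries.has(hash)) {
      this.entries.set(hash, { hash, text, firstSeen: now, status: "unconfirmed" });
    }
  }

  // Found in Bob's DHT shard: promote (or add) as confirmed.
  onPoll(fromDht: { hash: string; text: string }[], now: number): void {
    for (const m of fromDht) {
      const existing = this.entries.get(m.hash);
      if (existing) {
        existing.status = "confirmed";
      } else {
        this.entries.set(m.hash, { hash: m.hash, text: m.text, firstSeen: now, status: "confirmed" });
      }
    }
  }

  // Assumed policy: a signalled message that hasn't reached the DHT within
  // maxWaitMs gets flagged as failed so the UI can say so (or drop it).
  expire(now: number, maxWaitMs: number): void {
    this.entries.forEach((e) => {
      if (e.status === "unconfirmed" && now - e.firstSeen > maxWaitMs) {
        e.status = "failed";
      }
    });
  }

  statusOf(hash: string): Status | undefined {
    const e = this.entries.get(hash);
    return e ? e.status : undefined;
  }
}

const t = new StatusTimeline();
t.onSignal("hash-1", "Hello Bob!", 0);
t.onPoll([], 1000); // stale view: message stays on screen
console.log(t.statusOf("hash-1")); // prints "unconfirmed"
t.onPoll([{ hash: "hash-1", text: "Hello Bob!" }], 2000);
console.log(t.statusOf("hash-1")); // prints "confirmed"
```

`onPoll` promotes a message to confirmed even if it had already been flagged as failed, since under eventual consistency “not yet” and “never” can look the same for a while.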

scott_pdx · December 24, 2020, 8:07pm

I’m not a coder, but like to think about things like this in the abstract… I wonder if this analogy would be useful:

We’re in a courtroom with a stenographer who types really slowly. The witness says, “There was a bullet hole in the east wall of the victim’s apartment.” A bit later, the judge asks the stenographer to read back what the witness said, and the stenographer says, “I’m sorry, Your Honor, I haven’t finished typing it yet, so I can’t fulfill your request to read it back.” The judge is annoyed and says, “Fine, then just tell me what you’re going to type, from memory.” “Oh, yes, I can do that…”

The problem is that there are two types of data – spoken and written – which correspond to short-term and long-term memory (cached and hashed?), and neither type is adequate for determining what was actually said. Both must be taken into account when presenting a best approximation of “truth”.

There is a gray area in the timeline between when the data is “spoken” and when it is written. In the courtroom, the written record is only referenced when the memory of what was spoken has faded. In a chat app, I imagine that the short-term memory (cached data) would be reliable at least until the app was closed and/or the network connection was lost.

mikeg · December 30, 2020, 10:59pm

Hi @pauldaoust, big fan of message sequence diagrams, although I suspect you need a number of them in a distributed system, with variations at the time-based steps (the polling loops in your diagram)… Sorry, I haven’t been keeping up with Elemental Chat, but I was wondering about the Alice-Bob direct message in the middle. I guess it’s more efficient than Bob checking the DHT, but is it setting up a race condition (it sounds like it is, based on the behaviour you are seeing)? Feels like one step too far. Plus, what is the “Bob” DHT->Alice conductor message in the middle… I assume get_recently_active_chatters is telling you exactly that, but what does “recent” mean?

pauldaoust · January 6, 2021, 10:01pm

@scott_pdx your analogy is beautiful and I want to steal it for a future blog post. Your point about neither type being adequate for determining what was said didn’t make sense to me at first, but I think I understand it now: spoken is fresher and faster, while written is authoritative, but neither of them captures the ‘fullness of truth’ – at least until everyone has stopped talking and everything has been recorded. Is that what you’re saying?

@mikeg yes, you’re absolutely correct about polling loops and other sorts of concurrency throwing off the sequential-ness of my sequence diagram. This one captures what one particular weird sequence might look like. It does set up a race condition, I think, and that’s why Bob sees messages disappear. Its main goal is to make the UI ‘feel’ responsive, so this direct-message-then-check-DHT approach is definitely not gonna be appropriate for every application. You don’t want to confirm a winning bid on an auction, for instance, until you see it accepted by a quorum of DHT peers.
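In contrast to chat's optimistic approach, an auction UI would wait for enough independent confirmations before declaring a winner. This sketch assumes the app can somehow learn which peers have accepted a given bid; how it would collect those reports isn't covered in this thread, so the `Acceptance` shape, `isConfirmedByQuorum`, and the quorum sizes are all hypothetical.

```typescript
// Hypothetical report that a given peer has accepted (validated and stored) a bid.
type Acceptance = { bidHash: string; peer: string };

// Treat a bid as final only once `quorum` distinct peers have accepted it.
// Repeated reports from the same peer count once.
function isConfirmedByQuorum(bidHash: string, reports: Acceptance[], quorum: number): boolean {
  const peers = new Set<string>();
  for (const r of reports) {
    if (r.bidHash === bidHash) {
      peers.add(r.peer);
    }
  }
  return peers.size >= quorum;
}

const reports: Acceptance[] = [
  { bidHash: "bid-1", peer: "peer-a" },
  { bidHash: "bid-1", peer: "peer-a" }, // duplicate report from the same peer
  { bidHash: "bid-1", peer: "peer-b" },
  { bidHash: "bid-2", peer: "peer-c" },
];

console.log(isConfirmedByQuorum("bid-1", reports, 3)); // prints false (only 2 distinct peers)
console.log(isConfirmedByQuorum("bid-1", reports, 2)); // prints true
```

Until the quorum is reached, the UI would show the bid as pending rather than as winning.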

As for get_recently_active_chatters, I haven’t delved into the source code yet. I believe it breaks up recency into blocks of time and creates an anchor for each block of time? And then any node who’s currently online publishes ‘heartbeat’ links from the current block to their own agent ID? Sounds like a lot of DHT data being generated, but this is after all just a demo app and I’m sure there are better approaches, like having peers directly keep track of the people they’re interested in and heartbeating each other via signal_remote, with last-seen times stored in the UI or something. Longing for the day when we might see temporary (that is, non-source-chain-linked, non-authoritative, deletable) data on the DHT!
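If `get_recently_active_chatters` works as guessed above (time blocks, an anchor per block, heartbeat links to agent IDs), the bucketing logic might look roughly like the following. It's an unverified sketch, not the app's source: the one-hour `BLOCK_MS`, the `active:<n>` anchor naming, and the in-memory `links` map standing in for DHT links are all assumptions. It treats “recent” as the current block plus the previous one, so someone who heartbeated just before a block boundary isn't forgotten immediately.

```typescript
// Assumed block size; the real app may use something else (or not work this way at all).
const BLOCK_MS = 60 * 60 * 1000; // one hour

// Which time block a timestamp (ms since epoch) falls into.
function blockIndex(timestampMs: number): number {
  return Math.floor(timestampMs / BLOCK_MS);
}

// Hypothetical anchor naming scheme.
function anchorFor(timestampMs: number): string {
  return `active:${blockIndex(timestampMs)}`;
}

// "Recent" = current block plus the previous one.
function recentAnchors(nowMs: number): string[] {
  const current = blockIndex(nowMs);
  return [`active:${current}`, `active:${current - 1}`];
}

// In-memory stand-in for DHT links: anchor -> agents linked from it.
const links = new Map<string, Set<string>>();

// An online agent publishes a heartbeat link from the current block's anchor.
function heartbeat(agent: string, nowMs: number): void {
  const anchor = anchorFor(nowMs);
  let agents = links.get(anchor);
  if (!agents) {
    agents = new Set<string>();
    links.set(anchor, agents);
  }
  agents.add(agent);
}

// Agents with a heartbeat in the current or previous block.
function recentlyActive(nowMs: number): string[] {
  const found = new Set<string>();
  for (const anchor of recentAnchors(nowMs)) {
    const agents = links.get(anchor);
    if (agents) {
      agents.forEach((a) => found.add(a));
    }
  }
  return Array.from(found).sort();
}

heartbeat("alice", 0);
heartbeat("bob", BLOCK_MS + 10);
console.log(JSON.stringify(recentlyActive(BLOCK_MS + 20))); // prints ["alice","bob"]
console.log(JSON.stringify(recentlyActive(3 * BLOCK_MS))); // prints []
```

Every online agent adds one link per block, which is where the “lot of DHT data” comes from; the signal-based heartbeat alternative would keep that churn off the DHT entirely.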

scott_pdx · January 6, 2021, 11:44pm

@pauldaoust - You can’t steal my analogy because I gifted it to the public domain. Your summary goes a bit beyond what I was saying but I agree with you. My main point was that in a chat app, what is being said “now” vs what has slipped into “the past” are separate phenomena that get treated in the UI as if they were only one. (This is why there is always that awkward moment when you hit send and the time metadata display says something like “just now” because it feels “too soon” to describe what was just said the same as what has become “history”.) If we aim to mimic “real life” in our computing models, we need to consider this present/past distinction more carefully.

This could be an excuse to go down the rabbit hole of “What is consciousness?” relative to time – present, past, future. I have an inkling that Rupert Sheldrake’s concept of memory could inspire new data storage and retrieval paradigms. As I understand it, Sheldrake believes that our memories are NOT stored in our brains like data in a hard drive or RAM. Rather, the act of remembering involves what amounts to “space-time teleportation” (my words, not Sheldrake’s), i.e., when you remember something, your brain/mind somehow brings your awareness back to the point in space-time where/when it happened, via what he calls morphic resonance. So you are literally re-viewing the event. I can see how this is similar in some ways to the concept of decentralization in computer science.

Sheldrake was inspired by the 19th-century French philosopher Henri Bergson, who explored some of these ideas in Matter and Memory. There is a section of the Wikipedia summary of that book that is quite interesting to consider alongside the issue we’ve been discussing:

To have or take consciousness of anything, means looking at it from the viewpoint of the past, in light of the past. Contenting oneself with reacting to an external stimulus means being unconscious of the act…