Understanding the Rust-WASM DNA/Zome runtime semantics

PekkaNikander · December 2, 2019, 10:47am

[ My apologies if this is off-topic here. I’m a newcomer to holochain. ]

I am struggling to understand how the Rust code for a Zome is converted into WASM structures and how the WASM then communicates with the rest of the world. I’ve found bits and pieces of this here and there, but so far no comprehensive explanation. Below is my current take on this, with some questions.

1. Rust macros

There seem to be two layers of Rust macros: those in HDK and those in HDKv2.

If I understand those correctly, HDKv2 first adds a struct ZomeCodeDef to a #[zome] module:

pub struct ZomeCodeDef {
    pub init: InitCallback,
    pub validate_agent: ValidateAgentCallback,
    pub zome_fns: ZomeFunctions,
    pub entry_def_fns: Vec<syn::ItemFn>,
    pub traits: ZomeTraits,
    pub receive_callback: Option<ReceiveCallback>,
    pub extra: Vec<syn::Item>,
}

Then, at the HDK level, some of these apparently get converted into pub extern "C" functions, cf. init.

Am I right? Or are HDK and HDKv2 completely separate and independent? If that is the case, how then does the external world talk with the HDKv2 callbacks?

2. Glueing the external world to the WASM runtime

Anyway, at some point the Zome definitions need to provide the WASM runtime with a set of functions it can call. Following the code in the HDK, for init this seems to define a public extern "C" fn receive, which apparently gets called by the underlying runtime when initialising the Zome.

I’m including a simplified version of the unexpanded receive here:

pub extern "C" fn receive(
    allocation_of_input: RibosomeEncodingBits,
) -> RibosomeEncodingBits {
    let init = init_global_memory(allocation_of_input);
    fn execute() -> Result<(), String> {
        $init_expr
    }
    match execute() {
        Ok(_) => Success.into(),
        Err(e) => …
    }
}

Hence, the code simply seems to call the locally defined execute (which in this case is expanded from the $init_expr part of the macro) and return either Ok or Err encoded into RibosomeEncodingBits. Elsewhere there seems to be more encoding and decoding going on.

I presume the same pattern is followed everywhere when the runtime calls the WASM code:

Decode RibosomeEncodingBits into JSON (or something else)
Call a locally defined execute
Encode result into RibosomeEncodingBits

Is that right?
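A minimal sketch of that presumed pattern, in plain Rust with no Holochain dependencies. Everything here is an assumption for illustration: a Vec<u8> stands in for WASM linear memory, the input is a bare JSON number parsed by hand (a real zome would use a JSON library), and the names decode_input, execute, encode_output and call_wrapper are hypothetical, not HDK functions.

```rust
/// Decode: pull the input bytes out of (simulated) linear memory and
/// turn them into a native type. The input is assumed to be a bare JSON
/// number such as `21`.
fn decode_input(memory: &[u8], offset: usize, length: usize) -> Result<i64, String> {
    let bytes = offset
        .checked_add(length)
        .and_then(|end| memory.get(offset..end))
        .ok_or_else(|| "allocation out of bounds".to_string())?;
    let text = std::str::from_utf8(bytes).map_err(|e| e.to_string())?;
    text.trim().parse::<i64>().map_err(|e| e.to_string())
}

/// Call: the locally defined logic. Doubling the input is a placeholder
/// for whatever the callback body actually does.
fn execute(input: i64) -> Result<i64, String> {
    input.checked_mul(2).ok_or_else(|| "overflow".to_string())
}

/// Encode: serialise the result, append it to memory, and report where
/// it landed as (offset, length). Debug formatting of the error string
/// only approximates JSON string escaping.
fn encode_output(memory: &mut Vec<u8>, result: Result<i64, String>) -> (usize, usize) {
    let json = match result {
        Ok(value) => format!("{{\"Ok\":{}}}", value),
        Err(message) => format!("{{\"Err\":{:?}}}", message),
    };
    let offset = memory.len();
    memory.extend_from_slice(json.as_bytes());
    (offset, json.len())
}

/// The wrapper the runtime would call: decode, execute, encode.
fn call_wrapper(memory: &mut Vec<u8>, offset: usize, length: usize) -> (usize, usize) {
    let result = decode_input(memory.as_slice(), offset, length).and_then(execute);
    encode_output(memory, result)
}
```

With the two bytes of "21" at offset 0, call_wrapper appends {"Ok":42} to memory and returns its position; unparseable input produces an {"Err":...} payload instead of a panic.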

Now, what I haven’t found yet is how the WASM code calls the runtime, nor the runtime side implementation of the encoding and decoding. Any pointers to that?

pauldaoust · December 3, 2019, 7:19pm

Hello hello! This is perfectly in line with what this category is for; nice to see it getting some use.

For those who are new to the discussion, WebAssembly only has two ways for talking with the outside world: bytes and memory pointers. When an external (host) function wants to call a function that your WASM code exposes (as with the init callback in @PekkaNikander’s example above), it can either:

pass it a bunch of bytes, or
poke all the input data into the WASM virtual machine’s memory space, then pass a pointer that says where to find the input data.

When the WASM function returns a value, it has to do one of those things. This is also what happens when a WASM function wants to call a host function, but in reverse. In both cases, the host code and the WASM code are responsible for making sense of those bytes and/or pulling them out of shared WASM VM memory space.
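To make the pointer-passing option concrete, here is a rough sketch of how an offset and a length can travel across the boundary as a single integer. The layout (offset in the high 32 bits, length in the low 32 bits) and all names are assumptions for illustration, not the actual RibosomeEncodingBits format, and a Vec<u8> stands in for the WASM VM's memory.

```rust
/// Hypothetical stand-in for the single integer that crosses the
/// host/WASM boundary.
type EncodedAllocation = u64;

/// Pack an offset and a length into one integer.
/// Assumed layout: offset in the high 32 bits, length in the low 32 bits.
fn encode_allocation(offset: u32, length: u32) -> EncodedAllocation {
    (u64::from(offset) << 32) | u64::from(length)
}

/// Split the integer back into (offset, length).
fn decode_allocation(bits: EncodedAllocation) -> (u32, u32) {
    ((bits >> 32) as u32, (bits & 0xFFFF_FFFF) as u32)
}

/// Host side: poke the input into memory and return where it landed.
fn write_input(memory: &mut Vec<u8>, input: &[u8]) -> EncodedAllocation {
    let offset = memory.len() as u32;
    memory.extend_from_slice(input);
    encode_allocation(offset, input.len() as u32)
}

/// Guest side: use the encoded pointer to find the input again.
/// Returns None if the allocation points outside memory.
fn read_input(memory: &[u8], bits: EncodedAllocation) -> Option<&[u8]> {
    let (offset, length) = decode_allocation(bits);
    let start = offset as usize;
    let end = start.checked_add(length as usize)?;
    memory.get(start..end)
}
```

Both sides only ever exchange the one u64; making sense of the bytes it points at is left to whatever encoding they have agreed on.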

This is about to change though — or at least get a bit easier.

I don’t know the ins-and-outs of the HDKs, but I can tell you a few things:

The HDK and the HDKv2 are complementary — the v2 was the one we always wanted to build, but when we started building it, Rust’s macro tools were pretty limited. They’re feature-equivalent, although that’ll become less true over time because the original HDK is deprecated. It’s my understanding that they both get turned into a ZomeCodeDef at the end, but v2 is just nicer to use.
Your understanding of what happens when the runtime calls a WASM callback is exactly right, with a couple additions:
1. Pull bytes from memory; decode bytes to JSON
2. Cast JSON into native types to match local function’s input params
3. Call a locally defined execute
4. Encode result into JSON, and then bytes, and put bytes into memory
Those wrapper functions are then exposed to the host via a pub extern "C" fn (why the "C", I don’t know).
A WASM host can expose its own functions to WASM code via a foreign function interface. The Rust compiler lets you import this FFI using an extern block. That’s how the DNA code calls the Holochain API functions. Again, the HDKs define wrapper functions that shadow the actual low-level API functions, handling the memory pointer and JSON (de-)serialisation / casting stuff.
A strange thing happens during DNA compilation. The zome code actually gets compiled first, then interrogated for some metadata that needs to end up in the DNA’s JSON (function exports, entry types, etc). I don’t know if it gets compiled twice — once to machine code for the first pass, then again to WASM for the actual finished product.
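The extern-block import and the wrapper that shadows it, as described above, might look roughly like this. The function hypothetical_host_echo is an invented name and signature, not a real Holochain API function, and the packing of offset and length into a u64 is likewise an assumption. The import is only compiled for wasm32; a native stand-in with the same signature lets the sketch run outside a WASM VM.

```rust
// Guest-side import: on a wasm32 target the host is expected to supply
// this symbol when it instantiates the module. The name is hypothetical.
#[cfg(target_arch = "wasm32")]
extern "C" {
    fn hypothetical_host_echo(encoded_allocation: u64) -> u64;
}

// Native stand-in so the sketch can be built and run without a host.
// It simply hands back what it was given.
#[cfg(not(target_arch = "wasm32"))]
unsafe fn hypothetical_host_echo(encoded_allocation: u64) -> u64 {
    encoded_allocation
}

/// Safe wrapper of the kind an HDK could expose: it hides the integer
/// packing and the unsafe call behind an ordinary Rust function.
pub fn echo(offset: u32, length: u32) -> (u32, u32) {
    // Assumed layout: offset in the high 32 bits, length in the low 32 bits.
    let packed = (u64::from(offset) << 32) | u64::from(length);
    // Calling an imported function is unsafe because the compiler cannot
    // check that the host's implementation matches the declared signature.
    let returned = unsafe { hypothetical_host_echo(packed) };
    ((returned >> 32) as u32, returned as u32)
}
```

A real wrapper would also serialise its arguments into memory before the call and deserialise the result afterwards; that part is left out here to keep the import mechanism itself visible.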