I started a branch that replaces wasmi with wasmer
i’ll keep trucking through the technicals until i get all the tests passing but i thought i’d open the meta process up for visibility and discussion
pros
starting with the motivation for why we do want to move towards wasmer…
better performance
wasmer promises ~2 orders of magnitude faster execution of wasm logic
The wasmi runtime is interpreted and we measured it to perform between 150–190x native speeds. It was therefore omitted from the charts as the large difference made the graphics difficult to view the smaller runtime values.
how much of this translates to what we are doing (a lot of logic sits in the rust core, outside wasm anyway) remains to be seen, but i’m looking forward to running the profiler once it is executing wasm correctly
simpler codebase
wasmer offers the use of native rust closures and better macros for “importing” rust functions and access to core more generally into wasm functions that happ developers can use
importing functions is what makes it possible for a happ developer to call a “commit entry” function inside wasm and have that run rust functions outside wasm
with wasmer we should be able to more easily capture the execution context that holochain is running at the point that we import all the wasm functions simply by using lexical scoping - wasmi on the other hand is based more on structs and trait implementations.
the wasmi code works OK but it pushes us to wrap several abstractions around what we are doing (let a happ call a core function), that make it much harder for new developers who want to contribute to understand what is happening in the code (i expect to delete several thousand lines of code and tests by the time i finish porting to wasmer)
to be fair, some of the simplifications could have been done anyway but it all seems more straightforward with the wasmer tooling
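to make the "lexical scoping vs structs and traits" point concrete, here is a rough sketch in plain rust. it deliberately uses neither the wasmi nor the wasmer API: `Context`, `HostDispatch`, `Runtime`, `COMMIT_ENTRY_INDEX` and `build_imports` are all made-up names that only illustrate the two shapes of code, not how either library actually looks.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical stand-in for the execution context holochain holds
/// at the point the wasm imports are set up.
#[derive(Default)]
struct Context {
    committed: Vec<String>,
}

// Style 1: struct + trait dispatch. The host function is found by a numeric
// index and the context has to live on the struct implementing the trait.
trait HostDispatch {
    fn invoke_index(&mut self, index: usize, arg: &str) -> Result<usize, String>;
}

struct Runtime {
    context: Context,
}

const COMMIT_ENTRY_INDEX: usize = 0;

impl HostDispatch for Runtime {
    fn invoke_index(&mut self, index: usize, arg: &str) -> Result<usize, String> {
        match index {
            COMMIT_ENTRY_INDEX => {
                self.context.committed.push(arg.to_string());
                Ok(self.context.committed.len())
            }
            other => Err(format!("unknown import index {}", other)),
        }
    }
}

// Style 2: closures. Each imported function simply captures the context it
// needs from the scope it was defined in.
type HostFn = Box<dyn Fn(&str) -> usize>;

fn build_imports(context: Rc<RefCell<Context>>) -> HashMap<&'static str, HostFn> {
    let mut imports: HashMap<&'static str, HostFn> = HashMap::new();
    imports.insert(
        "commit_entry",
        Box::new(move |entry: &str| {
            // `context` is captured by the closure, no dispatch table needed
            let mut guard = context.borrow_mut();
            guard.committed.push(entry.to_string());
            let count = guard.committed.len();
            count
        }),
    );
    imports
}
```

in the closure style, adding another import is one more `insert` with its own closure, whereas the trait style needs a new index constant plus a new match arm that reaches the context through `self`.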
more standard
wasmer is maintained by a dedicated organisation that exists to push wasm into many ecosystems in a standardised, high performance and ergonomic way
they maintain runtimes for go, c, java, rust and c# and a dedicated wasm package repository for wasm-native reusable code
wasmi is maintained by parity, a company dedicated to building blockchain technology (e.g. on ethereum and polkadot)
the focus, funding and ongoing development of wasmi are ultimately driven by the target to have wasm running “on blockchain” (whatever that means long term)
cons
looking at what we would be losing (and if/how this might be relevant)
determinism
the main reason wasmi exists at all is because of “non-determinism” in all other wasm implementations
i haven’t found a clear spec or outline explaining comprehensively all the non determinism wasmi has identified and explicitly addresses that other wasm engines do not, but my research has identified a few high level areas.
deterministic wasm execution return values
this is the most obvious type of non-determinism that people think of, and the wasm specification outlines known sources of non-determinism in any implementation that follows the spec (as far as i know, wasmer follows the spec)
- https://github.com/WebAssembly/design/blob/master/Nondeterminism.md
- Feature suggestion: strict "deterministic mode" · Issue #582 · WebAssembly/design · GitHub
intuitively we can think of this as “1 + 1 always equals 2, right?”
and luckily “1 + 1 = 2” is indeed always true but there are sources of nondeterminism listed
- different features in different wasm versions: this essentially boils down to the same problem as any other breaking API change in the conductor or HDK…
- threading (future feature): generally i don’t think it makes sense to include code that leads to the type of non-determinism/concurrency that threads introduce into the places (e.g. validation logic) that non-determinism is most dangerous in holochain, also this is a future problem as sane threading models in wasm aren’t really “a thing” yet
- `NaN` handling is non-deterministic in how the bits of the `NaN` are handled, which also means that doing things like `if maybe_nan > 0 { ... }` is probably non-deterministic too: this really needs to be made clear to happ developers and some best practices established but it seems (probably) entirely manageable with native rust functions like `f32.is_sign_positive()` (f32 - Rust) or our own equivalents in the HDK
- fixed width SIMD has nondeterminism: at this stage i don’t think this impacts anybody, may be a longer term consideration somehow
- environment resources can run out: e.g. memory could be used up on one machine where it would not on another… i don’t see wasmi solving this either as the wasm spec allows for up to 4GB of ram to be allocated per linear memory, memory usage is a combination of core and happ planning
- any other non-determinism in the language that compiles down to wasm: e.g. something in Rust that is not deterministic is not going to be fixed by wasmi OR wasmer
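as a rough sketch of what handling the `NaN` item above in the zome layer could look like: the helpers below are hypothetical (`canonicalize` and `is_strictly_positive` are made-up names, not HDK functions). they branch on `is_nan()` rather than on the sign, on the assumption that the sign bit is one of the `NaN` bits the spec allows to vary.

```rust
/// Bit pattern of the canonical quiet NaN for f32
/// (sign bit clear, exponent all ones, top mantissa bit set).
const CANONICAL_NAN_BITS: u32 = 0x7fc0_0000;

/// Hypothetical helper: collapse every NaN to a single bit pattern so that
/// hashing or serializing the float gives the same bytes on every machine.
fn canonicalize(x: f32) -> f32 {
    if x.is_nan() {
        f32::from_bits(CANONICAL_NAN_BITS)
    } else {
        x
    }
}

/// Hypothetical helper: force the caller to deal with NaN explicitly
/// instead of silently falling through a comparison.
fn is_strictly_positive(x: f32) -> Option<bool> {
    if x.is_nan() {
        None
    } else {
        Some(x > 0.0)
    }
}
```

the point is only that the risky cases can be funnelled through one or two small functions; what validation logic should do when it gets `None` is a separate decision.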
so it’s not clear to me how much of this wasmi actually solves…
wasmi can’t fix higher level language concerns, it can’t prevent resource exhaustion, changing wasm features, concurrency concerns from threading…
potentially it could define some `NaN` and SIMD behaviour that is deterministic but it’s not a silver bullet and both of these cases should be manageable in the happ zome layer
deterministic VM etc.
This post explains at length the thinking behind wasmi.
A lot of it boils down to what is needed to safely superimpose WASM on top of a blockchain.
The determinism discussed here talks about complexities from targeting different architectures and the ad-hoc optimisations introduced by JIT compilation.
All of this is a problem because on a blockchain nodes are not able to opt-out of executing malicious code.
So this means that a single “compiler bomb” could bring an entire blockchain to its knees in one nasty black swan event.
V8 and SpiderMonkey are not just theoretically nonlinear: real-world “compiler bombs” (pieces of code that cause the compiler to take an exponentially long amount of time) have been found and there is no reason to believe that even if they are fixed that more will not be found in the future.
In the holochain world I don’t see this as an issue as every happ has an isolated/dedicated DHT/network and every user is free to participate or not participate in running every WASM (zome).
A compiler bomb could certainly be coded into a WASM and exist in the world but it seems impossible to cause users to suddenly start running it, almost by definition.
There may be a concern for delegated node execution (a.k.a. holoports) if users can simply force ports to run things arbitrarily, but this general problem is neither introduced nor exacerbated by the potential for compiler bombs as a happ developer can much more easily write malicious code straight into the WASM and deploy that.
Deterministic execution (gas) cost
Blockchains charge fees for their usage, and the only widely used blockchain that seriously considers WASM is the one wasmi was designed for: Ethereum.
Ethereum has the concept of “gas cost” that MUST line up 1:1 with the actual execution cost forwarded to the end-user, at risk of potentially critical security vulnerabilities or scalability problems.
Holo might want metering to be as close as possible to real execution costs for obvious reasons (raising holofuel invoices) but it’s not so critical as in the ETH world (e.g. there are no time-bound global blocks with gas limits to be managed). In practice it seems to me like allowing 100x performance optimisations is far more significant than bean counting CPU cycles, even in the Holo context - Holoports should get faster/better as code and hardware improves over time, not treat CPU as a shared/scarce resource.
If i’m wrong about this, it would seem to be an argument for supporting multiple WASM backends rather than enforcing the lowest common denominator. IMO there is no reason or justification to force holochain conductors run natively by end-users to run 100x slower simply because holoports have some domain-specific metering concern.
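For anyone unfamiliar with what "deterministic gas cost" means in practice, here is a minimal sketch. It is not taken from wasmi, wasmer or Ethereum: the `GasMeter` type, the `op_cost` table and its numbers are invented purely to show the idea that cost is counted per operation from a fixed table, so every node arrives at the same total regardless of how fast its CPU is.

```rust
#[derive(Debug, PartialEq)]
enum MeterError {
    OutOfGas { needed: u64, remaining: u64 },
}

/// Hypothetical meter that counts down from a fixed limit.
struct GasMeter {
    remaining: u64,
}

impl GasMeter {
    fn new(limit: u64) -> Self {
        GasMeter { remaining: limit }
    }

    /// Deduct `cost`, or fail without changing the balance if it does not fit.
    fn charge(&mut self, cost: u64) -> Result<(), MeterError> {
        if cost > self.remaining {
            return Err(MeterError::OutOfGas {
                needed: cost,
                remaining: self.remaining,
            });
        }
        self.remaining -= cost;
        Ok(())
    }
}

/// Made-up cost table: the numbers are arbitrary, what matters is that
/// they are fixed and identical on every machine.
fn op_cost(op: &str) -> u64 {
    match op {
        "add" => 1,
        "mul" => 2,
        "call_host" => 50,
        _ => 10,
    }
}

/// Charge each operation in order and return the total gas used.
fn run_metered(ops: &[&str], limit: u64) -> Result<u64, MeterError> {
    let mut meter = GasMeter::new(limit);
    for op in ops {
        meter.charge(op_cost(op))?;
    }
    Ok(limit - meter.remaining)
}
```

The contrast with Holo-style metering is that nothing here measures real time or real CPU usage, which is exactly why an optimising compiler makes this kind of accounting harder to line up with actual execution cost.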
cost of change
cost to happ devs
it’s important to understand whether we expect changes to the behaviour of existing WASM code
in short, other than the specific determinism issues outlined above there should not be any differences in WASM execution
both wasmi and wasmer are implementing the same WASM spec
swapping out one for the other 1:1 without changing any underlying core workflows that handle the imported API functions should not change anything from the perspective of the WASM
the main thing that could cause some change that might actually impact an existing WASM is the handling of floats (because integers don’t support `NaN` there is only an opening for mistakes when floats are used)
one thing to note is that we don’t actually support floats as arguments to/from WASM functions natively because there is no `From` implementation between floats and `JsonString`
- holochain-serialization/json.rs at develop · holochain/holochain-serialization · GitHub
that automatically mitigates a lot of potential problems as happ developers right now cannot accept or return floats without implementing custom serialization logic for it, so i’d expect that most people are just working with integers at the moment
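to illustrate what "custom serialization logic" for floats might involve, here is a self-contained sketch. the `JsonString` below is a local stand-in so the example compiles on its own, not the real type from holochain-serialization, and `FiniteF32` is a made-up wrapper name. the idea is that a float only crosses the boundary through a fallible conversion that rejects `NaN` and infinities, which JSON cannot represent anyway.

```rust
use std::convert::TryFrom;

/// Local stand-in for the real `JsonString`, only here so the sketch
/// is self-contained.
#[derive(Debug, PartialEq)]
struct JsonString(String);

/// Integers can convert infallibly.
impl From<u32> for JsonString {
    fn from(n: u32) -> Self {
        JsonString(n.to_string())
    }
}

/// Hypothetical wrapper a happ developer could define to opt in to floats.
struct FiniteF32(f32);

/// Floats convert fallibly: NaN and infinities are rejected up front.
impl TryFrom<FiniteF32> for JsonString {
    type Error = String;

    fn try_from(value: FiniteF32) -> Result<Self, Self::Error> {
        if value.0.is_finite() {
            Ok(JsonString(value.0.to_string()))
        } else {
            Err("NaN and infinities have no JSON representation".to_string())
        }
    }
}
```

with something like this in place the only floats that reach serialization are ordinary finite values, so the `NaN` bit pattern question never comes up at the WASM boundary.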
cost to core devs
the other cost of change is the work of doing the refactor to convert between wasmi/wasmer
realistically though, if the simplifications to the code that i’m expecting/hoping for do materialise, the conversion should pay for itself relatively quickly by making the code easier to work with
the exception to this is if we want to try and support wasmi and wasmer in parallel right now
i don’t feel confident providing an API wrapper (like we have for networking and persistence) that adequately wraps both wasmer and wasmi, while still achieving the goal of code simplification - at least not right now, not in a reasonable timeframe in the context of everything else that needs doing
for that reason i’m presenting this as an either/or scenario, if we want to move forward with wasmer i think we will need to drop wasmi for the short-mid term and only re-introduce it if we can show it is critical to resolve a well defined problem (e.g. if we find we need better CPU metering on holoports or something…)