I’ve had little experience with MapReduce, but surely it would be possible to upload data to a DHT (ideally, a temporary DHT) to be processed by an arbitrary number of nodes, for many purposes. If the data can be processed entirely within a sandbox inside a hApp instance, I don’t see why you couldn’t validate it by setting a redundancy factor so that multiple nodes process the same subset of data.
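To illustrate, a redundancy factor could work something like this minimal sketch in Rust: accept a chunk’s result only when enough independent nodes report the same output. The `redundancy_check` function and the hash-based reports are invented for the example; this is not a Holochain API.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: accept a chunk's result only when at least
/// `redundancy` nodes reported the same output hash.
fn redundancy_check(reports: &[(&str, u64)], redundancy: usize) -> Option<u64> {
    // Count how many nodes reported each result hash.
    let mut counts: HashMap<u64, usize> = HashMap::new();
    for (_node, result_hash) in reports {
        *counts.entry(*result_hash).or_insert(0) += 1;
    }
    // Return a result that meets the redundancy factor, if any.
    counts
        .into_iter()
        .find(|&(_, n)| n >= redundancy)
        .map(|(hash, _)| hash)
}

fn main() {
    // Three nodes agree on 0xAB, one disagrees; a redundancy factor of 3 passes.
    let reports = [("n1", 0xAB), ("n2", 0xAB), ("n3", 0xCD), ("n4", 0xAB)];
    assert_eq!(redundancy_check(&reports, 3), Some(0xAB));
    assert_eq!(redundancy_check(&reports, 4), None);
}
```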
There will be a lot of underutilized computing power in the Holo network; this seems like a great opportunity for hosts to earn mutual credit by joining a grid.
I think the use case would be pretty tough considering the 4 GB limit on in-memory data in WASM, and it would be significantly less performant. Golem is an example of a company trying to accomplish this, and practical data-processing needs are typically much more complicated. There’s also usually an expectation of privacy around your data: Golem uses a hardware feature known as an SGX enclave to ensure the processor does not have the ability to view the user’s data.
One capability that I think has some other use cases is allowing zomes to take functions as inputs. So being able to provide a map-filter-reduce function to a zome might be interesting.
Anyway, I think it’s theoretically possible if a user wrote the logic into a public function, uploaded the data to a DHT, and then offered a reward for each individual chunk processed. Then I think you would take connected zomes, apply a filter step over the new entries, and lastly a reduce step down to a final entry. I think the biggest issues would be performance and a lack of anonymity, but it would be interesting to see implemented.
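A conceptual sketch of that map-filter-reduce pipeline in Rust, with the three steps passed in as plain function values. In a real zome the functions would have to arrive as serialized, validated code rather than native function pointers; this only illustrates the data flow, and `process_job` is an invented name.

```rust
/// Hypothetical sketch: run a job over uploaded chunks by mapping each
/// chunk to a new entry, filtering those entries, and reducing them
/// down to a single final entry.
fn process_job<T, U>(
    chunks: Vec<T>,
    map: fn(T) -> U,
    filter: fn(&U) -> bool,
    reduce: fn(U, U) -> U,
) -> Option<U> {
    chunks
        .into_iter()
        .map(map)       // each chunk becomes a new entry
        .filter(filter) // filter step over the mapped entries
        .reduce(reduce) // fold down to a final entry
}

fn main() {
    // Sum the squares of the even squares: 4 + 16 = 20.
    let result = process_job(
        vec![1u64, 2, 3, 4],
        |x| x * x,
        |&y| y % 2 == 0,
        |a, b| a + b,
    );
    assert_eq!(result, Some(20));
}
```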
This is a pretty cool idea. The big question I see the two of you discussing is how to make the thing generic enough to allow anyone to upload arbitrary functions to the DHT in a way that processors could download them, execute them, and produce verifiable results. Quite similar to what Golem is trying to tackle; I agree.
Here’s a thought: maybe the sandboxed execution engine could live in a client outside the DNA, and the DNA’s only function is to share functions and inputs with those who want to download and process them.
Another idea: maybe the DNA itself is the thing that processors download and execute. One DNA per job. The infrastructure is already there in the Holochain conductor. Joining the DNA means you get access to both the functions and their inputs, and entitles you to claim a reward. One way you could do this is to have 100% replication and have the processing actually happen in the validation function, so that as you receive a dataset you automatically process it. This is a horrible abuse of validation functions, but it’s an interesting idea nonetheless.
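A toy model of that “abuse” in Rust: the validation callback for a dataset entry doesn’t just accept or reject, it runs the job on the data as a side effect of validating it. The `Entry` struct, `job`, and `validate` names are invented for illustration; this is not the Holochain HDK.

```rust
/// An entry pairing a data chunk with the result its author claims.
struct Entry {
    chunk: Vec<u64>,
    claimed_result: u64,
}

/// The deterministic job every node runs; here, a simple sum.
fn job(chunk: &[u64]) -> u64 {
    chunk.iter().sum()
}

/// With 100% replication, every node validates (and thus processes)
/// every chunk; validation passes only if the claimed result matches
/// what the node computes itself.
fn validate(entry: &Entry) -> bool {
    job(&entry.chunk) == entry.claimed_result
}

fn main() {
    let good = Entry { chunk: vec![1, 2, 3], claimed_result: 6 };
    let bad = Entry { chunk: vec![1, 2, 3], claimed_result: 7 };
    assert!(validate(&good));
    assert!(!validate(&bad));
}
```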
@rlkel0 do you know how Golem does verification of outputs? Do job uploaders just say “I want n confidence that these results are correct” and then n processors run the job and compare results?
Golem provides a nice write-up, but it basically recommends randomly verifying a few pieces of the computation, or paying multiple people to run the computations and checking that they produce the same results. On top of that, they maintain reputation scores for computation engines.
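Random spot checking of the kind described above can be sketched in a few lines of Rust: instead of re-running the whole job, the uploader recomputes only a sample of the returned chunks. `spot_check` and `product_job` are invented names, and a real verifier would pick the sample with a CSPRNG rather than take the indices as input.

```rust
/// The job being verified; here, the product of a chunk's values.
fn product_job(chunk: &[u64]) -> u64 {
    chunk.iter().product()
}

/// Recompute a random sample of (input chunk, reported result) pairs
/// and reject the batch if any sampled result doesn't match.
fn spot_check(
    results: &[(Vec<u64>, u64)],
    sample_indices: &[usize],
    compute: fn(&[u64]) -> u64,
) -> bool {
    sample_indices
        .iter()
        .all(|&i| compute(&results[i].0) == results[i].1)
}

fn main() {
    let results = vec![
        (vec![2, 3], 6),  // correct
        (vec![4, 5], 20), // correct
        (vec![6, 7], 40), // wrong: 6 * 7 = 42
    ];
    // A sample that misses the bad chunk passes...
    assert!(spot_check(&results, &[0, 1], product_job));
    // ...but a sample that includes it catches the cheat.
    assert!(!spot_check(&results, &[1, 2], product_job));
}
```

This is why the reputation layer matters: a single spot check only catches cheating probabilistically, so repeat offenders have to be penalized over time.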
@rlkel0 thanks for sharing that link; it’s kind of how I expected Golem to work — random checks, redundant computations. I’ll have to read about ‘reputation for computation engines’.
Yeah, unfortunately SNARKs/STARKs/Bulletproofs etc. are infeasible at this time. Same goes for homomorphic encryption. The question really becomes about encryption and trust. The reputation score seems important, and they’re doing some innovative things with SGX. A library called Graphene allows for secure code execution, where the executor can’t examine the code they are executing. https://grapheneproject.io/
Abusing validation callbacks seems crazy, but it also seems rather elegant. I think you’d need the processor in the DNA to ensure it is deterministic. You’d also need to validate the incoming code to ensure it is purely functional, so that it produces the exact same output on every node (e.g. no random number generators or UUIDs).
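One way to sketch that determinism check in Rust: before accepting uploaded code, reject it if it imports any host function that introduces entropy. The blocklist, the import names, and `is_deterministic` are all invented for illustration; a real implementation would parse the import section of the submitted WASM module.

```rust
/// Hypothetical blocklist of entropy-introducing host functions.
const BANNED_IMPORTS: &[&str] = &["random_bytes", "sys_time", "uuid_v4"];

/// Accept uploaded code only if none of its declared imports can
/// introduce nondeterminism.
fn is_deterministic(imports: &[&str]) -> bool {
    imports.iter().all(|i| !BANNED_IMPORTS.contains(i))
}

fn main() {
    // Pure data-plumbing imports pass; an entropy source is rejected.
    assert!(is_deterministic(&["hash_entry", "get_links"]));
    assert!(!is_deterministic(&["hash_entry", "uuid_v4"]));
}
```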
@rlkel0 Graphene sounds ambitious and crazy cool as a way of sandboxing execution and making it reliable. I’m not surprised Golem are interested. I haven’t yet found the part where they talk about the executor not being able to examine the code they’re executing, though. Does that work in combination with enclaves?
@marcus thanks for your approval of my abusive-yet-elegant idea! I think the job executors would definitely want an execution environment that they can trust to be deterministic, so they aren’t blacklisted for computing the ‘wrong’ result when it’s cross-checked, but as yet the Holochain ribosome doesn’t enforce that on validation functions. It may one day, but I’m not sure what the core team’s plans are re: banning function calls that introduce sources of entropy.