The current state of Holo/Holochain is that Holochain stores all private data on the source chain, so the private data never leaves the author’s device. Holo, on the other hand, stores the so-called “private” data on the storage of one or more remote hosts.
The public data on both Holo and Holochain is replicated for two main reasons:
1) Fault tolerance
2) Efficient and fast access for those who may ask for it
Because of 2, the public data’s replication strategy acts more like a conventional CDN.
This leads us to the conclusion that the intent of public data is for fast and reliable access for the “public”.
Here starts the discord!
The private data on Holochain serves two purposes, viz. privacy and fault tolerance, albeit the responsibility for fault tolerance lies with the user himself; the data is at the whims of its author and is safe to the same extent.
However, private data on Holo serves… well… no purpose whatsoever! It may be fault-tolerant by being replicated to multiple (5 or so) nodes, but it is anything but private! And suggesting to our future customers that it is, solely on the basis of a few KYC documents the hosts had to submit, is downright insane! Here comes the beautiful part. The private data on Holo does serve one purpose, i.e. fault tolerance. However, unlike public data, the intent of private data’s replication is not that of a CDN. It is replicated literally to be able to withstand a few copies getting lost in some catastrophe!
Now here’s the improvement proposal.
On both Holo and Holochain, we maintain three distinct classes of data, which differ in both intent and functionality.
- Public: Intent is that of a classical CDN; replicated to as many nodes as may be needed, depending upon the load. If nobody’s asking for it, it’s replicated to 3 or so nodes. If a thousand people ask for that data every day, then it’s replicated to some 50 or so nodes… you get the idea… It’s all over the DHT.
- Private: Intent is fault tolerance; replicated to some 3 or so nodes.
- Protected: Again, intent is fault tolerance; replicated to 3 or so nodes; encrypted. Even zome code cannot read the “data” field inside it. It’s like a black box! And by encrypted, I mean “encrypted as hell”. Even a thousand quantum computers shouldn’t be able to decipher it; symmetric encryption with a large enough key (say, 256 bits) is generally considered quantum resistant, since Grover’s algorithm only halves the effective key strength. It’s mainly the widely used public-key schemes that are at risk. So as long as your password isn’t compromised, you can rest assured that your “protected” data on Holo won’t be either.
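To make the three classes a bit more concrete, here is a rough Rust sketch of how a replication target could be picked per class. Everything in it is hypothetical: the names (`DataClass`, `target_replicas`) and the scaling rule of one extra replica per 20 daily requests are made up for illustration, chosen only so that idle data sits at 3 replicas and a thousand requests a day lands at the 50 or so mentioned above. It is not how Holo or Holochain actually decides replication.

```rust
/// The three proposed data classes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataClass {
    Public,
    Private,
    Protected,
}

/// Baseline replica count ("3 or so nodes" in the proposal).
pub const BASE_REPLICAS: u32 = 3;
/// Assumed upper bound for hot public data ("some 50 or so nodes").
pub const MAX_PUBLIC_REPLICAS: u32 = 50;
/// Assumption invented for this sketch: one extra replica per 20 requests a day.
pub const REQUESTS_PER_EXTRA_REPLICA: u32 = 20;

/// How many nodes should hold a copy of an entry of the given class.
pub fn target_replicas(class: DataClass, requests_per_day: u32) -> u32 {
    match class {
        // CDN-like intent: the replica count grows with demand, up to a cap.
        DataClass::Public => {
            let extra = requests_per_day / REQUESTS_PER_EXTRA_REPLICA;
            BASE_REPLICAS.saturating_add(extra).min(MAX_PUBLIC_REPLICAS)
        }
        // Fault-tolerance intent only: demand never changes the count.
        DataClass::Private | DataClass::Protected => BASE_REPLICAS,
    }
}
```

The point of the sketch is only that load is an input for Public and is ignored for Private and Protected.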
In action, it would look something like this:
Encrypt(“top secret data”, password, salt) -> gibberish + salt.
This gibberish + salt is your protected_data.
Trying to operate on this protected_data type would produce compiler errors (thanks to Rust).
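A minimal sketch of what “compiler errors” could mean here, assuming protected_data is modelled as an opaque Rust struct. The name `ProtectedData`, its fields and the 16-byte salt size are my own placeholders, not an existing HDK type. Because the type implements no string, arithmetic or comparison traits, zome code that tries to treat it as plaintext simply doesn’t compile.

```rust
/// Opaque wrapper around the gibberish and the salt it was produced with.
/// In a real crate this would live in its own module so the fields stay private.
pub struct ProtectedData {
    ciphertext: Vec<u8>,
    salt: [u8; 16],
}

impl ProtectedData {
    /// Wrap bytes that were already encrypted elsewhere (e.g. on the client side).
    pub fn from_parts(ciphertext: Vec<u8>, salt: [u8; 16]) -> Self {
        ProtectedData { ciphertext, salt }
    }

    /// The raw ciphertext, good for storing and replicating, nothing else.
    pub fn ciphertext(&self) -> &[u8] {
        &self.ciphertext
    }

    /// The salt that has to be handed back for deciphering.
    pub fn salt(&self) -> &[u8; 16] {
        &self.salt
    }
}

// None of the following compile, because ProtectedData implements
// no such method or operator:
//
//     let shout = protected.to_uppercase(); // no method `to_uppercase`
//     let sum = protected + 1;              // `Add` is not implemented
//     if protected == other { /* ... */ }   // `PartialEq` is not implemented
```

Zome code can still move the value around and store it; it just has nothing meaningful to do with the contents.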
However, data that can’t be operated on is of little use. Hence not everything should go into the protected_data. A diligent programmer would store, let’s say, the time_of_send beside the protected data, not “inside” it. Only the message would go inside. This allows us to… let’s say… search for messages sent on Halloween. The zome code would be able to do that; client code wouldn’t have to get all messages, decipher them, then filter out those sent on Halloween! This requires having another field in the “protected” entries… let’s say “etc”, in which you can store operable stuff like time_of_send.
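The Halloween search could look roughly like the sketch below. The struct layout (`ProtectedEntry`, `Etc`), the Unix-seconds timestamp and the function name `sent_within` are all assumptions for illustration; the only thing taken from the proposal is the idea of an operable “etc” field sitting next to the encrypted “data” field.

```rust
use std::ops::Range;

/// Operable, unencrypted metadata stored beside the protected payload.
pub struct Etc {
    /// Assumed representation: Unix time in seconds, UTC.
    pub time_of_send: u64,
}

/// A protected entry: opaque payload plus operable metadata.
pub struct ProtectedEntry {
    /// Encrypted message bytes; zome code never looks inside.
    pub data: Vec<u8>,
    /// Salt used when the payload was encrypted.
    pub salt: [u8; 16],
    /// The part zome code is allowed to operate on.
    pub etc: Etc,
}

/// Return the entries whose time_of_send falls inside `window`,
/// without touching the encrypted payload at all.
pub fn sent_within(entries: &[ProtectedEntry], window: Range<u64>) -> Vec<&ProtectedEntry> {
    entries
        .iter()
        .filter(|entry| window.contains(&entry.etc.time_of_send))
        .collect()
}
```

For example, Halloween 2021 in UTC would be the window `1_635_638_400..1_635_724_800`. The client then only has to decipher the handful of entries that come back.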
This ciphering/deciphering shouldn’t happen in the front-end code, especially a closed-source front-end. Chaperone, if I understand correctly, should be given the job of ciphering with a random salt + password, and deciphering with the password + the salt it was ciphered with.
Needless to say, all this should already be achievable if the programmer follows some naming convention of, let’s say, appending _protected to every encrypted field, and by skipping Chaperone and doing the login in the front-end code instead. However, I believe it’s gonna become a rather frequent design pattern and would be better abstracted out. Doing so would not only increase development speed but also users’ trust that their protected data will remain protected. Might even ditch the excuse of KYC!!!
- The A Man