Where are private keys and data stored if you're using Holo Host?

AdriaanB · October 1, 2019, 10:21pm

Thanks a lot for the deep answer and resources. I am going to check them out.

I think the capability token discussion and validating encrypted data got a bit mixed, but that is fine.

Following use-case:

MyApp is web-based and only accessible through a browser (thus Holo).
Grandma creates an account and needs to store her private keys & data somewhere.

Question: Where is this data stored? I thought the HoloVault would act as the local source-chain in this case, but could be wrong.

A capability token to access this ‘vault’ would open up a whole bunch of GDPR-compliant use-cases.

Thanks a lot, I am still trying to figure out how startups could make new business models possible with Holo and Holochain.

Thanks a lot for the in-depth answers and excuse me for my ignorance

pauldaoust · October 2, 2019, 5:30pm

Hello @AdriaanB — I forked this off to a new discussion because it felt like a separate topic; hope you’re okay with that. If not, let me know and I’ll re-merge.

Holo Host is a weird situation. Sovereignty is the big selling point of Holochain, so we want Grandma to preserve her sovereignty. This is all about who has access to her private key, because that’s the key to her source chain. That’s easy with Holochain proper, because her private key lives on her device. But with Holo Host, if we let a host store her private key, it could impersonate her. So what do we do?

The browser generates the private key and keeps it in memory. When the tab is closed, the private key ceases to exist. So to answer half of your question, the private key is not stored anywhere.

But wait! With all those private keys being generated, how do we maintain a consistent identity? Easy: the key is always the same every time it’s generated, using a ‘seed’ made out of:

Grandma’s email address
Grandma’s password (which never gets transmitted over the wire)
A public ‘salt’ value provided by Holo’s semi-centralised infrastructure

Read more in this dense but interesting article about Holochain, Holo, and DPKI.
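To make the 'same key every time' idea concrete, here is a minimal sketch of deriving a repeatable seed from those three inputs. It is not Holo's actual scheme: the key derivation function (PBKDF2-HMAC-SHA256), the iteration count, the way the email and password are joined, and the example salt are all assumptions chosen for illustration. In a real client the resulting seed would feed a signing-keypair generator, which is left out here.

```python
import hashlib


def derive_seed(email: str, password: str, salt: bytes) -> bytes:
    """Derive a 32-byte seed; identical inputs always give the identical seed."""
    # Assumption: the email is normalised and joined to the password with a
    # NUL separator. The real combination rule is not stated in this thread.
    secret = f"{email.strip().lower()}\x00{password}".encode("utf-8")
    # Assumption: PBKDF2 stands in for whatever KDF Holo really uses.
    return hashlib.pbkdf2_hmac("sha256", secret, salt, 100_000, dklen=32)


# Hypothetical public salt, standing in for the value Holo's infrastructure provides.
public_salt = b"example-public-salt"

seed_first_login = derive_seed("grandma@example.com", "correct horse", public_salt)
seed_next_login = derive_seed("grandma@example.com", "correct horse", public_salt)
seed_wrong_password = derive_seed("grandma@example.com", "wrong horse", public_salt)
```

Because nothing random goes into the derivation, the browser can throw the key away when the tab closes and rebuild it at the next login, and the password itself never has to leave the browser.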

Now for the second half of your question: where is data stored?

It’s stored on the host — in alpha, this’ll be one host per Grandma; in beta it’ll be five hosts per Grandma. Public data doesn’t need to be encrypted (and can’t, because the peers in that app need it), but private data is tricky. The host technically doesn’t need to be able to read it; only Grandma needs to. So I think the Holo dev team is exploring at-rest encryption for private entries.

Each app stores its own private data on its own source chain per agent, whether Holo or Holochain. As for HoloVault (now called Personals & Profiles), it’s just another app with its own source chain — it’s just a handy way to store data.

I think capability tokens are a perfect way to access private data. Consider a medical records app:

Healthcare providers and patients both live in the same public DHT. Dr Alice wants Bob’s medical records from a previous doctor. Here’s how it could work:

Dr Alice asks Bob for a certain set of medical records.
Bob creates a capability grant for the specific records she wants, then stores it as a private entry on his source chain. He shares the hash of the grant entry with Dr Alice; this is her capability token.
Dr Alice stores this token on her source chain as a capability claim. Whenever she needs to, she requests records from Bob and presents the token to him.
Bob checks the token against the privileges he granted to her. If all checks out, he retrieves the records and sends them back to Dr Alice.

This happens through node-to-node messaging, which means that both Dr Alice and Bob need to be online. I suppose you could do it async via encrypted public entries too.
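The four steps above can be sketched roughly as follows. This is a plain-Python illustration, not Holochain's capability API: the `Bob` class, its method names, the record names, and the random `nonce` field (added so the token cannot be guessed from the grant's contents) are all hypothetical.

```python
import hashlib
import json
import secrets


def entry_hash(entry: dict) -> str:
    """Hash an entry's canonical JSON form; the hash doubles as the token."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()


class Bob:
    def __init__(self):
        # Hypothetical private records held on Bob's source chain.
        self.records = {"xray-2018": "no fracture", "bloodwork-2019": "normal"}
        self.private_grants = {}  # token -> grant entry, never published

    def create_grant(self, grantee: str, record_ids: list) -> str:
        """Step 2: store a grant privately and return its hash as the token."""
        grant = {
            "grantee": grantee,
            "records": sorted(record_ids),
            "nonce": secrets.token_hex(16),  # assumption: keeps the hash unguessable
        }
        token = entry_hash(grant)
        self.private_grants[token] = grant
        return token

    def handle_request(self, requester: str, token: str, record_id: str) -> str:
        """Step 4: check the presented token against the stored grant."""
        grant = self.private_grants.get(token)
        if (
            grant is None
            or grant["grantee"] != requester
            or record_id not in grant["records"]
        ):
            raise PermissionError("capability check failed")
        return self.records[record_id]


bob = Bob()
# Step 3: Dr Alice keeps this token as her capability claim.
alice_claim = bob.create_grant("dr_alice", ["xray-2018"])
xray = bob.handle_request("dr_alice", alice_claim, "xray-2018")
```

Note that Bob's check happens at request time, which is why this version needs both parties online.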

AdriaanB · October 3, 2019, 5:40pm

Thanks a lot Paul! This clears things up a lot.

pauldaoust · October 3, 2019, 8:59pm

No prob. I’ve done a write-up on Capability Delegation; if you have the time or inclination, let me know if it is easy to understand. Thanks!

nphias · January 15, 2020, 12:01pm

Hi @pauldaoust, myself and others were wondering about the HoloVault… we thought that the personas zome would store data privately in the source chain somewhere… but it turns out that in the code it doesn’t (it uses the public DHT)… what is the significance of the HoloVault? Is there any code? Or was it just an idea floating around?

pauldaoust · January 15, 2020, 5:37pm

Huh, this is a surprise to me too! It was my understanding that the personas/profiles DNA had everyone store their own data on their own source chain, and sharing was handled by node-to-node messaging (with agent node lookups handled by the DHT), similar to how I illustrated a theoretical medical records app above. @philipbeadle @artbrock can you shed some light on this?

Incidentally, this is separate from the above conversation about where keys and data are stored on Holo Host, although it does relate in one small way: I may have been wrong about the at-rest encryption. It might be the case that hosts will need to encrypt it with their own keys, so they can do their duty to participate in some automated processes on behalf of the hosted users. That data exposure risk would then be managed with fiduciary responsibility, legal agreements, etc.

tats_sato · February 13, 2020, 3:20am

Hi @pauldaoust ! Been wondering about how Holo deals with private key and source chain so glad I found this thread

I did not quite get this part and got a couple of questions

Does it mean host will encrypt the source chain of the agent with the agent’s public key and send it over to the browser of the agent so that s/he may decrypt it with his/her private key? Or are hosts going to encrypt the source chain with their own public key? In that case, I don’t quite understand the entire process of it.
What actually is the “automated process” that the host will be doing on behalf of the hosted enduser and what is it for?
If my understanding is correct, source chains are encrypted with the public key of the owner of that chain and so doesn’t that count as at-rest encryption since private key of the owner is needed for entries inside source chains to be read?

Really great that the private key is not stored anywhere! Just a question though: after the private key is regenerated inside the browser, what is the exact flow of data in order for, let’s say, Alice to access her source chain stored in the HoloPort? Is Alice’s private key sent in any way to the host? (I hope not.) If not, then my hunch is that the host will send the encrypted source chain of Alice so that Alice can decrypt it with her private key and do a bunch of things with it; then, after Alice is done manipulating her source chain, she’ll encrypt it again with her public key and return it to the host for storage. But I think I’m wrong, so I hope you can shed light on this matter. Thank you!

pauldaoust · February 13, 2020, 6:39pm

Hey @tats_sato great questions. The short version:

All entries are stored on the host.
If they’re encrypted at rest, they’re encrypted by the host’s private key, not the agent’s.
Alice accesses her private data by calling zome functions on the host, which have access to her source chain. The host sends the function’s return value to her in the clear (but TLS encrypted, of course).
Whenever the host wants to write an entry/header pair to Alice’s source chain, it has to ask her to sign them.
The above three points mean that the data can stay on the host, but Alice’s private key never leaves her devices.
Because the host has access to Alice’s private entries, it has to be prevented from doing bad things by law rather than by technology.
This might change in the future, although the host will always have to be able to read Alice’s headers and public entries (that means the host is assumed to be a trusted member of the DNA along with Alice).
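Here is a rough sketch of the 'host stores, browser signs' split described in the points above. Everything in it is an assumption made for illustration: the `Browser` and `Host` classes are hypothetical, the header layout is invented, and the signature is textbook RSA over two small, publicly known primes, which is completely insecure and only stands in for the real signature scheme because Python's standard library has no Ed25519.

```python
import hashlib
import json

# Toy RSA parameters (INSECURE, illustration only): two well-known Mersenne primes.
P, Q = 2**127 - 1, 2**89 - 1
N, E = P * Q, 65537                # (N, E) plays the role of Alice's public key
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent; stays in the browser


def digest(data: bytes) -> int:
    """Hash the data and reduce it into the toy RSA modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def canonical(obj: dict) -> bytes:
    return json.dumps(obj, sort_keys=True).encode()


class Browser:
    """Holds Alice's private key; only signatures ever leave it."""

    def __init__(self, private_exponent: int):
        self._d = private_exponent

    def sign(self, data: bytes) -> int:
        return pow(digest(data), self._d, N)


class Host:
    """Stores the source chain and knows only the public key (N, E)."""

    def __init__(self):
        self.chain = []

    def append(self, entry: dict, browser: Browser) -> dict:
        prev = self.chain[-1]["header_hash"] if self.chain else None
        header = {
            "prev": prev,
            "entry_hash": hashlib.sha256(canonical(entry)).hexdigest(),
        }
        header_bytes = canonical(header)
        signature = browser.sign(header_bytes)  # round trip to Alice's browser
        if pow(signature, E, N) != digest(header_bytes):
            raise ValueError("signature does not match Alice's public key")
        record = {
            "header": header,
            "signature": signature,
            "entry": entry,
            "header_hash": hashlib.sha256(header_bytes).hexdigest(),
        }
        self.chain.append(record)
        return record


alice_browser = Browser(D)
host = Host()
first = host.append({"msg": "hello"}, alice_browser)
second = host.append({"msg": "world"}, alice_browser)
```

The host can read and store every entry in this sketch, which matches the trade-off described above, but it cannot extend the chain without a signature that only the browser can produce.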

Now the long version:

I don’t quite understand the exact automated processes that Art is thinking the host will need to do on behalf of the user, so these will all be guesses. (Maybe if @artbrock has the time, he can set the record straight.) Here’s all I can think of:

Validating others’ entries
Sharing public entries and headers to other nodes
Responding to direct messages in ways that potentially access private entries

For the first two of those three, the host doesn’t need access to their hosted agents’ private entries, so they can be safely encrypted by the agent’s private key. The only thing that might need access to private entries is the last one, which means Art must be thinking that the agent’s instance is ‘running’ even when the user isn’t interacting with it via the UI. Personally I’d be okay if it required the user to have the UI open in order to decrypt direct messages. However, that means the private entry will be sent back to the host in clear text anyway (so they can forward it to the DHT agent that requested it), which eliminates any advantages of encrypting it with the agent’s private key at rest.

So I would guess that the host encrypts the agent’s private data with the host’s special key, which doesn’t prevent the host from seeing it but does prevent thieves from accessing it (if the host’s owner unplugs the USB drive). You can see that this is a tricky problem, which might be best governed by law rather than technology.

I can see something that might work: encrypt private entries with the agent’s private key, so that when they’re stored on the host they’re inaccessible. Then the receive callback is executed on the browser, not the host, and all calls that retrieve the user’s private entries just return the entries encrypted, and the browser decrypts them and encrypts the receive callback’s return value with the requestor’s public key. This way the host would never be able to access the private entries or data based on them. But all this is pure imagination on my part and not on any roadmap (AFAIK).

I don’t know what the exact roadmap plans are, but I do know that at one point the plan was to have all the Holochain and zome code executed in the browser, except for validation functions. I think this would allow the agent to encrypt all of their private entries with their private key and would permit the above scenario I dreamed up.

tats_sato · February 14, 2020, 10:00am

Hi @pauldaoust! appreciate your detailed response.

pauldaoust:

All entries are stored on the host.

If they’re encrypted at rest, they’re encrypted by the host’s private key, not the agent’s.

Alice accesses her private data by calling zome functions on the host, which have access to her source chain. The host sends the function’s return value to her in the clear (but TLS encrypted, of course).

Whenever the host wants to write an entry/header pair to Alice’s source chain, it has to ask her to sign them.

The above three points mean that the data can stay on the host, but Alice’s private key never leaves her devices.

Because the host has access to Alice’s private entries, it has to be prevented from doing bad things by law rather than by technology.

This might change in the future, although the host will always have to be able to read Alice’s headers and public entries (that means the host is assumed to be a trusted member of the DNA along with Alice).

Made an ultra-simple sequence diagram for this just so I’m sure I understood you correctly.

Several questions I have from this part.

In the developer pulse 62, it said,

Each web user is assigned to multiple redundant HoloPorts, distributed across the globe rather than concentrated in a few data centers owned by one company.

I also heard from @guillemcordoba (sorry to pull you in here and please correct me if I’m wrong) that each source chain has a redundancy factor of 5 on the Holo network. The redundancy factor might change in the future, but this means that multiple HoloPort owners will get at least read access to an agent’s source chain, right? I just see a challenge in enforcing the law when your private data is held by 5 hosts living in 5 different countries. I’m excited to see how Holo will solve this challenge!

In the core concept 03, there was a portion that said,

When the DNA wants to create an entry for you, it first validates its content according to the rules defined for its type. This protects you from accidentally producing bad data.

It then asks your conductor to sign the entry with your private key.

Your conductor adds the signature to a header and attaches it to the entry.

Your conductor saves the entry as the next item in your source chain.

Just want to ask if the process no2 happens a little bit differently in Holo? When the DNA asks the conductor (running in HoloPort), will the conductor send the entry to Alice so that she can sign it then send the signed entry back to the HoloPort?

This is also exactly my dream!! With the short version you have described, I can’t help but see the challenge of communicating this part to the users of hApps hosted on the Holo network. Convincing the end user of a Holo-hosted hApp with something like “Your data is private but x number of other people who will host your source chain can also read it! But don’t worry, the law is there for you if they do something bad!” seems to be a real challenge to me. It would be a dream come true if hosts only stored private source chains that are encrypted with the agent’s public key, so that even if a chain is hosted on another computer, we can guarantee users of Holo-hosted hApps that their private data belongs only to them and no one else can read or write it without their private key. So my final question is,

Does the Holo team see the ability of hosts to read the private source chains of the agents they are hosting as a problem that must be solved? I just want to know if the team has any intention of preventing hosts from reading agents’ source chains down the line, regardless of when it will happen.

I believe this is a critical topic especially if developers are intending to (just like we are intending to) create a private and secured p2p communication application on top of Holochain and Holo Network. It’s a challenge to me to claim that a chat application is secured and private when literally even messages you sent to yourself or metadata of your profile can be read by someone else. That’s even when we can say that this someone can be trusted and verified, because that doesn’t really change the fact that they can read your data.

Sorry for the long post. I’m aware that some of the questions I asked need answers directly from the Holo team, so I hope we can hear from them in any way.

premjeet · February 14, 2020, 3:51pm

I can’t understand the need of replicating each source chain to 5 different hosts in the network, why? Though the header of each source chain is already in the network, and by which the entire source chain can be retrieved as the previous entries are inter-linked. So, finding the header is sufficient to access the source chain and that can be saved at the user-end. Please clarify me.

tats_sato · February 15, 2020, 7:28am

I am just making a wild guess so don’t quote me on this one but I guess the source chain needs redundancy because if not, the end user will have no way to access his/her source chain if that one holoport goes offline for various reasons (accidentally unplugged, turned off, etc). But again, not really entirely sure as well

nzharry · February 15, 2020, 10:01am

After reading through this thread, I had the exact same reaction to this.

@pauldaoust you mentioned a couple of times it may be necessary to rely on law rather than technology to ensure an agent’s private data is kept safe from their hosts. I can’t see how someone who has an app that deals in sensitive data would be comfortable building on Holochain if it’s not possible to ensure the security of their user’s data. As this is a distributed network, most end users won’t know their hosts, so expecting them to trust them surely isn’t an option.

@pauldaoust some clarity on this point would be great as it seems fundamental to the viability of Holo. Thank you

gjones617 · February 16, 2020, 7:50pm

This is very interesting (and helpful info/ reply; thank you). It brought to mind the idea of “Zero-knowledge” or “zk” proofs, as they call em - a concept which I find fascinating… sort of “message within a message;” with which I think the possibilities are great, especially within the realm of passwords/ keys…

Any chance something like this might be implemented into Holo/ holochain?

HoloFuture · February 18, 2020, 8:26pm

Paul seems to not know exactly what the answer is here. But via the quote below it appears the team was on the path of allowing everything to be encrypted/decrypted via the browser and separate from hosts. This is very likely the path the team went down. Further clarification would obviously be nice from the team that’s actually working on this:

"I can see something that might work: encrypt private entries with the agent’s private key, so that when they’re stored on the host they’re inaccessible. Then the receive callback is executed on the browser, not the host, and all calls that retrieve the user’s private entries just return the entries encrypted, and the browser decrypts them and encrypts the receive callback’s return value with the requestor’s public key. This way the host would never be able to access the private entries or data based on them. But all this is pure imagination on my part and not on any roadmap (AFAIK).

I don’t know what the exact roadmap plans are, but I do know that at one point the plan was to have all the Holochain and zome code executed in the browser, except for validation functions. I think this would allow the agent to encrypt all of their private entries with their private key and would permit the above scenario I dreamed up."

tats_sato · February 19, 2020, 5:51am

I really do hope that this is the path we are on. This question has been in our team’s mind the past week since it’s so critical haha. Hopefully we can get an answer from the team soon despite the busy schedule

gjones617 · February 19, 2020, 2:42pm

(from above) “Because the host has access to Alice’s private entries, it has to be prevented from doing bad things by law rather than by technology.” -pauldaoust

I’m a bit confused here. I understand how a Host might be able to read an agent’s data that he or she is hosting; but, seeing as the agent is the only one with access to the private key, the Host should not, and ostensibly is not capable of writing or changing agents’ data, right??

AdriaanB · February 19, 2020, 3:43pm

“Whenever the host wants to write an entry/header pair to Alice’s source chain, it has to ask her to sign them.”

This is the answer to your question, right?

gjones617 · February 19, 2020, 7:43pm

Ah, okay, thank you !

pauldaoust · February 20, 2020, 11:35pm

I’m starting to suspect I got some facts wrong, based on a conversation @artbrock started with me that sounds like it’s referring to this very forum thread. I’ll get clarification from him and then hopefully I can explain a bit better (unless you want to chime in directly @artbrock !)

But @gjones617 yes, you are absolutely correct about:

as @AdriaanB confirms. @GraceR put it very eloquently on Twitter during a discussion on the merits of the Cryptographic Autonomy License:

Conceptually, this is fairly simple. I am saying “this”. I know I said “this” and nobody else can say “this”, pretending it was me. Theoretically if twitter holds my keys, they could change “this” and it would seem I had said “that”

Later twitter could revoke my access to “this” and I wouldn’t be able to digitally have the memory or proof of having said “this”. CAL basically means that only I could have said “this” and that I have the right to always remember and show proof that I said “this”.

BTW, this is why I say it’s “data assault” rather than “data theft”. Someone is tampering with my ability to “remember” what I said. If I know my memory is failing and I want to write them down to remember them, this becomes a kind of extension of my brain function.

So regardless of whether you’re using Holochain natively or via a Holo-hosted app, you should always have the exclusive power to sign your own entries.

pauldaoust · February 20, 2020, 11:51pm

sorry, sometimes I think I’m clarifying but I’m actually just muddying