Questions about DHT, cryptography, and security

Hi @pauldaoust, continuing an earlier question about what happens if an agent changes their DNA. You gave a list of steps:

"Here’s what it might look like:

  1. Alice hacks her Holochain conductor to report legit DNA IDs but secretly use a different set of rules.
  2. She publishes the right DNA ID as her first entry, followed by her agent ID.
  3. Other agents running that DNA ID accept her as part of the network.
  4. Alice plays it cool for a while, producing valid entries.
  5. After gaining her peers’ trust, she commits fraud, which shows up as an invalid entry.
  6. The DHT peer(s) responsible for storing that entry notice that it’s invalid, add Alice to their network blocklist, and publish the warrant to their neighbours.
  7. The neighbours store the warrant and also add Alice to their blocklists.
  8. Whenever Alice tries to interact with anybody else (transactions, requests for data, etc), they look for outstanding warrants, discover the warrant, and add Alice to their blocklists.
  9. Alice now inhabits a lonely gossip network with one member."

There was a tech AMA on Holo TG where Philip Beadle seems to have a different take (on a similar question posed) and says:
“The hash is checked when the conductor starts up the DNA and loads it into memory. The hash is the network address too, so if you try to change the code and recompile, the WASM hash changes and you’re on your own network anyway.”

Does this actually mean that if a malicious agent changes their DNA, then (since the DNA’s hash is also the network address, I assume for the app’s DHT network) the network address changes and the person is in their own “lonely” network without any original/honest users to interact with? In other words, just by changing the DNA rules, honest users don’t even need to validate any invalid transactions arising from malicious agents changing their DNA, because those malicious actors won’t be in the same network as honest users to begin with?
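The “hash is the network address” idea can be sketched in a few lines. This is a hypothetical illustration, not the real Holochain API (the actual conductor derives the DNA hash differently and uses its own address scheme), but it shows why honestly recompiling modified rules forks you onto a different network:

```python
import hashlib

def dna_network_address(wasm_bytes: bytes) -> str:
    # Illustrative sketch: the network a node joins is derived from the
    # hash of the compiled DNA, so any change to the code changes the address.
    return hashlib.sha256(wasm_bytes).hexdigest()

original = dna_network_address(b"validation rules v1")
tampered = dna_network_address(b"validation rules v1, minus the inconvenient ones")

# Honestly recompiling a modified DNA lands you on a different network.
assert original != tampered
```

The attack Paul describes below sidesteps exactly this check: a hacked conductor can *report* the original hash while running different rules, which is why validation is still needed.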


Another question: why does Holochain use libsodium for its cryptographic implementation? Why not other options like Bouncy Castle, Botan, or Libgcrypt?

@pauldaoust a third question :slight_smile:

When an author’s transaction starts to get broadcast to random nodes in the DHT, is each transaction sent directly from the author straight to each of the chosen nodes?

Or, for some nodes, does the transaction need to “hop” through one or two nodes (that were not selected) in between before finally being relayed to the “chosen” node?

If yes to the above scenario: do these “nodes in between” need to validate the transaction as well? Or do they not need to validate, since they are not chosen, but are still required to forward it (or can they also choose to withhold it) to the next node?


Hi @Sol I’ve moved your questions to a thread of their own — the ‘About the…’ posts are system-level posts that are just for describing the category.

There was a tech AMA on Holo TG where Philip Beadle seems to have a different take (on a similar question posed) and says:

“The hash is checked when the conductor starts up the DNA and loads it into memory. The hash is the network address too, so if you try to change the code and recompile, the WASM hash changes and you’re on your own network anyway.”

Ideally, yeah, that’s what would happen. You’d try to mess around with your DNA and your Holochain conductor would fork you off into your own DHT. But the problem is that there’s no way to prove to your peers that you haven’t messed around with Holochain itself to report a DNA hash that doesn’t match the DNA you’re actually using. Heck, you wouldn’t even have to use a DNA at all — you could just create a little program that talks the Holochain protocol. You’d need cryptographic signing at the hardware level to prove that you’re using an official, unmodified Holochain runtime, and technology is not at that stage yet.

But “the proof of the pudding is in the eating”: if Alice produces bad data, that’s the only proof her peers need that she’s modified her ‘rules of the game’ for the DHT. This is why we still need validation.

That’s not to say that the Holochain conductor’s DNA hashing is completely useless: you could be an honest user, downloading what you think is a DNA that allows you to participate in application X, when in fact it causes you to inadvertently do all sorts of evil things. But as long as you trust the Holochain conductor itself, there’s no way a malicious third party can trick you into downloading an evil DNA. The best they can do is cause you to be in the wrong DHT, where any evil actions your DNA takes won’t get you blacklisted in the right DHT.

why does Holochain use libsodium for its cryptographic implementation? Why not other options like Bouncy Castle, Botan, or Libgcrypt?

As you probably know, ‘good’ cryptography happens out in the open. The robustness of an algorithm depends not on hiding its implementation but on subjecting it to as much scrutiny as possible. People recommend not ‘rolling your own’ crypto. This means:

  • Don’t create your own algorithms.
  • Don’t write your own implementations of trusted algorithms.

Your single-maintainer library won’t be as bug-free as one maintained and analyzed by hundreds of academics, white-hat hackers, and security professionals.

NaCl, and its daughter project libsodium, take this one step further:

  • Don’t choose your own algorithms.

They have a very opinionated stance: there are a lot of crypto algorithms out there, and it’s not enough just to pick one and use it. You need to know whether it’s actually a good algorithm for your needs. So NaCl/libsodium track the current industry ‘best practice’ recommendations and implement only one or two algorithms for any given category – key derivation, key exchange, signing, encryption, hashing, etc.

Bouncy Castle et al. are big toolkits that aim to provide reliable, bug-free implementations of lots of different cryptographic algorithms. But they don’t give guidance as strong as NaCl/libsodium’s about which algorithms you should use to get the job done.
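The contrast can be shown with Python’s standard library as a stand-in (this is an analogy, not libsodium itself): a “kitchen sink” module hands you a menu of algorithms, weak ones included, while the libsodium style exposes one vetted primitive per job. Libsodium’s `crypto_generichash`, for example, is simply BLAKE2b under the hood:

```python
import hashlib

# The "kitchen sink" style: a menu of algorithms, including weak ones
# (md5, sha1), with the choice left entirely to the caller.
menu = {name: getattr(hashlib, name) for name in ("md5", "sha1", "sha256", "blake2b")}

# The libsodium style: one vetted primitive per job, no menu.
def generic_hash(data: bytes) -> bytes:
    # Mirrors libsodium's crypto_generichash, which is BLAKE2b.
    return hashlib.blake2b(data, digest_size=32).digest()

digest = generic_hash(b"hello")
assert len(digest) == 32
```

The design point is that removing the choice removes a whole class of mistakes: callers can’t pick md5 if md5 isn’t on offer.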

Anyhow, that’s my limited understanding of why we went with libsodium. We can take an advantage of a whole industry’s wisdom rather than having to become deep crypto experts ourselves. That frees us up to be experts in our own domain.

When a author’s transaction start to get broadcast to random nodes in the DHT, are each transaction send directly from author straight to each of the chosen nodes?

Different DHTs use different replication models. My understanding (which may be incorrect) is illustrated in this document, complete with happy pictures :slight_smile: tl;dr:

  1. An author tries their best to contact the one node that they think is ‘most responsible’ for holding their entry.
  2. That node either accepts it, validates it, and gives back a validation certificate (or warrant), or they point the author to the node they think is ‘more responsible’.
  3. The author themselves contacts the ‘more responsible’ node. Steps 1 and 2 are repeated until one node finally accepts the entry.
  4. After validating the entry, the ‘most responsible’ node spreads the data to its neighbours. A node will always have a (nearly) complete picture of its neighbourhood, so it knows exactly who else should be holding the entry. I don’t know if it just passes it on to the next-nearest neighbours, or all of them at once, or what.

So you’ll note that the Holochain DHT design favours direct contact rather than eventual propagation. This lets us come to consistency quickly — we call it ‘fast push, slow heal’.
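The redirect loop in steps 1–3 can be sketched as greedy routing on a circular address space. All names and the distance metric here are illustrative (the real rrDHT metric and peer discovery differ):

```python
RING = 2 ** 32  # toy circular DHT address space

def distance(a: int, b: int) -> int:
    # Toy circular distance; the real rrDHT metric differs.
    return min((a - b) % RING, (b - a) % RING)

class Node:
    def __init__(self, address: int):
        self.addr = address
        self.peers = []   # other Node objects this node knows about
        self.held = set()

    def closest_peer(self, target: int):
        return min(self.peers, key=lambda p: distance(p.addr, target), default=None)

def publish(entry_addr: int, start: Node) -> Node:
    # Steps 1-3 above: ask the best-known node; if it knows a peer
    # closer to the entry's address, follow the redirect; repeat
    # until nobody closer exists.
    node = start
    while True:
        closer = node.closest_peer(entry_addr)
        if closer is None or distance(closer.addr, entry_addr) >= distance(node.addr, entry_addr):
            node.held.add(entry_addr)   # step 4 begins here: validate & store
            return node
        node = closer
```

With a handful of mutually-aware nodes, `publish` walks straight to the node whose address is nearest the entry’s hash.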

  1. An author tries their best to contact the one node that they think is ‘most responsible’ for holding their entry.
  2. That node either accepts it, validates it, and gives back a validation certificate (or warrant), or they point the author to the node they think is ‘more responsible’.
  3. The author themselves contacts the ‘more responsible’ node. Steps 1 and 2 are repeated until one node finally accepts the entry.
  4. After validating the entry, the ‘most responsible’ node spreads the data to its neighbours. A node will always have a (nearly) complete picture of its neighbourhood, so it knows exactly who else should be holding the entry. I don’t know if it just passes it on to the next-nearest neighbours, or all of them at once, or what.

As I understand it, there is no need to find the “most responsible” node, per se, but rather one (or more) responsible nodes. That is, any node that should store that hash in its rrDHT storage arc would be perfectly fine rather than trying to find the most ideal one. Entry resilience (for most hApps) will require that many nodes validate and hold the data and the publisher needs to obtain sufficient validation signatures - indicating both validation and storage - before the publish operation is successful. Via gossip, all nodes with an rrDHT storage arc that contains the entry will also soon validate and store it.

Well after the initial publish and initial gossiping of the entry, any nodes that later expand their rrDHT storage arc to include that entry will need to validate and store it. They may obtain it from any other nodes with an rrDHT storage arc that contains the entry.
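The “storage arc contains the entry” test jmday describes can be sketched as a contiguous slice of a circular address space. The ring size and function name are illustrative assumptions, not rrDHT’s actual representation:

```python
RING = 2 ** 32   # size of the circular DHT address space (illustrative)

def in_arc(location: int, arc_start: int, arc_length: int) -> bool:
    # A storage arc is a contiguous slice of the ring; a node holding
    # this arc promises to validate and store every hash inside it.
    return (location - arc_start) % RING < arc_length

# A node covering [100, 300) holds an entry at 150...
assert in_arc(150, 100, 200)
# ...but not one at 300, just past the arc's end.
assert not in_arc(300, 100, 200)
# Arcs wrap around the top of the address space.
assert in_arc(5, RING - 10, 100)
```

Expanding an arc (as in the paragraph above) just means increasing `arc_length`, which is what obliges the node to fetch, validate, and store the newly covered entries.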

I hope this helps.


@Sol this @jmday guy is one of the two people working on rrDHT, which is our working name for Holochain’s DHT. So he’s legit :wink:

For the definition of ‘rrDHT storage arc’, refer to the document I linked in my previous answer — you’ll see that every peer has a ‘neighbourhood’ arc, which is an oversimplification of storage arcs. Basically you’re announcing to your neighbours, “I promise to validate any entry whose hash lands in this part of the DHT.”

@jmday for clarification:

  • Does the author communicate directly with at least one ‘responsible’ party (once they’ve found them via direct messaging nodes that they think are responsible), rather than relying on gossip to eventually get the entry to the right spot? I might be confusing this with the original hc-proto design, which I think did work that way.
  • A validator communicating new entries to its peers — does this happen multicast via gossip, or do they contact their closest neighbours on either side, who then contact their closest neighbours, etc?

We are still working on the implementation, so please consider these answers subject to change due to plans meeting reality…

Does the author communicate directly with at least one ‘responsible’ party (once they’ve found them via direct messaging nodes that they think are responsible), rather than relying on gossip to eventually get the entry to the right spot? I might be confusing this with the original hc-proto design, which I think did work that way.

Yes, the author leverages DHT routing to locate at least one peer that is advertising a storage arc that contains the location where the entry belongs. Direct interactions then ensue until the publish is successful (which requires collecting sufficient signatures to ensure entry storage resilience).
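The “collect sufficient signatures before the publish succeeds” step can be sketched like this. The function and peer shapes are hypothetical, not the conductor’s real API:

```python
def publish_with_resilience(entry, responsible_peers, quorum):
    # Ask responsible peers to validate; each returns a signature, or
    # None if offline or refusing. The publish succeeds only once the
    # resilience quorum of signatures has been collected.
    signatures = []
    for peer in responsible_peers:
        sig = peer(entry)
        if sig is not None:
            signatures.append(sig)
        if len(signatures) >= quorum:
            return signatures          # publish successful
    raise RuntimeError("publish failed: quorum of signatures not reached")

honest = lambda entry: "sig(" + entry + ")"
offline = lambda entry: None

# Three honest peers out of four satisfy a quorum of 3.
assert len(publish_with_resilience("tx-1", [offline, honest, honest, honest], 3)) == 3
```

The quorum here is the resilience factor Paul mentions later when discussing finality: an entry is “in” once enough storage-arc holders have signed off on it.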

A validator communicating new entries to its peers — does this happen multicast via gossip, or do they contact their closest neighbours on either side, who then contact their closest neighbours, etc?

Due to communication encryption challenges, multicast will not be used. A form of multi-unicast is one option for both publishing and gossip, however.


Very cool @jmday and @pauldaoust for the replies. Educational content again for me :slight_smile:

  1. So let’s say the redundancy level for a transaction is 25. Does the author need to know/find each of the 25 nodes directly? Or do they send it to a single node whose hash is closest to the transaction hash, and that node then propagates the transaction to its neighbours (since their hashes are likely to be close to the transaction hash too)?

  2. Does a transaction consist of 2 separate “parts” to validate, namely the header and the entry? If yes, let’s consider this scenario:
    If there is a transaction that involves a HoloFuel transfer between 2 counterparties, and say the redundancy level is 25, will there be a total of 2 (counterparties) × 2 (header + entry) × 25 = 100 different nodes doing validation/storage?

2a) If yes to 2: would the 100 nodes belong to roughly 4 neighbourhoods, based on the hashes of the entries and headers originating from the 2 transacting authors?

  3. Since any author could have been issued a warrant before, plus there is a possibility that an author could somehow delete a previous transaction from their chain: would validation done by any public nodes contain a mandatory process to first check whether the author was flagged as malicious, plus check with other peer nodes who hold the author’s previous transactions? They definitely cannot check the author’s chain alone to confirm whether the current transaction is valid.

3a) Based on this “asset transfer” context where more robust validation is needed, could such a “mandatory” validation process be done in a way that is efficient and inexpensive?

Now that I’m not sure of. It sounds from @jmday’s answer like it’s undecided whether the author contacts all of those 25 nodes or just tosses the entry to one of them (the one whose address is closest to the transaction’s address, from the author’s perspective). My hunch is that it’ll make the most sense for the validators in the entry’s hash neighbourhood to take responsibility for saturating the DHT up to the expected 25 copies.

Two or more entries, depending on the design. Your guess about multiple neighbourhoods is very right. In the simplest case (non-transactional stuff like blog posts), you have one entry with one author header/signature on it. That means you have to get two separate neighbourhoods to validate your stuff — one for the entry, one for the header. In addition, whenever a header is published, the validator holding the previous header expects to be told about it so they can keep track of whether the chain has forked. That means you have to talk to three different neighbourhoods to get your entry published!

It gets more complicated for transactions. In the simplest design, you have one entry with two author headers/signatures. That means one neighbourhood for the entry + one neighbourhood for each of the two headers + one neighbourhood for each of the two previous headers = 5 neighbourhoods, each with 25 validators! If you and your counterparty are trying to commit fraud, 125 people is a lot of people to try to deceive.

HoloFuel goes one step further still. Rather than one entry with two signed headers, each transaction consists of four entries, each with its author’s signed header. In addition, the signature is duplicated inside the content of the entry itself. Each step in the flow consists of a signed ‘envelope’ around the details of the step, which may include the previous step’s signed envelope. Here’s how it looks:

  1. Initiator proposes a transaction — either a promise of payment or an invoice. 1 signed envelope + 1 header.
  2. Receiver audits the transaction and posts an acceptance. 1 signed envelope + 1 header.
  3. Initiator confirms that they’ve seen the acceptance. 1 signed envelope + 1 header.
  4. Receiver confirms as well. 1 signed envelope + 1 header.

This makes (4 entry neighbourhoods + 4 header neighbourhoods + 4 neighbourhoods for previous headers) × 20 resilience factor (a made-up number; it might be less or more) = 240 peers who have witnessed at least some part of the transaction! In addition, each step contains the signature of the author and all previous signatures on all previous steps, so it’s a pretty robust audit trail.
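The nested-envelope structure of those four steps can be sketched as follows. The `sign` function here is a stand-in (a plain hash, not a real Ed25519 signature), and the field names are hypothetical, but the wrapping pattern matches the flow above:

```python
import hashlib, json

def sign(author: str, payload: dict) -> dict:
    # Stand-in for a real Ed25519 signature: a hash over author + payload.
    body = json.dumps(payload, sort_keys=True)
    sig = hashlib.sha256((author + body).encode()).hexdigest()
    return {"author": author, "payload": payload, "sig": sig}

# The four steps above: each envelope wraps the previous one, so the
# final entry carries the whole signature trail.
proposal     = sign("initiator", {"step": "proposal", "amount": 10})
acceptance   = sign("receiver",  {"step": "acceptance", "prev": proposal})
confirmation = sign("initiator", {"step": "confirmation", "prev": acceptance})
final        = sign("receiver",  {"step": "final", "prev": confirmation})

def count_signatures(envelope) -> int:
    prev = envelope["payload"].get("prev")
    return 1 + (count_signatures(prev) if prev else 0)

assert count_signatures(final) == 4

# 4 entries + 4 headers + 4 previous-header notifications = 12
# neighbourhoods; at a resilience factor of 20, 240 witnessing peers.
assert (4 + 4 + 4) * 20 == 240
```

Because each envelope embeds its predecessor, tampering with any earlier step changes every later signature, which is what makes the audit trail robust.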

This is something I’m not entirely clear on. I do know that a validator can attach a warrant to their copy of an entry and share that warrant with their neighbours, but it would be almost as expensive to search for those warrants as to manually audit each counterparty’s transaction history. I wouldn’t expect warrant-checking to be mandatory — that should be an app creator’s decision — but I would like to see settings to make it easy for an app creator or user to have their Holochain do that at the subconscious level.

And it wouldn’t have to be expensive, because it wouldn’t have to be manual. As long as I can trust the validators who approved the last entry + header, I can trust the entire transaction history. Why? Because I trust those validators to have done their homework, which means that they made sure they trusted the previous validators, who made sure they trusted the previous validators, etc.
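That inductive argument can be made explicit in code. This is a conceptual sketch (entry shapes and the quorum value are assumptions, not Holochain’s data model): an entry is trusted if it carries a quorum of signatures and its predecessor was trusted, which is exactly why, in practice, checking only the latest entry suffices.

```python
def trusted(entry, quorum=3):
    # Trust an entry if it carries a quorum of validation signatures AND
    # its predecessor was trusted. Each set of validators checked the
    # previous set, so the recursion just makes the induction explicit.
    if len(entry["sigs"]) < quorum:
        return False
    prev = entry.get("prev")
    return prev is None or trusted(prev, quorum)

genesis = {"sigs": ["v1", "v2", "v3"], "prev": None}
tx1     = {"sigs": ["v4", "v5", "v6"], "prev": genesis}
assert trusted(tx1)

under_signed = {"sigs": ["v7"], "prev": tx1}
assert not trusted(under_signed)
```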

An alternative quick path would be to allow a warrant to be attached to an agent entry. It’d be very difficult for a malicious actor to control the neighbourhood they’re in, so this would be pretty reliable and would require only one lookup.


Since any author could have been issued a warrant before, plus there is a possibility that an author could somehow delete a previous transaction from their chain: would validation done by any public nodes contain a mandatory process to first check whether the author was flagged as malicious, plus check with other peer nodes who hold the author’s previous transactions? They definitely cannot check the author’s chain alone to confirm whether the current transaction is valid.

Deleting a previous transaction from a source chain would lead to a source chain fork. Fork detection and the resulting warrant will allow hApp developers to define how those situations should be handled. In the case of a currency, I should think the immune response actions should be rather harsh.


@jmday @pauldaoust

  1. If a previous entry was deleted, what would the process for fork detection be? And would it be easily detected and known to public peers?

  2. When a transaction is published by an author and the public nodes who validated/stored the author’s past transactions need to be updated, will the process require the other public nodes (those selected to do validation) to check not just the author’s source chain but also with the validators who validated the last transaction?

  3. If a previous transaction was deleted by the author and the author then publishes their latest transaction, does that mean the author no longer needs to publish to the validators who validated/stored the author’s last transaction?


I really like your description of how robust the HoloFuel transaction/validation process will be. While it is definitely robust and not easy to deceive every validator in every neighbourhood, does that mean the “finality” of a HoloFuel transaction will take longer and cost more? Or, based on Holochain’s network/agent-centric approach (already tested?), is it still very fast and cheap?

  1. If there is a network partition of a DHT into, say, 2 networks: any potential attack vectors?

  2. If a malicious actor is in 1 of the DHT partitions, it is not possible for them to attack the other DHT partition anyway, right?

  3. When 2 partitioned DHT networks regain connection, how do they sync and manage conflicting transactions, if any?

@Sol

We’ve been talking about doing it one of two ways:

  1. Each validator who is storing the previous header is supposed to get a notification from the chain’s author when a new header is published. The notification is then stored as a link to the new header. If a validator receives a second notification, they know the chain has forked.
  2. The same, but the notifications go to the validators that are holding the agent’s ID.
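Either variant reduces to the same mechanism: the validator holding a header keeps links to announced successors, and a second, different successor is the fork. This sketch uses hypothetical names (it is not the conductor’s implementation):

```python
class PrevHeaderValidator:
    """Validator holding one header; expects one notification per successor."""

    def __init__(self, held_header: str):
        self.held_header = held_header
        self.next_headers = set()   # stored as links to the new header(s)

    def notify_new_header(self, new_header: str) -> bool:
        # Returns True when a fork is detected: two different successors
        # have been announced for the same previous header.
        self.next_headers.add(new_header)
        return len(self.next_headers) > 1

v = PrevHeaderValidator("header-41")
assert v.notify_new_header("header-42") is False
# The author secretly rewrites their chain and publishes a rival successor:
assert v.notify_new_header("header-42-forked") is True
```

Variant 2 in the list above would attach the same bookkeeping to the validators holding the agent’s ID instead of the previous header’s neighbourhood.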

It sounds like you’re digging into the “trust the last transaction” design, but I’m not quite sure. Could you explain more?

Depends on what you’re making a comparison to :slight_smile: ‘Finality’ is always a probabilistic thing in eventually consistent systems, so it depends on the level of trust vs the risk involved in the transaction. Finality would probably mean “a quorum of validation signatures has been collected for the entry”, where ‘quorum’ = the resilience factor. Validation can be parallelised, but you’d still need to consider the time involved in making a network connection to each validator and waiting for their response.

Yes, a few possible attack vectors:

  • Create a partition involving a bunch of malicious agents. ‘Launder’ a bad transaction deep in the transaction history and collect a quorum of validation signatures from those malicious agents. When the partition reconnects, no honest agents bother to revalidate anything because they see all the signatures. Possible solutions:
    • Have agents revalidate entire transaction histories, either randomly or for high-value transactions. This works like a security spot-check.
    • Have Holochain measure the ‘health’ of validation signatures at the subconscious level. If a validator notices that the pattern of signatures is irregular and doesn’t match their view of the DHT, they can choose to revalidate, which could trigger a cascading revalidation that causes all the entries from the partition, honest or invalid, to be pulled into the honest partition.
  • An agent engineers two partitions, each containing one of their targets. They do a transaction with each of their targets and get the transaction validated by each target’s peers. Then they allow the partitions to rejoin, and their transactions with both targets are invalidated. But they now have the goods they paid for, so they can throw away their now-blacklisted identity. This would be very hard to engineer — essentially they would have to mount an ‘eclipse attack’ against both targets, which partitions them both into their own separate networks, away from the main network. Possible solutions:
    • Before accepting a payment, detect whether you’re in an eclipse by pinging ‘beacon nodes’ that are known to be part of a ‘good’ part of the DHT.

Syncing would likely happen as people on either side of the partition start interacting with each other and noticing that their counterparty’s data was previously held in a partition. The nodes will ask validators that were once part of their own partition for the data, those validators will notice that they don’t have it, and will ask their previously partitioned peers for it.

As for conflicts, that depends on the type of conflict and on how the application should handle it. A big subject for another time! :slight_smile:

A post was split to a new topic: rrDHT vs Kademlia

  1. What I mean is that, besides checking the author’s source chain, is it also “mandatory” that validating nodes check with other peer nodes who stored the author’s last transaction? Because the author could always delete a past transaction entry from their chain.
  2. Yes, I understand what you meant. HoloFuel transactions are meant to be robust, and therefore involve more steps, which is always good to know. You also mentioned that validation can be parallelised. Nevertheless, have any tests or measurements been done on HoloFuel transactions/validation so far with a certain resilience factor? And if yes, how are the results so far?

  3. Not sure if you answered my earlier question, but I can rephrase it: if a DHT network partitions into 2 and a malicious node is in 1 of the partitions, does that mean they are unable to launch or coordinate an attack on the other partition? Or is an attack still possible?

What I mean is that, besides checking the author’s source chain, is it also “mandatory” that validating nodes check with other peer nodes who stored the author’s last transaction? Because the author could always delete a past transaction entry from their chain.

Depends on the app. Each app specifies a ‘validation package’ — a bunch of entries that the new entry depends on. All of those entries must also be valid: accessible on the DHT, with a sufficient number of validation signatures, and with no warrants. If any of these criteria aren’t met, we don’t even bother to validate the new entry.

Currently the validation package can consist of:

  • nothing,
  • all the previous headers leading up to this entry’s header, or
  • all the previous headers and public entries leading up to this entry’s header.

In the future it’ll also include dependent entries — in the case of HoloFuel that would mean an acceptance requires a valid proposal, and a confirmation requires a valid acceptance (which already required a valid proposal).
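The three current package levels can be expressed as a small builder function. The level names and record shape here are made up for illustration; the real HDK defines its own enum:

```python
def build_validation_package(level, chain):
    # chain: oldest-first records, each
    # {"header": ..., "entry": ..., "public": bool}
    if level == "none":
        return []
    if level == "chain_headers":
        # all previous headers leading up to this entry's header
        return [{"header": r["header"]} for r in chain]
    if level == "chain_full":
        # all headers, plus entries where they are public
        return [{"header": r["header"],
                 "entry": r["entry"] if r["public"] else None}
                for r in chain]
    raise ValueError("unknown validation package level: " + level)

chain = [
    {"header": "h1", "entry": "profile", "public": True},
    {"header": "h2", "entry": "secret-note", "public": False},
]
assert build_validation_package("none", chain) == []
assert [p["header"] for p in build_validation_package("chain_headers", chain)] == ["h1", "h2"]
# Private entries are withheld even at the fullest level:
assert build_validation_package("chain_full", chain)[1]["entry"] is None
```

The future dependent-entries case mentioned above would add a fourth branch that pulls in specific referenced entries (e.g. the proposal an acceptance depends on) rather than the whole chain.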

Right now we’re working out details on how all of this should be designed, so validation is only done locally and fork detection isn’t implemented. But once the lower-level details are finalised, we’ll get the behaviour you’re talking about for free, as I understand.

I have to apologise; I don’t know all the details. I want to say that the work on HoloFuel has been on proving correctness of the algorithm itself, so that when the validating DHT stuff is finalised, HF will ‘just work’. But maybe things are further along than I know? :man_shrugging:

My idea of an attack vector for that was the second one — about an agent engineering two partitions and existing in both. Nothing to stop them from doing that and avoiding detection in both partitions, because neither of the honest agents in the two partitions can see each other. Some of Bitcoin’s double spend attacks are dependent on this sort of thing. But for an agent trying to join an already healthy network and engineer a partition, it would be a nearly impossible battle. They would almost have had to control the network from the beginning. It’d be like a troublemaker trying to get into your circle of friends and family and cause them to split into warring factions that never talk to each other — possible, but not everybody is going to be perfectly loyal to one side or the other, so information would always leak. And that’s what you want — you want some sort of partition breaker.


Is it possible to find the nodes having a certain “trait” at the low level of the network? Or do you have to search at the “holovault/persona” level? For example, as an app provider I want the list of all nodes who can host my data for not more than X Holo.

@pauldaoust some new questions :slight_smile:

  1. Should a Holochain node/host become popular, how do they prevent DDoS attacks?

  2. Are there any ways to prevent external hacking of a node’s chain? Or is that not possible, but at least detectable?

@pauldaoust looking forward to your advice :slight_smile: