Questions about DHT, cryptography, and security

Sol · August 31, 2019, 6:12am

What I mean is: besides checking the author’s source chain, is it also mandatory for validating nodes to check the peers who have stored the author’s last transaction? After all, the author could always delete a past transaction entry.

Yes, I understand what you meant. HoloFuel transactions are meant to be robust, and that takes more steps, which is always good to know. You also mentioned that validation can be parallelised. Still, has any testing or measurement been done on HoloFuel transactions/validation at a given resilience factor? If so, how do the results look so far?
I’m not sure whether you answered my earlier question, so let me rephrase it: if a DHT network is partitioned into two and a malicious node sits in one partition, does that mean it cannot launch or coordinate an attack on the other partition? Or is an attack still possible?

pauldaoust · September 3, 2019, 8:44pm

What I mean is: besides checking the author’s source chain, is it also mandatory for validating nodes to check the peers who have stored the author’s last transaction? After all, the author could always delete a past transaction entry.

Depends on the app. Each app will specify a ‘validation package’ — a bunch of entries that the new entry depends on. All of those entries must also be valid: accessible on the DHT, sufficient number of validation signatures, no warrants. If any of these criteria are not true, we don’t even bother to validate the new entry.

Currently the validation package can consist of:

nothing,
all the previous headers leading up to this entry’s header, or
all the previous headers and public entries leading up to this entry’s header.

In the future it’ll also include dependent entries — in the case of HoloFuel that would mean an acceptance requires a valid proposal, and a confirmation requires a valid acceptance (which already required a valid proposal).
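As a rough sketch of the rule above (a dependency must be accessible on the DHT, carry enough validation signatures, and have no warrants), the check could look like this. All names here (`ValidationPackage`, `Dependency`, `dependencies_ok`) are hypothetical and chosen for illustration; they are not Holochain’s actual API.

```python
from dataclasses import dataclass
from enum import Enum


class ValidationPackage(Enum):
    """The three package kinds listed above (names are illustrative)."""
    NOTHING = "nothing"
    HEADERS = "previous headers"
    HEADERS_AND_PUBLIC_ENTRIES = "previous headers and public entries"


@dataclass
class Dependency:
    """Hypothetical summary of what a validator knows about one dependency."""
    on_dht: bool          # can the entry be retrieved from the DHT?
    signatures: int       # number of validation signatures collected
    warrants: int = 0     # number of warrants issued against it


def dependencies_ok(deps, required_signatures):
    """True only if every dependency is accessible, sufficiently signed, and unwarranted."""
    return all(
        d.on_dht and d.signatures >= required_signatures and d.warrants == 0
        for d in deps
    )
```

If `dependencies_ok` returns `False`, the new entry would not be validated at all, which matches the "we don’t even bother" behaviour described above.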

Right now we’re working out details on how all of this should be designed, so validation is only done locally and fork detection isn’t implemented. But once the lower-level details are finalised, we’ll get the behaviour you’re talking about for free, as I understand.

I have to apologise; I don’t know all the details. I want to say that the work on HoloFuel has been on proving correctness of the algorithm itself, so that when the validating DHT stuff is finalised, HF will ‘just work’. But maybe things are further along than I know?

My idea of an attack vector for that was the second one — about an agent engineering two partitions and existing in both. Nothing to stop them from doing that and avoiding detection in both partitions, because neither of the honest agents in the two partitions can see each other. Some of Bitcoin’s double spend attacks are dependent on this sort of thing. But for an agent trying to join an already healthy network and engineer a partition, it would be a nearly impossible battle. They would almost have had to control the network from the beginning. It’d be like a troublemaker trying to get into your circle of friends and family and cause them to split into warring factions that never talk to each other — possible, but not everybody is going to be perfectly loyal to one side or the other, so information would always leak. And that’s what you want — you want some sort of partition breaker.

premjeet · September 11, 2019, 7:09pm

Is it possible to find nodes that have a certain “trait” at the low level in the network? Or do you have to search for it in “holovault/persona”? For example, as an app provider I want the list of all nodes who can host my data for no more than X holo.

Sol · September 18, 2019, 6:28pm

@pauldaoust some new questions

If a Holochain node/host becomes popular, how does it prevent DDoS attacks?
Are there any ways to prevent external hacking of a node’s chain? Or is that impossible to prevent but at least detectable?

Sol · September 25, 2019, 3:32am

@pauldaoust looking forward to your advice

pauldaoust · September 30, 2019, 9:06pm

This will be a feature of Holo Host, but not Holochain proper. About all you can tell about a Holochain node, if they’re not also a Holo hosting device, is how big a chunk of the DHT they’re willing to store.

With Holochain, that’ll be their responsibility — if they’re popular, there’s probably a good reason, such as being the node for an important individual. So it’s probably in their interest to set up good protection — like an OS-level firewall. Certain kinds of attacks, like gossip flood attacks, can probably be stopped in Holochain itself.

With Holo Host, the story changes, because hosts can’t control whether they’re hosting for an important person. They might not have the system in place to protect themselves. On the public side, we’ll be using CloudFlare’s DDoS protection. But I don’t know if the hosts themselves can be accessed on a public IP. I haven’t talked to the HoloPort devs in a while, but I understand they were going to be setting up good firewall rules. Don’t quote me on any of this, cuz I’m just guessing.

Because each chain entry is signed by its author, there is only one way to hack their chain: you’ve got to steal their private key. This is a vulnerability for sure — it’s hard to protect against leaving your device on the bus, getting infected with malware, or falling victim to the $5 wrench attack — but it’s a vulnerability shared by all authentication systems, not just cryptographic ones.

So, short of stealing the private key, external hacking is impossible. The best that a malicious agent could do is stage an eclipse attack or refuse to store data that they’re supposed to store, and all that would do is give an incomplete source chain, not a corrupted one.

Sol · October 20, 2019, 2:12pm

Hi @pauldaoust another question on the topic of randomness which affects security/validation.

I have been reading two great articles on the randomness used by Ethereum and NEAR Protocol to choose the validators that produce blocks. The objective is that the random process can be neither biased nor predicted by a malicious agent.

I understand that in Holochain, public validators are chosen to validate an entry based on how “similar” their public key hash is to the entry’s hash.

I’d like to know how robust this random validator selection process is. Could it still be biased or predicted in any way by malicious agents trying to get selected?

2 reference articles below:

pauldaoust · October 21, 2019, 6:53pm

Let’s first take a look at why a malicious agent might want to influence randomness. In blockchain, if you can influence a seed such that you could be selected next as a validator, you could do all sorts of nasty things to that block. But with Holochain, validators tend to not be interested in the data they receive, because it doesn’t involve their economic interests — e.g., in a currency app, their account balance isn’t involved in the same chain of transactions as the transactions they’re validating.

But in a Holochain app you as a data producer may be interested in influencing the randomness that chooses a validator for you. If you want to get bad data to pass as good data, you need to find a validator to collude with you.

Depending on the data and the size of the DHT, this might not be very hard. If you control the entire contents of the data structure, you can totally influence the validator selection. A transaction might have ‘memo’ fields, with which you can ‘mine’ the hash into a bad neighbourhood that’s under your control. There wouldn’t be much value in introducing a verifiable-delay function here, because it’s meant to slow down a group of people, not a single person.
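To make the ‘mining’ idea concrete, here is a minimal sketch, assuming addresses are the first 32 bits of a SHA-256 digest and that a neighbourhood is a radius around a target address on a ring. Both are illustrative assumptions, as are the function names; the real DHT addressing may differ.

```python
import hashlib

SPACE = 2**32  # size of the assumed 32-bit address space


def address32(data: bytes) -> int:
    """Collapse a SHA-256 digest to a 32-bit address (an illustrative choice)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two addresses on the 32-bit ring."""
    d = (a - b) % SPACE
    return min(d, SPACE - d)


def mine_memo(payload: str, target: int, radius: int, max_tries: int = 1_000_000):
    """Try memo values until the entry's address lands within `radius` of `target`.

    Returns the first memo value that works, or None if none is found.
    """
    for nonce in range(max_tries):
        entry = f"{payload}|memo={nonce}".encode()
        if ring_distance(address32(entry), target) <= radius:
            return nonce
    return None
```

If the targeted neighbourhood covers about 1% of the address space, roughly 100 attempts are expected, which is why a single attacker can do this cheaply and why a delay function would not help much.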

But there are some nice statistical things that make it kinda worthless to bother with this:

You can’t control who’s in the neighbourhood. Our next DHT design will collapse all addresses into a 32-bit space. “Wait,” you say, “that makes it even easier to mine an entry into a dishonest neighbourhood!” True, but it also makes it much more difficult to keep honest nodes out of that neighbourhood.
Most apps that require a high level of validation confidence will be transactional apps, where a number of entries are produced for each transaction (I think HoloFuel produces up to five). As mentioned in a previous forum thread (can’t find it), that exponentially increases the difficulty of engineering enough bad neighbourhoods. Both the initiator and the receiver would have to be in collusion with each other for this to even work, and by the end of a five-step transaction flow (like with HoloFuel), you have to be in control of five neighbourhoods, which is 2⁵ (32×) more difficult than controlling just one. And you have to do it for every bad transaction!
If only one party is corrupt, they don’t have any influence over the data their counterparty produces in response. This means that the best they can do is create an invalid first entry, get it validated by corrupt peers, and then have the honest counterparty immediately catch the fraud.
I don’t know the exact plans, but I believe that in the future we’ll have automatic detection and revalidation of entries that were created in a partition, so this makes it more difficult for colluding nodes to create a partition for the purpose of ensuring that all their validating peers are corrupt.

GreatDragonian · November 16, 2019, 3:42pm

Hi @pauldaoust! I’ve seen you around in the currency design course. It’s great!! It feels like I’m learning some deep secrets of the Universe that are mostly hidden in our culture! What do you think?

I want to ask a question, because I got really excited with the Holoports announcement and started to think again about the technical parts of Holochain:

Imagine Alice wants to publish an invalid entry for certain hApp, which requires 10 validations, so she follows this procedure:

1. Simulate the entry creation process to obtain the hashes of both the header and the body.
2. Create two neighborhoods of 10 sybil nodes each, that will respectively validate the body and the header.
3. Submit the body and header respectively to one node in each neighborhood and get the entry validated.

Note that:

In step 2, if there exist, say, 1000 nodes in the DHT, the address space is split into 1000, so it’s easy to construct the first node of each validating neighborhood: probability 1/1000 via brute force to make it look reasonable, in the sense that at most 1 node that should contain the transaction (the nearest honest node) may possibly not contain it. The probability of finding the second one is similar (1/1001, because now the address space is divided into 1001 parts), and likewise the third (1/1002), etc., so in total I expect to need around 1000+1001+…+1009 = 10,045 brute-force searches to find each neighborhood (20,090 in total for the 2 neighborhoods).
Also in step 2, precomputing the hashes and afterwards creating the validating neighborhoods is much easier than creating the neighborhoods first and then mining a nonce that will make the validating entry fall into both neighborhoods (in this case, the complexity is indeed multiplicative, not additive).
In step 3, the sybil nodes may have hacked their address tables, so they only propagate entries to the (just created) peers.
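The arithmetic in the notes above can be checked with a few lines. This is only a sketch of the poster’s own estimate: it assumes each brute-force attempt for the i-th sybil succeeds with probability 1/(N+i), so the expected number of attempts is N+i. The function name is made up for illustration.

```python
def expected_searches(existing_nodes: int, sybils: int) -> int:
    """Sum of expected brute-force attempts for each sybil in one neighborhood.

    Assumes attempt i succeeds with probability 1 / (existing_nodes + i),
    so it needs existing_nodes + i attempts on average.
    """
    return sum(existing_nodes + i for i in range(sybils))


per_neighborhood = expected_searches(1000, 10)  # 1000 + 1001 + ... + 1009
total = 2 * per_neighborhood                    # one neighborhood each for body and header
```

The cost grows additively with the number of sybils, which is the point being made about precomputing the hashes first.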

How could Alice’s attack be stopped and detected? Thanks for your answer!

GreatDragonian · November 18, 2019, 3:51pm

Update:

I believe I had my doubt solved at the Telegram group!

As far as I understand, Holochain uses the concept of storage arcs for each node, in which every node keeps up to date regarding transactions on a certain address space: jmday

That way, if ONLY the sybil nodes validate a transaction, and there exists an honest node whose storage arc contains the transaction’s hash (so it supposedly should have validated it as well), 2 things can happen:

The transaction gets invalidated when the honest node receives it and all involved nodes are blacklisted.
The honest node never knew of the transaction (because the sybil nodes hacked their address tables). In that case, the DHT is in an inconsistent state regarding that transaction (because only some nodes know about it), so it’s easily detectable and proper measures may be taken (for example, a full recursive check of the ledgers involved).
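The second case can be sketched as a simple coverage check: find nodes whose storage arc covers an address but who do not hold the entry. This is a toy model under stated assumptions (a 32-bit ring, an arc defined as a radius around the node’s location, made-up names `Node` and `missing_holders`), not the actual Holochain data structures.

```python
from dataclasses import dataclass, field

SPACE = 2**32  # assumed 32-bit address ring


@dataclass
class Node:
    name: str
    location: int                 # the node's own address on the ring
    arc_radius: int               # how far on either side it claims to store
    held: set = field(default_factory=set)  # addresses of entries it actually holds

    def covers(self, address: int) -> bool:
        """True if `address` falls inside this node's storage arc."""
        d = (address - self.location) % SPACE
        return min(d, SPACE - d) <= self.arc_radius


def missing_holders(nodes, address):
    """Names of nodes whose arc covers `address` but who never received the entry."""
    return [n.name for n in nodes if n.covers(address) and address not in n.held]
```

A non-empty result for an entry that claims to be fully validated is the inconsistency described above, and could trigger a deeper check.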

I believe that’s why uptime is key to the proper functioning of high security apps on Holochain, and in the specific case of HoloFuel, this is achieved by means of the holoports.

Also, I am starting to think that Holochain does not use global consensus, but a more natural, “holographic” consensus that depends on the scale. Sounds awesome!! For example, we know time is relative in the Universe… but here on Earth we have a local consensus about it.

What do you think?

sidsthalekar · November 19, 2019, 2:24am

@pauldaoust - just curious if you guys have explored reputation backed validation circles? I think the reputation economy would help that tremendously. (I can share more)

Sol · November 19, 2019, 5:12pm

@pauldaoust another question!

Every time an entry is created, its header includes a timestamp.

1. How is this timestamp derived? From the clock of the author’s machine?
2. If yes, does it factor in the timezone where the author is based? 12:00 for someone in Japan is different from 12:00 for someone in Singapore.
3. How do the counterparty or validating nodes determine the accuracy of the entry’s timestamp relative to their own time?
4. What if the author tries to “fake” the timestamp of the entry?
5. If the author tries to make 2 conflicting entries at the same time to double spend, how can the counterparty or validating nodes objectively know the correct sequence of the 2 entries?

Basically, my end question is: for an agent-centric network like Holochain with no global ledger and no global notion of time, how are nodes able to objectively and easily tell the proper sequence of events? I think this is very important for security. Looking forward to hearing your views.

pauldaoust · November 19, 2019, 6:17pm

@sidsthalekar that would be pretty cool, and it’s essentially what Secure Scuttlebutt does — propagation happens through friend networks, and only valid data gets propagated. So far though I’ve only heard about validation via random hash-based selection. An old article by Art, though (pre-Holochain), talks about using trusted notaries rather than random peers. You could define ‘trusted’ as ‘I trust this person’s validation result because of their reputation graph’, although for the foreseeable future it’d be an application-level validation, not something at Holochain’s subconscious layer.

@Sol

Yes, it’s derived from the machine’s clock.
I believe it does factor in the TZ, it’s an ISO8601 timestamp which includes TZ information. (Of course, the accuracy depends on what TZ the owner of the machine has chosen.)
Validating nodes typically don’t trust the timestamp as reliable (though I suspect they’d want to see an increasing series of timestamps). If your app’s validation rules require accurate timestamping, there are two options:
1. Set up trusted nodes that do signed timestamping. This is a centralisation point.
2. Use ‘network time’, which is the average of the timestamps of the first R validation signatures (where R = the resilience factor of the app). The expectation is that a random selection of validators will be likely to have system clocks that are within a range of correctness. This isn’t available at validation time though — only afterwards, for use in validating subsequent entries. And right now it’s just an imaginary feature.
Same as the previous answer: validating nodes don’t treat the author’s timestamp as reliable.
It depends on the implementation of the currency. There are a few scenarios:
1. A currency that finalises transactions synchronously through node-to-node messaging. The entry is created and passed to the recipient, but not written to any source chain until both the initiator and the recipient sign it. At that point it’s written to both of their source chains. I’m not an expert on this and would need to think about it more, but it would involve both parties committing to ‘lock’ their chains until the transaction were finalised, and giving the recipient time to check that the initiator hadn’t completed some alternate transaction at the same time. The part I haven’t quite figured out yet is how the recipient would discover the alternate transaction.
2. A currency in which each transaction step is written to the DHT. Holochain’s design is closer to this, and it has two advantages: you can safely initiate multiple transactions, and it’s easy to detect double spends — all the recipient needs to do is make sure the initiator’s entry exists on the DHT before confirming it.
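Two of the timestamp points above can be illustrated in a short sketch: ISO 8601 timestamps with offsets compare correctly across timezones, and ‘network time’ could be computed as the average of the first R signature timestamps. Since network time is described above as an imaginary feature, the `network_time` function is purely hypothetical, and it assumes the timestamps are listed in arrival order.

```python
from datetime import datetime, timezone


def parse(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp that carries a UTC offset."""
    return datetime.fromisoformat(ts)


# Noon in Japan and noon in Singapore are different instants, one hour apart.
tokyo = parse("2019-11-19T12:00:00+09:00")
singapore = parse("2019-11-19T12:00:00+08:00")


def network_time(signature_timestamps, r):
    """Average the first `r` validation signature timestamps (hypothetical feature).

    `signature_timestamps` is assumed to be in arrival order.
    """
    first_r = [parse(t) for t in signature_timestamps[:r]]
    if not first_r:
        raise ValueError("need at least one signature timestamp")
    mean = sum(t.timestamp() for t in first_r) / len(first_r)
    return datetime.fromtimestamp(mean, tz=timezone.utc)
```

Because the offset is part of the timestamp, validators in different timezones would agree on the instant being described, even though none of them can verify that the author’s clock was right.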

Generally global notion of time isn’t necessary or even reliable when you’re trying to construct a correct sequence of events. It’s better (and easier) to simply prove that B happened after A. This is all blockchain does — its global block clock simply establishes happened-after relationships between transactions.

The difference with Holochain is that it recognises that you don’t need to construct a complete global order of events — all you need to do is construct an order of the events that you cared about. Sometimes that means parallel trees of history:

Let’s say you’re Hector, the purple guy at the very end. Do you care when Charlie sent money to Eve in relation to when Alice and Bob sent money to Diane? No; all you care about is that each bit of coin you’re being given has a valid trail of transfers behind it, all the way back to the original money creation events. In other words, you care that Alice had money to give before she gave it to Diane, and likewise when Diane gave to Frank and so forth. Timestamps and total ordering don’t matter, but logical ordering of the stuff you care about does.
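Hector’s check can be sketched as a walk back through the transfers he depends on, ignoring everything else. This is a simplified model under stated assumptions: the ledger layout and the name `has_valid_trail` are invented for illustration, amounts are ignored, and histories are assumed to be acyclic.

```python
def has_valid_trail(tx_id, ledger):
    """True if `tx_id` traces back to money creation events through known transfers.

    `ledger` maps a transaction id to {"creation": bool, "inputs": [ids]}.
    Histories are assumed to be acyclic; no timestamps are consulted.
    """
    tx = ledger.get(tx_id)
    if tx is None:
        return False  # a missing link breaks the trail
    if tx["creation"]:
        return True   # reached an original money creation event
    return bool(tx["inputs"]) and all(has_valid_trail(i, ledger) for i in tx["inputs"])


ledger = {
    "mint_alice": {"creation": True, "inputs": []},
    "mint_charlie": {"creation": True, "inputs": []},
    "alice_to_diane": {"creation": False, "inputs": ["mint_alice"]},
    "charlie_to_eve": {"creation": False, "inputs": ["mint_charlie"]},
    "diane_to_frank": {"creation": False, "inputs": ["alice_to_diane"]},
    "frank_to_hector": {"creation": False, "inputs": ["diane_to_frank"]},
    "bogus_to_hector": {"creation": False, "inputs": ["nowhere"]},
}
```

Checking `frank_to_hector` never touches `charlie_to_eve`, which is the sense in which only the logical ordering of the relevant events matters.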

Here are some more thoughts from the FAQ: How are timestamps and ordered timelines of events achieved on Holochain?

pauldaoust · November 19, 2019, 6:35pm

@GreatDragonian I missed your earlier post on this thread. I like the possibilities you’re exploring, and the consequences of those possibilities. I never thought of creating the entry first and then post-mining the Sybils to validate it. Some thoughts:

Yep, storage arcs say “I’m taking responsibility for this section of the DHT” so the evil Sybils are required to propagate the invalid data to the honest nodes in their neighbourhood. But you’re also right that the evil Sybils are under no obligation to actually share their data with the honest ones. They can hack their peer tables, or just simply be selective about what they gossip to their peers. FWIU here’s how it will look: I, an honest node, look at evil Alice’s source chain and say “hm, I wonder what my peers think of each of Alice’s entries.” So I’ll ask the DHT for the validation certificates of the entries on Alice’s chain. I’m going to go to the peers that I assume to be trustworthy, which Alice and her Sybils have no control over. The ones in the neighbourhoods of the invalid entry and its header will say “sorry, ain’t got that”. I can decide to either reject the entry, or do a deep validation of Alice’s chain and the people she’s transacted with, etc.
Unlike with PoW/PoS blockchains, Holochain doesn’t have a built-in mechanism for reducing the impact of Sybils. For things like currencies, app creators should probably design membership validation rules that connect a public key to a real human ID somehow, and limit the number of agents that a human can generate.

Sol · November 21, 2019, 6:20am

@pauldaoust 2 quick questions:

Is Holochain’s eventual consistency considered weak eventual consistency or strong eventual consistency (with a safety guarantee)? I guess if Holochain has CRDTs, it should be SEC? It would be great if you could elaborate on this.
https://en.wikipedia.org/wiki/Eventual_consistency
When I validate a counterparty’s chain, do I validate only their source chain, or do I additionally check the public DHT, starting from my counterparty’s previous header, for any transaction that conflicts with my current one?

pauldaoust · November 22, 2019, 4:06pm

@Sol oh goody, I love questions about consistency, especially insightful ones that show a good prior knowledge of the subject. It’s like my favourite Holochain subject! As you guessed, Holochain is almost entirely strong eventual consistency. We’ve modelled as many things as possible based on the CALM theorem, which says that as long as a program never retracts a statement, you don’t need any coordination protocol; you just keep adding facts. It’s the formal explanation for how all SEC systems work. Here are a few CALM points:

The source chain only gets new entries added to it; entries are never deleted.
The DHT keeps growing; even deletes and updates leave the original entry in place.
In cases where two agents try to modify one resource at the same time (delete a link, update an entry, etc) a CRDT can be set up to resolve the conflicting state. As you may know, CRDTs are unambiguous rules for resolving conflicts that don’t rely on coordination protocols.
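As a small illustration of these points, here is a classic two-phase set CRDT: additions and removals are both recorded as facts that are never retracted, and merging is a plain union. This is a textbook example chosen for illustration, not a description of how Holochain implements links or deletes.

```python
from typing import FrozenSet, NamedTuple


class TwoPhaseSet(NamedTuple):
    """Grow-only record of additions and removals; nothing is ever erased."""
    added: FrozenSet[str] = frozenset()
    removed: FrozenSet[str] = frozenset()

    def add(self, item: str) -> "TwoPhaseSet":
        return TwoPhaseSet(self.added | {item}, self.removed)

    def remove(self, item: str) -> "TwoPhaseSet":
        # A removal is a new fact (a tombstone); the original addition stays in place.
        return TwoPhaseSet(self.added, self.removed | {item})

    def merge(self, other: "TwoPhaseSet") -> "TwoPhaseSet":
        # Union is commutative, associative, and idempotent, so no coordination is needed.
        return TwoPhaseSet(self.added | other.added, self.removed | other.removed)

    def visible(self) -> FrozenSet[str]:
        return self.added - self.removed
```

Two agents who edit independently and then merge in either order arrive at the same state, which is the property strong eventual consistency asks for.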

One spot that I’m not sure about is source chain forks (AKA rollbacks or conflicting headers). My guess is that there will be a CRDT whose resolution says “both headers are now invalid”, because 95% of the time it’s an indication that someone is acting dishonestly. (5% of the time it could be because their computer crashed halfway through a commit, so we’ve got to cover that case.)

Now on to validating a counterparty chain. I’m not 100% sure, and the HoloFuel developers are too busy for me to ask them, but I would guess that you’d want to check for forks during validation. Eventually most of that will be handled ‘subconsciously’ as part of DHT validation (that is, if you try to pull an invalid proposal entry from the DHT you’ll see that its author has a warrant against them), but there is one interactive step in the middle that I’m not sure about.

Sol · November 22, 2019, 7:21pm

@pauldaoust I really learn so much asking questions and getting insightful replies from you.

I think a dedicated blogpost on strong eventual consistency, achieving data integrity without any notion of global clock, coordination would be a great educational resource for both holochain community and crypto community-at-large!

It is so refreshing to see Holochain’s radically different agent-centric approach, which ditches the need for global consensus/storage, frees up many scalability/decentralization bottlenecks, and yet somehow manages to ensure data integrity without compromising on security.

It would be great if you could eventually help me confirm my question with holofuel developers once they have the time (hopefully soon).

Sol · November 22, 2019, 7:34pm

A side thought: I really wonder how long it will typically take to run queries via links/headers in an rrDHT with a big population. Will the design of rrDHT speed things up efficiently? What will the latency be like? Can’t wait to see actual results in practice!

I also wonder: say HoloFuel has a minimum redundancy of 25. How long will it typically take to reach this threshold of signature attestations? I know it depends on many factors, but I would love to know an average time. I guess we can’t know that yet, right?

pauldaoust · November 22, 2019, 8:43pm

Yes, I agree that an article about SEC, CALM, and intrinsic/distributed data integrity would be very valuable right about now. I’d like to include bits about the immune system too, although maybe that would be a separate article.

And I agree with you that performance will probably be impossible to determine before we have some heavily used apps in the wild to actually measure. But from what I understand, it shouldn’t take too many hops to find data. The DHT address space gets collapsed down to 32-bit numbers, which means there’s only 4 billion possible locations to store data. Here are some thoughts on the efficiency from the designer:

Imagine a worst-case scenario: a DHT network with 4 billion nodes. The network stores so much data that all nodes choose to only index an arc of 1 and keep a query arc of 2. A worst-case query should be O(log n), or roughly 22 hops.

But individual node references do not take up that much memory space, so nodes could, in fact, store a great deal more references than the above algorithm [keeping connections to 10 random nodes outside the query arc], and publish a much wider query arc than 2. These factors greatly reduce the number of hops to query. In most real-world applications, it should be trivial to achieve full query arc coverage, thus reducing the hops for any query to 1.
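For a feel of the numbers in the quote, the logarithm of 4 billion can be computed directly. The quote does not state which base is meant; the sketch below simply shows that the natural logarithm gives about 22, while base 2 gives about 32. The function name is made up for illustration.

```python
import math


def hop_estimates(nodes: int):
    """Return (natural-log, base-2) estimates of O(log n) hops for `nodes` peers."""
    return math.log(nodes), math.log2(nodes)


ln_hops, log2_hops = hop_estimates(4_000_000_000)
```

Either way the count grows very slowly with network size, and the second quoted paragraph argues that wider query arcs would bring it down to a single hop in most real-world cases.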

Sol · November 24, 2019, 4:51pm

Hi @pauldaoust @artbrock @zippy. Holochain’s design is really one of a kind. It makes a lot of sense in theory, but it is largely unproven in practice and at scale, so there is still a big element of uncertainty.

Has the Holochain team ever considered a security audit of not just the code but also an analysis of every part of the design and implementation?

I think a very comprehensive security analysis by neutral third-party experts would be beneficial for the team and educational for the community as well.

Besides Holochain, I also follow the progress of Solana very closely. It is a high-performance blockchain (without sharding) with innovations and performance optimizations at many levels, including a cryptographic global clock, mempool, networking/block propagation, database, VM, GPU parallel processing of smart contracts, proof/storage of the shared ledger, etc. I believe some of the Holochain team members met Solana at the Rust Conference a few months back (they also program in Rust!).

They engaged Kudelski Security to do a comprehensive audit not just of security but also of scalability and decentralization metrics (against their claims). The findings were quite insightful. In spite of Solana having a very solid, world-class technical team, the audit still came up with many potential edge-case attack vectors where the design could be improved and made more resilient, and where more of the assumptions need to be proven in practice.

Some findings that could be relevant to Holochain concern how the system responds to attack vectors involving multiple network partitions.

Anyway, here is the link of the security audit: https://solana.com/wp-content/uploads/2019/11/Solana-Final-Report-Public.pdf

Some other very solid security audit firms in the blockchain space include Trail of Bits and ChainSecurity (whose team I know). I’d love to hear your thoughts on this, and the team’s position on doing an audit anytime soon.