Questions about DHT, cryptography, and security

Let’s first take a look at why a malicious agent might want to influence randomness. In blockchain, if you can influence a seed such that you could be selected next as a validator, you could do all sorts of nasty things to that block. But with Holochain, validators tend to not be interested in the data they receive, because it doesn’t involve their economic interests — e.g., in a currency app, their account balance isn’t involved in the same chain of transactions as the transactions they’re validating.

But in a Holochain app you as a data producer may be interested in influencing the randomness that chooses a validator for you. If you want to get bad data to pass as good data, you need to find a validator to collude with you.

Depending on the data and the size of the DHT, this might not be very hard. If you control the entire contents of the data structure, you can totally influence the validator selection. A transaction might have ‘memo’ fields, with which you can ‘mine’ the hash into a bad neighbourhood that’s under your control. There wouldn’t be much value in introducing a verifiable-delay function here, because it’s meant to slow down a group of people, not a single person.

But there are some nice statistical things that make it kinda worthless to bother with this:

  • You can’t control who’s in the neighbourhood. Our next DHT design will collapse all addresses into a 32-bit space. “Wait,” you say, “that makes it even easier to mine an entry into a dishonest neighbourhood!” True, but it also makes it much more difficult to keep honest nodes out of that neighbourhood.
  • Most apps that require a high level of validation confidence will be transactional apps, where a number of entries are produced for each transaction (I think HoloFuel produces up to five). As mentioned in a previous forum thread (can’t find it), that exponentially increases the difficulty of engineering enough bad neighbourhoods. The initiator and the receiver would both have to be in collusion with each other for this to even work, and by the end of a five-step transaction flow (like with HoloFuel), you have to be in control of five neighbourhoods. That’s exponentially more difficult than controlling just one: if each neighbourhood were as easy to capture as winning a coin flip, you’d capture all five only 1 time in 2⁵ = 32, which is 16× worse odds than capturing one. And you have to do it for every bad transaction!
  • If only one party is corrupt, they don’t have any influence over the data their counterparty produces in response. This means that the best they can do is create an invalid first entry, get it validated by corrupt peers, and then have the honest counterparty immediately catch the fraud.
  • I don’t know the exact plans, but I believe that in the future we’ll have automatic detection and revalidation of entries that were created in a partition, which would make it more difficult for colluding nodes to create a partition just to ensure that all their validating peers are corrupt.
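The “exponentially harder” point can be made concrete with a small Python sketch. It assumes, purely for illustration, that each neighbourhood is captured independently with the same probability; real odds would depend on network size and arc coverage. `capture_probability` and `relative_difficulty` are hypothetical helpers, not Holochain APIs.

```python
def capture_probability(p_single: float, neighbourhoods: int) -> float:
    """Chance of controlling every one of `neighbourhoods` neighbourhoods,
    assuming (for illustration) each is captured independently with p_single."""
    return p_single ** neighbourhoods


def relative_difficulty(p_single: float, neighbourhoods: int) -> float:
    """How many times worse the odds are than capturing a single neighbourhood."""
    return capture_probability(p_single, 1) / capture_probability(p_single, neighbourhoods)


if __name__ == "__main__":
    # Coin-flip odds per neighbourhood, five entries per transaction.
    print(capture_probability(0.5, 5))  # 0.03125, i.e. 1 in 32
    print(relative_difficulty(0.5, 5))  # 16.0
```

Because the exponent is the number of entries per transaction, each extra step in a transaction flow multiplies the attacker's work rather than adding to it.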

Hi @pauldaoust! I’ve seen you around in the currency design course. It’s great! I feel like I’m learning some deep secrets of the Universe that are mostly hidden in our culture! What do you think?

I want to ask a question, because I got really excited with the Holoports announcement and started to think again about the technical parts of Holochain:

Imagine Alice wants to publish an invalid entry for a certain hApp which requires 10 validations, so she follows this procedure:

  1. Simulate the entry creation process to obtain the hashes of both the header and the body.

  2. Create two neighborhoods of 10 sybil nodes each, that will respectively validate the body and the header.

  3. Submit the body and header respectively to one node in each neighborhood and get the entry validated.

Note that:

  • In step 2, if there are, say, 1000 nodes in the DHT, the address space is split into 1000 parts, so it’s easy to construct the first node of each validating neighbourhood: each brute-force attempt has a 1/1000 chance of landing close enough to look reasonable, in the sense that at most one node that should hold the transaction (the nearest honest node) might not have it. The chance of finding the second one is similarly low (1/1001, because the address space is now divided into 1001 parts), then the third (1/1002), and so on. In total I expect to need only around 1000 + 1001 + … + 1009 = 10,045 brute-force searches per neighbourhood, or 20,090 in total for the two neighbourhoods.
  • Also in step 2, precomputing the hashes and afterwards creating the validating neighborhoods is much easier than creating the neighborhoods first and then mining a nonce that will make the validating entry fall into both neighborhoods (in this case, the complexity is indeed multiplicative, not additive).
  • In step 3, the Sybil nodes may have hacked their address tables so that they only propagate entries to the (just-created) peers.
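The brute-force estimate in step 2 can be checked with a few lines of Python. This sketch follows the post's own model: when the address space is divided into n parts, a freshly generated key lands in the target slice with probability 1/n, so it takes about n attempts on average (a geometric distribution), and each new Sybil adds one more part. `expected_trials` is a hypothetical helper.

```python
def expected_trials(existing_nodes: int, sybils_needed: int) -> int:
    """Expected key generations to place `sybils_needed` Sybils near one target
    address, under the post's model: the k-th Sybil succeeds with probability
    1 / (existing_nodes + k) per attempt, so it needs about that many attempts."""
    return sum(existing_nodes + k for k in range(sybils_needed))


if __name__ == "__main__":
    per_neighbourhood = expected_trials(1000, 10)
    print(per_neighbourhood)      # 10045
    print(2 * per_neighbourhood)  # 20090, for the body and header neighbourhoods
```

The cost here is additive because the target hashes are fixed first; building the neighbourhoods first and then mining a nonce that hits both would multiply the odds instead, as the next point notes.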

How could Alice’s attack be stopped and detected? Thanks for your answer!

Update:

I believe I got my doubt resolved in the Telegram group!

As far as I understand, Holochain uses the concept of storage arcs, in which every node keeps up to date with the transactions in a certain part of the address space.

That way, if ONLY the Sybil nodes validate a transaction, and there is an honest node whose storage arc contains the transaction’s hash (so it should have validated it as well), one of two things can happen:

  1. The transaction gets invalidated when the honest node receives it and all involved nodes are blacklisted.
  2. The honest node never hears about the transaction (because the Sybil nodes hacked their address tables). In that case, the DHT is in an inconsistent state regarding that transaction (because only some nodes know about it), so it’s easily detectable and proper measures can be taken (for example, a full recursive check of the ledgers involved).

I believe that’s why uptime is key to the proper functioning of high security apps on Holochain, and in the specific case of HoloFuel, this is achieved by means of the holoports.

Also, I am starting to think that holochain does not use global consensus, but a more natural, “holographic” consensus, that depends on the scale. Sounds awesome!! For example, we know time is relative in the Universe… but here on :earth_americas: we have a local consensus about it.

What do you think?


@pauldaoust - just curious if you guys have explored reputation backed validation circles? I think the reputation economy would help that tremendously. (I can share more)


@pauldaoust another question! :slight_smile:

Every time an entry is created, its header includes a timestamp.

  1. The question is, how is this timestamp derived? From the clock of the author’s machine?

  2. If yes, does it factor in the time zone where the author is based? 12:00 for someone based in Japan is different from 12:00 for someone based in Singapore.

  3. How do the counterparty or validating nodes determine the accuracy of the entry’s timestamp relative to their own time?

  4. What if the author tries to “fake” the timestamp of the entry?

  5. If the author tries to make 2 conflicting entries at the same time to double spend, how are the counterparty or validating nodes able to objectively know the correct order of the 2 entries?

Basically, my end question is: for an agent-centric network like Holochain, with no global ledger and no “global notion of time”, how are nodes able to objectively and easily tell the proper sequence of events? I think this is very important for security. I look forward to hearing your views.

@sidsthalekar that would be pretty cool, and it’s essentially what Secure Scuttlebutt does — propagation happens through friend networks, and only valid data gets propagated. So far though I’ve only heard about validation via random hash-based selection. An old article by Art, though (pre-Holochain), talks about using trusted notaries rather than random peers. You could define ‘trusted’ as ‘I trust this person’s validation result because of their reputation graph’, although for the foreseeable future it’d be an application-level validation, not something at Holochain’s subconscious layer.

@Sol

  1. Yes, it’s derived from the machine’s clock.
  2. I believe it does factor in the TZ; it’s an ISO 8601 timestamp, which includes TZ information. (Of course, the accuracy depends on which TZ the owner of the machine has chosen.)
  3. Validating nodes typically don’t trust the timestamp as reliable (though I suspect they’d want to see an increasing series of timestamps). If your app’s validation rules require accurate timestamping, there are two options:
    1. Set up trusted nodes that do signed timestamping. This is a centralisation point.
    2. Use ‘network time’, which is the average of the timestamps of the first R validation signatures (where R = the resilience factor of the app). The expectation is that a random selection of validators will be likely to have system clocks that are within a range of correctness. This isn’t available at validation time though — only afterwards, for use in validating subsequent entries. And right now it’s just an imaginary feature.
  4. See 3.
  5. It depends on the implementation of the currency. There are a few scenarios:
    1. A currency that finalises transactions synchronously through node-to-node messaging. The entry is created and passed to the recipient, but not written to any source chain until both the initiator and the recipient sign it. At that point it’s written to both of their source chains. I’m not an expert on this and would need to think about it more, but it would involve both parties committing to ‘lock’ their chains until the transaction were finalised, and giving the recipient time to check that the initiator hadn’t completed some alternate transaction at the same time. The part I haven’t quite figured out yet is how the recipient would discover the alternate transaction.
    2. A currency in which each transaction step is written to the DHT. Holochain’s design is closer to this, and it has two advantages: you can safely initiate multiple transactions, and it’s easy to detect double spends — all the recipient needs to do is make sure the initiator’s entry exists on the DHT before confirming it.
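Here is a rough Python sketch of the timestamp handling discussed above. It assumes, as the answer says, that header timestamps are ISO 8601 strings with a UTC offset. `to_utc` normalises them so that noon in Japan (+09:00) and noon in Singapore (+08:00) compare correctly, `is_increasing` is the kind of monotonicity check a validator might apply, and `network_time` models the still-imaginary “network time” as the mean of the first R validation signature timestamps. All three helpers are hypothetical, not Holochain APIs.

```python
from datetime import datetime, timezone
from statistics import mean
from typing import List


def to_utc(iso_timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp that carries a UTC offset and normalise it to UTC."""
    parsed = datetime.fromisoformat(iso_timestamp)
    if parsed.tzinfo is None:
        raise ValueError("timestamp has no time zone offset: " + iso_timestamp)
    return parsed.astimezone(timezone.utc)


def is_increasing(chain_timestamps: List[str]) -> bool:
    """Check that each header on a source chain is strictly later than the previous one."""
    instants = [to_utc(t) for t in chain_timestamps]
    return all(earlier < later for earlier, later in zip(instants, instants[1:]))


def network_time(validator_timestamps: List[str], resilience: int) -> datetime:
    """Hypothetical 'network time': the mean of the first R validation
    signature timestamps, where R is the app's resilience factor."""
    first_r = validator_timestamps[:resilience]
    if resilience < 1 or len(first_r) < resilience:
        raise ValueError("not enough validation signatures yet")
    seconds = mean(to_utc(t).timestamp() for t in first_r)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    japan_noon = to_utc("2020-01-01T12:00:00+09:00")      # 03:00 UTC
    singapore_noon = to_utc("2020-01-01T12:00:00+08:00")  # 04:00 UTC
    print(japan_noon < singapore_noon)  # True
```

Note that none of this stops an author from lying about their clock; it only makes honest timestamps comparable. Averaging validators' clocks, as in the network-time idea, is what would dilute a single author's bad clock.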

Generally, a global notion of time isn’t necessary, or even reliable, when you’re trying to construct a correct sequence of events. It’s better (and easier) to simply prove that B happened after A. This is all blockchain does: its global block clock simply establishes happened-after relationships between transactions.

The difference with Holochain is that it recognises that you don’t need to construct a complete global order of events — all you need to do is construct an order of the events that you cared about. Sometimes that means parallel trees of history:

Let’s say you’re Hector, the purple guy at the very end. Do you care when Charlie sent money to Eve in relation to when Alice and Bob sent money to Diane? No; all you care about is that each bit of coin you’re being given has a valid trail of transfers behind it, all the way back to the original money creation events. In other words, you care that Alice had money to give before she gave it to Diane, and likewise when Diane gave to Frank and so forth. Timestamps and total ordering don’t matter, but logical ordering of the stuff you care about does.
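The “valid trail back to creation” idea can be sketched as a walk backwards through transfers. This is a simplified model, assuming each transfer spends from exactly one earlier transfer and ignoring double spends of the same source; the `Transfer` record and `has_valid_trail` are hypothetical, not HoloFuel's data model. The point is that only happened-after links are checked, never wall-clock times.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Transfer:
    tid: str
    sender: Optional[str]  # None marks an original money-creation event
    recipient: str
    amount: int
    source: Optional[str]  # tid of the transfer that gave `sender` these funds


def has_valid_trail(tid: str, ledger: Dict[str, Transfer]) -> bool:
    """Walk back from a transfer to a creation event, checking at each step that
    the sender had previously received at least the amount they pass on."""
    seen = set()
    current = ledger.get(tid)
    while current is not None:
        if current.tid in seen:
            return False  # a cycle can never reach a creation event
        seen.add(current.tid)
        if current.sender is None:
            return current.source is None  # reached a creation event
        parent = ledger.get(current.source) if current.source else None
        if parent is None or parent.recipient != current.sender or parent.amount < current.amount:
            return False  # the sender never received these funds
        current = parent
    return False  # unknown transfer id
```

For example, with creation → Alice (10), Alice → Diane (10), Diane → Frank (5) and Frank → Hector (5), Hector's incoming transfer validates; a transfer to Hector from someone who never received the funds it cites does not, regardless of when either happened.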

Here are some more thoughts from the FAQ: How are timestamps and ordered timelines of events achieved on Holochain?


@GreatDragonian I missed your earlier post on this thread. I like the possibilities you’re exploring, and the consequences of those possibilities. I never thought of creating the entry first and then post-mining the Sybils to validate it. Some thoughts:

  • Yep, storage arcs say “I’m taking responsibility for this section of the DHT” so the evil Sybils are required to propagate the invalid data to the honest nodes in their neighbourhood. But you’re also right that the evil Sybils are under no obligation to actually share their data with the honest ones. They can hack their peer tables, or just simply be selective about what they gossip to their peers. FWIU here’s how it will look: I, an honest node, look at evil Alice’s source chain and say “hm, I wonder what my peers think of each of Alice’s entries.” So I’ll ask the DHT for the validation certificates of the entries on Alice’s chain. I’m going to go to the peers that I assume to be trustworthy, which Alice and her Sybils have no control over. The ones in the neighbourhoods of the invalid entry and its header will say “sorry, ain’t got that”. I can decide to either reject the entry, or do a deep validation of Alice’s chain and the people she’s transacted with, etc.
  • Unlike with PoW/PoS blockchains, Holochain doesn’t have a built-in mechanism for reducing the impact of Sybils. For things like currencies, app creators should probably design membership validation rules that connect a public key to a real human ID somehow, and limit the number of agents that a human can generate.
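The honest node's audit described in the first point can be sketched as follows. It is only an illustration: a dictionary of trusted peers and the entry hashes they hold validation certificates for stands in for real DHT queries, and `audit_chain` and `min_holders` are hypothetical names.

```python
from typing import Dict, List, Set


def audit_chain(entry_hashes: List[str],
                trusted_peers: Dict[str, Set[str]],
                min_holders: int = 1) -> List[str]:
    """Return the entries on an agent's chain that too few trusted peers can
    vouch for. `trusted_peers` maps a peer id to the set of entry hashes it
    holds validation certificates for (a stand-in for DHT lookups)."""
    suspicious = []
    for entry in entry_hashes:
        holders = sum(1 for held in trusted_peers.values() if entry in held)
        if holders < min_holders:
            suspicious.append(entry)  # honest peers answer "sorry, ain't got that"
    return suspicious
```

Any entry flagged this way is a candidate for rejection or for the deeper validation of Alice's chain and her counterparties mentioned above.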

@pauldaoust 2 quick questions:

  1. Is Holochain’s eventual consistency considered weak eventual consistency or strong eventual consistency (with a safety guarantee)? I guess if Holochain has CRDTs, it should be SEC? It would be great if you could elaborate more on this.
    https://en.wikipedia.org/wiki/Eventual_consistency

  2. When I validate a counterparty’s chain, do I validate only their source chain, or do I also check the public DHT, based on my counterparty’s previous header, for any transaction that conflicts with my current transaction?

@Sol oh goody, I love questions about consistency, especially insightful ones that show a good prior knowledge of the subject :wink: It’s like my favourite Holochain subject! As you guessed, Holochain is almost entirely strong eventual consistency. We’ve modelled as many things as possible based on the CALM theorem, which says that as long as a program never retracts a statement, you don’t need any coordination protocol – you just keep adding facts. It’s the formal explanation for how all SEC systems work. Here are a few CALM points:

  • The source chain only gets new entries added to it; entries are never deleted.
  • The DHT keeps growing; even deletes and updates leave the original entry in place.
  • In cases where two agents try to modify one resource at the same time (delete a link, update an entry, etc) a CRDT can be set up to resolve the conflicting state. As you may know, CRDTs are unambiguous rules for resolving conflicts that don’t rely on coordination protocols.
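The CALM points above can be illustrated with a grow-only set (G-Set), one of the simplest CRDTs. This is a generic sketch, not Holochain's implementation: deletes are modelled as new “delete:” facts, so the original entry stays in place, and merging is plain set union, which is commutative, associative and idempotent, so replicas converge without any coordination. `GSet` and `visible` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class GSet:
    """Grow-only set: facts are only ever added, never retracted."""
    items: FrozenSet[str] = frozenset()

    def add(self, item: str) -> "GSet":
        return GSet(self.items | {item})

    def merge(self, other: "GSet") -> "GSet":
        # Union needs no coordination: order and repetition don't matter.
        return GSet(self.items | other.items)


def visible(state: GSet) -> Set[str]:
    """Entries that no 'delete:<entry>' fact has tombstoned."""
    prefix = "delete:"
    tombstoned = {i[len(prefix):] for i in state.items if i.startswith(prefix)}
    return {i for i in state.items if not i.startswith(prefix) and i not in tombstoned}
```

Two replicas that saw `e1`, and `e2` plus `delete:e1`, in different orders end up with identical state, and `e1` is still present in the history even though it is no longer visible.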

One spot that I’m not sure about is source chain forks (AKA rollbacks or conflicting headers). My guess is that there will be a CRDT whose resolution says “both headers are now invalid”, because 95% of the time it’s an indication that someone is acting dishonestly. (5% of the time it could be because their computer crashed halfway through a commit, so we’ve got to cover that case.)
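Detecting a source chain fork is simple once both headers are visible: two headers that name the same previous header. The sketch below applies the guessed resolution from the paragraph above (“both headers are now invalid”); that rule is speculation in the original post, and `Header`, `find_forks` and `invalid_headers` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Header:
    digest: str
    prev: Optional[str]  # None for the first header on the chain


def find_forks(headers: List[Header]) -> Dict[Optional[str], Set[str]]:
    """Map each previous-header hash to its children, keeping only those with
    more than one child (i.e. conflicting headers)."""
    children: Dict[Optional[str], Set[str]] = defaultdict(set)
    for header in headers:
        children[header.prev].add(header.digest)
    return {prev: kids for prev, kids in children.items() if len(kids) > 1}


def invalid_headers(headers: List[Header]) -> Set[str]:
    """Guessed resolution: every header involved in a fork is invalid."""
    invalid: Set[str] = set()
    for kids in find_forks(headers).values():
        invalid |= kids
    return invalid
```

Because this rule only ever adds information (a fork, once seen, stays seen), every node that gathers the same headers reaches the same verdict, which is what makes it CRDT-like.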

Now on to validating a counterparty chain. I’m not 100% sure, and the HoloFuel developers are too busy for me to ask them, but I would guess that you’d want to check for forks during validation. Eventually most of that will be handled ‘subconsciously’ as part of DHT validation (that is, if you try to pull an invalid proposal entry from the DHT you’ll see that its author has a warrant against them), but there is one interactive step in the middle that I’m not sure about.


@pauldaoust I really learn so much asking questions and getting insightful replies from you. :slight_smile:

I think a dedicated blog post on strong eventual consistency, and on achieving data integrity without a global clock or coordination, would be a great educational resource for both the Holochain community and the crypto community at large!

It is so refreshing to see Holochain’s radically different, agent-centric approach, ditching the need for global consensus/storage, which frees up many scalability/decentralization bottlenecks, while still managing to ensure data integrity without compromising on security.

It would be great if you could eventually help me confirm my question with holofuel developers once they have the time (hopefully soon). :wink:


A side thought: I really wonder how long it will typically take to do queries via links/headers in an rrDHT with a big population. Will the design of rrDHT speed things up efficiently? I wonder what the latency is like. Can’t wait to see actual results in practice!

I also wonder: say in HoloFuel there is a minimum redundancy of 25. I’d like to know how long it will typically take to reach this threshold of signature attestations. I know it depends on many factors, but I would love to know an average time. Can’t know it now, right? :stuck_out_tongue_winking_eye:

Yes, I agree that an article about SEC, CALM, and intrinsic/distributed data integrity would be very valuable right about now. I’d like to include bits about the immune system too, although maybe that would be a separate article.

And I agree with you that performance will probably be impossible to determine before we have some heavily used apps in the wild to actually measure. But from what I understand, it shouldn’t take too many hops to find data. The DHT address space gets collapsed down to 32-bit numbers, which means there are only about 4.3 billion (2³²) possible locations to store data. Here are some thoughts on the efficiency from the designer:

Imagine a worst-case scenario: a DHT network with 4 billion nodes. The network stores so much data that all nodes choose to only index an arc of 1 and keep a query arc of 2. A worst-case query should be O(log n), or roughly 22 hops.

But individual node references do not take up that much memory space, so nodes could, in fact, store a great deal more references than the above algorithm [keeping connections to 10 random nodes outside the query arc], and publish a much wider query arc than 2. These factors greatly reduce the number of hops to query. In most real-world applications, it should be trivial to achieve full query arc coverage, thus reducing the hops for any query to 1.
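A quick check of the numbers in the quote: the “roughly 22 hops” figure lines up with the natural logarithm of the 32-bit address-space size, while a base-2 logarithm would give 32. The constant hidden in O(log n) depends on routing details the quote doesn't spell out, so treating the base as a parameter here is an assumption, and `log_hops` is a hypothetical helper.

```python
import math

ADDRESS_SPACE = 2 ** 32  # distinct 32-bit DHT locations, about 4.3 billion


def log_hops(n_nodes: int, base: float = math.e) -> float:
    """O(log n) hop estimate; the base is an illustrative choice, not a protocol constant."""
    return math.log(n_nodes) / math.log(base)


if __name__ == "__main__":
    print(ADDRESS_SPACE)                       # 4294967296
    print(round(log_hops(ADDRESS_SPACE)))      # 22 (natural log)
    print(round(log_hops(ADDRESS_SPACE, 2)))   # 32 (base 2)
```

Either way, the worst case grows logarithmically, and wider query arcs shrink it further, down to the single hop the quote describes for full coverage.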


Hi @pauldaoust @artbrock @zippy, Holochain’s design is really one of a kind. It makes a lot of sense in theory, but it’s largely unproven in practice and at scale, so there is still a big element of uncertainty.

Has the Holochain team ever considered doing a security audit of not just the code, but also an analysis of every part of the design and its implementation?

I think having a very comprehensive security analysis by 3rd party neutral experts would be beneficial both for the team and educational for the community as well.

Besides Holochain, I have also followed the progress of Solana very closely. It’s a high-performance blockchain (without sharding) that focuses on innovations and performance optimizations at many levels, including a cryptographic global clock, mempool, networking/block propagation, database, VM, GPU parallel processing of smart contracts, proof/storage of the shared ledger, etc. I believe some of the Holochain team members met the Solana team at the Rust Conference a few months back (they also program in Rust!).

They engaged Kudelski Security to do a comprehensive audit, not just of security but also of scalability and decentralization metrics (against their claims). The findings were quite insightful. In spite of their very solid, world-class technical team, the audit still came up with many potential edge-case attack vectors, where their design could be improved to make the security more resilient and to prove more of their assumptions in practice.

Some findings that could be relevant to Holochain concern how they respond to attack vectors involving multiple network partitions.

Anyway, here is the link of the security audit: https://solana.com/wp-content/uploads/2019/11/Solana-Final-Report-Public.pdf

Some other very solid security audit firms in the blockchain space include Trail of Bits and ChainSecurity (whose team I know). I’d love to hear your thoughts on this, and the team’s position on doing an audit anytime soon.


@pauldaoust agree. I think security concerns in agent-centric architectures get addressed a great deal when reputation comes into play. At our end, we might create some standard reputation-based-validation-functions to drive this conversation with app-creators.


Thanks for sharing that, @Sol. I do know that there’s an intention to have security audits, though I don’t know to what extent or on what timeline. I think I have heard talk of an audit of the assumptions that go into our security model, though. Don’t quote me on that!


Hmm. This approach may be problematic if there is a serious desire to support offline edits. If offline edits are wanted, then, I presume, there should be a way to solve this on the application side. [Or, alternatively, I don’t yet understand what is really going on here, which is quite likely. In that case, my apologies for the noise.]

This may not be a typical holochain app, but let’s consider collaborative text editing. There is a strong desire to allow people to edit the same document at the same time, even when they are offline. In the typical case, most of the edits can be easily folded when coming back online (think about git merge), but they will typically need some help from the application level. Then there will be some edits that cannot be folded together. For the text editing app, the best option then is usually to show both edits, as “alternative realities,” and let the users decide.

Has there been any thought how to implement such functionality on holochain? (Please forgive my ignorance, I’m a newbie to holochain.)

The source chain is in principle not editable. What you do, if you want to edit an entry, is mark it as edited (or deleted) and link to the new entry.
That means you always have the full history stored.

What you can do in the case of the document, to get rid of the history, is having the parent App/UI spin up a new instance of the document and create a new DHT and only import the latest edits.

A rollback or editing of the source chains is not allowed.

@pauldaoust I don’t think the 5% case in your example should be possible. At least from my understanding, you first write to the source chain and then publish the header. That means if the process gets interrupted halfway, the header is simply not published.
Or are you talking about a system crashing and restoring to an earlier backup without the latest entry into the source chain? That case should definitely be covered.

@raphisee Your example (restoring to an earlier backup that doesn’t contain the latest commit) is a perfect illustration of what I’m worried about. I can’t think of any more examples, just a general unease about things that are supposed to be atomic but can get screwed up by babies unplugging power cables or pouring juice into keyboards :slight_smile:


@PekkaNikander a bit of disambiguation:

The case in question is about the source chain, which is a data structure with precise validation constraints: for each application, each agent keeps their own journal of writes in a hash chain (effectively a very simple Merkle tree). The only two constraints are that it be unbranched, and that each entry is signed with the private key whose public component is found in entry #2 of the chain. From an application perspective, it’s not a very interesting structure; it’s about as sexy as the write-ahead log (WAL) in a relational database.
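The “unbranched hash chain” constraint can be sketched in a few lines. This illustration uses SHA-256 from Python's standard library as an assumed hash function and checks only the hash linkage; the signature check against the key in entry #2 is left out, since it would need an external signing library. `Link`, `build_chain` and `is_valid_chain` are hypothetical names, not Holochain's API.

```python
import hashlib
from typing import List, NamedTuple, Optional


class Link(NamedTuple):
    prev: Optional[str]  # digest of the previous link, None for the first
    content: bytes
    digest: str


def link_hash(prev: Optional[str], content: bytes) -> str:
    """Hash a link together with its predecessor's digest (SHA-256 assumed)."""
    h = hashlib.sha256()
    h.update((prev or "").encode("ascii"))
    h.update(content)
    return h.hexdigest()


def build_chain(contents: List[bytes]) -> List[Link]:
    chain: List[Link] = []
    prev: Optional[str] = None
    for content in contents:
        digest = link_hash(prev, content)
        chain.append(Link(prev, content, digest))
        prev = digest
    return chain


def is_valid_chain(chain: List[Link]) -> bool:
    """Each link must point at the one before it (no branches or reordering)
    and its digest must match its content (no tampering)."""
    expected_prev: Optional[str] = None
    for link in chain:
        if link.prev != expected_prev:
            return False
        if link.digest != link_hash(link.prev, link.content):
            return False
        expected_prev = link.digest
    return True
```

Because every digest covers its predecessor, rewriting any past entry changes every later digest, which is why an append-only journal like this can only be extended, never edited.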

Why does it need to be unbranched? My guess is that it has something to do with the fact that some applications (currencies, etc) pretty much demand a linear history, and it’d be too hard to reason about whether an agent is in a consistent state if you had to worry about branches. Why isn’t this an application-level constraint then? Not sure, but probably because it doesn’t really hurt anyone to enforce that constraint on every application.

Holochain relies on the CALM theorem for coming to consistency — Merkle trees and DHTs can both be modelled CALMly. However, AFAICT proving the non-existence of something (e.g., an alternate branch on a source chain) is not something that can be done CALMly. Holochain does the next best thing – it makes it hard to keep secrets within a group of agents, thereby increasing the chance that someone will discover the conflicting branches. When this happens, I guess you could say it’s like a CRDT that ‘tombstones’ both branches (marks them both as deleted) and ‘seals’ the source chain as it existed before that point.

So if the individual agents’ source chains are equivalent to a WAL, where does the materialised state (equivalent to SQL’s rows/tables) live? Uhh, not exactly anywhere, because the implementation would be dependent on the application’s needs.

So to make a short story long @PekkaNikander the application author would be responsible for creating a CRDT algorithm (or importing a lib) that would get the semantics they’re looking for. Holochain has built-in support for updating and deleting an entry, and I think their CRDT-ish semantics are “delete wins, otherwise first write wins”. I suspect that, as an un-branch-able history, they’re too basic for most collaborative work and are subject to the problem you bring up. Instead, they seem most useful in cases when an agent wants to update/delete their own entries and therefore doesn’t have to worry about conflicts/coordination. I’m excited about seeing libraries that implement useful CRDTs for collaborative editing applications. So there has been lots of thought on how to implement this functionality on Holochain, but the only code written has been in support of this most basic use case.
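The “delete wins, otherwise first write wins” semantics guessed at above can be sketched as a deterministic resolver over all the update/delete operations seen for one entry. This is an illustration, not Holochain's implementation: ordering updates by an author-supplied timestamp, with the new value as a tie-breaker, is an assumption (and, as discussed earlier in the thread, such timestamps aren't trustworthy). `Op` and `resolve_entry` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Op:
    kind: str                        # "update" or "delete"
    timestamp: int                   # illustrative ordering key, assumed
    new_value: Optional[str] = None  # only meaningful for updates


def resolve_entry(original: str, ops: List[Op]) -> Optional[str]:
    """Any delete wins; otherwise the earliest update wins; with no ops,
    the original stays. The result doesn't depend on the order ops arrived in."""
    if any(op.kind == "delete" for op in ops):
        return None
    updates = sorted(
        (op for op in ops if op.kind == "update"),
        key=lambda op: (op.timestamp, op.new_value or ""),
    )
    return updates[0].new_value if updates else original
```

Since the outcome is the same whatever order replicas receive the operations in, this converges like a CRDT, but it also shows the limitation raised in the thread: a concurrent edit is silently discarded rather than merged.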

Just to clarify: having the full history is also the case with (state-based) CRDTs. At the semantic level, you never edit the history. When you “edit,” you insert into the history a new change that “deletes” or “modifies,” whatever that means at the application level.

Hence, adding (at least state-based) CRDT algorithms even on the top of the source chain should not be that outlandish, I presume.

What I was (probably foolishly) envisioning and talking about is a case where there is a CRDT-based app on top of the source chain, and then someone goes offline for a longish time (hours, days) and wants to make app-level edits. With CRDTs, the edits end up as new entries in the history. Consequently, the data structure will be (for a while) more like a bush, a tangle, a lattice, or a DAG if you wish. It won’t be a linear history. There will be events that are semantically parallel.

The tricky part is when that someone comes back online. Then those app-level edits — i.e. the parallel history — need to be folded back to the rest of the history. That is where the CRDT “magic” helps. At the CRDT level you can always merge such “parallel” histories. By definition, the CRDT just guarantees you that. Otherwise it is not a CRDT.

The difficulty that I spoke about is at the app level. There, thinking of git merge conflicts helps. When there are app-level conflicts on top of a CRDT, you need to do the equivalent of such a merge. Sometimes you have to rely on humans for that.

But, in most cases you can define app-level semantics to take care of such merging. For example, you can compare a git merge, where a person has to do the resolution, to an editor that displays the conflicting edits as alternative versions of the text.
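One standard way to get the “show both edits as alternative versions” behaviour is a multi-value register, a well-known CRDT swapped in here for illustration; the thread doesn't say Holochain ships one. Each version of the text carries a version vector; merging keeps every version not superseded by another. One survivor means the edits folded cleanly; several survivors are the “alternative realities” to show the user. `happened_before` and `concurrent_versions` are hypothetical names.

```python
from typing import Dict, List, Tuple

VersionVector = Dict[str, int]       # agent id -> number of edits seen from them
Version = Tuple[str, VersionVector]  # (document text, version vector)


def happened_before(a: VersionVector, b: VersionVector) -> bool:
    """True if `a` is strictly older than `b`: b has seen everything a has, and more."""
    agents = set(a) | set(b)
    return (all(a.get(k, 0) <= b.get(k, 0) for k in agents)
            and any(a.get(k, 0) < b.get(k, 0) for k in agents))


def concurrent_versions(versions: List[Version]) -> List[str]:
    """Multi-value register merge: keep every version no other version supersedes."""
    survivors: List[str] = []
    for text, vv in versions:
        if not any(happened_before(vv, other) for _, other in versions):
            if text not in survivors:
                survivors.append(text)
    return survivors


if __name__ == "__main__":
    base = ("hello", {"alice": 1})
    alice_offline = ("hello world", {"alice": 2})
    bob_offline = ("hello there", {"alice": 1, "bob": 1})
    print(concurrent_versions([base, alice_offline]))
    # ['hello world']  (a sequential edit folds cleanly)
    print(concurrent_versions([base, alice_offline, bob_offline]))
    # ['hello world', 'hello there']  (parallel edits become alternatives)
```

The CRDT guarantees every replica computes the same set of survivors; deciding what to do when there is more than one is left to the app-level semantics, or to a human.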