Is a replication-factor of 100 sane?

The-A-Man · February 25, 2021, 8:06am

Public entries are replicated to [x] number of peers. The peers (or the peer-addresses) are chosen as per a scheme that favors “closer” peers over “farther” ones (i.e., numerically closer to the hash of the entry in question; nothing to do with geographical proximity). However, pre-calculating which peers your entry will fall under the responsibility of is child’s play! In fact, it’s something that the algorithms that power these distributed systems do all the time.
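
A minimal sketch of that pre-calculation, for illustration only. It assumes a 32-bit ring address space and SHA-256 for hashing; both are assumptions of this sketch, not Holochain's actual scheme, and `location`, `ring_distance` and `responsible_peers` are hypothetical names.

```python
import hashlib

RING = 2**32  # assumed size of the address space (illustration only)


def location(data: bytes) -> int:
    """Map arbitrary bytes to a point on the ring (assumed hashing scheme)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def ring_distance(a: int, b: int) -> int:
    """Shortest numeric distance between two points, going either way round."""
    d = abs(a - b) % RING
    return min(d, RING - d)


def responsible_peers(entry: bytes, peer_ids: list[bytes], x: int) -> list[bytes]:
    """Return the x peers whose locations are numerically closest to the entry."""
    target = location(entry)
    return sorted(peer_ids, key=lambda p: ring_distance(location(p), target))[:x]


# Anyone who knows the peer list can run this before publishing anything.
peers = [f"peer-{i}".encode() for i in range(1000)]
print(responsible_peers(b"claim: I own the bridge", peers, x=3))
```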

That said, for a sensitive entry (such as a claim on something valuable; assume that the claim is false, and we (the evil actor) wish to get it validated successfully), we can simply reformat our claim (or restructure it a bit, or append some gibberish numbers to it) so as to end up with an entry that hashes to a neighborhood that we control (assume we are a “guy-with-the-means” to pull something like this off). Or we may leave the entry as it is and instead pre-calculate the addresses of the peers that the entry would fall under the responsibility of, only to bribe them (to install a binary on their device that would pretend to speak the Holochain protocol and pass the entry as “all good; nothing suspicious”). [Assuming that we the attacker can first locate the owners of those peer-addresses, something which on some future dark-meetup-forum on the dark-net might be the norm…]

That said, the only defense against such a vector is to make it harder for the attacker to pull it off.

One has two options:

Never make valuable claims on Holochain (and instead prefer Blockchain for use-cases that need maximum consensus, which I’m very reluctant to do)
Increase the replication-factor

From a distributed-databases background (Cassandra and the likes), the only replication-factor I feel comfortable configuring is 3. [Haha, I’m such a miser!]

But those attack-vectors just described are simply not possible in the gated/sealed/protected silos where Facebook (and the likes) keep their databases, thanks to centralized-control and encryption!

However, knowing that we do not have that luxury (namely, of centralized-control), we are forced to increase the replication-factor to as high as possible.

Imagine the Claim entries in total make up 1 TeraByte of data.
A replication-factor of 100 would mean 100 times more load on the users of the (h)app on average (roughly speaking).
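
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes copies are spread evenly across peers, and the network size of 10,000 peers is a made-up figure for illustration.

```python
def per_peer_load_tb(total_tb: float, replication: int, peers: int) -> float:
    """Average storage per peer if all copies are spread evenly."""
    return total_tb * replication / peers


TOTAL_TB = 1.0   # the 1 TeraByte of Claim entries from the post
PEERS = 10_000   # hypothetical network size

for r in (1, 3, 100):
    print(f"replication {r}: {per_peer_load_tb(TOTAL_TB, r, PEERS)} TB per peer")
```

The "100 times" figure is relative to a single copy; relative to a replication-factor of 3 it is roughly 33 times.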

Is that inescapable? Is that the price we have to pay to divorce from client/server (central control)?

The-A-Man · February 25, 2021, 8:21am

Hey! Can the replication-factor be set to be dynamic? Can it be done technically? @thedavidmeister?

Can I instruct (to the DNA or whatever) something like “The entry of type “Claim” must have, at any given time, a replication (thereby validation) factor of one-tenth of the total peers in the (h)app’s network”? Is there something that technically stops you from implementing such a behaviour? Would it lead to some internal chaos that I’m unaware of?

Because if it’s possible, then I guess we (Holochain) would be able to reasonably handle every use-case that currently would be better suited on Blockchain. For instance, a replication factor of “100%” would mean every node would own the entry and would validate it, and the apparatus as a whole would behave much like Blockchain… Though a replication-factor of 100% would definitely slow us down to the same levels as that of Blockchain… Still, I believe the end-developer should get to make that choice: the choice between speed and truth…

thedavidmeister · February 26, 2021, 7:35am

on global replication factor vs. agent centric arcs:

the maximum redundancy is that every agent holds all the data for a given app

already this is lower than blockchain which requires every agent to hold all the data for all apps

no agent has a perfect visibility into the network participants so it isn’t possible to efficiently enforce that a specific number of copies are held

simple example

say we could know that we have 100 copies of something on the network
now Alice turns off her machine to clean the fan and apply some updates
now 99 copies are visible on the network but Alice still has a very high latency copy of it (let’s give her 3 hours…)
ok, so we set a timeout and don’t give her the 3 hours, then we make a new copy so there is 100 ‘on the network’
when Alice boots again we see 101 copies, so we tell someone to delete their copy
this ‘someone’ is likely to be the same person we asked to add a copy as they are right on the cusp of the neighbourhood of 99/100, let’s call him Cusbert
now every time any of the 100 users times out, Cusbert needs to thrash copying and deleting the same data
now scale this up to every item of data on the network and many peers leaving and joining, at some point everyone is Cusbert for many items of data
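
A toy model of the Cusbert problem, as a sketch only. It tracks one item of data and one cusp agent, and assumes that in each round a regular holder is timed out with probability `p_offline` (a hypothetical parameter). It is not how any real network schedules copies.

```python
import random


def cusp_churn(rounds: int, p_offline: float, seed: int = 0) -> int:
    """Count the copy/delete operations forced on the agent at the cusp."""
    rng = random.Random(seed)
    cusp_has_copy = False
    operations = 0
    for _ in range(rounds):
        someone_offline = rng.random() < p_offline
        if someone_offline and not cusp_has_copy:
            cusp_has_copy = True    # replacement copy to get back to the target
            operations += 1
        elif not someone_offline and cusp_has_copy:
            cusp_has_copy = False   # holder came back, so the extra copy goes
            operations += 1
    return operations


# Churn for a single item; multiply by every item Cusbert sits on the cusp of.
for p in (0.01, 0.1, 0.5):
    print(p, cusp_churn(rounds=10_000, p_offline=p))
```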

the marginal benefit of 99 copies vs. 100 copies or 100 copies vs. 101 copies is arbitrary and negligible, but more copies = more chances for individuals to drop out and more co-ordination overhead

rather, we let each agent determine for themselves what system resources can be dedicated to each app (which, centralized or not, is always non-zero), then work backwards from this to achieve an ‘arc’ which translates roughly to a percentage of all data held locally, which is correlated to both storage and bandwidth - there is no exact algorithm because every app will have different bandwidth and storage properties (e.g. elemental chat does a lot of ‘push’ to get a snappier UI but an archival DHT will be mostly passive gossip) but the plan is to make the internal heuristics smarter over time to balance various overheads and tradeoffs to find that nice equilibrium between all agents
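
A very rough sketch of working backwards from a resource budget to an arc. As noted above there is no exact algorithm, so this only shows the simplest storage-only version; `arc_fraction` and `expected_copies` are hypothetical names and the budgets are made up.

```python
def arc_fraction(budget_bytes: int, total_dht_bytes: int) -> float:
    """Fraction of the app's data an agent can hold within its storage budget."""
    if total_dht_bytes <= 0:
        return 1.0  # nothing to store yet, so holding everything is free
    return min(1.0, budget_bytes / total_dht_bytes)


def expected_copies(arcs: list[float]) -> float:
    """Average number of copies of any one datum, if arcs are spread evenly."""
    return sum(arcs)


GIB = 2**30
total = 100 * GIB                           # hypothetical size of all app data
budgets = [1 * GIB, 5 * GIB, 200 * GIB]     # hypothetical per-agent budgets
arcs = [arc_fraction(b, total) for b in budgets]
print(arcs, expected_copies(arcs))
```

Redundancy here is an outcome of what agents volunteer rather than a number dictated from above.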

the ideal UX is that there are sane defaults set per-happ and across all happs, and users rarely review the settings, maybe when their device fills up, much like how mobile devices already work - maybe you need to delete or foist some photos onto the cloud once or twice a year

rather than the happ developer dictating from above ‘thou shalt hold 500 datas at all times’, the onus is on them to write applications that are efficient and make good use of system resources, then users will rarely feel the need to take or limit any action that might threaten the network health

grossly inefficient happs are naturally going to be less popular, and some amount of inefficiency will be acceptable in return for more valuable or unique functionality

if a user sees an app they use infrequently and get low value from using 10x more resources than their favourite app, they will probably just delete it or severely restrict the arc - one user doing this won’t affect a reasonably sized, healthy network, but if many do this then the network will fail, just like a torrent dies if everyone leeches and nobody seeds

it’s up to the happ developer to respect the user’s needs, not the user to contort themselves to fit the needs of the happ

note also that the happ developer can be a user themselves, they can choose to maintain ‘archival nodes’ on any network that they have visibility into and set their arc to 100% - this is similar to running a centralised server, so if they were willing to do that in the past they should be willing to run a node in the future

thedavidmeister · February 26, 2021, 8:10am

on more storage redundancy = more security = safely facilitating higher value transfers:

to make sure i understand the premise i will restate it to try and steel man it:

sybils can brute force arbitrary keys
so can position themselves arbitrarily close to any given entry on the DHT
so can brute force arbitrary validation
nobody else will run the validation so it will be accepted

And yes, in a purely theoretical sense this is true, but there are several mitigations.

sybils cannot produce arbitrary public keys.

sure they can create arbitrary private keys, but working backwards from a desired public key to a private key is equivalent to breaking the underlying cryptography.

getting arbitrarily close to a given key is analogous to the difficulty factor in proof of work, brute force will only get you so far
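
The proof-of-work analogy can be sketched with simple arithmetic. This assumes public keys land uniformly on a 32-bit ring, which is an assumption of the sketch; the point is only that each halving of the target distance roughly doubles the expected number of keypairs to generate.

```python
RING = 2**32  # assumed size of the address space (illustration only)


def expected_attempts(max_distance: int) -> float:
    """Expected random keypairs needed to land within max_distance of a target."""
    hit_probability = (2 * max_distance + 1) / RING  # window on both sides
    return 1 / hit_probability


for distance in (2**24, 2**16, 2**8, 0):
    print(f"within {distance}: about {expected_attempts(distance):.0f} attempts")
```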

inorganic networks are detectable and false positives are safe

every agent can measure and calculate the relative tightness of the agents clustered around any particular datum and compare it to the network norm

if sybils brute force themselves to 10, 100, 1000x etc. the organic distribution (which is cryptographically guaranteed to be evenly spread) of the rest of the network, we can detect that and be suspicious.

being suspicious means we could opt in to running validation ourselves to double check a high-value transaction against what the anomalous cluster is telling us, if we agree, no harm, no foul

an overly tight cluster that appears organically should heal itself through gossip over time

the tight clustering is a mathematical relationship between the public keys and entry hashes, so doesn’t require any particular data storage redundancy to detect
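
A sketch of how such a tightness check might look. It assumes agent locations on a 32-bit ring and compares the distance of the k-th nearest agent with what an even spread would give; the function names and the choice of `k` are hypothetical, not an actual Holochain heuristic.

```python
RING = 2**32  # assumed size of the address space (illustration only)


def ring_distance(a: int, b: int) -> int:
    """Shortest numeric distance between two points on the ring."""
    d = abs(a - b) % RING
    return min(d, RING - d)


def tightness_ratio(agent_locs: list[int], target: int, k: int) -> float:
    """How many times tighter the k nearest agents are than an even spread."""
    n = len(agent_locs)
    dists = sorted(ring_distance(a, target) for a in agent_locs)
    observed = max(dists[k - 1], 1)      # avoid dividing by zero
    expected = k * RING / (2 * n)        # k-th nearest under an even spread
    return expected / observed


# 1024 evenly spread honest agents, then 8 sybils squeezed next to the target.
honest = [i * (RING // 1024) for i in range(1024)]
sybils = list(range(1, 9))
print(tightness_ratio(honest, target=0, k=8))
print(tightness_ratio(honest + sybils, target=0, k=8))
```

Note that this needs only public keys and the entry hash, not any copies of the data.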

only one honest agent is needed to confound many sybils

for deterministic validation, as a recipient we can treat any discrepancy as highly suspect, we know that at least one node is compromised or in error, and we immediately look to warrant someone(s)

if the sybils just try to ‘brute force a little bit’ to evade the tight cluster anomaly detection (above) they don’t really achieve much, as they have to be closer to a given entry than all other network participants now and in the future

this is also the reason why trying to brute force an entry hash won’t achieve much, because that simply moves it to a different location within an evenly spread network of otherwise honest (or at least not colluding for this entry) agents

the network has membranes

the heuristics above rely on overall network size, it is true that a small network with very few honest participants is going to be easier to sybil based on a generic algorithm

however, small networks typically also have stronger membranes to entry

consider a chat room between friends, the humans behind the machines already know their friends and a sane GUI will show to the humans who is in the chat room

one problem with the server based approach is that clients of the server do not know who has access to the server, but the peers on a network who are trusting each other with validation do know who each other are, it’s not possible to say that we can see someone for the purpose of validation but we cannot make their presence known to the human using the happ

mechanical membranes like auth tokens, cryptographic challenges, etc. can be coded into smaller networks to prevent sybils

elemental chat works this way already, we have a social membrane protecting the relatively immature systems while we continue to build them out

every agent can opt-in to validation

if we wanted to create an opt-in validation for specific entries, such that it is performed on every get (then likely cached locally), we could do this

we aren’t doing this because it’s quite a big hammer (would slow things down too much to be default behaviour at least), but it could be done if there was a real need for it

note that you can simulate this yourself as a happ dev by running the validation callbacks against data you get (because validation callbacks are regular functions) explicitly after you get it - you’ll just need to do the legwork for any dependencies

i wouldn’t recommend this normally, but it is an option for the paranoid

recursively tight clusterings are non-linearly suspicious

ok, so let’s assume you manage to pull off a single tight cluster around an entry without detection

in the case of typical value transferal systems (e.g. a ledger) your transaction comes at the end of a long line of transactions back to the genesis of the system

2, or 3, or N tight clusters that depend on each other are exponentially more suspicious than a single cluster on its own, this is a way bigger red flag than two unrelated tight clusters, if an agent ever sees chained clusters it should probably immediately start trying to warrant someone(s)

agents don’t need to ask the absolute closest agents to the data they want

agents only need to ask someone close enough to the data that they can function as an authority, we don’t have to try and boot out authorities to get ‘the most authoritative authority based on closeness’

network partitions mean there can never be a perfect list of who is the exact authority for something globally anyway

some looseness here allows for some natural randomness that helps flush out sybils, and we only need to hit one honest participant

a sybil can squeeze themselves arbitrarily close to some entry but it doesn’t actually guarantee that 100% of all network traffic will be routed to them for that entry
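
A small sketch of why that looseness helps. It assumes a requester asks a few authorities picked uniformly at random from the "close enough" set; the set size and sybil count below are made-up numbers.

```python
from math import comb


def p_only_sybils(candidates: int, sybils: int, asked: int) -> float:
    """Chance that every authority asked (without repeats) is a sybil."""
    if not 0 < asked <= candidates:
        raise ValueError("asked must be between 1 and the number of candidates")
    return comb(sybils, asked) / comb(candidates, asked)


def p_at_least_one_honest(candidates: int, sybils: int, asked: int) -> float:
    """Chance of hitting at least one honest authority."""
    return 1 - p_only_sybils(candidates, sybils, asked)


# Hypothetical: 20 close-enough authorities, 15 of them sybils.
for asked in range(1, 6):
    print(asked, round(p_at_least_one_honest(20, 15, asked), 3))
```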

humans are part of the equation

ok, so you’re doing a high value transfer… to who??

scammers are gonna scam

the network can apply algorithms to detect people breaking validation rules to some degree of confidence (even proof of work is statistical) but at some point effort is better spent to help humans contextualise their transactions

ironically and sadly, the irreversible nature of blockchain makes it a natural breeding ground for scams

not that i want the ability to rollback blocks, but making a transaction 0.0001% more ‘final’ has diminishing returns vs. making a human 10% less likely to send money to paypel.com instead of paypal.com

there are other heuristics that can detect sybils

beyond mathematical clustering of keys, a smoking gun for a sybil is that they tend to only participate in a network to the absolute minimum they can get away with in order to be a sybil

what this means is going to be app specific, but for example:

being wary of authorities with relatively little activity of their own
being wary of agents many degrees of separation away from the agents you usually interact with (web of trust)
being wary of agents with very small arcs (low effort resource-wise)
being wary of a cluster of agents with similar keys all created in a similar timeframe or joining the network at a similar time when there was not a spike in new users with different keys
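
The list above could be turned into simple flags along these lines. Everything here is hypothetical: the `AgentView` fields, the thresholds, and the idea that an app would have this information in this shape. As said above, what counts as "minimum participation" is app specific.

```python
from dataclasses import dataclass


@dataclass
class AgentView:
    """What we happen to know about another agent (hypothetical fields)."""
    own_actions: int        # entries the agent authored itself
    trust_hops: int         # degrees of separation from agents we deal with
    arc: float              # fraction of the DHT the agent holds, 0.0 to 1.0
    joined_in_burst: bool   # joined alongside many similar keys, no user spike


def suspicion_flags(agent: AgentView) -> list[str]:
    """Return the heuristics from the list above that this agent trips."""
    flags = []
    if agent.own_actions < 5:          # threshold is an arbitrary example
        flags.append("little activity of their own")
    if agent.trust_hops > 3:           # threshold is an arbitrary example
        flags.append("many degrees of separation away")
    if agent.arc < 0.01:               # threshold is an arbitrary example
        flags.append("very small arc")
    if agent.joined_in_burst:
        flags.append("joined in a cluster of similar keys")
    return flags


print(suspicion_flags(AgentView(own_actions=0, trust_hops=6, arc=0.001, joined_in_burst=True)))
print(suspicion_flags(AgentView(own_actions=250, trust_hops=1, arc=0.2, joined_in_burst=False)))
```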

The-A-Man · February 26, 2021, 9:48am

I don’t know why, but I’m very reluctant to do anything even remotely close to “membranes”… Anyway,

Yuck!

Yeah… Exactly.

Made my day!

The-A-Man · February 26, 2021, 9:52am

Yeah, that’s what I was missing. Obviously, how could I miss that! I should have foreseen that… Anyway, thanks a great deal for pointing that out. Plus I confused the degree of validation with the degree of replication, which your explanation really did help in distinguishing between.

Yeah…

Yup!

guillemcordoba · February 26, 2021, 10:02am

I wanna thank you @thedavidmeister for this wonderful explanation and for all the work and effort that you put in.

Also, a much more superficial factor that maybe @The-A-Man knows about or not, is the fact that for each action in the source chain there are multiple DHT-Ops going around to different neighborhoods. See here: https://holochain-gym.github.io/developers/intermediate/dht/. So, if a sybil wants to break things, they need to start controlling maybe 3 complete neighborhoods from now on and into the future, which just adds to the reasons that David was exposing.

^ this made my day

The-A-Man · February 26, 2021, 10:04am

Wow! I didn’t know you’d updated the gym! Last time I was there, there wasn’t much. Thanks really for taking me there. Would check everything out immediately!

The-A-Man · February 26, 2021, 10:06am

Are you talking about something like:

Plus I didn’t know about the following too:

guillemcordoba · February 26, 2021, 10:11am

More like this:

You can see the different messages sent to different neighborhoods, and then the yellow highlighted nodes indicate where each piece of data is stored. In here the “redundancy factor” is 3 nodes.

Sure, the workflow diagram is present here, you can see the appvalidation processes running in the nodes before they start holding the entries. You can activate more workflows in the visible workflows dropdown.

The-A-Man · February 26, 2021, 10:13am

Thanks really! Gonna check your videos ASAP on how to operate that page myself (I’m so dumb…)! Your work is absolutely cool!

thedavidmeister · February 26, 2021, 10:53am

well let’s call it ‘defence in depth’ then?

i suspect you’re getting tripped up by the name ‘membrane’ rather than the idea of adopting common security tactics for systems/data like:

requiring a password or key
maintaining allow/block lists of known good/bad actors
requiring an existing participant to vouch for new participants
techniques like captcha to frustrate bots
requiring a specific interaction or cooldown period after joining to post
limiting network access
etc. etc.

for mere mortals we cannot produce perfect systems from whole cloth, even bitcoin and ethereum have experienced and weathered their fair share of attacks only due to the ongoing efforts of humans to harden them and the limited nature of the attacks at the time relative to the maturity of the network.

if you take the sophistication of actors in the bitcoin/ethereum network as it stands right now and throw them at the early alpha/beta versions to attack it, those old clients, nodes and miners would be crushed. We can observe what forks have experienced including double spends, network spam, hash wars that cripple block production, etc…

the intent behind the concept of ‘membrane’ is to give ourselves permissions to have conversations that are normal in the application development space, but seem to be difficult to have in the p2p space where ‘everything MUST be public and open all the time’.

well what if i want to write a P2P app that nobody but myself and close friends can use??

if that is what i want then i will need a membrane.

but i think the sticking point with “membrane” is that what people hear is:

‘i don’t have a solution to some hard problem so i’m going to hand wave your concern away with a vague biology reference’

then the same people will happily put cloudflare in front of their website to minimise bots, or a captcha on their blog, or put pants on before they go to the supermarket…

i guess what i will say then, is that the damage any attack does to any system is as much a matter of degree as it is about the nature of the attack

there are plenty of attacks that are merely inconvenient or easy to detect/mitigate if they are of small degree even if their nature is quite malicious and would be crippling if done on a large scale, so a sensible membrane helps us put an upper bound on both the risk (probability) and impact (degree) of an attack hopefully without inconveniencing legit users too much

then, in addition to the membrane we take steps to address the problem directly in an application specific way knowing that we’ve set ourselves up for success by putting rough constraints on the scale of the problem

The-A-Man · February 26, 2021, 10:56am

Doesn’t scale.

Welcome GPT 3 (AI)!

Does nothing at all.

thedavidmeister · February 26, 2021, 10:56am

see this little bit of text that says ‘official team member’

that is a membrane

now you know that i’m not an impersonator trying to trick you with false information

The-A-Man · February 26, 2021, 10:57am

I like this one… Similar to Ripple-style!

Yeah, that’s the bitter truth. And it’s hard to digest…

thedavidmeister · February 26, 2021, 10:59am

vertically

thedavidmeister · February 26, 2021, 11:00am

the problem is that you’re taking a suggested solution to a specific problem and trying to invalidate it by showing how it doesn’t solve a different problem in a different context

i’m not in the game of one-size-fits-all solutions

The-A-Man · February 26, 2021, 11:03am

Oh, of course, then you’re lucky! But what if I want to design a solution for the HUMANITY, not my close friends-circle?

thedavidmeister · February 26, 2021, 11:05am

then you need to be more conservative in terms of what functionality you can offer and your engineering will be more difficult

thedavidmeister · February 26, 2021, 2:54pm

the thing is that ‘close friends circle’ already did scale to about 7 billion people independently across all nations and cultures, and across all human history, and is a core social ‘membrane’ that allows you to survive and have a good, long life

your friendship group will never scale, yet most people have at least one or two other humans they can call ‘friend’ and we’re somehow all connected within about 6 hops

it’s not ‘lucky’ to have friends, and it’s not a trivial use case, it’s a basic survival skill for every human, so expressing ‘friend’ in our digital life (without big zucker in the middle, or having it sound more like ‘comrade’) is an important thing to be able to do

most successful consumer applications focus around humans collaborating in small groups, ostensibly to the exclusion of everyone else, but in reality the companies that create the tech tend to abuse their role as facilitator

to be able to facilitate small groups without central servers, the conversation does need to include appropriate ‘membranes’ to define ‘in the group’ and ‘not in the group’

even proof of work is a membrane, ‘the group’ of ‘miners’ is anyone who can prove they did work, and you can implement PoW as a membrane in holochain if you want, as documented in the HDK
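
For illustration, a generic hashcash-style check of the kind a proof-of-work membrane could use. This is a plain Python sketch and not the HDK's API; the function names, the SHA-256 choice and the nonce encoding are all assumptions.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(agent_key: bytes, nonce: int, difficulty: int) -> bool:
    """Cheap check that the joiner really did the work."""
    digest = hashlib.sha256(agent_key + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


def prove(agent_key: bytes, difficulty: int) -> int:
    """Expensive search for a nonce that passes verify (the 'work')."""
    nonce = 0
    while not verify(agent_key, nonce, difficulty):
        nonce += 1
    return nonce


key = b"hypothetical-agent-public-key"
nonce = prove(key, difficulty=12)
print(nonce, verify(key, nonce, difficulty=12))
```

Existing members only run `verify`, so admitting someone costs them almost nothing, while each extra sybil identity costs the attacker another run of `prove`.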

you don’t need to include membranes, you can still build systems that are open, but try not to talk down as though doing so is somehow ‘bad’ for other people to do, or there is some idealistic reason to avoid them