yes, i oversimplified a little bit - the main goals of witnesses in a DAG, as i understand them, are:
- overlay a credibly neutral and reliable total ordering on top of partially ordered data (i.e. the DAG)
- prevent attackers from hiding data and releasing it later, which could force the DAG to re-order after the fact and enable e.g. double spends - the defence is to force data through the hashing done by the public witnesses before it ‘counts’, since hashing de facto requires the data to exist and be known
in holochain every agent also has their own authorities for the headers of their source chain, the ‘agent authorities’
the agent authorities act as witnesses for the agent’s source chain, but they only hold the headers and validate the ordering and linearity of the chain, not all the data
if an agent cares about something sensitive to ordering and forking, e.g. a ledger, they can query the agent authorities to ensure that the chain head being presented to them by the counterparty matches the network’s (witnessed) opinion of the head of that chain
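to make that concrete, here’s a rough rust sketch of the kind of check an agent authority (or anyone who fetched the witnessed headers) could run - the `Header` struct, its fields and the toy hash are made up for illustration, not holochain’s actual types:

```rust
// a rough sketch, not holochain's real types: an authority holds only headers
// and checks that they form a single unforked, strictly ordered chain.

#[derive(Clone, Debug, PartialEq)]
struct Header {
    seq: u64,               // position in the author's source chain
    prev_hash: Option<u64>, // hash of the previous header (None for genesis)
    entry_hash: u64,        // hash of the entry this header commits to
}

// toy stand-in for a real cryptographic hash of a header
fn header_hash(h: &Header) -> u64 {
    h.seq
        .wrapping_mul(31)
        .wrapping_add(h.entry_hash)
        .wrapping_add(h.prev_hash.unwrap_or(0).wrapping_mul(17))
}

// true if the headers form one unbroken, unforked, linear chain
fn is_linear(chain: &[Header]) -> bool {
    chain.windows(2).all(|w| {
        w[1].seq == w[0].seq + 1 && w[1].prev_hash == Some(header_hash(&w[0]))
    })
}

// the counterparty claims `claimed_head` is their latest header; the witnessed
// copy of the chain lets us check that claim
fn head_matches(witnessed_chain: &[Header], claimed_head: &Header) -> bool {
    witnessed_chain.last() == Some(claimed_head)
}

fn main() {
    let genesis = Header { seq: 0, prev_hash: None, entry_hash: 1 };
    let next = Header { seq: 1, prev_hash: Some(header_hash(&genesis)), entry_hash: 2 };
    let chain = vec![genesis, next.clone()];
    assert!(is_linear(&chain));
    assert!(head_matches(&chain, &next));
    println!("chain is linear and the presented head is the witnessed head");
}
```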
most of the issues i know of re: witnesses in a DAG come back to that ‘credibly neutral’ requirement - which is where we start talking about PoW or PoS etc., and there are hybrid models that mix blockchain and DAG for exactly this reason, so that the blockchain witnesses the DAG
so yes, there is an algorithm that determines the ‘distance’ between data and agents, and it’s ‘random’ in that it is based on comparing hashes and pubkeys
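a minimal sketch of what that ‘distance’ can look like - holochain’s real metric differs in detail, the point is just that it’s a pure function of the data hash and the pubkey, so anyone can recompute it and nobody gets to choose where data lands:

```rust
// a toy 'distance' between a data hash and an agent pubkey: both are projected
// into a shared location space and compared on a ring. the projection and
// metric here are illustrative only.

// toy projection of a hash or pubkey into a 32-bit location space
// (real implementations derive this from a cryptographic hash)
fn location(bytes: &[u8]) -> u32 {
    bytes
        .iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u32))
}

// distance on a ring: the shorter way around between two locations
fn ring_distance(a: u32, b: u32) -> u32 {
    let d = a.wrapping_sub(b);
    d.min(b.wrapping_sub(a))
}

fn main() {
    let data_hash = b"some entry hash";
    let alice_pubkey = b"alice pubkey";
    let bob_pubkey = b"bob pubkey";

    // whoever is 'closer' to the data is a more likely authority for it,
    // and anyone can recompute these numbers from public information
    let d_alice = ring_distance(location(data_hash), location(alice_pubkey));
    let d_bob = ring_distance(location(data_hash), location(bob_pubkey));
    println!("distance to alice: {d_alice}, distance to bob: {d_bob}");
}
```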
the subtlety in ‘randomly selecting an authority’ is: who is doing the selecting? for example, there can always be an eclipse attack on a DHT, where absolutely everyone you ever interact with lies about the state of things, and from that point you’ll never achieve any further security (that problem also applies to bitcoin, ethereum, etc.)
what we want to do is set up a situation where ‘the good guys’ have a huge advantage over ‘the bad guys’ - e.g. as long as an agent can reach at least one honest authority, they can detect bad behaviour from potentially many bad authorities and resolve the conflicts (e.g. by doing their own validation)
the first thing there is that validation should be as deterministic as possible, so that for a given validation a true/false is reliably available from the input data alone, without any further network interactions - this allows all agents to decide for themselves who they think is honest once they have some data (where the integrity of the data is guaranteed by its hash)
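as a sketch, ‘deterministic validation’ means something shaped like this - a pure function of the data (plus any dependencies resolved by hash and passed in), no network calls, no clocks, no randomness, so every agent computes the same verdict; the types are illustrative:

```rust
// illustrative only: validation as a pure function. no I/O, no clocks, no
// randomness - the verdict depends only on the inputs, so every honest agent
// that sees the same data reaches the same true/false.

struct Transfer {
    amount: u64,
    sender_balance_before: u64, // would be resolved from hashed dependencies
}

fn validate_transfer(t: &Transfer) -> bool {
    t.amount > 0 && t.amount <= t.sender_balance_before
}

fn main() {
    assert!(validate_transfer(&Transfer { amount: 5, sender_balance_before: 10 }));
    assert!(!validate_transfer(&Transfer { amount: 20, sender_balance_before: 10 }));
    println!("same inputs, same verdict, for every agent");
}
```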
even then we need to know who to talk to about some item of data, and don’t want to simply trust the author
in holochain each agent opts in to some percentage of the data on the network, e.g. 10%, which is their ‘arc’ - this means they will validate and hold the 10% of data ‘closest’ to themselves, and they let everybody else on the network know their arc at the same time as they broadcast their network location to allow incoming connections
so if alice broadcasts a ‘10% arc’ to bob, bob can calculate whether some data hashes to within her arc and should therefore be held by alice - if so, she is on the list of agents bob can get that data from. alice has no control over which data falls in her arc, only the size of the arc; her pubkey vs. the data hash determines what lands in it, so an honest alice has no choice but to validate and hold everything assigned to her and respond to queries about it
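here’s a rough sketch of the check bob can do against alice’s advertised arc - modelling the arc as a fraction of a 32-bit ring centred on her pubkey’s location is a simplification, and the toy `location`/`ring_distance` helpers from the earlier sketch are repeated so it stands alone:

```rust
// a simplified arc check: alice's arc is modelled as a fraction of a 32-bit
// ring centred on her pubkey's location. bob only needs her pubkey, her
// advertised arc size and the data hash to decide whether she should hold it.

fn location(bytes: &[u8]) -> u32 {
    // toy projection into a 32-bit location space; real code hashes properly
    bytes
        .iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u32))
}

fn ring_distance(a: u32, b: u32) -> u32 {
    let d = a.wrapping_sub(b);
    d.min(b.wrapping_sub(a))
}

// arc_fraction is e.g. 0.10 for a '10% arc'
fn should_hold(agent_pubkey: &[u8], data_hash: &[u8], arc_fraction: f64) -> bool {
    let half_width = (arc_fraction * u32::MAX as f64 / 2.0) as u32;
    ring_distance(location(agent_pubkey), location(data_hash)) <= half_width
}

fn main() {
    let alice_pubkey = b"alice pubkey";
    let entry_hash = b"some entry hash";
    if should_hold(alice_pubkey, entry_hash, 0.10) {
        println!("bob expects alice to hold (and have validated) this entry");
    } else {
        println!("this entry is outside alice's arc - ask someone closer");
    }
}
```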
pubkeys are, by their nature, randomly spread across the space of all hashes, so a healthy network will have many arcs ready to receive any new data hash, organically spreading data between peers according to their ability to handle it (similar to seeders and leechers in bittorrent)
we’re expecting that small DHTs start with everyone at a 100% arc, and that arcs start to shrink as the DHT becomes more heavily populated and holds more data (agents set their arc per-DHT)
bob chooses for himself a random set of N agents that he believes should have the data he is looking for, based on the agents he is aware of and the hash of that data - he queries them all in parallel, and each will respond with either the data or better candidates for bob to query (thus giving bob a better view of who is in the network) - once bob receives M responses (where M could be a bit less than N) he can move forward with cross-referencing and using them - bigger M and N are more conservative/defensive but imply more network overhead, which is also a tradeoff that can be made per-app
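and a sketch of that fan-out pattern - N, M and the agents’ behaviour are made up, and the ‘network’ is simulated with threads and a channel, but the shape is: query in parallel, proceed once M answers arrive, then cross-reference what came back:

```rust
// fan-out query sketch: ask N candidates in parallel, proceed once M have
// answered, then cross-reference the answers. the 'network' is simulated with
// threads and a channel; the agents' behaviour is hard-coded for illustration.

use std::sync::mpsc;
use std::thread;

#[derive(Clone, Debug, PartialEq)]
enum Response {
    Data(String),          // the agent held (and had validated) the data
    Redirect(Vec<String>), // the agent suggests closer candidates to try
}

fn main() {
    let n = 4; // how many agents bob queries
    let m = 3; // how many answers bob waits for before proceeding

    // simulated agents: what each would answer if queried
    let agents = vec![
        Response::Data("entry-bytes".into()),
        Response::Data("entry-bytes".into()),
        Response::Redirect(vec!["carol".into()]),
        Response::Data("entry-bytes".into()),
    ];

    let (tx, rx) = mpsc::channel();
    for agent in agents.into_iter().take(n) {
        let tx = tx.clone();
        thread::spawn(move || {
            // in reality this is a network round trip; here it's immediate
            let _ = tx.send(agent);
        });
    }
    drop(tx); // so the receiver ends once all spawned senders are done

    // take the first m responses that arrive
    let responses: Vec<Response> = rx.into_iter().take(m).collect();

    // cross-reference: do all agents that returned data agree?
    let data: Vec<&String> = responses
        .iter()
        .filter_map(|r| match r {
            Response::Data(d) => Some(d),
            Response::Redirect(_) => None,
        })
        .collect();
    let consistent = data.windows(2).all(|w| w[0] == w[1]);
    println!("got {} responses, data consistent: {}", responses.len(), consistent);
}
```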