Conditionally adding data to your node

rlkel0 · December 7, 2019, 9:35pm

I see a lot of information about the DHT, but I’m wondering if there’s a pattern to conditionally add something to your node to prevent sharing content you don’t want to. So, you post a blog, I like your blog, I opt to share your blog from my node.

Does this pattern exist?

pauldaoust · December 9, 2019, 8:23pm

Good question! Nothing very solid right now, but I do have a few ideas. To make sure I understand you correctly, you’re talking about choosing what pieces of the DHT to replicate and make available to others; is that right? And I’m guessing the subtext is that maybe I don’t want to hold your 2GB vacation video just because I’m the DHT node that got elected to store it, or maybe your stuff is illegal or against my ethics.

A few thoughts that come to mind:

Anyone who asks for a DHT entry is actually holding it in their shard, but because it’s outside of their advertised ‘indexing arc’ (the range of addresses they’ve committed to be responsible for) it’s not as easy for other nodes to discover that they hold it. Perhaps they could advertise by linking from the content to their own agent ID, and committing to not clear out their cache for n days (where n is in the link’s ‘tag’).
The author publishes the blog as a private entry, adds a ‘stub’ to the DHT (this might be as simple as a hash, or as big as the title and intro paragraph). Anyone who wants the article can get it from the author by direct message. Once they get it, they re-publish it as a private entry on their own chain and advertise that they’re sharing it by linking from the stub to their own agent ID. Cons: it’s not great for discoverability, you can’t un-share an entry, and the author has to stay online for longer, ‘seeding’ their content.
Plausible deniability: encrypt all public data so nobody knowingly shares evil stuff. I don’t think this is even slightly reassuring to people with ethical qualms though.
Spin up mini-DHTs for each piece of content, with a high (100%) redundancy factor. To read or share the content, all you do is join the DHT. This is the same as the Throwaway DHT pattern, but serves a different purpose. Pros: you can seed and un-seed an article at will, and discovery of other seeders requires no extra logic beyond what Holochain provides. Cons: each new DNA instance is a performance burden on your machine, and the author still has to stay online for a while to do the initial seeding.

None of these ideas are particularly satisfying to me as a one-size-fits-all solution, but I feel like there are the seeds of some good ideas in there. Would love to see what these ideas inspire.

I would love to know if you’ve been chewing on any ideas of your own @rlkel0; this is unexplored territory!

rlkel0 · December 9, 2019, 10:18pm

It sounds like we’re on the same page.

I think the last example you gave makes the most sense, but I’m not very familiar with interoperability. One of the features of Holochain I think would be useful on a lot of community-based projects is the ability to vote with your node. For example, if I wanted to make a decentralized package manager for Python, I could make the packages, with their hashes, entries in the DHT signed by the authors, but then each node would only distribute the packages it cares about. So when a malicious package comes around, it would only be distributed by nodes that opted in to sharing it. Nodes would actually be increasing the availability of content by choosing what content they like.
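
A rough sketch of the "vote with your node" idea, written as a plain in-memory Python model rather than Holochain code. The shared index maps package names to published content hashes, and a node serves only packages it has opted into and whose bytes match the published hash. The names `OptInNode` and `content_hash` are hypothetical, SHA-256 is an assumed choice of hash, and author signatures are left out to keep it short.

```python
import hashlib
from typing import Dict, Optional


def content_hash(data: bytes) -> str:
    """SHA-256 hex digest used as the content address (an assumption)."""
    return hashlib.sha256(data).hexdigest()


class OptInNode:
    """Toy model of a node that only redistributes packages it chose to share."""

    def __init__(self, index: Dict[str, str]) -> None:
        # Shared index: package name -> hash published by the author.
        self.index = index
        # Packages this particular node has opted into sharing.
        self.store: Dict[str, bytes] = {}

    def opt_in(self, name: str, data: bytes) -> bool:
        """Start sharing a package, but only if it matches the published hash."""
        if self.index.get(name) != content_hash(data):
            return False
        self.store[name] = data
        return True

    def opt_out(self, name: str) -> None:
        """Stop sharing a package; a no-op if it was never shared."""
        self.store.pop(name, None)

    def serve(self, name: str) -> Optional[bytes]:
        """Return the package bytes, or None if this node does not share it."""
        return self.store.get(name)
```

In this model a package nobody opts into is simply unavailable, which is the property described above: availability follows from nodes choosing what they like.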

The vacation video example is also really great, it’s like drawing the short straw.

I think it makes sense to create something like the throwaway DHT pattern, except that the main chain would store hashes of the content and some identifier for the side chains, and users would choose which of these side chains to opt into. Do you have a minimal example of building something with two interoperable chains I could look at so I could dig into this a bit further?

pauldaoust · December 10, 2019, 7:52pm

I’m sorry, I don’t have any minimal example, but the main DHT + side DHTs is exactly what I’m picturing. Here are a few resources that might help:

Each of the side DHTs’ DNAs can be templated from a single DNA. You ‘fork’ the DNA by passing in either a new UUID or some ‘properties’ (arbitrary key/value pairs) that get mixed into the DNA’s properties. I would imagine that the property would be something like "content_being_tracked": "<hash_of_content>". That way, other people can reconstruct the same forked DNA by passing in the same property. They would get that property’s value from your hash-based content identifier in the main DHT.
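
To illustrate why passing in the same property lets other people reconstruct the same forked DNA, here is a toy Python model. It is not Holochain’s real hashing scheme: `fork_dna_id` and the shape of the template are hypothetical. The only point is that the identifier is a deterministic function of the template plus the properties.

```python
import hashlib
import json


def fork_dna_id(template: dict, properties: dict) -> str:
    """Derive a stand-in DNA identifier from a template and extra properties."""
    forked = dict(template)
    # Mix the new properties into the template's existing ones.
    forked["properties"] = {**template.get("properties", {}), **properties}
    # Canonical JSON (sorted keys) so equal inputs always hash the same.
    canonical = json.dumps(forked, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


template = {"name": "blob-storage", "properties": {}}

# Author and reader pass the same property, so they derive the same ID.
author = fork_dna_id(template, {"content_being_tracked": "<hash_of_content>"})
reader = fork_dna_id(template, {"content_being_tracked": "<hash_of_content>"})

# A different piece of content yields a different side DNA.
other = fork_dna_id(template, {"content_being_tracked": "<hash_of_other_content>"})
```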
The user’s UI has to take responsibility for forking and instantiating these side DNAs so they can create or join the DHT (depending on whether they’re author or consumer), and then the UI can talk to the main DNA to register a DHT. Check out the admin API function reference, which sorely needs better documentation than a comments block in a source file:
1. admin/dna/install_from_file to fork the existing blob storage DNA for a particular blob
2. admin/instance/add to create or join that blob’s network
3. admin/interface/add_instance to add the newly created instance to the WebSocket RPC interface that the main DNA is currently listening on
4. admin/instance/start to fire up the instance and become part of the network (you might want to get clever and start/stop instances based on whether the user actually wants the blob in question, but that harms overall resilience/seeding of the blob, and may have a startup performance hit as the instance connects to other nodes in the DHT).
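
The four steps above can be sketched as a sequence of JSON-RPC 2.0 requests. The method names come from the list; the parameter names (`id`, `path`, `properties`, `dna_id`, `agent_id`, `interface_id`, `instance_id`) and the ID naming scheme are assumptions to check against the admin API reference. The sketch only builds the request strings; sending them over the conductor’s WebSocket interface is left out.

```python
import itertools
import json
from typing import List

_request_ids = itertools.count(1)


def rpc(method: str, params: dict) -> str:
    """Serialize one JSON-RPC 2.0 request with a fresh numeric id."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params}
    )


def join_blob_network(
    blob_hash: str, dna_path: str, agent_id: str, interface_id: str
) -> List[str]:
    """Build the requests to fork, instantiate, expose, and start a blob DNA."""
    # Hypothetical naming scheme for the forked DNA and its instance.
    dna_id = f"blob-dna-{blob_hash}"
    instance_id = f"blob-instance-{blob_hash}"
    return [
        # 1. Fork the blob storage DNA for this particular blob.
        rpc(
            "admin/dna/install_from_file",
            {
                "id": dna_id,
                "path": dna_path,
                "properties": {"content_being_tracked": blob_hash},
            },
        ),
        # 2. Create or join that blob's network.
        rpc(
            "admin/instance/add",
            {"id": instance_id, "dna_id": dna_id, "agent_id": agent_id},
        ),
        # 3. Expose the new instance on the existing RPC interface.
        rpc(
            "admin/interface/add_instance",
            {"interface_id": interface_id, "instance_id": instance_id},
        ),
        # 4. Start the instance so it becomes part of the network.
        rpc("admin/instance/start", {"id": instance_id}),
    ]
```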

rlkel0 · December 10, 2019, 9:38pm

this is interesting. So I read through this, it looks like I was misunderstanding how a node works. Holochain appears to encourage a user to run a single node with a bunch of different instances. And an instance is an agent with DNA? So if I was using an app, it could have a bunch of different DNAs and my agent could join and leave them as it sees fit?

So it seems like there isn’t interoperability, but the conductor facilitates having a variety of instances and it’s the job of the application to manage the integration with the conductor?

Does that make sense?

rlkel0 · December 11, 2019, 3:16am

Just in case anyone is lurking, I found this helpful https://developer.holochain.org/docs/glossary/#agent

So from reading this:

Conductor

The service that lives on a user’s device and hosts all of their DNA instances, stores their data, and handles network communication between their instances and other users’ instances.

So I misunderstood the role of the conductor and DNA. I think this was the missing link. A holochain DAPP could potentially consist of many DNA strands, each with their own unique architecture, and a single DNA that unifies them. Very cool, I think this makes a lot more sense to me.

pauldaoust · December 11, 2019, 4:18am

Yup, you’ve got it nailed down. This is a foundational piece of knowledge for understanding Holochain, and so far we haven’t done a great job of explaining it. So I’m glad you discovered it.

One thing I will say, though, is that there is built-in interoperability: running DNA instances can talk directly to each other without the mediation of the UI. We call it ‘bridging’, where we set up an explicit communication channel whereby one DNA instance can call another instance’s ‘zome functions’ (API).

The reason I described the UI handling a lot of the cross-DNA communication in this scenario is that the UI is allowed to install new DNAs, instantiate/start them, and create bridges between them, whereas DNAs are not. Once those DNAs are instantiated and bridged, though, they can talk directly to each other. (Right now you can only define a one-way bridge to avoid circular dependencies, though we aren’t necessarily going to keep this as a hard requirement.)

So if the UI and the conductor can both facilitate cross-talk between an agent’s running DNA instances, why choose one over the other?

Doing it through the UI is more flexible; you don’t have to go through the rigamarole of asking the conductor to set up bridges.
Doing it via bridging is more trustable; DNAs that talk directly to each other through the conductor’s internal plumbing can trust that they’re talking to a real DNA instance and can depend on the truthfulness of its responses.

rlkel0 · December 15, 2019, 9:42pm

I just figured out how the conductor works, so much easier than I thought. I wrote a python wrapper for it to make my life a little easier. It took me a while to crack the code on jsonrpc_core, but it’s surprisingly simple to add and remove zomes.

I guess the next questions are:

is there any built in auth?
is there a limit to the number of DNA instances that the conductor can handle? I will test this but I’m curious what you’ve found.

pauldaoust · December 17, 2019, 7:18pm

Depends on what you mean by auth:
1. System authentication (login) isn’t needed; you self-authenticate by proving that you possess your private key whose public key is in your agent ID entry (the second entry in your chain for a given DNA).
2. Human authentication (who am I, and can I prove it?) will be different for every app. Nothing built-in, although there is an Identity Manager DNA that lets users store their own personal information and supply them to other apps.
3. Authorisation (am I allowed to do this thing?) Two things:
  1. Joining a network and accessing public data: The second entry in your source chain for a given app is your agent ID — it contains your public key and potentially other stuff (e.g., an invite code). As an entry, it has a validation function; if that function fails, you’re denied membership. Not all of the plumbing is there (yet): there’s no way to pass data into the agent ID entry at startup time. So all you can do right now is validate based on the agent’s public key — things like “is this key in the whitelist?”
  2. Writing, editing, and deleting public data: Every type of entry has a validation function; you can use it to decide whether someone’s allowed to perform an action. No built-in support for easier-to-use primitives like roles or ACLs, but I hope to build some drop-in libraries that build on validation functions.
As for the instance limit, I’m not sure what the limits are yet. I know the core team has been doing a pile of testing on this (both how many instances a conductor can handle, and how many nodes a network can handle) but I don’t know the concrete results. And they’ll keep changing cuz we’re starting to focus on optimisation in a big way.
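
Going back to the authorisation points in item 3, the decision logic of the two kinds of validation can be modelled in a few lines. This is only a Python illustration: real validation functions live inside the DNA, and the function names, key strings, and the "author or admin" rule are hypothetical.

```python
from typing import FrozenSet


def validate_agent_id(agent_public_key: str, whitelist: FrozenSet[str]) -> bool:
    """Membership check: admit an agent only if their key is whitelisted."""
    return agent_public_key in whitelist


def validate_entry_write(
    entry_author: str, signer: str, admins: FrozenSet[str]
) -> bool:
    """Write check: allow a change by the entry's author or by an admin."""
    return signer == entry_author or signer in admins


# Hypothetical keys, for illustration only.
WHITELIST = frozenset({"alice-pubkey", "bob-pubkey"})
ADMINS = frozenset({"alice-pubkey"})
```

Roles or ACLs would be layered on top of checks like these, since a validation function only gets to return pass or fail.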

rlkel0 · December 17, 2019, 7:48pm

thanks for the detailed response!

so one thing you didn’t cover, the json rpc client. It feels insecure that I can add and remove instances without providing any kind of authentication. I would imagine I would need to put a server in front of it so people could query my node but wouldn’t be able to make changes remotely.

pauldaoust · December 17, 2019, 8:55pm

heh, yeah. I would never expose the JSON-RPC socket on an ethernet interface! The owner of the node should be the only one making calls of any sort, whether they’re to a running DNA’s API or to the conductor API. The only time someone else should be able to make function calls on your DNA instance is if they’re part of the DHT themselves and are contacting you through node-to-node messaging. At that point you can check their public key and deny their request (they’re required to supply their public key whenever they send you a message, and your receive callback can access it).

There are some gray areas here — for instance, as a HoloPort owner I can load up a UI over my local network, which does make zome calls — but then, I’m the owner of the HoloPort, my laptop, and the network plumbing. Another example is some theoretical P2P weather station app; I might want to expose my DNAs’ APIs (but not the admin API) to all the temp and wind sensors in my yard. Again over the localnet only, and probably with some checks — putting a server in front of it that checks those sensors’ credentials would be a great idea.
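
A minimal sketch of the filtering part of such a front server: it forwards only allowlisted JSON-RPC methods and drops anything in the admin namespace. It assumes that zome calls use a method named `call` and that admin methods start with `admin/`; checking the sensors’ credentials and actually proxying the request to the conductor are left out.

```python
import json
from typing import Optional

# Assumption: "call" is the only method outside clients may invoke.
ALLOWED_METHODS = frozenset({"call"})


def filter_request(raw: str) -> Optional[dict]:
    """Return the parsed request if it may be forwarded, else None."""
    try:
        request = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not valid JSON
    if not isinstance(request, dict):
        return None  # batches and scalars are rejected in this sketch
    method = request.get("method")
    if not isinstance(method, str):
        return None
    # Deny the admin API explicitly, then fall back to the allowlist.
    if method.startswith("admin/") or method not in ALLOWED_METHODS:
        return None
    return request
```

An allowlist is used rather than a blocklist so that any method added to the conductor later is denied by default.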

One thing that does concern me, though, is malicious UIs on the user’s own device. How does the user grant/deny access similar to app permission dialogues in iOS, Android, and OAuth? I don’t know that we’ve explored that fully, and to me it’s worth serious consideration.

pqcdev · December 20, 2020, 9:58am

I think I know what you mean, but can you elaborate on this please?

I’ve thought a lot about what happens if someone clicks a link and downloads malware. Not sure if this is related. I feel like this question describes a challenge that any app or network faces. Correct? Am I wrong to think this is something that biometrics (i.e., fingerprint login) and/or 2FA can solve?

Holo will need to run off a user’s browser? Yes? I assume there are no plans to build a mobile ‘Holo App’ integrated UI? Would something like that be impossible bc of the intricacies?

Maybe I’m not fully understanding this concern, and I’m probably also not understanding how a user will access Holo from their device.

pauldaoust · December 22, 2020, 5:29pm

Way back when I wrote that, the conductor had no way of authenticating any client (e.g., locally running UI) who made calls to a zome. Now at least there’s a plan: capability grants can restrict who can make that call. The conductor doesn’t enforce this yet – it treats all client calls over the local WebSocket RPC interface as authenticated – but soon it will be enforced. Here are the three different kinds of capability grants your user can protect their zome functions with:

Unrestricted: anyone from anywhere can call it – local clients on your device, other DNA instances (cells), or other nodes on the same network.
Transferrable: only those who can produce the right capability secret are allowed to call the function. This is similar to OAuth2 bearer tokens.
Assigned: callers must produce the right capability secret and prove possession of the right private key, as proven by their signature on the function call.
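
The three grant types can be modelled roughly like this. It is a sketch of the decision logic only, not the conductor’s implementation: `Grant`, `Access`, and `is_authorized` are hypothetical names, and signature verification is reduced to a boolean that the caller of the function is assumed to have computed with a real crypto library.

```python
import hmac
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class Access(Enum):
    UNRESTRICTED = "unrestricted"
    TRANSFERABLE = "transferable"
    ASSIGNED = "assigned"


@dataclass(frozen=True)
class Grant:
    access: Access
    secret: Optional[bytes] = None             # capability secret, if any
    assignees: FrozenSet[str] = frozenset()    # public keys, for ASSIGNED


def is_authorized(
    grant: Grant,
    secret: Optional[bytes] = None,
    caller_key: Optional[str] = None,
    signature_ok: bool = False,
) -> bool:
    """Decide whether a call satisfies the grant protecting a function."""
    if grant.access is Access.UNRESTRICTED:
        return True
    # Both remaining types require the right capability secret.
    if grant.secret is None or secret is None:
        return False
    if not hmac.compare_digest(grant.secret, secret):
        return False
    if grant.access is Access.TRANSFERABLE:
        return True
    # ASSIGNED: the caller must also be a named key with a valid signature.
    return signature_ok and caller_key in grant.assignees
```

`hmac.compare_digest` is used for the secret comparison so that the check takes the same time whether or not the secrets match.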

As an app dev it’s your job to design all the ways that a user can grant/deny access to their functions. As yet I don’t know how a UI might ask Holochain to have its public key registered as a valid client, but I suspect that the conductor will pop up a little dialog saying “Did you approve this UI?”

Anyhow, I think that a combination of good DNA design and assigned grants can protect against the malware attack vector. You’re right that any app or network faces this challenge, and the level of damage an attacker can do is relative to the user’s privilege in the app.

I feel like there’s a slight extra risk with Holochain, in that the attacker gains full access to all the shared DHT data, including IP addresses of other DHT members. This is comparable to a system-level attack for a centralised service. This will be mitigated somewhat by the fact that all DHT and source chain data will soon be encrypted at-rest so the attacker would have to intercept the user’s keystore-unlocking password in order to actually get at that data.

Now, re: your other question:

For now, yes, but there are a number of devs actively building Holochain apps who will want to also build native mobile apps that use Holo (at least until we can manage to compile Holochain for mobile and figure out the delicacies of battery usage for an app that expects itself to be always connected to the DHT). So I believe/hope that Holo can be integrated into, say, a Flutter or React Native app, and that we’ll be building the plumbing to make that happen.

pqcdev · December 25, 2020, 6:05am

Awesome man, super helpful answer. I think biometrics will be important. Is there a way for a DNA to require fingerprint login? Guess I just know nothing about how that would work on the traditional API end. It would be instrumental to one of the dapps I’m building if I can reduce Sybil attacks to near 0.

Anything built for an admin approval / KYC process? ie link source chain to person’s ID. Require verification for login, etc.

What libraries would people like to see right now?