Holochain Forum

Conditionally adding data to your node

I see a lot of information about the DHT, but I’m wondering if there’s a pattern for conditionally adding something to your node, so that you don’t end up sharing content you don’t want to share. For example: you post a blog, I like your blog, and I opt to share your blog from my node.

Does this pattern exist?

Good question! Nothing very solid right now, but I do have a few ideas. To make sure I understand you correctly, you’re talking about choosing what pieces of the DHT to replicate and make available to others; is that right? And I’m guessing the subtext is that maybe I don’t want to hold your 2GB vacation video just because I’m the DHT node that got elected to store it, or maybe your stuff is illegal or against my ethics.

A few thoughts that come to mind:

  • Anyone who asks for a DHT entry is actually holding it in their shard, but because it’s outside of their advertised ‘indexing arc’ (the range of addresses they’ve committed to be responsible for) it’s not as easy for other nodes to discover that they hold it. Perhaps they could advertise by linking from the content to their own agent ID, and committing to not clear out their cache for n days (where n is in the link’s ‘tag’).
  • The author publishes the blog as a private entry, adds a ‘stub’ to the DHT (this might be as simple as a hash, or as big as the title and intro paragraph). Anyone who wants the article can get it from the author by direct message. Once they get it, they re-publish it as a private entry on their own chain and advertise that they’re sharing it by linking from the stub to their own agent ID. Cons: it’s not great for discoverability, you can’t un-share an entry, and the author has to stay online for longer, ‘seeding’ their content.
  • Plausible deniability: encrypt all public data so nobody knowingly shares evil stuff. I don’t think this is even slightly reassuring to people with ethical qualms though.
  • Spin up mini-DHTs for each piece of content, with a high (100%) redundancy factor. To read or share the content, all you do is join the DHT. This is the same as the Throwaway DHT pattern, but serves a different purpose. Pros: you can seed and un-seed an article at will, and discovery of other seeders requires no extra logic beyond what Holochain provides. Cons: each new DNA instance is a performance burden on your machine, and the author still has to stay online for a while to do the initial seeding.

None of these ideas are particularly satisfying to me as a one-size-fits-all solution, but I feel like there are the seeds of some good ideas in there. Would love to see what these ideas inspire.

I would love to know if you’ve been chewing on any ideas of your own @rlkel0; this is unexplored territory!

It sounds like we’re on the same page.

I think the last example you gave makes the most sense, but I’m not very familiar with interoperability. One of the features of Holochain that I think would be useful on a lot of community-based projects is the ability to vote with your node. For example, if I wanted to make a decentralized package manager for Python, I could make the packages, with their hashes, members of the DHT signed by the authors, but then each node would only distribute the packages it cares about. So when a malicious package comes around, it would only be distributed by nodes that opted in to sharing it. So nodes are actually increasing availability of content by choosing what content they like.

The vacation video example is also really great; it’s like drawing the short straw.

I think it makes sense to create something like the Throwaway DHT pattern, except that the main chain would store hashes of the content and some identifier for the side chains, and users would choose which of these side chains to opt into. Do you have a minimal example of building something with two interoperable chains that I could look at, so I can dig into this a bit further?
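To make the idea concrete, here is a toy model of it in plain Python. It is only a sketch and uses no Holochain code at all: `Listing`, `Node`, `make_listing`, the `side-` identifier format, and the choice of SHA-256 are all made up for illustration. It shows the main listing holding a content hash plus a side-chain identifier, and a node serving only the content it has opted into.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Listing:
    """What the main chain would hold: a content hash plus a side-chain identifier."""
    content_hash: str
    side_dht_id: str


def make_listing(content: bytes) -> Listing:
    # Hypothetical identifier scheme: derive the side-chain ID from the hash.
    digest = hashlib.sha256(content).hexdigest()
    return Listing(content_hash=digest, side_dht_id=f"side-{digest[:12]}")


@dataclass
class Node:
    """A node that only holds content for side chains it chose to join."""
    joined: dict = field(default_factory=dict)  # side_dht_id -> content bytes

    def opt_in(self, listing: Listing, content: bytes) -> None:
        # Refuse content that does not match the hash published in the listing.
        if hashlib.sha256(content).hexdigest() != listing.content_hash:
            raise ValueError("content does not match the listed hash")
        self.joined[listing.side_dht_id] = content

    def opt_out(self, listing: Listing) -> None:
        self.joined.pop(listing.side_dht_id, None)

    def serve(self, listing: Listing):
        # Returns None for anything this node never opted into.
        return self.joined.get(listing.side_dht_id)
```

In the package manager example, a malicious package would still have a listing, but `serve` would return nothing on every node that never called `opt_in` for it.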

I’m sorry, I don’t have any minimal example, but the main DHT + side DHTs is exactly what I’m picturing :+1: Here are a few resources that might help:

  • Each of the side DHTs’ DNAs can be templated from a single DNA. You ‘fork’ the DNA by passing in either a new UUID or some ‘properties’ (arbitrary key/value pairs) that get mixed into the DNA’s properties. I would imagine that the property would be something like "content_being_tracked": "<hash_of_content>". That way, other people can reconstruct the same forked DNA by passing in the same property. They would get that property’s value from your hash-based content identifier in the main DHT.
  • The user’s UI has to take responsibility for forking and instantiating these side DNAs so they can create or join the DHT (depending on whether they’re author or consumer), and then the UI can talk to the main DNA to register a DHT. Check out the admin API function reference, which sorely needs better documentation than a comments block in a source file :cry:
    1. admin/dna/install_from_file to fork the existing blob storage DNA for a particular blob
    2. admin/instance/add to create or join that blob’s network
    3. admin/interface/add_instance to add the newly created instance to the WebSocket RPC interface that the main DNA is currently listening on
    4. admin/instance/start to fire up the instance and become part of the network (you might want to get clever and start/stop instances based on whether the user actually wants the blob in question, but that harms overall resilience/seeding of the blob, and may have a startup performance hit as the instance connects to other nodes in the DHT).
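For anyone following along, the four steps above can be sketched as JSON-RPC 2.0 requests. This is only a sketch: the method names come from the list above, but every parameter name (`id`, `path`, `properties`, `dna_id`, `agent_id`, `interface_id`, `instance_id`) and the ID formats are assumptions for illustration, so check them against the admin API reference before relying on them. It only builds the payloads; actually sending them over the WebSocket is left out.

```python
def rpc_request(request_id, method, params):
    """Build a JSON-RPC 2.0 request envelope as a plain dict."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def side_dht_requests(content_hash, dna_path, agent_id, interface_id):
    """Return the four admin calls, in order, for one blob's side DHT.

    All parameter names and ID formats here are hypothetical.
    """
    dna_id = f"blob-dna-{content_hash}"
    instance_id = f"blob-instance-{content_hash}"
    steps = [
        # 1. Fork the blob storage DNA by mixing in the content hash as a property.
        ("admin/dna/install_from_file", {
            "id": dna_id,
            "path": dna_path,
            "properties": {"content_being_tracked": content_hash},
        }),
        # 2. Create or join that blob's network.
        ("admin/instance/add", {
            "id": instance_id, "dna_id": dna_id, "agent_id": agent_id,
        }),
        # 3. Expose the new instance on the interface the main DNA uses.
        ("admin/interface/add_instance", {
            "interface_id": interface_id, "instance_id": instance_id,
        }),
        # 4. Start the instance so it becomes part of the network.
        ("admin/instance/start", {"id": instance_id}),
    ]
    return [rpc_request(i, method, params)
            for i, (method, params) in enumerate(steps, start=1)]
```

Because the forked DNA is identified by the content hash, an author and a consumer who pass the same hash build the same set of requests and should end up in the same side DHT.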

This is interesting. I read through this, and it looks like I was misunderstanding how a node works. Holochain appears to encourage a user to run a single node with a bunch of different instances. And an instance is an agent with DNA? So if I was using an app, it could have a bunch of different DNAs, and my agent could join and leave them as it sees fit?

So it seems like there isn’t interoperability, but the conductor facilitates having a variety of instances and it’s the job of the application to manage the integration with the conductor?

Does that make sense?

Just in case anyone is lurking, I found this helpful https://developer.holochain.org/docs/glossary/#agent

So from reading this:

Conductor

The service that lives on a user’s device and hosts all of their DNA instances, stores their data, and handles network communication between their instances and other users’ instances.

So I misunderstood the role of the conductor and DNA. I think this was the missing link. A holochain DAPP could potentially consist of many DNA strands, each with their own unique architecture, and a single DNA that unifies them. Very cool, I think this makes a lot more sense to me.

Yup, you’ve got it nailed down :clap: this is a foundational piece of knowledge for understanding Holochain, and so far we haven’t done a great job of explaining it. So I’m glad you discovered it.

One thing I will say, though, is that there is built-in interoperability: running DNA instances can talk directly to each other without the mediation of the UI. We call it ‘bridging’, where we set up an explicit communication channel whereby one DNA instance can call another instance’s ‘zome functions’ (API).

The reason I described the UI handling a lot of the cross-DNA communication in this scenario is that the UI is allowed to install new DNAs, instantiate/start them, and create bridges between them, whereas DNAs are not. Once those DNAs are instantiated and bridged, though, they can talk directly to each other. (Right now you can only define a one-way bridge to avoid circular dependencies, though we aren’t necessarily going to keep this as a hard requirement.)

So if the UI and the conductor can both facilitate cross-talk between an agent’s running DNA instances, why choose one over the other?

  • Doing it through the UI is more flexible; you don’t have to go through the rigamarole of asking the conductor to set up bridges.
  • Doing it via bridging is more trustable; DNAs that talk directly to each other through the conductor’s internal plumbing can trust that they’re talking to a real DNA instance and can depend on the truthfulness of its responses.

I just figured out how the conductor works; it’s so much easier than I thought. I wrote a Python wrapper for it to make my life a little easier. It took me a while to crack the code on jsonrpc_core, but it’s surprisingly simple to add and remove zomes.

I guess the next questions are:

  1. Is there any built-in auth?
  2. Is there a limit to the number of DNA instances that a conductor can handle? I will test this, but I’m curious what you’ve found.

  1. Depends on what you mean by auth:
    1. System authentication (login) isn’t needed; you self-authenticate by proving that you possess the private key whose matching public key is in your agent ID entry (the second entry in your chain for a given DNA).
    2. Human authentication (who am I, and can I prove it?) will be different for every app. Nothing built-in, although there is an Identity Manager DNA that lets users store their own personal information and supply it to other apps.
    3. Authorisation (am I allowed to do this thing?) Two things:
      1. Joining a network and accessing public data: The second entry in your source chain for a given app is your agent ID — it contains your public key and potentially other stuff (e.g., an invite code). As an entry, it has a validation function; if that function fails, you’re denied membership. Not all of the plumbing is there (yet): there’s no way to pass data into the agent ID entry at startup time. So all you can do right now is validate based on the agent’s public key — things like “is this key in the whitelist?”
      2. Writing, editing, and deleting public data: Every type of entry has a validation function; you can use it to decide whether someone’s allowed to perform an action. No built-in support for easier-to-use primitives like roles or ACLs, but I hope to build some drop-in libraries that build on validation functions.
  2. Not sure what the limits are yet :slight_smile: I know the core team has been doing a pile of testing on this — both how many instances can a conductor handle, and how many nodes can a network handle — but I don’t know the concrete results. And they’ll keep changing cuz we’re starting to focus on optimisation in a big way.
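To illustrate the two authorisation points above, here is a rough sketch of the decision logic in plain Python. Real validation functions live inside the DNA and are not written like this; the function names, the action strings, the example keys, and the "only the original author may edit or delete" rule are all assumptions made up for the example.

```python
# Hypothetical whitelist of agent public keys allowed to join the network.
ALLOWED_KEYS = frozenset({"HcExampleKeyAlice", "HcExampleKeyBob"})


def validate_agent_id(agent_public_key, allowed=ALLOWED_KEYS):
    """Membership check: 'is this key in the whitelist?'"""
    return agent_public_key in allowed


def validate_entry_change(action, author_key, original_author_key=None):
    """Per-entry check: decide whether this agent may perform this action.

    Assumed rule for illustration: any member may create an entry, but only
    the original author may update or delete it.
    """
    if action == "create":
        return True
    if action in ("update", "delete"):
        return author_key == original_author_key
    # Unknown actions are rejected rather than silently allowed.
    return False
```

Roles or ACLs would be layered on top of checks like these, for example by looking the author's key up in a role table instead of comparing it to the original author.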

Thanks for the detailed response!

One thing you didn’t cover, though: the JSON-RPC client. It feels insecure that I can add and remove instances without providing any kind of authentication. I imagine I would need to put a server in front of it so that people could query my node but couldn’t make changes remotely.

heh, yeah. I would never expose the JSON-RPC socket on an ethernet interface! The owner of the node should be the only one making calls of any sort, whether they’re to a running DNA’s API or to the conductor API. The only time someone else should be able to make function calls on your DNA instance is if they’re part of the DHT themselves and are contacting you through node-to-node messaging. At that point you can check their public key and deny their request (they’re required to supply their public key whenever they send you a message, and your receive callback can access it).

There are some gray areas here — for instance, as a HoloPort owner I can load up a UI over my local network, which does make zome calls — but then, I’m the owner of the HoloPort, my laptop, and the network plumbing. Another example is some theoretical P2P weather station app; I might want to expose my DNAs’ APIs (but not the admin API) to all the temp and wind sensors in my yard. Again over the localnet only, and probably with some checks — putting a server in front of it that checks those sensors’ credentials would be a great idea.
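As a rough sketch of what that server in front of the socket might check, here is a small screening function in Python. The `admin/` prefix comes from the admin method names earlier in this thread; everything else is an assumption for illustration, including the shared-token scheme for sensors and the error codes -32000 and -32001 (picked from the range JSON-RPC 2.0 reserves for implementation-defined server errors). It decides whether to forward a request; the actual forwarding is left out.

```python
import hmac
import json

ADMIN_PREFIX = "admin/"  # prefix of the admin methods named in this thread


def _error(req_id, code, message):
    """Build a JSON-RPC 2.0 error response."""
    return {"jsonrpc": "2.0", "id": req_id,
            "error": {"code": code, "message": message}}


def screen_request(raw, presented_token, valid_tokens):
    """Return (True, request) to forward, or (False, error_response) to reject."""
    try:
        req = json.loads(raw)
    except json.JSONDecodeError:
        return False, _error(None, -32700, "Parse error")
    if not isinstance(req, dict) or not isinstance(req.get("method"), str):
        return False, _error(None, -32600, "Invalid Request")
    req_id = req.get("id")
    # Constant-time comparison, so timing does not leak how close a guess was.
    token_ok = any(
        hmac.compare_digest(presented_token.encode(), token.encode())
        for token in valid_tokens
    )
    if not token_ok:
        return False, _error(req_id, -32000, "Unknown sensor credential")
    # Never let a remote caller reach the conductor's admin API.
    if req["method"].startswith(ADMIN_PREFIX):
        return False, _error(req_id, -32001, "Admin methods are not exposed")
    return True, req
```

A stricter version would allow only an explicit list of zome functions rather than denying by prefix, since a deny-list silently passes anything it has not heard of.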

One thing that does concern me, though, is malicious UIs on the user’s own device. How does the user grant/deny access similar to app permission dialogues in iOS, Android, and OAuth? I don’t know that we’ve explored that fully, and to me it’s worth serious consideration.