Throwaway DHT

pauldaoust · October 3, 2019, 8:57pm

Problem

Not all historical data is interesting, but a Holochain DHT never deletes an entry. This can cause the DHT’s storage requirements to increase over time and overburden people’s devices.

Solution

Create child DHTs for temporary content storage. Allow them to be created and destroyed ad-hoc. Offload such content from a more persistent DHT to these child DHTs, referencing it by address. When people lose interest in the data, they leave the DHT. When the last person leaves, the data disappears.

Implementation

Create a DNA whose only purpose is to store the kinds of ephemeral data that your more persistent DHT needs. This DNA can be used as a template for as many short-lived DHTs as are needed. Each separate DHT is created by making a new DNA from this template, changing one insignificant parameter such as the UUID or a special value in the properties section of the DNA. The persistent DHT understands how to reference data in DHTs created from this DNA, using a scheme like (dna_uuid, address_hash) for foreign addresses.
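As a rough illustration of the (dna_uuid, address_hash) idea, here is a minimal Python sketch of a foreign address that the front-end could store and pass around. The class name, the `:` separator, and the example values are all hypothetical; nothing here is part of the Holochain API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignAddress:
    """Points at an entry that lives in a throwaway DHT (hypothetical helper)."""

    dna_uuid: str      # UUID that distinguishes this throwaway DNA from its siblings
    address_hash: str  # address of the entry inside that DHT

    def encode(self) -> str:
        # Assumption: neither part contains ':', so it is safe as a separator.
        return f"{self.dna_uuid}:{self.address_hash}"

    @classmethod
    def decode(cls, text: str) -> "ForeignAddress":
        dna_uuid, sep, address_hash = text.partition(":")
        if not sep or not dna_uuid or not address_hash:
            raise ValueError(f"not a foreign address: {text!r}")
        return cls(dna_uuid, address_hash)


# Made-up values, purely for illustration.
example = ForeignAddress("0b1c2d3e-0000-4000-8000-123456789abc", "QmExampleEntryHash")
round_tripped = ForeignAddress.decode(example.encode())
```

The persistent DHT would store only the encoded string; resolving it means finding the local instance whose DNA has that UUID and asking it for the entry at that address.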

Currently (as of holochain v0.0.30-alpha6) there’s no way for an agent’s persistent DNA instance to talk directly to their throwaway DNA instances. This is because:

A DNA instance can’t bridge to multiple child DNA instances unless it knows exactly how many it needs and what their names are, which thwarts our need for ad-hoc DHT creation.
A DNA instance can’t add bridges after it’s been created.

This means the front-end has to take responsibility for storing and retrieving data in the throwaway DHTs on behalf of the persistent DNA instance.

Introduce a mechanism for recognising when a DHT can be thrown away. For instance, if an entity is marked as deleted in the persistent DHT, the corresponding data in the throwaway DHT can be marked as deleted as well. If you keep track of what data exists and what data has been deleted in the throwaway DHT, you can ‘garbage collect’ the DHT (instruct the conductor to remove the instance and delete the DNA) once all tracked entries are marked as deleted.
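The bookkeeping for this could be as small as the following Python sketch, which would live in the front-end. The class and method names are hypothetical, and the actual removal of the instance and DNA is left out because it depends on the conductor.

```python
class ThrowawayTracker:
    """Tracks which entries in one throwaway DHT are still wanted (hypothetical helper)."""

    def __init__(self) -> None:
        self._live: set[str] = set()
        self._deleted: set[str] = set()

    def track(self, address: str) -> None:
        # An entry that was already marked deleted stays deleted.
        if address not in self._deleted:
            self._live.add(address)

    def mark_deleted(self, address: str) -> None:
        self._live.discard(address)
        self._deleted.add(address)

    def can_collect(self) -> bool:
        # Collectable only if something was tracked and all of it is now deleted.
        # A brand-new, still-empty DHT is deliberately not collectable.
        return bool(self._deleted) and not self._live


tracker = ThrowawayTracker()
tracker.track("QmAttachmentA")
tracker.track("QmAttachmentB")
tracker.mark_deleted("QmAttachmentA")
still_needed = not tracker.can_collect()  # one entry is still live
tracker.mark_deleted("QmAttachmentB")
ready_to_remove = tracker.can_collect()   # everything tracked is deleted
```

Once `can_collect()` returns true, the front-end would ask the conductor to remove the instance and delete the DNA.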

It’s easy to create a new DHT with the same rules as an existing DHT by changing an insignificant detail in the DNA, such as the UUID or a value in the properties section. This ‘forks’ the DNA. You can do this three ways:

If you’re a developer, you can pass a new properties JSON object to hc package using the -p flag.
If you’re an end-user, you can set the uuid property in the dnas config section for a DNA, creating many DNAs from the same file.
The reference conductor’s admin API function admin/dna/create_from_file lets you install a DNA file many times over, every time with a new UUID and/or set of properties. The front-end can secretly call this function whenever it needs to create new temporary data.
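For the third option, the front-end might build a request like the one in this Python sketch. It assumes the admin interface accepts JSON-RPC 2.0 messages; the method name is the one given above, but the parameter names (`id`, `path`, `uuid`, `properties`) and the example file path are assumptions, so check them against the conductor documentation for your version before relying on them.

```python
import json
import uuid


def build_install_request(dna_path, label, properties=None, request_id=1,
                          method="admin/dna/create_from_file"):
    """Build (but do not send) a JSON-RPC request that installs a forked copy of a DNA.

    Parameter names inside "params" are assumptions, not a confirmed API.
    """
    new_uuid = str(uuid.uuid4())  # fresh UUID, so the DNA hash (and the DHT) is new
    params = {
        "id": f"{label}-{new_uuid}",  # hypothetical local identifier for this copy
        "path": dna_path,
        "uuid": new_uuid,
    }
    if properties is not None:
        params["properties"] = properties
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


# Hypothetical file name; each call yields a request for a distinct throwaway DHT.
first = build_install_request("throwaway.dna.json", "attachments")
second = build_install_request("throwaway.dna.json", "attachments", request_id=2)
payload = json.dumps(first)  # what would go over the admin interface
```

Sending the payload, and then creating and starting an instance of the new DNA, is left out here because those steps depend on how your conductor's admin interface is exposed.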

Warnings

Because you need to move data through the front-end rather than a bridge, you lose some guarantees over data integrity because the front-end can’t provide the same assurance that the conductor can.
Decoupling connected concerns into separate spaces forces you to think hard about your dependency graph. It’s easy to introduce a tight two-way coupling between both DNAs; is there a way to design it such that one DNA uses signals to broadcast information so it can be ignorant of the DNAs that depend on it?
Decoupling also introduces the opportunity for data to get out of sync. Your app will have to manually manage referential integrity for related data across both DHTs.
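One way to keep an eye on referential integrity is for the front-end to periodically check which foreign references no longer resolve. The Python sketch below assumes references are (dna_uuid, address_hash) pairs and that the front-end can list the addresses it knows about in each throwaway DHT; the function name and data shapes are hypothetical.

```python
def find_dangling(references, known_addresses):
    """Return references whose target DHT or entry is not known locally.

    references: iterable of (dna_uuid, address_hash) pairs held in the persistent DHT.
    known_addresses: dict mapping dna_uuid -> set of address hashes in that throwaway DHT.
    """
    return [
        (dna_uuid, address_hash)
        for dna_uuid, address_hash in references
        if address_hash not in known_addresses.get(dna_uuid, set())
    ]


# Made-up data for illustration.
refs = [("dht-a", "Qm1"), ("dht-a", "Qm2"), ("dht-b", "Qm3")]
known = {"dht-a": {"Qm1"}}
dangling = find_dangling(refs, known)
```

What to do with a dangling reference (hide it, retry later, or mark the referring entity as deleted) is an app-level decision.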

mikeg · February 9, 2020, 10:53pm

Hi Paul, the need for this is still relevant, but I find it confusing: how can an (immutable) DHT be ‘throwaway’? What does it mean when “the last person leaves, the data disappears”?

This is sort of a broader issue, that eventually a HoloPort’s HDD will fill up. What happens then? When does Holo get ‘permission’ or a trigger to overwrite and reclaim HDD space?

pauldaoust · February 12, 2020, 10:37pm

Hey Mike. Yeah, counterintuitive indeed! The DHT only exists in the ‘minds’ (or rather devices) of those who care about it enough to participate in it. If the DHT has zero members, it effectively doesn’t exist. What sort of tooling there will be in the conductor, I don’t know — I would expect that if a DNA is removed from the conductor config, the conductor should be allowed to garbage-collect it after a while.

Re: HoloPorts and throwaway DHTs that they’re hosting… good question. How exactly do you broadcast an intention for a DHT to stop existing so that other participants (including headless ones like HoloPorts) can use that signal to free up space?

mikeg · February 12, 2020, 11:00pm

That is… pretty ephemeral. More thoughts here: How to deal with accumulation of obsolete data over time in a world with finite resources

There could be a market to serve the purpose of archive.org as the keeper of the global memory…

pauldaoust · February 12, 2020, 11:18pm

Absolutely would be a great idea. Maybe something incentivised like FileCoin — which would reduce the need to figure out how to get HoloPorts to shut down their DNA instances.

marcus · February 13, 2020, 11:55pm

Well yes @mikeg, once every node deletes its data, the data is gone. Same goes for blockchain or anything supposedly permanent. Gotta love entropy.

Thanks for bringing this to attention @pauldaoust. Definitely a question I’ve had for a long while but have held off asking until I thought I might actually need it. At meetups and other events, I’ve brought up the solution of “Oh, you can always just spin up a new DHT automatically to solve X problem”, but it’s a half-truth because I haven’t seen how this is actually done. Spinning up new DHTs for private chat rooms, for instance, definitely needs this functionality. Many other private interactions/transactions will require similar functionality.

mikeg · February 14, 2020, 12:08am

Seems a bit sacrilegious (against everything we believe about ‘immutable’ ledgers), but I guess you are both right… nothing stops people from deleting.

marcus · February 17, 2020, 9:44am

Mutual Sovereignty

pqcdev · December 27, 2020, 7:08am

Would ghost accounts be a problem here? Is there a way to somehow throw them away?

maybe all lobbies are temporary and when the last admin of a privileged DHT leaves then the rest is discarded, idk something like that

pauldaoust · February 25, 2021, 9:56pm

Sorry, I missed this question. Can you tell me more about ghost accounts and what they look like? A DHT isn’t able to discard any data; it just accumulates it (well, until/unless we get ‘true’ DHT purge operations in the future – they are on the roadmap – but still, it’s P2P so it’s just a polite request for your peers to scrub the data from their devices). What I’m referring to re: the data disappearing is simply that, when there’s nobody in the DHT anymore, there’s no DHT, and hence no DHT data.

pqcdev · February 26, 2021, 2:44am

someone enters DHT and then goes afk

ie unable to get them to leave

I figure just solve this by setting a node sync (“inactive”) time limit

pqcdev · February 26, 2021, 2:49am

if you can keep me updated on the status of this, I would appreciate it

or if you want help lmk

pauldaoust · February 26, 2021, 9:10pm

Ah, got it. Well, I guess the DHT would then remain in existence for a while – the question is, could anyone still find that node? (I’m not sure how often Holochain pings the bootstrap server to say “I’m still alive”.) But the pattern above is for reducing participants’ storage burdens by allowing them to opt into and out of content-specific DHTs (either automatically or manually), so if a person chooses to keep holding all that data… well, that’s their choice.