Holochain Forum

Holochain storage app like Dropbox

I’m new to Holo and Holochain app development. If I would like to create an app to simply upload/download files, is that possible right now with Holochain? How much storage space could it have access to?

Thanks!

Yes, this is a project that is in its infancy right now. More here: File Storage Zome

3 Likes

In response to how much storage space it could have access to… as much storage space as its users are willing to give it, divided by the resilience factor of the app + 1 (the amount of redundancy that each piece of data is expected to have, plus the original copy stored by the uploader, because for every file you upload, it’s still taking up space on your own machine). So let’s say we have ten users, each with 500 GB of storage. The resilience factor of the app is 2. How much storage do we collectively have?

500 * 10 / (2 + 1) = 1.67 TB

About 1.67 terabytes collectively. Which isn’t so much, considering each person chipped in almost a third of that. But you get massive redundancy, and you can share your uploads with others. There are other, possibly better ways to do this: you could chunk the data into pieces, then use some sort of error correction. This doesn’t result in any gains in storage space efficiency, but it does make the data way more resilient to hardware failures and offline nodes. That’s what a company called Cubbit is doing with their personal backup devices. It makes the data 50% bigger, but you can then drop down the resilience factor to 1, so each of the two copies (yours plus one redundant copy) takes 1.5 times the original size. (Check out the technology page; scroll down to ‘Redundancy’.) So in that case:

500 * 10 / (1.5 + 1.5) = 1.67 TB
1 Like
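
For anyone who wants to play with the numbers, here is a minimal sketch of the two calculations above. The function and parameter names are made up for illustration, and it assumes every copy (the uploader's and the redundant ones) grows by the same size multiplier when error correction is used.

```python
def usable_capacity_gb(users, gb_per_user, resilience_factor, size_multiplier=1.0):
    """Usable shared storage in GB.

    Each file exists as the uploader's copy plus `resilience_factor` extra
    copies, and every copy is `size_multiplier` times the original size
    (an assumption made for this sketch).
    """
    raw_gb = users * gb_per_user
    copies = resilience_factor + 1
    return raw_gb / (copies * size_multiplier)


# Plain replication: ten users, 500 GB each, resilience factor 2.
plain = usable_capacity_gb(10, 500, resilience_factor=2)

# Error-corrected chunks: data is 50% bigger, resilience factor 1.
coded = usable_capacity_gb(10, 500, resilience_factor=1, size_multiplier=1.5)

print(f"{plain / 1000:.2f} TB, {coded / 1000:.2f} TB")
```

Both cases divide the raw 5000 GB by 3, which is why the error-corrected setup gains resilience rather than space.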

I just voiced some thoughts about file storage on Holochain over here. But it might be better to address the issue in this thread, since it is about the exact same topic.

The issue I have is with the fact that a Holochain DHT never forgets. Files can be marked as deleted, but will remain on the DHT for eternity (or until a new version of the hApp is released and chooses to leave behind files that have been marked as deleted).

So my question is this:

Could there be a purely Holochain solution that allows for the true deletion of unwanted / outdated content?

Say we have two DHTs: one monotonic DHT for all the metadata (publishing / updating / deleting entries) and one non-monotonic DHT for the actual content. Then specify in the hApp DNA that items can be removed from the data DHT ONLY IF a corresponding delete request has been posted to the metadata DHT.

That way, we would make sure that all relevant data is accessible, while unwanted data can safely be forgotten. Not only would this save a lot of storage space, it would also allow illegal content to be removed.
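
To make the rule concrete, here is a rough sketch in plain Python. It is not Holochain code: `MetadataLog` and `ContentStore` are hypothetical stand-ins for the two DHTs, and the only thing it shows is the check "no delete request in the metadata log, no removal from the content store".

```python
import hashlib


class MetadataLog:
    """Append-only (monotonic) record of publish and delete requests."""

    def __init__(self):
        self._entries = []

    def append(self, action, content_hash):
        # Entries are only ever added, never removed.
        self._entries.append((action, content_hash))

    def has_delete_request(self, content_hash):
        return ("delete", content_hash) in self._entries


class ContentStore:
    """Non-monotonic store: content may be dropped, but only by the rule."""

    def __init__(self, metadata):
        self._metadata = metadata
        self._blobs = {}

    def put(self, data):
        content_hash = hashlib.sha256(data).hexdigest()
        self._blobs[content_hash] = data
        self._metadata.append("publish", content_hash)
        return content_hash

    def forget(self, content_hash):
        # Rule from the proposal: removal requires a posted delete request.
        if not self._metadata.has_delete_request(content_hash):
            raise PermissionError("no delete request in the metadata log")
        self._blobs.pop(content_hash, None)

    def holds(self, content_hash):
        return content_hash in self._blobs


log = MetadataLog()
store = ContentStore(log)
file_hash = store.put(b"holiday photos")

log.append("delete", file_hash)  # the author asks for deletion
store.forget(file_hash)          # now the holder may really drop the bytes
print(store.holds(file_hash))
```

The metadata log still remembers that the file existed and was deleted; only the content itself is gone.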

@pauldaoust, I remember us discussing the topic of file storage a year back. Do you think something like my above proposal could work?

2 Likes

At the moment, you are correct that files cannot be truly deleted. However, we are planning to introduce “garbage collection” that will allow entries to be truly deleted if the rules of the DNA are followed.

3 Likes

That’s exciting news!

@philipbeadle oh! That is exciting news! I’m guessing it has something to do with respecting the remove/update directives/aspects/metadata as they come in for an entry?

@jakob.winter I also have an idea in my head about single-blob DHTs that get destroyed when the blob is obsolete. But this feels kinda resource-heavy and is only useful for true deletion in the absence of garbage collection.

(But it is still useful for controlling access to a single resource or for setting up an incentive/compensation system for people who want to help seed your data!)

@pauldaoust what about single-blob DHTs? Can you elaborate, please? I’m thinking of something different: we make another DNA only for deletion. When an author deletes an entry, it sends a signal to this “delDNA” to make the deletion public. After that, the validating agents won’t store or replicate the entry any further, and if they saved it earlier, they delete it from their storage. That means that every time a node validates an entry in order to make a copy of it, it first checks whether the entry is in the delDNA. When an entry becomes obsolete, it is removed from this.

I like that idea, Paul. I’ll bring it up tomorrow. We had a good talk about file storage hApps today and decided that the DHT would point to how to retrieve the file. A file DHT would mean that someone who wants the file would also host it when they join the DHT.

1 Like

Also how about we just build a file system straight on a DHT and see what happens??

Do we really want to delete out of date info?

let’s find out

Just to clarify: I think it is vital that all agents tasked with storing a certain file actually do so.

Because if only people who “want” a file are the ones storing it, then what if at some point no one wants it and everyone deletes it?

So it’s important that agents only delete a file when they are “allowed” to do it:

  • Because the file has been marked as deleted by the owner/author

  • Because we’ve reached a higher than necessary redundancy level
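
As a small sketch, the two conditions could be checked like this. The `FileStatus` fields are hypothetical, and the sketch simply assumes an agent can somehow learn how many copies currently exist, which is the hard part in practice.

```python
from dataclasses import dataclass


@dataclass
class FileStatus:
    marked_deleted: bool  # the owner/author has marked the file as deleted
    live_copies: int      # copies currently held in the network (assumed known)
    required_copies: int  # redundancy level the app asks for


def may_delete_local_copy(status: FileStatus) -> bool:
    """Return True if this agent is 'allowed' to drop its copy."""
    if status.marked_deleted:
        return True
    # Redundancy is higher than necessary only if the required number
    # of copies still exists after ours is gone.
    return status.live_copies - 1 >= status.required_copies


print(may_delete_local_copy(FileStatus(False, live_copies=3, required_copies=3)))
print(may_delete_local_copy(FileStatus(False, live_copies=4, required_copies=3)))
```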

1 Like

I just came across a crowd funding project that promises cloud storage. They had some nice features which I would like to mention here, in case a similar solution gets built on Holochain:

  • High download speed thanks to simultaneous connections:
    By having a redundancy level of say 10/30 instead of 1/3 (so you need 10 of 30 nodes online instead of 1 of 3) not only do you make a critical outage mathematically less likely, you also allow for parallel downloads from several nodes

  • Speedy Up-/Download in local networks
    If your phone is logged into the same local network as your HoloPort, upload is super fast (via Wifi). The port will do the syncing to the cloud afterwards.

  • Get what you serve
    You get as much cloud storage as you provide, divided by the redundancy factor you desire. So I could set up one 2TB cloud drive with a redundancy factor of 2 by providing 4TB to the network and another 500GB cloud drive with a redundancy factor of 4 by providing another 2TB.
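
The claim in the first point, that 10 of 30 is less likely to suffer a critical outage than 1 of 3, can be checked with a short calculation. This is only a sketch: it assumes nodes go offline independently of each other, and the 90% uptime figure is an invented example value, not something from that project.

```python
from math import comb


def outage_probability(k, n, p_online):
    """Probability that fewer than k of n independent nodes are online."""
    p_offline = 1 - p_online
    return sum(
        comb(n, i) * p_online**i * p_offline ** (n - i)
        for i in range(k)
    )


p = 0.9  # assumed chance that any single node is online

print(f"1 of 3:   {outage_probability(1, 3, p):.1e}")
print(f"10 of 30: {outage_probability(10, 30, p):.1e}")
```

Both schemes store three times the original data, so under these assumptions the extra safety comes from spreading the pieces over more nodes rather than from using more space.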

Tiered trust networks are also going to be a hot feature for any file or data storage:

  1. Personal devices with full redundancy.
  2. Trusted family+friends with high redundancy.
  3. Community+orgs with some redundancy.
  4. Countries with some redundancy.
  5. Global with minimum redundancy.
1 Like

@premjeet when I say single-blob DHTs, I’m thinking of one DHT per file — that is, the only purpose of DHT X is to store file Y. The people who want that file, join its DHT. With this pattern, we get:

  • per-file granular access control
  • ‘human’ garbage collection (if you’re not interested in it anymore, leave the DHT)

Hope that explains my thinking. With your idea, are you picturing this is possible right now, or later when/if Holochain adds a removal feature?

I think I have a guess about scenarios you’re thinking of, but could you share them? I can see this as necessary for some scenarios and unnecessary for others.

I like the stuff you shared about the other project. It’s not Cubbit, is it? Particularly I like the ‘get what you serve’; feels like a great cooperative game theory economic.

Well, it’s just that we would have to figure out a good way of determining who’s allowed to delete their copy of a file. Take as an example a music streaming service.

My node is tasked with storing a certain album. Other people like the album and listen to it on a regular basis. So it’s better for them to keep a copy in local storage. That would give me the opportunity to delete the file from my system, since there are enough copies around.

But what if people grow weary of the album and start deleting it? Is it a “whoever holds the data last must keep it” scenario? Then people might start to delete files as quickly as possible.

Or do we introduce a streaming currency? You get credits for providing data and spend credits for requesting data. If in question, the agent with more credits is allowed to delete the files… and when additional agents are needed to store a file, those with the lowest credit balance get picked…

Surely we could find some fair solutions for that problem.
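
A toy version of the streaming-currency idea might look like the sketch below. `CreditLedger` and its methods are hypothetical names, one credit per transfer is an arbitrary choice, and a real hApp would need signed, validated transactions rather than an in-memory dictionary.

```python
from collections import defaultdict


class CreditLedger:
    def __init__(self):
        self.balance = defaultdict(int)

    def record_transfer(self, provider, requester, credits=1):
        """Provider earns credits for serving data; requester spends them."""
        self.balance[provider] += credits
        self.balance[requester] -= credits

    def may_delete(self, holders):
        """Among current holders, the one with the most credits may delete."""
        return max(holders, key=lambda agent: self.balance[agent])

    def pick_new_holders(self, candidates, count):
        """Agents with the lowest balances get picked to store the file."""
        return sorted(candidates, key=lambda agent: self.balance[agent])[:count]


ledger = CreditLedger()
ledger.record_transfer(provider="alice", requester="bob")
ledger.record_transfer(provider="alice", requester="carol", credits=2)

agents = ["alice", "bob", "carol"]
print(ledger.may_delete(agents))
print(ledger.pick_new_holders(agents, 2))
```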

Yep, that’s the one.

Thanks for fleshing out a use-case — in this scenario I can totally see the rationale for some sort of controlled garbage cleaning / preservation. @dhtnetwork and I were chewing on this in the springtime for someone we consulted with, and we came to a similar conclusion as you: incentivising storage would probably help nicely. We imagined something where the creator/publisher of the song would pay those who hold the credits, but I really really like the idea of them just being exchanged among downloaders and hosters. This works nicely with mutual credit principles!

It’d be interesting to work out the protocol for proving storage and delivery. The ‘proof’ part of proof-of-storage is easy: all you need to do is periodically send a message to an agent who claims to be holding a chunk, along with a ‘salt’ that they’re expected to concatenate with the chunk they’re holding, then produce a hash from the salted chunk. If their result matches what you expect, you know they’re holding it. The part I haven’t quite figured out is: who asks for that proof? I’m guessing the person who’s got the most interest in seeing it survive; in this case that’d probably be the creator/publisher of the song.
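
Here is a minimal sketch of that salted-hash challenge. SHA-256 is my own choice of hash, and `ask_holder` is a hypothetical callable standing in for the network message to the holder. Note that the challenger needs the chunk itself (or salt/hash pairs computed in advance) to know what result to expect.

```python
import hashlib
import hmac
import secrets


def respond_to_challenge(salt: bytes, chunk: bytes) -> str:
    """Holder's side: hash the salt concatenated with the stored chunk."""
    return hashlib.sha256(salt + chunk).hexdigest()


def verify_holder(chunk: bytes, ask_holder) -> bool:
    """Challenger's side: send a fresh random salt and compare the answer."""
    salt = secrets.token_bytes(32)  # fresh salt, so old answers can't be replayed
    expected = hashlib.sha256(salt + chunk).hexdigest()
    return hmac.compare_digest(ask_holder(salt), expected)


chunk = b"first chunk of the album"


def honest_holder(salt):
    return respond_to_challenge(salt, chunk)


def cheating_holder(salt):
    # Pretends to hold the chunk but has thrown it away.
    return respond_to_challenge(salt, b"")


print(verify_holder(chunk, honest_holder))
print(verify_holder(chunk, cheating_holder))
```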

Proof-of-streaming is a bit more tricky. If anyone has some clever ideas, I’m all ears.

Another thought is that maybe that file should disappear if people aren’t interested in it anymore. It’s kind of a harsh Darwinian setup, but it might work. The risk there is that things that are intrinsically valuable might not have their value recognised, and then society would lose them. But there’s one person who would always be interested, as long as they were alive: the creator of the work itself. If they care about it enough, they could have an always-online node.

This might also be the work of a group like the Internet Archive — they might keep files alive by running DHT instances.

There are more things to think through. The above provable/incentivisable storage kind of runs against the DHT-in-commons idea, so you’d be building your own stuff on top of the existing Holochain tools to make it work. Otherwise you might get the ‘virtuous’ nodes who store an entry because it’s in their DHT neighbourhood vs the ‘incentivised’ nodes who store it because they want compensation :man_shrugging:

2 Likes

Probably the party intending to delete their copy. They have to make sure there are enough copies out there before they are allowed to delete theirs.

Only if the creator marks a file as deleted does the minimal redundancy go to zero, so anybody still holding a copy could delete it.

Yeah, that’s a tricky one and will probably depend hugely on the hApp in question. I am curious what kinds of models will surface in future hApps.

1 Like

So, in a single-blob DHT, a file can be accessed by its ‘DHT X’, not necessarily by ‘file Y’.

1 Like

Correct!