Holochain storage app like Dropbox

Just to clarify: I think it is vital that all agents tasked with storing a certain file actually do so.

Because if only people who “want” a file are the ones storing it, then what if at some point no one wants it and everyone deletes it?

So it’s important that agents only delete a file when they are “allowed” to do it (a minimal sketch of this rule follows the list below):

  • Because the file has been marked as deleted by the owner/author

  • Because we’ve reached a higher than necessary redundancy level
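
A minimal sketch of how an hApp could encode those two rules, assuming a hypothetical local view of the file’s status (all names here are illustrative, not an existing Holochain API):

/// Hypothetical local view of one stored file; field names are illustrative only.
struct FileStatus {
    marked_deleted_by_owner: bool, // the owner/author has published a delete marker
    observed_replicas: u32,        // copies currently held across the network
    required_redundancy: u32,      // minimum number of copies the app insists on
}

/// An agent may drop its local copy only under the two conditions above.
fn may_delete_local_copy(status: &FileStatus) -> bool {
    status.marked_deleted_by_owner
        || status.observed_replicas > status.required_redundancy
}

fn main() {
    let status = FileStatus {
        marked_deleted_by_owner: false,
        observed_replicas: 7,
        required_redundancy: 5,
    };
    // 7 copies exceed the required 5, so dropping one local copy is allowed.
    assert!(may_delete_local_copy(&status));
}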

2 Likes

I just came across a crowdfunding project that promises cloud storage. They had some nice features which I would like to mention here, in case a similar solution gets built on Holochain:

  • High download speed thanks to simultaneous connections:
    By having a redundancy level of, say, 10/30 instead of 1/3 (so you need 10 of 30 nodes online instead of 1 of 3), not only do you make a critical outage mathematically less likely, you also allow for parallel downloads from several nodes.

  • Speedy Up-/Download in local networks
    If your phone is logged into the same local network as your HoloPort, upload is super fast (via Wifi). The port will do the syncing to the cloud afterwards.

  • Get what you serve
    You get as much cloud storage as you provide, divided by the redundancy factor you desire. So I could set up one 2TB cloud drive with a redundancy factor of 2 by providing 4TB to the network, and another 500GB cloud drive with a redundancy factor of 4 by providing another 2TB (see the sketch after this list for the arithmetic).
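
To put rough numbers on both points, here’s a small back-of-the-envelope sketch (my own illustration, not taken from that project): the chance that at least k of n custodians are online, assuming each node is independently up with probability p, plus the “get what you serve” quota, which is just provided space divided by the redundancy factor.

/// Binomial coefficient as f64 (fine for the small n used here).
fn choose(n: u32, k: u32) -> f64 {
    (0..k).fold(1.0, |acc, i| acc * (n - i) as f64 / (i + 1) as f64)
}

/// Probability that at least `k` of `n` nodes are online, each independently
/// online with probability `p`.
fn availability(n: u32, k: u32, p: f64) -> f64 {
    (k..=n)
        .map(|i| choose(n, i) * p.powi(i as i32) * (1.0 - p).powi((n - i) as i32))
        .sum()
}

/// "Get what you serve": usable cloud space is provided space / redundancy factor.
fn usable_space_gb(provided_gb: f64, redundancy_factor: f64) -> f64 {
    provided_gb / redundancy_factor
}

fn main() {
    // At 90% per-node uptime, needing 10 of 30 fails far less often than 1 of 3.
    println!("1 of 3 online:   {:.9}", availability(3, 1, 0.9));
    println!("10 of 30 online: {:.9}", availability(30, 10, 0.9));
    // 4 TB at redundancy 2 -> 2000 GB usable; 2 TB at redundancy 4 -> 500 GB usable.
    println!("{} GB", usable_space_gb(4000.0, 2.0));
    println!("{} GB", usable_space_gb(2000.0, 4.0));
}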

1 Like

Tiered trust networks is also going to be a hot feature for any file or data storage:

  1. Personal devices with full redundancy.
  2. Trusted family+friends with high redundancy.
  3. Community+orgs with some redundancy.
  4. Countries with some redundancy.
  5. Global with minimum redundancy.
2 Likes

@premjeet when I say single-blob DHTs, I’m thinking of one DHT per file — that is, the only purpose of DHT X is to store file Y. The people who want that file join its DHT. With this pattern, we get:

  • per-file granular access control
  • ‘human’ garbage collection (if you’re not interested in it anymore, leave the DHT)

Hope that explains my thinking. With your idea, are you picturing this is possible right now, or later when/if Holochain adds a removal feature?

I think I have a guess about scenarios you’re thinking of, but could you share them? I can see this as necessary for some scenarios and unnecessary for others.

I like the stuff you shared about the other project. It’s not Cubbit, is it? In particular I like the ‘get what you serve’ model; it feels like great cooperative game-theory economics.

1 Like

Well, it’s just that we would have to figure out a good way of determining who’s allowed to delete their copy of a file. Take a music streaming service as an example.

My node is tasked with storing a certain album. Other people like the album and listen to it on a regular basis, so it’s better for them to keep a copy in local storage. That would give me the opportunity to delete the file from my system, since there are enough copies around.

But what if people grow weary of the album and start deleting it? Is it a “whoever holds the data last must keep it” scenario? Then people might start to delete files as quickly as possible.

Or do we introduce a streaming currency? You get credits for providing data and spend credits for requesting data. When in doubt, the agent with more credits is allowed to delete the files… and when additional agents are needed to store a file, those with the lowest credit balance get picked…

Surely we could find some fair solutions for that problem.

Yep, that’s the one.

2 Likes

Thanks for fleshing out a use-case — in this scenario I can totally see the rationale for some sort of controlled garbage cleaning / preservation. @dhtnetwork and I were chewing on this in the springtime for someone we consulted with, and we came to a similar conclusion as you: incentivising storage would probably help nicely. We imagined something where the creator/publisher of the song would pay credits to those who hold the file, but I really like the idea of the credits just being exchanged among downloaders and hosters. This works nicely with mutual credit principles!

It’d be interesting to work out the protocol for proving storage and delivery. The ‘proof’ part of proof-of-storage is easy: all you need to do is periodically send a message to an agent who claims to hold a chunk, along with a ‘salt’ that they’re expected to concatenate with the chunk they’re holding, then produce a hash from the salted chunk. If their result matches what you expect, you know they’re holding it. The part I haven’t quite figured out is: who asks for that proof? I’m guessing the person who’s got the most interest in seeing it survive; in this case that’d probably be the creator/publisher of the song.
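
Here’s a minimal sketch of that challenge/response, just to make the shape concrete (nothing Holochain-specific; the hash is a stand-in, and a real protocol would use a cryptographic hash such as SHA-256 plus fresh random salts and real message passing):

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in hash. A real protocol would use a cryptographic hash (e.g. SHA-256);
/// this only illustrates the shape of the challenge/response.
fn salted_digest(chunk: &[u8], salt: &[u8]) -> u64 {
    let mut salted = salt.to_vec();
    salted.extend_from_slice(chunk); // concatenate salt + chunk, as described above
    let mut hasher = DefaultHasher::new();
    salted.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let chunk = b"...some file chunk...";
    // The challenger picks a fresh salt each round so old proofs can't be replayed,
    // and computes the digest they expect (they hold the chunk, or a precomputed answer).
    let salt = b"random-salt-1234";
    let expected = salted_digest(chunk, salt);

    // The claimed holder is sent the salt and must answer with the salted digest.
    let honest_answer = salted_digest(chunk, salt);
    let cheating_answer = salted_digest(b"wrong data", salt);

    assert_eq!(honest_answer, expected);   // they really hold the chunk
    assert_ne!(cheating_answer, expected); // they dropped or corrupted it
    println!("proof-of-storage round-trip OK");
}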

Proof-of-streaming is a bit more tricky. If anyone has some clever ideas, I’m all ears.

Another thought is that maybe that file should disappear if people aren’t interested in it anymore. It’s kind of a harsh Darwinian setup, but it might work. The risk there is that things that are intrinsically valuable might not have their value recognised, and then society would lose them. But there’s one person who would always be interested, as long as they were alive: the creator of the work itself. If they care about it enough, they could have an always-online node.

This might also be the work of a group like the Internet Archive — they might keep files alive by running DHT instances.

There are more things to think through. The above provable/incentivisable storage kind of runs against the DHT-in-commons idea, so you’d be building your own stuff on top of the existing Holochain tools to make it work. Otherwise you might get the ‘virtuous’ nodes who store an entry because it’s in their DHT neighbourhood vs the ‘incentivised’ nodes who store it because they want compensation :man_shrugging:

3 Likes

Probably the party intending to delete their copy. They have to make sure there are enough copies out there before they are allowed to delete theirs.

Only when the creator marks a file as deleted does the minimum redundancy drop to zero, at which point anybody still holding a copy could delete it.

Yeah, that’s a tricky one and will probably hugely depend on the hApp in question. I am curious what kinds of models will surface in future hApps.

2 Likes

So, in a single-blob DHT, a file would be accessed via its DHT (‘DHT X’), not necessarily via ‘file Y’?

2 Likes

Correct!

1 Like

In this single-blob DHT, is it possible to ‘link’ files?

Sure, but it’d be pretty roll-your-own (at least until someone creates a generic library for this). You’d probably do the linking in some other DHT, where those blobs are represented by a ‘reference’. This is what it might look like in the ‘main’ DHT. The UI would know about the blob’s DNA hash, because it would be responsible for creating the blob DHT. It could then feed that hash into the link_to_blob zome function below, which commits a blob_dht_ref entry via create_blob_dht_ref.

// Rough sketch targeting the legacy (pre-RSM) Rust HDK with hdk_proc_macros;
// the imports below reflect that era's crate layout.
#![feature(proc_macro_hygiene)]
#[macro_use]
extern crate hdk;
#[macro_use]
extern crate serde_derive;
#[macro_use]
extern crate holochain_json_derive;

use hdk::{
    entry_definition::ValidatingEntryType,
    error::ZomeApiResult,
    holochain_core_types::{dna::entry_types::Sharing, entry::Entry},
    holochain_json_api::{error::JsonError, json::JsonString},
    holochain_persistence_api::cas::content::Address,
};
use hdk_proc_macros::zome;

// The entry just wraps the DNA address of the single-blob DHT it points at.
#[derive(Serialize, Deserialize, Debug, DefaultJson, Clone)]
struct BlobDhtRef(Address);

#[zome]
mod my_zome {

    #[entry_def]
    fn blob_dht_ref_entry_def() -> ValidatingEntryType {
        entry!(
            name: "blob_dht_ref",
            description: "A reference to a single-blob DHT",
            sharing: Sharing::Public,
            validation_package: || {
                hdk::ValidationPackageDefinition::Entry
            },
            validation: |_validation_data: hdk::EntryValidationData<BlobDhtRef>| {
                Ok(())
            }
        )
    }

    // Helper: commit an entry wrapping the blob DHT's DNA address.
    fn create_blob_dht_ref(blob_dna_address: Address) -> ZomeApiResult<Address> {
        let blob_dht_ref_entry = Entry::App(
            "blob_dht_ref".into(),
            BlobDhtRef(blob_dna_address).into()
        );
        hdk::commit_entry(&blob_dht_ref_entry)
    }

    // Public zome function the UI calls with the blob DHT's DNA hash.
    #[zome_fn("hc_public")]
    fn link_to_blob(base_address: Address, blob_dna_address: Address) -> ZomeApiResult<Address> {
        let blob_dht_ref_address = create_blob_dht_ref(blob_dna_address)?;
        hdk::link_entries(&base_address, &blob_dht_ref_address, "link_to_blob", "")
    }
}

@pauldaoust I am asking whether a file can be linked with other files within the same single-blob DHT, i.e. inter-linking of files. Because the entries within this DHT are very much local to the agents, link entries would also be local, wouldn’t they? Or is there a way to link entries between two different blob DHTs?

@premjeet ohhh, I see what you’re saying. The point of a single-blob DHT is that its purpose would be to store only one file — no other files to link to. Your filesystem would look like a bunch of separate DHTs, possibly united by one ‘file table’ DHT.

You could link entries between two different blob-DHTs, but right now it would be roll-your-own because there’s no built-in link type that points to another DHT. It would probably look something like (dna_hash, entry_address, link_type, link_tag). @pospi and @lucksus both have some interesting and useful reflections on this subject: Cross-DNA links
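
For illustration, that roll-your-own reference could be as small as an entry carrying exactly those four fields. This is only a sketch: the type and field names are made up, and `Address` is just a placeholder for whatever hash type your HDK uses.

/// Placeholder for the HDK's address/hash type.
type Address = String;

/// A hand-rolled stand-in for a cross-DNA link.
struct CrossDnaLink {
    dna_hash: Address,      // which blob DHT the target entry lives in
    entry_address: Address, // the entry's address inside that DHT
    link_type: String,      // e.g. "contains_blob"
    link_tag: String,       // free-form tag, e.g. a filename
}

fn main() {
    let link = CrossDnaLink {
        dna_hash: "blob-dht-dna-hash".into(),
        entry_address: "entry-address-in-that-dht".into(),
        link_type: "contains_blob".into(),
        link_tag: "holiday-photos.zip".into(),
    };
    // Committed as an ordinary entry in your 'main' DHT, it points into another one.
    println!("{} -> {}/{}", link.link_type, link.dna_hash, link.entry_address);
}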

1 Like

BTW, can you explain ‘link_type’ & ‘link_tag’ please?

@jakob.winter I just wanted to chime in to clarify for you (& everyone else) something that as a former privacy activist I think is quite important:

In a distributed system with untrusted nodes, you cannot reliably delete anything. Ever.

It matters precisely zilch that your system has some clever way of garbage collecting. Or encrypting. Or anything else. If you have sent data to someone, once it arrives at their machine then it is theirs. They might have hacked code. They might be directly accessing the Holochain storage backend. You have no way of knowing, unless perhaps you’re running signed, binary-compiled code on TPM hardware. Even then… that stuff has been hacked before, too.

It’s worth pointing out that this is the reality of today’s internet, except that currently it’s only shadowy figures and Orwellian government organisations who get to keep such information. All that systems like Holochain, Scuttlebutt and others do is level the playing field.

So, what’s coming for society at large is a very frank discussion about which data it is “polite” to access. Is it ok to read someone’s old profile data? Probably depends on the app, right? It’s a curly issue and a lot of people will probably get hurt before we figure it out. I’ve already hurt people in this way, to be frank. And I’m someone who thinks about this stuff a LOT :confused:

7 Likes

True. As @artbrock says: one has to assume that at some point in the future cryptography might be cracked. So any data submitted to a public DHT should not be considered safe indefinitely.

That’s why it would be nice to have a freely configurable Dropbox-like hApp, where I could for example set up one “safe” directory for which I personally pick each node. So the DHT of that directory might only run on the HoloPorts of myself, my brother, my best friend and my lawyer.

Thus you reduce the attack surface from hackers (they’d really have to want to attack you or your friends/family specifically) while minimizing physical risks to your data (e.g. losing your data in a fire).

That way a Holochain hApp could make any risks perfectly visible and be honest about them, giving full agency back to the user.

5 Likes

TOTALLY AGREE! This is a tough sell but ultimately necessary, I think. We want to equip people to be thoughtful about the consequences of their actions; we’re only warning them about things that centralised services are abstracting away under a user-friendly UI. I keep thinking about the weightiness of publishing in Secure Scuttlebutt’s Patchwork UI. Let’s say we’ve got some insensitive bozo who thinks he knows everything. In a flurry of blind rage, he rattles off some dumb post about antipodean food nomenclature:

That extra step, with the warning and the ‘confirm’ button, has caused me to stop so many times. Not just because I thought about whether my remarks were emotionally sensitive, private, or risky. As often as not, it’s been because I realised I wasn’t really contributing anything novel or helpful to the conversation! It’s a lovely zen practice.

9 Likes

I’ve read this thread and it’s getting a little technical. :dizzy_face:

Is it possible for a Dropbox or Cubbit style hApp to be created and used on Holochain?
Does Holochain support IPFS?

I noticed that in the pricing model for hosting there is a $ per GB per hour storage rate; is that what this would be related to?

Thanks.

A file storage hApp:
Yeah, it should be possible, especially if you use IPFS as the data storage layer. So you store all data via IPFS (where it can be deleted, if desired) and you handle all the organizational stuff (who is supposed to store which data) on the Holochain side.

As @pauldaoust suggested earlier:

IPFS would also bring another advantage: since it is a single address space (as opposed to BitTorrent, where each file has its own), IPFS has deduplication functionality, meaning identical files are automatically stored only once.
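
To sketch that division of labour (the struct and field names are mine, and Holochain has no built-in IPFS integration): the bytes live in IPFS under their CID, while a small Holochain entry records the CID and the organizational metadata, i.e. who is responsible for pinning it and how many copies are required.

/// Sketch of a Holochain-side manifest for a file whose bytes live in IPFS.
/// Field names are illustrative; there is no built-in IPFS integration implied.
struct FileManifest {
    ipfs_cid: String,         // content address of the file in IPFS
    size_bytes: u64,
    required_redundancy: u32, // how many agents must keep the CID pinned
    custodians: Vec<String>,  // agents currently responsible for pinning it
}

fn main() {
    let manifest = FileManifest {
        ipfs_cid: "Qm...".to_string(), // placeholder CID
        size_bytes: 3_500_000,
        required_redundancy: 3,
        custodians: vec!["agent-a".into(), "agent-b".into(), "agent-c".into()],
    };
    // The entry stays small; the heavy bytes live in IPFS, which also deduplicates
    // identical content because the CID is derived from the content itself.
    println!("{} custodians pin {}", manifest.custodians.len(), manifest.ipfs_cid);
}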


Storage Negotiations:
Some thoughts about the issue of who’s required to store which bit:

  • The author publishes the data, and the initial validators are required to store it as well.

  • A user can check the “keep this file” box for any data they voluntarily wanna keep. Like, I would check that box for all the music I regularly listen to (so I have a local copy and don’t have to re-download them all the time)

  • If a user (even an initial validator) doesn’t want to store a file, they can uncheck the “keep this file” box, at which point they will see a little “…finding other custodian for this file…” message.

  • The file will only be deleted from the system after either (a) the software has found that there are already more copies in circulation than required, or (b) the software has found another custodian (see the sketch after this list).
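
Here’s that negotiation as plain logic, just a sketch under made-up names (in a real hApp the “find another custodian” step would be remote calls to other agents):

/// Outcome of an agent unchecking "keep this file".
#[derive(Debug, PartialEq)]
enum ReleaseOutcome {
    DeletedImmediately,  // redundancy was already above the requirement
    HandedOver(String),  // another custodian accepted the file
    StillSearching,      // "...finding other custodian for this file..."
}

/// Hypothetical negotiation step: only drop the local copy once either
/// (a) there are more copies in circulation than required, or
/// (b) another custodian has been found.
fn release_file(
    copies_in_circulation: u32,
    required_copies: u32,
    volunteer: Option<String>, // an agent who agreed to take over, if any
) -> ReleaseOutcome {
    if copies_in_circulation > required_copies {
        ReleaseOutcome::DeletedImmediately
    } else if let Some(agent) = volunteer {
        ReleaseOutcome::HandedOver(agent)
    } else {
        ReleaseOutcome::StillSearching
    }
}

fn main() {
    assert_eq!(release_file(6, 5, None), ReleaseOutcome::DeletedImmediately);
    assert_eq!(release_file(5, 5, Some("agent-x".into())), ReleaseOutcome::HandedOver("agent-x".into()));
    assert_eq!(release_file(5, 5, None), ReleaseOutcome::StillSearching);
}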


Making it attractive to store stuff:
But what incentivizes people to store files they aren’t interested in themselves?

One nice way would be a transferable reputation currency (hybrid). You earn points for storing files (with a higher reward for files close to the lower redundancy limit - the ones nobody else wants to store) and you earn points for providing that data (bandwidth).

You lose points for consuming content.

So at this point it’s just a standard currency. But here comes the reputation part:
Accounts with lots of points will get preferential treatment when requesting data. So if you have more points than I do and we both request the same data, you’ll be served first, giving you a nicer experience, especially when streaming content.
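
One way to picture those mechanics, with entirely made-up constants and names (a sketch of the idea, not a worked-out currency design): the storage reward climbs as a file approaches its redundancy floor, and requests are answered in descending order of the requester’s balance.

/// Reward per unit of storage, weighted so that files sitting close to their
/// lower redundancy limit (the ones nobody wants to hold) pay noticeably more.
fn storage_reward(base_points: f64, current_copies: u32, required_copies: u32) -> f64 {
    // The fewer spare copies beyond the required minimum, the higher the reward.
    let spare = current_copies.saturating_sub(required_copies) as f64;
    base_points * (1.0 + required_copies as f64 / (1.0 + spare))
}

/// Requests are answered in descending order of the requester's point balance,
/// so better-provisioned accounts get the smoother streaming experience.
fn serve_order(mut requests: Vec<(&str, f64)>) -> Vec<&str> {
    requests.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    requests.into_iter().map(|(who, _)| who).collect()
}

fn main() {
    // A file sitting right at its redundancy floor (3 of 3 copies) pays a premium...
    println!("{}", storage_reward(1.0, 3, 3));  // prints 4
    // ...while a comfortably replicated file pays close to the base rate.
    println!("{}", storage_reward(1.0, 12, 3)); // prints 1.3
    // Whoever holds more points gets served first.
    assert_eq!(serve_order(vec![("me", 12.0), ("you", 40.0)]), vec!["you", "me"]);
}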

So if my streaming experience isn’t great, I have two options:

  • I contribute more to the system (like running a HoloPort at home) to gain more points

  • I buy some points from others on a regular basis (to balance my consumption)

I particularly like this idea, because it works both for tech-savvy people and for users who just want to enjoy good bandwidth for a price.

Some open questions I still have:

  • How do we prevent people from gaming the system by uploading shitloads of useless data, just to earn points?

  • Where do the points come from? Are they just being generated by hosting data and burned by consuming data?

I still have to think hard about this. I have a feeling there is a great solution. We just have to think of it :sweat_smile:

2 Likes

Regarding deletion, what I imagine at least is that information about data ownership would be distributed, which would then give the owner permission to mark the data as deletable, with that happening via some lazy distribution to reduce overhead. I am reading about current implementations of deletion just now, though.