Does deleting an entry comply with the GDPR right to be forgotten?

yeff · November 9, 2019, 10:13am

I understand that an entry in the DHT can be marked as deleted, but the data is not removed, as it is immutable.
If an actor can do this on its own data, it should be able to do so everywhere that data is stored. I guess Holochain has covered this, but I’d like to be sure.

Secondly, data that is marked as deleted should no longer be readable by anyone, or the claim of deletion can’t be made. I am not at all sure whether Holochain can guarantee this, so I’d like to get an answer from the core devs.

If deleted data can somehow still be made readable, I am afraid European privacy lawyers will never advise using Holochain.
I am talking here about any kind of public data that a user has published on any kind of public app. It is a feature of Holochain to link a person’s data with their agent’s identity, which means, in GDPR terms, that any linked data is also considered personal data and the same rules apply.

So even if an individual thinks he did not publish anything personal, legally it still becomes personal data.
So deletion is a big deal.

dhtnetwork · November 12, 2019, 12:13am

Hi @yeff I added your question to our weekly community report/inquiry. I’ll be meeting with our devs this week and will ask if they have time to respond. In the meantime, others are welcome to provide their input.

pauldaoust · November 12, 2019, 11:31pm

Hi, @yeff. This is a tricky question indeed, and a lot of ink has been spilled both in Holochain land and beyond. Here’s how I see it:

All data is subject to being copied and retained by others, and can’t truly be deleted once it’s out there. This is true of centralised services, decentralised services, even notes written on paper. Even if you can’t take a screenshot or photo, you’ve got it in your mind and can spread it as gossip.
New distributed technologies, however, can amplify the spread of information much more than the tools previously available to us.
GDPR is meant to create fiduciary responsibility for organisations who are in a power asymmetry with the people whose data they host (IOW: protect users from big companies).
It doesn’t have anything to say about people sharing things amongst themselves – AFAIK, it doesn’t have the power to compel someone to flush out their email archive.
Distributed tech is a blind spot in GDPR. It’s got the peer-to-peer qualities of human social interactions, but the vast data distribution power of big platforms.

My feeling is that distributed systems occupy an awkward middle ground between personal interaction and client/corporation relationships. Because they can spread personal data much more quickly than personal interactions in ‘meat space’, and they can spread data into domains where it isn’t necessarily expected to go, we need to think about ways to wield this power responsibly.

Here are the facts about Holochain:

When data in a DHT is deleted, it isn’t truly deleted; it’s only marked as obsolete.
Even if there were an obligation to actually scrub the bits from your hard drive, there’s no verifiable way to prove it to the satisfaction of the person who asked you to delete the data. (This may change with future CPU features, but isn’t available right now.)
The DNA of a Holochain app allows the developer to exercise discretion in what can be returned to the UI — for instance, if you never allow deleted records to be retrieved via get_entry(), then the user will never see it.
However, a motivated individual could look in the database that holds their local shard of the DHT; there’s a chance that it holds the deleted data they’re looking for.
I have heard that Holochain might in the future offer peers the option to garbage-collect deleted data so at least they aren’t opened up to legal liability. Don’t quote me on this though; I’m interested in hearing the core devs’ response.
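
To make the facts above concrete, here is a minimal sketch of tombstone-style deletion. It is a toy model and not Holochain's actual API: `ToyShard`, `publish`, `mark_deleted`, `app_get_entry` and `raw_lookup` are all hypothetical names. `app_get_entry` plays the role of an app-level read path that refuses to return deleted records, while `raw_lookup` stands in for someone inspecting the local database directly.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ToyShard:
    """Hypothetical stand-in for one peer's local shard (not Holochain's API)."""
    entries: Dict[str, str] = field(default_factory=dict)  # address -> content
    tombstones: Set[str] = field(default_factory=set)      # addresses marked deleted

    def publish(self, content: str) -> str:
        # Content-addressed storage: the address is the hash of the content.
        address = hashlib.sha256(content.encode("utf-8")).hexdigest()
        self.entries[address] = content
        return address

    def mark_deleted(self, address: str) -> None:
        # Nothing is erased; the entry is only flagged as obsolete.
        self.tombstones.add(address)

    def app_get_entry(self, address: str) -> Optional[str]:
        # App-level read path: never hands deleted entries to the UI.
        if address in self.tombstones:
            return None
        return self.entries.get(address)

    def raw_lookup(self, address: str) -> Optional[str]:
        # What a motivated individual reading the local store would find.
        return self.entries.get(address)
```

After `mark_deleted`, the app-level read returns nothing, but the raw lookup still finds the content. That gap is exactly the concern raised in this thread.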

This is an issue for users of every digital system, centralised or distributed. The most that a piece of technology can do is make promises and hope that the people actually operating it will uphold those promises. This is a social problem, not a technological one.

yeff · January 10, 2020, 9:37am

I still needed to thank you, @pauldaoust, for this detailed answer. I know people are very busy with HoloPort issues and so on right now, but it would be good to get some kind of vision from the core devs on the last point: the garbage collection of so-called deleted data.

pauldaoust · January 13, 2020, 10:08pm

thanks for your thanks @yeff! I don’t know if I have a clearer picture yet, but it seems that the GC would be on validation dependencies only; example scenario:

1. Entry B is valid only if entry A is valid and contains the word “beep”
2. Validator pulls entry A from the DHT and checks its validation signatures
   a. If it’s invalid, fail validation for entry B and proceed to step 4
   b. If it’s valid, proceed to step 3
3. Validator passes entry A into entry B’s validation function, which returns a result
4. Regardless of the result of previous steps, validation is now finished and entry A can be GC’d.
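
The scenario above can be sketched roughly as follows. This is only an illustration under assumptions: `validate_with_dependency`, `fetch`, `validate_b` and `held` are hypothetical names, and `fetch` is assumed to return the dependency's content together with a flag saying whether its validation signatures checked out. It is not how Holochain core implements validation.

```python
from typing import Callable, Dict, Tuple


def validate_with_dependency(
    entry_b: str,
    dependency_address: str,
    fetch: Callable[[str], Tuple[str, bool]],
    validate_b: Callable[[str, str], bool],
    held: Dict[str, str],
) -> bool:
    """Validate entry B against dependency A, then drop A from local storage."""
    # Pull entry A and learn whether its validation signatures are good.
    content_a, signatures_ok = fetch(dependency_address)
    held[dependency_address] = content_a  # held only while validating
    try:
        if not signatures_ok:
            # A is invalid, so B fails validation.
            return False
        # Pass A into B's validation function and return its result.
        return validate_b(content_a, entry_b)
    finally:
        # Whatever the outcome, validation is finished and A can be GC'd.
        held.pop(dependency_address, None)
```

For the rule in the example, `validate_b` could be as small as `lambda a, b: "beep" in a`. The `finally` block is what makes the cleanup happen on every path, including the early failure.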

I’m hoping that one day we’ll learn that Core intends to allow nodes to optionally GC deleted entries, but I suspect that won’t come soon — the nodes are supposed to have a ‘covenant’ that they’re holding every entry within their advertised neighbourhood for the purpose of validation, not just the live ones.

pauldaoust · May 29, 2021, 4:19am

The state of the art is advancing, and I think collectively we’re getting better ideas of what privacy means. Here are some updates to this topic:

There are plans to eventually add two new DHT operations, purge and withdraw, in addition to the existing create, update, and delete. No timeline on this though. Here’s my understanding:
- Purge will instruct DHT peers to scrub the entry data from their shards, but retain headers that show who wrote it. This is meant for illegal and/or nasty material. Anyone should be allowed to purge anyone else’s entry data, as long as the validation rules allow it.
- Withdraw will instruct DHT peers to scrub the header that shows that an entry was written, and I presume will also scrub the entry data if it only had one header attached to it. You’ll only be able to withdraw your own headers. This is to fix mistakes on your own source chain, like “whoops, I didn’t mean to publish that”.
- In either case, you’re still at the mercy of your DHT peers to be operating in good faith (that is, that they’ll actually do what you ask) and to not accidentally be storing extra copies on a backup drive. And any other data that depends on the deleted data will no longer be validatable – validation functions will just return a “Missing dependencies” error and eventually give up trying. So as a dev you’ve gotta be careful about introducing situations that cause a cascade of dependent data to end up in validation limbo.
Some profs and I are publishing a paper in a journal next month, and it’s all about Holochain, GDPR, and participatory design (that is, design that involves both developers and users). I don’t know if I’m allowed to send the draft, but I’ll try my best to remember to update this thread when it lands. (Ha ha, I’m the principal author and they’ve all got PhDs and I haven’t even got a high school diploma. In a former life I’d be going through some serious impostor syndrome here.)
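
As a rough illustration of how the purge and withdraw operations described above differ, here is a toy model of what a cooperating peer might do. Everything in it is an assumption made for the sketch (`ToyPeer`, the shape of `headers` and `entries`, the method signatures); the real operations were only planned at the time of writing and may look nothing like this.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ToyPeer:
    """Hypothetical peer storage: headers say who wrote what, entries hold the data."""
    headers: Dict[str, Tuple[str, str]] = field(default_factory=dict)  # header id -> (author, entry address)
    entries: Dict[str, str] = field(default_factory=dict)              # entry address -> data

    def purge(self, entry_address: str) -> None:
        # Scrub the entry data but keep the headers showing who wrote it.
        self.entries.pop(entry_address, None)

    def withdraw(self, header_id: str, requester: str) -> bool:
        # Only the author of a header may withdraw it.
        author, entry_address = self.headers.get(header_id, (None, None))
        if author != requester:
            return False
        del self.headers[header_id]
        # Scrub the entry data too if no other header still points at it.
        if not any(addr == entry_address for _, addr in self.headers.values()):
            self.entries.pop(entry_address, None)
        return True
```

Note that the sketch only models a peer acting in good faith. Nothing in it can stop a peer from ignoring the request or keeping a copy elsewhere.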

jvanbockryck · May 29, 2021, 6:27pm

This is very interesting news, Paul. Keep me in the loop.

stephenpurkiss · May 29, 2021, 7:48pm

Just saw this update. The developers I worked with, mostly in the Drupal world, have done a lot of work on GDPR and encountered all sorts of issues; the main one is that nobody really knows all the issues, as they won’t be settled until court cases happen. Other highlights are:

Anonymising data that’s required for things like e-commerce functionality (reports etc.) - you can’t just delete all the data, as you alluded to in your update. People have been through this before, hence sharing the little I know!
Backups - they ended up flagging delete requirements so that when backups are restored, the data requested for deletion doesn’t come back. It’s close, but I’m not sure it’s a final answer
Complexity increases exponentially - as most of their clients were churches with thousands of people attending events (this was a while back…), configuring workflow/requirements for each field is quite time-consuming, so sensible defaults are required
A good thing was versioning of agreements, so you know which version of the terms people have agreed to
Roles and permissions mean you can create logins for GDPR assessors, managers, etc.
A cross-CMS privacy group was formed to share common functionality; however, I just looked and it doesn’t seem much has progressed in 3 years…
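
The backup approach mentioned in the list above can be sketched in a few lines. This is a guess at the general idea, not the Drupal developers' actual implementation: `restore_backup`, `backup` and `erasure_requests` are hypothetical names, and the erasure list is assumed to live outside the backup so that it survives a restore.

```python
from typing import Dict, Set


def restore_backup(backup: Dict[str, dict], erasure_requests: Set[str]) -> Dict[str, dict]:
    """Restore every record except those flagged for deletion since the backup was taken."""
    return {
        record_id: record
        for record_id, record in backup.items()
        if record_id not in erasure_requests
    }
```

The backup itself still contains the erased records, which may be why this was described as close but not a final answer.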

On top of that, in a business group a couple of days ago someone had a GDPR issue because people who worked for them (can’t remember whether freelance or not) had taken customer data and approached the customers directly. No computer system is going to stop that sort of violation.

Here’s an old post with a little demo

bic · October 12, 2021, 5:29pm

Here are the most important privacy (and GDPR) issues I can recall, and whether they can be solved by the system or by the application. I’m excluding anonymity and pseudonymity, which aren’t a privacy issue as long as they can easily be achieved:
0- We are excluding privacy leaks that come from any kind of security issue, considering that security is already good enough or is not really a privacy concern;
1- First of all, privacy isn’t a security issue but an app-level matter of design, which the Holochain architecture also supports (e.g. modularity, separate DHTs for groups, and so on);
2- Can’t be solved: assuming that every agent has their own source chain, which is definitely a blockchain, it’s normal that data in it can’t really be scrubbed, only made unavailable; this is specific to a (very scalable) distributed network;
3- Can’t be solved: even supposing purge (data, not header) and withdraw (header too) exist, and even if they are validated by another already-valid entry, they cannot really scrub any data; they only hide it and block it at the conductor level, with no access from the app level. It could still be reached by hacking the conductor, so that the malicious conductor presents itself to others as a regular application with the same code hash;
4- Can’t be solved: data that was deleted, purged or withdrawn will actually stay there until everything is restored into a new version (of the DHT);
5- Can’t be solved: because the data is distributed, bad actors can still steal others’ data before they are detected (by posting something) and warranted as malicious. This can also be done by hacking the conductor (again). Or the data can be read another way: a motivated individual could look in the database that holds their local shard of the DHT; there’s a chance that it holds the deleted data they’re looking for;
6- It is normal for it to be so: in the end, when an app is open to all and F/LOSS (e.g. a big social network), what can you really prevent? The exception is deleting your own data before it has propagated to anyone else by bridging;
7- Actually not needed: in any case, GDPR is meant to create fiduciary responsibility for organizations who are in a power asymmetry with the people whose data they host (IOW: protect users from big companies). It doesn’t have anything to say about people sharing things amongst themselves – AFAIK, it doesn’t have the power to compel someone to flush out their email archive. So it’s a fair intuition that distributed systems occupy an awkward middle ground between personal interaction and client/corporation relationships, as long as the community (actually, the developers) have their own interest and concern in protecting their own data from being used for various bad purposes.

It’s unfortunate to say, but here are some weak arguments for not being privacy (GDPR) compliant:
A- Excluding bad actors who can read others’ data, or those who will be warranted at the first try, if they did not actually delete the data;
B- Arguments such as: an agent holds only a small amount of data, so even if there were more bad agents, they could not find as much as they could from a graph database;
C- Many privacy issues cannot be solved completely, bearing in mind from 0- that this is not a security issue. Or they are too complex (excluding social issues, big communities, and big events where a lot of people ‘know too much’, vs pseudonymity);
D- But in general, distributed tech is a blind spot in GDPR. It’s got the peer-to-peer qualities of human social interactions, but the vast data distribution power of big platforms. This holds even for Holochain’s agent-centric DHT approach, even with its higher security;
E- That it depends on social factors, not on tech (excluding cases where someone saw the data and remembered it, or even stole it because they had been granted access before);
F- Arguments such as the low chance of GDPR compliance checks actually happening: people have encountered all sorts of issues, mostly that nobody really knows all the issues, as they won’t be settled until court cases happen;
G- Worse than this, and mostly in centralized systems: even if there were an obligation to actually scrub the bits from your hard drive, there’s no verifiable way to prove it to the satisfaction of the person who asked you to delete the data, since those systems are centralized and can keep a lot of hidden backups to share with other entities for financial purposes. This does not really apply to truly distributed networks. Even so, this may change with future CPU features, but that isn’t available right now, and a centralized system also cannot prove that it deleted everything.

Finally: a cross-CMS privacy group was formed to share common functionality; however, I just looked and it doesn’t seem much has progressed in 3 years.
So, Holochain seems to be 99.9% privacy compliant, with higher security, and with the rest left to the discretion of application-level atomicity. It is even nearly GDPR compliant, as there’s no need to protect ourselves from ourselves with respect to developers’ code, especially as long as it remains a really open, large distributed system for the community and for developers.