Writing a validation rule that checks the entry author against the data being added (the entry)

Connoropolous · December 19, 2019, 9:24pm

(Update: this code is not updated for Holochain RSM, it was for holochain-rust / redux, so it will no longer be appropriate for RSM)

(disclaimer: for anyone who is still unaware, I am no longer an employee of Holo, since August 1, 2019. I currently run a business in which we specialize in developing Holochain applications)

It can be tempting and “intuitive” to use hdk::AGENT_ADDRESS in validation rules, but most of the time you probably shouldn’t.
Say what you are trying to do is match a field, like entry.address, to the agent who is committing the entry…

validation: | validation_data: hdk::EntryValidationData<Profile>| {
  match validation_data{
    hdk::EntryValidationData::Create{entry,..}=>{
        if entry.address!=AGENT_ADDRESS.to_string() {
            Err("only the same agent as the profile is about can create their profile".into())
        }else { Ok(()) }
    },
    _ => { Ok(()) }
  }
},

This only works in the context of the original author. Entries that pass the original author’s validation rules will fail when they get gossiped to other nodes, because on each of those nodes hdk::AGENT_ADDRESS equals that node’s own agent address, which produces the Err case.
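To make the failure concrete, here is a minimal plain-Rust model of that rule. It does not use the HDK at all: `Profile`, `validate_with_local_agent`, and the `local_agent_address` parameter are hypothetical stand-ins, with the parameter playing the role of hdk::AGENT_ADDRESS on whichever node happens to run the validation.

```rust
// Plain-Rust model only; none of these names come from the HDK.
#[derive(Debug, Clone, PartialEq)]
pub struct Profile {
    pub address: String,
}

/// Models the flawed rule: the entry is compared against the address of
/// the node that is *running* validation, not the node that authored it.
pub fn validate_with_local_agent(
    entry: &Profile,
    local_agent_address: &str,
) -> Result<(), String> {
    if entry.address != local_agent_address {
        Err("only the same agent as the profile is about can create their profile".into())
    } else {
        Ok(())
    }
}
```

Run with the author's own address, the check passes. Run with any other node's address, as happens when the entry is gossiped, the very same entry is rejected.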

Instead, the true agent that you likely want to reference is the one found here, in the validation_data struct that wasn’t previously being used:

validation: | validation_data: hdk::EntryValidationData<Profile>| {
  match validation_data{
      hdk::EntryValidationData::Create{entry,validation_data}=>{
          let agent_address = &validation_data.sources()[0];
          if entry.address!=agent_address.to_string() {
              Err("only the same agent as the profile is about can create their profile".into())
          }else {Ok(())}
      },
      _ => { Ok(()) }
  }
},

To help make sense of this, it’s helpful to check the source code of the sources() function on validation_data:
https://docs.rs/hdk/0.0.40-alpha1/hdk/struct.ValidationData.html#method.sources

impl ValidationData {
    /// The list of authors that have signed this entry.
    pub fn sources(&self) -> Vec<Address> {
        self.package
            .chain_header
            .provenances()
            .iter()
            .map(|provenance| provenance.source())
            .collect()
    }
}

It accesses the ChainHeader in the ValidationPackage that’s associated with the validation callback in order to read its “provenances”: the people/agents who committed the entry and chain header.
A provenance is a fancy word for the original author of an entry/chain_header.
It includes their cryptographic signature and their agent address:

pub struct Provenance(pub Address, pub Signature);

calling .source() on a Provenance returns the Address.

Really, in most cases, there are not multiple sources… so just take the first source.
validation_data.sources()[0]
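As a rough, self-contained sketch of the same idea outside the HDK: the types below are simplified stand-ins (`Address` and `Signature` are plain strings here, and `ChainHeader` only carries provenances), and `validate_profile_author` is a hypothetical helper. It uses `.first()` instead of `[0]` so that an empty list becomes an error rather than a panic, which is a choice of this sketch and not something the HDK requires.

```rust
// Simplified stand-ins for the HDK types, for illustration only.
pub type Address = String;
pub type Signature = String;

#[derive(Debug, Clone)]
pub struct Provenance(pub Address, pub Signature);

impl Provenance {
    /// Returns the agent address half of the pair.
    pub fn source(&self) -> Address {
        self.0.clone()
    }
}

pub struct ChainHeader {
    pub provenances: Vec<Provenance>,
}

impl ChainHeader {
    /// Mirrors the shape of sources(): one address per provenance.
    pub fn sources(&self) -> Vec<Address> {
        self.provenances.iter().map(|p| p.source()).collect()
    }
}

/// Hypothetical helper: the profile's address must match the first source.
pub fn validate_profile_author(
    profile_address: &str,
    header: &ChainHeader,
) -> Result<(), String> {
    let sources = header.sources();
    let author = sources
        .first()
        .ok_or_else(|| "header has no provenances".to_string())?;
    if profile_address != author.as_str() {
        Err("only the same agent as the profile is about can create their profile".into())
    } else {
        Ok(())
    }
}
```

Because the author comes from the header rather than from the validating node, every node that runs this check reaches the same result.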

This is something I think the Holochain project needs much better documentation about. I hope this is helpful to others.

Connoropolous · December 19, 2019, 9:36pm

For background on the context of this, we are in the process of writing the validation rules for an open source Holochain application:

pauldaoust · December 20, 2019, 4:11am

So stoked about Acorn becoming a proper app — and especially a Holochain app. @Connoropolous I wonder if this’d find more eyeballs in #learning-library somewhere… I don’t know the answer TBH.

rlkel0 · December 23, 2019, 6:35am

I found this helpful, I also would like to see more documentation around validation logic and data integrity. I actually found this looking for some information on how to verify the originator of a post.

Connoropolous · December 23, 2019, 5:09pm

What kind of validation around the originator of a post were you looking to do @rlkel0?

rlkel0 · December 23, 2019, 7:05pm

pretty much the same as you. I’m just researching building a name server, and have been working on figuring out how to ensure uniqueness.

Here’s an example of a recent post:

I believe that it is possible to use reputation and some basic proof-of-work to limit name registration, although I’m not sure if that’s the best solution. I think the utility of the DHT is limited if a malicious user could just spam the DHT with meaningless entries, but to keep it truly decentralized it feels like the best solution is to make common sense validation logic that prevents superfluous entries.

Maybe a good question to pose is, could a proof of work blockchain be reasonably implemented on a holochain?

pauldaoust · January 6, 2020, 9:40pm

@rlkel0 something I learned recently is that when you validate an entry, you’re actually validating a commit — that is, you’re validating one agent’s act of adding the entry to their source chain. That means that the data passed into the validation function is supplied by the entry author and doesn’t reflect the state of the entry on the DHT (e.g., headers from other agents that authored the same entry). And that in turn means that you can’t enforce uniqueness of authorship in the validation function. And even if you could, the DHT isn’t guaranteed to be in a consistent state because data takes a while to move around.

So FWIU what we’re planning is:

allow app devs to specify a uniqueness constraint (e.g., only one author per entry)
indicate whether an entry has reached ‘saturation’ (a quorum of validator signatures), before which an entry shouldn’t yet be considered accepted by the network
provide a conflict resolution function in case (a) two entries clash before either of them reaches saturation, or (b) two entries clash after a partition has healed. This function may have a simple rule (e.g., award the name to the agent with the lowest public key) or require user intervention (e.g., start an auction for the name), but the important bit is that it unambiguously resolves the conflict for anyone who is able to see both entries and any follow-up information (which should include all honest nodes that aren’t partitioned).
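The “lowest public key wins” rule mentioned above could look roughly like this. It is only a sketch of the idea, not a planned Holochain API: `NameClaim` and `resolve_conflict` are hypothetical names, and public keys are modelled as plain strings compared lexicographically.

```rust
// Hypothetical types for illustration; not part of any Holochain API.
#[derive(Debug, Clone, PartialEq)]
pub struct NameClaim {
    pub name: String,
    pub author_public_key: String,
}

/// Picks the winner among clashing claims for the same name by taking the
/// claim with the lowest public key. The result depends only on the set of
/// claims, not on the order in which a node happened to see them.
pub fn resolve_conflict(claims: &[NameClaim]) -> Option<&NameClaim> {
    claims
        .iter()
        .min_by(|a, b| a.author_public_key.cmp(&b.author_public_key))
}
```

The property that matters is that any two nodes holding the same set of clashing claims pick the same winner without talking to each other.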

You’ll see that Holochain does whatever it can to avoid global consensus, instead opting for rules that let everyone decide for themselves which data is correct based on their copy of the DNA. Consensus on rules, not data. This is what prevents the DHT from being spammed with meaningless entries: as you say, use some validation logic to stop them.

Actually, a PoW auction might be a neat way of automatically resolving name registration conflicts. Mine until you win the name.

guillemcordoba · January 18, 2020, 5:15pm

Hi @pauldaoust! Can’t you configure the entry’s validation package to be the full source chain, and check the agent address of the person committing the entry? At least this is what I’m doing in some validation rules. It works well since it’s deterministic and available in any happ, though you lose performance since the entry’s metadata can be much bigger. Would this be recommended by the core team?

pauldaoust · January 22, 2020, 6:35pm

@guillemcordoba you can indeed check the agent ID by including the full source chain in the ValidationPackage, but you’re right; it may be more data than necessary. You can check the author ID by looking at the validation package; for every type of EntryValidationData (Create, Update, Delete) the struct will let you access the author regardless of what kind of PackageCreator you specify. Here’s how you can access it:

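// MyEntry stands in for your own entry type
validation: |v: hdk::EntryValidationData<MyEntry>| {
    match v {
        hdk::EntryValidationData::Create { validation_data, .. } => {
            // provenances() returns the (address, signature) pairs; the first one is the commit author
            let author = validation_data.package.chain_header.provenances()[0].source();
            if author == AGENT_WHO_IS_ALLOWED_TO_WRITE {
                Ok(())
            } else {
                Err("No write permissions".into())
            }
        },
        // repeat for Update and Delete
    }
}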

Some things to note:

provenances is a list of Provenances, or agent ID / signature pairs, of the agents who signed this entry
The first item in provenances is always the author of the commit
The other agents are not people who happened to commit the same entry to the DHT; they are signatures collected by the commit author (e.g., for a multisig transaction)

This guarantees that the author of an entry is allowed to write it, but there’s no way, at validation time, to determine whether they’re the only author of the entry. That’s because the validation data is supplied by the author, not the DHT. For uniqueness guarantees on the authorship, you need either a coordination protocol (whether that’s trustless like blockchain or trust-based like a central notary agent) or some sort of conflict detection fallback (not yet implemented but planned).
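The distinction between the commit author and the other signers can be sketched in plain Rust as follows. The `Provenance` struct here is a simplified stand-in with named fields, and `check_write_permission` is a hypothetical helper. Only the first provenance is treated as the author, so a co-signer who happens to have write permission does not make someone else's commit valid.

```rust
// Simplified stand-in for the HDK's Provenance, for illustration only.
#[derive(Debug, Clone)]
pub struct Provenance {
    pub agent: String,
    pub signature: String,
}

/// Hypothetical helper: only the first provenance (the commit author)
/// is checked against the allowed writer; co-signers are ignored.
pub fn check_write_permission(
    provenances: &[Provenance],
    allowed_writer: &str,
) -> Result<(), String> {
    match provenances.split_first() {
        Some((author, _cosigners)) if author.agent == allowed_writer => Ok(()),
        Some(_) => Err("No write permissions".to_string()),
        None => Err("Entry has no provenances".to_string()),
    }
}
```

Note that this still says nothing about uniqueness: it checks who authored this particular commit, not whether anyone else has authored the same entry.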

guillemcordoba · January 22, 2020, 7:10pm

Oh thank you so much for this, was not aware and totally changes things (I was aware of the provenances, we can get them easily with sources()). This makes it trivial to know who committed the entry.

Would be awesome to have these nuances referenced somewhere… Or maybe there is and I haven’t seen it. Anyway thanks!

pauldaoust · January 22, 2020, 7:26pm

I know; this is a gap in the documentation. There are a lot of gaps in the documentation right now, unfortunately.

btakita · February 5, 2020, 1:44am

I’m wondering about edge cases for this approach re: identity. Is AGENT_ADDRESS specific to the device that runs the node? How do we ensure that the node is run by a specific person? Is there a login?

If a person loses their device, what would recovery of their identity look like?

pauldaoust · February 12, 2020, 10:57pm

@btakita all very important concerns, but they’re out-of-scope for Holochain itself. A couple basic answers:

AGENT_ADDRESS is the device-specific public key, so one human being might have multiple agent addresses representing them.
To ensure that the node belongs to a person, they’d have to provide proof of personhood. This proof should be part of the AgentID entry and validated by existing DHT participants. If validation fails, participants refuse to talk to the new agent (don’t know how much of this is implemented yet). The AgentID entry has a nick field that’s populated by the name line in the conductor config block that defines the agent (don’t ask me why they’re not both called name or nick). This is available to the agent ID validation function. This is probably an abuse of that field, but it’s all we’ve got for now.
No logins, which IMO is a UX plus.
We’re planning on building a source chain backup app. It would work like this:
1. Regenerate private key from a seed phrase.
2. Rebuild source chain from a backup (the DHT already holds all the headers and public entries, so all they need is their private entries).

btakita · April 28, 2020, 7:31pm

@pauldaoust Thank you for the clarification. How do I access the current AgentID struct to get the nick field?

pqcdev · March 9, 2021, 7:48am

@dellams @lucast @walter.almeida @Capri @nphias @kristofer

RSM using agent_info ?

dhtOps DeepKey

and capability grants for CRUD access ?

-enhanced entry_defs features
-create any entry type. create_link to hash address
-generate_cap_secret

-QUIC instead of TCP/IP
-share cached WASM
-WASMER

@thedavidmeister @Connoropolous @guillemcordoba @nphias please share other key RSM Update highlights

pqcdev · March 9, 2021, 7:53am

@btakita maybe @guillemcordoba can advise

DefaultJSON becomes SerializedBytes
standardized interchange across all of these boundaries with a new binary format that still leverages existing standards (using MessagePack)

entry_defs now require some additional fields such as visibility (public/private) and num_validation_receipts (how many validation receipts are required to build a receipt bundle).

change to headers:

"The largest change to internal data structures in this version of Holochain is a shift in the importance of headers. Previously, each new addition to a source chain was a header-entry pair. The header tied the new entry into the chain by referencing the entry hash and created the chaining effect by referencing the hash of the previous header. In the new version of Holochain, we’ve made header structures more sophisticated such that the system data is embedded directly in the header , meaning that all system-defined entries – except for agent keys and private entries such as capabilities grants – no longer need entries at all. "

// only required validation callback

#[hdk_extern]
fn validate_vote(_vote: Vote, base: Entry) -> ExternResult<ValidateCreateLinkCallbackResult> {
    if let Ok(_) = Photo::try_from(&base) { // Hash Address Base ?
        Ok(ValidateCreateLinkCallbackResult::Valid)
    } else if let Ok(_) = Comment::try_from(&base) {
        Ok(ValidateCreateLinkCallbackResult::Valid)
    } else {
        Ok(ValidateCreateLinkCallbackResult::Invalid(
            "votes can only be photos or comments".to_string(), // Serializedbytes ?
        ))
    }
}

// capabilities based security model
#[hdk_extern]
pub fn add_cap_grant(secret: CapSecret) -> ExternResult<HeaderHash> {
    let mut functions: GrantedFunctions = HashSet::new();
    let this_zome: ZomeName = zome_info!()?.zome_name;
    functions.insert((this_zome, "find_things".into()));
    Ok(create_cap_grant!(CapGrantEntry {
        tag: "".into(),
        access: secret.into(),
        functions,
    })?)
}

// more concise HDK
#[hdk_extern]
pub fn find_things(_: ()) -> ExternResult<ElementVec> {
    let elements: ElementVec = query!(QueryFilter::new() // Chain Query
        .header_type(HeaderType::Create))?; // Note Header Mechanism Changes
    Ok(elements)
}

@spirit

pqcdev · March 9, 2021, 9:22am

" data representation has been massively simplified in rrDHT such that a node represents the range of addresses it is responsible for with a single 32-bit integer . Now, if you know the address of the node and its arc range, you know exactly what range of addresses you can ask it for. This simplifies the codebase and architecture significantly, while providing performance characteristics on par with complex, binary-tree representations of the DHT space."

thedavidmeister · March 9, 2021, 10:25am

wasmer actually, not wasmi

guillemcordoba · March 9, 2021, 10:45am

Actually if the entries are public CRUD “access” should be controlled by the validation rules, not with capability tokens (at least in the normal just-publish-to-DHT case). Capability tokens say: “hey if you want you can call this function in my node”, so they don’t actually validate that some agent is able or not to do an action in the DHT.