How to avoid duplication of entities when updating them and retrieve only the latest instances?

I have this:

 struct MyEntity {
     name: String,
     age: u64,
 }

Then I do this in js tests:

let {Ok: addr1} = await alice.call("my_app", "create_my_entity", { ....});

//1
let res1 = await alice.call("my_app", "update_my_entity", {addr: addr1, age: 33});
let {Ok: addr2} = res1;

//2
let res2 = await alice.call("my_app", "update_my_entity", {addr: addr2, age: 23});
let {Ok: addr3} = res2;

//3
let res3 = await alice.call("my_app", "update_my_entity", {addr: addr3, age: 11});

Now this

let myEntities = await alice.call("my_app", "get_all_entities", {  });

will return a vector of identical MyEntity values, one for each time I've updated it: 3 updates + 1 instance for creation. Except that the addresses will be different.

If I had, say, 5 different MyEntity and updated them several times, the function would return (5*times_each_one_was_updated). This isn't what I need; I need only 5.

"get_all_entities" is implemented as `hdk::query("my_entity".into(), 0, 0)`

Q: How can I get it to return only the latest instance/version of each MyEntity?
In other words, after I update an Entity, it returns a new address and a new, updated record gets created, right? How can I retrieve only the updated, latest record of each individual Entity?

https://developer.holochain.org/api/latest/hdk/api/fn.get_entry.html

In my experience, that's exactly what get_entry does: it returns the latest updated version of the entry.

You'd need to share the handler you've used for the get_all_entities zome function.

There's no get_entry(...) call in my code.

Use hdk::get_entry rather than hdk::query

Note: hdk::query only queries an agent's local source chain, and as @marcus implies, it will get all entries with the same entry type, regardless of whether they were initial entries or updates. hdk::get_entry will query your source chain and the DHT for only one entry by its hash. If the hash you give is an old entry, Holochain will automatically follow the chain of updates until it gets to the newest entry. That is, if you:

  1. Store an entry
  2. Update it twice
  3. Call a theoretical get_my_entity zome function that uses hdk::get_entry three times, once with addr1, once with addr2, and once with addr3

it would always return the newest entry.
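
To make that behaviour concrete, here is a minimal model in plain Rust. It does not use the HDK: `UpdateChain`, `record_update` and `resolve_latest` are hypothetical names that only imitate the update-following described above, and the sketch assumes the update chain has no cycles.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the update metadata (not an HDK type):
/// maps an old entry address to the address that replaced it.
pub struct UpdateChain {
    replaced_by: HashMap<String, String>,
}

impl UpdateChain {
    pub fn new() -> Self {
        UpdateChain { replaced_by: HashMap::new() }
    }

    /// Record that the entry at `old` was updated and now lives at `new`.
    pub fn record_update(&mut self, old: &str, new: &str) {
        self.replaced_by.insert(old.to_string(), new.to_string());
    }

    /// Follow the chain of updates from `addr` until no newer version exists.
    pub fn resolve_latest(&self, addr: &str) -> String {
        let mut current = addr;
        while let Some(next) = self.replaced_by.get(current) {
            current = next.as_str();
        }
        current.to_string()
    }
}
```

With addr1 updated to addr2 and addr2 updated to addr3, resolving any of the three addresses in this model ends at addr3.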

So how do you implement a get_all_entities function? It depends; do you want to get all entities from the user's source chain or from all users across the DHT?

I don't need one entry. I need all entries.

I need a list of "my_entities". All of them. I don't need a single entity. How will "get_entry(…)" help me? Show me.

I've shown that.

@alx I think we need more clarifying context to determine what these functions are trying to do. One question, and one clarification:

> I need a list of "my_entities". All of them.

From across the DHT or just one agent's chain?

> So how do you implement a get_all_entities function?

> I've shown that.

Sorry, lack of clarity on my part. That was meant as a hypothetical question. What I should have said was "Given how hdk::query and hdk::get_entry work, how would we reimplement get_all_entities to work the way you expect? Well, it depends…"

I don't know which. I call get_all_entities() as a zome function from js tests. I guess I'll need only the agent's own ones for now, to ensure that everything works via the js test. How do they differ? @pauldaoust

I know the end result I need:

Suppose I had a list of articles for a blog, Article being Entity.

What I want to get is a list of those Articles to show on the main page of a blog. That's it.
I don't need multiple versions of each Article, only the latest version of each Article.

One agent's chain will typically only hold their own actions in a network. (The exception is a countersigned transaction, which reflects another agent's participation too.) The DHT holds copies of all agents' public chain entries and is a global view of the entire community of participants.

So it depends on what you want: when a user's UI calls get_all_entities(), does the user expect everyone's blog articles or just their own? I suspect that this decision is important for ensuring that your JS test is testing something relevant to a business goal.

Here's how querying a local source chain differs from getting an entry from the DHT:

| hdk::query() | hdk::get_entry() |
| --- | --- |
| Retrieves my own data only | Retrieves my data and everyone else's |
| Can retrieve multiple entries at once (data is local, so lookups can scan entire data set quickly) | Can only retrieve one entry at a time (data is remote in unstructured hash space; lookups can only be done if you know the hash of the entry) |
| Retrieves deleted and obsolete entries | Traverses revision history to newest version of entry (default; behaviour can be configured to retrieve deleted/obsolete entries too) |

Now, you can do something very similar to hdk::query() on the DHT. First, a bit of background: if the DHT is just an unstructured, unqueryable hash space spread across many machines, you need to know the exact location of any entry in order to retrieve it. So how the heck do you do anything useful? You can link data together. Here's a writeup on how links on the DHT work and why they're necessary: https://hackmd.io/ZEOwR3aIQN-971lGYWo-zw?view

One thing the article doesn't talk about is link 'tags', arbitrary bits of content attached to links that let you do ad-hoc queries. The query language is pretty primitive right now (just exact matches and regexes), but it allows you to filter results or prefetch important bits of an entry without having to do 1+n lookups (which could get expensive on a DHT).

My suspicion is that, in most cases, you'll want to show your own articles and other people's articles with the same function. So perhaps your 'get all' function would have this signature:

fn get_all_blog_posts(author_address: Address) -> ZomeApiResult<Vec<BlogPost>>

and would work something like this:

  1. Get all the links of type author_to_blog_post that are attached to the author's address.
  2. Map over all those links, calling get_entry to get the blog content. Two notes:
    • This will by default get only the newest copy, regardless of whether it's your article in your source chain or someone else's article on the DHT.
    • This is an expensive 1+n query, so if all you're generating in the UI is a list, you might instead want to consider including everything you need for a summary in the link tag. Something like:
      {"title":"How to feed a bison","publish_date":"2019-08-27T15:36:00"}
      
      One thing I don't like about hdk::update_entry() is that its abstraction isn't very clean: in the case of links, those links don't get updated when the target entry gets updated. They still point to the old entry, which is fine if you're calling get_entry() which follows the update chain, but not if you're just relying on the link tag to still hold relevant information. If you want the link tag content to stay fresh, it's up to you to delete the old link and create a new one on entry update.
  3. Return the blog posts (or just their summaries).
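
The steps above can be sketched with an in-memory model in plain Rust. This is only an illustration under stated assumptions: `Dht`, `Link`, `get_latest` and `get_all_blog_post_summaries` are hypothetical names, not HDK types or functions, and entry content is reduced to a plain string.

```rust
use std::collections::HashMap;

/// Hypothetical link record: the address it points to plus a free-form tag.
pub struct Link {
    pub target: String,
    pub tag: String,
}

/// Hypothetical in-memory stand-in for the DHT (not the HDK API).
pub struct Dht {
    /// (base address, link type) -> links attached to that base.
    pub links: HashMap<(String, String), Vec<Link>>,
    /// Old entry address -> address of the entry that replaced it.
    pub replaced_by: HashMap<String, String>,
    /// Entry address -> entry content.
    pub entries: HashMap<String, String>,
}

impl Dht {
    /// Model of get_entry: follow updates, then fetch the newest content.
    fn get_latest(&self, addr: &str) -> Option<&String> {
        let mut current = addr;
        while let Some(next) = self.replaced_by.get(current) {
            current = next.as_str();
        }
        self.entries.get(current)
    }

    /// Steps 1 to 3 with one entry lookup per link (the 1+n pattern).
    pub fn get_all_blog_posts(&self, author: &str) -> Vec<String> {
        let key = (author.to_string(), "author_to_blog_post".to_string());
        let mut posts = Vec::new();
        if let Some(links) = self.links.get(&key) {
            for link in links {
                if let Some(content) = self.get_latest(&link.target) {
                    posts.push(content.clone());
                }
            }
        }
        posts
    }

    /// Cheaper variant: read only the link tags, with no entry lookups.
    /// Tags are returned as stored, so they go stale if the target is updated.
    pub fn get_all_blog_post_summaries(&self, author: &str) -> Vec<String> {
        let key = (author.to_string(), "author_to_blog_post".to_string());
        match self.links.get(&key) {
            Some(links) => links.iter().map(|link| link.tag.clone()).collect(),
            None => Vec::new(),
        }
    }
}
```

In this model a link that still points at an old address yields the newest content through `get_all_blog_posts`, while `get_all_blog_post_summaries` keeps returning whatever the tag held when the link was created, which is the staleness caveat mentioned in step 2.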

Suppose there was only a single author on the blog: me. No other authors could post there, and they couldn't sign up for the blog either. How would I retrieve the list of all the articles?

@pauldaoust

There are two ways you could do this. There's an advantage to querying your own source chain because you know all the data you want is there, so it's fast. But querying doesn't automatically respect updated/deleted status, so you'd have to:

  1. Call hdk::query() to get the addresses of all the entries you want.
  2. Map over each result, calling hdk::get_entry() on each.
  3. hdk::get_entry() gives back an empty result for deleted items and follows the update chain for modified items, so filter out all duplicates and empties.
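
Here is a rough sketch of those three steps as a plain-Rust model. It does not call the HDK: `Store`, `get_latest` and `latest_entities` are hypothetical names, `chain` stands in for what a query would return, and `get_latest` imitates the get_entry behaviour described in step 3.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory model of one agent's data (not the HDK API).
pub struct Store {
    /// Every address ever committed, in commit order (what a query returns).
    pub chain: Vec<String>,
    /// Old address -> address of the entry that replaced it.
    pub replaced_by: HashMap<String, String>,
    /// Addresses whose entries have been deleted.
    pub deleted: HashSet<String>,
}

impl Store {
    /// Model of get_entry: follow updates; None if the newest version is deleted.
    pub fn get_latest(&self, addr: &str) -> Option<String> {
        let mut current = addr;
        while let Some(next) = self.replaced_by.get(current) {
            current = next.as_str();
        }
        if self.deleted.contains(current) {
            None
        } else {
            Some(current.to_string())
        }
    }

    /// Steps 1 to 3: take every queried address, resolve it to its newest
    /// version, then drop empties and duplicates while keeping first-seen order.
    pub fn latest_entities(&self) -> Vec<String> {
        let mut seen = HashSet::new();
        self.chain
            .iter()
            .filter_map(|addr| self.get_latest(addr))
            .filter(|latest| seen.insert(latest.clone()))
            .collect()
    }
}
```

An entity created once and updated three times contributes four addresses to `chain`, but all four resolve to the same newest address, so it appears only once in the result.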

I'll try to.

After having filtered out empty items, all duplicated items will have a) identical values in their fields b) different Address/HashStrings

right?

How will I find out which item/Address is the most recent one?

They should have identical hash strings, because get_entry will have followed to the newest version. Oh hey! I have an idea for streamlining this even further:

  1. When you're mapping on the addresses to turn them into entries, check the address of the entry you just received against the address you were looking for. If they're different, that means the original address is for an old version, and you could just return an empty result like get_entry does for a deleted entry. Here's an example of a closure you could use in a mapping function.
    |addr| {
        // Assumes `addr` is an owned Address taken from the query result.
        match hdk::get_entry(&addr) {
            Ok(Some(entry)) => match hdk::entry_address(&entry) {
                // The address of the entry we received matches the address of
                // the entry we asked for. Return it.
                Ok(latest_addr) if latest_addr == addr => Some(entry),
                // The addresses don't match (or hashing failed); that means the
                // entry has been updated. Don't include the updated entry,
                // because it would be a duplicate.
                _ => None,
            },
            // Either a None (entry has been deleted) or an error. Return None.
            _ => None,
        }
    }

(Note, I haven't actually tried to compile this code, so there are probably syntax errors.)

By hash strings do you mean their addresses? No, I had all the addresses different. But all other fields were identical: they had the latest updated values.

Hm, you mean when you incorporated that mapping function I gave? Should be:

  • different addresses = entry address received by query is an old one; return empty value
  • same addresses = entry address received by query is a new one; return value received by get_entry
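
That rule can be checked against a small plain-Rust model. As before, this is a sketch and not HDK code: `resolve_latest` and `only_current` are hypothetical helpers, and the `replaced_by` map stands in for the update metadata that get_entry is described as following.

```rust
use std::collections::HashMap;

/// Follow the update map (old address -> newer address) to the newest address.
fn resolve_latest<'a>(replaced_by: &'a HashMap<String, String>, addr: &'a str) -> &'a str {
    let mut current = addr;
    while let Some(next) = replaced_by.get(current) {
        current = next.as_str();
    }
    current
}

/// Keep only the queried addresses that are already the newest version:
/// resolving them leads back to the same address.
pub fn only_current(queried: &[String], replaced_by: &HashMap<String, String>) -> Vec<String> {
    queried
        .iter()
        .filter(|addr| resolve_latest(replaced_by, addr.as_str()) == addr.as_str())
        .cloned()
        .collect()
}
```

Given the addresses addr1, addr2 and addr3 from the test script at the top, only addr3 survives the filter in this model, so each entity shows up exactly once without a separate de-duplication pass.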