Limiting the size of my 'chain'

simwilso · November 16, 2019, 10:53pm

In part of my configuration I have a setup where entries are committed to the DHT every couple of minutes. But I only need to retain a copy of the last 3 at any point (I don’t need the entire history to be available).

Is there a way I can limit the size here?
What would be the correct approach to achieving that?

simwilso · November 17, 2019, 11:52pm

found this thread from Paul which tackles how to do this:

pauldaoust · November 19, 2019, 4:15pm

Hey @simwilso can I get more insight into the use case for this? I can think of lots of reasons that a source chain might not be useful, or might be an impediment to a larger goal (e.g., storage constraints), but I’d like to delve more into the specifics of your case. A long-running source chain is useful because it allows you to rebuild state from a series of state changes. If you only need to retain your last three entries, maybe some other solution would be better than a throwaway DHT.

simwilso · November 20, 2019, 10:01am

Hey Paul this thread above is awesome help by the way!

The use case is this: I have a range of price signal DHTs. All that each of these DHTs holds is the current ‘energy price’, which is updated routinely (every 2.5 mins in my first example).
I then have a community of ‘energy devices’ that will select, subscribe to, and monitor the ‘price DHT(s)’ that are interesting to them.
These energy devices (an air conditioner in my build) watch that DHT and, if the ‘price’ moves above certain thresholds, will shift/move/switch power.

The devices have some logic that requires them to see a very short history of the price moves (~7mins is enough).
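
A minimal sketch of the short price history described above, assuming signals arrive in timestamp order. The names `PriceSignal`, `PriceWindow`, and `latest_above` are hypothetical and are not part of Holochain or the HDK; the sketch only illustrates how a device could hold roughly 7 minutes of readings in memory and react to a threshold.

```rust
use std::collections::VecDeque;

/// One price reading. Field names are hypothetical.
#[derive(Debug, Clone, PartialEq)]
struct PriceSignal {
    timestamp_secs: u64,
    price: f64,
}

/// Keeps only the signals that fall inside a sliding time window.
struct PriceWindow {
    window_secs: u64,
    signals: VecDeque<PriceSignal>,
}

impl PriceWindow {
    fn new(window_secs: u64) -> Self {
        PriceWindow { window_secs, signals: VecDeque::new() }
    }

    /// Adds a signal, then drops everything older than the window.
    /// Assumes signals are pushed in timestamp order.
    fn push(&mut self, signal: PriceSignal) {
        let cutoff = signal.timestamp_secs.saturating_sub(self.window_secs);
        self.signals.push_back(signal);
        while self.signals.front().map_or(false, |s| s.timestamp_secs < cutoff) {
            self.signals.pop_front();
        }
    }

    /// True if the most recent price is above `threshold`.
    fn latest_above(&self, threshold: f64) -> bool {
        self.signals.back().map_or(false, |s| s.price > threshold)
    }
}

fn main() {
    // 7-minute window, one signal every 150 s (2.5 min).
    let mut window = PriceWindow::new(7 * 60);
    for (i, price) in [30.0, 32.0, 45.0, 80.0].iter().enumerate() {
        window.push(PriceSignal { timestamp_secs: i as u64 * 150, price: *price });
    }
    println!("retained signals: {}", window.signals.len());
    println!("shed load: {}", window.latest_above(60.0));
}
```

With a 2.5-minute interval, a 7-minute window retains three readings, which lines up with the "last 3" requirement in the opening post.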

The other reason I’m keen to limit the chain size is that the devices that commit these ‘price’ signals or the devices reading from the DHT will often be IoT devices with really small storage and memory. This won’t always be the case, and I guess we can run conductors only on the devices with enough resources, but I’m keen to test and try different concepts we could use to adapt to different device types.
I’m going to try to replicate this throwaway configuration you’ve detailed but would love to get your thoughts/view if you reckon this is appropriate.

cheers - sim

pauldaoust · November 20, 2019, 4:23pm

Thanks very much for the extra context; it’s so helpful to understand your constraints. I expect you’ll want to tune the number of price signals — the balance here is storage requirements vs the cost of spinning up new DHTs. There is real overhead to spinning up a DNA instance (and maybe keeping it warm too; I don’t know); think of the conductor as a thread pool manager in a traditional web server.

Within a given pricing DHT, who is publishing signals to that DHT? Just a single node that has the authority to create the signals, or a group of nodes? With this throwaway DHT idea I would recommend a single publisher and many subscribers, just because otherwise it could become a big coordination hassle. It’s much simpler to give a single person the authority to say “okay, everybody pack up your bags and move to this other DHT”. But I’d warn again about the overhead of many tiny DNAs, cuz if each DHT only hosts one node’s signals, then an A/C would probably be subscribing to a lot of them at any given time, I’m guessing?

I want to rewind and ask, what sort of node is publishing these price signals? energy producers or someone else?

Here’s a suggestion, based on core/HDK features that I know are coming down the pike: if you do go with the single-publisher-per-pricing-DHT idea, it’ll be easy for the publisher to coordinate the migration of their subscribers to their new DHT. Just publish a ‘migration’ entry. Something like:

struct MigrationPointer {
    // You can't actually instantiate a DNA by hash alone; it doesn't tell you
    // where to find it (at least not until we have the HCHC DNA, which will
    // work a bit like a package manager). And anyway, subscribers will be
    // creating DNAs from a single 'pricing DNA' template, and those DNAs'
    // hashes will diverge from that template because you're poking in a
    // variable that makes them 'fork'.
    dna_hash: Address,
    // As an alternative to a UUID, maybe just use an incrementing counter; the
    // publisher starts on counter 0 and increments by one every time they
    // migrate to a new DNA.
    uuid: String,
}
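
A rough sketch of how a subscriber might follow a chain of such migration entries to the publisher’s current DHT. Everything here is an assumption for illustration: `Address` is a plain `String` stand-in for the HDK type, and `Network` is a hypothetical in-memory model rather than a real Holochain API.

```rust
use std::collections::HashMap;

/// Stand-in for the HDK address type (assumption: a plain string).
type Address = String;

#[derive(Debug, Clone, PartialEq)]
struct MigrationPointer {
    dna_hash: Address,
    uuid: String,
}

/// Hypothetical in-memory model: for each throwaway DHT (keyed by DNA hash),
/// the migration entry its publisher left behind, if any.
struct Network {
    migrations: HashMap<Address, MigrationPointer>,
}

impl Network {
    /// Follows migration pointers from `start` until it reaches a DHT that
    /// has no migration entry. Returns None if more than `max_hops` hops
    /// would be needed, which also guards against accidental cycles.
    fn resolve_current(&self, start: &Address, max_hops: usize) -> Option<Address> {
        let mut current = start.clone();
        for _ in 0..=max_hops {
            match self.migrations.get(&current) {
                Some(pointer) => current = pointer.dna_hash.clone(),
                None => return Some(current),
            }
        }
        None
    }
}

fn main() {
    let mut migrations = HashMap::new();
    // The publisher has migrated twice: dna-0 -> dna-1 -> dna-2.
    migrations.insert(
        "dna-0".to_string(),
        MigrationPointer { dna_hash: "dna-1".to_string(), uuid: "1".to_string() },
    );
    migrations.insert(
        "dna-1".to_string(),
        MigrationPointer { dna_hash: "dna-2".to_string(), uuid: "2".to_string() },
    );
    let network = Network { migrations };
    println!("{:?}", network.resolve_current(&"dna-0".to_string(), 10));
}
```

A subscriber that was offline for a while could start from the last DHT it knew about and hop forward, instead of needing a notification for every migration.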

simwilso · November 21, 2019, 4:09am

thanks Paul.
Yes that’s right it will just be a single publisher. For my initial build that is just a replay of the market price for energy. Meaning the ‘signal’ node is just a server that is relaying the price of energy from an API.
This use case probably doesn’t justify the use of a DHT but in future we see that these ‘signals’ will be published from many sources which will make the use of a DHT relevant. Some possible sources of these signals that ‘device agents’ will react to include:

a solar PV system or battery with energy to share
a transformer on the distribution grid (that may signal to its local community asking them to reduce/shift energy usage as it is overheating)
a person or entity in the community who needs power
a market signal from a retailer, grid operator or energy manager.
even a motion sensor in a home that indicates occupancy or not etc.

From the A/C agent’s perspective you’ve nailed it. The approach is that the A/C will subscribe and listen to one/some/many of these signals (which I, as the owner of that device, will subscribe it to). When subscribed, the A/C will act just like cells in our body, in that our cells react to or emit chemicals to their neighbors, which invoke some sort of reaction.
The throwaway DHT seems like a nice fit here.
Having said that, a big challenge I see is, as you say, that the compute on some of the devices we want to adjust power behaviour for is really tiny. So one big problem I’m grappling with is how to allow a really simple device agent to participate and listen/subscribe to DHTs in a trusted way in this system. Is there some sort of trust or validation approach that will allow me to include these really simple IoT agents in this system?

I should add that another part of the system and model will be the inclusion in each agent of mutual credit and currency features to account for actions. The idea of this bit is that we can reward that A/C for listening to these signals, and eventually create a system in which clean energy is a tradable and redeemable currency.

that’s the goal!

pauldaoust · November 21, 2019, 5:04am

This sounds like an ideal use case for Holochain. So it sounds like pricing is merely one signal; other signals are demand, oversupply, lack of demand, emergencies, etc. This is a really interesting use case for the ‘throwaway DHT’ idea; it hadn’t even occurred to me to use it for something like this. What I like about it is that you aren’t throwing everybody’s signals into one global DHT. I suspect that might hurt performance if all these resource-constrained nodes were relying on other nodes halfway across the world to store copies of their data simply because they happened to be DHT peers. So it would make sense to break it up into geographic DHTs, but then… where do you define the boundaries? And how do you create the necessary overlap that reflects the mushy reality of the grid, especially a P2P grid?

How are you intending to let nodes discover each other? Will there be a big global DHT for that?

What sort of trust guarantees does your model need? Does the subscriber need to trust the identity of the publisher, or vice versa, or both, or something else? I have a few thoughts boiling in my head as a result of talking with some people about this very problem a year and a half ago.

simwilso · November 21, 2019, 8:58pm

Cheers Paul it does feel like a really great fit.
That’s right the price is just a trigger and one of many that will sit within the ecosystem that a device, or community of devices react to. Exactly like you say above.
It may be a local or locally configured trigger like occupancy in a building, weather, time of year/day, or a broader one like a grid emergency of some sort…
The signals will come from different sources and also at different intervals, e.g. an emergency maybe happens 5 or 6 times a year vs ongoing small price spikes which happen 100 times a day.
Similarly, across the different use cases for these signals, some may need to carry a little more history while others don’t. For example, an agent watching for occupancy might want to track 10-15 entries of ‘motion detected’ before doing something, whereas one tracking pure electricity price probably only cares about the next 5 minutes. So the history also needs to be modular.

Re. the neighborhood and discovery questions, it’s funny you mention that. This is one I’ve been trying hard to work out. From chatting with the Holochain guys, Mike G, and internally, it needs to work similarly to routing domains, but the challenge is precisely the one you mention: where is the boundary, and how is it set?
What you suggest is the way I was heading in my mind: probably not a top-level global discovery DHT, but possibly a local/community discovery DHT for each implementation and level.
My thought was that we are building a UI for this because, although I wish they didn’t have to be, devices still need to be managed by an owner/energy manager who defines their policy and the signals they subscribe to.
In terms of discovery, the nice thing is that this ‘user portfolio domain’ could be used to define the boundaries of their local discovery.
Taking that a step up, when enrolling a device in this UI the user selects a locale or community in which they sit (which would set a broader discovery domain like a suburb), in which their devices could discover peers from other owners under certain circumstances… then there may be broader domains again for region, state, nation, entire ecosystem etc. I’m not sure. The idea is that each domain would have rules for agent discovery. My thinking is that bridging for certain agent types is perhaps a nice way to handle this.

In terms of trust, this is the other big one for sure. The system we want will reward agents for good/efficient/clean and community-oriented (commons) behavior. Those rewards are redeemable for real value, so there does need to be a level of trust in the system, ideally full trust.
But there are so many different device types that trust will be variable and subjective. The devices we are interacting with atm are via API, so the trust we can give that they are doing what they say they are doing is low. Down the track, when we open the software, the hope is people/devs will embed the agent libraries we create for this IoE directly in their devices so they will be more trusted… but still, some devices will be just embedded chips whereas others will be more sophisticated, so trust is never going to be one size fits all.
Reputation and validation will therefore, I think, be key to building as much trust as we can in the different integration models here to combat that variability, but I still need to map out how that will work.
Tom and I chatted about this before Barcelona and were going to plan a workshop to discuss validation models for the different scenarios. Can’t wait for that convo!
For me, including a reputation metric for each agent is an interesting approach, and this is how I’m bootstrapping it to test. Reputation is harder to game, but how to set that metric, who judges/sets it, and what the entry/exit criteria are is a really hard problem. I don’t know the answer, but I think this is one of those things that might emerge through trying some different things in our first field testing.

Trust in the signal providers is also a challenge, but again my plan/hope is that we can build a common reputation/currency library for all agents in this IoE system. Then we try to maximize trust through validation rules, which I want to make as simple and standardized as possible, and which I think will be a really unique value of the HC arch.
At the top level, my initial approach to setting up validation is something like this: let’s say I am an air conditioner… I’m telling the system that I’ve turned off for 30% of the hot day and given the grid back X kWh. The system rewards me for that action in IoE currency and reputation value. But who knows if I am lying? I can say what I like here. My theory was that if this AC is a lone device the ‘trust’ will be low, so it needs to start with a low reputation figure… If, however, as the owner of that air conditioner I buy an IoE-enabled smart meter and add it to my portfolio, there is now another device that can vouch for the AC’s claims… i.e. the AC claims to have given the grid back X kWh, and the smart meter validates that it indeed saw its expected usage decrease by X kWh… in that scenario both agents can be more trusted.
That could get very complicated but I think works.
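
To make the AC and smart meter example a bit more concrete, here is a small sketch of that corroboration step. The `Trust` enum, the relative `tolerance`, and the reputation increments are all made-up assumptions for illustration; nothing here is an existing Holochain validation API, and real numbers would need to come out of field testing.

```rust
/// Outcome of checking a device's claim against an independent observer.
#[derive(Debug, PartialEq)]
enum Trust {
    /// No second device could vouch for the claim.
    Uncorroborated,
    /// A second device observed roughly the same reduction.
    Corroborated,
    /// A second device observed something clearly different.
    Contradicted,
}

/// Compares the kWh an AC claims to have given back with the reduction a
/// smart meter observed. `tolerance` is a relative margin (0.1 = 10%).
fn assess_claim(claimed_kwh: f64, meter_observed_kwh: Option<f64>, tolerance: f64) -> Trust {
    match meter_observed_kwh {
        None => Trust::Uncorroborated,
        Some(observed) if (observed - claimed_kwh).abs() <= tolerance * claimed_kwh.abs() => {
            Trust::Corroborated
        }
        Some(_) => Trust::Contradicted,
    }
}

/// Nudges a reputation score in [0, 1]. The step sizes are arbitrary
/// placeholders: lone claims earn little, vouched claims earn more, and
/// contradicted claims are penalised.
fn update_reputation(current: f64, trust: &Trust) -> f64 {
    let next = match trust {
        Trust::Uncorroborated => current + 0.01,
        Trust::Corroborated => current + 0.10,
        Trust::Contradicted => current - 0.25,
    };
    next.max(0.0).min(1.0)
}

fn main() {
    // Lone AC: the claim is accepted but earns little reputation.
    let lone = assess_claim(2.0, None, 0.1);
    // AC plus smart meter in the same portfolio: the meter saw 1.9 kWh.
    let vouched = assess_claim(2.0, Some(1.9), 0.1);
    println!("{:?} -> {:.2}", lone, update_reputation(0.1, &lone));
    println!("{:?} -> {:.2}", vouched, update_reputation(0.1, &vouched));
}
```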

Anyway, great to have your thoughts on this so thanks Paul!
Maybe you, Mike G and I could grab a call to chat about it a bit more? It would be cool to share with you the sprint we are trying to deliver and, if there was some interest, see how your bandwidth or appetite is to help us nut out some of these things for it.

cheers - sim

simwilso · November 22, 2019, 11:02pm

Hi @freesig hope you had fun in Barcelona! Wanted to tag you in on this thread above. @pauldaoust and I have been chatting about his ‘throwaway dht’ model above which is really cool. I was planning to try implementing it tonight as it fits my model pretty well which is this:

In my setup I will have a number of different ‘signal’ DHT’s that energy devices listen to and act upon.
the devices providing the signal, and content provided by them will differ pretty wildly in my target state.
One big thing is how much of the history they can or need to store. With that in mind I’m looking for a way to easily configure, in each case, the number of entries held for the chain rather than defaulting to the entire history.
For my need in most cases I only really need 2-3 entries to provide agent surety but again my needs will differ for each particular ‘signal’ use case people write.

My question:
Paul’s model provides a way to achieve this but I’m wondering what your thoughts are on how this might best be done?

I’m also wondering if this might be a common consideration that might come up so maybe it’s a feature that could be considered or included in the macros for writing zomes if it isn’t already?

cheers - sim

pauldaoust · November 26, 2019, 8:18pm

Chewing on all this some more…

Trust

My hunch is that there will be two different kinds of embedded devices: ones that have the power to run a full HC node (e.g., RasPI Zero W) and really tiny embedded devices that send data points to a full HC node. In both cases, there’s at the very least enough memory and muscle to:

store a keypair
store a manufacturer’s certificate ‘blessing’ the keypair as legit
hash a data point
sign the data point with the stored private key

This might be a useful start for getting the device to prove authenticity of manufacture. How to prove that it hasn’t been tampered with, or that it isn’t a clone created by someone who figured out how to extract the private key from memory… Not sure. Best to reduce the surface area, I think: have the signing happen as close to the sensor as possible, and have an additional sensor that senses compromise of the physical enclosure. Not sure; I’m not a hardware guy (although I like to think about it!)

Your ideas about non-binary, reinforced trust are also intriguing and very Holochain-esque.

Discovery

I like that you’re already thinking about the difficulty of mapping a space of mushy boundaries to the sharp boundaries of a DHT. At one point there will probably have to be an oversimplification of reality just to make the discovery DHTs small enough. What’s the ‘user portfolio domain’? Sounds like an important concept for me to understand. Re: overlapping/fractally-sized domains, what sorts of rules for membership are you thinking of? And how are you thinking of applying bridging here?

One challenge that occurs to me now is how to point a subscriber to a publisher’s most recent signal DHT. Obviously this happens in the discovery DHT, but how? If the publisher writes an entry in the discovery DHT with every migration of the publish DHT, and each publish DHT only holds three entries, then you’re still publishing one discovery entry for every three signal entries. There’s a storage savings, but it’s only about 66%.

Another alternative might be to have a node discovery DHT, a signal DHT discovery DHT (yeah, I know, complicated), and finally the signal DHT. The mid-range DHT is thrown out every day, and only requires daily writes to the agent’s chain in the ‘big’ discovery DHT. Obviously you’d want to tune numbers like ‘daily’, ‘3 signals’, etc based on real-world observations… What do you think?
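
A back-of-the-envelope sketch of the write counts behind these two layouts, assuming a fixed 2.5-minute signal interval and three signals per throwaway DHT (both taken from earlier in the thread). The function names are hypothetical, and the numbers only count long-lived writes, not gossip or validation overhead.

```rust
/// Signals published per day at a fixed interval in seconds.
fn signals_per_day(interval_secs: u64) -> u64 {
    86_400 / interval_secs
}

/// Two-tier layout: one long-lived discovery entry per throwaway signal DHT.
/// Uses ceiling division so a partly filled DHT still counts.
fn discovery_writes_two_tier(signals: u64, signals_per_dht: u64) -> u64 {
    (signals + signals_per_dht - 1) / signals_per_dht
}

/// Fraction of long-lived writes avoided compared with keeping every signal.
fn savings(signals: u64, persistent_writes: u64) -> f64 {
    1.0 - persistent_writes as f64 / signals as f64
}

fn main() {
    let signals = signals_per_day(150); // 576 signals per day
    let two_tier = discovery_writes_two_tier(signals, 3); // 192 discovery entries per day
    // Three-tier layout: the mid-range DHT absorbs the per-migration pointers
    // and is thrown away daily, so the big discovery DHT sees one write per day.
    let three_tier = 1;
    println!("signals/day: {}", signals);
    println!("two-tier:   {} writes/day, savings {:.1}%", two_tier, 100.0 * savings(signals, two_tier));
    println!("three-tier: {} write/day, savings {:.1}%", three_tier, 100.0 * savings(signals, three_tier));
}
```

Under these assumptions the two-tier layout saves two thirds of the long-lived writes, while the three-tier layout leaves a single long-lived write per publisher per day, at the cost of running one more DHT.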

Other options are using node-to-node messaging and signals (of the Holochain type, not the pricing type), rather than published DHT entries, to find out what signal DHT an agent is currently publishing on. Can it be assumed that most publishing nodes will be online most of the time? If this is viable, the publisher agent would still have to have some sort of persistence so at least they know what DHT they’re using. If you’re not using a source chain for that, you’d have to keep it client-side somewhere: some sort of daemon, maybe, that knows what publishing DHT the agent is using and can respond to requests for that info. I presume the architecture already has a daemon that automatically grabs signal info for publishing.