Celo Discord Validator Digest #4
Uptime score, missed signatures, useful info and new community tools.
One of the challenges we have faced as a Celo validator is keeping up with all the information that comes up in Celo's Discord discussions. This is especially true for smaller validators whose portfolios include several networks. To help everyone stay in touch with what is going on in the Celo validator scene and to contribute to the validator and broader Celo community, we have decided to publish the Celo Discord Validator Digest. Here are the notes for the period of 11-17 May 2020.
Discussions
Use of Ledger:
james | censusworks shared his feedback on using Ledger:
ledger is great for any high-risk keys that are difficult/impossible to change (eg., RG beneficiary keys).
ledger is a pretty bad fit for any keys you want to rotate or use in automation (e.g., group signer and vote signers).
Even though you can store the group signer, group vote signer and validator vote signers on the ledger (I do this), it makes flows like voting on governance with all your validators very difficult, and actually discourages key rotation of these since it's difficult to automate and burns a slot in your ledger keyspace (by "very difficult" I just mean lots-of-button-clicking, it's not terrible).
There's also the issue of waiting 10+ seconds to run a celocli command with ledgerAddresses=10, although that can be avoided by caching the index of the key you want to use.
asa | cLabs noted that the cLabs team was working on a feature that would enable celocli to "remember" the derivation path of a Ledger key:
I believe Ponti | cLabs has a PR out for review (or merged?) that adds the ability for the CLI to remember an address -> key derivation path mapping, so you don't need to specify the ledgerAddresses flag or wait those extra 10s.
Uptime score calculation:
Rob | Polychain asked whether 13 missed blocks count as one instance of downtime or two:
Does anyone know if the liveness score is computed as an average over a sliding window of 12 blocks (so e. g. 13 sequential missed blocks would count twice) or if this is calculated over sequential groups of 12 blocks so e. g. 24 missed signatures counts twice, but 23 misses would only count once?
@timmoreton | cLabs: It's a sliding window -- so if you miss 13 sequential blocks, that counts as 1, 14 counts as 2, 24 missing counts as 12 etc. You basically get a minute's grace, and every time you sign a block again, you get another minute's grace.
Some folks were unsure whether 12 missed blocks count as downtime, so Tim looked up the parameter in the code:
OK from a quick check I believe you have uptime not incremented if you have not signed in any of the previous 12. So the 13th block would be effectively downtime. This is the line if you're interested: https://github.com/celo-org/celo-blockchain/blob/a258abb20e2cf342603b388bb3f64faa03057591/core/blockchain.go#L1307
He also noted that:
You don't get penalized in the first 12 blocks or the last 1 block of an epoch.
Asked whether the parameter could be changed through governance, asa | cLabs said:
I think it can be made governable in the future without too much trouble, but yeah, not governable right now.
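The sliding-window rule Tim describes can be illustrated with a short Python sketch. This is a simplified model that only reproduces the numbers quoted above (13 sequential misses count as 1, 14 as 2, 24 as 12); it is not the celo-blockchain implementation, it ignores the epoch-boundary exemptions, and the function name and the list-of-booleans input are assumptions made for illustration.

```python
GRACE_BLOCKS = 12  # grace window quoted in the discussion above


def count_downtime_blocks(signed, grace=GRACE_BLOCKS):
    """Count blocks that fall outside the grace window.

    `signed` is an iterable of booleans, one per block, True when the
    validator's signature was included (hypothetical input format).
    """
    downtime = 0
    streak = 0  # consecutive missed blocks so far
    for ok in signed:
        if ok:
            streak = 0  # signing again resets the grace period
        else:
            streak += 1
            if streak > grace:
                downtime += 1  # every miss beyond the grace window counts
    return downtime


print(count_downtime_blocks([False] * 13))  # 1
print(count_downtime_blocks([False] * 14))  # 2
print(count_downtime_blocks([False] * 24))  # 12
```

Under this model, a single signed block between two runs of 12 misses resets the counter, which matches the "another minute's grace" remark.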
Missed signatures:
The missed signatures saga continues, as the issue keeps affecting numerous validators on a regular basis. BisonD suggested that it might be linked to the AWS US West 2 region:
I have independent confirmation from other validators in the group of those with sporadic missing signatures that their nodes are located there as well.
I cannot yet say whether this is an AWS thing or a regional thing or even if there is a single problematic actor in that cloud/region that is causing issues.
If anyone is running a validator in, say GCP west, please DM me if you are seeing issues as well.
Would like to understand if this affliction is geographic or cloud-based.
SlavaMo, however, noted that other providers and regions were affected too:
... my servers (dedicated, not vps) are located in Hetzner's Helsinki dc and I was also affected by this.
On Friday, we witnessed a huge drop in the signing rate for many validators.
In Sami&Helga's opinion, validators that missed signatures were in a cluster with low latency between each other. victor | cLabs, though, thought that it was not the case:
As long as everyone is operating correctly, latency between that DC and the other validators would need to be greater than 5s for their signatures not to be included in the parent seal. So it shouldn't cause any issues.
Maybe something caused their connections to break and it took some time for it to reestablish? Peering is an open TCP connection, so if something were to happen that killed this connection (e. g. an IP address change) it would need to be reestablished, which may not happen immediately. It's a shot in the dark, but it may be worth looking into. 🤷‍♂️
syncnode (George Bunea) suggested that it could have been an attack, and victor | cLabs didn't rule out the possibility.
Peter [ChainLayer.io] shared an error message that appeared on their validators during the non-signing periods:
WARN [05-06|12:21:58.858] Would have sent a commit message for an old block address=0x7F4afdAE66b590a90F2250e2D78bD27b4294a5dD func=handlePreprepare tag=handleMsg from=0x4A03C4c2E101AC4612d89b79f61c9C5BDd51929D cur_seq=238781 cur_epoch=14 cur_round=0 des_round=0 state="Accept request" address=0x7F4afdAE66b590a90F2250e2D78bD27b4294a5dD msg_num=238778 msg_hash=dd063b…b54516 msg_seq=238778 msg_round=0
alchemydc noted that this was also the case for their validator but not during all missed blocks:
... What we see is that missed signatures are SOMETIMES correlated with 'Would have sent a commit message for an old block' warnings from geth, but that these messages are ALWAYS correlated with a missing signature (within a block or two). Said differently, every time we see a 'Would have sent a commit message for an old block' we see a missed signature or two, but not all missed signatures are correlated with that warning.
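To check this kind of correlation on your own node, the warning can be pulled out of the geth log and its fields parsed. Below is a minimal Python sketch, assuming the log line has the same key=value layout as the one Peter shared; the function name and the "lag" field are our own illustration, not part of geth.

```python
import re

WARNING_TEXT = "Would have sent a commit message for an old block"
# key=value pairs; values are either "quoted strings" or runs of non-space
_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')

# The warning quoted above, split across literals only for readability.
SAMPLE = (
    'WARN [05-06|12:21:58.858] Would have sent a commit message for an old block '
    'address=0x7F4afdAE66b590a90F2250e2D78bD27b4294a5dD func=handlePreprepare '
    'tag=handleMsg from=0x4A03C4c2E101AC4612d89b79f61c9C5BDd51929D '
    'cur_seq=238781 cur_epoch=14 cur_round=0 des_round=0 state="Accept request" '
    'address=0x7F4afdAE66b590a90F2250e2D78bD27b4294a5dD msg_num=238778 '
    'msg_hash=dd063b…b54516 msg_seq=238778 msg_round=0'
)


def parse_old_block_warning(line):
    """Return the sequence numbers from an 'old block' warning, else None."""
    if WARNING_TEXT not in line:
        return None
    fields = {key: value.strip('"') for key, value in _FIELD.findall(line)}
    if "cur_seq" not in fields or "msg_seq" not in fields:
        return None
    cur_seq = int(fields["cur_seq"])
    msg_seq = int(fields["msg_seq"])
    # lag: how many blocks behind the current sequence the message was
    return {"cur_seq": cur_seq, "msg_seq": msg_seq, "lag": cur_seq - msg_seq}


print(parse_old_block_warning(SAMPLE))
# {'cur_seq': 238781, 'msg_seq': 238778, 'lag': 3}
```

The msg_seq values collected this way can then be compared with the block numbers where your signature was missing.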
SlavaMo shared his observations:
Today my validator have missed 400+ blocks which are forming a pattern. Mostly blocks are missed with the same miners:
46 0xc6f916ad6e360651bb95f8e67c1c28805745d084
40 0x5cab520442de9babc290b25e5e2e6a1194ec6707
38 0x341dec14b7a56c242ce9cf939815ec7bb1104244
36 0x43882141555003b3e71110f567373b59ac4cb0bd
32 0x6632c91b891e26229c86c59340917591be732028
32 0x2eb79345089ca6f703f3b3c4235315cbeaad6d3c
28 0x4e986af5c4796432bfa2a3fb7ade1b0e9cda988b
25 0x7eec94733d16b96c6fe877464630bb5be1e5c3f2
22 0xd8c68ebecb6f074ac5c4fb66a690ac0ad38a5a3c
22 0x89457b24524c94f97c0cc03f794c3e80b3dfd30b
21 0x4cb90ebba92141ed3021f5dc4e6c8bb642095846
18 0xb952930a3656a9cbab21df5919f94c61a495bf79
12 0x43599e7cfad195f39826646466782c230ac51595
11 0x5f897eed6797c0e7dfce3d97c48bf7ed032ee321
3 0x56259f876eb6a7264d9f1a59952baad599ff9640
2 0xdd0f3f7beb37fe9d4496f8098446b65ddfb1fa02
2 0xd507309fd69635aa37810a65a4da27ec47a1ba05
2 0x198958f0b860ab0e3937f468fe366aac9eebad2e
1 0xbc6963fc0e2f5547ba949ed39e80b8388321104f
1 0xb166964c69a4fb4c218c577cb227e02df7a65422
1 0x90684bc3ded2f69d8853d791bdc57eea0a84c9d0
1 0x3edbd914a59a79417d0274fad7d80e39f8b219f7
1 0x09b353c4e4c4d836b4a4ca4686370b78bc7a2152
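A tally like the one above can be produced with a few lines of Python. This is only a sketch: it assumes you already have a list of (block number, proposer address) pairs for the blocks your validator missed, and the function name and the short addresses in the example are hypothetical placeholders, not real data.

```python
from collections import Counter


def tally_missed_by_proposer(missed_blocks):
    """Count missed blocks per proposer, most frequent first.

    `missed_blocks` is an iterable of (block_number, proposer_address)
    pairs (hypothetical input format).
    """
    # Lower-case the addresses so checksummed and plain forms are merged.
    counts = Counter(proposer.lower() for _, proposer in missed_blocks)
    return counts.most_common()


# Placeholder data for illustration only.
example = [(101, "0xAAA"), (102, "0xbbb"), (113, "0xaaa")]
for proposer, count in tally_missed_by_proposer(example):
    print(count, proposer)
```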
Useful info
The new image for attestation service with better privacy features was pushed on May 11:
@nambrot: Rossy | cLabs has led a great effort to improve the privacy of the phone number mapping. We will hopefully have more documentation on this soon but in effect there will be a service that generates deterministic salts for a given phone number. It exists to prevent the large-scale harvesting of phone numbers in the previous version of the identity protocol. For that, there was a breaking change in the attestation service to have the user convey the salt to the attestation service. Additionally, the attestation service no longer stores the phone number in the DB for added privacy and actually periodically will vacuum the DB for older attestation requests (great work by aslawson | cLabs).
timmoreton | cLabs started collecting useful metrics, tools and best practices for monitoring nodes. You can share yours here.
nambrot published a proposal on Metadata and off-chain storage: https://github.com/celo-org/celo-proposals/pull/11
According to marek | cLabs, the stability reserve will include roughly 20% ETH and 20% BTC, and the reserve addresses will be public.
brian | stakevalley.com recommended that operators running any "mission critical nodes" should use machines with over 4GB RAM:
Hey all - noticed that Celo nodes tend to stabilize right around 4GB RAM usage after they've been running for a while.
Would recommend any mission critical nodes (which includes attestation) run on machines with more than 4GB of RAM. I remember during the stakeoff people were using smaller instances for attestation, but I wouldn't recommend it.
Community
CeloWhale.com by dsrv labs now provides Telegram alerts for Celo Mainnet RC1 and Baklava. You can get a notification when a validator is newly registered or unregistered and when a validator's score decreases. Coming soon: an alert when a validator's signature rate falls below 90% over the latest 120 blocks.
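If you want a similar check in your own monitoring, the "below 90% over the latest 120 blocks" condition can be sketched as a rolling window. This is our own illustration in Python, not CeloWhale's implementation; the class name, the choice to stay silent until 120 blocks have been seen, and the per-block boolean input are all assumptions.

```python
from collections import deque


class SignatureRateMonitor:
    """Track the signature rate over the most recent `window` blocks."""

    def __init__(self, window=120, threshold_percent=90):
        self.threshold_percent = threshold_percent
        self.recent = deque(maxlen=window)  # oldest entries drop off automatically

    def record(self, signed):
        """Record one block; return True when the alert condition holds."""
        self.recent.append(bool(signed))
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough history yet (an assumption of this sketch)
        signed_count = sum(self.recent)
        # Integer comparison avoids floating-point rounding at exactly 90%.
        return signed_count * 100 < self.threshold_percent * len(self.recent)
```

With a full window of signed blocks, 12 misses bring the rate to exactly 90% (108 of 120), so the alert fires only on the 13th miss.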
zviad | wotrust.us presented a new tool/service at https://celovote.com/ that makes the voting process for cGLD holders much easier:
It is still in alpha since we are not in mainnet yet, but you can connect your ledger and get addresses without needing to install celocli or anything complicated like that. It provides support for ReleaseGold contracts so you will be able to lock gold and delegate your voting directly through web UI without needing to install anything locally.... There are no fees or commission (there isn't a way for celoVOTE to charge users anyways), not only that, the service actually pays for gas cost for all voting transactions that it does too.
As for governance, celoVote will never vote on any governance issues on behalf of the users. For regular addresses, users can still vote for governance stuff by using the address directly, for ReleaseGold contracts, CeloVOTE will expose a way for users to proxy their governance votes through the service. But all of that will be manually authorized by user, nothing automatic ever happens with governance.
Ryabina.io and warfollowsme | Celomap.io created a new Telegram bot for Celo that tracks any event with any Celo account. The bot has numerous pre-configured notifications for regular users, validators, group or governance events. Check it out here: https://t.me/Celo_Ryabina_bot
Like what we do? Support our validator group by voting for it!