We want Osmosis users to have a snappy UX. There are several strategies to achieve this:
- Making block proposers able to give users “pre-confirmations”, and having frontends display conditional execution based on them
- Making mempool-based trade execution systems, and having the frontend update once the user’s tx lands in the mempool
- Getting users to make multiple independent txs at once
- Lowering block times
These are all great ideas, but “hard” guarantees will always depend on block time, and there is low-hanging fruit here. So in this thread I would like to discuss what the blockers to lowering block time are, and strategies for getting there.
Suggested plan
- (Already done) Make Osmosis state compatible releases with significant speedups, including IAVL v1.
- Make an Osmosis (state compatible) minor release with IAVL v1, which also lowers the block time from 5s to 4s for validators who upgrade. (Expected observed block time becomes ~4.5s since not all of the valset upgrades at once; see the sketch after this list.)
- In osmosis v24.x, lower the block time to 3 seconds, and monitor for 1 week.
- If no issues are observed, make a (state compatible) minor release that lowers it to 2.5 seconds, and monitor for 2 weeks.
- If no issues are observed, make a (state compatible) minor release that lowers it to 2 seconds, and continue monitoring. At that point, assess whether further lowering to 1.5 seconds feels prudent.
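To make the rollout math above concrete, here is a minimal sketch in Go (illustrative only; the 50% upgraded share, the 3s timeout_commit value, and the proposer-driven toy model are my assumptions, not the actual Osmosis patch) of the per-node knob involved, and of the expected observed block time while only part of the valset has upgraded:

```go
package main

import (
	"fmt"
	"time"

	cmtcfg "github.com/cometbft/cometbft/config"
)

func main() {
	// timeout_commit is the per-node CometBFT knob behind "lowering the block
	// time": a node waits this long after committing a block before moving to
	// the next height, so shipping a lower default in a release only speeds up
	// validators who actually upgrade.
	cfg := cmtcfg.DefaultConfig()
	cfg.Consensus.TimeoutCommit = 3 * time.Second // hypothetical value; the real
	// target block time also includes proposal, voting, and execution latency.
	fmt.Printf("timeout_commit: %s\n", cfg.Consensus.TimeoutCommit)

	// Toy estimate of the "expected observed block time" during a partial
	// rollout: assume a height's interval is set by its proposer's config and
	// proposers are chosen in proportion to voting power (a simplification).
	upgradedPower := 0.5 // assumed fraction of voting power on the new release
	oldBlockTime, newBlockTime := 5.0, 4.0
	expected := upgradedPower*newBlockTime + (1-upgradedPower)*oldBlockTime
	fmt.Printf("expected observed block time: %.1fs\n", expected) // 4.5s at 50%
}
```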
If at any point issues are observed (either the ones listed as barriers, or new unforeseen issues), we stop reducing the block time, and elongate it slightly in a new minor release.
Note that at the sync speed achieved today on v23.0.4, our sync rate would be 6.6x faster than block production with 2s blocks (roughly 3.3 blocks per second synced vs. 0.5 produced). We expect sync rates to be significantly faster on v24.x.
Please comment on how you feel about this plan!
Barriers to lowering the block time
It’s been pretty clearly proven across Cosmos chains that CometBFT can achieve consensus with widespread, globally distributed validator sets at 1.5-2s blocks. So what’s stopping us from going to 1.5 second block times right now?
My overarching framing is: Osmosis nodes were not doing that well on stability until sometime in December. Many long-standing problems have since been fixed, with fewer nodes randomly crashing and reduced peering issues. [^1]
As we lower block times, we should not degrade node stability from the more stable spot we just achieved. I perceive the barriers to lowering block time as:
- What is the sync speed for nodes in the network
- What is the disk growth rate
- What is the latency to process a new block
- Correlated: latency to get block data (e.g. events) streamed elsewhere
- How well can existing infrastructure serve queries while blocks are processing
Below, I detail suggested requirements for each of these, and then give my perspective on where we were in ~December, where we are now, and where I think we will be in v24.
Suggested requirements to maintain for lowering block times
- Sync speed is at minimum 5x faster than the block production rate.
- Archive nodes grow by at most 35 GB / day
- RPC nodes that maintain one week of state grow by no more than 20 GB / day (measured as if pruning were disabled)
We should also generally “monitor” that query serving infrastructure is able to handle the higher block production speeds. There are no theoretical issues or breakthroughs needed to handle this, but we may discover APIs that wait for a block to fully process, which instead need to switch to live “streaming” of data out of a node while it is processing a block.
Rationale for requirements
Sync speed
At the beginning of v22, the sync speed on Osmosis had gotten quite slow – 0.8 blocks per second (BPS) – and this was at ~5s block times. This meant that syncing was only 4x faster than block production, and it felt quite slow.
To contextualize this, a sync rate that is x times faster than block production means that in one hour of syncing, I get (x - 1) hours closer to the head of the chain. A typical situation is recovering from a daily snapshot, so let’s say the average case is 12 hours of chain to catch up on. At a 5x faster sync rate, it will take 3 hours to catch up to the head of the chain, which is still slow, but is the functioning state of affairs today.
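For the catch-up arithmetic, a one-line derivation (with T the backlog in hours and x the sync-to-production speed ratio) makes it easy to recompute for other snapshot ages:

```latex
% After t hours of syncing at x times the production rate, we have synced
% x*t hours of chain, while the chain has grown to T + t hours:
\[
  x\,t \;=\; T + t \quad\Longrightarrow\quad t \;=\; \frac{T}{x-1},
  \qquad T = 12\,\text{h},\; x = 5 \;\Rightarrow\; t = 3\,\text{h}
  \quad (x = 4 \Rightarrow t = 4\,\text{h}).
\]
```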
As we seek to lower block time, we must also remember that gas and resource bounding in Cosmos is not well designed, so it’s very prone to accidental or malicious patterns causing slower block processing. (E.g. the Bananaking attacks we saw in December caused massively increased disk load at cheap gas. We have since remedied that particular vector.) This suggests conservatism is required around sync rates.
Disk growth rate
From disk growth rate benchmarks I have access to, I’m seeing that archive nodes are getting ~30GB of new data storage per day, and standard full nodes are getting ~14GB per day on v23.0.0.
I think it’s actually ok for these numbers to be higher at faster block times, but we should definitely work towards making it all “useful data”.
On v23.0.4 with IAVL v1, we are seeing:
- Archive node: ~24GB/day
- Full node: ~18GB/day
In v24.x, we are removing many of the state writes per block and many of the events being written to disk. I am expecting both to be reduced by 2x, but there are many non-linear overheads, and transaction load affects this greatly. Conservatively, I am expecting this to reduce the disk growth rate by a further 25%.
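As a back-of-the-envelope framing of my own (not a benchmark), daily disk growth is blocks-per-day times average bytes written per block, which shows why per-block write reductions are what buy headroom for faster blocks:

```latex
% Daily growth G for block time t_block and average bytes written per block b:
\[
  G \;=\; \frac{86{,}400\,\text{s/day}}{t_{\text{block}}}\; b,
  \qquad
  \frac{G_{2\text{s}}}{G_{5\text{s}}} \;=\; \frac{5}{2}\cdot\frac{b_{2\text{s}}}{b_{5\text{s}}}.
\]
```

Transaction data itself just spreads across more blocks, so it is the fixed per-block overhead (headers, per-block state writes, per-block events) that gets multiplied 2.5x when going from 5s to 2s blocks, which is a big part of what the v24.x reductions target.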
Latency to process a new block
I am not tracking active benchmarks for this to know where we are now. (At minimum, we know consensus gossip time + CPU time right now just by looking at block times.)
However, most of the work done thus far on sync speed improvement has been in lowering block execution time, and Osmosis / CometBFT block sync does not do parallel block processing, so sync speed gains directly reflect lower per-block latency. Thus we conclude that we are genuinely reducing block processing latencies.
How well can existing infrastructure serve queries
This one isn’t as easy to track. Query serving infrastructure should not be blocked during block processing. As block times are gradually lowered, infrastructure providers can report whether they are seeing increased query failure rates.
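As one sketch of the kind of reporting that helps here (not something osmosisd ships; the node address, port, and logging interval are placeholder assumptions), an RPC operator could front their node with a tiny reverse proxy that counts failed queries, and watch whether the failure rate moves after each block-time reduction:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
	"time"
)

// statusRecorder captures the status code written by the downstream handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

var totalQueries, failedQueries atomic.Int64

// countFailures wraps a handler and tallies 5xx responses, so operators can
// compare failure rates before and after each block-time reduction.
func countFailures(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, req)
		totalQueries.Add(1)
		if rec.status >= 500 {
			failedQueries.Add(1)
		}
	})
}

func main() {
	// Hypothetical local node RPC; point this at your own endpoint.
	node, _ := url.Parse("http://localhost:26657")
	proxy := httputil.NewSingleHostReverseProxy(node)

	// Periodically log the counters; in practice these would feed metrics.
	go func() {
		for range time.Tick(time.Minute) {
			log.Printf("queries: %d, failed: %d", totalQueries.Load(), failedQueries.Load())
		}
	}()

	log.Fatal(http.ListenAndServe(":8080", countFailures(proxy)))
}
```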
Where is osmosisd on these fronts?
There has been a lot of ongoing work on a number of performance improvements. On Osmosis mainnet using v23.x and standard peers from the “default address book”, we achieved a sync rate of 2.2 BPS on IAVL v0, and an IAVL v1 branch achieved 3.3 BPS. On empty blocks, we are approaching 5 BPS right now.
v24’s branch has a number of block processing speed improvements that we currently anticipate will improve average block processing times by another 50%. (Alas, we do not have a good way to load test this with the mainnet CosmWasm workloads to truly know.) This would put hypothesized average sync rates in the 4-5 BPS range.
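If anyone wants to reproduce these BPS figures on their own node, a minimal sketch (the RPC address and 60-second window are placeholder assumptions) is to sample the node’s /status height twice while it is catching up and divide by the elapsed time:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// statusResp models just the fields we need from CometBFT's /status RPC.
type statusResp struct {
	Result struct {
		SyncInfo struct {
			LatestBlockHeight string `json:"latest_block_height"`
			CatchingUp        bool   `json:"catching_up"`
		} `json:"sync_info"`
	} `json:"result"`
}

func height(rpc string) (int64, error) {
	resp, err := http.Get(rpc + "/status")
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	var s statusResp
	if err := json.NewDecoder(resp.Body).Decode(&s); err != nil {
		return 0, err
	}
	return strconv.ParseInt(s.Result.SyncInfo.LatestBlockHeight, 10, 64)
}

func main() {
	const rpc = "http://localhost:26657" // assumed local syncing node
	const window = 60 * time.Second

	start, err := height(rpc)
	if err != nil {
		panic(err)
	}
	time.Sleep(window)
	end, err := height(rpc)
	if err != nil {
		panic(err)
	}
	bps := float64(end-start) / window.Seconds()
	// Compare against the production rate (0.2 BPS at 5s blocks, 0.5 BPS at 2s)
	// to check the ">= 5x faster than block production" requirement.
	fmt.Printf("sync rate: %.2f BPS (%.1fx of 2s-block production)\n", bps, bps/0.5)
}
```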
We do not yet have newer disk growth rate benchmarks, but the performance work is conjectured to notably lower the disk growth rate for the following reasons:
- IAVL v1 uses LevelDB far better, so it will lead to less “wasted” data on disk
- We have removed Events from the SDK tx logs (as is done in SDK v0.50), which should ~halve the size of tx responses written to disk
- In v24.x, the number of state writes per block is dramatically reduced (600 writes per block)
- In v24.x, the number of events and state writes per swap is notably reduced
So the conjecture is that even though we will produce more blocks per second, the disk growth rates should not grow beyond acceptable bounds.
Risks
- We could potentially see increased peering issues at lower block times that hamper nodes from being able to sync
  - I haven’t heard reports of this from other chains at low block times, but it could be the case
- If Osmosis gets more filled blocks, it’s possible sync speed degrades significantly under more load
  - I think this is a real risk, but I also think our current sync speeds give us a lot of “slack” room for the system getting unexpectedly slower
- There has not been enough testing of IAVL v1 performance with “live” migration from IAVL v0
  - The performance tests showing significantly improved IAVL v1 speeds depend on “IAVL v1”-only databases, e.g. from copying a snapshot
  - However, state gets updated to the IAVL v1 format with every write, so most state will get the improvements.
  - New nodes should hopefully be syncing from pure V1 snapshots, or state syncing, which will fix this problem.