Lower blocktimes in accordance with Sync Speeds

We want Osmosis users to have a snappy UX. There are several strategies to achieve this:

  • Enabling block proposers to give users “pre-confirmations”, with frontends displaying conditional execution based on these
  • Building mempool-based trade execution systems, and having the frontend update once the user’s tx lands in the mempool
  • Getting users to make multiple independent txs at once
  • Lowering block times

These are all great ideas, but “hard” guarantees will always depend on block time, and there is low-hanging fruit here. So in this thread I would like to discuss what the blockers to lowering block time are, and strategies for getting there.

Suggested plan

  • (Already done) Make Osmosis state compatible releases with significant speedups, including IAVL v1.
  • Make an Osmosis (state compatible) minor release with IAVL v1, which also lowers the block time from 5s to 4s for validators who upgrade. (Expected observed block time becomes 4.5s due to not-all-of-the-valset-upgrading)
  • In Osmosis v24.x, lower the block time to 3 seconds, and monitor for 1 week.
  • If no issues are observed, make a (state compatible) minor release that lowers it to 2.5 seconds, and monitor for 2 weeks.
  • If no issues are observed, make a (state compatible) minor release that lowers it to 2 seconds, and continue monitoring. At this time assess if further lowering to 1.5 seconds feels prudent.

If at any point issues are observed (either the ones listed as barriers, or new unforeseen issues), we stop reducing the block time, and elongate it slightly in a new minor release.

Note that at today’s achieved sync speed on v23.0.4, the sync rate would be 6.6x faster than block production with 2s blocks. We expect sync rates to be significantly faster on v24.x.

Please comment on how you feel about this plan!

Barriers to lowering the block time

It’s been pretty clearly proven across Cosmos chains that CometBFT can achieve consensus with widespread, globally distributed validator sets at 1.5-2s blocks. So what’s stopping us from going to 1.5 second block times right now?

My overarching framing is: Osmosis nodes were not doing that well on node stability until sometime in December. Many long-standing problems have since been fixed, with fewer nodes randomly crashing and reduced peering issues. [^1]

As we lower block times, we should not degrade these from the more stable spot we just achieved. I perceive the barriers to lowering block time as:

  • What is the sync speed for nodes in the network
  • What is the disk growth rate
  • What is the latency to process a new block
    • Correlated: latency to get block data (e.g. events) streamed elsewhere
  • How well can existing infrastructure serve queries while blocks are processing

Next we detail suggested parameters for each of these, and then my perspective of where we were in ~December, where we are now, and where I think we will be in v24.

Suggested requirements to maintain for lowering block times

  • Sync rate is at minimum 5x faster than the block production rate.
  • Archive nodes grow by at most 35 GB / day
  • RPC nodes that maintain one week of state would grow by no more than 20 GB / day if pruning were disabled

We should also generally “monitor” that query serving infrastructure is able to handle the higher block production speeds. There are no theoretical issues or breakthroughs needed to handle this, but there could be APIs we discover that wait for a block to fully process, which instead need to switch to live “streaming” data out of a node while it’s processing a block.

Rationale for requirements

Sync speed

At the beginning of v22, sync speed on Osmosis had gotten quite slow – 0.8 blocks per second (BPS), and this was at ~5s block times. This meant that syncing was only 4x faster than block production, and it felt quite slow.

To contextualize this: a sync rate x times faster means that in one hour of syncing, I get (x - 1) hours closer to the head of the chain. A typical situation is recovering from a daily snapshot, so say an average case of 12 hours of sync time. At a 5x faster sync rate, it takes 3 hours to catch up to the head of the chain, which is still slow but is the functioning state of affairs today.
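A minimal sketch of this catch-up arithmetic (the 12-hour snapshot figure is the example from the text; the helper name is mine):

```go
package main

import "fmt"

// catchUpHours returns how long a node takes to reach the chain head,
// given how many hours behind it starts and how many times faster it
// syncs than the chain produces blocks. While the node replays x hours
// of chain per wall-clock hour, the chain produces 1 more hour, so the
// net gain is (x - 1) hours per hour.
func catchUpHours(hoursBehind, syncMultiple float64) float64 {
	return hoursBehind / (syncMultiple - 1)
}

func main() {
	// Recovering from a daily snapshot: ~12 hours behind on average.
	fmt.Printf("5x sync rate: %.1f hours to catch up\n", catchUpHours(12, 5))
	fmt.Printf("4x sync rate: %.1f hours to catch up\n", catchUpHours(12, 4))
}
```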

As we seek to lower block time, we must also remember that gas and resource bounding in Cosmos is not well designed, so it’s very prone to accidental or malicious patterns causing slower block processing. (E.g. the Bananaking attacks we saw in December caused massively increased disk load at cheap gas. We have since remedied that particular vector.) This suggests conservativeness is required around sync rates.

Disk Growth rate

From the disk growth rate benchmarks I have access to, I’m seeing that archive nodes are growing by ~30GB of new data per day, and standard full nodes by ~14GB per day on v23.0.0.

I think it’s actually ok for these numbers to be higher at faster block times, but we should definitely work towards making it all “useful data”.

On v23.0.4 with IAVL v1, we are seeing:

  • Archive node: ~24GB/day
  • Full node: ~18GB/day

In v24.x, we are removing many of the state writes per block and many of the events being written to disk. I am expecting both to be reduced 2x, but there are many non-linear overheads, and transaction load affects this greatly. Conservatively, I am expecting this to reduce the disk growth rate by a further 25%.
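As a rough sketch of that projection (the measured v23.0.4 rates and the conservative 25% figure come from the text; the helper is illustrative):

```go
package main

import "fmt"

// projected applies a fractional reduction to a daily disk-growth rate.
func projected(gbPerDay, reduction float64) float64 {
	return gbPerDay * (1 - reduction)
}

func main() {
	// v23.0.4 measured growth from above, and the conservative 25%
	// further reduction expected from the v24.x write/event removals.
	const reduction = 0.25
	fmt.Printf("archive: ~%.0f GB/day\n", projected(24, reduction))
	fmt.Printf("full:    ~%.1f GB/day\n", projected(18, reduction))
}
```

Both projections sit comfortably under the 35 GB/day (archive) and 20 GB/day (pruning-disabled RPC) bounds suggested earlier.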

Latency to process a new block

I am not tracking active benchmarks for this, so I don’t know precisely where we are now. (We do at least know consensus gossip time + CPU time right now just by looking at block times.)

However, most of the work done thus far on sync speed improvement has been in lowering block execution time. Furthermore, Osmosis / CometBFT block sync does not do parallel block processing, so sync speed gains imply we are genuinely reducing block processing latencies.

How well can existing infrastructure serve queries

This one isn’t as easy to track. Query serving infrastructure should not be blocked during block processing. As block times are gradually lowered, infrastructure providers can report if they are seeing increased amounts of query failures.

Where is osmosisd on these fronts?

There has been a lot of ongoing work on a number of performance improvements. On Osmosis mainnet using v23.x and standard peers from the “default address book”, we achieved a sync rate of 2.2 BPS on IAVL v0, and an IAVL v1 branch achieved 3.3 BPS. On empty blocks, we are approaching 5BPS right now.

v24’s branch has a number of block processing speed improvements that we currently anticipate will improve average block processing times by another 50%. (We do not have a good way to load test this with the mainnet cosmwasm workloads, so we cannot truly know, alas.) This would put hypothesized average sync rates in the 4-5 BPS range.

We do not yet have newer disk growth rate benchmarks, but the performance work is conjectured to notably lower the disk growth rate for the following reasons:

  • IAVL v1 uses LevelDB far better, so it will leave less “wasted” data on disk
  • We have removed Events from the SDK tx logs (as is done in SDK v0.50), which should ~halve the size of tx responses written to disk
  • in v24.x the number of state writes per block is dramatically reduced (from ~600 per-block writes today)
  • in v24.x the amount of events and state writes per swap is notably reduced

So the conjecture is that even though we will produce more blocks per second, the disk growth rates should not grow beyond acceptable bounds.

Risks

  • We could potentially see increased peering issues at lower block times that hamper nodes from being able to sync
    • I haven’t heard reports of this from other chains at low block times, but it could be the case
  • If Osmosis gets more filled blocks, it’s possible sync speed degrades significantly under more load
    • I think this is a real risk, but I also think our current sync speeds give us a lot of “slack” room for the system getting unexpectedly slower
  • There has not been enough testing of IAVL v1 performance with “live” migration from IAVL v0
    • Performance testing with the significantly improved IAVL v1 speeds depends on “IAVL v1”-only databases, e.g. from copying a snapshot
      • However state gets updated to the IAVL v1 format after every write, so most state will get the improvements.
    • New nodes should hopefully be syncing from pure V1 snapshots, or state syncing, which will fix this problem.
10 Likes

With this motivation the goal is to increase block frequency, not throughput, right? What’s your take on reducing block size/block gas limit as part of the process?

2 Likes

This is true! We can definitely lower gas limits right now, though I’m unclear on how well gas tracks our true load right now. We actually have a heavy amount of writes that happen every block (600 writes/block). Other than some spikes from a Quasar contract, most blocks have well under 500 writes/block from txs. In v24, the 600 writes/block will go down to under 50, and the writes per tx will go down significantly. I actually want to propose that we raise the gas for txs / workloads that feel under-priced. (Which I expect would achieve a similar goal, by helping improve the mispriced components of our load.)

What’s interesting is that our sync speed load right now seems to come from:

  • Cosmwasm contract calls (35% right now, projected 50% next release due to speedups elsewhere)
  • IAVL commit (22.5% right now, projected 10% next release due to lower write load)
  • Protorev (10% right now, projected 6% next release due to speedups)
  • IBC light clients (8% right now, projected 12% due to speedups elsewhere. Gets 2x lowered on the move to SDK v50)

Things that seem under-priced:

  • Block gossip cost (bandwidth per byte)
  • Flat state write cost (many writes can be caused by a contract)
  • Iterator Next cost (iterators are often misused in the Go code and in contracts, at underpriced sync costs)
  • Something about cosmwasm contract calls is causing heavy sync speed slowdowns. We currently don’t have gas + CPU time breakdowns by contract to know what precisely is the cause. (NOTE: This doesn’t mean theres
  • IBC client updates not charging for signatures

The Store gas numbers are here right now: cosmos-sdk/store/types/gas.go at 24e7758ca3848b12306313a34e8201f31c44648b · osmosis-labs/cosmos-sdk · GitHub

I suggest we double WriteCostFlat, and raise IterNextCostFlat 10x, to 300.
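A sketch of the proposed repricing. The struct mirrors only two fields of the SDK’s KVStore gas config, and the 2000 / 30 starting values are the upstream cosmos-sdk defaults; the Osmosis fork at the linked commit may differ:

```go
package main

import "fmt"

// GasConfig mirrors two of the KVStore gas parameters from cosmos-sdk's
// store/types/gas.go (upstream defaults shown below; the values in the
// Osmosis fork at the linked commit may differ).
type GasConfig struct {
	WriteCostFlat    uint64 // charged on every store write
	IterNextCostFlat uint64 // charged on every iterator Next()
}

// reprice applies the suggested change: double the flat write cost and
// raise the flat iterator-next cost 10x.
func reprice(c GasConfig) GasConfig {
	c.WriteCostFlat *= 2
	c.IterNextCostFlat *= 10
	return c
}

func main() {
	current := GasConfig{WriteCostFlat: 2000, IterNextCostFlat: 30}
	fmt.Println(reprice(current)) // {4000 300}
}
```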

I’m still looking into whether there’s a clean point at which we can add gas for client updates; I have talked to folks on IBC about this topic. I’m still unclear on what to do for cosmwasm contracts. I am curious what happens after state writing / reading is more correctly priced. It does preliminarily appear that there is high overhead to cross-contract or contract<>SDK calls, which would suggest adding a flat gas cost there.

4 Likes

I don’t have the benchmarks you can get with all this production data. Not strictly block frequency related but here are a few things we are working on that help improve the CosmWasm situation:

  • Cache settings: wasm.memory_cache_size should be set to a value that can hold the majority of Osmosis’ contracts, per the docs (1000-2000 MiB)
  • CosmWasm 1.4: you can now use db_next_key/db_next_value instead of db_next to avoid loading and copying an unused value or key into the Wasm sandbox.
  • CosmWasm 2.0: addr_validate is now implemented as 1 call into Go instead of 2
  • cw-storage-plus successor: embrace shorter DB keys in storage helpers. (Charging the same for key and value does not make sense to me; looking at how those DBs work, there should be a heavy penalty for long keys.)
  • cw-storage-plus successor: use a more performant and compact storage encoding in contracts
  • feature idea: allow configuring libwasmvm to do bech32 directly in Rust in addr_canonicalize/addr_humanize/addr_validate instead of calling into the chain/Go.

Hope those things help a bit

2 Likes

The interesting part is that the proposed approach does not move to 1.5s block times in one step, but in iterations. I think that is a very good move.

Regarding the need for a 3-4x speedup: how full are blocks at this moment? Are we hitting the (near) 100% mark on every block already? Are we already competing on gas costs (kind of an ETH-like scenario), where people compete to get into a block first and people paying less are ok with waiting a few blocks?

We are already at 6-10s user experiences. For me that is not so bad, also comparing to some of the other big ecosystems. I can imagine the desire to go faster though. Will it also have effects on the systems the validators and node-runners will need?

1 Like

Our blocks are moderately filled. Just took an average of the last 400 blocks, and saw we had an average of 21M gas. Max block gas limit right now is 240M. We’ve also lowered the gas load for many core operations users want to do (e.g. swapping) in v24, and the real-time costs of doing these.

I would expect the gas load to have some pretty steady usage due to arbitrage.

Post proto-danksharding, our gas costs are fairly comparable to ETH rollups, but I think our gas charges are too high. (There is a floor gas price on Osmosis that I think should be parameterized lower, tbh.)

We are already at 6-10s user experiences. For me that is not so bad, also comparing to some of the other big ecosystems.

I kinda feel like the 6-10s UX is pretty bad – Solana’s user stickiness, along with that of single-sequencer rollups, feels tied to sub-2s confirmations. Which is quite good.

This may have effects on what systems validators and node-runners need eventually, but I generally expect that all validators and node operators should be able to operate as they do today with the software updates in place right now.

1 Like

Thanks simon for the cosmwasm suggestions!

I’ll check back on what the wasm cache size is; I think this is in place.
The suggestions you made sound amazing! Especially the address logic staying in cosmwasm, and the gas changes. I had not thought about incentivizing shorter DB keys.

I hope we can get a flow to understand and improve the performance bottlenecks together soon!

1 Like

For me the base question is whether people use Solana due to short blocktimes or for different reasons.

Otherwise SEI should have gone parabolic already with its sub-1s block times.

So is the issue we are trying to solve block times and how the chain feels for users?
Or should we get our improvements in ecosystem usage and stickiness from other aspects?

3 Likes

omega bullish on lower block times

2 Likes

Are there plans to implement a custom db backend or native support for pebbledb?

Tessellated is supportive of lowering block times in a gradual fashion. Given the general speedups we’ve seen in recent Osmosis versions, the network seems likely to be able to sustain faster block times.

1 Like

Update here. A large number of validators upgraded over the weekend to the 4s block time config. We are currently running at 4.35 second blocks, and thus far I’ve not seen any issues reported.

This puts us at a 13% average block time reduction relative to the prior 5 seconds. The target of 3 second blocks at the v24 upgrade is a further 31% speedup relative to today!

3 Likes

I think short block times / scalability memes are part of it. Ultimately it’s having a great decentralized product and ecosystem that people want to use, but scalability and block time are key variables in this. The claim of this post is that the current software should be able to support much lower block times without meaningful losses of decentralization / notably increased operator costs.

2 Likes

An Osmosis v24 for 2024 !! :ok_hand:

2 Likes

We are now at ~3.1 second blocks as of v24 yesterday!

I also realized that a number of validators have misconfigured nodes that lead to them proposing blocks with 0 to 1 txs. Folks have been communicating that there is likely a mempool misconfiguration, so hopefully more of the blocks get fully utilized as well!

Furthermore, per @czarcas7ic we are currently seeing average block sync speeds of 6.97 blocks per second. This means that by the block sync rate metric in the above guidelines, we’re good to keep reducing block times all the way to ~0.8s.
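Roughly, the 5x guideline translates a measured sync rate into a floor on block time (ignoring overheads; the helper name is mine):

```go
package main

import "fmt"

// minBlockTime returns the lowest block time (in seconds) that keeps the
// measured block-sync rate at least `multiple` times faster than block
// production, per the 5x guideline above.
func minBlockTime(syncBPS, multiple float64) float64 {
	// Production rate is 1/blockTime, so we need
	// syncBPS >= multiple * (1 / blockTime), i.e.
	// blockTime >= multiple / syncBPS.
	return multiple / syncBPS
}

func main() {
	// Measured average block sync speed from the post: 6.97 BPS.
	fmt.Printf("%.2fs\n", minBlockTime(6.97, 5)) // ≈ the ~0.8s figure above
}
```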

So as we lower to 2s / 1.5s, we have to watch out for other issues popping up.

2 Likes

I’m not sure how you can make this claim, tbh. It has already been observed, on other networks like Injective, that physical network latency does not allow meaningful decentralization at < 1s block times.

Every Injective validator with acceptable uptime runs from the same 2-3 cloud providers in 5-6 data centers in Europe. Trying to run a validator from anywhere else around the globe has proven to be impossible.

The dYdX network has followed the exact same trajectory, to the point that they now simply force operators to run from Japan.

What’s even the point of pretending to be “decentralized” when all we’re doing is giving money to the same cloud operators? Might as well just contract GCP or AWS to run all your nodes.

I think you misunderstand the goal? The goal is to lower the block time under the constraint of no loss of decentralization amongst the validator set.

Block time in Cosmos is (block execution time + block proposal time + block gossip time + two rounds of 2/3 vote gossip) + “consensus sleep”.

We have abundant mainnet data across networks on the parenthetical: for geo-distributed validator sets, it takes around 0.8-1.1s. Thus far, manifesting the block time decrease has meant lowering that consensus sleep time.

I think we can’t go below 1.5s block latency without CometBFT software improvements or sacrificing geo-distribution. Sacrificing geo-distribution should IMO be deemed unacceptable. With Comet improvements, I don’t see why it can’t be more like 500-800ms, but we have to see as such improvements come in. Furthermore, block throughput can be raised to ~1 global internet round trip (~300ms) via pipelined consensus or DAG BFT consensus.
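Under that decomposition, lowering block time means shrinking the consensus sleep (CometBFT's timeout_commit knob) while the fixed path stays ~0.8-1.1s. A toy calculation, with an assumed 1.0s fixed path (the midpoint of the quoted range):

```go
package main

import "fmt"

// requiredSleep returns the consensus sleep needed to hit a target block
// time, given the fixed consensus path (execution + proposal + block
// gossip + two rounds of 2/3 vote gossip), all in seconds. The 0.8-1.1s
// fixed-path range and the targets are from the discussion above; the
// 1.0s value used below is an illustrative midpoint assumption.
func requiredSleep(targetBlockTime, fixedPath float64) float64 {
	return targetBlockTime - fixedPath
}

func main() {
	fixed := 1.0 // assumed fixed consensus path, seconds
	for _, target := range []float64{3.0, 2.0, 1.5} {
		fmt.Printf("target %.1fs -> sleep %.1fs\n", target, requiredSleep(target, fixed))
	}
	// As the target approaches the fixed path, the required sleep goes
	// to 0 - matching the claim that sub-1.5s needs CometBFT software
	// improvements or a less geo-distributed valset.
}
```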

I don’t think we’ve lost any decentralization in osmosis by the move to 3s.

I also think that once we’re at 1.5s blocks, the focus is better spent elsewhere, so as to explicitly not sacrifice decentralization.

Our goal should explicitly not be to end up like dYdX or Injective. Geo-distribution of the validator set is a property we shouldn’t compromise as we do this.

3 Likes

Max, if you are interested in using Pebble, I know that Tuan has built out support for it with IAVL v1. I haven’t done benchmarks comparing it with goleveldb yet, though.

We also have pebble snapshots for osmosis:

1 Like

I really like how we’re approaching this. Indeed, we can be both quite fast and global. If we use these as our constraints, we will arrive at an ideal solution.

1 Like

3s is fine. My main concern is trying to achieve < 1s block times. I can’t see the benefit for the networks that are already doing it, and can see only problems if Osmosis tries to go for the same.

If your goal is 1.5s and you’re willing to listen to validators in case it goes sideways, then I have no concerns :slight_smile:

1 Like