How Suward routes traffic
A look under the hood at how a single API key reaches 29 chains, the failover model, what we cache at the edge, and where the latency budget actually goes.
The marketing line is "one API key, 29 chains, no provider sprawl." That's true. This post is for the engineers who want to know how it actually works under the hood — what the request path looks like, where the latency budget gets spent, and what happens when a provider is down.
When you POST https://ethereum.suward.com/<YOUR_KEY> with a JSON-RPC body, the request walks through four layers:
End-to-end, a warm-path eth_blockNumber from Frankfurt to a German upstream is about 15 ms. A warm-path eth_getLogs over a 5k-block range is more like 80–300 ms depending on result size, with most of the budget at the upstream.
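For reference, here is a minimal sketch of the kind of request described above, using only Python's standard library. The helper name `build_request` and the `YOUR_KEY` placeholder are illustrative, not part of any official client; the request is built but not sent, so the snippet runs offline.

```python
import json
import urllib.request


def build_request(api_key: str, method: str, params: list, request_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC 2.0 POST for the Ethereum endpoint."""
    body = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    return urllib.request.Request(
        url=f"https://ethereum.suward.com/{api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("YOUR_KEY", "eth_blockNumber", [])
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=10).read()
```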
The router is the differentiating piece. Per chain, we maintain a pool of upstream providers — typically 2 to 5. For every incoming call, the router picks one based on:
- trace_* and debug_* need a node with the right tracer compiled in; some upstreams strip them.
- If an upstream has been serving eth_getLogs 30% slower than the rest of the pool for the last 60 seconds, the router weights it down.
- … gateway_timeout all count differently. A provider that recently returned a few invalid-block-tag errors gets weighted up (because that was the client's fault, not the upstream's), while one that returned 502s gets weighted down.

The decision is per-call and stateless on the request hot path. Picking takes microseconds.
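To make the selection criteria concrete, here is a rough sketch of a weighted pick. It is not the production router: the `Upstream` fields, the `weight` formula, and the halving per 5xx are all assumptions chosen for illustration. Client-fault errors are simply not counted against an upstream here.

```python
import random
from dataclasses import dataclass


@dataclass
class Upstream:
    name: str
    supports_trace: bool   # can serve trace_* / debug_* (assumed flag)
    latency_ratio: float   # recent latency / pool average; 1.3 means 30% slower
    recent_5xx: int        # upstream-side failures in the recent window


def weight(u: Upstream) -> float:
    """Hypothetical scoring: slower and flakier upstreams get less traffic."""
    w = 1.0 / max(u.latency_ratio, 0.1)
    w *= 0.5 ** u.recent_5xx  # assumed penalty: halve the weight per recent 5xx
    return w


def pick(pool: list[Upstream], method: str, rng: random.Random) -> Upstream:
    """Filter by method support, then choose at random in proportion to weight."""
    needs_trace = method.startswith(("trace_", "debug_"))
    candidates = [u for u in pool if u.supports_trace or not needs_trace]
    if not candidates:
        raise LookupError(f"no upstream supports {method}")
    return rng.choices(candidates, weights=[weight(u) for u in candidates], k=1)[0]


pool = [
    Upstream("a", supports_trace=False, latency_ratio=1.0, recent_5xx=0),
    Upstream("b", supports_trace=True, latency_ratio=1.3, recent_5xx=1),
]
print(pick(pool, "trace_block", random.Random(0)).name)  # only "b" qualifies
```

Nothing in `pick` touches shared state, which matches the "stateless on the hot path" property: the weights would be maintained elsewhere and only read here.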
If the chosen upstream returns a retryable error (5xx, timeout, network reset), the router retries against the next-best candidate. The whole retry budget for a single user request is 2 attempts and 8 seconds wall clock — after that, we surface the failure to you with a structured { error: { code, message, attempts } } response so your client can decide what to do.
We deliberately don't retry forever. Indefinite retries cascade into upstream overload during a real outage. Two attempts catch transient flakes; anything beyond that is a real failure, and you want to know about it.
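The retry budget can be sketched roughly like this. The function name `call_with_budget`, the `RetryableError` class, and the `-32000` error code are assumptions for illustration; only the 2-attempt / 8-second limits and the `{ error: { code, message, attempts } }` shape come from the description above.

```python
import time
from typing import Callable, Sequence


class RetryableError(Exception):
    """Stand-in for a 5xx, timeout, or network reset from an upstream."""


def call_with_budget(
    candidates: Sequence[Callable[[], dict]],
    max_attempts: int = 2,
    budget_seconds: float = 8.0,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Try candidates best-first until one succeeds or the budget runs out."""
    deadline = clock() + budget_seconds
    attempts = 0
    last_message = "no upstream available"
    for call in candidates[:max_attempts]:
        if clock() >= deadline:
            break  # wall-clock budget exhausted before the next attempt
        attempts += 1
        try:
            return call()
        except RetryableError as exc:
            last_message = str(exc)
    # Assumed error code; the real value is not specified in this post.
    return {"error": {"code": -32000, "message": last_message, "attempts": attempts}}


def bad() -> dict:
    raise RetryableError("502 from upstream")


def ok() -> dict:
    return {"result": "0x1"}


print(call_with_budget([bad, ok]))
print(call_with_budget([bad, bad, ok]))
```

In the second call the third candidate is never tried, because the attempt cap is hit first. A fuller version would also pass the remaining time to each attempt as its timeout.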
We cache aggressively at the edge for read methods that are safe to cache. The big wins:
- eth_chainId: TTL 24 hours. The chain ID doesn't change.
- eth_blockNumber: TTL 1 second. Aggressive but bounded; you'll never see a value more than a block behind.
- eth_getBlockByNumber(finalized=true): TTL 1 hour. Once finalized, the value is immutable.
- eth_getCode: TTL 24 hours per address. Contract code doesn't change unless redeployed.

About 60% of typical read traffic hits the cache. That's also why your eth_blockNumber heartbeat may return the same value across two consecutive calls a few ms apart. That's not a bug; it's edge caching saving you a CU.
We don't cache state-dependent methods (eth_call, eth_getBalance, eth_getLogs with non-finalized blocks, eth_getTransactionReceipt for unfinalized txs). The TTL would have to be zero to be correct, and zero-TTL caching is just overhead.
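A minimal sketch of the per-method TTL idea, assuming an in-memory dictionary as the store. The `EdgeCache` class, `ttl_for` helper, and `block_is_finalized` flag are hypothetical names; the TTL values are the ones listed above, and methods without a TTL bypass the cache entirely.

```python
import time
from typing import Any, Callable, Optional

# TTLs in seconds, taken from the list above.
TTL_BY_METHOD = {
    "eth_chainId": 24 * 3600,
    "eth_blockNumber": 1,
    "eth_getCode": 24 * 3600,
}


def ttl_for(method: str, block_is_finalized: bool = False) -> Optional[int]:
    """Return the TTL for a call, or None if it must not be cached."""
    if method == "eth_getBlockByNumber":
        return 3600 if block_is_finalized else None
    return TTL_BY_METHOD.get(method)


class EdgeCache:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}  # key -> (expiry, value)

    def get_or_fetch(self, key: str, ttl: Optional[int], fetch: Callable[[], Any]) -> Any:
        if ttl is None:
            return fetch()  # state-dependent method: always go upstream
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = fetch()
        self._entries[key] = (now + ttl, value)
        return value


cache = EdgeCache()
print(cache.get_or_fetch("eth_chainId", ttl_for("eth_chainId"), lambda: "0x1"))
```

The cache key would need to include the parameters as well as the method (for example the address for eth_getCode), which this sketch leaves to the caller.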
The router's health check is two-tier:
- … eth_chainId every 5 seconds. Latency + success ratio feeds the EWMA.

If an upstream's success ratio drops below 95% over a 60-second window, it's marked degraded and gets only 10% of new traffic until it recovers. If it drops below 50%, it's marked down and gets 0%. Recovery requires three consecutive successful probes plus a 30-second cooldown.
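The thresholds above map onto a small state classifier. The sketch below is illustrative only: the names, the EWMA smoothing factor `alpha`, and the choice to measure the cooldown from the moment the upstream was marked unhealthy are assumptions, not details from the router itself.

```python
from dataclasses import dataclass


def ewma(previous: float, sample: float, alpha: float = 0.2) -> float:
    """Exponentially weighted moving average; alpha is an assumed smoothing factor."""
    return alpha * sample + (1 - alpha) * previous


def classify(success_ratio: float) -> str:
    """Map a 60-second success ratio to a health state using the thresholds above."""
    if success_ratio < 0.50:
        return "down"
    if success_ratio < 0.95:
        return "degraded"
    return "healthy"


# Share of new traffic per state; the 1.0 for healthy is an assumed baseline.
TRAFFIC_SHARE = {"healthy": 1.0, "degraded": 0.10, "down": 0.0}


@dataclass
class Recovery:
    marked_at: float          # when the upstream was marked degraded or down
    consecutive_ok: int = 0   # successful probes in a row

    def record_probe(self, ok: bool) -> None:
        self.consecutive_ok = self.consecutive_ok + 1 if ok else 0

    def recovered(self, now: float) -> bool:
        # Assumption: the 30-second cooldown counts from marked_at.
        return self.consecutive_ok >= 3 and now - self.marked_at >= 30.0


state = classify(0.93)
print(state, TRAFFIC_SHARE[state])
```

With probes every 5 seconds, three consecutive successes take at least 15 seconds, so under this reading the 30-second cooldown is the binding constraint on recovery.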
In a real outage (single provider, e.g. Infura goes down for Ethereum), the failover is invisible to you — traffic shifts to the remaining pool members within a single rate-limit window. The thing you'll notice is a small (5–15%) latency bump until the surviving providers warm up.
For a typical EU → EU call:
| Stage | p50 latency | p99 latency |
|---|---|---|
| Edge (TLS, route) | 2 ms | 8 ms |
| Auth + KV lookup | 1 ms | 4 ms |
| Router decision | <1 ms | <1 ms |
| Upstream (chain) | 8 ms | 60 ms |
| Response + egress | 2 ms | 20 ms |
| Total | 13 ms | 92 ms |
The upstream node always dominates the p99 — which is why we maintain redundancy at that layer and not at the others.
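As a quick check of the table, the arithmetic can be reproduced in a few lines. The router's "<1 ms" is counted as 0 here, which is an assumption made so that the column sums match the Total row.

```python
# (stage, p50 ms, p99 ms); the "<1 ms" router row is counted as 0.
BUDGET = [
    ("Edge (TLS, route)", 2, 8),
    ("Auth + KV lookup", 1, 4),
    ("Router decision", 0, 0),
    ("Upstream (chain)", 8, 60),
    ("Response + egress", 2, 20),
]

total_p50 = sum(p50 for _, p50, _ in BUDGET)
total_p99 = sum(p99 for _, _, p99 in BUDGET)
slowest_p99 = max(BUDGET, key=lambda row: row[2])

print(total_p50, total_p99)                      # 13 92
print(slowest_p99[0])                            # Upstream (chain)
print(round(100 * slowest_p99[2] / total_p99))   # 65 (percent of the p99 column)
```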
A few things on the near-term roadmap that we're already building:
- … eu-central-1 doesn't ripple to US traffic.
- eth_subscribe is currently pinned to a single upstream per connection. We're working on transparent reconnection on the server side so a provider blip doesn't drop your sub.
- … flashbots_*) by enabling them on your account.

Questions on the architecture? Reply to this post on Twitter or hop in our Telegram. We post incident reports there too.