Context: Why We Reached for Workers + KV in the First Place

We run a public API with two tiers, free and pro, and needed per-API-key limits enforced at the edge, before requests ever touched origin. Cloudflare Workers seemed like the obvious place to do it: cheap, globally distributed, with cold starts short enough to ignore. For state, Workers KV was right there, already wired into our account, and the docs made it sound like exactly what we needed.
The original design was simple, maybe too simple: a Worker reads a counter from KV keyed by API key, increments it on each request, and rejects with a 429 once the count crosses a threshold. It passed every test we threw at it locally. Requests came in, counters went up, limits kicked in at the right number. We shipped it and moved on.
The assumption we never questioned — the one that eventually cost us a very awkward Monday morning — was that “KV is a key-value store, it’ll behave like Redis.” It doesn’t. Cloudflare Workers rate limiting built directly on KV has a completely different consistency and cost model than an in-memory store, and we found out the hard way, in production, on paying customers’ accounts. This post is the retrospective on what we got wrong and what we run now instead.
Mistake 1: We Treated KV as Strongly Consistent
Cloudflare is upfront about this in their KV consistency documentation, but it’s easy to skim past when you’re focused on the happy path: a KV write is usually visible right away in the location that made it, but can take 60 seconds or more to show up elsewhere on Cloudflare’s edge network, because other locations keep serving their cached copy until it expires. A write in one colo isn’t instantly visible in another. If a user’s requests get routed through two or three different PoPs during a traffic spike, which happens constantly with mobile clients and CDN-fronted apps, each PoP sees its own stale view of the counter.
In practice, this meant pro users capped at 100 requests per minute were bursting to 300-400 req/min whenever traffic happened to spread across multiple edge locations. Nothing errored. No exception, no failed request, no alert fired. The limiter was doing exactly what we told it to do — it just had a partial, delayed view of reality that let more traffic through than intended.
We didn’t catch this through monitoring. We caught it because our origin infrastructure bill spiked and someone went digging into request volume per customer. That’s the part that still bothers me: a silent under-enforcement bug that looks like a billing anomaly is a lot harder to trace back to “the rate limiter is eventually consistent” than an outright failure would have been. If your rate limiter can fail in a way that produces zero errors, you need a metric specifically watching for that — not just error rate.
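One way to watch for this kind of silent under-enforcement is to compare what origin actually served per key against the configured limit. The following is a minimal sketch, assuming you can export per-key request counts for a window from origin logs. The function name, the input shape, and the 1.2 tolerance are illustrative assumptions, not something taken from our production setup.

```javascript
// Hypothetical helper: flags keys whose served traffic exceeded their limit.
// `served` maps a key hash to the requests origin handled in one window.
// `limits` maps the same key hash to the configured per-window limit.
function findOvershoots(served, limits, tolerance = 1.2) {
  const overshoots = [];
  for (const [keyHash, count] of Object.entries(served)) {
    const limit = limits[keyHash];
    if (limit === undefined) continue; // unknown key: nothing to compare against
    if (count > limit * tolerance) {
      overshoots.push({ keyHash, count, limit, ratio: count / limit });
    }
  }
  // Worst offenders first, so an alert shows the biggest gap at the top
  return overshoots.sort((a, b) => b.ratio - a.ratio);
}
```

A non-empty result is the signal an error-rate dashboard would never give you: requests that succeeded but should not have.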
Mistake 2: We Ignored KV’s Write Limits and Cost Model
The second problem was baked into the same code. KV limits writes to one per second for any single key. We assumed, Redis brain again, that extra writes would queue or throttle gracefully. They don’t. Under sub-second bursts from the same user, some writes simply didn’t take effect as expected, and we saw no error on our side. Cloudflare’s documentation also notes that writes to the same key within one second can be rejected with a 429 error, so a put() needs explicit error handling either way. You have to know to expect this and design around it.
At around 50,000 requests per day from a single top-tier customer, we were generating roughly 50,000 KV writes per day just for that one key. Multiply that across a few thousand active API keys and the write volume adds up fast — Cloudflare’s KV pricing is per-operation, and at scale that’s real money, not a rounding error. Worse, our free-tier testing environment has a 1,000 writes/day cap, and one customer’s load test blew through that limit in under an hour, which is how we first noticed something was structurally wrong before it hit prod at full scale.
There was a second, subtler bug hiding in the same write path. KV.get() returns null for a key that’s never been written. Our code did parseInt(raw) without checking for null first, so on a brand-new API key the comparison against the limit was comparing against NaN. Every comparison involving NaN evaluates to false — including NaN >= limit. So new users had effectively no rate limit at all until their first successful write landed, sometimes for hours if propagation was slow. This is the naive version we shipped, showing both mistakes in one place:
// worker.js: the "Mistake 1 & 2" version, a naive KV-based per-user rate limiter
// wrangler.toml requires: kv_namespaces = [{ binding = "RATELIMIT_KV", id = "..." }]
export default {
  async fetch(request, env) {
    const apiKey = request.headers.get("X-API-Key");
    if (!apiKey) {
      return new Response("Missing API key", { status: 401 });
    }
    // MISTAKE: raw API key used directly as KV key name (enumeration risk)
    const kvKey = `ratelimit:${apiKey}`;
    const windowSeconds = 60;
    const limit = 100;
    // MISTAKE: no handling for null on first-ever request
    const raw = await env.RATELIMIT_KV.get(kvKey);
    const count = parseInt(raw); // NaN if raw is null, so the comparison below silently passes
    if (count >= limit) {
      return new Response("Rate limit exceeded", {
        status: 429,
        // MISTAKE: no Retry-After header, clients retry immediately
      });
    }
    // MISTAKE: read-then-write is not atomic; concurrent requests race here
    const newCount = (isNaN(count) ? 0 : count) + 1;
    // MISTAKE: assumes this write is instantly visible everywhere (it isn't)
    await env.RATELIMIT_KV.put(kvKey, String(newCount), {
      expirationTtl: windowSeconds,
    });
    // Forward to origin
    const response = await fetch(request);
    const newHeaders = new Headers(response.headers);
    newHeaders.set("X-RateLimit-Remaining", String(limit - newCount));
    return new Response(response.body, { status: response.status, headers: newHeaders });
  },
};
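For the null-handling bug specifically, the missing guard is small. Here is a sketch of what it might look like; the helper name parseCount is hypothetical, and treating unreadable values as zero is a design assumption (it errs on the side of letting traffic through). It only addresses the NaN comparison, not the race or the propagation delay.

```javascript
// Hypothetical helper: turn whatever KV's get() returned into a safe counter value.
function parseCount(raw) {
  if (raw === null || raw === undefined) return 0; // key never written, or expired
  const n = Number.parseInt(raw, 10);
  // Treat corrupt or negative values as zero rather than letting NaN through
  return Number.isInteger(n) && n >= 0 ? n : 0;
}
```

With this in place, a comparison like count >= limit always sees a real number, so the first request on a new key is counted deliberately rather than by accident.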
Mistake 3: We Used Raw API Keys as KV Keys
The third mistake was a design choice that made everything above worse: we stored counters under keys like ratelimit:sk_live_abc123. That’s a naive move for two reasons. First, it’s a security smell — if KV namespace metadata or Worker logs ever leaked (via a misconfigured route or an overly verbose debug build), you’d be handing over a map of valid API key prefixes. We actually did briefly ship a debug build to prod that ran console.log(apiKey) for troubleshooting. It got caught in review within a day, but wrangler tail logs are visible to anyone with Worker log access on the account, so that’s a longer exposure window than we’re comfortable admitting.
Second, using the raw key as the KV key name made popular customers into “hot keys.” A single busy API key gets hit from many edge locations near-simultaneously, which is exactly the scenario that makes the propagation delay from Mistake 1 worse — more concurrent writers racing against a 60-second global sync window, on the same key, at the same time.
The fix in hindsight is straightforward: hash the API key with SHA-256 before using it as a KV or Durable Object key name, and never log the raw key anywhere, debug build or not. Hashing doesn’t fix the consistency problem, but it removes the enumeration risk entirely and it’s a five-minute change. We now treat “never use a raw customer identifier as a storage key” as a hard rule in code review, not a suggestion.
What We Do Differently Now
The real fix wasn’t a KV tweak — it was moving atomic counting off KV entirely. We now run one Durable Object per user (sharded by hashed key for very high-volume customers), which gives us single-threaded execution and true atomic increments. No races, no read-then-write gap, no cross-colo propagation lag, because a Durable Object instance lives in one place and serializes its own requests.
KV still has a job — it’s just the right job now. We use it for slow-changing config like plan tiers and allowlists, where eventual consistency within 60 seconds is genuinely fine. That’s the lesson underneath all three mistakes: match the consistency guarantee to the data, don’t assume one storage primitive fits every use case.
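As an illustration of KV in that slower-moving role, here is a sketch of a plan lookup. The binding name, the plan:<hash> key scheme, the stored JSON shape, and the free-tier limit of 20 are all assumptions made for the example; only the pro limit of 100 comes from the numbers above.

```javascript
// Per-minute limits by tier. The pro value matches the cap mentioned earlier;
// the free value is a placeholder assumption.
const TIER_LIMITS = new Map([
  ["free", 20],
  ["pro", 100],
]);

// `configKv` is a KV namespace binding (hypothetical name: env.CONFIG_KV).
async function getPlanLimit(configKv, keyHash) {
  // type: "json" parses the stored value; cacheTtl (seconds) allows edge caching
  const plan = await configKv.get(`plan:${keyHash}`, { type: "json", cacheTtl: 60 });
  // Unknown key or unrecognized tier falls back to the most restrictive tier
  const tier = TIER_LIMITS.has(plan?.tier) ? plan.tier : "free";
  return { tier, limit: TIER_LIMITS.get(tier) };
}
```

A plan change that takes a minute to reach every location is harmless here, which is exactly why this data suits KV and the counter does not.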
We also rebuilt the failure mode. Our first version failed closed on any KV or DO error — during an unrelated Cloudflare KV incident, that took our entire API down for twenty minutes. Now we fail open with structured logging: if the counter call times out, we let the request through and tag it for later review, rather than punishing customers for an infrastructure hiccup on our side. We switched from abrupt fixed-window counters to a rough sliding window (two adjacent fixed windows weighted by elapsed time) so we’re not allowing a 2x burst right at the window boundary, and every 429 now returns a proper Retry-After header so client SDKs back off instead of hammering us immediately.
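The weighted two-window estimate can be sketched as a pair of pure functions. This is an illustration of the general technique under assumed names and parameters, not a copy of our production code: the previous window's count is scaled by how much of it still overlaps the trailing window, then added to the current window's count.

```javascript
// Weighted two-window estimate: the previous window's count fades out linearly
// as the current window progresses.
function slidingEstimate(prevCount, currCount, windowStart, now, windowMs) {
  const elapsed = Math.min(Math.max(now - windowStart, 0), windowMs);
  const prevWeight = 1 - elapsed / windowMs;
  return prevCount * prevWeight + currCount;
}

// Decide whether one more request fits under the limit.
function isAllowed(prevCount, currCount, windowStart, now, windowMs, limit) {
  // +1 accounts for the request being evaluated
  return slidingEstimate(prevCount, currCount, windowStart, now, windowMs) + 1 <= limit;
}
```

For example, 15 seconds into a 60-second window, a previous window that saw 100 requests still contributes 75 to the estimate, so a client that maxed out the last window cannot immediately burst a full quota again.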
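Here’s the corrected limiter, using a Durable Object with an alarm to proactively reset the counter instead of relying on lazy KV TTL expiry. To keep the listing short it counts in a single fixed window and leaves out the sliding-window weighting: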
// durable-object-limiter.js: corrected approach, atomic per-user counting
// wrangler.toml: [[durable_objects.bindings]]
//                name = "RATE_LIMITER"
//                class_name = "RateLimiter"
//                [[migrations]]
//                tag = "v1"
//                new_classes = ["RateLimiter"]
export class RateLimiter {
  constructor(state, env) {
    this.state = state;
  }

  async fetch(request) {
    const limit = 100;
    const windowMs = 60_000;
    const now = Date.now();
    // Single-threaded DO storage; no race condition, true atomicity
    let data = (await this.state.storage.get("counter")) || { count: 0, resetAt: now + windowMs };
    if (now > data.resetAt) {
      data = { count: 0, resetAt: now + windowMs };
    }
    data.count += 1;
    await this.state.storage.put("counter", data);
    // Proactive reset via alarm instead of relying on lazy TTL expiry
    await this.state.storage.setAlarm(data.resetAt);
    const remaining = Math.max(0, limit - data.count);
    const allowed = data.count <= limit;
    return new Response(JSON.stringify({ allowed, remaining, resetAt: data.resetAt }), {
      status: allowed ? 200 : 429,
      headers: allowed ? {} : { "Retry-After": String(Math.ceil((data.resetAt - now) / 1000)) },
    });
  }

  async alarm() {
    // Proactively clear counter when window ends, no stale reads on next request
    await this.state.storage.delete("counter");
  }
}
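The listing above covers the Durable Object side only. On the Worker side, the fail-open behavior described earlier might look like the sketch below. The function name checkLimit, the 250 ms timeout, the placeholder URL, and the log event name are all assumptions for illustration. In a Worker the stub would come from something like env.RATE_LIMITER.get(env.RATE_LIMITER.idFromName(keyHash)), with keyHash being the hashed API key.

```javascript
// Hypothetical caller: asks a limiter stub for a decision and fails open on
// error or timeout. `stub` is anything with a fetch() method, such as a
// Durable Object stub.
async function checkLimit(stub, timeoutMs = 250, log = console.warn) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("limiter timeout")), timeoutMs);
  });
  try {
    // The URL is a placeholder; the limiter shown above ignores the path
    const res = await Promise.race([stub.fetch("https://limiter.internal/check"), timeout]);
    const body = await res.json();
    return {
      allowed: body.allowed !== false, // only an explicit false blocks the request
      retryAfter: res.headers.get("Retry-After"),
      failedOpen: false,
    };
  } catch (err) {
    // Fail open: let the request through, but leave a structured trail for review
    log(JSON.stringify({ event: "ratelimit_fail_open", reason: String(err.message) }));
    return { allowed: true, retryAfter: null, failedOpen: true };
  } finally {
    clearTimeout(timer);
  }
}
```

The failedOpen flag is what makes the tradeoff auditable: counting those events tells you how much traffic went unmetered during an incident.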
Watch out for one more thing if you’re moving to Durable Objects: check the migrations syntax in wrangler.toml against the docs for the Wrangler version you actually run. We’re on Wrangler 3.28.1 (per wrangler --version), and the [[migrations]] block with new_classes shown above is what works for us on that version. In our experience, copying an example written for an older release straight into a 3.x project can fail on deploy in a way that’s annoying to debug. Also budget real time for load testing: wrangler dev --local --persist does not emulate cross-colo KV propagation delay, so your local tests will look perfect while production behaves completely differently. We now do a dedicated staging pass across multiple real edge locations before trusting any limiter change, and we cover more of that testing workflow in our DevOps_DayS posts on CI validation for edge deployments.
The upside, once we sorted this out: switching from per-request KV writes to Durable Objects cut our rate-limiting cost by roughly 40% at 2 million requests/day. Durable Objects are billed on requests, active duration, and storage operations, and for our traffic that came out below paying for a KV write on every request. Cloudflare Workers rate limiting done right ended up cheaper and more correct; we just had to stop assuming KV was something it never claimed to be. For the full picture on the consistency tradeoffs, the Durable Objects documentation is worth reading end to end before you build on top of it, not after.
