Ludo API Rate Limits: Token Bucket, Sliding Window & Redis Implementation
Deep dive into LudoKing API rate limiting: compare token bucket vs. sliding window vs. fixed window algorithms, implement a production-grade Redis-backed rate limiter with per-endpoint quotas, configure fair use policies, and build a retry wrapper with exponential backoff and jitter.
Comparing Rate Limiting Strategies
Rate limiting protects the LudoKing API infrastructure from traffic spikes, abusive clients, and denial-of-service conditions while ensuring fair resource allocation across all consumers. Choosing the right algorithm depends on your traffic patterns, consistency requirements, and infrastructure constraints. The three dominant algorithms — fixed window, sliding window, and token bucket — each make distinct trade-offs between implementation complexity, memory efficiency, and fairness.
Fixed window counting divides time into discrete windows (e.g., 60-second intervals) and counts requests within each window. It is the simplest algorithm to implement but suffers from the boundary burst problem: a client can send twice its limit in the final second of one window and the first second of the next, effectively doubling throughput at window boundaries. For a Ludo multiplayer API where game-match endpoints receive burst traffic at match start and end, this creates dangerous traffic spikes.
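To make the boundary burst concrete, here is a minimal in-memory fixed-window counter. The class name and API are illustrative sketches, not part of the LudoKing codebase; the point is that two adjacent windows let a client double its nominal limit:

```typescript
// Minimal in-memory fixed-window counter (illustrative only).
class FixedWindowCounter {
  private counts = new Map<number, number>();

  constructor(private limit: number, private windowMs: number) {}

  // Returns true if the request at `nowMs` is allowed.
  allow(nowMs: number): boolean {
    const window = Math.floor(nowMs / this.windowMs);
    const count = this.counts.get(window) ?? 0;
    if (count >= this.limit) return false;
    this.counts.set(window, count + 1);
    return true;
  }
}

// Boundary burst: with a limit of 5 per 60 s window, a client can land
// 5 requests at t = 59.9 s and 5 more at t = 60.1 s -- 10 requests in
// 200 ms, despite a nominal "5 per minute" limit.
const counter = new FixedWindowCounter(5, 60_000);
let allowed = 0;
for (let i = 0; i < 5; i++) if (counter.allow(59_900)) allowed++;
for (let i = 0; i < 5; i++) if (counter.allow(60_100)) allowed++;
console.log(allowed); // 10
```

Token bucket and sliding window, discussed below, both close this gap by tracking consumption continuously instead of per discrete window.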
Sliding window logging tracks every request timestamp in a sorted data structure (typically a Redis sorted set) and counts only requests within the current time range. This eliminates boundary bursts completely and provides a smooth, accurate rate limit. The cost is higher memory usage — each request needs a timestamp entry — and slower lookups for high-throughput endpoints. However, with Redis sorted sets and pipelined Lua scripts, sliding window logging handles thousands of requests per second efficiently.
Token bucket models each client as a bucket that fills with tokens at a constant rate. Each request consumes one token; if the bucket is empty, the request is rejected. The bucket has a maximum capacity, allowing clients to accumulate tokens during idle periods and consume them in bursts. Token bucket is ideal for Ludo game APIs because match-making creates natural burst patterns: a client may send 10 rapid requests when joining a lobby, then go quiet during gameplay. Token bucket accommodates this pattern without penalizing the client, whereas fixed or sliding window would throttle the burst unfairly.
Redis Token Bucket Implementation
The token bucket algorithm requires storing two pieces of state per client: the current token count and the last refill timestamp. Redis is the canonical backing store because its atomic Lua scripting capability ensures that read-modify-write operations on the bucket happen atomically, eliminating race conditions under concurrent load. The Lua script executes the entire token consumption logic in a single Redis operation, guaranteeing that two simultaneous requests from the same client cannot both consume the same token.
-- KEYS[1] = bucket key (e.g., "ratelimit:player:abc123:games")
-- ARGV[1] = capacity (max tokens)
-- ARGV[2] = refill rate (tokens per second)
-- ARGV[3] = current timestamp in milliseconds
-- ARGV[4] = tokens to consume (default 1)
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refillRate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local toConsume = tonumber(ARGV[4]) or 1

-- Fetch current bucket state
local bucket = redis.call('HMGET', key, 'tokens', 'lastRefill')
local tokens = tonumber(bucket[1]) or capacity
local lastRefill = tonumber(bucket[2]) or now

-- Calculate tokens to add based on elapsed time
local elapsed = (now - lastRefill) / 1000.0
local tokensToAdd = elapsed * refillRate
tokens = math.min(capacity, tokens + tokensToAdd)

local allowed = 0
local newTokens = tokens
if tokens >= toConsume then
  allowed = 1
  newTokens = tokens - toConsume
end

-- Persist updated bucket state with TTL (2x window for cleanup).
-- lastRefill is always advanced to now: the refill we just computed has
-- been folded into newTokens, so keeping the old timestamp would
-- double-count the refill on the next call (notably on denied requests).
redis.call('HMSET', key, 'tokens', newTokens, 'lastRefill', now)
redis.call('EXPIRE', key, 120)

-- Return [allowed (0|1), remaining tokens, retry-after ms]
local retryAfter = 0
if allowed == 0 then
  retryAfter = math.ceil((toConsume - newTokens) / refillRate * 1000)
end
return { allowed, math.floor(newTokens), retryAfter }
The Lua script executes atomically within Redis's single-threaded event loop, making it safe for concurrent access without locks. The capacity parameter defines the maximum burst size — for the LudoKing API's free tier game endpoints (100 requests/minute), a capacity of 100 with a refill rate of 1.67 tokens/second allows a client to accumulate tokens during idle periods and then burst up to 100 requests instantly. The TTL of 120 seconds ensures stale bucket keys are automatically evicted, preventing unbounded Redis memory growth.
import Redis from 'ioredis';

interface RateLimitConfig {
  capacity: number;     // Max tokens in bucket
  refillRate: number;   // Tokens added per second
  windowSecs: number;   // Reporting window (e.g. 60 for per-minute)
}

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  retryAfterMs: number;
  limit: number;
  resetAt: number;      // Unix timestamp when bucket refills fully
}

const LUA_TOKEN_BUCKET = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refillRate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local toConsume = tonumber(ARGV[4]) or 1
local bucket = redis.call('HMGET', key, 'tokens', 'lastRefill')
local tokens = tonumber(bucket[1]) or capacity
local lastRefill = tonumber(bucket[2]) or now
local elapsed = (now - lastRefill) / 1000.0
tokens = math.min(capacity, tokens + elapsed * refillRate)
local allowed = 0
local newTokens = tokens
if tokens >= toConsume then
  allowed = 1
  newTokens = tokens - toConsume
end
-- Always advance lastRefill to now; the refill has been folded into
-- newTokens, so keeping the old timestamp would double-count it.
redis.call('HMSET', key, 'tokens', newTokens, 'lastRefill', now)
redis.call('EXPIRE', key, 120)
local retryAfter = 0
if allowed == 0 then
  retryAfter = math.ceil((toConsume - newTokens) / refillRate * 1000)
end
return { allowed, math.floor(newTokens), retryAfter }
`;

export class TokenBucketRateLimiter {
  private redis: Redis;
  private configs: Map<string, RateLimitConfig>;

  constructor(redisUrl: string) {
    this.redis = new Redis(redisUrl);
    this.configs = new Map();
    this.redis.defineCommand('tokenBucket', {
      numberOfKeys: 1,
      lua: LUA_TOKEN_BUCKET,
    });
  }

  registerEndpoint(key: string, config: RateLimitConfig) {
    this.configs.set(key, config);
  }

  async check(
    identifier: string,   // e.g., apiKey, playerId, or IP
    endpointKey: string,
    tokensToConsume = 1
  ): Promise<RateLimitResult> {
    const config = this.configs.get(endpointKey);
    if (!config) throw new Error(`Unknown endpoint: ${endpointKey}`);

    const redisKey = `ratelimit:${endpointKey}:${identifier}`;
    const now = Date.now();

    // defineCommand attaches tokenBucket at runtime; cast past the
    // static ioredis types (or use module augmentation in production).
    const result = await (this.redis as any).tokenBucket(
      redisKey,
      config.capacity,
      config.refillRate,
      now,
      tokensToConsume
    ) as [number, number, number];

    const [allowedNum, remaining, retryAfterMs] = result;
    const resetAt = now + Math.ceil((config.capacity - remaining) / config.refillRate * 1000);

    return {
      allowed: allowedNum === 1,
      remaining,
      retryAfterMs,
      limit: config.capacity,
      resetAt: Math.floor(resetAt / 1000),
    };
  }

  async close() {
    await this.redis.quit();
  }
}
Sliding Window Alternatives
Two sliding-window variants exist. The sliding window counter sits between fixed window simplicity and sliding window log precision: it maintains a small fixed number of sub-windows and computes the effective request count as a weighted sum, taking the previous sub-window's count weighted by how much of that sub-window still falls inside the sliding range, plus the full count from the current sub-window. This uses constant memory per client and eliminates the boundary burst problem that plagues fixed window counters, at the cost of being an approximation. The implementation below is the sliding window log variant: it stores one timestamped entry per request in a Redis sorted set and counts exactly.
-- Sliding window log using Redis sorted sets
-- KEYS[1] = window key, ARGV[1] = max requests, ARGV[2] = window size (ms)
-- ARGV[3] = current timestamp (ms), ARGV[4] = request ID
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local reqId = ARGV[4]

-- Remove expired entries outside the window
redis.call('ZREMRANGEBYSCORE', key, '-inf', now - window)

-- Count requests in current window
local count = redis.call('ZCARD', key)

if count < limit then
  -- Add this request with timestamp as score
  redis.call('ZADD', key, now, reqId)
  redis.call('EXPIRE', key, math.ceil(window / 1000) + 1)
  return { 1, limit - count - 1, 0 }  -- allowed, remaining, retryAfter (s)
else
  -- Get oldest entry to calculate retry-after (in seconds)
  local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
  local retryAfter = 0
  if oldest[2] then
    retryAfter = math.ceil((tonumber(oldest[2]) + window - now) / 1000)
  end
  return { 0, 0, retryAfter }  -- denied, remaining, retryAfter (s)
end
The sliding window log trades higher memory usage (one sorted-set entry per request rather than two numbers per bucket) for exact accuracy. For the LudoKing API's leaderboard and analytics endpoints that need precise per-second fairness, the sliding window log is the better choice. For game-critical endpoints like /games/:id/move, where token bucket's burst accommodation matters more, the token bucket remains preferable. Many production systems run both algorithms simultaneously: token bucket for endpoints that benefit from bursting, sliding window for endpoints requiring strict fairness.
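For comparison, the weighted-sum counter variant can be sketched in a few lines. This is an illustrative in-memory version (the class name and API are assumptions, not LudoKing code); a production version would keep the two counters in Redis, for example as hash fields updated via HINCRBY:

```typescript
// In-memory sliding window counter sketch (two counters per client,
// weighted interpolation across the window boundary).
class SlidingWindowCounter {
  private prevCount = 0;
  private currCount = 0;
  private currWindow = 0;

  constructor(private limit: number, private windowMs: number) {}

  allow(nowMs: number): boolean {
    const window = Math.floor(nowMs / this.windowMs);
    if (window !== this.currWindow) {
      // Shift: the old current window becomes the previous one. If more
      // than one full window has elapsed, the previous count is stale.
      this.prevCount = window === this.currWindow + 1 ? this.currCount : 0;
      this.currCount = 0;
      this.currWindow = window;
    }
    // Weight the previous window's count by the fraction of it still
    // inside the sliding range, then add the current window's count.
    const elapsedFraction = (nowMs % this.windowMs) / this.windowMs;
    const weighted = this.prevCount * (1 - elapsedFraction) + this.currCount;
    if (weighted >= this.limit) return false;
    this.currCount++;
    return true;
  }
}
```

With a limit of 10 per 60 s, a client that used its full quota in one window can still send about 5 requests halfway through the next one, because half the previous window's count still falls inside the sliding range.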
Express Rate Limit Middleware
The Express middleware integrates the rate limiter into the request lifecycle. It runs before route handlers, extracts the client identifier (API key, player ID, or IP address), checks the rate limit via Redis, and either allows the request to proceed or rejects it with a 429 response and appropriate headers. The middleware is designed as a factory function that accepts per-endpoint configurations, making it reusable across the entire LudoKing API surface.
import { Request, Response, NextFunction, RequestHandler } from 'express';
import { createHash } from 'crypto';
import { TokenBucketRateLimiter } from './TokenBucketRateLimiter';

interface MiddlewareConfig {
  limiter: TokenBucketRateLimiter;
  endpointKey: string;
  keyGenerator: (req: Request) => string;  // Extract client identifier
  tokensToConsume?: number;
  skipSuccessfulRequests?: boolean;
  skipFailedRequests?: boolean;
}

export function createRateLimitMiddleware(config: MiddlewareConfig): RequestHandler {
  const { limiter, endpointKey, keyGenerator, tokensToConsume = 1 } = config;

  return async (req: Request, res: Response, next: NextFunction) => {
    const identifier = keyGenerator(req);
    try {
      const result = await limiter.check(identifier, endpointKey, tokensToConsume);

      // Attach rate limit info to response headers
      res.setHeader('X-RateLimit-Limit', result.limit);
      res.setHeader('X-RateLimit-Remaining', result.remaining);
      res.setHeader('X-RateLimit-Reset', result.resetAt);

      if (!result.allowed) {
        const retryAfterSecs = Math.ceil(result.retryAfterMs / 1000);
        res.setHeader('Retry-After', retryAfterSecs);
        res.setHeader('Content-Type', 'application/json');
        res.status(429).json({
          error: 'Too Many Requests',
          code: 'RATE_LIMIT_EXCEEDED',
          message: `Rate limit exceeded for this endpoint. Retry after ${retryAfterSecs} seconds.`,
          retryAfter: retryAfterSecs,
          limit: result.limit,
          remaining: 0,
        });
        return;
      }
      next();
    } catch (err) {
      // On Redis failure, fail open (allow request) to avoid blocking all traffic
      console.error('Rate limiter error, failing open:', err);
      next();
    }
  };
}

// Key generators for different identifier strategies
export function byApiKey(req: Request): string {
  const apiKey = req.get('X-API-Key') || req.get('Authorization')?.replace('Bearer ', '');
  if (!apiKey) return req.ip || 'anonymous';
  return hashIdentifier(apiKey);  // Hash to avoid leaking key material in Redis keys
}

export function byPlayerId(req: Request): string {
  return req.params.playerId || req.body?.playerId || byApiKey(req);
}

function hashIdentifier(id: string): string {
  // SHA-256 truncated to 16 hex chars: enough uniqueness, no key leakage
  return createHash('sha256').update(id).digest('hex').slice(0, 16);
}

// --- Usage in Express app ---
// Assumes app (an express() instance) and the getGame, createGame,
// processMove handlers are defined elsewhere.
const limiter = new TokenBucketRateLimiter(process.env.REDIS_URL!);
limiter.registerEndpoint('games-read',  { capacity: 100, refillRate: 1.67, windowSecs: 60 }); // 100/min
limiter.registerEndpoint('games-write', { capacity: 20,  refillRate: 0.33, windowSecs: 60 }); // 20/min
limiter.registerEndpoint('move',        { capacity: 30,  refillRate: 0.5,  windowSecs: 60 }); // 30/min

app.get('/games/:id',
  createRateLimitMiddleware({ limiter, endpointKey: 'games-read', keyGenerator: byApiKey }),
  getGame
);
app.post('/games',
  createRateLimitMiddleware({ limiter, endpointKey: 'games-write', keyGenerator: byApiKey }),
  createGame
);
app.post('/games/:id/move',
  createRateLimitMiddleware({ limiter, endpointKey: 'move', keyGenerator: byPlayerId }),
  processMove
);
The middleware fails open on Redis errors — a deliberate design choice. If Redis becomes unreachable, blocking all API traffic would be far more damaging than briefly allowing unlimited requests. The trade-off is acceptable for most Ludo game APIs where the worst-case scenario of a brief unthrottled period is manageable, while a Redis outage causing a complete API outage is not. For high-security deployments, configure the middleware to fail closed instead.
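The fail open/closed decision can be made explicit rather than hard-coded. A minimal sketch of such a policy wrapper, under the assumption that it wraps the limiter check (checkWithPolicy and FailurePolicy are hypothetical names, not part of the middleware above):

```typescript
// Explicit failure policy for a rate limiter backed by an unreliable store.
type FailurePolicy = 'open' | 'closed';

async function checkWithPolicy(
  check: () => Promise<boolean>,  // underlying limiter check (may throw)
  policy: FailurePolicy
): Promise<boolean> {
  try {
    return await check();
  } catch {
    // Store unreachable: 'open' lets the request through (availability
    // over enforcement), 'closed' rejects it (enforcement over availability).
    return policy === 'open';
  }
}
```

Wiring this into the middleware's catch block would replace the unconditional next() with a policy-dependent 503 or pass-through, so high-security deployments can flip one setting instead of forking the middleware.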
Per-Endpoint Rate Limit Tiers
Not all LudoKing API endpoints carry the same operational cost or fairness sensitivity. A POST /games/:id/move call triggers database writes, broadcasts to connected WebSocket clients, and potentially invalidates leaderboard caches — making it far more expensive than a GET /games/:id that serves a cached response. Configuring tiered limits ensures that computationally expensive endpoints cannot be used to amplify traffic and that shared resources are distributed according to endpoint value.
import { TokenBucketRateLimiter } from '../lib/rateLimiter';

// Define all endpoint limits: [capacity, refillRate/sec]
const ENDPOINT_LIMITS = {
  // Game state endpoints (read-heavy, cached)
  'GET:/games':               [120, 2.0],   // 120/min
  'GET:/games/:id':           [60,  1.0],   // 60/min
  'GET:/leaderboard':         [30,  0.5],   // 30/min
  // Game write endpoints (expensive, fairness-critical)
  'POST:/games':              [10,  0.17],  // 10/min
  'POST:/games/:id/move':     [30,  0.5],   // 30/min (game-critical)
  'POST:/games/:id/join':     [10,  0.17],  // 10/min
  'POST:/games/:id/leave':    [10,  0.17],  // 10/min
  // WebSocket handshake (connection setup only)
  'WS:/connect':              [5,   0.08],  // 5/min
  // Analytics and bulk endpoints (lower priority)
  'GET:/analytics':           [20,  0.33],  // 20/min
  'GET:/player/:id/history':  [60,  1.0],   // 60/min
} as const;

export function buildLimiter(redisUrl: string): TokenBucketRateLimiter {
  const limiter = new TokenBucketRateLimiter(redisUrl);
  for (const [endpoint, [capacity, refillRate]] of Object.entries(ENDPOINT_LIMITS)) {
    limiter.registerEndpoint(endpoint, { capacity, refillRate, windowSecs: 60 });
  }
  return limiter;
}

// Global rate limit for unknown endpoints
const GLOBAL_LIMIT = { capacity: 200, refillRate: 3.33, windowSecs: 60 };
Retry Wrapper with Exponential Backoff
When a rate-limited request returns 429, the client must retry intelligently. The naive approach — immediate retry or fixed-interval retry — creates synchronized request waves (the thundering herd problem) that overwhelm the server at exactly the moment it is already under load. Exponential backoff with jitter solves this by doubling the wait time after each failure and adding randomness to spread retries across a time window rather than a fixed point.
interface RetryOptions {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
  backoffFactor: number;
  jitter: boolean;
  retryableStatuses: number[];
  onRetry?: (attempt: number, error: Error, delayMs: number) => void;
}

const DEFAULT_OPTIONS: RetryOptions = {
  maxRetries: 5,
  baseDelayMs: 1000,
  maxDelayMs: 30000,
  backoffFactor: 2,
  jitter: true,
  retryableStatuses: [429, 500, 502, 503, 504],
};

async function sleep(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms));
}

/**
 * Full jitter algorithm -- optimal for distributed retries.
 * Delay = random(0, min(maxDelay, baseDelay * factor^attempt))
 * Spreads retries uniformly across the full backoff window,
 * minimizing collision probability in concurrent client scenarios.
 */
function fullJitterDelay(attempt: number, baseDelay: number, maxDelay: number, factor: number): number {
  const exponentialDelay = Math.min(maxDelay, baseDelay * Math.pow(factor, attempt));
  return Math.floor(Math.random() * exponentialDelay);
}

/**
 * Decorrelated jitter -- better for high-latency backends.
 * Delay = baseDelay + random(0, previousDelay * 3)
 * Provides stronger decorrelation between retry attempts.
 */
function decorrelatedJitterDelay(baseDelay: number, prevDelay: number): number {
  return Math.floor(baseDelay + Math.random() * prevDelay * 3);
}

export async function fetchWithRetry(
  url: string,
  options: RequestInit = {},
  userOptions: Partial<RetryOptions> = {}
): Promise<Response> {
  const opts = { ...DEFAULT_OPTIONS, ...userOptions };
  let lastError: Error | undefined;
  let prevDelay = opts.baseDelayMs;

  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    try {
      const response = await fetch(url, options);

      // Handle rate limit specifically
      if (response.status === 429) {
        // Honor the server's Retry-After header if present
        const retryAfterHeader = response.headers.get('Retry-After');
        let delay: number;
        if (retryAfterHeader) {
          // Server tells us exactly when to retry
          delay = parseInt(retryAfterHeader, 10) * 1000;
        } else {
          // Calculate exponential backoff
          delay = opts.jitter
            ? decorrelatedJitterDelay(opts.baseDelayMs, prevDelay)
            : Math.min(opts.maxDelayMs, opts.baseDelayMs * Math.pow(opts.backoffFactor, attempt));
        }
        prevDelay = delay;

        if (attempt === opts.maxRetries) {
          throw new RateLimitExceededError(
            `Rate limit exceeded after ${opts.maxRetries} retries. Last wait: ${delay}ms`
          );
        }

        // If the reset timestamp is also available, wait at least until then
        const resetHeader = response.headers.get('X-RateLimit-Reset');
        const waitUntil = resetHeader
          ? Math.max(delay, parseInt(resetHeader, 10) * 1000 - Date.now())
          : delay;

        opts.onRetry?.(attempt, new Error('429 Rate Limit'), waitUntil);
        console.warn(`[RateLimit] Attempt ${attempt + 1}/${opts.maxRetries} -- waiting ${Math.round(waitUntil)}ms`);
        await sleep(waitUntil);
        continue;
      }

      // Non-retryable status (including 2xx/3xx) -- return immediately
      if (!opts.retryableStatuses.includes(response.status)) {
        return response;
      }

      // Handle other retryable errors
      const delay = opts.jitter
        ? fullJitterDelay(attempt, opts.baseDelayMs, opts.maxDelayMs, opts.backoffFactor)
        : Math.min(opts.maxDelayMs, opts.baseDelayMs * Math.pow(opts.backoffFactor, attempt));

      if (attempt === opts.maxRetries) {
        throw new HTTPError(`HTTP ${response.status} after ${opts.maxRetries} retries`, response.status);
      }
      opts.onRetry?.(attempt, new Error(`HTTP ${response.status}`), delay);
      await sleep(delay);
    } catch (err) {
      // Our own terminal errors must propagate, not be retried
      if (err instanceof RateLimitExceededError || err instanceof HTTPError) throw err;
      lastError = err as Error;
      if (attempt === opts.maxRetries) break;
      const delay = Math.min(opts.maxDelayMs, opts.baseDelayMs * Math.pow(opts.backoffFactor, attempt));
      await sleep(delay);
    }
  }
  throw lastError ?? new Error('fetchWithRetry: retries exhausted');
}

class RateLimitExceededError extends Error {
  constructor(msg: string) {
    super(msg);
    this.name = 'RateLimitExceededError';
  }
}

class HTTPError extends Error {
  constructor(msg: string, public status: number) {
    super(msg);
    this.name = 'HTTPError';
  }
}

// --- Usage examples ---
// Assumes API_BASE, API_KEY, and a MovePayload type defined elsewhere.

// Simple game move with automatic retry
async function submitMove(gameId: string, move: MovePayload) {
  const response = await fetchWithRetry(
    `${API_BASE}/games/${gameId}/move`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(move),
    },
    {
      maxRetries: 4,
      baseDelayMs: 500,
      onRetry: (attempt, err, delay) => {
        console.log(`Retrying move submission -- attempt ${attempt + 1}, waiting ${delay}ms: ${err.message}`);
      },
    }
  );
  return response.json();
}
Rate Limit Response Headers
Standardized rate limit response headers allow clients to implement proactive throttling, adjusting their request rate before hitting the limit rather than reacting to 429 responses. The LudoKing API uses the widely adopted X-RateLimit-* header convention (the IETF draft standard, draft-ietf-httpapi-ratelimit-headers, defines unprefixed RateLimit-* equivalents), with four primary headers: X-RateLimit-Limit (total budget), X-RateLimit-Remaining (requests left in the current window), X-RateLimit-Reset (Unix timestamp of window reset), and Retry-After (present only on 429 responses).
interface RateLimitHeaders {
  limit: number;
  remaining: number;
  resetAt: number;
  retryAfter: number | null;
}

function parseRateLimitHeaders(response: Response): RateLimitHeaders {
  const toInt = (v: string | null) => (v ? parseInt(v, 10) : 0);
  const retryAfterRaw = response.headers.get('Retry-After');
  return {
    limit: toInt(response.headers.get('X-RateLimit-Limit')),
    remaining: toInt(response.headers.get('X-RateLimit-Remaining')),
    resetAt: toInt(response.headers.get('X-RateLimit-Reset')),
    retryAfter: retryAfterRaw ? parseInt(retryAfterRaw, 10) : null,
  };
}

// Adaptive client that tracks quota and paces requests
class AdaptiveRateLimitClient {
  private remaining = 0;
  private resetAt = 0;
  private limit = 0;

  async request<T>(url: string, init?: RequestInit): Promise<T> {
    await this.waitForQuota();
    const response = await fetchWithRetry(url, init);
    const headers = parseRateLimitHeaders(response);

    // Sync local quota state with server response
    this.remaining = headers.remaining;
    this.resetAt = headers.resetAt;
    this.limit = headers.limit;

    if (response.status === 429) {
      console.warn(`Server rejected request. Reset at ${new Date(headers.resetAt * 1000)}`);
    }
    return response.json() as Promise<T>;
  }

  private async waitForQuota(): Promise<void> {
    // Wait until the reset window has passed if quota is fully depleted
    if (this.remaining <= 0 && this.resetAt > Date.now() / 1000) {
      const waitMs = this.resetAt * 1000 - Date.now();
      console.log(`Quota depleted. Waiting ${Math.ceil(waitMs / 1000)}s for reset...`);
      await sleep(waitMs);
    }
    this.remaining = Math.max(0, this.remaining - 1);
  }
}
Fair Use Policies and Tiered Access
Rate limits alone do not guarantee fair resource distribution — sophisticated clients can create many API key accounts to multiply their effective quota (the multi-tenant abuse problem). Fair use policies layer on top of rate limits to enforce per-organization and per-IP constraints that prevent quota multiplication. The LudoKing API implements a three-dimensional fair use model: per API key (token bucket), per IP address (sliding window counter), and per organization (aggregate limit across all org keys).
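One way to sketch the three-dimensional model is as a composition: a request passes only if every dimension allows it, and the first dimension to deny identifies the constraint that fired. The interface and dimension names below are illustrative assumptions, not the LudoKing implementation:

```typescript
// A request must clear every fair-use dimension (per key, per IP, per org).
interface RequestContext {
  apiKey: string;
  ip: string;
  orgId: string;
}

interface DimensionCheck {
  name: string;  // e.g. 'api-key', 'ip', 'org'
  allow: (req: RequestContext) => Promise<boolean>;
}

async function checkAllDimensions(
  req: RequestContext,
  dimensions: DimensionCheck[]
): Promise<{ allowed: boolean; deniedBy?: string }> {
  for (const dim of dimensions) {
    if (!(await dim.allow(req))) {
      // First failing dimension wins; remaining checks are skipped.
      return { allowed: false, deniedBy: dim.name };
    }
  }
  return { allowed: true };
}
```

In practice each dimension would wrap one of the limiters shown earlier (token bucket keyed by API key, sliding window keyed by IP, an aggregate bucket keyed by org ID), so creating extra keys inside one organization cannot multiply the org-level quota.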
Tiered access also enables business model differentiation. Free-tier clients share a pooled global limit on expensive operations like game state writes, while Pro-tier clients receive dedicated quota allocations. Tournament and enterprise tiers get negotiated limits with SLA-backed availability. This tiering is implemented by mapping each API key to a rate limit tier in Redis and selecting the appropriate limit configuration at request time.
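The tier-to-limit selection can be sketched as a simple lookup. Tier names and quota numbers here are illustrative (only the free tier's 100/min appears earlier in this article); a production version would read each key's tier from Redis rather than a local map:

```typescript
// Map an API key's tier to its token bucket parameters.
type Tier = 'free' | 'pro' | 'enterprise';

const TIER_LIMITS: Record<Tier, { capacity: number; refillRate: number }> = {
  free:       { capacity: 100,  refillRate: 100 / 60 },   // 100/min (as above)
  pro:        { capacity: 1000, refillRate: 1000 / 60 },  // illustrative
  enterprise: { capacity: 5000, refillRate: 5000 / 60 },  // negotiated
};

function limitsForTier(tier: Tier | undefined) {
  // Unknown or missing tier falls back to the free tier's limits,
  // so an unrecognized key is never granted elevated quota.
  return TIER_LIMITS[tier ?? 'free'] ?? TIER_LIMITS.free;
}
```

Resolving the tier at request time (rather than baking it into the key) means a plan upgrade takes effect immediately without reissuing API keys.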
Beyond technical limits, fair use policies define what constitutes acceptable API usage: no scraping of leaderboard data, no automated player accounts that simulate human gameplay without disclosure, no redistribution of cached game state to third parties, and no use of the API to power competing Ludo platforms. Violations result in progressive remediation — warning, temporary suspension, then permanent revocation. The API Documentation has the complete terms of service and fair use guidelines.
Frequently Asked Questions
Should I use token bucket or sliding window for my endpoints?
Use both where each fits: token bucket for burst-tolerant, game-critical endpoints such as /games/:id/move and sliding window for leaderboard and analytics endpoints.

Why should clients honor the Retry-After header instead of calculating their own backoff?
The Retry-After header communicates the exact moment the server will accept requests again, eliminating the need for the client to guess via exponential backoff. When present, the client should wait exactly the specified duration rather than calculating its own delay. The server's reset timestamp accounts for all concurrent clients hitting the limit simultaneously, whereas a client's independently calculated backoff may overshoot or undershoot the actual refill time. Always prioritize Retry-After over calculated backoff when it is present in a 429 response. The REST API Reference documents which endpoints return Retry-After headers.

How do higher tiers change the limits?
Upgrading raises both the per-endpoint quota on endpoints like /games/:id/move and the aggregate request budget. See the Node.js SDK Guide for client-side token bucket implementation tuned to each tier's limits.

Need Higher Rate Limits for Your Ludo Platform?
Contact us to discuss Pro or Enterprise tier rate limits, tournament quotas, and fair use policies for your Ludo API integration.