Building Scalable Authentication: From Monolith to Millions of Users

Authentication is the first thing every app needs and the last thing most teams get right at scale. It starts simple – a users table, a password hash, a session cookie – and somewhere between 10,000 and 10,000,000 users, it becomes your biggest architectural liability.
This article breaks down what scalable auth actually looks like: the patterns, the pitfalls, and the decisions you'll wish you'd made earlier.
Why Auth Doesn't Scale Naively
Session-based authentication stores server-side state. That works fine for a single instance but breaks immediately once you need multiple servers. Sticky sessions are a band-aid; shared session stores like Redis are better, but you're still managing centralized mutable state.
The more fundamental problem: auth touches every request. A design that adds 200ms or a single DB round-trip to every authenticated call becomes your bottleneck at scale before your business logic ever does.
Key insight: Auth latency is always multiplied by your request volume. Optimize it early.
Stateless Auth with JWTs and Its Limits
JSON Web Tokens (JWTs) solve the state problem by encoding session data into a signed, self-contained token. No server-side lookups. Your auth layer scales horizontally because any node can validate any token.
The standard access token flow:
POST /auth/login → { access_token (15min), refresh_token (7d) }Access token is
Bearerheader on every requestValidate via signature check no DB hit
On expiry, exchange refresh token for a new pair
But JWTs have a well-known problem: you cannot revoke them before expiry. A compromised token is valid until it expires. Two practical mitigations:
Keep access token TTL short (5–15 min). Short blast radius on compromise.
Maintain a token denylist (Redis, bloom filter); a trade-off back toward statefulness but minimal: you only store revoked tokens, not all active ones.
Token Architecture for Microservices
In a monolith, auth middleware is a single choke-point. In microservices, you have a choice:
Pattern A – API Gateway Validation: Validate the JWT at the gateway; pass a trusted identity header (X-User-Id, X-Roles) to upstream services. Services trust the header, not the token. Simple, fast, but requires the gateway to be your hard trust boundary.
Pattern B – Service-Level Validation: Each service validates the JWT independently using a shared public key (asymmetric RS256/ES256). Stronger isolation, but validation overhead on every service call.
Recommendation: Use Pattern A for internal service mesh traffic. Reserve Pattern B for services that face external consumers directly or handle sensitive operations.
OAuth 2.0 and OpenID Connect at Scale
For any non-trivial product, don't build your own auth server. Use an identity provider (IdP): Auth0, Okta, AWS Cognito, or self-hosted Keycloak/Ory.
The OIDC flow in brief:
Authorization Code + PKCE – for browser/mobile clients (never implicit flow)
Client Credentials – for machine-to-machine, service accounts
Device Authorization – for CLI tools, smart devices
Why outsource? Because federated identity, MFA, session management, token rotation, brute-force protection, and compliance (SOC2, HIPAA) are genuinely hard to build correctly, and they're not your competitive advantage.
What you own: your authorization layer (what a user can do), not your authentication layer (who a user is). Keep these separated.
Authorization: RBAC vs ABAC vs ReBAC
Once you know who someone is, you need to decide what they can access. Three dominant models:
RBAC (Role-Based): Assign permissions to roles, assign roles to users. Simple, auditable, widely understood. Breaks down when roles proliferate (role explosion) or when context matters.
ABAC (Attribute-Based): Policies based on attributes of the user, resource, and environment. Expressive and powerful. Complex to reason about and audit at scale.
ReBAC (Relationship-Based): Permissions derived from graph relationships (user → resource). Used by Google Zanzibar, which powers Google Drive sharing. Ideal for complex ownership hierarchies. Higher implementation cost.
Practical path: Start with RBAC. Extend to ABAC for attribute-sensitive checks (e.g., geo-restricted content, subscription tier). Move to ReBAC only when you have recursive ownership or sharing semantics.
Scaling the Auth Infrastructure
Once you have the right model, make it fast:
Cache validated JWTs at the edge (CDN or gateway) for the duration of their TTL minus a buffer. Eliminates redundant crypto work on hot paths.
Cache permission decisions in-process or in Redis with short TTLs (10–60s). A Zanzibar-style check that hits a database per request will not survive 100k RPS.
Shard your refresh token store by user ID prefix. Prevents hot-key issues on token rotation endpoints during peak login periods.
Rate-limit auth endpoints aggressively: login, register, token exchange, password reset. These are your highest-value attack surfaces.
Deploy JWKS endpoints (public key discovery) behind a CDN. These are read-only and perfectly cacheable; zero reason for them to hit your origin.
Security Hardening Checklist
The implementation decisions that separate production-grade from proof-of-concept:
Use ES256 or RS256 for JWTs, never HS256 in distributed systems (shared secret is a liability)
Rotate signing keys on a schedule; support multiple active keys via kid claim in JWKS
Bind refresh tokens to device fingerprint or IP subnet; invalidate on suspicious change
Implement token binding for high-security flows (FAPI, banking-grade APIs)
Log all auth events: logins, failures, token refreshes, revocations; structured, to a SIEM
Enforce PKCE on all public clients; no exceptions, even for first-party apps
Set
Secure, HttpOnly, SameSite=Stricton any auth cookies
The Scaling Curve
Authentication architecture isn't a one-time decision; it evolves with your system:
1–10k users: Session-based or simple JWT, single IdP, RBAC with 3–5 roles
10k–1M users: Stateless JWTs, dedicated auth service, Redis-backed denylist, caching layer
1M+ users: Distributed token validation at edge, policy engine with ABAC/ReBAC, Zanzibar-style authorization, full observability pipeline
The mistake most teams make is designing for today and rearchitecting under fire. Auth is cheap to get right upfront and expensive to migrate when you're under load.
Get the primitives right – stateless tokens, clean separation of authn/authz, a trustworthy IdP, and aggressively cached permission checks – and auth will be the least of your scaling problems. Build the rest on top of that.
→ Follow NeuralStack | MS for more engineering deep dives.






