I ran our production image through Trivy this week. Zero CVEs on the base layer, zero on every transitive Node dependency, no leaked secrets, nothing. That is the surface-level win, but it would be a misreading to claim it as the headline. The scan only catches one class of problem: packages with known CVEs in a public database. Defense in depth is what actually keeps a web application secure, and the scan covers maybe one of its seven layers.
A clean scan tells you nothing about whether your routes are actually gated, whether your file uploads can be tricked into hosting malware, whether your sessions are validated correctly, or whether deleting a user leaks their content into an orphaned table. In my experience those problems are far more common than a vulnerable package, and far more damaging. The architecture choices that prevent them are not found in any scanning tool. They are decisions you make once, early, that quietly pay off forever.
This post walks through the architecture I run on every managed-cloud build. Seven concentric layers, from the network perimeter inward to the database. Most of the decisions are boring. That is the point. Defense in depth lives or dies on whether each layer is built to survive the failure of the one outside it. For the strategic framing this all sits inside, see my Zero Trust security overview, which is the conceptual primer.
1. The perimeter: stop traffic before it reaches your app
The single highest-impact defense-in-depth move at the network layer is also the simplest. Your application server does not accept connections from the public internet. Period.
The traditional setup (open port 443, terminate TLS at Nginx, hope the firewall config is right) leaks an enormous amount of attack surface. Anyone on the internet can probe your server, fingerprint your TLS configuration, run scanners against common paths, exhaust your file descriptors trying random URLs, and consume your resources rate-limiting them. Even with a perfect WAF you have still committed to operating that WAF.
A reverse tunnel inverts the relationship. The application server opens an outbound connection to a CDN edge, and the edge proxies traffic back through that connection. From the public internet, your server has no open ports. Port scans find nothing. SSH attempts go to a different host or are also tunneled. The attack surface visible to the internet collapses to “the CDN’s edge,” which is operated by a vendor whose entire business is operating CDN edges.
Layer two on top of that: the tunnel can be gated. A CDN-level access policy can require a verified identity (email magic-link, SSO group, allow-list, whatever) before the request is even proxied to your origin. If your application code has an authentication bug, the request never reaches the buggy code unless the user has already proved who they are at the edge. The same pattern is what makes a zero-trust mesh VPN like Netbird work for internal apps.
Layer three: the connection between the tunnel and your origin is itself encrypted, even though it is technically internal traffic. The edge presents your real certificate to the public user. The tunnel re-encrypts with a self-signed certificate to your origin and forces TLS 1.3 minimum. Operationally simple (the cert is generated once at provisioning time) and worth it. Even if someone gains access to the network between the tunnel and your origin, which is rare but not impossible on a multi-tenant host, the traffic is still ciphertext.
Three layers, no exposed ports, every hop encrypted. The internet has no way to talk to your app directly.
2. The base image: start from near-zero CVEs
The second-highest-impact defense-in-depth decision is your container base image. Most projects start with the most popular community image (node:alpine, python:slim, golang:bookworm, take your pick) and inherit somewhere between 20 and 100 known CVEs on day one. None of them caused by your code, all reported by your scanner, all part of your audit trail. The team spends time triaging issues it did not create.
Hardened base images flip this. Chainguard Images, Docker Hardened Images, and distroless variants ship the absolute minimum surface area. Typically no shell, no package manager, no setuid binaries, a non-root default user, and a curated package set kept current by the image vendor. Most weeks they have zero known CVEs. Patching the OS layer becomes the vendor’s problem, not yours.
The cost is real but bounded. Hardened images strip out things a typical Dockerfile assumes. No apk add mid-build. No bash for entrypoint scripts; you write them for sh or use a compiled binary. A non-root user means you cannot write to system paths. The first pass through your Dockerfile takes longer. The tenth pass is identical to using any other base.
In return: the OS layer of your image contributes nothing to your scanner output. Anything Trivy finds is something you introduced through your own dependencies, which means there is actually a fix. The “explain why this OS CVE is not exploitable in our context” ritual disappears entirely.
This pairs well with multi-stage builds. The build stage can use a fuller image (you may need a shell, a package manager, and build tools to compile native dependencies) while the final runtime stage copies only the built artifacts into a hardened base. Build comfort and runtime hygiene coexist.
The same logic applies to your data stores. A hardened Postgres image, a hardened Redis image, a hardened reverse-tunnel client. Every layer of the stack you can move off :latest-community to :latest-hardened is one less category of CVE you will be patching at 11pm on a Friday. Hardened images do not fix host-level kernel bugs like the Copy Fail LPE and Dirty Frag, which still need a manual mitigation, but at least those bugs do not compound with a stack-full of avoidable userspace vulnerabilities.
3. Identity: SSO-only, no public signup
For an internal or customer-only application, the simplest defense-in-depth move on authentication is to delete the signup form. Authentication happens through an external identity provider via OIDC. The login button redirects to an SSO portal. Users you have not pre-provisioned cannot get past it.
This eliminates an entire class of vulnerabilities. Account enumeration, password reset attacks, credential stuffing, email confirmation flaws, throttling bugs, brute-force concerns, and the whole surface of “implementing authentication correctly” become someone else’s problem. Your identity provider (Authentik, Okta, Keycloak, Auth0, take your pick) solves them all. They have dedicated teams. You do not. My personal default is Authentik as a self-hosted identity provider: same OIDC contract, no per-seat pricing.
A useful pattern on top of this: separate identity from profile. The identity provider owns the canonical user record (email, name, group memberships). Your application stores a profile row keyed by the IdP’s subject identifier, holding only the data that is specific to your app (display name, avatar, preferences, activation state).
That separation has cascading benefits:
- The admin role is not stored in your database. It is a group claim from the IdP. An admin cannot be created by mutating a row; they have to be in the right SSO group, which requires admin access to the SSO portal, which is a separate auth domain.
- Deactivating a user is a profile flag, not an identity revocation. Their SSO still works for other apps. They just cannot get past your app’s middleware.
- Deleting a user is a profile delete. Their content can be preserved (with an anonymized author label) without orphaning rows in your database.
The complexity cost is one extra table and a short onboarding flow on first login. The payoff is that “authentication” and “user management” are now clean, separate concerns owned by different systems, and the more dangerous one is owned by a vendor that specializes in it.
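A minimal sketch of that split, to make the list above concrete. The claim field names, the `myapp-admins` group, and `resolveAccess` itself are hypothetical; your IdP and session library will shape the real version. The point it illustrates is that the admin bit is derived from the IdP claim on every request and never read from the profile row.

```typescript
// Claims as they might arrive from the IdP (field names are assumptions).
interface IdpClaims {
  sub: string; // stable subject identifier issued by the IdP
  email: string;
  groups: string[]; // group memberships asserted by the IdP
}

// App-owned profile row, keyed by the IdP subject. Note: no role column.
interface Profile {
  sub: string;
  displayName: string;
  active: boolean;
}

type Access =
  | { kind: "onboarding" } // first login: no profile row yet
  | { kind: "denied"; reason: string }
  | { kind: "granted"; isAdmin: boolean };

const ADMIN_GROUP = "myapp-admins"; // hypothetical SSO group name

function resolveAccess(claims: IdpClaims, profile: Profile | undefined): Access {
  if (!profile) return { kind: "onboarding" };
  if (profile.sub !== claims.sub) return { kind: "denied", reason: "profile mismatch" };
  // Deactivation is a profile flag; the SSO identity itself is untouched.
  if (!profile.active) return { kind: "denied", reason: "profile deactivated" };
  // The role comes only from the IdP claim, so no row mutation can grant it.
  return { kind: "granted", isAdmin: claims.groups.includes(ADMIN_GROUP) };
}
```

Calling this in middleware and again in the page layer costs almost nothing, because it is a pure function of the claims and one profile lookup.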
4. Authorization: three independent gates, every time
Every protected route in a defense-in-depth architecture is checked at three independent layers:
- Edge / middleware. Before the request reaches any application code, verify that there is a valid session and the user is permitted on this URL.
- Layout / page. At the rendering layer, re-check the session and re-check role for admin pages.
- Action / mutation. Every server action or API handler checks the session inline, regardless of whether middleware “should have” caught it.
Three gates sound redundant. They are. The redundancy is the point.
Real-world routing layers have edge cases. Middleware might not run for paths starting with a particular prefix. A rewrite might bypass it. A new route group might forget to extend the middleware matcher. A future framework upgrade might change which paths the matcher reaches. Each of these is a class of bug that has happened in real applications, and each one is caught by the next layer in the stack.
For API endpoints specifically: middleware in many frameworks does not cover them by default. The action-layer check is the only thing standing between a curl request and your database. Make it non-optional.
A practical version of this pattern: write a requireAdmin() or requireUser() helper that throws on failure, and call it as the first line of every mutation:
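export async function deletePost(id: string) {
  // Gate three: authenticate inline, even if middleware should have run.
  const user = await requireUser();
  // Authorize against this specific post before touching it.
  await assertOwnsOrAdmin(user, id);
  return db.post.delete({ where: { id } });
}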
Duplicating the call is not a code smell. It is a guarantee that every new action gets the gate by default. The day you start factoring it into a wrapper is the day the next person forgets to use the wrapper.
The general principle is defense in depth restated: each layer assumes the layer outside it might fail, and is built to survive that assumption. If the same check is performed in two places by the same code path, it is not actually two checks. It is one check that runs twice. Real depth means each layer can catch a failure of the layer outside it.
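For reference, here is roughly what the throwing helpers could look like. This is a sketch under assumptions: `makeGuards`, `AuthError`, and the `SessionLoader` callback are names I made up for illustration, and the session lookup is injected so the example stays framework-neutral.

```typescript
interface SessionUser {
  id: string;
  groups: string[];
}

class AuthError extends Error {
  status: 401 | 403;
  constructor(message: string, status: 401 | 403) {
    super(message);
    this.name = "AuthError";
    this.status = status;
  }
}

// Hypothetical session lookup; a real app would read a cookie or token here.
type SessionLoader = () => Promise<SessionUser | null>;

function makeGuards(loadSession: SessionLoader, adminGroup: string) {
  // Throws instead of returning null, so a forgotten check cannot pass silently.
  async function requireUser(): Promise<SessionUser> {
    const user = await loadSession();
    if (!user) throw new AuthError("not signed in", 401);
    return user;
  }

  // Builds on requireUser, then checks the group claim.
  async function requireAdmin(): Promise<SessionUser> {
    const user = await requireUser();
    if (!user.groups.includes(adminGroup)) throw new AuthError("admin only", 403);
    return user;
  }

  return { requireUser, requireAdmin };
}
```

Throwing rather than returning a boolean is the design choice that matters: the handler cannot continue past the first line unless the check succeeded.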
5. Data: constraints in the database, not in application code
Application code is mortal. Database constraints are forever. Whenever an invariant can be expressed in the schema, express it there.
Concrete examples from a content-management context:
- A bookmark row can refer to either a post or a course. The column structure has two nullable foreign keys. A CHECK constraint enforces that exactly one is non-null. The application cannot insert a malformed row even if a bug tries.
- A lesson must have a video URL, a body of text, or both, but never neither. Another CHECK.
- Display names must be case-insensitively unique. A UNIQUE INDEX on lower("displayName") enforces it at the storage layer. A future query that forgets the lowercase comparison will not use the index, but the data is still protected from collisions regardless of how someone tries to insert it.
ALTER TABLE bookmark
ADD CONSTRAINT bookmark_exactly_one_target
CHECK ((post_id IS NULL) <> (course_id IS NULL));
CREATE UNIQUE INDEX profile_displayname_lower_uniq
ON profile (lower("displayName"));
For deletions: design the schema so deleting a user does not have to cascade-delete their content. Posts and comments store the author’s identifier and a denormalized author display name. When the profile is deleted, the identifier is nulled out and the display name is replaced with a sentinel like "[Deleted User]". The content persists. The link to the deleted person is severed.
That is both a privacy property (the deleted user is no longer linked to their old content, which supports right-to-erasure requirements without obliterating community discussion) and a content-preservation property (the discussion thread does not disintegrate when one participant leaves).
The trade-off is that you cannot query “all posts by user X” via a foreign key after deletion. That is the right trade. The content is not the user’s to retract once published.
Cascade-delete is reserved for the things that genuinely are the user’s: their likes, their bookmarks, their notification preferences. These are pure metadata about their relationship to the content, not the content itself. They vanish with the user.
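The deletion rules above, sketched against an in-memory stand-in for the database. The `Store` shape and `deleteProfile` are hypothetical names for illustration. In Postgres the same behavior would likely map to ON DELETE SET NULL on the content tables and ON DELETE CASCADE on the metadata tables, with the display-name rewrite done in the same transaction.

```typescript
const DELETED_LABEL = "[Deleted User]";

interface Post {
  id: string;
  authorId: string | null; // nulled when the author's profile is deleted
  authorName: string; // denormalized so it survives the profile
  body: string;
}

interface Bookmark {
  userId: string;
  postId: string;
}

// In-memory stand-in for the real tables (an assumption for the sketch).
interface Store {
  profiles: Map<string, { displayName: string }>;
  posts: Post[];
  bookmarks: Bookmark[];
}

function deleteProfile(store: Store, userId: string): void {
  // Content persists, but the link to the person is severed.
  for (const post of store.posts) {
    if (post.authorId === userId) {
      post.authorId = null;
      post.authorName = DELETED_LABEL;
    }
  }
  // Pure relationship metadata vanishes with the user.
  store.bookmarks = store.bookmarks.filter((b) => b.userId !== userId);
  store.profiles.delete(userId);
}
```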
6. The input boundary: never trust the client
File uploads
File handling is one of the most consistently mishandled parts of a web application. The pattern that works:
- MIME detection by magic bytes, not by the client’s declared Content-Type. A Content-Type: image/png header tells you nothing about what is actually in the file. The first few bytes do. Libraries exist for this exact purpose. Use one.
- Re-process anything that can be re-processed. Images go through an image library that rotates, resizes, and dimension-caps. That strips EXIF metadata (privacy: phone-camera GPS coordinates do not belong in your storage), normalizes the file format (predictability), and catches malformed inputs that might exploit downstream consumers.
- Generate the filename yourself. Never use the client’s. A collision-resistant random ID (cuid2, nanoid, UUID v4 with cryptographic randomness) and a server-determined extension based on the detected MIME. The user’s original filename never touches your filesystem.
- Cap everything. Maximum file size, maximum dimensions, maximum pixel count, maximum decompression ratio. Decompression bombs are a thing.
- Serve files through an authenticated route, not as static assets. A request for /api/files/<id> checks the session, looks up permissions, and streams the file with explicit Content-Type and Content-Disposition headers. Image MIMEs get inline. Everything else gets attachment. Add X-Content-Type-Options: nosniff to block browser MIME sniffing.
The result: users can upload files, the files get cleaned and stored, and URLs cannot be guessed or used to serve arbitrary content as HTML through your domain.
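To show what magic-byte detection and the response headers amount to, here is a deliberately tiny sketch. It recognizes four formats only and is not a substitute for a maintained detection library. `sniffMime` and `responseHeaders` are hypothetical names, and the accepted type list is an assumption.

```typescript
type DetectedMime = "image/png" | "image/jpeg" | "image/gif" | "application/pdf";

// Leading byte signatures for a handful of formats.
const SIGNATURES: Array<{ mime: DetectedMime; ext: string; bytes: number[] }> = [
  { mime: "image/png", ext: "png", bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: "image/jpeg", ext: "jpg", bytes: [0xff, 0xd8, 0xff] },
  { mime: "image/gif", ext: "gif", bytes: [0x47, 0x49, 0x46, 0x38] }, // "GIF8"
  { mime: "application/pdf", ext: "pdf", bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
];

// Looks only at the file's own bytes; the client's Content-Type is never consulted.
function sniffMime(data: Uint8Array): { mime: DetectedMime; ext: string } | null {
  for (const sig of SIGNATURES) {
    if (data.length >= sig.bytes.length && sig.bytes.every((b, i) => data[i] === b)) {
      return { mime: sig.mime, ext: sig.ext };
    }
  }
  return null; // unknown type: reject the upload
}

// Headers for the authenticated file route: images inline, everything else a download.
function responseHeaders(mime: DetectedMime): Record<string, string> {
  return {
    "Content-Type": mime,
    "Content-Disposition": mime.startsWith("image/") ? "inline" : "attachment",
    "X-Content-Type-Options": "nosniff",
  };
}
```

The extension returned by `sniffMime` is the one to pair with the server-generated ID when naming the stored file.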
HTML
If your application accepts rich text from users, the temptation is to sanitize on save and trust it from then on. The temptation should be resisted.
Sanitize on both save and render, against the same tight whitelist:
- On save: catches malicious input before it gets stored. Keeps the database clean for direct readers (other services, exports, admin tooling, anyone who connects with psql).
- On render: catches anything that slipped through historic versions of your sanitizer, catches database corruption, and catches future bugs where new code reads from a different path.
The whitelist should be the smallest possible set. Basic block elements, basic inline formatting, links, and tightly-controlled image sources. Allow <a> only with target="_blank" rel="noopener noreferrer" forced. Allow <img> only if src points to your own authenticated file endpoint. External URLs are forbidden.
The reason for the strict <img> rule is non-obvious. An external image URL in user-authored content lets the author silently track every viewer’s IP and User-Agent through the request to their server. Forcing all images through your own endpoint makes that impossible.
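The two attribute rules can be written as small pure functions and plugged into whatever sanitizer library you use; this sketch does not parse HTML itself and should not be used as a sanitizer. The `/api/files/<id>` path shape with URL-safe IDs and the http(s)-only link policy are my assumptions, and both function names are hypothetical.

```typescript
// Assumption: uploads are served from /api/files/<id> with URL-safe ids.
const OWN_FILE_PATH = /^\/api\/files\/[A-Za-z0-9_-]+$/;

// True only for same-origin, root-relative paths to the file endpoint.
// Absolute URLs and protocol-relative URLs ("//host/...") do not match.
function isAllowedImageSrc(src: string): boolean {
  return OWN_FILE_PATH.test(src);
}

// Returns the full attribute set to emit for a link, or null to drop the link.
// Assumption: only http and https links are allowed.
function anchorAttributes(href: string): Record<string, string> | null {
  if (!/^https?:\/\//i.test(href)) return null; // rejects javascript:, data:, etc.
  // target and rel are forced, never copied from user input.
  return { href, target: "_blank", rel: "noopener noreferrer" };
}
```

Running the same two functions at save time and at render time keeps the whitelist in one place while still giving you both passes.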
7. The cache layer: fail open, not closed
A cache exists to make slow things fast, not to be a source of truth. When your cache is unavailable, your application should slow down. It should not break.
Every cache read should be wrapped: if Redis is unreachable, return null and let the underlying query run. Every cache write should be wrapped: if Redis is unreachable, log and continue. Same for rate limiters built on top of Redis. If the rate limiter is down, allow the request rather than blocking everyone out of the system.
This is “fail open” in the operational sense (the app continues working through a backing-store outage), not in the security sense (where fail-open is a vulnerability). For caches and rate-limiters specifically, operational fail-open is correct. The rate limiter is a courtesy, not a security boundary. The real security boundary is the authentication and authorization checks, which do not depend on Redis being up.
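A sketch of the wrapped read and write, assuming a minimal `CacheClient` interface rather than any particular Redis library. `cachedJson` and the logging callback are hypothetical names. Cache failures are logged and treated as misses; failures in the underlying query still propagate, because that query is the source of truth.

```typescript
// Minimal cache surface (an assumption; adapt to your Redis client).
interface CacheClient {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

async function cachedJson<T>(
  cache: CacheClient,
  key: string,
  ttlSeconds: number,
  load: () => Promise<T>,
  log: (msg: string) => void = () => {},
): Promise<T> {
  try {
    const hit = await cache.get(key);
    if (hit !== null) return JSON.parse(hit) as T;
  } catch (err) {
    // Unreachable cache or corrupt entry: behave exactly like a miss.
    log(`cache read failed for ${key}: ${String(err)}`);
  }

  const value = await load(); // errors here are NOT swallowed

  try {
    await cache.set(key, JSON.stringify(value), ttlSeconds);
  } catch (err) {
    log(`cache write failed for ${key}: ${String(err)}`);
  }
  return value;
}
```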
For invalidation: avoid the KEYS * pattern at all costs. It blocks the Redis instance while it walks every key, and it gets worse as your dataset grows. A pattern that scales: for each cache key you write, also add it to a tracking set keyed by the cache prefix. To invalidate a prefix, read the set, delete the listed keys, drop the set entries. Linear in the number of keys tracked under that prefix, not in the total dataset.
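The tracking-set pattern, sketched against a small interface whose method names mirror the Redis SET, DEL, SADD, and SMEMBERS commands. The interface itself, the `tracked:` key naming, and both function names are assumptions for illustration. The sketch ignores concurrency: a real implementation would want the read-and-delete step to be atomic, and keys that expire by TTL stay listed in the set until the next invalidation.

```typescript
// Minimal store surface named after the Redis commands it stands in for.
interface KeyValueStore {
  set(key: string, value: string): Promise<void>;
  del(keys: string[]): Promise<void>;
  sadd(setKey: string, member: string): Promise<void>;
  smembers(setKey: string): Promise<string[]>;
}

// Hypothetical naming scheme for the per-prefix tracking set.
const trackingKey = (prefix: string): string => `tracked:${prefix}`;

// Write a cache entry and record its key in the prefix's tracking set.
async function setTracked(
  store: KeyValueStore,
  prefix: string,
  id: string,
  value: string,
): Promise<void> {
  const key = `${prefix}:${id}`;
  await store.set(key, value);
  await store.sadd(trackingKey(prefix), key);
}

// Delete every tracked key for the prefix, then the tracking set itself.
// Work is proportional to the keys under this prefix, not the whole keyspace.
async function invalidatePrefix(store: KeyValueStore, prefix: string): Promise<number> {
  const keys = await store.smembers(trackingKey(prefix));
  if (keys.length > 0) await store.del(keys);
  await store.del([trackingKey(prefix)]);
  return keys.length;
}
```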
Beyond the layers: the operational backbone
Layers prevent vulnerabilities. The operational practices around the layers prevent the slow accumulation of incidents and the panic mode that ships them in the first place. Most of this is also what I cover in a Cloud Infrastructure Audit on existing stacks.
Deploys: idempotent, sudoless, reproducible
A deployment process you can run twice without thinking is a deployment process that works at 3am.
Practical components:
- No sudo on the deploy path. The deploy user runs as a normal user. When something needs root (chowning a Postgres data directory to a specific UID, for example), do it through Docker as a root proxy: docker run --rm -v dir:/d busybox chown 70:70 /d. The deploy user has no sudo rights and no root shell on the box. Be aware that access to the Docker daemon is itself root-equivalent unless Docker runs in rootless mode, so this keeps root out of the routine deploy path rather than making a compromised deploy script harmless.
- Each environment in its own directory under $HOME. Staging in ~/myapp-staging/, production in ~/myapp-prod/. The two never share state, never share env files, never share data volumes. The same operator can manage both without juggling.
- Compose files live in the repo and are snapshotted to the environment directory on every deploy. The repo is the source of truth. The running environment has a self-contained copy. If the repo is unavailable, the environment can still be redeployed from its snapshot.
- Secrets are generated at setup, not committed. The provisioning script generates database passwords, session secrets, and similar values at first run, writes them once to a 0600 env file, and never echoes them to the terminal. The repo contains no secrets. The CI image contains no secrets. The secrets exist only on the host that needs them.
- Idempotent provisioning. Re-running the setup script either no-ops or refreshes config without destroying data. Re-running the deploy script always converges to the desired state.
The promotion flow is similarly mechanical:
- Feature work on a dev branch. CI builds a :dev image.
- Deploy to staging from :dev. Validate.
- Merge dev into main. CI builds a :main image.
- docker compose pull && docker compose up -d on production.
No SSH-and-git pull on production. Production only ever runs a CI-built container image. For the artifact validated on staging and the artifact running in production to be the same image byte for byte, promote the validated image (retag :dev as :main, or deploy by digest) rather than rebuilding after the merge; a rebuild matches only if the build is reproducible. If staging is green, the production deploy is just a pull. The same approach underpins the Portainer + Nginx Proxy Manager + Vaultwarden stack I run as my default self-hosted base, with Borg backups catching whatever the deploy layer cannot.
Dependency hygiene: the boring loop
- All ranges in package.json are ~ (patches only), not ^. The lockfile is the source of truth.
- Dependabot is configured for security advisories only. Routine minor and major bumps are suppressed entirely. The PRs the bot opens are always meaningful, which is the only reason they get read.
- Once a quarter, I sit down for an hour and catch up on accumulated minors and majors. Read changelogs, run the test suite, fix what broke. Batched cognitive load.
- Transitive vulnerabilities that a direct dependency has not yet picked up the fix for get pinned via overrides rather than waiting on upstream. Ten minutes per override. The value is permanent.
Most weeks, dependency work consists of “look at one PR, click merge if green.” Once a quarter, it is an hour. That is the entire budget. The clean scan is the natural result.
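If you want the range policy enforced rather than remembered, a small check in CI can do it. This is a sketch, assuming the manifest has already been parsed from package.json. `findLooseRanges` is a hypothetical name, and treating exact pins as acceptable alongside ~ ranges is my assumption.

```typescript
interface PackageManifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Accepts "~1.2.3" and exact "1.2.3" (optionally with a prerelease tag).
// Everything else (^, *, >=, "latest", git URLs, ...) is reported.
const PATCH_ONLY = /^~?\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;

function findLooseRanges(pkg: PackageManifest): string[] {
  const offenders: string[] = [];
  for (const section of ["dependencies", "devDependencies"] as const) {
    const deps = pkg[section] ?? {};
    for (const [name, range] of Object.entries(deps)) {
      if (!PATCH_ONLY.test(range)) offenders.push(`${section}.${name}: ${range}`);
    }
  }
  return offenders;
}
```

Failing the build when the returned list is non-empty keeps a stray ^ from creeping back in through a copy-pasted install command.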
Background work: do not block the response
When a user creates a comment, the response should return as soon as the row is written. Notification fan-out (figuring out who needs to be told, looking up their preferences, queueing emails) happens after the response is sent, in a background task that fires when the framework gives you the hook for it: after() in modern Next.js, BackgroundTasks in FastAPI, a job enqueued from an after_commit callback in Rails.
The user experience improves immediately. The reliability of the comment write improves: if the notification logic throws, the comment is still saved, because the comment was saved before the notification logic ran. Failure modes become more isolated.
That is not directly a security property, but it is a robustness property in the same family. Decouple the things that have to succeed from the things that would be nice to succeed.
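The shape of that decoupling, sketched with the framework hook abstracted behind a `defer` callback (in Next.js that role would be played by after(); everything else here, including `makeCreateComment` and `SavedComment`, is a hypothetical name). The write must succeed and its errors propagate. The notification runs later and its errors are logged, never surfaced to the caller.

```typescript
interface SavedComment {
  id: number;
  body: string;
}

// Stand-in for the framework's "run this after the response" hook.
type Defer = (task: () => Promise<void>) => void;

function makeCreateComment(
  saveComment: (body: string) => Promise<SavedComment>,
  notify: (comment: SavedComment) => Promise<void>,
  defer: Defer,
  logError: (err: unknown) => void,
) {
  return async function createComment(body: string): Promise<SavedComment> {
    // Has to succeed: a failure here rejects and the caller sees it.
    const comment = await saveComment(body);

    // Nice to succeed: scheduled for later, with its own error boundary.
    defer(async () => {
      try {
        await notify(comment);
      } catch (err) {
        logError(err);
      }
    });

    return comment;
  };
}
```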
What defense in depth is not
It is not a silver bullet. It is not a guarantee against future CVEs. Those will land, the scan will go red, you will fix them. It is not effort-free either. The upfront cost of picking the right base, wiring SSO, designing the schema for anonymizing deletes, writing the provisioning script, configuring Dependabot, threading three layers of auth checks, and writing the file pipeline is real. Call it a week of focused work spread across the first month of the project.
What you get in return is a project that does not generate ongoing security debt. The scan stays clean because there is nothing to clean up. The deploys stay boring because there is nothing to debug. The team (even when the team is one person) has cognitive bandwidth left over to actually build the product. The same logic scales down to a fresh VPS: see the Linux server security baseline for the host-level equivalent, and the WordPress server security guide for the WordPress-specific version of the same architecture.
The pattern
If there is a single sentence to extract from all of this: the secure system is the one where each layer assumes the others might fail and is built to survive that assumption.
The perimeter assumes the application has bugs. The application assumes the perimeter has been bypassed. The action-layer auth check assumes the middleware did not run. The render-time sanitizer assumes the save-time sanitizer missed something. The database constraint assumes the application logic is buggy. The fail-open cache assumes Redis is down. The idempotent deploy assumes the previous deploy failed halfway through.
None of those assumptions are dramatic. Each is a small concession to reality. Stack them and the result is a system where no single failure is catastrophic, and where the absence of failure is not a streak of luck. It is the geometry of how the thing was put together.
That is the system Trivy is reporting clean. The clean scan is one measurable corner of a much larger property: you built the thing the way it should be built, and the boring choices compounded.
Defense in depth does not require a security team. It requires a willingness to pick the unexciting option, pick it on purpose, and not let anyone talk you into the exciting one.