Hermes Agent is the AI agent I run on my own infrastructure. Two Hetzner VPSs over Tailscale: one that holds the OAuth tokens and talks to Slack and GitHub, one that runs the shell commands. The agent never SSHes into a normal Linux shell on either box. It SSHes into /workspace inside a rootless container that ships with cap_drop: ALL. If the model gets prompt-injected today, the worst thing it can do is poke around an unprivileged container on a machine that has no production secrets and no public SSH port.
This post is the version of that architecture I would actually deploy for a company. Not “we installed an AI tool”. A secure AI agent infrastructure built like every other production workload: gateway separated from execution, rootless containers, scoped tokens, pinned images, append-only backups, tested restores.
AI agents are useful because they can act. That is also why they are risky. A normal chatbot answers questions. An agent reads a GitHub issue, inspects a repository, runs a shell command, opens a browser, writes a file, calls an API, prepares a server change. The first question for any company deployment is not “does it work”. The first question is what happens when it gets tricked.
That is what this post answers.
The problem with most AI agent deployments
The quick AI agent setup I see in the wild looks like this:
- one VPS
- Docker installed
- a .env file with the GitHub, Slack, and model tokens
- a public reverse proxy
- a Slack or Telegram bot
- a host directory mounted into the container as a workspace
- broad GitHub or server access
- no tested restore path
It works in a demo. It also puts too many things too close together.
If the same host stores the secrets and runs whatever shell command the agent decides to run, one bad tool call is the whole story. If the agent can reach the production Docker socket, it can reach more than the operator intended. If a GitHub token can merge, change repo settings, or read every private repository, the agent stopped being an assistant the moment that token was generated. It became a production actor.
This is not theoretical. The OWASP Top 10 for LLM Applications (2025) lists prompt injection first (LLM01) and gives excessive agency its own entry (LLM06): the system has too much functionality, too many permissions, or too much autonomy for the job it actually needs to do.
An agent is not dangerous because it can write text. It is dangerous because text turns into action.
Assume the model will see hostile instructions
An AI agent reads untrusted input constantly. Web pages. Pull requests. GitHub issues. Slack messages. Support tickets. Log files. Documentation. Markdown files. Command output. Browser pages.
Any of that can contain instructions aimed at the model. Some obvious, some hidden in text the original user did not intend to treat as instructions, some indirect, where the attacker never talks to the agent directly but controls something the agent later reads.
You will not solve this with a better system prompt alone. Prompts help. Output checks help. Safer tools help. The real control is architectural:
Assume the model can be confused, then limit what a confused model can reach.
That one sentence is the base of every architecture decision below.
What a secure AI agent infrastructure should actually do
A safer setup has a small list of goals:
- keep secrets away from command execution
- keep agent execution away from production hosts
- scope every token to the smallest useful permission set
- require human approval for high-impact actions
- log enough to reconstruct what happened
- back up state that matters
- test restore before you need it
This is not about making compromise impossible. That is not a promise I want to make.
The goal is a smaller blast radius. If the agent is tricked, it should not get the gateway secrets. It should not see the production Docker socket. It should not mount customer data. It should not be able to merge to main. It should not delete the backups. It should not silently fail for a month before anyone notices.
That is a defendable position. “Perfectly safe AI agent” is not.
The Hermes Agent architecture: gateway, sandbox, private network
A company-grade deployment splits into three zones.
1. The gateway
The gateway runs the AI agent process and holds the credentials it needs to coordinate work. It connects out to:
- messaging platforms (Slack, Telegram, Signal, Matrix)
- LLM providers (OpenAI, Anthropic, OpenRouter)
- OAuth providers
- internal APIs
- GitHub or GitLab
- memory services
- notification systems
The gateway is not where arbitrary shell commands run. Its job is coordination, not open-ended execution. Anything that smells like bash -c should leave the gateway.
2. The sandbox
The sandbox is where agent commands run. This is where the agent clones a repository, inspects files, runs a script, installs a package, browses a page, or executes code that was generated three seconds ago.
The sandbox is treated as hostile-adjacent. It exists because the agent will eventually do something weird. It should not contain gateway secrets, should not have sudo, should not mount the system Docker socket, and should not share a host with unrelated production workloads.
The single highest-impact decision in the whole architecture: run the gateway and the sandbox on separate machines. If the sandbox is compromised, the attacker still has to cross a machine boundary before reaching the secrets. That one boundary is worth more than half the in-container hardening below.
3. The private network
The control plane stays private. Use Tailscale, WireGuard, or another mesh VPN. Slack, Telegram, GitHub, and model providers can all work through outbound connections. There is no reason to expose the agent gateway on a public IP.
Admin access stays private by default. Public ingress on either machine is one of the easiest risks to avoid, so avoid it. On my own deployment, both hosts run sshd bound to the Tailscale interface only. The public IP refuses every TCP connection on port 22.
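One way to check the "no public SSH" claim is to read the effective sshd configuration rather than trust the config file. The function below is a minimal sketch, assuming the lowercase "listenaddress <addr>:<port>" lines that sshd -T prints; the function name is illustrative and not part of OpenSSH.

```bash
# Reads the effective sshd config on stdin (as printed by `sshd -T`) and
# fails if any listenaddress line is a wildcard bind, or if none is present.
# Hypothetical helper name; not part of OpenSSH.
sshd_has_no_wildcard_listen() {
    awk '
        $1 == "listenaddress" {
            seen = 1
            # 0.0.0.0 and [::] mean "every interface", including the public one.
            if ($2 ~ /^0\.0\.0\.0:/ || $2 ~ /^\[::\]:/) bad = 1
        }
        END {
            if (bad || !seen) exit 1
            exit 0
        }
    '
}
```

Run as root, something like `sshd -T | sshd_has_no_wildcard_listen && echo "sshd is bound to specific addresses only"` would confirm the binding. It says nothing about firewall rules, so a port scan of the public IP from outside is still worth doing.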
ForceCommand: the SSH session that drops straight into a container
The pattern I would not deploy a company AI agent without: the gateway’s SSH key into the sandbox is ForceCommand-wrapped to a script that drops the session directly into a long-running container.
The agent never lands on the host. It does not get a normal Linux account where it can cd /etc and read who else lives on the box. It lands inside a container, in /workspace, with only the environment variables explicitly allowed through.
A sanitized version of the wrapper that ships on my sandbox:
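#!/usr/bin/env bash
# Forced command for the gateway's SSH key: every session lands in the container.

# Point the Docker CLI at this user's rootless daemon socket.
UID_NUM=$(id -u)
export XDG_RUNTIME_DIR="/run/user/${UID_NUM}"
export DOCKER_HOST="unix:///run/user/${UID_NUM}/docker.sock"
export PATH="/home/agent-sandbox/bin:${PATH}"
DOCKER=/home/agent-sandbox/bin/docker

# Allowlist: only these variables are forwarded, and only if they are set.
PASSTHROUGH_VARS=(GITHUB_TOKEN)
DOCKER_ENV_ARGS=()
for var in "${PASSTHROUGH_VARS[@]}"; do
  if [ -n "${!var-}" ]; then
    DOCKER_ENV_ARGS+=(-e "$var")
  fi
done

case "${SSH_ORIGINAL_COMMAND:-}" in
  # No command, or a plain shell request: fall through to the login shell below.
  ""|"bash"|"bash -l"|"bash -i"|"-bash") : ;;
  *)
    # Any other command runs inside the container, never on the host.
    exec "$DOCKER" exec -i "${DOCKER_ENV_ARGS[@]}" \
      -w /workspace agent-sandbox-shell bash -lc "$SSH_ORIGINAL_COMMAND"
    ;;
esac

exec "$DOCKER" exec -i "${DOCKER_ENV_ARGS[@]}" \
  -w /workspace agent-sandbox-shell bash -l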
The script syntax is not the point. The boundary is the point. If the GitHub token needs to reach the sandbox, it gets forwarded explicitly via PASSTHROUGH_VARS. Anything not on that allowlist stays on the gateway. Adding a new token later means editing the allowlist and the SSH AcceptEnv config: two deliberate steps, not an accident.
That is better than dumping the entire gateway environment into the sandbox shell and hoping nothing leaks.
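The sshd side of that wiring can be sketched as a Match block that pins the account to the wrapper and names the accepted variables. This is an illustration under assumptions, not the configuration from my hosts: the account name and the wrapper path passed in are hypothetical, and the forwarding restrictions are extra hardening I would expect to sit next to ForceCommand.

```bash
# Prints an sshd_config Match block for the sandbox account.
# Usage: render_sandbox_match_block <user> <wrapper-path> [ENV_VAR ...]
# The user name and wrapper path are supplied by the caller (hypothetical here).
render_sandbox_match_block() {
    user="$1"
    wrapper="$2"
    shift 2
    printf 'Match User %s\n' "$user"
    # Every session for this user runs the wrapper, whatever the client asked for.
    printf '    ForceCommand %s\n' "$wrapper"
    # One AcceptEnv line per allowlisted variable; nothing else crosses over.
    for var in "$@"; do
        printf '    AcceptEnv %s\n' "$var"
    done
    printf '    AllowTcpForwarding no\n'
    printf '    AllowAgentForwarding no\n'
    printf '    X11Forwarding no\n'
}
```

Calling `render_sandbox_match_block agent-sandbox /path/to/wrapper GITHUB_TOKEN` prints a block that could be reviewed and dropped into the sshd configuration. The gateway's SSH client still needs a matching SendEnv entry, which is the second deliberate step.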
Rootless Docker is the default for an AI agent sandbox
Containers do not make AI agents safe by themselves. Docker is a tool. It is not a security plan.
If a container runs with broad capabilities, sees the Docker socket, mounts sensitive host paths, or sits on the same host as production workloads, the word “container” does not buy you much. For AI agent sandboxes, rootless Docker is the default that closes the most common ways those mistakes turn into incidents.
It does not make container escape impossible. Nothing does. What it changes is the blast radius. Container root maps into an unprivileged user namespace rather than being host root. If the agent runs a bad command, it is boxed inside a much weaker runtime than it would be otherwise.
That matters because AI agent workloads are messy. They run shell commands. They install packages. They clone repositories. They parse untrusted files. They browse pages. They execute code that was generated seconds earlier and never reviewed.
That is not the risk profile of a normal web app container. The sandbox should assume mistakes. Rootless Docker makes those mistakes cheaper.
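It is worth confirming that the daemon the agent talks to really is the rootless one. A minimal sketch, assuming the security options string that docker info reports, where rootless mode shows up as name=rootless; the helper name is made up for this example.

```bash
# Succeeds if the given security-options string lists rootless mode.
# Hypothetical helper; it only inspects the string it is given.
docker_is_rootless() {
    case "$1" in
        *name=rootless*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage would be along the lines of `docker_is_rootless "$(docker info --format '{{.SecurityOptions}}')" || echo "this daemon is not rootless"`, run as the sandbox user so DOCKER_HOST points at that user's socket.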
Treat the compose.yml as a security document
A compose.yml is not deployment glue. It declares what each service can reach, what it can write, what secrets it sees, what ports it exposes, and what survives a recreate.
For AI agent infrastructure, that file is part of the security posture. A good one answers:
- Which host paths are mounted?
- Is the Docker socket mounted?
- Which ports are exposed?
- Are ports bound to 127.0.0.1, a private IP, or 0.0.0.0?
- Which services can reach the public internet?
- Which services are on internal-only networks?
- Which Linux capabilities are present?
- Are logs capped?
- Are memory and process counts capped?
- Which data is persistent?
- Which data is disposable?
A shortened version of the shell container I run on the sandbox:
services:
  agent-sandbox-shell:
    image: nikolaik/python-nodejs:python3.13-nodejs24
    restart: unless-stopped
    working_dir: /workspace
    command: sleep infinity
    dns:
      - <PRIVATE_DNS_RESOLVER>
      - 9.9.9.9
    init: true
    ipc: private
    cap_drop: [ALL]
    cap_add:
      - DAC_OVERRIDE
      - CHOWN
      - FOWNER
      - SETGID
      - SETUID
    security_opt:
      - no-new-privileges:true
    pids_limit: 256
    mem_limit: 3g
    mem_reservation: 512m
    cpus: 1.8
    ulimits:
      nofile: { soft: 4096, hard: 65536 }
      nproc: { soft: 256, hard: 256 }
      core: { soft: 0, hard: 0 }
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
    tmpfs:
      - /tmp:rw,nosuid,nodev,size=512m
      - /run:rw,nosuid,nodev,noexec,size=64m
    volumes:
      - ./workspace:/workspace:rw
    devices: []
This is not exotic. It is basic hygiene applied consistently, the same pattern I lay out in the xCloud security review and recommend across every managed Docker stack I audit.
Start with no capabilities. Add back only what the workload genuinely needs. Package managers usually need SETUID and SETGID because they drop privileges internally. File-ownership operations may need CHOWN, FOWNER, or DAC_OVERRIDE. The point is not “zero caps at all costs”. The point is knowing why each one is there.
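If the compose file is a security document, a few of its rules can be checked mechanically before a deploy. The sketch below is a text-level lint, not a YAML parser: it looks at the file as a whole rather than per service, and the function name and messages are invented for illustration.

```bash
# Reads a compose file on stdin and prints one FAIL line per finding.
# Text-level checks only; a file with several services passes the cap_drop
# check as soon as any one of them drops ALL.
lint_compose() {
    # Flatten newlines so both "cap_drop: [ALL]" and the "- ALL" list form match.
    flat="$(tr '\n' ' ')"
    if printf '%s\n' "$flat" | grep -q 'docker\.sock'; then
        echo "FAIL: Docker socket is referenced"
    fi
    if ! printf '%s\n' "$flat" | grep -Eq 'cap_drop:[[:space:]]*(\[|-)[[:space:]]*ALL'; then
        echo "FAIL: cap_drop ALL not found"
    fi
    if ! printf '%s\n' "$flat" | grep -q 'no-new-privileges:true'; then
        echo "FAIL: no-new-privileges not set"
    fi
    if printf '%s\n' "$flat" | grep -Eq 'privileged:[[:space:]]*true'; then
        echo "FAIL: privileged container"
    fi
}
```

Something like `lint_compose < compose.yml` prints nothing for a file that passes. It is a tripwire for obvious regressions, and it does not replace reading the file.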
Separate the shell container from the browser
Browser automation is useful. It is also one of the riskiest things an AI agent does.
The browser reads hostile pages by design. It follows redirects. It executes JavaScript. It talks to websites you do not control. If Chrome DevTools Protocol is exposed, do not treat it as having useful authentication out of the box. CDP is a debug interface, not a public API.
Do not run the browser beside the gateway secrets. Run it in the sandbox, in its own container, ideally separate from the shell container.
The shell container needs /workspace, package tools, GitHub CLI, scripts, and temporary build files. The browser container does not. It should have no workspace mount. It should not see repository files. It should not share state unless there is a specific reason.
services:
  agent-sandbox-chrome:
    image: chromedp/headless-shell:<PINNED_VERSION>
    restart: unless-stopped
    ports:
      - "<PRIVATE_TAILNET_IP>:9222:9222"
    dns:
      - <PRIVATE_DNS_RESOLVER>
      - 9.9.9.9
    init: true
    ipc: private
    cap_drop: [ALL]
    cap_add:
      - SETGID
      - SETUID
    security_opt:
      - no-new-privileges:true
    mem_limit: 1g
    mem_reservation: 256m
    cpus: 1.0
    pids_limit: 512
    shm_size: "256m"
    tmpfs:
      - /tmp:rw,nosuid,nodev,size=256m
The port binds to a Tailscale IP, not 0.0.0.0. A private network ACL restricts which device can actually reach it. The difference between “we exposed CDP” and “we exposed CDP inside a controlled network path” is the entire reason this works.
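The difference between a pinned bind and a wildcard bind is easy to miss in review, because a mapping with no host IP publishes on every interface. A small classifier makes that explicit. This is a sketch for the short "ip:host:container" port syntax only; the labels and the function name are my own.

```bash
# Classifies a compose short-syntax port mapping by the host address it binds.
# Prints: loopback, specific-ip, or wildcard.
classify_port_bind() {
    case "$1" in
        127.*:*:*|\[::1\]:*:*)  echo loopback ;;
        0.0.0.0:*:*|\[::\]:*:*) echo wildcard ;;
        *.*.*.*:*:*|\[*\]:*:*)  echo specific-ip ;;
        # No host IP given ("9222:9222" or "9222"): published on all interfaces.
        *)                      echo wildcard ;;
    esac
}
```

For example, `classify_port_bind "9222:9222"` prints wildcard, while a mapping that starts with a tailnet address prints specific-ip. A specific IP is only as private as the network behind it, so the ACL still matters.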
For a deeper version of the workspace-isolation idea on user-facing desktops rather than agents, my Secure Workspaces engagement is the same threat model applied to humans clicking on links.
Use Docker Hardened Images where they fit
Base images are part of the security posture. Many production stacks start from large general-purpose images full of tools the service does not need. More packages mean more CVEs, more maintenance, and more post-compromise utility for an attacker.
Docker announced that its catalog of more than 1,000 Docker Hardened Images, based on Debian and Alpine, is now free and open source under Apache 2.0. The enterprise offering still exists for teams that need support, compliance options, custom images, or SLA-backed remediation.
For company AI infrastructure, that changes the baseline. Use hardened images where they fit: databases, Redis, monitoring, common platform services are all good candidates. Some agent sandboxes still need developer tools, package managers, browsers, or language runtimes that do not fit a minimal image yet. That is fine. Document the exception.
The rule is small: use hardened images by default, use larger images only when the task needs them.
Keep databases and memory services off the public network
If the agent uses a memory service, database, vector store, cache, or internal API, keep it off the public network.
A pattern that works:
- API binds to 127.0.0.1
- database stays on an internal Docker network
- Redis stays on an internal Docker network
- only the application container can reach the database and Redis
- external callers go through a controlled local interface or private network
services:
  api:
    ports:
      - "127.0.0.1:8000:8000"
    networks: [internal, external]
    cap_drop: [ALL]
    security_opt:
      - no-new-privileges:true
  database:
    image: dhi.io/pgvector:<PINNED_VERSION>
    networks: [internal]
    cap_drop: [ALL]
    cap_add:
      - CHOWN
      - DAC_OVERRIDE
      - FOWNER
  redis:
    image: dhi.io/redis:<PINNED_VERSION>
    networks: [internal]
    cap_drop: [ALL]
    security_opt:
      - no-new-privileges:true
networks:
  internal:
    internal: true
  external:
The agent does not need your database open on the VPS public interface. Redis does not need to be reachable from the internet. The memory API does not need a public reverse proxy unless there is a very specific reason. Most of the time, there is not. The defense-in-depth architecture post covers the same principle for web applications: every layer assumes the layer in front of it has been bypassed.
GitHub automation needs the tightest permission model on the box
GitHub is one of the best places to put an AI agent. An agent can summarize pull requests, inspect diffs, comment on risky changes, open small fix branches, update docs, triage issues, check CI failures, prepare release notes.
It is also one of the easiest places to over-permission an agent. A safer model uses:
- a dedicated bot account
- repository-limited access
- the lowest role that still does the job (Triage is often enough)
- branch protection and CODEOWNERS
- no direct merge rights
- token rotation on a schedule
- a check that prevents tokens from being written into .git/config (use gh repo clone owner/repo, never git clone https://x:$TOKEN@github.com/...)
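The check in the last item of the list above can be as small as a grep over each repository's .git/config. A minimal sketch, assuming HTTPS remotes; the function name is hypothetical and the pattern only catches credentials embedded in a remote URL, not tokens stored by a credential helper.

```bash
# Succeeds if the given git config file has a remote URL with embedded
# credentials, for example https://user:token@host/owner/repo.git
git_config_has_embedded_credentials() {
    grep -Eq 'url[[:space:]]*=[[:space:]]*https?://[^/@[:space:]]+@' "$1"
}
```

Run against every .git/config under the workspace on a timer, a hit is a reason to rotate the token first and fix the remote second.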
The token is still a risk. It has to be. If you want real automation, some capability has to cross the boundary.
The mature pattern is not pretending the risk is gone. It is scoping the risk until misuse stays bounded. The agent can review, comment, and prepare. A human merges. That is a good boundary, and it is the boundary I draw on my own bot account: Triage role only, CODEOWNERS blocks merge at the API, intent stays the binding constraint.
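The same split can be written down as a policy table that the gateway consults before it acts. The sketch below is illustrative only: the action names are hypothetical, and the one real design choice is that anything unlisted is denied rather than allowed.

```bash
# Maps an action name to a policy decision: allow, needs-approval, or deny.
# Action names are hypothetical examples, not a real Hermes Agent interface.
action_policy() {
    case "$1" in
        comment|review|label|open-branch)         echo allow ;;
        merge|deploy|delete-data|change-firewall) echo needs-approval ;;
        # Default deny: new capabilities must be added to the table on purpose.
        *)                                        echo deny ;;
    esac
}
```

A policy table in the agent is a convenience, not the control. The enforcement still has to live where the agent cannot edit it, which is why the role and branch protection sit on the GitHub side.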
DNS visibility first. Strict egress later.
Strict egress allowlisting sounds great until the agent needs to install a package, read vendor docs, fetch a GitHub release, call a model API, or open a browser page. Make the policy too tight on day one and the agent becomes useless. Leave everything open forever and exfiltration paths stay wide.
A practical first step is DNS visibility. Route sandbox DNS through a resolver you control. I use AdGuard Home on the same Tailscale network as the sandbox. That gives me per-sandbox query logs, category-level blocking, and an honest baseline of what the agent actually reaches out to before I write a single allowlist rule.
DNS control is not full containment. It does not stop hardcoded IP traffic. It does not stop allowed-domain abuse. It does not stop every form of exfiltration. It is still useful, because the best time to write a strict egress allowlist is after you have seen what the agent actually needs, not before.
For stricter environments, layer a proxy and firewall rules on top of the DNS resolver. The order matters: visibility, baseline, then enforcement.
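Turning query logs into a baseline does not need much tooling. The sketch below assumes the resolver's log has already been exported to plain text with one queried domain per line; that export format is an assumption for this example, not AdGuard Home's native log format.

```bash
# Reads one queried domain per line on stdin and prints "count domain",
# most frequent first. Case and a trailing root dot are normalised.
dns_baseline() {
    tr '[:upper:]' '[:lower:]' \
        | sed 's/\.$//' \
        | grep -v '^$' \
        | sort \
        | uniq -c \
        | awk '{ print $1, $2 }' \
        | sort -k1,1nr -k2,2
}
```

A week of that output is a reasonable first draft of an allowlist: the domains at the top are what the agent needs, and the long tail is what deserves a second look.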
Backups belong inside the security model
Backups are boring until they are the only thing that matters.
For an AI agent deployment, the backup design covers both data and operational state:
- agent memory
- OAuth tokens
- .env files
- configuration files
- message history
- database dumps
- workspace scripts
- systemd units
- restore scripts
- runbooks
The backup system should not rely on the compromised host being honest about what it backs up.
The pattern I run on my own deployment:
- encrypted restic repositories
- off-provider storage on Backblaze B2 (the VPSs are on Hetzner, different provider entirely)
- one bucket per machine
- append-only B2 application keys on the VPS (write new snapshots, cannot delete anything)
- full-delete keys held only on my Mac, in macOS Keychain, sourced from Bitwarden
- hourly database dumps for irreplaceable memory state
- daily config and workspace snapshots
- daily heartbeat to an n8n workflow with payload-shape contracts
- on-failure notifications fired by systemd OnFailure= on every backup unit
- monthly automated restore drill into a throwaway pgvector container
The append-only part is the load-bearing detail.
This does not make backups indestructible. Targeted ransomware that knows the workflow could still cause damage if it owned both endpoints. What it stops is the common case: a compromised VPS quietly wiping its own recovery path. The same discipline carried over from how I run Borg backups on managed hosts: encrypted, deduplicated, off-host, key-separated.
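Key separation is easy to get wrong by one checkbox, so it helps to assert it. A minimal sketch, assuming the key's capabilities are available as a comma-separated list of B2 capability names such as listBuckets, listFiles, readFiles, writeFiles, and deleteFiles; how that list is fetched is left out, and the function name is hypothetical.

```bash
# Succeeds if a comma-separated B2 capability list can write files but
# carries no delete capability of any kind.
key_is_append_only() {
    case ",$1," in
        *,delete*) return 1 ;;
    esac
    case ",$1," in
        *,writeFiles,*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `key_is_append_only "listBuckets,listFiles,readFiles,writeFiles"` succeeds, and adding deleteFiles to the list makes it fail. Checking this on the VPS at backup time turns a silent misconfiguration into a failed unit.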
Test restores, not just backups
A backup that has never been restored is a belief.
A better setup tests restores. The minimum I would ship for an AI agent deployment:
- restore the latest database dump into a throwaway container on the sandbox host
- validate the dump is non-zero and contains an end-marker
- assert expected row counts are above zero on the tables that hold real data
- POST the result to monitoring with a payload schema, not just a status code
- run that drill monthly on a systemd timer
On my own setup, the monthly drill restores Hermes’ Honcho memory database into a throwaway pgvector container under the sandbox user’s rootless Docker, streams the latest dump in, and asserts both peers > 0 and messages > 0 before posting the result to n8n. The last drill ran on 2026-05-23 against snapshot b65cb542: peers=4, messages=344, status OK. That is what “tested restore” looks like: a row count, not a vibe.
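The two assertions at the heart of such a drill fit in a few lines. This is a sketch under assumptions rather than the drill script itself: it assumes a plain-format pg_dump file, which ends with a "PostgreSQL database dump complete" comment, and it takes the row counts as arguments instead of querying the restored database. Both function names are invented for this example.

```bash
# Succeeds if the dump file is non-empty and ends with pg_dump's
# plain-format completion marker.
dump_looks_complete() {
    [ -s "$1" ] || return 1
    tail -n 20 "$1" | grep -q 'PostgreSQL database dump complete'
}

# Succeeds only if at least one count is given and every count is a
# positive integer.
counts_are_positive() {
    for n in "$@"; do
        case "$n" in
            ''|*[!0-9]*) return 1 ;;
        esac
        [ "$n" -gt 0 ] || return 1
    done
    [ "$#" -gt 0 ]
}
```

In a drill, the counts would come from SELECT count(*) queries against the throwaway container, and a failure of either function should fail the systemd unit so the alerting path is exercised too.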
Also test file restores. Restore the agent config to a temporary directory, compare md5sums against the live tree, restart the service, verify it returns to active. The point is to confirm the whole chain works, not just that the bytes still exist on B2.
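The comparison step can be sketched as a checksum listing of each tree followed by a string comparison. This assumes GNU md5sum is available, as it is on a typical Linux VPS, and both function names are hypothetical.

```bash
# Prints "checksum  ./relative/path" for every regular file under a
# directory, sorted by path so two trees can be compared directly.
tree_checksums() {
    (cd "$1" && find . -type f -exec md5sum {} + | sort -k2)
}

# Succeeds if both directories exist and contain the same files with the
# same contents.
trees_match() {
    [ -d "$1" ] && [ -d "$2" ] || return 1
    [ "$(tree_checksums "$1")" = "$(tree_checksums "$2")" ]
}
```

A call such as `trees_match /path/to/restored /path/to/live` covers file contents only. Ownership and permissions are not compared, so a service that depends on them needs the restart check as well.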
This is the part most deployments skip. It is also the part that tells you whether the backup system is real.
What private AI automation actually looks like as an engagement
This work is not “install an AI tool”. The tool will change. The model will change. The useful part is the deployment discipline.
For a company, the engagement looks closer to infrastructure work than tool installation:
- Map what the agent needs to do
- Define what it must never touch
- Split gateway from execution onto separate hosts
- Build a sandbox with rootless containers
- Scope tokens to the smallest useful permission
- Add human approval for high-impact actions
- Keep databases and memory private
- Add egress visibility
- Back up state that matters
- Test restore on a timer
- Document operations
That is the offer. Private AI automation on your own infrastructure, with the dangerous parts boxed in.
For a managed-hosting or server-management engagement, this lives naturally beside Managed Hermes Agent (this exact stack, deployed and run for you), the Managed AI Suite (private chat, automation, structured data on hardware you own), the Cloud Infrastructure Audit (find the cracks before they bite), Secure Workspaces (containerized desktops, same threat model applied to humans), and Operations & Workflow Strategy (turn the manual glue into resilient pipelines).
AI agents are another operational workload. Deploy them with the same seriousness as any other workload that touches code, servers, or customer data, with the same discipline I lay out in Linux server security fundamentals and the defense-in-depth web architecture post.
A pre-deployment checklist
Before giving an AI agent access to company systems, ask these questions out loud:
Architecture
- Is the agent gateway separate from command execution?
- Does execution happen in a sandbox?
- Is the sandbox on a separate machine or otherwise strongly isolated?
- Is admin access private?
- Is public ingress avoided where it can be?
Secrets
- Where do API keys live?
- Can the sandbox read gateway secrets?
- Are tokens scoped to one job?
- Are tokens rotated on a schedule?
- Can the agent print or persist secrets accidentally?
Containers
- Is Docker rootless where possible?
- Is the production Docker socket absent from the sandbox?
- Are capabilities dropped by default?
- Are resource limits set?
- Are logs capped?
- Are images pinned to specific versions?
- Are hardened images used where they fit?
Networking
- Are databases and caches internal-only?
- Are browser automation ports private?
- Is DNS observable?
- Is egress logged?
- Is there a path to stricter egress when needed?
Automation boundaries
- Can the agent merge code?
- Can it deploy production changes?
- Can it delete data?
- Can it modify firewall rules?
- Which actions require human approval?
Recovery
- Are backups off-server, on a different provider?
- Are the VPS-side backup keys append-only?
- Are full-delete keys kept elsewhere?
- Has a restore been tested end-to-end?
- Is there a monthly restore drill on a timer?
- Do failures alert a human?
If the answer to most of these is “we do not know”, the agent is not ready for production-adjacent work.
What this does not promise
This architecture does not make AI agents perfectly safe. No serious security design promises that.
It does not eliminate prompt injection. It does not make browser automation harmless. It does not remove every exfiltration path. It does not make GitHub tokens risk-free. It does not replace normal access control, monitoring, patching, or backups.
What it gives you instead:
- fewer exposed services
- less privilege in the wrong place
- fewer secrets near execution
- a smaller blast radius when the model gets confused
- better logs
- faster recovery
- clearer human approval boundaries
That is what mature security looks like in practice. Not perfect. Controlled.
Closing the loop
AI agents are moving from toys into company workflows. That does not mean they should get a shell on a production server and a pile of long-lived tokens.
The safe pattern is boring in the best way: split the gateway from execution, run commands in a rootless sandbox, write the compose.yml like a security document, scope tokens, keep databases private, watch DNS first and tighten egress later, back up the state that matters, and test restores on a timer.
If your company wants private AI automation on its own servers, design the boundary before you connect the tools. That is where the real work is. Everything after that is just wiring.
If you want this same architecture deployed and run on your own servers rather than just read about it, that’s my Managed Hermes Agent engagement. If you’d rather start by finding the cracks in a stack you already run, my Cloud Infrastructure Audit & Hardening engagement is the way in: read-only access, two-week turnaround, a Blueprint Report you can read on a phone with critical fixes separated from nice-to-haves.