AI agents that can run commands and touch your systems are genuinely useful and genuinely dangerous, and most people deploy them like they’re harmless. A container with default settings, a public port, the model’s tokens sitting right next to the code it executes. It works in a demo. It keeps working right up until the model reads a poisoned web page or a malicious instruction buried in a document and does exactly what it was told.
The Managed Hermes Agent is the opposite of that. It’s a real action-taking agent, deployed on infrastructure I’ve built around the assumption that it will eventually be turned against you, and then run by me so the hard parts stay handled.
See it before you buy it. The full architecture is public. I’ve documented the design in “Hermes Agent Deployment: Secure AI Agent Infrastructure”, and the hardened compose files, the SSH wrapper, and the backup runbook live in the open companion repo at github.com/wnstify/hermes-agent. Read exactly how it works, then decide.
Built to be attacked
The single most important decision in this setup is splitting the agent across two servers.
One server is the brain. It holds the model API tokens and the connections to Slack, GitHub, and your other accounts. The other is the hands. It’s where the agent’s code actually runs, inside a rootless container. The brain reaches the hands over a single locked-down channel that drops the agent straight into a working directory inside that container. It never lands on a normal shell on either machine.
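To make the "never lands on a normal shell" idea concrete, here is a minimal sketch of a forced-command wrapper on the hands server. It is an illustration only, not the SSH wrapper from the companion repo. The container name, the working directory, and the choice of `podman` as the rootless runtime are all assumptions.

```python
# Hypothetical values for illustration; not taken from the companion repo.
CONTAINER = "hermes-sandbox"
WORKDIR = "/workspace"


def build_exec_argv(original_command: str) -> list[str]:
    """Map whatever the brain requested onto a command that runs
    inside the container, never directly on the host."""
    if not original_command.strip():
        # No command supplied: open a shell inside the container
        # instead of a login shell on the host.
        inner = ["sh"]
    else:
        # Hand the request to a shell *inside* the container as one string.
        inner = ["sh", "-c", original_command]
    return [
        "podman", "exec", "--interactive",
        "--workdir", WORKDIR,
        CONTAINER, *inner,
    ]
```

A wrapper like this would typically be installed as the `command="..."` entry for the brain's key in `authorized_keys`, reading the requested command from the `SSH_ORIGINAL_COMMAND` environment variable that sshd sets. Whatever the brain sends, the only thing the host ever executes is the container entry.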
So picture the worst case. The model reads a malicious instruction and tries to do harm. It’s confined to an unprivileged container on a server that has no production secrets and no public way in. It has no path back to the brain, nothing worth stealing in reach, and no business being on the host in the first place. The point isn’t to pretend prompt injection will never happen. It’s to make the blast radius boring when it does: a sandbox you wipe and rebuild. That containment is the product.
Nothing the public can reach
Both servers live on a private Tailscale network. There’s no public SSH port to brute-force, no admin panel exposed to the internet, and every service binds to a private address rather than listening on every interface. The firewall denies inbound traffic by default.
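One way to check the "binds to a private address" rule is a small audit helper like the sketch below. It assumes services should listen only on loopback or on a Tailscale address. Tailscale hands out node addresses from the 100.64.0.0/10 range. The function name is hypothetical.

```python
import ipaddress

# Tailscale assigns node addresses from the 100.64.0.0/10 CGNAT range.
TAILNET_RANGE = ipaddress.ip_network("100.64.0.0/10")


def is_private_bind(address: str) -> bool:
    """Return True if a service bound to `address` is reachable only
    over loopback or the tailnet."""
    ip = ipaddress.ip_address(address)
    if ip.is_unspecified:
        # 0.0.0.0 or :: means "every interface", public ones included.
        return False
    return ip.is_loopback or (ip.version == 4 and ip in TAILNET_RANGE)
```

Feeding it the listen addresses from a compose file or from `ss` output turns a wildcard bind such as `0.0.0.0` into an explicit failure instead of a quiet exposure.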
Outbound traffic matters just as much. A compromised agent’s favorite move is to quietly send your data somewhere or pull down a second-stage payload, so outbound traffic runs through a strict allowlist. The agent reaches the handful of services it’s supposed to and nothing else. Most “secure” AI deployments lock the front door and leave the back door wide open. This closes both.
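The decision logic behind an outbound allowlist is simple enough to sketch. In a real deployment the enforcement would sit at the firewall or an egress proxy, not in application code. The hosts listed here are placeholders, not the actual allowlist.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the real one depends on which services the agent uses.
ALLOWED_HOSTS = {"api.github.com", "slack.com"}


def is_allowed(url: str) -> bool:
    """Default deny: permit a request only if its host is listed exactly."""
    host = urlsplit(url).hostname
    return host is not None and host.lower() in ALLOWED_HOSTS
```

Matching the full hostname, and not a prefix or substring, is the design choice that matters: a lookalike such as `api.github.com.evil.example` is rejected along with everything else that isn't on the list.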
What the agent can actually do
Containment is worth nothing if the agent is useless, so this is a capable one.
Hermes runs shell commands, drives a real headless browser to navigate sites and pull structured data, reads and writes files, and can stand up its own private, version-controlled code repositories with verified, signed commits. It works the way a careful engineer would, and it keeps a record of what it did. Your team talks to it through almost any chat tool, whether that’s Slack, Teams, Telegram, WhatsApp, Signal, or Discord, so reaching it feels like messaging a colleague rather than logging into yet another tool.
What teams use it for
The security only matters because the agent is doing work worth protecting. In practice that looks like:
- Researching vendors, competitors, and technical options, then writing up what it found
- Keeping internal docs and runbooks current instead of letting them quietly rot
- Running recurring checks against your sites, repos, or servers and flagging what changed
- Opening pull requests for small code or content fixes, with signed commits you can audit
- Turning sessions, incidents, and decisions into searchable long-term memory
- Driving browser workflows that are too fiddly to wire up as a clean API
None of it needs someone babysitting the agent, and all of it stays on infrastructure you control.
Memory that survives
Most agents are goldfish. Close the session and the context is gone, so every conversation starts from zero.
Hermes runs on a dedicated long-term memory layer. It remembers your stack, the decisions you made, and the work in progress, and carries that forward between sessions. The agent gets more useful the longer you work with it, because it isn’t relearning your setup every morning. That memory is backed up on its own schedule and restore-tested like everything else.
Backups you’ve actually restored
Recovery is where most setups quietly fall apart, because nobody tests the backups until they need them.
Here, backups are encrypted and stored off-site, and the backup keys on the servers are append-only. A server can write new backups but can’t delete old ones, so even a full compromise can’t destroy your recovery path. The delete-capable keys live somewhere else entirely. And once a month, an automated drill restores a real backup into a throwaway database and checks that the data is really there by counting the rows. A backup you have never restored is a rumor, and I don’t build on rumors.
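The shape of that restore drill can be sketched in a few lines. This version uses SQLite purely for illustration; the database engine, the table name, and the pass threshold are assumptions, and it is not the drill from the backup runbook.

```python
import sqlite3


def restore_and_count(backup: sqlite3.Connection, table: str) -> int:
    """Restore `backup` into a throwaway in-memory database and
    return the row count of `table` in the restored copy."""
    scratch = sqlite3.connect(":memory:")
    try:
        backup.backup(scratch)  # copy the whole database into the scratch copy
        # The table name comes from the drill's own config, not from user input.
        (count,) = scratch.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        return count
    finally:
        scratch.close()  # the throwaway database disappears here


def drill_passes(backup: sqlite3.Connection, table: str, minimum_rows: int) -> bool:
    """The drill passes only if the restored copy holds at least `minimum_rows` rows."""
    return restore_and_count(backup, table) >= minimum_rows
```

The count is taken from the restored copy, never from the source, so a backup that restores to an empty or truncated database fails the drill even if the backup job itself reported success.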
How this differs from the Managed AI Suite
If you’ve seen the Managed AI Suite, the fair question is why this costs more for what looks like less. Here’s the honest answer.
The Suite is the broad private-AI environment: chat, automation, structured data, and an agent, all on one managed server you own. It’s the right starting point when you want the whole stack.
Managed Hermes Agent is narrower and stricter. It’s for when the agent itself is the workload, taking real action: running commands, driving a browser, touching code, reaching into business systems. That carries more risk, so it gets more architecture: a second dedicated server for the brain-and-hands split, scoped tokens, the outbound allowlist, heavier monitoring, and the monthly restore drills. The higher price buys that extra hardening and the second host, not a longer feature list.
If you want the whole private-AI stack, start with the Suite. If the agent is the point, start here.
Who this is for
This is for companies handing an AI agent real access: to code, to data, to systems that matter. If the agent is only answering questions in a chat box, you don’t need this. If it’s running commands, moving data, and acting on your behalf, then it’s an operational workload and it deserves the same discipline as anything else in production.
You own the servers and the data. I deploy the architecture, harden it, and run it, and you reach me directly when something needs a human. If you’d rather start by finding the cracks in what you already run, the Cloud Infrastructure Audit & Hardening is the way in.
The whole thing is documented in the open. Read the full architecture writeup or browse the companion repo on GitHub before we ever talk.
Stop running an AI agent like a toy. Tell me what you want it to do, and we’ll scope the deployment on a call.