Iron Goo
Methodology

How we run 24/7 AI operations

Most of what is sold as AI operations is an in-app chatbot with tools. The right shape is an agentic client on the operator's machine, with a knowledge layer the agent can read. We use Claude Code, OpenClaw, and Codex, under a supervision model that keeps humans in the loop.

Most of what gets sold as AI operations is the wrong shape. The pattern that has taken over the marketing material is an in-app chatbot with tool access: you log in to a SaaS dashboard, you talk to a bot, the bot calls APIs on your behalf. The problem is not the model. The problem is the architecture. The bot lives inside one application's box, only sees what that application shows it, and goes silent the moment your work crosses any of the seams a real business has (the CRM, the payment processor, the calendar, the file system, the dozen spreadsheets that hold the actual state of the company).

The shape that works is the inverse. The agent runs on the operator's machine, or on a server the operator controls. It sees the file system, it can run shell commands, it can call any API or open any browser. The application is whatever the operator is doing today, not a fixed product surface. We have been running this pattern for three years, across our own operations and for clients who got tired of the chatbot experience. This page describes how we set it up.

The agentic-client architecture

The base layer is an agentic client, running locally or on a controlled server. Our default is Claude Code, Anthropic's official client for Claude, because the model is the strongest we have used for engineering and ops work and the client itself is well-built. We pair Claude Code with OpenClaw, a lighter agent we run in a Docker container on the same box for tasks that need to persist across sessions (memory store, queues, scheduled jobs). For occasional cross-checks we use Codex (OpenAI's coding agent), so we have a second pair of eyes when we want one.

The combination matters less than the principle. A real operator needs a primary agent that can drive a session (Claude Code), a secondary agent or service that holds memory and runs in the background (OpenClaw), and the freedom to point either at any tool, file, or API on the operator's machine. That is the agentic-client model. Everything else (chatbots, in-app assistants, RAG-with-tools sidebars) is a product trying to look like an operator and falling short.

We do not run agents in the cloud as a SaaS unless the work genuinely requires it. The cloud-SaaS model adds latency, adds a vendor between you and your data, and limits which tools the agent can reach. Local-first or your-server-first is the default. Cloud is a deliberate choice for the cases where it earns its keep.

The knowledge layer

An agent without a knowledge layer is a clever stranger. It can read code and write code, but it does not know your business. It does not know that you ship to seven countries and that one of them charges customs differently. It does not know that your support team handles VIP customers through a separate inbox. It does not know that last Tuesday you decided to deprecate the old API and the rollout is mid-flight. None of that lives in the agent's training data, and none of it is on a public site. It lives in your team's heads, in a Notion page, in a Slack thread from three weeks ago.

We give the agent a knowledge layer. Three artifacts.

First, a project memory file. A markdown file the agent reads on every session start. It contains the current state of the business: what is in flight, what was decided last month, who owns what, where the bodies are buried. It is short (rule of thumb, under five hundred lines per project), it indexes longer files for detail, and it is curated. We do not dump everything into it; we put the things the agent needs to avoid asking the same question twice.
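The two constraints above (short, and an index into longer files) are easy to check mechanically. What follows is a minimal sketch, not a description of any particular tool: the function name check_memory_file, the use of markdown links as index entries, and the known_files argument are all assumptions made for illustration.

```python
import re

# Assumption: index entries are ordinary markdown links, e.g. [deploy skill](skills/deploy.md).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def check_memory_file(text, known_files, max_lines=500):
    """Return a list of problems found in a project memory file.

    text        -- the markdown content of the memory file
    known_files -- set of relative paths that actually exist in the project
    max_lines   -- the rule-of-thumb size limit (five hundred lines per project)
    An empty list means the file passes both checks.
    """
    problems = []
    lines = text.splitlines()
    if len(lines) > max_lines:
        problems.append(f"too long: {len(lines)} lines (limit {max_lines})")
    for target in LINK_RE.findall(text):
        # External URLs are not part of the local index, so skip them.
        if target.startswith(("http://", "https://")):
            continue
        if target not in known_files:
            problems.append(f"dangling index entry: {target}")
    return problems
```

A check like this could run before each commit to the memory file, so that curation failures surface to a human before the agent reads a bloated or stale index.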

Second, skills. A skill is a markdown document that teaches the agent how to do a specific recurring task. How to deploy our staging environment. How to cut a new release. How to handle an incident on the order pipeline. Each skill includes the steps, the gotchas, the names of the people to escalate to, and any commands or scripts the agent can run. Skills are written by humans, refined by humans, and read by the agent on demand. They are how the agent gets reliably better at the specific work your business does.

Third, runbooks. Step-by-step procedures for known incidents and known recurring jobs. The skill is the general teaching; the runbook is the specific procedure for the moment. When something breaks at three in the morning, the agent reads the runbook, takes the first three steps automatically (or queues them for human approval, depending on the risk profile), and tells you what it found. The runbook is the difference between "the agent flagged something" and "the agent diagnosed and started the recovery."
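The "first three steps, or queue them for approval" behavior can be sketched in a few lines. This is an illustration under stated assumptions, not our production runner: the Step type, the risky flag, and the rule that everything after a risky step is also queued (because later steps may depend on it) are hypothetical choices.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], str]  # runs the step and returns a short finding
    risky: bool = False        # hypothetical flag set by whoever wrote the runbook


def run_opening_steps(steps, auto_limit=3):
    """Handle the first `auto_limit` runbook steps.

    Safe steps run immediately. The first risky step, and every step after it
    within the limit, is queued for human approval instead of being executed.
    Returns (findings, queued): findings is a list of (step name, result),
    queued is a list of step names awaiting approval.
    """
    findings = []
    queued = []
    blocked = False
    for step in steps[:auto_limit]:
        if blocked or step.risky:
            blocked = True
            queued.append(step.name)
        else:
            findings.append((step.name, step.action()))
    return findings, queued
```

The returned findings are what the agent reports back; the queued names are what the human sees when they pick up the incident.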

The supervision model

We do not run agents autonomously. The fantasy that an AI agent goes off and runs your business while you sleep is, for now, a fantasy. The agents are good enough to do significant work without supervision in narrow domains, and not good enough to make consequential decisions without a human checking the work in any domain. Anyone selling otherwise is either lying or has a very narrow use case.

What does work, and what we ship, is a supervision model that keeps humans in the loop without making the human a bottleneck. Two patterns.

Pattern one: agent proposes, human approves. The agent assembles the work (the deploy plan, the customer email draft, the database migration script), and shows the human the plan before executing. The human approves with one keystroke or amends. This pattern fits low-reversibility work with non-trivial cost: a deploy that affects production, an email that goes to a customer, a database write that is hard to undo. The human spends seconds per approval, the agent does minutes or hours of preparation.

Pattern two: agent does, human reviews afterward. The agent executes the work, logs everything it did, and the human reviews the log on a fixed cadence (daily, weekly, depending on the work). This pattern fits easily reversed, low-cost, high-volume work: triaging support tickets, running scheduled data jobs, monitoring alerts. The human is not in the critical path; the human is in the audit path. This is where the 24/7 part of 24/7 operations actually lives. The agent handles the small things at three in the morning, the human reads the log over coffee.
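Choosing between the two patterns can be written down as a small routing rule. The sketch below assumes, following the examples above (a production deploy, a customer email, a hard-to-undo database write), that anything hard to undo, customer-facing, or touching production goes through approval first. The Action fields and the supervision_mode function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    PROPOSE_THEN_APPROVE = "propose_then_approve"  # pattern one
    DO_THEN_REVIEW = "do_then_review"              # pattern two


@dataclass(frozen=True)
class Action:
    name: str
    easy_to_undo: bool
    customer_facing: bool = False
    touches_production: bool = False


def supervision_mode(action):
    """Pick the supervision pattern for one action.

    Anything hard to undo, visible to a customer, or touching production
    waits for a human keystroke; everything else runs and is audited later.
    """
    if (not action.easy_to_undo
            or action.customer_facing
            or action.touches_production):
        return Mode.PROPOSE_THEN_APPROVE
    return Mode.DO_THEN_REVIEW
```

Defaulting to approval whenever any risk flag is set keeps the rule conservative: a mislabeled action costs a few seconds of human time rather than an unreviewed production change.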

For both patterns, we instrument heavily. Every action the agent takes gets logged, and the logs are queryable. If something goes wrong on Tuesday and you only notice on Friday, you can read back what the agent did and why. The instrumentation is the difference between an agent you trust and an agent that scares you.
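A queryable action log does not need much machinery. Here is a minimal sketch using SQLite from the Python standard library; the table layout and the function names (open_log, log_action, actions_between) are assumptions for illustration, and timestamps are assumed to be ISO 8601 strings in UTC so that plain string comparison orders them correctly.

```python
import sqlite3
from datetime import datetime, timezone


def open_log(path=":memory:"):
    """Open (or create) the action log. The default is an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS actions (
               ts      TEXT NOT NULL,  -- ISO 8601 timestamp, UTC
               agent   TEXT NOT NULL,
               action  TEXT NOT NULL,  -- what the agent did
               reason  TEXT NOT NULL,  -- why it did it
               outcome TEXT NOT NULL
           )"""
    )
    return conn


def log_action(conn, agent, action, reason, outcome, ts=None):
    """Append one action; the timestamp defaults to the current UTC time."""
    ts = ts or datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO actions VALUES (?, ?, ?, ?, ?)",
        (ts, agent, action, reason, outcome),
    )
    conn.commit()


def actions_between(conn, start, end):
    """Return (ts, action, reason, outcome) rows with start <= ts < end, oldest first."""
    cur = conn.execute(
        "SELECT ts, action, reason, outcome FROM actions "
        "WHERE ts >= ? AND ts < ? ORDER BY ts",
        (start, end),
    )
    return cur.fetchall()
```

With a log like this, the Friday question "what did the agent do on Tuesday, and why" becomes a single call such as actions_between(conn, "2024-05-07", "2024-05-08").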

What we typically automate

The work that fits this model best has three properties. Recurring (the same shape of task happens often enough to be worth setting up). Codifiable (someone could write down how to do it, even if it has not been written down yet). Auditable (the result is verifiable after the fact, even if you cannot pre-specify every input).

Concrete examples. Deploying staging environments and running smoke tests. Triaging incoming support tickets to the right owner. Drafting routine emails (renewals, follow-ups, status updates) for human approval. Monitoring scheduled jobs and alerting on regressions. Updating internal documentation when code changes. Running periodic data quality checks across your databases. None of these are glamorous. All of them used to take a part-time human, and now take a fraction of a junior person's time plus an agent doing the bulk of the work.

What we do not automate

Hiring decisions. Strategic choices about product direction. Customer escalations involving genuine conflict. Anything where the cost of a wrong answer dwarfs the time saved. Anything that requires reading the room or weighing political considerations a model cannot see. We are not in the business of replacing the leadership team's judgment. We are in the business of taking the recurring, codifiable, auditable work off their plate so they have the time to use that judgment well.

Where this fits

The operations work is the long tail of an Iron Goo engagement. The audit happens in three days, the build happens in weeks, but operations is the shape of the relationship from then on. We set up the agentic-client stack on your team's machines, we write the first round of skills and runbooks, and we hand off the supervision model to your ops lead with a clear playbook for adding new automations as the business grows. Read the operations service page for the customer-side framing, or get in touch with what you are running today and we will tell you whether the agentic shape is right for your shop, or whether something simpler (or nothing at all) is the better answer.

Ready to move?

Send us a note about where your business is today. You'll get back a written assessment within two business days.

Talk to us