Iron Goo

Designing for the Other User: the AI Agent Acting for a Person

Atamyrat Hangeldiyev
Systems Architect
March 13, 2026

Someone asked Claude to book a furnace tune-up for them while they were in a meeting, and the agent walked the regional HVAC company's site almost to the end. It found the service, picked the right option, filled the name and address fields, and arrived at the calendar. There it stopped. The date picker was a grid of bare numbered cells with a tiny month-arrow glyph and no text the agent could read to know what it was looking at: no name on the control, no machine-readable label marking any cell "Tuesday, available". A person had used that exact widget a thousand times by squinting at the small calendar, recognizing the shape, and clicking the cell that looked right. The agent could not squint. It had no eyes to squint with. It tried the only thing left, retrying the previous step, and the booking never happened. The customer came out of the meeting to a half-finished task and a business that never heard from them.

Designing for the AI agent as a user is the practice of building a small business's own digital surface so that an AI agent acting on a person's behalf can perceive its structure, read its labels and state, and complete the task it was sent to do. The context is small and mid-sized businesses whose sites were drawn only for a human's eye. That is the whole of it. It is not a new marketing channel, not a trick to get recommended by a chatbot, not a feature you bolt on. It is the same surface you already have, judged by one new question: when an actor that cannot see the screen is driving, does the task still finish? This guide is the argument that the work this requires is almost entirely the work that already makes the site good for people, plus one specific thing the agent must be told that a sighted person is told by sight.

What it means to design your site for the agent acting on a customer's behalf

The agent is not a crawler indexing your pages and not a chatbot you installed. It is software a real person handed a real job to: book the appointment, get the quote, reorder the part, check whether the thing is in stock. The person delegated the task and walked away. The agent now stands where that person would have stood, on your site, trying to finish. Designing for it means building the surface so that the doing of the task, not just the looking at the page, is something a non-visual actor can carry through to completion.

That framing matters because it rules things out. It is not about making your copy more persuasive to a model. It is not about how the agent found you or whether the agent recommends you. It is about a surface the agent already arrived at being one it can actually operate.

The agent is a real user with a real task, sent by a real person

When a person uses Claude or a Claude-powered agent to handle something, the intent is not "browse this company". The intent is a finished outcome: an appointment on the calendar, a quote in the inbox, an order placed. The agent is the hands; the person is still the one who wanted the result. So the standard the agent holds your site to is the same standard the person holds it to (can the task be completed?), with one difference in how it perceives the surface. The person reads the screen with eyes and a lifetime of pattern recognition. The agent reads the surface through its structure, its text, and the names and states the markup exposes. Same task, same goal, different sense organ. Everything in this guide follows from that single asymmetry.

Concretely: on the B2B parts distributor's site, a person asked to "get a quote for a 40-foot container" sees a four-step form, recognizes each field by its visible label and position, and clicks Next four times. An agent sent to do the same job needs each of those things to exist as something it can read and act on, not as something a sighted person infers from layout. If the steps are real, named, and observable, the agent finishes. If they are only visual, it stalls exactly where the person would have coped.

This is your surface being used, not your business being found

There is a separate discipline about whether an AI names you, cites you, or recommends you when someone asks it a question. That is search visibility, and it lives in a different pillar entirely. It is a real subject and it is not this one. This guide starts after the agent has already arrived at your site with a task in hand. The question here is never "did the model pick us". It is "now that an agent is on our booking page trying to book, can it".

I am drawing that line hard and then not crossing it. Being found is about getting the agent to your door. Agent legibility is about whether the agent can do anything once it is through the door. Confusing the two leads businesses to spend on the wrong thing, polishing how they get described while their actual booking flow remains something only a sighted human can complete. If you want the visibility side, read the search pillar; it is not taught here, on purpose.

A concrete picture: the same booking, a human and an agent doing it

The clearest way to see what changes is to watch the identical task run twice on the two-location dental group's new-patient intake, once by a person and once by an agent.

A person doing the booking

They land on the intake page. They see a heading styled large and bold and read it as "New Patient Appointment" because it looks like a title. They scan three cards, recognize the middle one as the one they want by its icon and position, and click it. They reach a date control, see a small calendar, and click the cell that visually reads as next Thursday. They fill four fields they identify by the words next to them. They submit. A green checkmark animates in and fades. They saw it flash, so they believe it worked, and they close the tab.

An agent doing the booking

It lands on the same page. The large bold text is a styled div, not a real heading, so the agent has no structural landmark telling it what this page is for. The three cards are unlabeled tiles; the icon that told the person "this one" is a decorative image with no text. The date control exposes no accessible name and no readable indication of which cells are available. The four fields have placeholder text that vanishes when focused and no persistent label, so the agent cannot reliably tell which field is which. It submits. The only success signal is a checkmark that animates and disappears with no text behind it, so the agent cannot confirm the appointment was made and either reports failure or retries and double-books.

Same page. Same task. The person muscled through five places where the surface communicated only visually, because a person can guess and recover. The agent failed at the first of those places and never reached the others. Nothing about the page was broken for humans. It was simply never built so the task itself, as opposed to the appearance of the page, could be perceived by an actor without eyes.

Your site already has two users, and most of it cannot tell

Here is the idea this whole guide turns on, and it is not a slogan, so I am going to keep attaching it to specific things on specific pages every time it comes up. Your site has a person using it. Your site now also has, on some real fraction of visits, an agent using it on behalf of a person. The surface is one surface. The two actors want the identical outcome. But the surface, as most SMB sites are built, can only communicate with one of them, because it communicates through appearance, and only one of the two has eyes.

That is the stakes. Not "agents are the future". The narrow, practical claim: the booking page that loses the agent at the unnamed date picker is the same page that made the human squint at the unnamed date picker, and fixing it for the agent fixes that strain for the person too. The cost of ignoring this is not abstract. It is the specific furnace tune-up that did not get booked, the specific quote that did not get requested, on a real flow you already own.

The person and the agent are asking the surface for the same thing

On the niche industrial-supply shop's reorder page, the task is identical no matter who is driving: find the previously ordered part, set the quantity, place the order. A returning customer does it by recognizing the part from a thumbnail and a name they remember. An agent sent to "reorder the same filter cartridges as last time" does it by reading the part identifier, the quantity input, and the order control as named, operable elements. Neither actor wants anything the other does not. The reorder either completes or it does not. The surface does not need a different goal for each; it needs to express the one shared goal in a form both can act on.

This is why the work converges instead of forking. You are not building a path for the person and a separate path for the agent. You are making the single existing path express what it is, so that whoever is driving can follow it. The part with a real identifier in real text helps the customer who is comparing two cartridges as much as it helps the agent that has to know which one it is.

Why a surface drawn only for the eye fails the agent at the exact spot it strained the person

Go back to that date picker. A human used it, but watch how. They moved their eyes across a grid, used color and a faint border to tell available from unavailable, found "next Thursday" by counting cells visually, and clicked. Every one of those operations was visual inference doing work the markup did not do. The person succeeded by paying a small tax in effort and attention. The agent hit the same control and there was nothing under the appearance for it to act on, so it did not pay a larger tax; it simply could not proceed.

The pattern repeats wherever a surface offloads meaning onto appearance. The B2B quote form's "Next" button that is a styled span with no button role: a person clicks the thing that looks clickable; the agent looks for a control and finds decoration. The dental intake's required-field error shown only as a red outline: a person sees red and re-reads; the agent gets no signal a field is wrong. In every case the failure point for the agent is precisely the place the human had to compensate. The agent is, usefully, an instrument that finds every spot your surface made a person work to understand it.

The same clarity pays twice

Key idea

State this once and then never as a slogan again: a control with a real accessible name, a state expressed in text and not only in color, and a task that exists as a structured path rather than a sequence of inferred screens. Those three things let the agent finish the booking. The same three things are what let a screen-reader user, a keyboard-only user, and a distracted person on a phone in bright sun finish the same booking. You are not buying agent support. You are buying clarity, and the agent is one of several users that clarity was always for.

The economic point follows directly. If serving the agent required a separate build, an SMB with no spare engineering time would be right to defer it. It does not require a separate build. It requires the existing flow to stop encoding its meaning in pixels. That is a finite, structural change to one surface, and it discharges several obligations at once, which is exactly why it is worth doing for a business that cannot afford to do anything twice.

What a sighted person gets for free that you have to give the agent on purpose

A sighted person extracts four things from a page without anyone deciding to provide them: what each region is, what each control is, what just happened, and where the path goes next. They get all four from appearance and inference. The agent gets none of the four from appearance, because it has no appearance to read from. It gets them only if the surface states them. These are the four things, each tied to the specific failure that occurs when it is missing.

Real structure, not structure you can only see

A heading that is large and bold is not a heading. It is text that looks like a heading to a person who can see it. To an agent, structure exists when the markup says "this is a heading", "this is the main content", "this is a form", "this region is the booking widget". When structure is only visual, the agent has no map. On the dental intake page, the styled-div title meant the agent could not establish what the page was for or locate the appointment region within it, which is why it never got as far as the date control. Real structure is the difference between a surface the agent can orient inside and a flat wall of elements it has to guess at.

The fix is not cosmetic and a person will not see it change. The page looks identical. But "New Patient Appointment" rendered as an actual top-level heading, the booking form rendered as an actual form with an accessible name, the steps rendered as real grouped regions, turns a flat wall into a labeled building the agent can move through. The failure it prevents is the most basic one: the agent not knowing where it is or what it is supposed to operate.
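To make "structure the markup states" concrete, here is a minimal sketch that reads a page the way a non-visual actor has to: from the tags alone. The two markup snippets, the class names in them, and the `scan_structure` helper are all hypothetical, invented for illustration; the check itself is deliberately crude and only looks for a top-level heading, a main region, and a form with a name.

```python
from html.parser import HTMLParser

# Hypothetical markup: the same title before and after the structural fix.
BEFORE = '<div class="title-xl">New Patient Appointment</div><div class="form"></div>'
AFTER = (
    "<main><h1>New Patient Appointment</h1>"
    '<form aria-label="New patient appointment"></form></main>'
)


class StructureScan(HTMLParser):
    """Records which structural elements the markup actually declares."""

    def __init__(self):
        super().__init__()
        self.found = {"h1": False, "main": False, "named_form": False}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "h1":
            self.found["h1"] = True
        if tag == "main" or attributes.get("role") == "main":
            self.found["main"] = True
        if tag == "form" and (
            attributes.get("aria-label") or attributes.get("aria-labelledby")
        ):
            self.found["named_form"] = True


def scan_structure(html):
    """Return what an actor reading only the markup can orient by."""
    scanner = StructureScan()
    scanner.feed(html)
    scanner.close()
    return scanner.found


print(scan_structure(BEFORE))  # every value is False: a flat wall
print(scan_structure(AFTER))   # every value is True: a labeled building
```

Both snippets would render the same words on screen. Only the second gives the scan anything to find, which is the whole point of the fix.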

Names and labels a machine can read

A person recognizes the reorder control on the industrial-supply page because it is a cart icon and they know what a cart icon means. The agent sees an element with no text. "Recognizable by icon" is not a property the agent can use. Every control the task depends on, the primary button, every form field, the step advance, has to carry a name in text the markup exposes, not a name a human supplies by recognizing a glyph.

The concrete failure here is the one that killed the HVAC booking. The date picker had no accessible name and the available cells carried no readable label, so even though a person could see which days were open, the agent could not tell an available Tuesday from an unavailable one or even tell that this region was a date picker at all. A real accessible name on that control and a readable indication of which cells are selectable would not change one pixel for the person who can already see it. It would be the entire difference between the agent finishing the booking and the agent dead-ending. Unlabeled icon-only controls are the single most common place I have watched agents fail on small-business surfaces, because they are the place the design leaned hardest on the human eye.
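A rough way to find these controls before an agent does is to scan the markup for buttons and inputs that carry no name in text. The sketch below is a simplification, not the full accessible-name rules browsers apply: it only accepts `aria-label`, `aria-labelledby`, visible button text, image `alt` text inside a button, and a `label` pointing at an input's `id`. The sample page and the `find_unnamed_controls` helper are hypothetical.

```python
from html.parser import HTMLParser


class NameScan(HTMLParser):
    """Flags buttons and inputs with no text-based name (a simplified check)."""

    def __init__(self):
        super().__init__()
        self.labelled_ids = set()  # ids referenced by <label for="...">
        self.inputs = []           # (id, has_aria_name) for each visible input
        self.unnamed = []          # descriptions of unnamed buttons
        self._button = None        # state for the <button> currently open

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        has_aria = bool(
            attributes.get("aria-label") or attributes.get("aria-labelledby")
        )
        if tag == "label" and attributes.get("for"):
            self.labelled_ids.add(attributes["for"])
        elif tag == "input" and attributes.get("type") != "hidden":
            self.inputs.append((attributes.get("id"), has_aria))
        elif tag == "button":
            self._button = {"named": has_aria, "text": ""}
        elif tag == "img" and self._button is not None and attributes.get("alt"):
            self._button["named"] = True  # non-empty alt text names the button

    def handle_data(self, data):
        if self._button is not None:
            self._button["text"] += data

    def handle_endtag(self, tag):
        if tag == "button" and self._button is not None:
            if not (self._button["named"] or self._button["text"].strip()):
                self.unnamed.append("button with no text name")
            self._button = None

    def report(self):
        unlabelled = [
            f"input id={input_id!r} with no label"
            for input_id, has_aria in self.inputs
            if not has_aria and input_id not in self.labelled_ids
        ]
        return self.unnamed + unlabelled


def find_unnamed_controls(html):
    scan = NameScan()
    scan.feed(html)
    scan.close()
    return scan.report()


# Hypothetical reorder page: one labelled field, one placeholder-only field,
# one icon-only cart button, and one cart button with a real name.
PAGE = (
    '<label for="qty">Quantity</label><input id="qty">'
    '<input id="phone" placeholder="Phone">'
    '<button><img src="cart.svg" alt=""></button>'
    '<button aria-label="Reorder filter cartridges"><img src="cart.svg" alt=""></button>'
)

for defect in find_unnamed_controls(PAGE):
    print(defect)
```

On this sample the scan reports the icon-only button and the placeholder-only phone field, and passes the other two controls. Placeholder text is ignored on purpose, for the reason given earlier: it vanishes when the field is focused.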

State the agent can observe

The person believed the dental booking worked because a green checkmark flashed and faded. They saw it. The agent cannot rely on having seen a transient animation, and if nothing in the surface says in readable text "your appointment is confirmed for Thursday at 2pm", then as far as the agent can tell, nothing happened. This is the failure that produces double bookings: the agent completes the action, gets no observable confirmation, assumes it failed, and retries. The business gets two appointments and a confused customer, or an angry one who was charged twice.

State means more than success. It means the error too. The B2B quote form that signals a missing field with only a red border tells a person "fix this" and tells the agent nothing, so the agent submits again into the same wall. Observable state is text that says what happened: confirmed, failed, this field is required, this date is unavailable. It does not replace the visual confirmation a person likes. It sits behind it so that the actor without eyes has the same information the actor with eyes got from the flash of green.
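One conventional way to put that text behind the animation is a status or alert region in the markup. The sketch below assumes that convention (`role="status"`, `role="alert"`, or an `aria-live` value of `polite` or `assertive`) and simply collects whatever text such regions contain; the snippets and the `readable_state` helper are hypothetical. An empty result means that, as far as a non-visual actor can tell, nothing happened.

```python
from html.parser import HTMLParser


class StatusScan(HTMLParser):
    """Collects the text inside status/alert regions (a simplified check)."""

    def __init__(self):
        super().__init__()
        self.messages = []
        self._region_tag = None  # tag name that opened the current region
        self._depth = 0          # nesting count of that same tag name
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if self._region_tag is None:
            is_region = attributes.get("role") in ("status", "alert") or (
                attributes.get("aria-live") in ("polite", "assertive")
            )
            if is_region:
                self._region_tag, self._depth, self._text = tag, 1, []
        elif tag == self._region_tag:
            self._depth += 1

    def handle_data(self, data):
        if self._region_tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._region_tag is not None and tag == self._region_tag:
            self._depth -= 1
            if self._depth == 0:
                message = " ".join("".join(self._text).split())
                if message:
                    self.messages.append(message)
                self._region_tag = None


def readable_state(html):
    scan = StatusScan()
    scan.feed(html)
    scan.close()
    return scan.messages


# Hypothetical results after submitting the dental booking and the quote form.
FLASH_ONLY = '<div class="check-anim" aria-hidden="true"></div>'
WITH_TEXT = FLASH_ONLY + (
    '<p role="status">Your appointment is confirmed for Thursday at 2pm.</p>'
)
RED_BORDER_ONLY = '<input id="email" class="error-outline">'
WITH_ERROR_TEXT = RED_BORDER_ONLY + '<p role="alert">This field is required.</p>'

print(readable_state(FLASH_ONLY))
print(readable_state(WITH_TEXT))
```

The checkmark stays in both versions. The second adds one line of text that says what the green flash meant, and the same pattern covers the missing-field error.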

A task that exists as a path, not only as screens a person infers

A four-step form is, to a person, four screens they move through by clicking the thing that advances. They infer the path by doing it. To an agent, a task is completable when it exists as a traversable path: a sequence of steps that are real, ordered, and individually perceivable, with a real control that advances each one and observable state that says which step you are on and whether the last one succeeded. When the path exists only as screens a person assembles in their head, the agent has nothing to traverse.

On the B2B distributor's quote request this was the difference between success and a stall at step four. The first three steps had real advance controls; the agent walked them. The final confirmation step's only "submit" was a styled element with no role and the only result was an unnamed state change, so the agent reached the last step and could neither confirm it had submitted nor tell it was done. The path broke at exactly the point where the surface stopped expressing the path and started relying on the person to know they were finished. A task is legible to an agent only when the whole path, including the end, is something the surface states rather than something the human supplies.
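The idea of a path that breaks at one step can be sketched as a small data structure. The step names below are hypothetical, and the two flags per step are filled in by hand from the description of the quote form; nothing here inspects a real page. The function returns the first step an agent cannot get past, because an agent stops at the first break and never sees the steps after it.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    has_named_advance_control: bool  # a real button or link with a text name
    has_readable_state: bool         # text saying where you are or what happened


def first_break(steps):
    """Return the name of the first step an agent cannot pass, or None."""
    for step in steps:
        if not (step.has_named_advance_control and step.has_readable_state):
            return step.name
    return None


# Hypothetical model of the four-step quote request described above.
QUOTE_FLOW = [
    Step("Container details", True, True),
    Step("Delivery address", True, True),
    Step("Contact details", True, True),
    Step("Review and submit", False, False),
]

print(first_break(QUOTE_FLOW))  # Review and submit
```

A flow is traversable only when `first_break` returns `None`, meaning every step, including the last one, states its control and its result.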

One surface, two actors
The task or nothing
The same fix, both users

The screen-reader user and the agent are asking for the same thing

Almost everything in the previous section will sound familiar if you have ever thought about whether a blind person can use your site. That is not a coincidence and it is the most useful fact in this guide. The actor that uses a screen reader and the agent that acts for a customer perceive your surface through nearly the same channel: not the pixels, but the structure, the names, the states, the path expressed in markup. What serves one serves the other, because the bottleneck is identical.

The semantic markup that serves one serves the other

A screen-reader user reaches the HVAC date picker and, if it has no accessible name and no readable available-state, is as stuck as the agent was, for the same reason. The fix is the same fix. This overlap is large and it is real, and it is fully argued, with the keyboard, focus, contrast, and announcement detail an SMB actually needs, in the guide on accessibility and inclusive design for small businesses. The semantic, well-structured markup that makes a surface usable by a screen reader is the same material that makes it agent-legible, and that guide is where the accessibility argument lives in full. I am not going to re-teach it here; I am pointing at it because the work you would do for the agent and the work you owe the screen-reader user are, in the parts that matter most, one body of work.

The honest scope of the overlap: most of agent-legibility is good accessibility. Not all of it. An agent also benefits from things a screen-reader user does not strictly need, like a clean and predictable path through a multi-step flow expressed in a way automation can follow without visual cues. But the foundation, real structure, real names, observable state, is shared, and if you have done the accessibility work properly you have already done most of this.

This means the highest-payoff move is work you may already owe

A business your size very likely has an accessibility obligation, commercially and in many places legally, whether or not anyone has framed it that way. If that is true, the highest-payoff move available to you is not a new agent project. It is doing the accessibility work you already owe, correctly, on the one flow that matters, because that single body of work discharges the legal exposure, serves disabled customers you are currently turning away, and makes the surface agent-legible, all from one structural change. You almost never get to spend once and satisfy three obligations. Here you do, and that is the practical reason to act, not any forecast about agents.

The smallest real thing a small site can do first

You do not need an agent strategy. You need to make one task survive a non-visual actor, and you need to start with the one task that, if it fails, costs you money. Everything below is scoped to a business with no spare design or engineering time, doing the smallest thing that is actually real.

Pick the one task that matters and make it the test

Do not audit the whole site. Name the single task that is the business: for the HVAC company it is booking a service call, for the B2B distributor it is the quote request, for the dental group it is new-patient intake, for the supply shop it is reorder. That one task is your test. A generic "is our site agent-friendly" review produces a long list and no action. "Can an agent complete our one revenue task" produces a yes or a no and a short list of exactly what blocks it.

Walk that one task as the agent would: can it be perceived, understood, completed

Take the one task and go through it asking three questions at every step. Can the agent perceive what this region and control are, from structure and text, with appearance removed? Can it understand what it is supposed to do here, from names and labels, not from an icon or a layout? Can it complete the step and observe that it succeeded, from readable state, not from a flash of color? Wherever the answer is no, you have found a specific defect on a specific control, which is worth more than any general assessment.

  1. Name the one task

    Write down the single task that is the business. Booking, quote, reorder, intake. Not the whole site. The one flow whose failure is a lost customer.

  2. Walk it as the agent

    Go step by step. At each control ask: real structure, real name, observable state. Record every place the answer is no, as a named control, not a vague note.

  3. Fix the blocking gaps, retest the same task

    Fix the specific defects that stop completion, not everything you noticed. Then run the one task again, same way, and confirm it now finishes end to end.
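The three steps above can be kept honest with a very small record: one row per control, one yes or no per question. This is a sketch of a note-taking format, not a tool that inspects anything; the flow, the control names, and the answers are hypothetical and would be filled in by whoever walks the task.

```python
from typing import NamedTuple


class Check(NamedTuple):
    step: str
    control: str
    perceivable: bool     # structure and text say what this is
    understandable: bool  # a text name says what it does
    completable: bool     # readable state says whether it worked


def blockers(checks):
    """List every 'no' as a named control plus the question it failed."""
    found = []
    for check in checks:
        for question in ("perceivable", "understandable", "completable"):
            if not getattr(check, question):
                found.append(f"{check.step}: {check.control} is not {question}")
    return found


# Hypothetical walk of an HVAC booking flow; the answers are entered by hand.
WALK = [
    Check("Choose service", "service list", True, True, True),
    Check("Pick a date", "date picker", False, False, True),
    Check("Confirm", "book button", True, True, False),
]

defects = blockers(WALK)
task_passes = not defects  # the yes-or-no answer for the one task

for defect in defects:
    print(defect)
```

Retesting is the same walk with the same rows. The task is done when `defects` comes back empty, not when the list of things you noticed has been cleared.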

Three concrete fixes that move the needle on a normal site

On most small-business surfaces, three changes carry the majority of the result. They are ordered by how often I have seen each one be the actual blocker.

  1. Put a real accessible name on the primary control of the one task. The book button, the submit, the date picker, whatever the task hinges on, gets a name the markup exposes in text, not an icon a person decodes. This is the single fix that most often turns a dead-end into a completion, because the unnamed primary control is the most common failure point. A person sees no change. The agent gets a handle on the exact element it needs.
  2. Make the one success state readable. When the task completes, the surface must say so in text the agent can read: "Your appointment is confirmed for Thursday at 2pm", not only an animation that fades. This is the fix that stops double submissions, the most expensive failure mode because it produces duplicate bookings and charges.
  3. Make the primary task reachable without a visual-only step. If finishing requires recognizing an icon, reading a color, or inferring a screen, that step has to also exist as named structure with observable state. Remove the one place in the path where the only way through is to see it.

Three fixes, one flow. That is a real first move for a business with no team, and it is finite enough to actually get done.

When this is engineering you do not staff

Be honest with yourself about what these fixes are. They are not copy edits and they are not settings in a page builder. Putting real structure, real accessible names, and observable state into a booking widget or a multi-step quote form is front-end engineering done at the level of the components themselves. Many small businesses do not have a person who does that work, and the page builder that produced the unnamed date picker will keep producing unnamed date pickers. Making your surface genuinely agent-legible and actionable is exactly that engineering, structural work shipped into the site itself, and it is the kind of work covered by Iron Goo's AIO service, which prepares a business's surface for AI agents that act on a customer's behalf. That service is explicit, and I will match its honesty here: agent-driven commerce is the frontier this is built for, not a finished, settled pipeline. The point is not that agents are doing everything today. The point is that the structural work that lets an agent finish the task is the same work that already serves your other users, so it is worth doing now, by someone who does this for a living if that is not you.

What this is not: two boundaries you should not blur

This subject sits between two things it gets confused with, and the confusion is expensive because it sends a busy owner to spend on the wrong work.

Designing for the agent that uses your site vs being found by the AI that recommends it

Being found is whether an AI names, cites, or recommends you when a person asks it a question; agent legibility, this guide's subject, is whether the agent can finish the task once it is already on your site. A business can be perfectly recommended and still lose every delegated booking to an unnamed date picker, and perfectly agent-legible yet never sent any agents. They are independent, fixed in different places, and only the second is taught here; the visibility side lives in the search pillar.

Your surface being consumed by an agent vs the AI feature you ship yourself

This is the boundary most worth getting exactly right, so it is argued in full here and only pointed at elsewhere. There are two completely different questions that both involve AI and a website, and they are not the same attribute.

This guide's question: your normal existing surface, the booking page you already have, is being driven by an external agent that some customer pointed at you. You did not build the agent. You do not control it. Your job is to make your surface something it can operate. The agent is the user; your page is what it uses.

The other question: you decide to put an AI feature on your own site, a chatbot, an assistant, an AI search box, something you build and run. Now the design problems are entirely different: setting expectations for what your bot can do, showing your bot's uncertainty honestly, falling back to a human when it should. That is a different attribute with different failure modes, and it is its own guide, the UX of an AI feature your business ships itself. One sentence test to keep them apart: if the AI was sent by your customer and you did not build it, you are in this guide; if you built the AI and put it on your site, you are in that one. I am not teaching that one's content here beyond this disambiguation, because it is genuinely a different problem, not a subtopic of this one.

An agent-legible site vs "just add an API" or a separate site for robots

Two tempting shortcuts both fail, and a busy owner should know why before someone sells them either.

The first is "we will just expose an API". An API is a developer integration. The agents acting for ordinary consumers are operating your public site the way a person would, not wiring into an endpoint, and even where an API exists, your site is still the thing the customer's agent lands on and tries to use. An API does not excuse an illegible page; it is a different surface for a different consumer, and it does not make the booking page work for the agent that is actually on it.

The second is "we will build an agent version of the site". You do not want a parallel site for robots. It will drift from the real one, it will fall out of date, and it doubles the surface you maintain while leaving the real page exactly as broken as it was. The entire point of this work is that there is one surface that resolves the task for whoever is driving it. Two surfaces is the failure, not the goal.

What designing for the agent changes around it

Doing this work does not stay contained to one button. It changes a few things upstream of any single page, and naming them keeps you from treating a structural problem as a cosmetic one.

How it changes your templates, not just one page

The unnamed date picker is almost never one bad page. It is a component reused across every page that books anything, so the same defect ships everywhere that component does, and fixing the one instance you noticed leaves the rest broken. Agent legibility is therefore a template-level property, built in by construction, not a tag patched onto a page after the fact. The durable version of this work is structural: the component that renders a date picker renders one with a real name and observable state every time, so every page that uses it inherits a surface an agent can operate. For a business whose templates were never built that way, this is the same work as a structural rebuild that produces semantic templates by construction, which is what Iron Goo's Foundation service does for companies that do not staff that rebuild. The reason to think at the template level and not the page level is simple: a page-level fix decays the moment someone adds another page from the same template, and a template-level fix holds.
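"Built in by construction" can be shown with a small sketch of a component that cannot render an unnamed cell. This assumes a server-side template written as plain Python functions, which is an illustration and not a claim about how any particular site or page builder works; `render_day_cell`, `render_date_picker`, and the exact label wording are hypothetical.

```python
from datetime import date
from html import escape

# Spelled out here so the label does not depend on the machine's locale.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def render_day_cell(day, available):
    """Render one calendar cell whose name and state are readable text."""
    state = "available" if available else "unavailable"
    label = f"{DAYS[day.weekday()]}, {MONTHS[day.month - 1]} {day.day}, {state}"
    disabled = "" if available else " disabled"
    return (
        f'<button type="button" aria-label="{escape(label)}"{disabled}>'
        f"{day.day}</button>"
    )


def render_date_picker(name, days):
    """Render the whole control; `days` is a list of (date, available) pairs."""
    cells = "".join(render_day_cell(day, available) for day, available in days)
    return f'<div role="group" aria-label="{escape(name)}">{cells}</div>'


print(
    render_date_picker(
        "Choose an appointment date",
        [(date(2026, 3, 17), True), (date(2026, 3, 18), False)],
    )
)
```

The person still sees a cell reading "17". The markup now also says "Tuesday, March 17, available", and every page that calls this component gets that for free, which is the difference between a template-level fix and a page-level one.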

How it folds into the accessibility work you may already owe

Do not schedule this as an "agent project" with its own budget line. Fixing the booking flow so a screen-reader user can complete it does the large majority of what makes it agent-legible in the same pass. Treat them as one initiative scoped to the one task that matters, and you spend once for what would otherwise be three projects.

Why the one-task test beats a generic agent audit

A general "agent readiness audit" produces a document; the one-task test produces a fixed booking flow. The audit has no completion criterion, so it expands into a backlog nobody acts on. The test has a binary answer and a finite list of the controls that blocked it. For a business with no spare time, the test that ends with a working flow wins.

Where this leaves you

Strip the noise away and the claim is narrow. Your site has a person using it and, on some real share of visits you cannot honestly put a number on, an agent using it for a person. The surface is one surface and it currently speaks only to the one with eyes. Making it speak to both is not a forecast bet on an agent revolution and not a separate build. It is the same clarity that already helps the distracted person on a phone, the screen-reader user, and now the agent, expressed once, in the structure of the page, on the one task whose failure costs you a customer.

The single smallest move is not abstract. Take the one task that is the business, the booking, the quote, the reorder, the intake, walk it as an agent would with appearance removed, and give the control it hinges on a real name and a readable confirmation. If that work is engineering you do not staff, hand it to someone who does, but scope it to that one flow first. When you have made the one task survive a non-visual actor, the next thing to read in this pillar is the UX of an AI feature your business ships itself, because that is the adjacent and genuinely different question, and the full UX pillar holds the rest of the work this one sits inside.
