Your Agent Is About to Hold a Credit Card

Image: a matte-black credit card resting in a payment terminal, lit by electric-blue light in a dark data-infrastructure room; a metaphor for an autonomous AI agent that can now move real money without a human at the checkout.

Last week Google Pay announced it is rebuilding its payment stack for a buyer that does not have hands. The Universal Commerce Protocol, a new server architecture, and cross-device biometric approval are all aimed at one thing: letting an autonomous agent move money without a human clicking through a checkout page.

This is the moment a lot of agent programs have been quietly heading toward without admitting it. For two years the demos have been about retrieval, summarization, and drafting. Read-only. Reversible. Embarrassing when wrong, but not expensive. The instant an agent can complete a purchase, the failure mode changes from "that answer was off" to "that charge was real." Everything you deferred about authorization, identity, and audit comes due at once.

We build agents that go to production and stay there. So here is the part the announcement does not cover: what it actually takes to run an agent that spends money on the second day, the second week, and the quarter after the launch demo.

Spending is a different class of action

Most agent architectures treat tool calls as roughly interchangeable. The model decides to call search_inventory or send_email or submit_order, and the framework dispatches it. That uniformity is convenient, and it is exactly the assumption that breaks the moment one of those tools moves funds.

A read action that fires twice returns the same data. A purchase that fires twice charges the card twice. A retrieval with a bad parameter returns junk you can ignore. A payment with a bad parameter sends money to the wrong merchant, in the wrong amount, and now you are in a reconciliation and refund process instead of a retry loop.

The first thing a transacting agent needs is a taxonomy of its own actions. Reversible and free. Reversible and costly. Irreversible. Money-moving. Those are not the same risk tier and they cannot share one approval path.

Tier 1: Reversible and free. Reads, retrievals, lookups. Fire twice and nothing changes. Wrong and you ignore the junk and retry.

Tier 2: Reversible and costly. Drafts, holds, provisional changes. Undoable, but the undo has a cost in time, attention, or cleanup.

Tier 3: Irreversible. Sent emails, submitted forms, external commitments. No clean undo. The blast radius lives outside your system.

Tier 4: Money-moving. Purchases, transfers, payments. Fire twice and you double-charge. Wrong and you are in refunds, not retries.

If your current architecture cannot tell you, per action, which bucket it falls in, you do not yet have an agent that is ready to hold a credit card. You have a chatbot that is one prompt injection away from a chargeback.
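A minimal sketch of that per-action classification, assuming a hypothetical in-process tool registry. search_inventory, send_email and submit_order are the example tools named above; create_draft_order is invented for illustration. The design choice that matters is failing closed: a tool with no tier is an error, never a silent default.

```python
from enum import IntEnum


class RiskTier(IntEnum):
    """The four tiers above; a higher value means a riskier action."""
    REVERSIBLE_FREE = 1
    REVERSIBLE_COSTLY = 2
    IRREVERSIBLE = 3
    MONEY_MOVING = 4


# Hypothetical registry: every tool the agent can call must appear here.
TOOL_TIERS = {
    "search_inventory": RiskTier.REVERSIBLE_FREE,
    "create_draft_order": RiskTier.REVERSIBLE_COSTLY,  # illustrative tool
    "send_email": RiskTier.IRREVERSIBLE,
    "submit_order": RiskTier.MONEY_MOVING,
}


class UnclassifiedToolError(Exception):
    """Raised when the agent tries to call a tool nobody has classified."""


def tier_for(tool_name: str) -> RiskTier:
    # Fail closed: an unknown tool is an error, not "probably safe".
    try:
        return TOOL_TIERS[tool_name]
    except KeyError:
        raise UnclassifiedToolError(f"no risk tier recorded for {tool_name!r}") from None


def requires_gate(tool_name: str) -> bool:
    # Irreversible and money-moving actions must take the approval path.
    return tier_for(tool_name) >= RiskTier.IRREVERSIBLE
```

With this in place, "which bucket is this action in?" has a one-line answer, and adding a new tool without classifying it breaks loudly instead of quietly inheriting the read-only path.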

Idempotency is not optional anymore

Agents retry. They retry because the network blipped, because a tool timed out, because the orchestrator restarted, because a multi-agent handoff replayed a step. In a read-heavy world that is harmless and you rarely think about it.

A transacting agent has to treat every money-moving call as something that might be delivered more than once and must only take effect once. That means an idempotency key generated before the call, carried through the payment provider, and checked on the way back. It means the agent's own memory of "did I already do this" cannot live only in a context window that gets truncated or a process that gets killed. It has to be durable state outside the model.

Three retries. One charge.

The key is generated once, carried through every retry, and checked before money moves.
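A minimal sketch of "three retries, one charge", assuming a provider that deduplicates on a caller-supplied idempotency key (many payment APIs offer something like this; check what yours actually guarantees). FakePaymentProvider, IntentStore, pay_once and the intent ID format are hypothetical, and an in-memory SQLite database stands in for whatever durable store you already run.

```python
import sqlite3
import uuid
from typing import Optional


class FakePaymentProvider:
    """Hypothetical provider that deduplicates charges by idempotency key."""

    def __init__(self) -> None:
        self.charges: dict = {}  # idempotency key -> charge record
        self.fail_next = 0       # number of responses to "lose" after charging

    def charge(self, idempotency_key: str, amount_cents: int, merchant: str) -> dict:
        if idempotency_key not in self.charges:  # money moves at most once per key
            self.charges[idempotency_key] = {
                "id": f"ch_{len(self.charges) + 1}",
                "amount_cents": amount_cents,
                "merchant": merchant,
            }
        if self.fail_next:
            self.fail_next -= 1
            raise TimeoutError("charge landed but the response was lost")
        return self.charges[idempotency_key]


class IntentStore:
    """Durable 'did I already do this' state, kept outside the model."""

    def __init__(self, path: str = ":memory:") -> None:  # use a real file or DB in production
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS intents ("
            "intent_id TEXT PRIMARY KEY, idem_key TEXT NOT NULL, charge_id TEXT)"
        )

    def key_for(self, intent_id: str) -> str:
        # Generated once, before the first call; every retry reuses the same key.
        self.db.execute(
            "INSERT OR IGNORE INTO intents (intent_id, idem_key) VALUES (?, ?)",
            (intent_id, str(uuid.uuid4())),
        )
        self.db.commit()
        return self.db.execute(
            "SELECT idem_key FROM intents WHERE intent_id = ?", (intent_id,)
        ).fetchone()[0]

    def completed_charge(self, intent_id: str) -> Optional[str]:
        row = self.db.execute(
            "SELECT charge_id FROM intents WHERE intent_id = ?", (intent_id,)
        ).fetchone()
        return row[0] if row else None

    def mark_done(self, intent_id: str, charge_id: str) -> None:
        self.db.execute(
            "UPDATE intents SET charge_id = ? WHERE intent_id = ?", (charge_id, intent_id)
        )
        self.db.commit()


def pay_once(store, provider, intent_id, amount_cents, merchant, max_attempts=3) -> str:
    done = store.completed_charge(intent_id)
    if done:
        return done  # already paid: never call the provider again
    key = store.key_for(intent_id)
    for _ in range(max_attempts):
        try:
            charge = provider.charge(key, amount_cents, merchant)
        except TimeoutError:
            continue  # retry with the SAME key
        # Check on the way back: the provider's record must match the intent.
        if charge["amount_cents"] != amount_cents or charge["merchant"] != merchant:
            raise ValueError("provider returned a charge that does not match the intent")
        store.mark_done(intent_id, charge["id"])
        return charge["id"]
    # The key stays in the store, so a later retry still cannot double-charge.
    raise RuntimeError("payment outcome unknown; reconcile before trying again")


store = IntentStore()
provider = FakePaymentProvider()
provider.fail_next = 2  # the first two responses are lost after the charge lands
charge_id = pay_once(store, provider, "order-42:step-3", 480_00, "acme-supplies")
print(charge_id, len(provider.charges))  # ch_1 1
```

The intent ID is assumed to be assigned deterministically by the orchestrator (for example, task plus step), so a replayed step maps to the same row and therefore the same key.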

This is ordinary discipline in payments engineering and it is almost entirely absent from agent frameworks today. The frameworks were designed for conversation, where replay is free. Production commerce is where that design assumption sends you the bill.

The approval gate is a product, not a checkbox

Google's answer to "what if the agent buys something it should not" is cross-device biometric approval. The agent arranges the purchase on the laptop, you get a prompt on your phone, you approve. That is a human in the loop, and it is the right instinct. The hard part is not the prompt. The hard part is the policy behind it.

When does the agent act on its own and when does it stop and ask? Under fifty dollars, go. Over five hundred, always confirm. A new merchant the user has never bought from, confirm regardless of amount. Three purchases in ten minutes, pause the whole sequence. Those rules are business policy, and now they have to be encoded as software that sits between the model's decision and the actual transaction.
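The example rules above can be written as plain, deterministic code. This is a minimal sketch: the thresholds come from the text, while the names are hypothetical. The text leaves the $50 to $500 band open, so this sketch escalates it, and it checks velocity first; both choices are our assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"  # the agent may proceed on its own
    CONFIRM = "confirm"            # stop and ask the human
    PAUSE = "pause"                # halt the whole purchase sequence


@dataclass(frozen=True)
class Purchase:
    amount_cents: int  # integer cents, not floats, for money
    merchant: str
    at: datetime


AUTO_BELOW_CENTS = 50_00       # under $50: go
CONFIRM_ABOVE_CENTS = 500_00   # over $500: always confirm
VELOCITY_LIMIT = 3             # three purchases...
VELOCITY_WINDOW = timedelta(minutes=10)  # ...in ten minutes: pause


def decide(p: Purchase, known_merchants: set, recent_purchase_times: list) -> Decision:
    window_start = p.at - VELOCITY_WINDOW
    in_window = sum(1 for t in recent_purchase_times if window_start < t <= p.at)
    if in_window + 1 >= VELOCITY_LIMIT:  # this purchase would be the third in the window
        return Decision.PAUSE
    if p.merchant not in known_merchants:
        return Decision.CONFIRM  # new merchant: confirm regardless of amount
    if p.amount_cents > CONFIRM_ABOVE_CENTS:
        return Decision.CONFIRM
    if p.amount_cents < AUTO_BELOW_CENTS:
        return Decision.AUTO_APPROVE
    return Decision.CONFIRM  # $50-$500 is unspecified; escalating is an assumption
```

Because this runs outside the model, no wording in a prompt or a product page can change which branch fires.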

We have written before about the human-agent interface layer, the part of the system where agents and people actually meet. Transacting agents are where that layer stops being an architecture nicety and becomes the thing standing between your company and unauthorized spend at machine speed. A few things we have learned building these gates:

The approval policy has to live outside the model. If the rule for when to ask a human is expressed in the prompt, a clever input can talk the model out of it. The gate has to be deterministic code the model cannot reason its way around.

Approvals need context, not just a dollar figure. "Approve this $480 charge" is useless. "Approve $480 to a vendor you have not used before, requested by the procurement agent, for the office order you started this morning" is a decision a human can actually make in two seconds.

Silence is not consent. If the human does not respond, the safe default is no purchase, not a timeout that proceeds. That sounds obvious until you watch a team ship the version that proceeds because it demoed more smoothly.
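The last two lessons fit in a few lines: an approval request that carries the context a human needs, and a wait that treats silence as a refusal. A sketch assuming a hypothetical `replies` queue that the phone-approval callback writes True or False into; ApprovalRequest and await_decision are illustrative names, not part of any Google Pay API.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class ApprovalRequest:
    """Hypothetical approval payload sent to the human's device."""
    amount_cents: int
    merchant: str
    merchant_is_new: bool
    requested_by: str  # which agent is asking
    task: str          # the user-visible task this purchase belongs to

    def render(self) -> str:
        # Context, not just a dollar figure: who, for what, and what is unusual.
        dollars = f"${self.amount_cents / 100:,.2f}"
        newness = ", a vendor you have not used before" if self.merchant_is_new else ""
        return (
            f"Approve {dollars} to {self.merchant}{newness}, "
            f"requested by the {self.requested_by}, for {self.task}?"
        )


def await_decision(replies: "queue.Queue", timeout_s: float) -> bool:
    """Wait for the human; only an explicit True approves. Silence means no."""
    try:
        return replies.get(timeout=timeout_s) is True
    except queue.Empty:
        return False  # timed out: the safe default is no purchase


req = ApprovalRequest(480_00, "newco", True, "procurement agent",
                      "the office order you started this morning")
print(req.render())
```

Note that anything other than an explicit True, including a malformed reply, is also treated as a refusal.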

A money-moving agent is the highest-value target in your stack

There is a reason this lands in the security column. An agent that can read your data is a confidentiality problem. An agent that can spend your money is a fraud problem, and fraud problems attract people who do this for a living.

The attack surface is the same one we flagged when we wrote up the five streaming challenges and the prompt-injection gates most demos skip. A product listing, a webpage, an email the agent ingests, any of these can carry instructions aimed at the agent rather than the user. In a read-only world a successful injection leaks information. In a transacting world a successful injection places an order. Same vulnerability, much larger blast radius.

Google itself warned earlier this spring that malicious web pages are being crafted to poison agents. Now picture that poisoned page sitting in the product feed of an agent with payment authority.

The defense is not a better prompt. It is structural separation between the content an agent reads and the instructions it is allowed to act on, plus the action taxonomy and approval gates above, so that even a fully compromised reasoning step cannot reach an irreversible money-moving action without passing a deterministic check the attacker does not control.
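One way to approximate that structural separation is provenance (taint) tracking: anything the agent read from the outside world is wrapped as untrusted, and the deterministic gate refuses to let an untrusted value steer a money-moving action without a human. This is a sketch of that technique under our own assumptions, not the only approach. Untrusted, gate_payment and the allowlist are hypothetical, and a real system must carry the wrapper through every transformation, which is the hard part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Untrusted:
    """A value that came from ingested content (web page, listing, email)."""
    value: str
    source: str  # where it was read from, for the audit trail


def is_tainted(*fields: object) -> bool:
    return any(isinstance(f, Untrusted) for f in fields)


def gate_payment(merchant: object, amount_cents: object, allowlist: set) -> str:
    # Deterministic check: the model's reasoning never reaches this code path.
    if is_tainted(merchant, amount_cents):
        return "escalate"  # a parameter was steered by content, not by the user
    if merchant not in allowlist:
        return "block"
    return "allow"
```

Even if an injected instruction fully convinces the model, the merchant it picked up from the poisoned page arrives wrapped, and the gate routes it to a human instead of the card.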

Gating at deploy is not governance in production

Here is the distinction that separates a launch demo from a system you can run. The policies and gates we just described get defined during development and wired in at deploy. That is necessary, but it is not enough. A rule written in a sprint is a static artifact. The agent runs in a world that changes every minute: new merchants, new prompts, new edge cases the policy author never imagined, a model version that behaves a little differently than the one you tested.

Governance is what sits between the agent and its actions while it is running, not the document you wrote before it shipped. It is the layer that evaluates every action at the moment the agent attempts it, against the policy in force right now, and either allows it, blocks it, or escalates it for human approval. The agent proposes. The governance layer disposes.

Runtime governance: every money-moving action is intercepted, checked against current policy, and recorded.

That separation has to be real and external, because an agent that polices itself is an agent that a single bad input can talk out of its own rules.
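A minimal sketch of such a runtime layer, under illustrative assumptions: the policy is a versioned object held outside the agent, every proposed payment is evaluated against whatever version is in force at that moment, and swapping the policy changes the next ruling without redeploying anything. GovernanceLayer, Policy and Ruling are hypothetical names, not a description of any product's implementation.

```python
import threading
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Policy:
    version: int
    spend_limit_cents: int           # hard ceiling: block above this
    merchant_allowlist: frozenset
    escalate_above_cents: int        # ask a human above this


@dataclass(frozen=True)
class Ruling:
    verdict: Verdict
    policy_version: int  # recorded so the decision can be audited later
    reason: str


class GovernanceLayer:
    """Sits between the agent and its tools. The agent proposes; this disposes."""

    def __init__(self, policy: Policy) -> None:
        self._policy = policy
        self._lock = threading.Lock()

    def update_policy(self, policy: Policy) -> None:
        # Takes effect on the next evaluation; the agent is not redeployed.
        with self._lock:
            if policy.version <= self._policy.version:
                raise ValueError("policy versions must increase")
            self._policy = policy

    def evaluate(self, merchant: str, amount_cents: int) -> Ruling:
        with self._lock:
            p = self._policy  # the policy in force right now
        if merchant not in p.merchant_allowlist:
            return Ruling(Verdict.BLOCK, p.version, "merchant not on allowlist")
        if amount_cents > p.spend_limit_cents:
            return Ruling(Verdict.BLOCK, p.version, "over spend limit")
        if amount_cents > p.escalate_above_cents:
            return Ruling(Verdict.ESCALATE, p.version, "needs human approval")
        return Ruling(Verdict.ALLOW, p.version, "within policy")

    def dispatch(self, merchant: str, amount_cents: int, execute) -> Ruling:
        # The payment tool is only ever called from here, and only on ALLOW.
        ruling = self.evaluate(merchant, amount_cents)
        if ruling.verdict is Verdict.ALLOW:
            execute(merchant, amount_cents)
        return ruling
```

The returned Ruling carries the policy version, which is exactly what an after-the-fact audit needs to answer "which rule allowed this?"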

This is the work Strongly was built around. Not a one-time review at deploy, but live guardrails over agent actions in production: every money-moving call intercepted and checked against current policy before it executes, spend limits and merchant allowlists and velocity thresholds enforced at runtime rather than trusted to the model, and an escalation path to a human that the agent cannot bypass. When policy changes, it changes in the governance layer and takes effect immediately, without redeploying the agent. The agent's reasoning is one thing. What it is permitted to actually do is governed somewhere it cannot reach.

The payoff

The payoff shows up the first time an agent tries to do something it should not. With runtime governance, the action is stopped, logged, and surfaced before any money moves. Without it, you find out from the chargeback.

Day two is where it gets real

Launch day, your transacting agent works. Day two is when the questions arrive that the demo never had to answer.

A customer disputes a charge the agent made. Can you reconstruct, with certainty, what the agent saw, what it decided, which policy version was in force, whether the governance layer allowed or escalated the action, and whether a human approved? If your audit trail is a log of model tokens, you cannot. Money-moving agents need an audit record that is independent of the model's reasoning trace, capturing the action, the idempotency key, the governing policy and its version, the allow, block, or escalate decision, and the human approval if one was required.

This is the other half of runtime governance. The guardrail decides what the agent may do; the audit trail is the permanent record of what it did and under whose authority. One without the other is incomplete. A gate with no record cannot be defended after the fact, and a record with no gate is just a transcript of damage.
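The audit record described above can be sketched as follows, assuming a simple in-memory ledger. The fields mirror the list in the text and deliberately contain no model tokens; chaining each entry to the previous entry's SHA-256 hash is our own addition for tamper evidence, not something the text prescribes. AuditRecord and AuditLedger are hypothetical names, and a production ledger would live in durable, access-controlled storage.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    action: str                 # e.g. "submit_order"
    params: dict                # what was attempted (merchant, amount, ...)
    idempotency_key: str
    policy_id: str
    policy_version: int
    decision: str               # "allow", "block" or "escalate"
    approved_by: Optional[str]  # the human approver, if escalation was required
    at: str                     # ISO-8601 timestamp


class AuditLedger:
    """Append-only, hash-chained log: editing any past entry breaks the chain."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list = []

    @staticmethod
    def _digest(prev: str, record: dict) -> str:
        body = json.dumps(record, sort_keys=True)  # stable serialization
        return hashlib.sha256((prev + body).encode()).hexdigest()

    def append(self, record: AuditRecord) -> str:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        data = asdict(record)
        digest = self._digest(prev, data)
        self.entries.append({"record": data, "prev": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        prev = self.GENESIS
        for e in self.entries:
            if e["prev"] != prev or self._digest(prev, e["record"]) != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

Tamper evidence is not tamper proofing: someone with write access could rewrite the whole chain, so the latest hash should also be anchored somewhere the writer cannot reach.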


That record is not a nice-to-have you add later. It is the thing your finance team, your auditors, and eventually your lawyers will live in, and it is why Strongly treats the governance layer and the audit ledger as one system rather than two. Every action the guardrail evaluates is the same action the ledger captures, which means the answer to "what did the agent do and was it allowed" is always reconstructable from a single source of truth.

Then there is the part nobody puts on a slide. Google's new intermediary, which it confusingly also calls an MCP server (a Merchant Commerce Platform, not the Model Context Protocol your engineers know), routes your agent's transactions through Google's infrastructure and gives Google a privileged view of the commerce that flows through it. A universal standard is genuinely useful. It also creates a dependency and a data-aggregation point you are choosing to build on. The convenience is real and so is the lock-in, and the time to weigh that is before you wire your spend through it, not after it becomes load-bearing.

What to do this quarter

You do not need a transacting agent in production next month. You do need to stop treating "the agent can buy things" as a feature you bolt on at the end. Concretely:

Classify every action your agents can take by reversibility and cost. If you cannot produce that list today, that is your first deliverable.

Make idempotency a requirement for any action that touches money, with durable state outside the model.

Build the approval gate as deterministic code with policy that lives outside the prompt, and default to no-action on silence.

Put a governance layer between the agent and its actions at runtime, not just at deploy. Every money-moving call should be evaluated against current policy and allowed, blocked, or escalated before it executes, in a layer the agent cannot reason its way past.

Stand up an audit ledger tied to that governance layer before the first real charge, so every action and the authority behind it is reconstructable from one source of truth.

Decide deliberately how much of your commerce flow you are willing to route through a single provider's protocol, and what your exit looks like.

Agentic commerce is coming whether your architecture is ready or not. The teams that win the next phase are not the ones with the flashiest shopping demo. They are the ones who can let an agent spend money on a Tuesday at 2am, govern every action it takes while it takes it, and prove on Wednesday exactly what it did and under whose authority. Guardrails in production and an audit trail to match are not paperwork you bolt on after launch. They are what makes the agent safe to turn loose in the first place. That is the work. That is what we do.
