Yesterday we competed at the AGI House Internet of Agents Build Day on the simulated and RL track. The project is open source, and the live simulation runs above. The framing is the same one we have been building Jean Technologies around: matching is the critical piece of infrastructure for whatever comes next, and the next thing is agents.
The premise
The preamble for the build day was Anthropic's Project Deal [Anthropic 2026], a study of whether AI agents can negotiate transactions on humans' behalf. Their setup: 69 employees, a custom Slack marketplace, Claude variants representing each side. The result that stuck with us was not that the agents could close deals. It was that participants represented by weaker models lost value to participants represented by stronger ones, and the losers could not tell. The disadvantage was invisible to the people experiencing it.
The premise of the build day extended this one step further. Assume the future everyone is converging on: roughly 100 agents per person, acting in your name, shopping, scheduling, negotiating, hiring, recommending. That premise has two implications nobody has built infrastructure for. You will need to find the right agents. You will need to know which ones to trust. Both look like matching problems.
What we built
We extended Project Deal into a domain with explicit information asymmetry: used-car negotiations, where the seller knows the true vehicle state (mileage, accident history, hidden defects) and the buyer only sees the listing. Buyers can ask questions, which sellers can deflect or lie about, or pay $150 for a targeted inspection that reveals a specific fact. Each session is a full dialogue between two LLM agents, scored by how much above true value the buyer paid.
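To make the scoring concrete, here is a minimal sketch of how one session could be scored. The `Session` fields and function names are our illustration, not the schema used in the agent-trade repo, and counting inspection fees toward the buyer's total cost is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

INSPECTION_FEE = 150  # dollars per targeted inspection, as in the setup above


@dataclass
class Session:
    """Hypothetical record of one buyer/seller dialogue (field names are assumptions)."""
    true_value: float       # what the car is worth given its hidden state
    closed: bool            # whether the two agents reached a deal
    price_paid: float = 0.0
    inspections: int = 0    # number of targeted inspections the buyer bought


def premium_over_true_value(s: Session) -> Optional[float]:
    """Fraction by which the buyer overpaid; None if the buyer walked away."""
    if not s.closed:
        return None
    return (s.price_paid - s.true_value) / s.true_value


def buyer_total_cost(s: Session) -> float:
    """Purchase price (if any) plus inspection fees.

    Assumption: fees are owed even when the buyer walks away.
    """
    price = s.price_paid if s.closed else 0.0
    return price + INSPECTION_FEE * s.inspections
```

With made-up numbers, a buyer who pays $12,000 for a car truly worth $10,000 has a premium of 0.2, or +20%; a buyer who walks away has no premium but still carries any inspection fees.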
The simulation above is the live system. Buyers are circles, sellers are squares. Above the waterline is the public listing. Below is the private state. The transcript replay, heatmaps, and persona controls all run against the same underlying session data. Code is at github.com/jonathan-politzki/agent-trade.
The finding
Two results from the sessions matter for what we are building.
First, model choice dominates persona. Against a consistent "slimy" seller, Gemini-2.5-flash closed 80% of its sessions at a +27.9% premium over true value. Claude Opus and Haiku closed 40% of theirs at +14–15% premiums. The buyer's persona (grandma, casual, engineer, mechanic) mattered less than the model running it.
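A comparison like this comes down to grouping sessions by buyer model and computing a close rate and a mean premium per group. The sketch below assumes each session is reduced to a hypothetical `(model, persona, premium)` tuple, where `premium` is `None` when the buyer walked away. It is not the analysis code from the repo.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_model(sessions):
    """Close rate and mean premium per buyer model.

    `sessions` is an iterable of (model, persona, premium) tuples, where
    premium is a fraction over true value, or None for a walk-away.
    The tuple layout is an assumption made for this sketch.
    """
    grouped = defaultdict(list)
    for model, _persona, premium in sessions:
        grouped[model].append(premium)

    summary = {}
    for model, premiums in grouped.items():
        closed = [p for p in premiums if p is not None]
        summary[model] = {
            "close_rate": len(closed) / len(premiums),
            "mean_premium": mean(closed) if closed else None,
        }
    return summary
```

Grouping on persona instead of model (swap the key in the loop) gives the other half of the comparison.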
Second, and this is the part worth naming: on a severely defective car hiding $12,300 in problems, most agents walked away. One did not. Gemini-2.5-flash with an "engineer" persona closed at +98% over true value after a single inspection. The inspection revealed a problem; the model talked itself into the deal anyway. The buyer's owner, the human who delegated this to the agent, has no way to know any of this happened. The transaction completes, the agent reports back, and the $12,300 worth of damage shows up later.
This is the same invisibility result Anthropic flagged in Project Deal, sharpened. Bad agent decisions are not loud. They look like normal closed deals.
Matching agents
Our existing thesis at Jean Technologies is that matching humans to humans is poorly served by general-purpose embedding similarity. Outcome-trained representations beat surface similarity because compatibility is not the same as resemblance. The same argument applies one layer up, more sharply, when the things being matched are agents.
For each user, the 100 agents acting on their behalf will not all be from one provider, will not all have the same risk profile, and will not all be appropriate for the same tasks. Picking which agent to send into which negotiation is a matching problem with the same shape as picking a candidate for a role. The features that matter are not the agent's marketing copy. They are its behavior under specific conditions, against specific counterparties, in specific domains. The Gemini result above is exactly the kind of feature you would want surfaced before deploying that agent against a similar counterparty.
And once both sides of a transaction are agents, the matching problem composes. We will need infrastructure that asks: given this user's preferences, which of their agents should handle this; given that this agent is handling it, which counterparty agent on the other side is appropriate to negotiate with.
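As a rough sketch of how the two questions compose, the routing step can be written as two selections in sequence. Everything here is hypothetical: `score_for_task` and `score_pair` stand in for learned predictors that we do not show, and agents are represented as opaque values.

```python
from typing import Callable, Sequence, Tuple, TypeVar

A = TypeVar("A")  # one of the user's own agents
C = TypeVar("C")  # a counterparty agent
T = TypeVar("T")  # a task description


def route(
    task: T,
    my_agents: Sequence[A],
    counterparties: Sequence[C],
    score_for_task: Callable[[A, T], float],
    score_pair: Callable[[A, C], float],
) -> Tuple[A, C]:
    """Pick the user's agent for the task, then a counterparty for that agent.

    Both scoring functions are placeholders for predicted-outcome models.
    """
    # Stage 1: which of the user's agents should handle this task?
    agent = max(my_agents, key=lambda a: score_for_task(a, task))
    # Stage 2: given that agent, which counterparty is appropriate?
    counterparty = max(counterparties, key=lambda c: score_pair(agent, c))
    return agent, counterparty
```

The second stage depends on the first, which is what makes the problem compose: the best counterparty is only defined relative to the agent already chosen.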
Trust is the missing layer
The finding we did not expect from the competition is that matching is necessary but not sufficient. Even a well-matched agent can take you for $12,300 if you cannot verify what it has actually done.
Humans solve this through references. When you hire someone, the credential and the backchannel are doing most of the work. The resume tells you what to consider; the reference tells you whether to commit. Agents do not have an analog yet. There is no equivalent of "I worked with this agent on five negotiations and here is how it handled adversarial counterparties." The infrastructure that would carry that signal does not exist.
The shape of what is missing has three layers. Identity: you need to know which agent is actually acting, not just which provider it claims to be served by. Capability: you need verifiable signal on how this agent has behaved on tasks like the one you are about to delegate. Compatibility: given identity and capability, which agent to pair with which counterparty, which user, which task. The first two are the trust gap. The third is the matching problem we have already been working on.
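One way to picture the first two layers is as plain records: an identity that pins down which agent acted, and references that log how it behaved. This is a sketch under our own assumptions. The class and field names are invented for illustration, and how such records would be issued or verified is left open.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Optional


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str   # stable identifier for the specific agent that acted
    provider: str   # who claims to serve it; on its own this is only a claim


@dataclass(frozen=True)
class Reference:
    """One past interaction, the agent analog of a hiring reference."""
    agent_id: str
    domain: str                # e.g. "used-car negotiation"
    counterparty_style: str    # e.g. "adversarial"
    premium: Optional[float]   # fraction overpaid; None if the agent walked away


def capability_signal(refs: Iterable[Reference], agent_id: str, domain: str) -> dict:
    """Summarize how an agent has behaved on tasks in a given domain."""
    relevant = [r for r in refs if r.agent_id == agent_id and r.domain == domain]
    closed = [r.premium for r in relevant if r.premium is not None]
    return {
        "n_references": len(relevant),
        "mean_premium": mean(closed) if closed else None,
    }
```

Compatibility, the third layer, would then consume these summaries as input features rather than relying on the agent's own description of itself.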
What comes next
What we want to build, and what the agent-trade prototype first sketched, is a matching layer that treats agents as first-class entities with the same kind of outcome-trained representation we apply to people. An agent's embedding should encode how it behaves under information asymmetry, not its specification card. Two agents are compatible counterparties if their predicted interaction (negotiation, hand-off, collaboration) produces good outcomes for both underlying humans, not because their tags match.
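The contrast between tag matching and outcome-based compatibility can be sketched in a few lines. Both functions are illustrative only: the predicted values would come from an outcome-trained model that is not shown here, and taking the minimum is just one simple way to encode "good for both humans".

```python
def tag_similarity(tags_a: set, tags_b: set) -> float:
    """Surface resemblance: Jaccard overlap of two agents' descriptive tags."""
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 0.0


def outcome_compatibility(predicted_value_a: float, predicted_value_b: float) -> float:
    """Compatibility as the worse of the two humans' predicted outcomes.

    Inputs are hypothetical model predictions of how each underlying human
    fares if these two agents interact. Using min means a pairing scores
    well only when neither side is predicted to lose out.
    """
    return min(predicted_value_a, predicted_value_b)
```

Two agents with identical tags can still score poorly on the second measure if one of them is predicted to extract value from the other's owner, and that is the case tag matching cannot see.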
This is the same primitive, pointed at a new substrate. The infrastructure does not change. The matching problem on the agent layer is harder than the matching problem on the human layer, because the population is bigger, faster-moving, and harder to inspect. It is also, for that reason, the layer where good infrastructure is worth the most.
If you are building agent infrastructure and the matching or trust layer is on your roadmap, we would like to talk.
References
- Anthropic (2026). Project Deal: AI Agents in a Marketplace. anthropic.com/features/project-deal
- (2026). Agent Trade: Asymmetric Information Negotiation Simulation. AGI House Internet of Agents Build Day. github.com/jonathan-politzki/agent-trade