Why General-Purpose Embeddings Fail at Human Matching
And what outcome-trained representations get right
The embedding models that power most search and recommendation systems today were trained with a single objective: make semantically similar text produce similar vectors. This works remarkably well for document retrieval and semantic search. It fails, often silently, when applied to human matching.
The similarity trap
Consider a recruiting platform that uses off-the-shelf embeddings to match candidates to roles. The system will reliably surface candidates whose resumes contain the same keywords as the job description. A posting for "Senior React Engineer" will return profiles that mention React, JavaScript, and frontend development.
But keyword overlap is not what predicts a successful hire. The candidates who stay longest and perform best are often those whose deeper traits, such as working style, growth trajectory, and cultural fit, align with the team and role in ways that surface-level text matching cannot capture.
This is the fundamental gap. General-purpose embeddings optimize for a proxy (text similarity) rather than the actual outcome (successful match).
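A toy example makes the proxy concrete. Below, a bag-of-words cosine stands in for a text-similarity embedding (real embedding models are far more sophisticated, but the failure mode is the same); all the sample text is invented for illustration:

```python
# Illustration of the similarity trap: a text-similarity score
# rewards keyword overlap, not compatibility.
import math
from collections import Counter

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

job = "senior react engineer react javascript frontend"
keyword_match = "react javascript frontend developer react"
deep_match = "led small product teams mentored juniors shipped ui platforms"

# The keyword-stuffed profile wins on text similarity every time.
print(bow_cosine(job, keyword_match) > bow_cosine(job, deep_match))  # True
```

The second candidate may be the better hire, but a similarity-trained model has no way to know that.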
What outcome-trained embeddings look like
Our approach starts from a different premise. Instead of training on text similarity, we train dual-encoder models on labeled outcomes: hires that lasted, dates that converted to second dates, founders who closed term sheets with specific investors.
The training signal is not "these two texts are similar" but "these two entities had a successful interaction." This produces a fundamentally different geometry in the embedding space, one that encodes compatibility rather than surface similarity.
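One common way to implement that signal is an in-batch contrastive (InfoNCE) loss, where each positive pair is two entities with a successful recorded outcome and every other entity in the batch serves as a negative. The sketch below is a minimal NumPy illustration of that loss, not our production training code:

```python
# Sketch of a dual-encoder training signal: positives are
# outcome-labeled pairs (e.g. a hire that lasted), not similar texts.
import numpy as np

def info_nce_loss(query_emb, match_emb, temperature=0.05):
    """query_emb[i] and match_emb[i] form a successful pair; all
    other rows in the batch act as in-batch negatives."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    m = match_emb / np.linalg.norm(match_emb, axis=1, keepdims=True)
    logits = q @ m.T / temperature               # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Maximize the probability of the true match i -> i.
    return -float(np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
loss = info_nce_loss(rng.normal(size=(8, 64)), rng.normal(size=(8, 64)))
print(loss)
```

Minimizing this loss pulls outcome-linked pairs together and pushes everything else apart, which is exactly the geometry shift described above.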
Measuring the difference
On our internal benchmarks, domain-specific embeddings trained on outcome data achieve 84.3% NDCG@10 on compatibility retrieval tasks, compared with 52-69% for leading general-purpose models from OpenAI, Google (Gemini), and Voyage AI. The gap is not marginal. It reflects a structural difference in what the models have learned to represent.
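For readers unfamiliar with the metric, NDCG@10 scores a ranked list by how early the relevant items appear, normalized so 1.0 means a perfect ranking. A minimal implementation, with illustrative outcome labels (1 = successful match):

```python
# NDCG@10 on a single ranked list: 1.0 means successful matches are
# ranked first; lower values mean they are buried in the results.
import math

def ndcg_at_k(relevances, k=10):
    """relevances: outcome labels in the order the model ranked them."""
    def dcg(rels):
        # Discounted cumulative gain: later positions count for less.
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

# A ranking that buries two of the three successful matches:
print(round(ndcg_at_k([0, 1, 0, 0, 1, 1, 0, 0, 0, 0]), 3))  # 0.645
```

Per-query scores are averaged over the benchmark to produce the aggregate figures quoted above.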
Implications
For any platform where the quality of matches directly impacts business outcomes, the choice of embedding model is not a commodity decision. It is the core piece of infrastructure that determines whether your system matches on keywords or on compatibility.
We are building this infrastructure as a platform: embeddings and rerankers adapted for specific matching domains, deployable via API, and trained on the outcomes that matter to each customer.