Research · 8 min read

Why General-Purpose Embeddings Fail at Human Matching

And what outcome-trained representations get right

Jonathan Politzki
Founder

The embedding models that power most search and recommendation systems today were trained with a single objective: make semantically similar text produce similar vectors. This works remarkably well for document retrieval and semantic search. It fails, often silently, when applied to human matching.

The similarity trap

Consider a recruiting platform that uses off-the-shelf embeddings to match candidates to roles. The system will reliably surface candidates whose resumes contain the same keywords as the job description. A posting for "Senior React Engineer" will return profiles that mention React, JavaScript, and frontend development.
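The mechanics of this are easy to see in a toy model. The sketch below uses a bag-of-words "embedding" with cosine similarity; real embedding models are dense neural encoders, but shared vocabulary drives their similarity scores in much the same way. The job text and candidate profiles are invented for illustration.

```python
import math
from collections import Counter

def bow_vector(text: str) -> Counter:
    # Toy bag-of-words "embedding". Stand-in for a dense encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

job = "senior react engineer javascript frontend"
candidates = {
    "keyword_match": "react javascript frontend engineer with senior experience",
    "deeper_fit": "built and mentored small product teams shipping web interfaces",
}
ranked = sorted(candidates,
                key=lambda c: cosine(bow_vector(job), bow_vector(candidates[c])),
                reverse=True)
print(ranked)  # "keyword_match" ranks first; "deeper_fit" scores zero overlap
```

The candidate who restates the job description wins the ranking, while the profile describing relevant experience in different words is invisible to the similarity score.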

But keyword overlap is not what predicts a successful hire. The candidates who stay longest and perform best are often those whose deeper traits (working style, growth trajectory, cultural fit) align with the team and role in ways that surface-level text matching cannot capture.

This is the fundamental gap. General-purpose embeddings optimize for a proxy (text similarity) rather than the actual outcome (successful match).

What outcome-trained embeddings look like

Our approach starts from a different premise. Instead of training on text similarity, we train dual-encoder models on labeled outcomes: hires that lasted, dates that converted to second dates, founders who closed term sheets with specific investors.

The training signal is not "these two texts are similar" but "these two entities had a successful interaction." This produces a fundamentally different geometry in the embedding space, one that encodes compatibility rather than surface similarity.
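One common way to train on that signal is an in-batch contrastive (InfoNCE-style) loss, where each observed successful pair is pulled together and every other pairing in the batch serves as a negative. The sketch below shows the shape of that objective on pre-encoded toy vectors; the vectors, batch size, and temperature are illustrative assumptions, not our actual training setup.

```python
import math

def info_nce_loss(cand_vecs, role_vecs, temperature=0.1):
    """In-batch contrastive loss: row i of cand_vecs and row i of role_vecs
    form a positive pair (a successful real-world outcome); all other rows
    in the batch act as negatives. Hyperparameters here are illustrative."""
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    loss = 0.0
    for i, c in enumerate(cand_vecs):
        logits = [dot(c, r) / temperature for r in role_vecs]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        loss += -(logits[i] - log_denom)  # cross-entropy toward the matched role
    return loss / len(cand_vecs)

# Toy batch of 2 outcome pairs, already encoded by the two towers
cands = [[1.0, 0.0], [0.0, 1.0]]
roles = [[0.9, 0.1], [0.1, 0.9]]
print(round(info_nce_loss(cands, roles), 4))
```

Because gradients flow from the outcome label rather than from text reconstruction, the encoders are free to place textually dissimilar but compatible entities close together.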

Measuring the difference

On our internal benchmarks, domain-specific embeddings trained on outcome data achieve 84.3% NDCG@10 on compatibility retrieval tasks, compared to 52-69% for leading general-purpose models including OpenAI, Gemini, and Voyage. The gap is not marginal. It reflects a structural difference in what the models have learned to represent.
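For readers unfamiliar with the metric: NDCG@10 scores the top ten retrieved items by graded relevance, discounted by rank position and normalized against the ideal ordering. A minimal sketch (the simplified form that computes the ideal from the retrieved list itself; the benchmark numbers above come from our internal evaluation, not this toy):

```python
import math

def ndcg_at_k(relevances, k=10):
    """relevances: graded relevance scores of retrieved items, in ranked
    order. Simplified: the ideal ordering is taken from this same list."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg else 0.0

# A perfectly ordered result list scores 1.0; burying relevant items costs score.
print(ndcg_at_k([3, 2, 1, 0]))  # → 1.0
print(round(ndcg_at_k([0, 1, 3, 2]), 3))
```

A jump from the 52-69% range to 84.3% on this metric means relevant matches move from the bottom of the top ten toward the first positions users actually see.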

Implications

For any platform where the quality of matches directly impacts business outcomes, the choice of embedding model is not a commodity decision. It is the core piece of infrastructure that determines whether your system matches on keywords or on compatibility.

We are building this infrastructure as a platform: embeddings and rerankers adapted for specific matching domains, deployable via API, and trained on the outcomes that matter to each customer.