We studied 1,000 entities across 17 LLMs and chatbots, using 5,070 prompts and collecting 86,190 answers, to understand what LLMs know versus what they say, and why attention inside AI looks wildly unequal.
Most executives now hear some version of this line every week: “We need to be more visible in AI.”
The problem is nobody agrees on what “visible” means.
Some teams run a few prompts in ChatGPT, count mentions, and call it a score. Others track whether a model can answer basic questions about their company and call that “awareness.” Both approaches sound reasonable until you try to use them to make decisions.
Because in LLMs, recognition is not the same as recommendation.
To move this from folklore to measurement, we ran a structured benchmark and built a simple composite index, an LLM Share of Mind score, that you can use like a standard.
Here’s what we did, what we found, and how you can use it.
What we measured and why
We tested 1,000 entities across 17 chatbots/LLMs, using 5,070 total prompts and generating 86,190 answers. Then we built a composite index that separates two things people constantly mix up:
- Recognition: do models know the entity exists?
- Recommendation (or “naming”): do models actually say the entity’s name when answering real questions?
In our dataset, those two measures are related but not identical. The correlation between them is about 0.653: strong, but nowhere near 1.0. That gap is where strategy lives.
The problem: “AI visibility” is measured like folklore
If you ask five teams how they measure AI visibility, you’ll often get one of these answers:
1) “We asked a few questions and counted mentions.”
This is easy. It’s also noisy.
A brand can show up simply because someone asked a vague prompt like:
- “How does this work?”
- “What are the rules?”
- “What should I consider?”
Counting mentions from a handful of prompts can confuse luck with share of mind.
2) “We tested if the model knows who we are.”
This is better but it’s still incomplete.
A model might “know” your company and still avoid naming you in answers. Or it might name you even when it can’t reliably identify you as a real entity (yes, that happens).
The thesis
Real “share of mind” in LLMs needs three ingredients:
- Breadth: how many models recognize you
- Depth: how often you get named in answers
- Stability: how confident we are, based on sample size and denominator reliability
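The article names the three ingredients but not an exact formula, so here is a minimal sketch of how they could be combined into one number. The 50/50 weighting and the pseudo-count of 20 in the stability term are illustrative assumptions, not the benchmark's actual method.

```python
# Illustrative composite "LLM Share of Mind" score.
# ASSUMPTION: the 50/50 weighting and the pseudo-count of 20 are made up
# for demonstration; the article names the ingredients, not the formula.

def share_of_mind(entity_num: int, q_num: int, q_den: int,
                  n_models: int = 17) -> float:
    """Combine breadth, depth, and a stability discount into one 0-1 score."""
    breadth = entity_num / n_models            # how many models recognize you
    depth = q_num / q_den if q_den else 0.0    # how often you get named
    # Stability: discount depth when the denominator (sample size) is small.
    stability = q_den / (q_den + 20)
    return 0.5 * breadth + 0.5 * depth * stability

# An entity recognized by 15/17 models, named 60 times in 170 opportunities:
score = share_of_mind(15, 60, 170)
```

The shrinkage term means an entity named once in two answers does not outrank an entity named 60 times in 170: small denominators get discounted until the evidence accumulates.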
The dataset: why this benchmark is different
We designed the benchmark to be large enough to reduce anecdote-driven conclusions.
Scope
- 1,000 entities (each one is a row)
- 17 models queried per entity (Entity Score denominator = 17)
- 5,070 total prompts (sum of questions asked across all entities)
- 4,950 unique question strings (prompts were not copy/paste clones)
- 86,190 answers (17 models x 5,070 prompts)
Two scores (defined clearly)
We measured two primary signals:
- Entity Score (Entity_num / 17)
  “How many of the 17 models recognize/know this entity?”
  If 15 models recognize the entity: Entity Score = 15/17
- Questions Score (Q_num / Q_den)
  “How often does the entity show up in answers to the question set?”
  If the entity is mentioned 60 times out of 170 answer opportunities: Questions Score = 60/170
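Both scores are simple ratios. A minimal sketch, using the worked examples above:

```python
# Minimal sketch of the two per-entity signals, using the worked examples.
N_MODELS = 17

def entity_score(entity_num: int) -> float:
    """Share of the 17 models that recognize the entity."""
    return entity_num / N_MODELS

def questions_score(q_num: int, q_den: int) -> float:
    """Share of answer opportunities in which the entity is actually named."""
    return q_num / q_den if q_den else 0.0

e = entity_score(15)          # 15 of 17 models -> ~0.882
q = questions_score(60, 170)  # 60 of 170 opportunities -> ~0.353
```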
A key note for credibility: denominators are not perfect in real life
Question volume varies by entity. Most entities were tested with 4–5 questions, some with 10.
And we detected 35 rows where Q_den ≠ 17 × Questions Count, meaning some answer opportunities were missing or returned errors. Most benchmarks never even look for this. We did, because stability matters.
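That consistency check is easy to automate. A sketch, with illustrative field names (the real dataset's column names may differ):

```python
# Flag rows whose answer denominator doesn't equal 17 models x question count.
# ASSUMPTION: the dict keys here are illustrative, not the dataset's schema.
N_MODELS = 17

rows = [
    {"entity": "A", "questions_count": 5, "q_den": 85},  # 17 * 5 = 85: clean
    {"entity": "B", "questions_count": 5, "q_den": 83},  # 2 opportunities lost
]

flagged = [r for r in rows if r["q_den"] != N_MODELS * r["questions_count"]]
```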
What the raw results reveal (and why one metric is misleading)
If you only take one thing from this article, make it this:
LLMs have two separate behaviors:
- They can recognize an entity,
- And they can surface an entity in answers.
Those are not the same behavior.
A. The “Known vs Named” reality check
When we cross-tabbed recognition vs naming, we got four groups:
- Aligned (Recognized & Recommended) = 617 entities: known and named (Entity > 0 and Q_num > 0)
- Ignored = 319 entities: known but never named (Entity > 0 and Q_num = 0)
- Misaligned = 28 entities: unknown but still named at least once (Entity = 0 and Q_num > 0)
- Invisible = 36 entities: unknown and never named (Entity = 0 and Q_num = 0)
Total: 1,000 entities.
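The four groups follow mechanically from two thresholds. A sketch of the classification, using the article's labels:

```python
# The four groups follow from two simple thresholds (labels from the article).
def classify(entity_num: int, q_num: int) -> str:
    if entity_num > 0 and q_num > 0:
        return "Aligned"      # known and named
    if entity_num > 0:
        return "Ignored"      # known but never named
    if q_num > 0:
        return "Misaligned"   # unknown but still named
    return "Invisible"        # unknown and never named
```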
This is the core reason a single metric fails. If you only measure “knowing,” you miss the 319 entities the models recognize but never recommend. If you only measure mentions, you miss whether the mentions are grounded or accidental.
B. The “Mention Cliff” is the default
Most entities barely show up in answers.
- Median Q_pct = 0.0235
  Meaning: half of entities appeared in answers less than ~2.4% of the time.
- 355 entities had Q_num = 0
  Meaning: they were never named in answers at all.
That’s not a rounding error. That’s the baseline reality: most brands and organizations are invisible in day-to-day LLM responses unless something pushes them into the model’s active set.
C. LLM awareness is long-tailed
Recognition across models is not evenly distributed.
- 64 entities were recognized by 0 out of 17 models
- Only 25 entities were recognized by 17 out of 17 models
- Median Entity_pct = 0.2353
Meaning: the “typical” entity was recognized ~24% of the time, or by about 4 of 17 models.
So “AI awareness” isn’t a smooth ranking. It’s a steep cliff.
D. Recognition and mention are related but distinct
Across all 1,000 entities, the correlation between Entity_pct and Q_pct was:
- Pearson correlation ≈ 0.6529
That’s strong enough to say the two measures move together. It’s also weak enough to guarantee this:
If you only measure one, you will mis-rank a lot of entities.
Most entities live in “the quiet zone”
Two numbers tell the story:
- 355 entities were never named at all in answers. That’s 35.5% of the full list.
- 319 entities were recognized by at least one model but still never named in answers.
So even when models “know” something, they often don’t bring it up.
There’s also a smaller, weird category:
- 28 entities were “unknown” by the recognition test but still got named at least once in answers.
That’s a clue that naming is influenced by more than clean “knowledge”, including wording, ambiguity, and how models complete text.
A quick guide to percentiles (one sentence)
A percentile is just a rank cut.
- 50th percentile = the middle entity
- 90th percentile = top 10%
- 99th percentile = top 1%
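For concreteness, a nearest-rank percentile fits in a few lines of Python; the mention rates below are made-up illustration values, not benchmark data.

```python
import math

# Nearest-rank percentile: the smallest value with at least p% of the data
# at or below it. The mention rates are made-up illustration values.
def percentile(values, p):
    ordered = sorted(values)
    k = max(math.ceil(p / 100 * len(ordered)), 1) - 1
    return ordered[k]

mention_rates = [0.0, 0.0, 0.01, 0.02, 0.03, 0.05, 0.11, 0.25, 0.40, 0.82]
median = percentile(mention_rates, 50)  # the middle entity
top10 = percentile(mention_rates, 90)   # the top-10% cut
```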
Now let’s look at the shape.
Curve #1: Recognition is unequal (but not insane)
- Bottom 25%: recognized by about 3 of 17 models
- Middle (median): recognized by about 4 of 17 models
- Top 25%: recognized by about 10 of 17 models
- Top 10%: recognized by about 14 of 17 models
- Top 5%: recognized by about 16 of 17 models
- Top 1%: recognized by 17 of 17 models
A few more anchors across the 1,000 entities:
- 64 entities were recognized by 0 of 17 models (completely unknown in this panel).
- 25 entities were recognized by all 17 models.
What this means: recognition is uneven, but it rises in a fairly steady way as you move up the rankings.
Curve #2: Mentions are extremely unequal
Now look at mentions in answers. This is where the “attention inequality” analogy really fits.
- Bottom 25%: 0% mention rate (not named at all)
- Middle (median): named in about 2.35% of answers
- Top 25%: named in about 10.6% of answers
- Top 10%: named in about 39.7% of answers
- Top 5%: named in about 59.4% of answers
- Top 1%: named in about 82.4% of answers
- Max: 100% (always named)
Here’s the simplest way to feel the difference:
- The typical (median) entity gets named in about 2 out of every 100 answers.
- A top 10% entity gets named in about 40 out of every 100 answers.
That’s roughly a 17× jump in “share of voice” from the middle to the top tier.
This is the cliff.
Recognition rises like a slope. Mentions rise like a wall.
Put the two curves together: “memory” vs “speech”
For clarity:
- Recognition is memory. The model has “heard of you.”
- Mentions are speech. The model chooses to say your name when someone asks.
Most of the competitive battle happens in the second step.
And the data says that step is highly concentrated:
- A small group gets named constantly.
- A large group barely gets named or never gets named.
Why this happens
The benchmark can’t prove motives, but the pattern is consistent with how LLMs behave:
1) Models default to “safe” names
When a user asks broad questions (“best,” “who,” “where”), models tend to pull from a short list of familiar, high-confidence entities.
2) Relevance beats awareness
A model can know your brand, but if the question doesn’t strongly point to your category, it won’t risk naming you.
3) Naming consistency matters more than people think
Small variations (different spellings, punctuation, suffixes, domains) can split the signal. This can lower both recognition and mentions.
4) Attention is path-dependent
Once an entity becomes a common answer, it keeps getting reinforced: more mentions → more expectedness → more mentions. That’s how power-law distributions form.
What a C-level leader should do with this
1) Stop using one metric to answer two questions
You need two dashboards:
- Recognition: “Are we known?”
- Mentions: “Are we being chosen?”
If you only track recognition, you can miss the real problem: being known but never surfaced.
2) Use percentiles as your planning language
Percentiles make this real without overthinking:
- Median mentions: ~2.35%
- Top 25% mentions: ~10.6%
- Top 10% mentions: ~39.7%
If you’re near the middle and your goal is “top tier,” that’s not a small lift. It’s a different league.
3) Treat AI visibility like distribution, not branding
In unequal systems, progress often looks like this:
- Nothing changes for a while…
- Then you cross a threshold and results accelerate.
So your plan should look like a measurement loop:
- Test often,
- Compare across multiple models (not just one) with a tool like LLMtel.com,
- Reduce naming variance,
- Make category relevance obvious in public-facing language.
A transparent note on data quality
Most entities had 4–5 questions, some had 10. That changes the number of opportunities to be mentioned.
Bottom line
If you’re building a brand in the AI era, don’t start with “Do LLMs know us?”
Start with two questions:
- Do they recognize us?
- Do they choose to say our name?
Our benchmark shows that the second one, being named, is where attention inequality is most extreme. And that’s where market advantage will compound.