
The Shape of LLM Awareness

We studied 1,000 entities across 17 LLMs and chatbots, using 5,070 prompts and collecting 86,190 answers, to understand what LLMs know versus what they say, and why attention inside AI looks wildly unequal.

Most executives now hear some version of this line every week: “We need to be more visible in AI.”

The problem is nobody agrees on what “visible” means.

Some teams run a few prompts in ChatGPT, count mentions, and call it a score. Others track whether a model can answer basic questions about their company and call that “awareness.” Both approaches sound reasonable until you try to use them to make decisions.

Because in LLMs, recognition is not the same as recommendation.

To move this from folklore to measurement, we ran a structured benchmark and built a simple composite index, an LLM Share of Mind score, that you can use like a standard.

Here’s what we did, what we found, and how you can use it.

What we measured and why

We tested 1,000 entities across 17 chatbots/LLMs using 5,070 total prompts, generating 86,190 answers. Then we built a composite index that separates two things people constantly mix up:

Recognition: does the model know the entity and can it correctly identify it?
Mention: does the model actually name the entity in its answers?

In our dataset, those two measures are related, but not identical. The correlation between them is about 0.653: strong, but nowhere near 1.0. That gap is where strategy lives.
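For readers who want to see the mechanics, here is a minimal sketch of how two signals like these can be rolled into one composite score. The column names Entity_pct and Q_pct follow the benchmark; the equal weighting and the example rows are illustrative assumptions, not the index's actual formula.

```python
# Minimal sketch: combine a recognition score and a mention score into one
# composite "Share of Mind" index. The 50/50 weighting and the example rows
# are assumptions for illustration; they are not the benchmark's exact formula.
import pandas as pd

df = pd.DataFrame({
    "entity":     ["Acme Corp", "Globex", "Initech"],   # hypothetical entities
    "Entity_pct": [94.1, 70.6, 11.8],                    # % of models that recognize the entity
    "Q_pct":      [38.0, 2.1, 0.0],                      # % of answer opportunities where it is named
})

weights = {"Entity_pct": 0.5, "Q_pct": 0.5}              # assumed weighting
df["share_of_mind"] = sum(df[col] * w for col, w in weights.items())

print(df.sort_values("share_of_mind", ascending=False))
```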

The problem: “AI visibility” is measured like folklore

If you ask five teams how they measure AI visibility, you’ll often get one of these answers:

1) “We asked a few questions and counted mentions.”

This is easy. It’s also noisy.
A brand can show up for reasons that have little to do with genuine share of mind: a lucky phrasing, an ambiguous question, or the way a model happens to complete text.

Counting mentions from a handful of prompts can confuse luck with share of mind.

2) “We tested if the model knows who we are.”

This is better, but it’s still incomplete.
A model might “know” your company and still avoid naming you in answers. Or it might name you even when it can’t reliably identify you as a real entity (yes, that happens).

The thesis

Real “share of mind” in LLMs needs three ingredients:

Recognition: the model can correctly identify who you are.
Mention: the model actually names you in relevant answers.
Scale: enough prompts, across enough models, to separate signal from luck.

The dataset: why this benchmark is different

We designed the benchmark to be large enough to reduce anecdote-driven conclusions.

Scope

1,000 entities, 17 chatbots/LLMs, 5,070 prompts, 86,190 answers.

Two scores (defined clearly)

We measured two primary signals:

Entity_pct: the percentage of models that correctly recognize the entity.
Q_pct: the percentage of answer opportunities in which the entity is actually named.

A key note for credibility: denominators are not perfect in real life

Question volume varies by entity. Most entities were tested with 4–5 questions, some with 10.

And we detected 35 rows where Q_den ≠ 17 × Questions Count, meaning some answer opportunities were missing or returned errors. Most benchmarks never even look for this. We did, because stability matters.
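For illustration, here is a minimal sketch of that check in Python. The file name and column names (Q_den, questions_count) are assumptions about how an export might look, not the benchmark's actual schema.

```python
# Minimal sketch of the denominator sanity check described above: every
# question should yield 17 answer opportunities (one per model), and any row
# where that doesn't hold is flagged. File and column names are assumptions.
import pandas as pd

N_MODELS = 17

df = pd.read_csv("benchmark_results.csv")                 # hypothetical export
expected = N_MODELS * df["questions_count"]
flagged = df[df["Q_den"] != expected]

print(f"{len(flagged)} rows with missing or errored answer opportunities")
```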

What the raw results reveal (and why one metric is misleading)

If you only take one thing from this article, make it this:
LLMs have two separate behaviors:

Knowing who you are when asked directly (recognition).
Naming you unprompted in their answers (mention).

Those are not the same behavior.

A. The “Known vs Named” reality check

When we cross-tabbed recognition vs naming, we got four groups: known and named, known but never named, named but not reliably known, and neither.

Total: 1,000 entities.

This is the core reason a single metric fails. If you only measure “knowing,” you miss the 319 entities the models recognize but never recommend. If you only measure mentions, you miss whether the mentions are grounded or accidental.
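If you want to reproduce this kind of cut on your own data, a minimal sketch follows. The recognition cutoff, the "named at least once" rule, and the file/column names are assumptions; the benchmark's exact definitions may differ.

```python
# Minimal sketch of the "Known vs Named" cross-tab. The 50% recognition
# cutoff and the file/column names are illustrative assumptions.
import pandas as pd

df = pd.read_csv("benchmark_results.csv")      # hypothetical export
df["is_known"] = df["Entity_pct"] > 50         # assumed recognition cutoff
df["is_named"] = df["Q_pct"] > 0               # named in at least one answer

quadrants = pd.crosstab(df["is_known"], df["is_named"])
print(quadrants)                               # four groups, summing to 1,000 entities
```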

B. The “Mention Cliff” is the default

Most entities barely show up in answers.

That’s not a rounding error. That’s the baseline reality: most brands and organizations are invisible in day-to-day LLM responses unless something pushes them into the model’s active set.

C. LLM awareness is long-tailed

Recognition across models is not evenly distributed.

So “AI awareness” isn’t a smooth ranking. It’s a steep cliff.

D. Recognition and mention are related but distinct

Across all 1,000 entities, the correlation between Entity_pct and Q_pct was about 0.653.

That’s strong enough to say the two measures move together. It’s also weak enough to guarantee this:

If you only measure one, you will mis-rank a lot of entities.
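A minimal sketch of that calculation, assuming a per-entity export with the two columns named as in this article (the benchmark doesn't state which correlation measure was used; the sketch uses pandas' default, Pearson):

```python
# Minimal sketch of the recognition-vs-mention correlation. The input file
# is a hypothetical export; .corr() uses Pearson correlation by default.
import pandas as pd

df = pd.read_csv("benchmark_results.csv")
r = df["Entity_pct"].corr(df["Q_pct"])
print(f"correlation between recognition and mention: {r:.3f}")   # ~0.653 in our data
```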

Most entities live in “the quiet zone”

Two numbers tell the story:

So even when models “know” something, they often don’t bring it up.

There’s also a smaller, weird category: entities that get named in answers even though the models can’t reliably recognize them as real entities.

That’s a clue that naming is influenced by more than clean “knowledge”: wording, ambiguity, and how models complete text all play a role.

A quick guide to percentiles (one sentence)

A percentile is just a rank cut: the 90th percentile mention score is the value that 90% of entities fall at or below.
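If you want percentile anchors from your own results, here is a minimal sketch, assuming the same hypothetical export and a Q_pct column:

```python
# Minimal sketch of percentile cuts over the mention score. File and column
# names are assumptions; the percentile levels are just common anchors.
import numpy as np
import pandas as pd

df = pd.read_csv("benchmark_results.csv")
for p in (25, 50, 75, 90, 99):
    cut = np.percentile(df["Q_pct"], p)
    print(f"p{p}: named in {cut:.1f}% of answer opportunities")
```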

Now let’s look at the shape.

Curve #1: Recognition is unequal (but not insane)

Here’s what recognition looks like across the 1,000 entities:

A few more anchors:

What this means: recognition is uneven, but it rises in a fairly steady way as you move up the rankings.

Curve #2: Mentions are extremely unequal

Now look at mentions in answers. This is where the “attention inequality” analogy really fits.

Here’s the simplest way to feel the difference:

That’s roughly a 17× jump in “share of voice” from the middle to the top tier.

This is the cliff.
Recognition rises like a slope. Mentions rise like a wall.

Put the two curves together: “memory” vs “speech”

For clarity: recognition is “memory” (does the model know you exist?), and mentions are “speech” (does the model actually say your name when it answers?).

Most of the competitive battle happens in the second step.

And the data says that step is highly concentrated: the top tier gets named roughly 17× more often than an entity in the middle of the pack.

Why this happens

The benchmark can’t prove motives, but the pattern is consistent with how LLMs behave:

1) Models default to “safe” names

When a user asks broad questions (“best,” “who,” “where”), models tend to pull from a short list of familiar, high-confidence entities.

2) Relevance beats awareness

A model can know your brand, but if the question doesn’t strongly point to your category, it won’t risk naming you.

3) Naming consistency matters more than people think

Small variations in how your name appears (different spellings, punctuation, suffixes, domains) can split the signal. This can lower both recognition and mentions.
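One practical countermeasure is to normalize names before counting. A minimal sketch follows; the suffix list and regex rules here are illustrative assumptions, not a complete solution.

```python
# Minimal sketch of name normalization before counting mentions, so that
# "Acme, Inc.", "ACME Inc" and "acme.com" don't split one entity's signal.
# The suffix list and regex rules are illustrative assumptions.
import re

LEGAL_SUFFIXES = {"inc", "llc", "ltd", "gmbh", "corp", "co"}

def normalize_name(name: str) -> str:
    name = name.lower().strip()
    name = re.sub(r"\.(com|io|ai|org)$", "", name)   # drop common domain endings
    name = re.sub(r"[^\w\s]", " ", name)             # strip punctuation
    tokens = [t for t in name.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)

assert normalize_name("Acme, Inc.") == normalize_name("acme.com") == "acme"
```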

4) Attention is path-dependent

Once an entity becomes a common answer, it keeps getting reinforced: more mentions → more expectedness → more mentions. That’s how power-law distributions form.
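You can see that loop in a toy simulation. This is not a model of the benchmark data, just an illustration of how a rich-get-richer process concentrates attention:

```python
# Toy rich-get-richer simulation (a Polya urn): each new mention goes to an
# entity with probability proportional to its current mention count.
import random
from collections import Counter

random.seed(42)
urn = list(range(1000))            # 1,000 entities, one "mention" each to start

for _ in range(100_000):           # hand out new mentions one at a time
    pick = random.choice(urn)      # proportional to current mention counts
    urn.append(pick)               # the winner becomes a little more likely next time

counts = sorted(Counter(urn).values(), reverse=True)
print(f"most-mentioned entity: {counts[0]} mentions")
print(f"median entity: {counts[len(counts) // 2]} mentions")
```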

What a C-level leader should do with this

1) Stop using one metric to answer two questions

You need two dashboards: one that tracks recognition (do the models know who you are?) and one that tracks mentions (do the models actually name you in answers?).

If you only track recognition, you can miss the real problem: being known but never surfaced.

2) Use percentiles as your planning language

Percentiles make this real without overthinking:

If you’re near the middle and your goal is “top tier,” that’s not a small lift. It’s a different league.

3) Treat AI visibility like distribution, not branding

In unequal systems, progress often looks like this:

So your plan should look like a measurement loop: benchmark where you stand, make a targeted change, re-run the same prompts, and compare percentiles. A sketch of that loop follows.
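As a sketch only, here is the shape of that loop in code. run_prompts and score_mentions are hypothetical placeholders for your own tooling, not functions from the benchmark:

```python
# Minimal sketch of a measurement loop. run_prompts() and score_mentions()
# are hypothetical placeholders; only the loop structure is the point.
import numpy as np

def run_prompts(prompts: list[str], models: list[str]) -> list[str]:
    """Hypothetical: query each model with each prompt, return all answers."""
    raise NotImplementedError

def score_mentions(answers: list[str], entity: str) -> float:
    """Hypothetical: % of answer opportunities in which the entity is named."""
    raise NotImplementedError

def measurement_cycle(entity, prompts, models, baseline_scores):
    answers = run_prompts(prompts, models)
    q_pct = score_mentions(answers, entity)
    percentile = (np.array(baseline_scores) < q_pct).mean() * 100
    print(f"{entity}: named in {q_pct:.1f}% of answers (~p{percentile:.0f} of the benchmark)")
    # ...make one targeted change, wait, then run the cycle again with the same prompts.
```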

A transparent note on data quality

Most entities had 4–5 questions, some had 10. That changes the number of opportunities to be mentioned.

Bottom line

If you’re building a brand in the AI era, don’t start with “Do LLMs know us?”
Start with two questions: do the models recognize us, and do they actually name us in their answers?

Our benchmark shows that the second one, being named, is where attention inequality is most extreme. And that’s where market advantage will compound.
