Measuring Your Brand in LLMs: SoV, Sentiment & Citations
Four metrics that tell you what LLMs actually say about your brand — and what to do when the numbers surprise you.
Most brand monitoring tools were built for a world of links. They track mentions in news articles, social posts, and review sites — places where a human wrote something you can read and a URL points to it. LLMs break that model entirely.
When ChatGPT describes your brand in a response, there's no article to clip, no author to contact, and no URL to track back. The mention exists inside a synthesized answer that changes every time the prompt is run. Measuring it requires a different set of metrics.
Here are the four that matter.
Metric 1: Mention Rate and Share of Voice
These two are easy to conflate, so it's worth pulling them apart up front. Both answer "how visible is my brand in LLM answers?", but they answer it from different angles.
Mention Rate is the simpler one — the percentage of your prompts in which the model mentions your brand at all.
mentionRate = brand_mentions ÷ total_prompts × 100%
Share of Voice (SoV) is competitive: out of all the brand-class names that show up across those answers, what fraction belongs to you?
SoV = brand_mentions ÷ (brand_mentions + competitor_mentions) × 100%
When no competitors are configured yet, SoV falls back to the mention-rate value so single-brand workspaces still see something on the dashboard. Once you add competitors, SoV starts reflecting the real share, and it is usually the more honest number to track over time: mention rate alone can rise simply because the LLM started mentioning more tools, not because you gained ground.
Worked example. You run 100 prompts that buyers in your category might ask ChatGPT, and your brand appears in 41 of the answers. Your mention rate is 41% on ChatGPT. If competitors collectively show up 59 times, your SoV is 41 / (41 + 59) = 41%. If competitors instead show up 120 times across the same 100 prompts (because each answer names two or three rivals), your SoV is 41 / (41 + 120) ≈ 25% — the answers are getting more crowded, even though your raw mention rate hasn't moved.
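The arithmetic in the worked example can be reproduced with a minimal Python sketch. The function names and the use of `None` to mean "no competitors configured" are assumptions made for illustration, not SeenForAI's actual implementation.

```python
from typing import Optional


def mention_rate(brand_mentions: int, total_prompts: int) -> float:
    """Percentage of prompts whose answer mentions the brand at all."""
    if total_prompts <= 0:
        raise ValueError("total_prompts must be positive")
    return 100 * brand_mentions / total_prompts


def share_of_voice(
    brand_mentions: int,
    total_prompts: int,
    competitor_mentions: Optional[int] = None,
) -> float:
    """Brand share of all brand-class mentions, as a percentage.

    competitor_mentions=None is this sketch's way of saying "no competitors
    configured"; in that case SoV falls back to the mention rate.
    """
    if competitor_mentions is None:
        return mention_rate(brand_mentions, total_prompts)
    denominator = brand_mentions + competitor_mentions
    if denominator == 0:
        return 0.0  # assumption: nobody was mentioned, so report 0%
    return 100 * brand_mentions / denominator


print(mention_rate(41, 100))                # 41.0
print(share_of_voice(41, 100, 59))          # 41.0
print(round(share_of_voice(41, 100, 120)))  # 25
```

The last two calls show the point of the example: the mention rate stays at 41% in both cases, while SoV drops once the answers name more rivals.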
Both metrics are most useful when you segment them:
- By LLM: Your SoV on Perplexity may be 55% while on Gemini it's 20%. These gaps usually reflect different training data and retrieval behavior — and they're actionable because the fix (more coverage in sources each model weights) is different per platform.
- By prompt type: Awareness prompts ("what tools exist for X") typically yield different brand sets than decision prompts ("what's the best tool for X"). Your brand might show up consistently in awareness but get crowded out at decision-stage by competitors with more authoritative content.
- Vs. specific competitors: Knowing your SoV is 34% while your main competitor's is 58% tells you where the gap is and how large it is — much more actionable than the headline number alone.
SoV is the headline number. Everything else adds context to why it is what it is.
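Segmenting is a group-by over per-answer records. The sketch below assumes a hypothetical record shape of (segment, brand_mentioned, competitor_mentions_in_answer), where the segment can be an LLM name or a prompt type; the data is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (segment, brand_mentioned, competitor_mentions_in_answer)
Record = Tuple[str, bool, int]


def sov_by_segment(records: Iterable[Record]) -> Dict[str, float]:
    """Compute SoV (in percent) separately for each segment."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [brand, competitor]
    for segment, brand_mentioned, competitor_mentions in records:
        totals[segment][0] += 1 if brand_mentioned else 0
        totals[segment][1] += competitor_mentions

    result = {}
    for segment, (brand, competitors) in totals.items():
        denominator = brand + competitors
        # assumption: a segment with no mentions at all reports 0%
        result[segment] = 100 * brand / denominator if denominator else 0.0
    return result


records = [
    ("perplexity", True, 1),
    ("perplexity", True, 1),
    ("gemini", True, 3),
    ("gemini", False, 1),
]
print(sov_by_segment(records))  # {'perplexity': 50.0, 'gemini': 20.0}
```

Passing the prompt type instead of the LLM name as the first field gives the awareness versus decision breakdown with the same function.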
Metric 2: Sentiment
LLM answers are not neutral mentions. When a model includes your brand in a response, it almost always frames it with language that carries positive, negative, or mixed valence.
Compare these two mentions:
"SeenForAI is a solid option for teams tracking LLM brand presence, particularly strong on Chinese LLM coverage."
"SeenForAI has some useful features but the interface can feel cluttered and the pricing is on the higher end."
Both are mentions. Only one is positive. SoV that doesn't account for sentiment is incomplete.
The dashboard reports two rates, normalized over your brand mentions (not over total prompts):
sentimentPosRate = positive_mentions ÷ brand_mentions × 100%
sentimentNegRate = negative_mentions ÷ brand_mentions × 100%
Sentiment also varies significantly across LLMs. One model might describe your brand warmly based on press coverage in its training data; another might surface more critical user reviews. These divergences are worth knowing, especially if you're investing in specific platforms.
Measuring sentiment at scale requires either a secondary LLM pass over the completions ("classify the sentiment toward [brand] in this response: positive, neutral, negative, mixed") or manual review, which doesn't scale past a few dozen responses per week.
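As a rough sketch of that secondary pass, the Python below builds the classification prompt and turns the returned labels into the two dashboard rates. The call to the classifier model is deliberately left out, the function names are hypothetical, and treating "mixed" as neither positive nor negative is an assumption.

```python
from collections import Counter
from typing import Dict, List

LABELS = ("positive", "neutral", "negative", "mixed")


def build_classifier_prompt(brand: str, response: str) -> str:
    """Prompt for the secondary LLM pass over one completion."""
    return (
        f"Classify the sentiment toward {brand} in this response: "
        "positive, neutral, negative, mixed.\n\n"
        f"Response:\n{response}"
    )


def sentiment_rates(labels: List[str]) -> Dict[str, float]:
    """Rates normalized over brand mentions, one label per mention."""
    counts = Counter(labels)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    brand_mentions = len(labels)
    if brand_mentions == 0:
        return {"sentimentPosRate": 0.0, "sentimentNegRate": 0.0}
    return {
        "sentimentPosRate": 100 * counts["positive"] / brand_mentions,
        "sentimentNegRate": 100 * counts["negative"] / brand_mentions,
    }


labels = ["positive"] * 6 + ["neutral"] * 2 + ["negative"] + ["mixed"]
print(sentiment_rates(labels))
# {'sentimentPosRate': 60.0, 'sentimentNegRate': 10.0}
```

Validating the labels before counting matters in practice, because a classifier model will occasionally answer with something outside the requested set.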
Metric 3: Hallucination Rate
This is the metric that surprises teams most. LLMs confidently state wrong things about brands all the time.
Common hallucinations:
- Wrong pricing: "Brand X starts at $19/month" when the actual price is $49/month
- Wrong features: attributing capabilities you don't have, or denying ones you do
- Wrong positioning: misclassifying what category you compete in
- Wrong founding story: incorrect founding year, founders, or origin
Hallucinations are harmful in a specific way: they're invisible. A user who reads that your tool doesn't support a feature you actually have might not buy — and you'll never know why. There's no bad review to respond to, no support ticket to close.
Verifying factual claims in LLM outputs requires cross-referencing against known-true information. SeenForAI uses a multi-model voting rule: a brand mention is flagged as a potential hallucination when fewer than half of the LLMs that ran the same prompt agree the brand appeared. The same threshold is applied to factual claims (pricing, features, category) — when most models disagree with one model's claim about your brand, that claim is surfaced for human review.
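The voting rule can be sketched in a few lines of Python. This is an illustration of the threshold as described above, not SeenForAI's code; the function name and the shape of the `votes` mapping are assumptions.

```python
from typing import Mapping


def is_potential_hallucination(votes: Mapping[str, bool]) -> bool:
    """Flag a mention or claim when fewer than half of the models agree.

    votes maps each LLM that ran the prompt to whether its answer agrees
    (the brand appeared, or the same factual claim was made).
    """
    if not votes:
        return False  # assumption: nothing to compare, so nothing to flag
    agreeing = sum(1 for agrees in votes.values() if agrees)
    return agreeing < len(votes) / 2


# One model out of four makes a pricing claim the others do not repeat.
print(is_potential_hallucination(
    {"chatgpt": True, "claude": False, "gemini": False, "perplexity": False}
))  # True

# Exactly half agree, which is not "fewer than half", so no flag.
print(is_potential_hallucination(
    {"chatgpt": True, "claude": True, "gemini": False, "perplexity": False}
))  # False
```

A flag here only means the claim goes to human review. Agreement between models is a signal, not proof, since several models can repeat the same wrong source.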
Metric 4: Citation URL Tracking
When LLMs do cite sources, those citations tell you something important: what content is currently shaping model perception of your brand.
If Perplexity is citing a two-year-old TechCrunch article about your brand every time it mentions you, that article is a significant input to how the model describes you. If the article is outdated or frames you in a way that no longer fits your positioning, that's a problem you can actually fix — by updating your presence, getting newer coverage, or building more authoritative content.
Citation tracking is also competitive intelligence. Which URLs is the model citing when it recommends your competitor instead of you? Understanding what content shapes competitor mentions tells you what ground to take.
Not every LLM cites sources. ChatGPT's responses often have no citations. Perplexity almost always does. Tracking citations where available gives you a window into the retrieval layer that's otherwise opaque.
The dashboard surfaces two citation numbers:
citationRate = cited_mentions ÷ brand_mentions × 100%
citationSoV = brand_citations ÷ (brand_citations + competitor_citations) × 100%
Citation Rate tells you what fraction of your mentions are backed by a source URL. Citation SoV (only meaningful once competitors are tracked) tells you what fraction of all citation-backed brand mentions in your category point to you. It is useful for spotting situations where your raw mention count is healthy but the model is sourcing competitor content much more heavily.
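A minimal Python sketch of the two citation formulas follows. The function names and the sample counts are illustrative assumptions, and returning 0% on an empty denominator is a choice made for this sketch.

```python
def citation_rate(cited_mentions: int, brand_mentions: int) -> float:
    """Percentage of brand mentions that come with a source URL."""
    if brand_mentions == 0:
        return 0.0  # assumption: no mentions means nothing to cite
    return 100 * cited_mentions / brand_mentions


def citation_sov(brand_citations: int, competitor_citations: int) -> float:
    """Brand share of all citation-backed brand mentions, in percent."""
    denominator = brand_citations + competitor_citations
    if denominator == 0:
        return 0.0  # assumption: no citations in the category at all
    return 100 * brand_citations / denominator


# Made-up counts: 40 mentions, 12 of them cited; competitors cited 36 times.
print(citation_rate(12, 40))  # 30.0
print(citation_sov(12, 36))   # 25.0
```

With these made-up counts the brand is mentioned often enough, yet only a quarter of the cited sources in the category point to it, which is the situation the Citation SoV number is meant to expose.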
What a Healthy LLM Brand Presence Looks Like
Benchmarks vary by category maturity and brand size, but as a rough baseline:
- SoV above 20% in your core category across at least 3 major LLMs suggests you're visible
- Sentiment 70%+ positive or neutral is a healthy signal; high negative sentiment warrants content and PR attention
- Hallucination rate near zero on core factual claims (pricing, key features, category) — any hallucinations here are worth addressing immediately
- Citations from recent, authoritative sources (not just your own domain) indicate the model has fresh, trusted context for your brand
Warning signs: SoV below 5% in a category you compete in, predominantly negative sentiment on one LLM versus others (suggests a specific data source problem), or consistent hallucinations about the same fact (suggests a persistent wrong source in the model's retrieval layer).
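The baseline can be turned into a simple check, sketched below in Python. The thresholds are the rough figures from this section; the function name, the input shapes, and the decision to apply the 5% warning per LLM are assumptions made for illustration.

```python
from typing import Dict, List, Union


def presence_report(
    sov_by_llm: Dict[str, float],
    pos_or_neutral_rate: float,
    core_fact_hallucinations: int,
) -> Dict[str, Union[bool, List[str]]]:
    """Compare measured numbers against the rough baseline above."""
    # Visible: SoV above 20% on at least 3 LLMs.
    visible_on = [llm for llm, sov in sov_by_llm.items() if sov > 20]

    warnings = []
    for llm, sov in sorted(sov_by_llm.items()):
        if sov < 5:
            warnings.append(f"SoV below 5% on {llm}")
    if pos_or_neutral_rate < 70:
        warnings.append("sentiment below 70% positive or neutral")
    if core_fact_hallucinations > 0:
        warnings.append("hallucinations on core factual claims")

    return {"visible": len(visible_on) >= 3, "warnings": warnings}


report = presence_report(
    {"chatgpt": 34, "perplexity": 55, "gemini": 25, "claude": 3},
    pos_or_neutral_rate=80,
    core_fact_hallucinations=0,
)
print(report)
# {'visible': True, 'warnings': ['SoV below 5% on claude']}
```

Because the benchmarks vary by category maturity and brand size, the thresholds are better treated as parameters to tune than as fixed rules.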
Putting It Together
These four metrics — SoV, sentiment, hallucination rate, citation sources — combine into a monitoring system that tells you not just whether you're being mentioned, but how accurately and favorably.
SeenForAI automates all four daily across ChatGPT, Claude, Gemini, Perplexity, Doubao, Kimi, and DeepSeek. The dashboard surfaces your SoV trends, flags sentiment shifts, alerts on hallucinations, and tracks which URLs are driving your model presence.
The free scan at seenfor.ai gives you a snapshot across four LLMs — a good starting point for understanding where you stand before you start optimizing.