Retrieval Density Score (RDS)

A single metric that captures the accuracy-cost tradeoff in one number.

Formula
RDS = F1 / mean_tokens
Higher RDS = more correct answers per token spent

CKG: RDS 0.001751 (F1: 0.857 · Tokens: 274)
RAG: RDS 0.0000413 (F1: 0.817 · Tokens: 17,900)
GraphRAG: RDS 0.0000368 (F1: 0.825 · Tokens: ~10,000)

Why RDS?

F1, BLEU, ROUGE, and METEOR measure accuracy alone. They don't answer the question that matters in production: "How much did it cost to get there?"

A system with F1 = 0.80 using 10,000 tokens per query (RDS = 0.00008) is far less efficient than a system with F1 = 0.60 using 100 tokens per query (RDS = 0.006). Accuracy-only metrics rank the first system higher and say nothing about the 100× difference in token spend.
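
To make that comparison concrete, here is a minimal Python sketch that applies RDS = F1 / mean_tokens to the two hypothetical systems above. The function name `rds` and the variable names are illustrative choices, not part of any published tooling.

```python
def rds(f1: float, mean_tokens: float) -> float:
    """Retrieval Density Score: F1 divided by mean tokens per query."""
    if mean_tokens <= 0:
        raise ValueError("mean_tokens must be positive")
    return f1 / mean_tokens


# The two hypothetical systems described in the paragraph above.
accurate_but_heavy = rds(0.80, 10_000)
modest_but_light = rds(0.60, 100)

print(f"heavy system RDS: {accurate_but_heavy:.5f}")
print(f"light system RDS: {modest_but_light:.5f}")
print(f"light / heavy:    {modest_but_light / accurate_but_heavy:.0f}x")
```

The lighter system gives up a quarter of the accuracy but spends one hundredth of the tokens, which is why its RDS comes out far higher.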

RDS solves this. It captures both dimensions in one number: accuracy AND cost, with token count standing in for cost. A higher RDS means you get more correct answers per token spent.

How to Calculate RDS

  1. Run your system against a fixed query set (100+ queries recommended).
  2. Measure macro F1 score across all queries (use BERTScore for semantic tasks).
  3. Record the mean token count per query.
  4. Divide: RDS = F1 / mean_tokens

Example: Your system achieves F1 = 0.35 using 800 tokens per query. RDS = 0.35 / 800 = 0.000438. Compare this to CKG (0.001751) for calibration.
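
The four steps above can be sketched in Python roughly as follows. This is a sketch under stated assumptions, not the benchmark's own code: "macro F1" is taken to mean the unweighted mean of per-query F1 scores, the function name `retrieval_density_score` is hypothetical, and the per-query numbers are made up to reproduce the F1 = 0.35, 800-token example. How tokens are counted (prompt, completion, or both) is not specified here, so whichever convention you pick should be applied identically to every system you compare.

```python
from statistics import mean


def retrieval_density_score(per_query_f1, per_query_tokens):
    """Return (macro_f1, mean_tokens, rds) for one system on one query set."""
    if not per_query_f1 or len(per_query_f1) != len(per_query_tokens):
        raise ValueError("need one F1 score and one token count per query")
    macro_f1 = mean(per_query_f1)         # step 2: unweighted mean (assumption)
    mean_tokens = mean(per_query_tokens)  # step 3: mean tokens per query
    return macro_f1, mean_tokens, macro_f1 / mean_tokens  # step 4: divide


# Made-up scores for four queries; a real run would use 100+ (step 1).
f1_scores = [0.30, 0.40, 0.35, 0.35]
token_counts = [700, 900, 800, 800]

macro_f1, mean_tokens, score = retrieval_density_score(f1_scores, token_counts)
print(f"F1={macro_f1:.2f}  tokens={mean_tokens:.0f}  RDS={score:.7f}")
```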

FAQ

Where does the 42× advantage come from?
CKG RDS is 0.001751 and RAG RDS is 0.0000413. Divide: 0.001751 / 0.0000413 = 42.4×. This means CKG delivers 42× more correct information per token than RAG. It's the difference between accurate-and-cheap (CKG) and accurate-but-expensive (RAG).
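
The division can be checked with a few lines of Python. The sketch below uses only the RDS values reported on this page; the dictionary and the `advantage` helper are illustrative names.

```python
# RDS values as reported on this page.
reported_rds = {"CKG": 0.001751, "RAG": 0.0000413, "GraphRAG": 0.0000368}


def advantage(system: str, baseline: str) -> float:
    """How many times higher `system`'s reported RDS is than `baseline`'s."""
    return reported_rds[system] / reported_rds[baseline]


print(f"CKG vs RAG:      {advantage('CKG', 'RAG'):.1f}x")
print(f"CKG vs GraphRAG: {advantage('CKG', 'GraphRAG'):.1f}x")
```
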
Who introduced RDS?
Retrieval Density Score was introduced by Daniel Yarmoluk and Dan McCreary in the open CKG benchmark (2026), a reproducible evaluation across 47 domains and 8,121 queries. Dan McCreary is the former Head of AI at TigerGraph and co-author of Learning Graphs (O'Reilly).
Is RDS the same as cost-per-correct-answer?
Not exactly. Cost-per-correct-answer = (cost per token × mean tokens) / F1, while RDS = F1 / mean_tokens, so cost-per-correct-answer = cost per token / RDS. The two are inversely related. RDS is independent of pricing, which lets you compare systems on the same scale; cost-per-correct-answer folds in your specific token pricing.
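
A short Python sketch shows how the two quantities relate. The price of $2 per million tokens is a made-up figure for illustration only, and the function names are hypothetical; the F1 and token numbers reuse the 0.35 / 800 example from the calculation section.

```python
import math


def cost_per_correct_answer(price_per_token, mean_tokens, f1):
    """(cost per token x mean tokens) / F1, as defined in the answer above."""
    return (price_per_token * mean_tokens) / f1


def cost_from_rds(price_per_token, rds):
    """Same quantity, written in terms of RDS = F1 / mean_tokens."""
    return price_per_token / rds


price = 2.0 / 1_000_000       # hypothetical: $2 per million tokens
f1, mean_tokens = 0.35, 800   # example system from the calculation section
rds = f1 / mean_tokens

direct = cost_per_correct_answer(price, mean_tokens, f1)
via_rds = cost_from_rds(price, rds)
print(f"direct:  ${direct:.6f}")
print(f"via RDS: ${via_rds:.6f}")
assert math.isclose(direct, via_rds)
```

Because the price enters only as a multiplier, ranking systems by RDS and ranking them by cost-per-correct-answer give the same order whenever every system pays the same per-token price.
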
Can I use RDS for my own system?
Yes. Calculate it the same way: gather 100+ queries, measure macro F1, record mean tokens, divide. The benchmark data (CKG, RAG, GraphRAG) gives you calibration points. If your RDS is higher than 0.001751, you've beaten CKG. If it's lower, you know where you stand.
Where can I see the full benchmark?
The open CKG benchmark paper (PDF) includes methodology, all 47 domains, the full query set, reproducible code, and confidence intervals.