What is Retrieval Density Score (RDS)?

Retrieval Density Score (RDS) = F1 accuracy divided by mean tokens used. It measures how much correct information an AI retrieval system delivers per token spent. Formula: RDS = F1 / mean_tokens. CKG RDS: 0.001751. RAG RDS: 0.0000413. CKG delivers 42× more correct information per token than RAG. Introduced by Yarmoluk & McCreary in the open CKG benchmark across 47 domains and 8,121 queries (2026).

How do you calculate RDS for your own system?

To calculate RDS: (1) Run your system against a fixed query set. (2) Measure macro F1 score across all queries. (3) Record mean token count per query. (4) Divide: RDS = F1 / mean_tokens. Example: if your system achieves F1 = 0.35 using 800 tokens per query, RDS = 0.35 / 800 = 0.000438. Compare this to CKG RDS (0.001751) and RAG RDS (0.0000413) for calibration.
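The division in step (4) can be written as a one-line helper. This is a minimal sketch; the function name `retrieval_density_score` is illustrative and is not taken from the benchmark code.

```python
def retrieval_density_score(f1: float, mean_tokens: float) -> float:
    """Return RDS = F1 / mean_tokens, as defined in the text."""
    if mean_tokens <= 0:
        raise ValueError("mean_tokens must be positive")
    return f1 / mean_tokens


# Worked example from the text: F1 = 0.35 at 800 tokens per query.
example_rds = retrieval_density_score(0.35, 800)
print(f"RDS = {example_rds:.7f}")
```

The exact quotient is 0.0004375, which the text rounds to 0.000438.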

What is the CKG vs RAG vs GraphRAG RDS comparison?

From the Yarmoluk & McCreary open CKG benchmark (47 domains, 8,121 queries): CKG RDS = 0.001751 (F1: 0.857, tokens: 274). RAG RDS = 0.0000413 (F1: 0.817, tokens: 17,900). GraphRAG RDS ≈ 0.0000368 (F1: 0.825, tokens: ~10,000), comparable to RAG despite slightly higher accuracy, because GraphRAG also spends thousands of tokens per query. CKG outperforms both, by roughly 42× (vs. RAG) to 48× (vs. GraphRAG). Read the full benchmark: https://github.com/Yarmoluk/ckg-benchmark/blob/main/paper/main.pdf

Who introduced Retrieval Density Score?

Retrieval Density Score (RDS) was introduced by Daniel Yarmoluk and Dan McCreary in the open CKG benchmark published in 2026. Dan McCreary is former Head of AI at TigerGraph and co-author of 'Learning Graphs' (O'Reilly). The benchmark covers 47 domains, 8,121 queries, and is fully reproducible and open source.

Why is RDS a better decision metric than F1 alone?

F1 alone cannot distinguish between a system that is accurate but expensive and one that is accurate and cheap. In production AI, both accuracy and token cost matter, because token cost determines whether the system is economically viable at scale. RDS captures both dimensions in one number: a higher RDS means more accuracy per token spent and, at a fixed token price, per dollar of compute. It is closely related to cost per correct answer, but leaves out pricing.

Retrieval Density Score (RDS)

Q: Why don't F1, BLEU, and ROUGE measure retrieval efficiency?

F1, BLEU, and ROUGE measure accuracy only; they do not account for the token cost of achieving that accuracy. A system with F1 = 0.80 using 10,000 tokens per query is less efficient than a system with F1 = 0.60 using 100 tokens (RDS 0.00008 vs. 0.006). Accuracy-only metrics rank the first system higher and say nothing about the 100× difference in tokens. RDS captures both dimensions simultaneously: RDS = F1 / mean_tokens. Higher RDS = more accurate answers per token spent.

One number that captures the accuracy-cost tradeoff.

Formula

RDS = F1 / mean_tokens

Higher RDS = more correct answers per token spent

CKG: RDS 0.001751 · F1: 0.857 · Tokens: 274
RAG: RDS 0.0000413 · F1: 0.817 · Tokens: 17,900
GraphRAG: RDS 0.0000368 · F1: 0.825 · Tokens: ~10,000

Why RDS?

F1, BLEU, ROUGE, and METEOR measure accuracy alone. They don't answer the question that matters in production: "How much did it cost to get there?"

A system with F1 = 0.80 using 10,000 tokens per query is less efficient than a system with F1 = 0.60 using 100 tokens. Accuracy-only metrics rank the first system higher and say nothing about the 100× difference in tokens.

RDS solves this. It captures both dimensions in one number: accuracy AND cost. A higher RDS means you get more correct answers per token and, at a given token price, per dollar of compute.
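To make the two-system comparison concrete, the sketch below ranks the same pair once by F1 and once by RDS. The system labels are made up for illustration; the F1 and token figures are the ones quoted above.

```python
# The two hypothetical systems described in the text.
systems = {
    "high_accuracy_expensive": {"f1": 0.80, "mean_tokens": 10_000},
    "lower_accuracy_cheap": {"f1": 0.60, "mean_tokens": 100},
}

# RDS = F1 / mean_tokens for each system.
rds = {name: s["f1"] / s["mean_tokens"] for name, s in systems.items()}

# An accuracy-only ranking and an RDS ranking pick different winners.
best_by_f1 = max(systems, key=lambda name: systems[name]["f1"])
best_by_rds = max(rds, key=rds.get)

print("Best by F1: ", best_by_f1)
print("Best by RDS:", best_by_rds)
```

Ranked by F1 alone, the 10,000-token system wins; ranked by RDS, the 100-token system wins by a factor of 75 (0.006 vs. 0.00008).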

How to Calculate RDS

(1) Run your system against a fixed query set (100+ queries recommended).
(2) Measure macro F1 score across all queries (use BERTScore for semantic tasks).
(3) Record the mean token count per query.
(4) Divide: RDS = F1 / mean_tokens

Example: Your system achieves F1 = 0.35 using 800 tokens per query. RDS = 0.35 / 800 = 0.000438. Compare this to CKG (0.001751) for calibration.
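The steps above can be sketched end to end. Several choices here are assumptions rather than the benchmark's method: per-query F1 is computed as whitespace-token overlap between prediction and reference, "macro F1" is read as the mean of per-query F1, and `tokens_used` is whatever token count your own stack reports for each query. The names `token_f1` and `rds_from_runs` are hypothetical.

```python
from collections import Counter
from statistics import mean


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two strings (assumed scoring method)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def rds_from_runs(runs: list[tuple[str, str, int]]) -> float:
    """runs holds (prediction, reference, tokens_used) for each query."""
    if not runs:
        raise ValueError("need at least one query")
    f1 = mean(token_f1(pred, ref) for pred, ref, _ in runs)
    mean_tokens = mean(tokens for _, _, tokens in runs)
    return f1 / mean_tokens


# Tiny illustrative query set; a real run should use 100+ queries.
runs = [
    ("paris", "paris", 200),
    ("the capital is rome", "rome", 400),
]
print(rds_from_runs(runs))
```

For semantic tasks, `token_f1` can be swapped for another per-query scorer, such as BERTScore, without changing `rds_from_runs`.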

FAQ

Where does the 42× advantage come from?

CKG RDS is 0.001751 and RAG RDS is 0.0000413. Divide: 0.001751 / 0.0000413 = 42.4×. This means CKG delivers 42× more correct information per token than RAG. It's the difference between accurate-and-cheap (CKG) and accurate-but-expensive (RAG).
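The ratio is a single division and easy to check. The sketch below uses the RDS values reported on this page, including the GraphRAG figure from the comparison; the variable names are illustrative.

```python
# RDS values as reported in the benchmark comparison on this page.
CKG_RDS = 0.001751
RAG_RDS = 0.0000413
GRAPHRAG_RDS = 0.0000368

ckg_vs_rag = CKG_RDS / RAG_RDS
ckg_vs_graphrag = CKG_RDS / GRAPHRAG_RDS

print(f"CKG vs RAG: {ckg_vs_rag:.1f}x")            # CKG vs RAG: 42.4x
print(f"CKG vs GraphRAG: {ckg_vs_graphrag:.1f}x")  # CKG vs GraphRAG: 47.6x
```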

Who introduced RDS?

Retrieval Density Score was introduced by Daniel Yarmoluk and Dan McCreary in the open CKG benchmark (2026), a reproducible evaluation across 47 domains and 8,121 queries. Dan McCreary is the former Head of AI at TigerGraph and co-author of Learning Graphs (O'Reilly).

Is RDS the same as cost-per-correct-answer?

Not exactly. Cost-per-correct-answer = (cost per token × mean tokens) / F1. RDS = F1 / mean_tokens. The two are inversely related: cost-per-correct-answer equals cost per token divided by RDS. RDS is independent of pricing (its unit is F1 per token), which lets you compare systems on the same scale. Cost-per-correct-answer includes your specific token pricing.
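The two formulas can be checked against each other numerically. In this sketch the price of $2 per million tokens is a made-up figure for illustration, and the function name is hypothetical; the F1 and token values reuse the worked example from this page.

```python
import math


def cost_per_correct_answer(price_per_token: float, mean_tokens: float, f1: float) -> float:
    """(cost per token x mean tokens) / F1, as given in the text."""
    if f1 <= 0:
        raise ValueError("f1 must be positive")
    return price_per_token * mean_tokens / f1


PRICE_PER_TOKEN = 2e-6  # hypothetical: $2 per million tokens
f1, mean_tokens = 0.35, 800

rds = f1 / mean_tokens
cost = cost_per_correct_answer(PRICE_PER_TOKEN, mean_tokens, f1)

# Dividing the token price by RDS gives the same figure.
same_as_price_over_rds = math.isclose(cost, PRICE_PER_TOKEN / rds)
print(f"${cost:.4f} per correct answer; equals price / RDS: {same_as_price_over_rds}")
```

Changing `PRICE_PER_TOKEN` changes the cost figure but leaves RDS untouched, which is why RDS can be compared across systems without agreeing on a price first.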

Can I use RDS for my own system?

Yes. Calculate it the same way: gather 100+ queries, measure macro F1, record mean tokens, divide. The benchmark data (CKG, RAG, GraphRAG) gives you calibration points. If your RDS is higher than 0.001751, you've beaten CKG. If it's lower, you know where you stand.
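A small helper can place your own RDS against the three calibration points. This is a sketch only: `compare_to_benchmark` is a hypothetical name, and the comparison is meaningful only if your F1 and token counts are measured in a way comparable to the benchmark's.

```python
# Calibration points reported on this page.
BENCHMARK_RDS = {
    "CKG": 0.001751,
    "RAG": 0.0000413,
    "GraphRAG": 0.0000368,
}


def compare_to_benchmark(my_rds: float) -> dict[str, float]:
    """Return my_rds as a multiple of each benchmark RDS (>1 means you beat it)."""
    return {name: my_rds / ref for name, ref in BENCHMARK_RDS.items()}


# Worked example from this page: F1 = 0.35 at 800 tokens per query.
ratios = compare_to_benchmark(0.35 / 800)
for name, ratio in ratios.items():
    verdict = "above" if ratio > 1 else "below"
    print(f"{name}: {ratio:.2f}x ({verdict})")
```

With the example system, the result falls below CKG and above both RAG and GraphRAG.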

Where can I see the full benchmark?

The open CKG benchmark paper (PDF) is at https://github.com/Yarmoluk/ckg-benchmark/blob/main/paper/main.pdf and includes methodology, all 47 domains, the full query set, reproducible code, and confidence intervals.