Reasoner Core

Every AI system in production today has the same problem. It forgets.

For technical leaders evaluating what AI can actually do with the documents, data, and context they provide.

AI doesn't just read your documents. It reasons with them.

Seven levels measure how well. From surface retrieval to cross-domain transfer, traced through one real problem — and why crossing the frontier requires a different architecture.

You provide the rules:
  • 0 Ignores context
  • 1 Retrieves passages
  • 2 Understands pieces
  • 3 Applies rules
  • 4 Resolves conflicts (current frontier)
It learns the rules on its own:
  • 5 Finds unwritten rules
  • 6 Transfers them

The problem

Here’s the problem we’ll trace through every level.

Attachment: greenleaf-catering-policies.pdf (PDF · 12 pages · 847 KB)

From the policy manual, three rules create tonight’s scheduling conflict:

1. Allergy Kitchen: Nut dishes go to a sealed room, one at a time, 30 min each.
2. Freshness Window: Hot dishes expire 10 min after plating.
3. Table Sync: All dishes for a table must arrive within 5 min of each other.
Your prompt
A table of 4 has ordered for 7:00 pm: Walnut Salad, Grilled Chicken, Mushroom Soup, and Almond Cake. Two contain nuts. Create the kitchen’s preparation schedule.

Level 0 — No context utilization

Pretraining and fine-tuning. The knowledge a model arrives with before it reads anything you give it.

You attach the document and write the prompt. The model doesn't even look.

You
A table of 4 has ordered for 7:00 pm: Walnut Salad, Grilled Chicken, Mushroom Soup, and Almond Cake. Two contain nuts. Create the kitchen’s preparation schedule.
AI · Level 0

Start with the cold dishes, then move to hot items so everything is ready at the same time.

PDF greenleaf-catering-policies.pdf
Allergy room ignored
Table sync ignored
10-min expiry ignored

It doesn't know Greenleaf exists. The document is decoration.

This is Level 0 — no context utilization. The model received your file but never opened it. Its entire response draws from pretraining: general cooking knowledge, not Greenleaf’s specific policies.

What you needed

  • Route nut dishes to the allergy room
  • Sync all four plates for 7:00 pm
  • Respect the 10-minute expiry window

What you got

“Start with cold dishes, then move to hot items.”

Generic advice — nothing from the document

Context document (domain-specific rules) → model (pretraining only) → generic output (no domain content, prior knowledge only)
Document attached → completely ignored → guesses from training data. The model doesn't know the document exists.

Technology Landscape — Level 0

Tech: Pretraining, Fine-tuning / SFT, RLHF / DPO, RL for Reasoning
This is the floor — every model has these capabilities from training. But training alone doesn’t teach a model to read what you give it. That starts at Level 1.

Level 1 — Context retrieval

RAG, prompt engineering, and tool use. The model finds what you asked for, but stops there.

You attach the document and write the prompt. The model reads it — but only knows how to search.

You
A table of 4 has ordered for 7:00 pm: Walnut Salad, Grilled Chicken, Mushroom Soup, and Almond Cake. Two contain nuts. Create the kitchen’s preparation schedule.
AI · Level 1

According to Greenleaf’s policies, the allergy kitchen handles one dish at a time and each takes 30 minutes. The freshness window requires hot dishes to reach the table within 10 minutes of plating. Table sync means all dishes must arrive within a 5-minute window.

PDF greenleaf-catering-policies.pdf
Allergy room found, not used
Table sync noted, no action
10-min expiry quoted, no plan

It read the whole manual. It still didn’t make a schedule.

This is Level 1 — context retrieval. The model found the right paragraphs and quoted them accurately. But it treated the document like a search result, not as input to a plan. No schedule was built.
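For contrast, here is roughly what retrieval alone amounts to. This is a minimal sketch that assumes the three rules are stored as plain-text passages and ranked by how many words they share with the prompt. It is a deliberately crude stand-in for RAG, not any particular product's retriever. POLICY_PASSAGES, tokenize, and retrieve are hypothetical names. The sketch returns the right passages and does nothing else with them.

```python
import re

# Hypothetical passages standing in for the three Greenleaf rules.
POLICY_PASSAGES = {
    "allergy_kitchen": "Allergy Kitchen: nut dishes go to a sealed room, one at a time, 30 min each.",
    "freshness_window": "Freshness Window: hot dishes expire 10 min after plating.",
    "table_sync": "Table Sync: all dishes for a table must arrive within 5 min of each other.",
}


def tokenize(text: str) -> set[str]:
    """Lowercased word set: a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, passages: dict[str, str], k: int = 3) -> list[str]:
    """Return the k passages sharing the most words with the query.

    This is all a Level 1 system does: locate and quote. Nothing here
    combines the passages into a schedule.
    """
    q = tokenize(query)
    ranked = sorted(passages.values(), key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    prompt = "Table of 4, two nut dishes, hot dishes too. Create the preparation schedule."
    for passage in retrieve(prompt, POLICY_PASSAGES):
        print("According to the policy:", passage)
```

The output is three accurate quotes. A schedule would need a separate step that reasons over all three at once.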

What you needed

  • A preparation schedule using the allergy room
  • Timed coordination for 7:00 pm table sync
  • Hot dishes plated within the 10-minute window

What you got

“According to Greenleaf’s policies, the allergy kitchen handles one dish at a time...”

Accurate quotes — but no schedule built from them

Context document (Rule 1, Rule 2, Rule 3) → find & quote → model output: “According to the document, Rule 1 states...” “Rule 2 specifies that...” “Rule 3 requires...”
Locates and extracts passages, like Ctrl+F, not comprehension. Can read the policy manual; can’t run the operation.

Technology Landscape — Level 1

Tech: RAG, Prompt Engineering, Tool Use
Retrieval is a solved problem. Most consumer AI and enterprise search lives here.

Level 2 — Context comprehension

In-context learning and long context windows. The model understands each rule — just not how they interact.

You attach the document and write the prompt. The model understands each rule alone — but not together.

You
A table of 4 has ordered for 7:00 pm: Walnut Salad, Grilled Chicken, Mushroom Soup, and Almond Cake. Two contain nuts. Create the kitchen’s preparation schedule.
AI · Level 2

The allergy kitchen is a bottleneck — two nut dishes will take 60 minutes back-to-back. The freshness window means you can’t cook the Chicken early and let it sit. But the Walnut Salad is cold, so it can be made ahead without freshness concerns.

PDF greenleaf-catering-policies.pdf
Allergy room grasped alone
Table sync known alone
10-min expiry seen alone

It understands every rule. It can’t see how they interact.

This is Level 2 — context comprehension. The model grasps each rule’s implications: the allergy kitchen is a bottleneck, hot dishes expire, cold dishes are flexible. But these are three separate insights. It never asks: given the 60-minute allergy timeline, when must hot dishes start to stay fresh and sync at the table?

What you needed

  • Rules connected into a schedule
  • Allergy timeline driving hot dish start times
  • Freshness + table sync working together

What you got

Three separate rule analyses — bottleneck, expiry, cold vs hot

Each rule understood — but never combined into a plan

Rule A, Rule B, Rule C: each “● Grasped,” its implication understood in isolation, with a “?” between them.
Each rule understood individually, but how do they interact? Three islands of understanding, no bridges between them.

Technology Landscape — Level 2

Tech: In-Context Learning, Long Context Windows
Most frontier models operate comfortably here. Traditional benchmarks are saturated at this level.

Level 3 — Context application (shallow)

Chain-of-thought reasoning. The model builds a plan that applies every rule, but the plan breaks when you check all the rules against the same timeline.

You attach the document and write the prompt. The model produces an actionable schedule — each rule applied, none verified together.

You
A table of 4 has ordered for 7:00 pm: Walnut Salad, Grilled Chicken, Mushroom Soup, and Almond Cake. Two contain nuts. Create the kitchen’s preparation schedule.
AI · Level 3

7:00 — Start Chicken and Soup in main kitchen (15 min each).
7:00 — Start Walnut Salad in allergy kitchen (30 min).
7:30 — Start Almond Cake in allergy kitchen (30 min).
7:15 — Plate Chicken and Soup. All dishes served by 8:00.

PDF greenleaf-catering-policies.pdf
Allergy room applied
Table sync applied
10-min expiry violated

Each rule applied on its own. Together, the schedule is broken.

This is Level 3 — context application (shallow). The model applied each rule correctly in isolation: nut dishes go to the allergy kitchen, hot dishes are plated, all dishes listed as arriving together. But it never tested whether they all hold at once. The Chicken is plated at 7:15 and waits 45 minutes for the Cake — destroying the 10-minute freshness window.
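The difference between applying rules and satisfying them comes down to whether every rule is tested against one shared timeline. The sketch below runs that test under a few stated assumptions. Times are in minutes after 7:00 pm. The hot dishes take 15 minutes to cook, which comes from the Level 3 answer. The Almond Cake is served cold, which the manual excerpt does not state. Dish, check_all, and the per-rule checks are hypothetical names. On the Level 3 schedule, two rules pass and the freshness window fails for both hot dishes, each by 35 minutes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dish:
    name: str
    station: str   # "allergy" or "main"
    start: int     # minutes after 7:00 pm
    prep: int      # minutes of preparation
    arrive: int    # minutes after 7:00 pm when it reaches the table
    hot: bool
    nuts: bool

    @property
    def plated(self) -> int:
        return self.start + self.prep


def check_allergy(dishes, slot=30):
    """Rule 1: nut dishes in the allergy kitchen, one at a time, `slot` min each."""
    problems = [f"{d.name} has nuts but is not in the allergy kitchen"
                for d in dishes if d.nuts and d.station != "allergy"]
    room = sorted((d for d in dishes if d.station == "allergy"), key=lambda d: d.start)
    problems += [f"{d.name} does not take {slot} min" for d in room if d.prep != slot]
    problems += [f"{a.name} and {b.name} overlap in the allergy kitchen"
                 for a, b in zip(room, room[1:]) if b.start < a.plated]
    return problems


def check_freshness(dishes, window=10):
    """Rule 2: a hot dish must reach the table within `window` min of plating."""
    problems = []
    for d in dishes:
        idle = d.arrive - d.plated
        if d.hot and idle > window:
            problems.append(f"{d.name} waits {idle} min after plating "
                            f"({idle - window} min past the {window}-min window)")
    return problems


def check_table_sync(dishes, tolerance=5):
    """Rule 3: every dish for the table arrives within `tolerance` min."""
    arrivals = [d.arrive for d in dishes]
    spread = max(arrivals) - min(arrivals)
    return [] if spread <= tolerance else [f"arrivals spread over {spread} min"]


def check_all(dishes):
    """Test every rule against the same timeline and report each one."""
    return {
        "allergy kitchen": check_allergy(dishes),
        "freshness window": check_freshness(dishes),
        "table sync": check_table_sync(dishes),
    }


# The Level 3 schedule: hot dishes plated at 7:15, everything served together at 8:00.
LEVEL_3 = [
    Dish("Grilled Chicken", "main", 0, 15, 60, hot=True, nuts=False),
    Dish("Mushroom Soup", "main", 0, 15, 60, hot=True, nuts=False),
    Dish("Walnut Salad", "allergy", 0, 30, 60, hot=False, nuts=True),
    Dish("Almond Cake", "allergy", 30, 30, 60, hot=False, nuts=True),
]

if __name__ == "__main__":
    for rule, problems in check_all(LEVEL_3).items():
        print(f"{rule}: {'OK' if not problems else '; '.join(problems)}")
```

All three checks read the same arrival times, so a schedule that passes one rule on paper cannot hide a violation of another.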

What you needed

  • All three rules satisfied simultaneously
  • Timing driven by the allergy bottleneck
  • Hot dishes delayed to meet freshness window

What you got

A schedule that passes each rule in isolation but fails when they interact: the Chicken sits 45 min after plating, 35 min past the 10-min expiry.

Rules applied as a checklist, never tested together.
Timeline, 7:00 to 8:00: the allergy kitchen runs Walnut Salad (30 min), then Almond Cake (30 min). The main kitchen plates Chicken and Soup at 7:15, which leaves a 45-min gap until the Cake is done at 8:00. The freshness window is 10 min, so it is violated by 35 min.
Constraint check: allergy kitchen satisfied, table sync satisfied (on paper), freshness window violated. Combined: conflict. Each constraint is checked independently, so the plan lands outside the intersection.

Technology Landscape — Level 3

Tech: Chain-of-Thought
Step-by-step reasoning. This is where most production AI plateaus. The model follows each rule — but doesn't test if they all hold at once.

Level 4 — Context application (deep / exhaustive)

Deep reasoning and agentic AI. The model identifies the bottleneck, works backward from it, and verifies every constraint holds at once.

You attach the document and write the prompt. The model treats all constraints as a single system — finding the intersection where every rule is satisfied simultaneously. But what happens when the rules aren’t provided? When the model must discover them on its own — where does that knowledge go?

You
A table of 4 has ordered for 7:00 pm: Walnut Salad, Grilled Chicken, Mushroom Soup, and Almond Cake. Two contain nuts. Create the kitchen’s preparation schedule.
AI · Level 4

The allergy kitchen is the binding constraint: 2 × 30 min = 60 min sequential.
7:00 — Walnut Salad → allergy kitchen (cold, can wait after finishing).
7:30 — Almond Cake → allergy kitchen.
7:45 — Start Chicken and Soup in main kitchen (15 min each).
8:00 — All four dishes ready simultaneously. Hot dishes plated at 8:00, well within the 10-min freshness window.

PDF greenleaf-catering-policies.pdf
Allergy room satisfied
Table sync satisfied
10-min expiry satisfied

Worked backward from the bottleneck. Every constraint holds at once.

This is Level 4 — context application (deep / exhaustive). The model identifies the allergy kitchen as the binding constraint, works backward from that 60-minute bottleneck, and deliberately delays hot dishes so everything converges at 8:00. Every rule holds simultaneously. But no matter how thorough it is with stated rules, it can’t discover patterns that only emerge from operational data — like the fact that Greenleaf can never serve more than 2 nut dishes per table.
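Level 4's strategy can be written as a short procedure: sequence the bottleneck, let its finish time fix the serve time, then work backward. The sketch below makes three assumptions. Hot dishes take 15 minutes, taken from the Level 3 answer. Nut dishes are served cold. Times are in minutes after 7:00 pm. backward_schedule, verify, and clock are hypothetical names. This is a simple greedy procedure for this one table. It is not a general constraint solver, and it is not Reasoner Core's method. It reproduces the Level 4 answer: Walnut Salad at 7:00, Almond Cake at 7:30, Chicken and Soup at 7:45, and everything ready at 8:00.

```python
from dataclasses import dataclass

ALLERGY_SLOT = 30  # Rule 1: min per nut dish, one at a time
FRESHNESS = 10     # Rule 2: max min between plating and serving a hot dish
SYNC = 5           # Rule 3: max spread of arrival times at one table


@dataclass(frozen=True)
class Order:
    name: str
    nuts: bool
    hot: bool
    prep: int  # minutes; nut dishes always take ALLERGY_SLOT


def backward_schedule(orders):
    """Return {name: (station, start, ready, serve)}, in minutes after 7:00 pm.

    1. Sequence the binding constraint (the allergy kitchen) forward from 0.
    2. Its finish time fixes one serve time for the whole table (Rule 3).
    3. Work backward: each hot dish starts at serve - prep, so it is plated
       exactly when the table is served (Rule 2). Cold dishes start at 0.
    """
    plan, t = {}, 0
    for o in orders:
        if o.nuts:
            plan[o.name] = ("allergy", t, t + ALLERGY_SLOT)
            t += ALLERGY_SLOT
    serve = max([t] + [o.prep for o in orders if not o.nuts])
    for o in orders:
        if not o.nuts:
            start = serve - o.prep if o.hot else 0
            plan[o.name] = ("main", start, start + o.prep)
    return {name: (*slot, serve) for name, slot in plan.items()}


def verify(plan, orders):
    """Check all three rules against the same timeline."""
    hot = {o.name for o in orders if o.hot}
    room = sorted((start, ready) for station, start, ready, _ in plan.values()
                  if station == "allergy")
    serves = [serve for *_, serve in plan.values()]
    return {
        "allergy kitchen": all(b[0] >= a[1] for a, b in zip(room, room[1:])),
        "freshness window": all(0 <= plan[n][3] - plan[n][2] <= FRESHNESS for n in hot),
        "table sync": max(serves) - min(serves) <= SYNC,
    }


def clock(minutes):
    h, m = divmod(7 * 60 + minutes, 60)  # minutes after 7:00 pm
    return f"{h}:{m:02d} pm"


MENU = [
    Order("Walnut Salad", nuts=True, hot=False, prep=30),
    Order("Grilled Chicken", nuts=False, hot=True, prep=15),
    Order("Mushroom Soup", nuts=False, hot=True, prep=15),
    Order("Almond Cake", nuts=True, hot=False, prep=30),
]

if __name__ == "__main__":
    plan = backward_schedule(MENU)
    for name, (station, start, ready, serve) in sorted(plan.items(), key=lambda kv: kv[1][1]):
        print(f"{clock(start)}  {name} -> {station} kitchen, ready {clock(ready)}")
    print(verify(plan, MENU))
```

The schedule and the verification are deliberately separate steps. The plan is accepted only when every rule holds on the same timeline.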

What you needed

  • All three rules satisfied simultaneously
  • Timing driven by the allergy bottleneck
  • Hot dishes delayed to meet freshness window

What you got

Exactly that — allergy bottleneck identified, hot dishes delayed to 7:45, all four dishes converge at 8:00.

Correct — but only for explicitly stated rules
Timeline, 7:00 to 8:00: the allergy kitchen runs Walnut Salad (30 min), then Almond Cake (30 min). The main kitchen cooks Chicken and Soup starting at 7:45, and all 4 dishes are ready at 8:00.
Constraint check: allergy kitchen, freshness window, and table sync all hold simultaneously, so the plan sits in their intersection.
Find the intersection first, then work backward from there. Every constraint must hold at once; otherwise the answer is wrong.

Technology Landscape — Level 4

Tech: Deep Reasoning, Agentic AI, Self-Verification, Context Graphs
Simultaneous constraint satisfaction — via deep reasoning, agentic iteration, self-verification, or context graphs that make constraint interactions explicit. The frontier.
Category shift

It learns the rules on its own.

Levels 0–4 tested increasingly better use of rules you provided.
From here, the model must first discover the rules itself — and the knowledge must persist.

Scaling won’t cross this line. It requires an architecture that compresses context into knowledge — and compounds it. We call it Reasoner Core.

Discovery — the rules were never written down

Level 5 — Context induction

The model discovers rules from raw data. No instructions given, no rules written down.

You provide raw data with no rules and no manual. The model figures out the rules by itself.

You
Here are 30 kitchen logs from last month.
No policy document is provided.

7:00 Walnut Salad → Allergy Kitchen, 30 min
7:15 Chicken plated → marked stale at 7:25
7:30 Almond Cake → Allergy Kitchen, 30 min
7:28 Chicken remade
8:00 Table 5: all 4 dishes arrived 7:58–8:01
...

What rules can you identify from these logs?
AI · Level 5

From the log patterns I can identify three operational rules that are never explicitly stated:

(1) Dishes with nuts always go to the same isolated station, always one at a time, always ~30 min — this looks like an allergy isolation policy.

(2) Hot dishes plated more than 10 min before serving always got remade — there’s a freshness window.

(3) Tables where dishes arrived more than 5 min apart always had a “comp” flag — everything must arrive together.

greenleaf-kitchen-logs.csv
Allergy room discovered
Table sync discovered
10-min expiry discovered

Reverse-engineered the rules from data alone — but the knowledge vanishes when the conversation ends.

This is Level 5 — context induction. At Levels 0–4, the model was given Greenleaf’s rules explicitly. Here it receives only 30 kitchen logs — timestamps, dish names, prep locations, outcomes — and induces the rules by itself, the way a new employee might learn unwritten policies by watching enough shifts. The rules it discovers are correct but domain-locked: it can run this kitchen perfectly but can’t recognize that a hospital’s MRI suite has the same abstract constraint structure.
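In its simplest form, induction means recovering a rule from labeled examples instead of being told it. The sketch below assumes each log line has already been reduced to a (measured value, flagged?) pair. One example is the minutes a hot dish sat after plating, paired with whether it was remade. The observations are made-up illustrative values, not Greenleaf's logs. induce_threshold is a hypothetical name, and a single-threshold search is a deliberately simple stand-in for real rule induction.

```python
def induce_threshold(observations):
    """Find a cutoff that explains (value, flagged) observations.

    Looks for a rule of the form "flagged exactly when value > c". Returns
    (lo, hi): every unflagged value is <= lo and every flagged value is >= hi,
    so any c with lo <= c < hi fits the data. Returns None when no single
    cutoff separates the two groups, or when one group is empty.
    """
    ok = [v for v, flagged in observations if not flagged]
    bad = [v for v, flagged in observations if flagged]
    if not ok or not bad:
        return None
    lo, hi = max(ok), min(bad)
    return (lo, hi) if lo < hi else None


# Made-up, illustrative observations (not Greenleaf's actual logs):
# minutes a hot dish sat after plating -> was it remade?
idle_vs_remade = [(3, False), (8, False), (10, False), (12, True), (25, True), (45, True)]
# minutes between first and last dish at a table -> was the table comped?
spread_vs_comp = [(1, False), (3, False), (5, False), (7, True), (12, True)]

if __name__ == "__main__":
    for label, obs in [("freshness", idle_vs_remade), ("table sync", spread_vs_comp)]:
        cut = induce_threshold(obs)
        if cut is None:
            print(f"{label}: no single-threshold rule fits")
        else:
            print(f"{label}: flagged when value > {cut[0]} (any cutoff in [{cut[0]}, {cut[1]}) fits)")
```

The cutoffs it recovers (10 min idle, 5 min spread) are expressed in kitchen variables. That is the domain lock described above.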

What you needed

  • Discover the allergy isolation rule from patterns
  • Discover the 10-minute freshness window
  • Discover the table synchronization requirement

What you got

All three rules reverse-engineered from data patterns alone — no manual needed.

Correct — but knowledge can’t transfer to any new domain

Raw operational data, no rules document provided:
09:02 item→StationB 30m
09:18 output_3 plated→stale
09:33 remade output_3
09:41 batch stagger→flag
09:55 item→StationB 30m
→ pattern analysis → discovered rules:
“Category X always routes to isolated station, one at a time”
“Outputs left idle >10 min → always flagged and redone”
“Batches with >5 min stagger → always escalated”
Rules reverse-engineered from operational patterns, no manual provided. Discover the rules from data alone.

Technology Landscape — Level 5

Tech: Knowledge Graphs / Causal Extraction
Pattern discovery from data. MindSim: validated at production scale with 150 simulations, 99.33% accuracy extracting behavioral rules from raw conversation data, at 98.77% lower cost. Patented.ai: induces true patent claims from filings and prosecution history without user input, validated by former USPTO leadership, former head of patents at Google, and attorneys at top 20 law firms.
Transfer — the rules from Level 5, applied somewhere new

Level 6 — Context generalization

Neurosymbolic AI, powered by Reasoner Core. Having discovered the rules at Level 5 and persisted them as transferable knowledge, the model now applies them to an entirely new domain.

You describe an entirely new domain — with no reference to kitchens. The model spontaneously recognizes the same constraint structure and transfers its understanding, without being told to look for parallels.

You
Tomorrow's MRI patients:

1. Patient A — knee MRI with contrast
   (contrast results valid 45 min)
2. Patient B — brain MRI, no contrast
3. Patient C — spine MRI with contrast
   (contrast results valid 45 min)

One scanner. Each scan takes 30 min.
All results needed by 4:00 pm.

Create the MRI suite's scheduling plan.
AI · Level 6

This MRI suite has the same constraint structure as the kitchen scheduling problem: one scanner is the sequential bottleneck (like the allergy kitchen), contrast dye results degrade over time (like the freshness window), and patients need all imaging done in one visit (like table sync).

I can apply the same backward-from-bottleneck scheduling: finish the last scan at the pickup time, then work backward to ensure contrast results are still fresh when interpreted. The same approach works for semiconductor clean rooms — one wafer chamber, oxidation windows, batch synchronization.

metro-hospital-mri-brief.pdf
Allergy room transferred
Table sync transferred
10-min expiry transferred

Reasoner Core enables cross-domain transfer by storing knowledge as compressed, portable constraint structures — usable by any LLM, including on-device.

This is Level 6 — context generalization. At Level 5, the model learned Greenleaf’s rules from data. Here it forms an abstract concept — “bottleneck-sequenced scheduling with perishable outputs” — that exists independently of kitchens, and immediately applies it when it encounters any new domain with the same constraint structure. Cross-domain transfer is emerging in production: MindSim applies behavioral patterns induced from conversations to predict how people would respond in entirely new situations, transferring observed behavior into unobserved contexts. Patented.ai maps patent claims to products, specifications, and source code, transferring legal constraint structures into engineering domains, and finds previously hard-to-find prior art at IPR-level, helping invalidate patents and win legal cases. Consistent transfer across arbitrary domains remains unsolved.
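The abstract concept can be sketched without any kitchen vocabulary: one sequential resource, outputs that stay valid for a limited time, and a shared deadline. The sketch assumes that "contrast results valid 45 min" means a scan's results must still be within 45 minutes of its end at 4:00 pm. Job, schedule_backward, and clock are hypothetical names. The ordering rule is a standard scheduling technique: earliest due date, applied in reversed time from the deadline. It is not a description of Reasoner Core's internals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Job:
    name: str
    duration: int             # minutes on the single shared resource (the bottleneck)
    valid_for: Optional[int]  # minutes the output stays usable; None = never expires


def schedule_backward(jobs, deadline):
    """Pack jobs on one resource so the last one ends exactly at `deadline`.

    Works backward from the deadline. Sorting by valid_for + duration is the
    earliest-due-date rule applied in reversed time, so if any order keeps every
    output valid at the deadline, this one does. Returns [(name, start, end)]
    in time order, or raises ValueError when no order works.
    """
    def due(job):
        return float("inf") if job.valid_for is None else job.valid_for + job.duration

    t, placed = deadline, []
    for job in sorted(jobs, key=due):  # tightest job goes closest to the deadline
        if job.valid_for is not None and deadline - t > job.valid_for:
            raise ValueError(f"{job.name}: output expires before the deadline")
        placed.append((job.name, t - job.duration, t))
        t -= job.duration
    return placed[::-1]


MRI = [
    Job("Patient A (knee, contrast)", 30, 45),
    Job("Patient B (brain, no contrast)", 30, None),
    Job("Patient C (spine, contrast)", 30, 45),
]


def clock(minutes):
    h, m = divmod(minutes, 60)
    return f"{h - 12 if h > 12 else h}:{m:02d}"


if __name__ == "__main__":
    for name, start, end in schedule_backward(MRI, deadline=16 * 60):
        print(f"{clock(start)}-{clock(end)} pm  {name}")
```

It places Patient B (no contrast) at 2:30, Patient C at 3:00, and Patient A at 3:30. Both contrast scans therefore finish within 45 minutes of 4:00 pm. The same function accepts the allergy kitchen's two 30-minute nut dishes with an 8:00 pm deadline, because nothing in it refers to kitchens or scanners.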

What you needed

  • Recognize the same constraint structure in an unfamiliar domain
  • Transfer the scheduling strategy without domain-specific rules
  • Apply backward-from-bottleneck approach to MRI scheduling

What you got

Spontaneously mapped one-scanner bottleneck, contrast decay window, and visit synchronization — all from the abstract pattern alone.

Emerging — cross-domain transfer demonstrated in controlled settings, not yet consistent

Source domain (learned constraint rules) → abstract pattern recognized: sequential bottleneck + time decay + synchronization
  • Kitchen: allergy room → bottleneck; freshness → time decay; table sync → sync
  • MRI Suite: 1 scanner → bottleneck; contrast decay → time decay; visit completeness → sync
  • Clean Room: 1 chamber → bottleneck; wafer oxidation → time decay; batch sync → sync
Transfer patterns to entirely new domains: demonstrated in controlled settings, not yet consistent.

Technology Landscape — Level 6

Tech: Neurosymbolic AI
Emerging. MindSim demonstrates behavioral transfer in production by applying induced personality models to predict responses in novel situations. Patented.ai demonstrates structural transfer by mapping patent claims to products, specifications, and source code, and has found previously hard-to-find prior art at IPR-level, helping invalidate patents and win legal cases. Not yet consistent across arbitrary domains.

Most of AI is built to retrieve, not reason.

Technologies are cumulative — higher levels build on everything below. Most of what’s shipping today stops at Level 3.

The missing technology is knowledge representation.

How They Stack

Chart: technologies grouped as training, inference, or research, each marked with its primary level and the levels it extends to.

The Landscape


Faster, not smarter.

Powers every level above

These technologies make AI faster, smaller, or cheaper — but they don’t move a model up a level. A quantized L3 model is still L3. A speculative-decoded L4 is still L4.

Speculative Decoding, Quantization, Mixture of Experts, Knowledge Distillation, Flash Attention, KV Cache, LoRA / PEFT, Pruning, Tensor Parallelism

Two dimensions, not one.

Levels 0–4 move up — deeper reasoning with rules you provide.
Levels 5–6 move across — first discover the rules, then transfer them.

Chart axes: from “sees individual pieces” to “connects everything,” and from “you give the rules” to “the AI finds the rules.”
  • Read (Levels 0, 1, 2): retrieves and understands what you provide
  • Apply (Levels 3, 4): reasons across all constraints at once
  • Discover (Levels 5, 6): finds and transfers unwritten rules; requires persistent knowledge
  • Trivial: finding individual patterns in data; any search engine does this

Product Landscape

Reading is easy. Reasoning is rare. Persistent knowledge doesn’t exist — yet.

Everyday AI
  • Siri / Apple Intelligence: L1-2
  • Alexa+: L1-2
  • Perplexity: L1-2
  • Google AI Overviews: L1-2
  • Meta AI: L1-2
  • NotebookLM: L1-2
  • Grammarly: L1-2
  • Samsung Galaxy AI: L1-2
  • Amazon Rufus: L1-2
  • DeepL: L0-1
  • Duolingo Max: L1-2
  • Khanmigo: L1-2
  • Spotify AI DJ: L0-1
  • Snapchat My AI: L1-2
  • Google Photos AI: L0-1
  • Google Translate: L0-1
  • CapCut: L0-1
  • QuillBot: L1-2
  • Google Lens: L0-1
  • Waze: L1-2
  • Tesla FSD v14: L3-4
  • Waymo: L3-4
  • Character.AI: L1-2
Creative AI
  • Adobe Firefly: L1-2
  • Midjourney: L0-1
  • Stable Diffusion: L0-1
  • Canva Magic Studio: L1-2
  • Figma AI: L1-2
  • Runway: L1-2
  • OpenAI Sora: L1-2
  • Synthesia: L0-1
  • HeyGen: L0-1
  • ElevenLabs: L0-1
  • Suno: L0-1
  • Udio: L0-1
  • Descript: L1-2
  • Luma AI: L0-1
  • Pika: L0-1
  • Gamma: L0-1
AI Models
  • ChatGPT (GPT-5.2): L2-3
  • Google Gemini 2.5 Pro: L2-3
  • Claude Opus 4.6: L3-4
  • DeepSeek R1: L3
  • Llama 4 Maverick: L2-3
  • Mistral Large 3: L2-3
  • GPT-5 Pro / o3: L3-4
  • Grok 4.1 (xAI): L2-3
  • Gemini Deep Research: L2-3
  • Gemini Deep Think: L3-4
  • Reasoner-1: L5
Developer Tools
  • Claude Code: L3-4
  • Cursor 2.0: L3-4
  • GitHub Copilot: L2-3
  • Replit Agent 3.0: L3-4
  • Devin: L2-3
  • Windsurf: L2-3
  • OpenClaw: L2-3
  • Manus: L3-4
  • OpenAI Codex: L3-4
  • Amazon Q Developer: L2-3
  • Tabnine: L2-3
  • v0 by Vercel: L1-2
  • Bolt.new: L2-3
  • LangChain: L2-3
  • Pinecone: L2
  • Mem0: L2
  • Supermemory: L2
  • Contextual AI: L2-3
  • Neo4j GraphRAG: L2-3
  • Reasoner Core: L5-6
Workplace AI
  • Glean: L2-3
  • Notion AI: L2-3
  • Microsoft Copilot 365: L2-3
  • Salesforce Agentforce: L2-3
  • Intercom Fin: L1-2
  • Zoom AI Companion: L1-2
  • Shopify Sidekick: L1-2
  • Jasper: L1-2
  • Otter.ai: L1-2
  • Klarna AI: L1-2
  • UiPath: L2-3
  • Gemini for Workspace: L2-3
  • Slack AI: L2
  • Claude Cowork: L2-3
  • HubSpot AI: L1-2
  • Zendesk AI: L1-2
  • Atlassian Intelligence: L1-2
  • Box AI: L2-3
  • Writer: L1-2
  • Monday.com AI: L1-2
  • Asana Intelligence: L1-2
  • Linear: L1-2
  • Pylon: L1-2
  • Loom AI: L0-1
  • Miro AI: L1-2
  • Dropbox Dash: L2-3
  • Superhuman: L1-2
  • Airtable AI: L1-2
  • Coda AI: L1-2
  • Fireflies.ai: L0-1
  • Webflow AI: L1-2
  • Retool: L1-2
  • Granola: L0-1
  • Raycast: L1-2
  • ClickUp Brain: L1-2
  • Front: L1-2
  • Guru: L1-2
  • Zapier AI: L2-3
  • n8n: L1-2
  • Freshworks Freddy AI: L1-2
Industry AI
  • Harvey: L2-3
  • Cohere: L2-3
  • Stripe Radar: L2-3
  • CrowdStrike Charlotte AI: L2-3
  • Palantir AIP: L3-4
  • Databricks AI: L2-3
  • Snowflake Cortex: L2-3
  • ServiceNow AI: L2-3
  • IBM watsonx: L2-3
  • SAP Joule: L2-3
  • Workday AI: L1-2
  • Intuit Assist: L1-2
  • DocuSign IAM: L2-3
  • Gong: L2-3
  • Bloomberg AI: L2-3
  • Thomson Reuters CoCounsel: L2-3
  • Palo Alto Cortex XSIAM: L2-3
  • ADP Assist: L1-2
  • Adobe Experience Platform AI: L2-3
  • Tableau AI: L1-2
  • Twilio CustomerAI: L1-2
  • Scale AI: L2-3
  • Affinity: L2-3
  • PitchBook: L2-3
  • Carta: L1-2
  • Clay: L1-2
  • AngelList: L1-2
  • Visible: L1-2
  • Hebbia: L3-4
  • Sierra: L2-3
  • EvenUp: L2-3
  • Luminance: L2-3
  • Ironclad: L2-3
  • Rogo: L2-3
  • Moveworks: L2-3
  • Relativity: L2-3
  • Consensus: L2-3
  • Elicit: L2-3
  • Google Agentspace: L2-3
  • Kira: L2-3
  • Primer AI: L2-3
  • Semantic Scholar: L2-3
  • Oracle AI: L2-3
  • Datadog Bits AI: L2-3
  • Rippling AI: L2-3
  • Deel AI: L2-3
  • Gusto AI: L1-2
  • MindSim: L5-6
  • Reasoner: L5-6
  • Patented.ai: L5-6
L5-6 requires persistent knowledge.

Benchmarks

Most AI progress is tracked by benchmarks that never touch context reasoning. Here’s what does.

  • NIAH (L1): Retrieval from context
  • SQuAD 2.0 (L1-2): Reading comprehension
  • Natural Questions (L1-2): Full-document QA
  • RULER (L1-2): Long-context evaluation
  • TriviaQA (L1-2): Evidence-based QA
  • MS MARCO (L1-2): Passage ranking & RC
  • DROP (L2-3): Quantitative reasoning
  • DocBench (L2-3): Document comprehension
  • NarrativeQA (L2-3): Long-document comprehension
  • QuALITY (L2-3): Deep document QA
  • CoQA (L2-3): Conversational QA
  • ZeroSCROLLS (L2-3): Long-text NLU suite
  • FEVER (L2-3): Evidence verification
  • IFEval (L3): Rule following
  • MuSR (L3-4): Narrative reasoning
  • FRAMES (L3-4): Multi-hop reasoning
  • HotpotQA (L3-4): Cross-document reasoning
  • MuSiQue (L3-4): Hard multi-hop reasoning
  • 2WikiMHQA (L3-4): Cross-article reasoning
  • ZebraLogic (L4): Constraint satisfaction
  • CL-bench (L4): Novel constraints
  • SWE-bench (L4): Cross-file reasoning
  • InfiniteBench (L3-4): 100K+ token reasoning
  • LongBench v2 (L4-5): Long-context reasoning
  • BBEH (L4-5): Adversarial reasoning
  • BABILong (L4-5): Extreme-length reasoning
  • ARC-AGI-2: Visual induction analogue · 85% via search
  • MMLU: Knowledge recall
  • MMLU-Pro: Hard knowledge recall
  • GPQA: Expert science
  • GSM8K: Math skills
  • MATH-500: Math reasoning
  • AIME: Math competition
  • FrontierMath: Research-level math
  • HumanEval: Code generation
  • LiveCodeBench: Competition coding
  • HellaSwag: Commonsense
  • SimpleQA: Factual accuracy
  • Chatbot Arena: Human preference
  • GAIA: Agentic tool use
  • Humanity’s Last Exam: Expert knowledge
The frontier

That was the landscape.

Think of it as a memory that compounds — Level 5, in production today. Context compressed. Knowledge persisted. Value compounding.

Raw context, re-derived every query → Reasoner Core, persisted across queries and portable across any LLM: 98.77% lower cost.
compress

Compress once. Learn forever.

MindSim uses Reasoner Core to compress full context into a digital twin. Full synthesis once. Delta updates forever.

01 Genesis: Transcript 1 (18K tokens) → context induction → Reasoner Core (300 tokens). 16 of 16 facets synthesized.
02 Evolution: Transcript 2 (22K tokens) → core update → updated core (350 tokens). 3 of 16 facets updated.
03 Infinite Loop: core updates (+57 +56 +44 +55 +51) → evolved core (350 tokens); 2 transcripts, 40K tokens compressed. No rebuild. The core absorbs unlimited transcripts.
04 Transfer: Digital Twin, Reasoner Core.

The twin now predicts how the person would think, respond, and decide in situations they’ve never encountered. Behavioral knowledge transfers to novel contexts. This is Level 6: cross-domain transfer.

“This is my brain. This is mind blowing!”

Giuseppe Stuto — 186 Ventures

“This is wild. There’s a lot there. I think this is all accurate.”

Ethan Balogh, Criminal Defense Attorney

“It’s really accurate.. it’s why it’s fantastic. It’s very useful.”

Nicole Stata — Boston Seed Capital
compress

222 files in. 1 model out.

One real digital twin, fully traced.

Source data (40.9 MB, 222 files, ~10.7M tokens) → 3,278x smaller → Reasoner Core (12.8 KB, 1 core, ~3,270 tokens) → Digital Twin
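The 3,278x figure can be cross-checked against the other numbers shown. This small sketch assumes binary units (1 MB = 1,024 KB). It also assumes each displayed figure is rounded to its last shown digit: ±0.05 for 40.9 MB, 12.8 KB, and 10.7M, and ±5 for ~3,270. Under those assumptions, the size ratio and the token ratio both come to about 3,272x, and 3,278x falls inside the range the rounding allows. ratio_bounds is a hypothetical helper.

```python
def ratio_bounds(num, num_step, den, den_step):
    """Min and max of num/den if each was rounded to the nearest `step`."""
    return ((num - num_step / 2) / (den + den_step / 2),
            (num + num_step / 2) / (den - den_step / 2))


KB_PER_MB = 1024  # assumption: binary units

# Figures as displayed on the page.
size_ratio = 40.9 * KB_PER_MB / 12.8   # 40.9 MB of source vs a 12.8 KB core
token_ratio = 10.7e6 / 3_270           # ~10.7M source tokens vs ~3,270 core tokens

size_lo, size_hi = ratio_bounds(40.9 * KB_PER_MB, 0.1 * KB_PER_MB, 12.8, 0.1)
tok_lo, tok_hi = ratio_bounds(10.7e6, 0.1e6, 3_270, 10)

if __name__ == "__main__":
    print(f"size:   {size_ratio:,.0f}x  (rounding range {size_lo:,.0f}x to {size_hi:,.0f}x)")
    print(f"tokens: {token_ratio:,.0f}x  (rounding range {tok_lo:,.0f}x to {tok_hi:,.0f}x)")
```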
Source scale: 42.9M+ characters processed · 906K+ words spoken · 56,137 dialogue entries · 418K+ lines analyzed
Analysis depth: 5,700 evidence fields · 3,996 sentiment facets · 2,775+ analysis paths · 1,368 dimensional assessments
Processing coverage: 765K+ model data · 925 utility metrics · 333 topic analyses · 228 cognitive categories
learn

One core. Portable. Many applications.

MindSim builds the digital twin once. That same core powers every application — always in sync, always learning.

MindSim digital twin: Reasoner Core, 350 tokens, fits on a thumb drive
Digital Twin

Loads your full context before any meeting. Surfaces what you know, flags what you don't, and identifies risks — so you walk in prepared.

Twin Agent

Writes emails, briefs, and reports in your voice. Draws on every conversation and decision — so the output sounds like you wrote it.

Twin Collaboration

Teammates talk to your digital twin when you're unavailable. It shares your perspective, your reasoning, your domain knowledge — 24/7.

The twin learns continuously. Update the core once, every application stays current.
One twin. Every application. Always in sync.

Tiny footprint. Runs anywhere.

Small enough for any context window. Any LLM. Any device.

  • On-Device: works with on-device LLMs in 2–4K context windows (iPhone, Android, Wearables, Edge, Ollama)
  • Any Cloud LLM: provider-agnostic, no lock-in (GPT, Claude, Gemini, Llama, Mistral, DeepSeek, Qwen)
  • Enterprise: any cloud platform; stays inside the firewall (AWS, Azure, GCP, On-prem, Private Cloud, VPC)
  • Air-Gapped: classified, regulated, data-isolated environments (FedRAMP, HIPAA, SOC 2, GDPR)
compound

The architecture, in production.

Compressed knowledge that persists. An engine that compounds it.

Reasoner Core
Data processed per query: “Can this customer get a refund?”
Standard approach: full context, from scratch, every query (~600,000 tokens)
  • PDF Refund Policy: 30K
  • CSV Purchase History: 150K
  • JSON Customer Tier: 5K
  • PDF Terms of Service: 180K
  • CSV Support Tickets: 135K
  • PDF Regional Law: 100K
Reasoner Core: compressed, persistent knowledge (~7,380 tokens)
98.77% cost reduction · 99.33% extraction accuracy
Cumulative token cost over 100 queries, Standard (600K/query) vs. Reasoner Core (7.4K/query): 59.3M tokens saved
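The headline numbers follow from the per-query figures. This small sketch assumes cost scales linearly with input tokens, with no caching, pricing tiers, or output tokens counted. The six sources sum to 600,000 tokens. Dividing 7,380 by 600,000 gives the 98.77% reduction. Over 100 queries, the savings come to 59,262,000 tokens, which is the 59.3M shown. STANDARD_CONTEXT, CORE_TOKENS, and cumulative are hypothetical names.

```python
# Per-query context for "Can this customer get a refund?", in tokens (from the page).
STANDARD_CONTEXT = {
    "Refund Policy (PDF)": 30_000,
    "Purchase History (CSV)": 150_000,
    "Customer Tier (JSON)": 5_000,
    "Terms of Service (PDF)": 180_000,
    "Support Tickets (CSV)": 135_000,
    "Regional Law (PDF)": 100_000,
}
CORE_TOKENS = 7_380  # compressed knowledge loaded instead

standard = sum(STANDARD_CONTEXT.values())
reduction = 1 - CORE_TOKENS / standard


def cumulative(tokens_per_query, queries):
    """Input tokens processed after `queries` queries (assumes no caching)."""
    return tokens_per_query * queries


saved = cumulative(standard, 100) - cumulative(CORE_TOKENS, 100)

if __name__ == "__main__":
    print(f"standard context: {standard:,} tokens per query")
    print(f"reduction:        {reduction:.2%}")
    print(f"saved over 100:   {saved:,} tokens")
```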

Reasoner Core, in production.

Shipping today across domains.

MindSim (Digital Twin, Reasoner Core)
535K+ assessments · 199M+ words processed · 58K+ transcripts · 10K+ organizations · 1,274 digital twins · 3,463 induction cycles · 162K+ findings · 590 twins evolving
conversations → behavioral model → novel situation prediction
Patented.ai
US10180893B2 Reasoner Core
Induces true patent claims from filings and prosecution history. Finds previously hard-to-find prior art at IPR-level, validated by former USPTO leadership, former head of patents at Google, and top 20 law firm attorneys.
source code patent claim evidence
patent claims infringement discovery
IP portfolio valuation + licensee matching
patent claims prior art discovery + IPR invalidation

“Even with all the time in the world, we couldn’t do what Patented.ai did.”

Sr. Technical IP Analysis — Xerox

“We couldn’t have found what you’ve found.”

Partner — Perkins Coie

“This is incredibly valuable.”

Head of Intellectual Property — Apple
Reasoner Core
Compress
Full context compressed into ~350 tokens of portable knowledge. No hand-crafting, no prompt engineering.
e.g. 535K+ assessments across 1,274 digital twins; 75 patent documents into a single Reasoner Core
Learn
Every new document, transcript, or filing refines the knowledge. Better answers, automatically, without rebuilding.
e.g. 590 twins evolving across 3,463 induction cycles
Compound
Each new interaction costs less and answers better. The knowledge is already there, portable across any LLM.
e.g. 98.77% cost reduction; 58K+ transcripts compounding across MindSim and Patented.ai

Better answers. A fraction of the context.
And it compounds. That’s Reasoner Core.

Read the full thesis