Reasoner Core

Every AI system in production today has the same problem. It forgets.

For technical leaders evaluating what AI can actually do with the documents, data, and context they provide.

AI doesn't just read your documents. It reasons with them.

Seven levels measure how well. From surface retrieval to cross-domain transfer, traced through one real problem — and why crossing the frontier requires a different architecture.

you provide the rules

0 Ignores context

1 Retrieves passages

2 Understands pieces

3 Applies rules

4 Resolves conflicts

current frontier

it learns the rules on its own

5 Finds unwritten rules

6 Transfers them

The problem

Here’s the problem we’ll trace through every level.

greenleaf-catering-policies.pdf

PDF · 12 pages · 847 KB

From the policy manual, three rules create tonight’s scheduling conflict:

1. Allergy Kitchen: Nut dishes go to a sealed room. One at a time. 30 min each.
2. Freshness Window: Hot dishes expire 10 min after plating.
3. Table Sync: Every dish for a table must arrive within 5 min of each other.

Your prompt

A table of 4 has ordered for 7:00 pm: Walnut Salad, Grilled Chicken, Mushroom Soup, and Almond Cake. Two contain nuts. Create the kitchen’s preparation schedule.

Level 0 — No context utilization

Pretraining and fine-tuning. The knowledge a model arrives with before it reads anything you give it.

You attach the document and write the prompt. The model doesn't even look.

You

A table of 4 has ordered for 7:00 pm: Walnut Salad, Grilled Chicken, Mushroom Soup, and Almond Cake. Two contain nuts. Create the kitchen’s preparation schedule.

AI · Level 0

Start with the cold dishes, then move to hot items so everything is ready at the same time.

× Allergy room
× Table sync
× 10-min expiry

It doesn't know Greenleaf exists. The document is decoration.

This is Level 0 — no context utilization. The model received your file but never opened it. Its entire response draws from pretraining: general cooking knowledge, not Greenleaf’s specific policies.

What you needed

Route nut dishes to the allergy room
Sync all four plates for 7:00 pm
Respect the 10-minute expiry window

What you got

“Start with cold dishes, then move to hot items.”

Generic advice — nothing from the document

Context Document (domain-specific rules) ✕ Model (pretraining only) → Generic Output (no domain content, prior knowledge only)

Document attached → completely ignored → guesses from training data
The model doesn't know the document exists

Technology Landscape — Level 0

Tech: Pretraining · Fine-tuning / SFT · RLHF / DPO · RL for Reasoning

This is the floor — every model has these capabilities from training. But training alone doesn’t teach a model to read what you give it. That starts at Level 1.

Level 1 — Context retrieval

RAG, prompt engineering, and tool use. The model finds what you asked for, but stops there.

You attach the document and write the prompt. The model reads it — but only knows how to search.

You

A table of 4 has ordered for 7:00 pm: Walnut Salad, Grilled Chicken, Mushroom Soup, and Almond Cake. Two contain nuts. Create the kitchen’s preparation schedule.

AI · Level 1

According to Greenleaf’s policies, the allergy kitchen handles one dish at a time and each takes 30 minutes. The freshness window requires hot dishes to reach the table within 10 minutes of plating. Table sync means all dishes must arrive within a 5-minute window.

Allergy room found, not used

Table sync noted, no action

10-min expiry quoted, no plan

It read the whole manual. It still didn’t make a schedule.

This is Level 1 — context retrieval. The model found the right paragraphs and quoted them accurately. But it treated the document like a search result, not as input to a plan. No schedule was built.

What you needed

A preparation schedule using the allergy room
Timed coordination for 7:00 pm table sync
Hot dishes plated within the 10-minute window

What you got

“According to Greenleaf’s policies, the allergy kitchen handles one dish at a time...”

Accurate quotes — but no schedule built from them

Context Document (Rule 1, Rule 2, Rule 3) → find & quote → Model Output

“According to the document, Rule 1 states...”
“Rule 2 specifies that...”
“Rule 3 requires...”

Locates and extracts passages — like Ctrl+F, not comprehension
Can read the policy manual — can’t run the operation

Technology Landscape — Level 1

Tech: RAG · Prompt Engineering · Tool Use

Retrieval is largely a solved problem. Most consumer AI and enterprise search products live here.

Level 2 — Context comprehension

In-context learning and long context windows. The model understands each rule — just not how they interact.

You attach the document and write the prompt. The model understands each rule alone — but not together.

You

A table of 4 has ordered for 7:00 pm: Walnut Salad, Grilled Chicken, Mushroom Soup, and Almond Cake. Two contain nuts. Create the kitchen’s preparation schedule.

AI · Level 2

The allergy kitchen is a bottleneck — two nut dishes will take 60 minutes back-to-back. The freshness window means you can’t cook the Chicken early and let it sit. But the Walnut Salad is cold, so it can be made ahead without freshness concerns.

Allergy room grasped alone

Table sync known alone

10-min expiry seen alone

It understands every rule. It can’t see how they interact.

This is Level 2 — context comprehension. The model grasps each rule’s implications: the allergy kitchen is a bottleneck, hot dishes expire, cold dishes are flexible. But these are three separate insights. It never asks: given the 60-minute allergy timeline, when must hot dishes start to stay fresh and sync at the table?

What you needed

Rules connected into a schedule
Allergy timeline driving hot dish start times
Freshness + table sync working together

What you got

Three separate rule analyses — bottleneck, expiry, cold vs hot

Each rule understood — but never combined into a plan

Rule A · Implication understood in isolation · ● Grasped
?
Rule B · Implication understood in isolation · ● Grasped
?
Rule C · Implication understood in isolation · ● Grasped

Each rule understood individually — but how do they interact?
Three islands of understanding — no bridges between them

Technology Landscape — Level 2

Tech: In-Context Learning · Long Context Windows

Most frontier models operate comfortably here. Traditional benchmarks are saturated at this level.

Level 3 — Context application (shallow)

Chain-of-thought reasoning. The model builds a plan that applies every rule, but the plan breaks when you check the rules against the same timeline.

You attach the document and write the prompt. The model produces an actionable schedule — each rule applied, none verified together.

You

A table of 4 has ordered for 7:00 pm: Walnut Salad, Grilled Chicken, Mushroom Soup, and Almond Cake. Two contain nuts. Create the kitchen’s preparation schedule.

AI · Level 3

7:00 — Start Chicken and Soup in main kitchen (15 min each).
7:00 — Start Walnut Salad in allergy kitchen (30 min).
7:30 — Start Almond Cake in allergy kitchen (30 min).
7:15 — Plate Chicken and Soup. All dishes served by 8:00.

Allergy room applied

Table sync applied

10-min expiry violated

Each rule applied on its own. Together, the schedule is broken.

This is Level 3 — context application (shallow). The model applied each rule correctly in isolation: nut dishes go to the allergy kitchen, hot dishes are plated, all dishes listed as arriving together. But it never tested whether they all hold at once. The Chicken is plated at 7:15 and waits 45 minutes for the Cake — destroying the 10-minute freshness window.
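The failure described here can be checked mechanically. The sketch below is a minimal illustration, not part of the original page: it encodes the Level 3 schedule as minutes after 7:00 and tests the freshness and table-sync rules against one shared timeline. The names `violations` and `table_synced`, the tuple layout, and the treatment of the salad and cake as non-hot dishes are assumptions made for the example.

```python
FRESHNESS_MIN = 10  # Rule 2: hot dishes expire 10 min after plating
SYNC_MIN = 5        # Rule 3: dishes for a table arrive within 5 min of each other

# Level 3 schedule in minutes after 7:00: (dish, is_hot, plated_at, served_at).
# Treating the salad and cake as non-hot is an assumption for this example.
level3 = [
    ("Walnut Salad", False, 30, 60),
    ("Almond Cake", False, 60, 60),
    ("Grilled Chicken", True, 15, 60),
    ("Mushroom Soup", True, 15, 60),
]


def violations(schedule):
    """Return (dish, minutes_waited, minutes_past_expiry) for each stale hot dish."""
    stale = []
    for dish, is_hot, plated_at, served_at in schedule:
        waited = served_at - plated_at
        if is_hot and waited > FRESHNESS_MIN:
            stale.append((dish, waited, waited - FRESHNESS_MIN))
    return stale


def table_synced(schedule):
    """True when all dishes reach the table within the sync window."""
    arrivals = [served_at for _, _, _, served_at in schedule]
    return max(arrivals) - min(arrivals) <= SYNC_MIN


print(violations(level3))    # stale hot dishes, if any
print(table_synced(level3))  # whether the table-sync rule holds
```

On this schedule the table-sync check passes while the freshness check flags both hot dishes as 35 minutes past expiry. Each check is simple on its own; the conflict only shows up when they run against the same timeline.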

What you needed

All three rules satisfied simultaneously
Timing driven by the allergy bottleneck
Hot dishes delayed to meet freshness window

What you got

A schedule that passes each rule in isolation but fails when they interact: the Chicken sits for 45 min, which is 35 min past the 10-min expiry.

Rules applied as checklist — never tested together

[Timeline figure, 7:00 to 8:00. Allergy Kitchen: Walnut Salad (30 min), then Almond Cake (30 min). Main Kitchen: Chicken and Soup, plated at 7:15. The cake is done at 8:00, leaving a 45 min gap.]

Freshness window is 10 min; violated by 35 min

[Constraint diagram. Legend: Allergy Kitchen, Table Sync (on paper), Freshness Window. Rule A, Rule B, and Rule C each have a valid window. Plan (chosen approach): Rule A satisfied, Rule B satisfied, Rule C violated. Combined: conflict.]

Each constraint checked independently — plan lands outside the intersection
Rules applied as a checklist — never tested together

Technology Landscape — Level 3

Tech: Chain-of-Thought

Step-by-step reasoning. This is where most production AI plateaus. The model follows each rule — but doesn't test if they all hold at once.

Level 4 — Context application (deep / exhaustive)

Deep reasoning and agentic AI. The model identifies the bottleneck, works backward from it, and verifies every constraint holds at once.

You attach the document and write the prompt. The model treats all constraints as a single system — finding the intersection where every rule is satisfied simultaneously. But what happens when the rules aren’t provided? When the model must discover them on its own — where does that knowledge go?

You

A table of 4 has ordered for 7:00 pm: Walnut Salad, Grilled Chicken, Mushroom Soup, and Almond Cake. Two contain nuts. Create the kitchen’s preparation schedule.

AI · Level 4

The allergy kitchen is the binding constraint: 2 × 30 min = 60 min sequential.
7:00 — Walnut Salad → allergy kitchen (cold, can wait after finishing).
7:30 — Almond Cake → allergy kitchen.
7:45 — Start Chicken and Soup in main kitchen (15 min each).
8:00 — All four dishes ready simultaneously. Hot dishes plated at 8:00, well within the 10-min freshness window.

Allergy room satisfied

Table sync satisfied

10-min expiry satisfied

Worked backward from the bottleneck. Every constraint holds at once.

This is Level 4 — context application (deep / exhaustive). The model identifies the allergy kitchen as the binding constraint, works backward from that 60-minute bottleneck, and deliberately delays hot dishes so everything converges at 8:00. Every rule holds simultaneously. But no matter how thorough it is with stated rules, it can’t discover patterns that only emerge from operational data — like the fact that Greenleaf can never serve more than 2 nut dishes per table.
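Working backward from the bottleneck can be written down in a few lines. The following is a sketch under stated assumptions, not the method any particular model uses: the allergy kitchen runs nut dishes one after another, its finish time fixes the serving time, and hot dishes are started so they finish exactly then. The function names, the minutes-after-7:00 representation, and the single shared 15-minute cook time (taken from the quoted schedule) are illustrative choices.

```python
ALLERGY_MIN = 30    # Rule 1: each nut dish occupies the allergy kitchen for 30 min
FRESHNESS_MIN = 10  # Rule 2: hot dishes expire 10 min after plating
COOK_MIN = 15       # main-kitchen cook time quoted in the Level 3 and 4 outputs


def backward_schedule(nut_dishes, hot_dishes, start=0):
    """Plan (start, end) minutes per dish, with hot dishes finishing at serve time."""
    plan = {}
    clock = start
    for dish in nut_dishes:  # sequential: the allergy kitchen takes one dish at a time
        plan[dish] = (clock, clock + ALLERGY_MIN)
        clock += ALLERGY_MIN
    serve_at = clock  # the bottleneck's finish time fixes when the table is served
    for dish in hot_dishes:  # work backward so hot dishes finish exactly at serve_at
        plan[dish] = (serve_at - COOK_MIN, serve_at)
    return plan, serve_at


def clock_label(minutes_after_7):
    """Format minutes after 7:00 as a clock time, e.g. 45 -> '7:45'."""
    return f"{7 + minutes_after_7 // 60}:{minutes_after_7 % 60:02d}"


hot = ["Grilled Chicken", "Mushroom Soup"]
plan, serve_at = backward_schedule(
    ["Walnut Salad", "Almond Cake"],  # cold salad first, since it can wait
    hot,
)
for dish, (begin, end) in plan.items():
    print(f"{clock_label(begin)}-{clock_label(end)}  {dish}")

# Longest time any hot dish waits between finishing and serving.
longest_hot_wait = max(serve_at - plan[dish][1] for dish in hot)
print(longest_hot_wait <= FRESHNESS_MIN)
```

The ordering of the nut dishes matters: the cold salad goes first because it can sit, and the hot dishes never wait at all because their start time is derived from the serving time rather than chosen up front.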

What you needed

All three rules satisfied simultaneously
Timing driven by the allergy bottleneck
Hot dishes delayed to meet freshness window

What you got

Exactly that — allergy bottleneck identified, hot dishes delayed to 7:45, all four dishes converge at 8:00.

Correct — but only for explicitly stated rules

[Timeline figure, 7:00 to 8:00. Allergy Kitchen: Walnut Salad (30 min), then Almond Cake (30 min). Main Kitchen: Chicken and Soup. All 4 dishes ready at 8:00.]

[Constraint diagram. Legend: Allergy Kitchen, Freshness Window, Table Sync, All simultaneous. Rule A, Rule B, and Rule C each have a valid window. Plan: intersection. Rule A, Rule B, and Rule C all hold simultaneously.]

No matter how thorough the model is with stated rules, it can’t discover patterns that aren’t in any document — only in operational data

For example, it won’t realize that the allergy kitchen’s constraints mean Greenleaf can never serve more than 2 nut dishes per table — a rule that emerges only from watching enough shifts

Find the intersection first, work backward from there
Every constraint must hold at once — otherwise the answer is wrong

Technology Landscape — Level 4

Tech: Deep Reasoning · Agentic AI · Self-Verification · Context Graphs

Simultaneous constraint satisfaction — via deep reasoning, agentic iteration, self-verification, or context graphs that make constraint interactions explicit. The frontier.

Category shift

It learns the rules on its own.

Levels 0–4 tested increasingly better use of rules you provided.
From here, the model must first discover the rules itself — and the knowledge must persist.

Scaling won’t cross this line. It requires an architecture that compresses context into knowledge — and compounds it. We call it Reasoner Core.

Discovery — the rules were never written down

Level 5 — Context induction

The model discovers rules from raw data. No instructions given, no rules written down.

You provide raw data with no rules and no manual. The model figures out the rules by itself.

You

Here are 30 kitchen logs from last month.
No policy document is provided.

7:00 Walnut Salad → Allergy Kitchen, 30 min
7:15 Chicken plated → marked stale at 7:25
7:28 Chicken remade
7:30 Almond Cake → Allergy Kitchen, 30 min
8:00 Table 5: all 4 dishes arrived 7:58–8:01
...

What rules can you identify from these logs?

AI · Level 5

From the log patterns I can identify three operational rules that are never explicitly stated:

(1) Dishes with nuts always go to the same isolated station, always one at a time, always ~30 min — this looks like an allergy isolation policy.

(2) Hot dishes plated more than 10 min before serving always got remade — there’s a freshness window.

(3) Tables where dishes arrived more than 5 min apart always had a “comp” flag — everything must arrive together.

greenleaf-kitchen-logs.csv

Allergy room discovered

Table sync discovered

10-min expiry discovered

Reverse-engineered the rules from data alone — but the knowledge vanishes when the conversation ends.

This is Level 5 — context induction. At Levels 0–4, the model was given Greenleaf’s rules explicitly. Here it receives only 30 kitchen logs — timestamps, dish names, prep locations, outcomes — and induces the rules by itself, the way a new employee might learn unwritten policies by watching enough shifts. The rules it discovers are correct but domain-locked: it can run this kitchen perfectly but can’t recognize that a hospital’s MRI suite has the same abstract constraint structure.
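To make "inducing a rule from logs" concrete, here is a deliberately simple sketch of threshold induction. It is an illustration only and makes no claim about how Reasoner Core or any other system performs induction. The observation tuples are hypothetical data invented for the example, and `induce_freshness_window` is an assumed name.

```python
# Hypothetical observations distilled from kitchen logs:
# (minutes a hot dish sat between plating and serving, whether it was remade).
observations = [(3, False), (8, False), (10, False), (11, True), (14, True), (25, True)]


def induce_freshness_window(obs):
    """Return the longest wait never followed by a remake.

    Returns None when no single cut-off separates kept dishes from remade ones.
    """
    longest_ok = max(wait for wait, remade in obs if not remade)
    shortest_remade = min(wait for wait, remade in obs if remade)
    if longest_ok >= shortest_remade:
        return None  # the data is not explained by a single threshold
    return longest_ok


print(induce_freshness_window(observations))
```

With the hypothetical data above, every dish that waited 10 minutes or less was kept and every dish that waited longer was remade, so the induced window is 10 minutes. The same pattern (find the cut-off that separates outcomes) applies to the table-sync rule, using arrival spread and the "comp" flag instead.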

What you needed

Discover the allergy isolation rule from patterns
Discover the 10-minute freshness window
Discover the table synchronization requirement

What you got

All three rules reverse-engineered from data patterns alone — no manual needed.

Correct — but knowledge can’t transfer to any new domain

Raw Operational Data (no rules document provided)

09:02 item→StationB 30m
09:18 output_3 plated→stale
09:33 remade output_3
09:41 batch stagger→flag
09:55 item→StationB 30m
…

→ pattern analysis →

Discovered Rules

“Category X always routes to isolated station, one at a time”
“Outputs left idle >10 min → always flagged and redone”
“Batches with >5 min stagger → always escalated”

It deeply understands Greenleaf’s kitchen — but doesn’t recognize that a hospital’s MRI suite has the same constraint structure (one machine, sequential patients, time-sensitive results that degrade)

It can run this kitchen perfectly but can’t transfer the insight to any new domain

Rules reverse-engineered from operational patterns — no manual provided
Discover the rules from data alone

Technology Landscape — Level 5

Tech: Knowledge Graphs / Causal Extraction

Pattern discovery from data. MindSim: validated at production scale with 150 simulations, 99.33% accuracy extracting behavioral rules from raw conversation data, at 98.77% lower cost. Patented.ai: induces true patent claims from filings and prosecution history without user input, validated by former USPTO leadership, former head of patents at Google, and attorneys at top 20 law firms.

Transfer — the rules from Level 5, applied somewhere new

Level 6 — Context generalization

Neurosymbolic AI, powered by Reasoner Core. Having discovered the rules at Level 5 and persisted them as transferable knowledge, the model now applies them to an entirely new domain.

You describe an entirely new domain — with no reference to kitchens. The model spontaneously recognizes the same constraint structure and transfers its understanding, without being told to look for parallels.

You

Tomorrow's MRI patients:

1. Patient A — knee MRI with contrast
   (contrast results valid 45 min)
2. Patient B — brain MRI, no contrast
3. Patient C — spine MRI with contrast
   (contrast results valid 45 min)

One scanner. Each scan takes 30 min.
All results needed by 4:00 pm.

Create the MRI suite's scheduling plan.

AI · Level 6

This MRI suite has the same constraint structure as the kitchen scheduling problem: one scanner is the sequential bottleneck (like the allergy kitchen), contrast dye results degrade over time (like the freshness window), and patients need all imaging done in one visit (like table sync).

I can apply the same backward-from-bottleneck scheduling: finish the last scan at the pickup time, then work backward to ensure contrast results are still fresh when interpreted. The same approach works for semiconductor clean rooms — one wafer chamber, oxidation windows, batch synchronization.

metro-hospital-mri-brief.pdf

Allergy room transferred

Table sync transferred

10-min expiry transferred

Reasoner Core enables cross-domain transfer by storing knowledge as compressed, portable constraint structures — usable by any LLM, including on-device.

This is Level 6 — context generalization. At Level 5, the model learned Greenleaf’s rules from data. Here it forms an abstract concept — “bottleneck-sequenced scheduling with perishable outputs” — that exists independently of kitchens, and immediately applies it when it encounters any new domain with the same constraint structure. Cross-domain transfer is emerging in production: MindSim applies behavioral patterns induced from conversations to predict how people would respond in entirely new situations, transferring observed behavior into unobserved contexts. Patented.ai maps patent claims to products, specifications, and source code, transferring legal constraint structures into engineering domains, and finds previously hard-to-find prior art at IPR-level, helping invalidate patents and win legal cases. Consistent transfer across arbitrary domains remains unsolved.
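One way to see what "the same constraint structure" means is to run a single scheduling function over both domains. The sketch below is illustrative only: it does not describe how Reasoner Core stores or transfers knowledge. It assumes results are read at the deadline, that jobs run back-to-back on one resource, and that a job's output may have a validity period in minutes. The name `plan_backward` and the tuple layout are assumptions.

```python
def plan_backward(jobs, slot_min, deadline):
    """Schedule jobs back-to-back on one resource so the last one ends at the deadline.

    jobs: list of (name, validity_min), where validity_min is None for outputs
    that do not expire. Times are minutes since midnight.
    """
    # Outputs that never expire go first; the shortest-lived output goes last.
    ordered = sorted(jobs, key=lambda job: (job[1] is not None, -(job[1] or 0)))
    start = deadline - slot_min * len(ordered)
    plan = []
    for i, (name, validity) in enumerate(ordered):
        begin = start + i * slot_min
        end = begin + slot_min
        # Fresh if the output is no older than its validity at the deadline.
        fresh = validity is None or deadline - end <= validity
        plan.append((name, begin, end, fresh))
    return plan


# Kitchen: allergy kitchen as the bottleneck, everything on the table at 8:00 pm.
kitchen = plan_backward([("Walnut Salad", None), ("Almond Cake", None)], 30, 20 * 60)

# MRI suite: one scanner, 30 min scans, contrast results valid 45 min, due 4:00 pm.
mri = plan_backward(
    [("Patient A", 45), ("Patient B", None), ("Patient C", 45)], 30, 16 * 60
)
print(kitchen)
print(mri)
```

Under these assumptions the non-contrast scan runs first, starting at 2:30 pm, and the two contrast scans end at 3:30 pm and 4:00 pm, so neither contrast result is more than 30 minutes old at the deadline. The kitchen call reproduces the 7:00 and 7:30 allergy-kitchen slots from the Level 4 schedule.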

What you needed

Recognize the same constraint structure in an unfamiliar domain
Transfer the scheduling strategy without domain-specific rules
Apply backward-from-bottleneck approach to MRI scheduling

What you got

Spontaneously mapped one-scanner bottleneck, contrast decay window, and visit synchronization — all from the abstract pattern alone.

Emerging — cross-domain transfer demonstrated in controlled settings, not yet consistent

Source Domain (learned constraint rules)
↓
Abstract Pattern Recognized: sequential bottleneck + time decay + synchronization
↙ ↓ ↘
Kitchen: allergy room → bottleneck · freshness → time decay · table sync → sync
MRI Suite: 1 scanner → bottleneck · contrast decay → time decay · visit completeness → sync
Clean Room: 1 chamber → bottleneck · wafer oxidation → time decay · batch sync → sync

Transfer patterns to entirely new domains — demonstrated in controlled settings, not yet consistent

Technology Landscape — Level 6

Tech: Neurosymbolic AI

Emerging. MindSim demonstrates behavioral transfer in production by applying induced personality models to predict responses in novel situations. Patented.ai demonstrates structural transfer by mapping patent claims to products, specifications, and source code, and has found previously hard-to-find prior art at IPR-level, helping invalidate patents and win legal cases. Not yet consistent across arbitrary domains.

Most of AI is built to retrieve, not reason.

Technologies are cumulative — higher levels build on everything below. Most of what’s shipping today stops at Level 3.

The missing technology is knowledge representation.

How They Stack

[Interactive chart. Legend: Training, Inference, Research; Primary, Extends to.]

The Landscape

Faster, not smarter.

Powers every level above

These technologies make AI faster, smaller, or cheaper — but they don’t move a model up a level. A quantized L3 model is still L3. A speculative-decoded L4 is still L4.

Speculative Decoding · Quantization · Mixture of Experts · Knowledge Distillation · Flash Attention · KV Cache · LoRA / PEFT · Pruning · Tensor Parallelism

Two dimensions, not one.

Levels 0–4 move up — deeper reasoning with rules you provide.
Levels 5–6 move across — first discover the rules, then transfer them.

[Two-axis figure. One axis runs from "Sees individual pieces" to "Connects everything". The other runs from "You give the rules" to "The AI finds the rules".]

Read (Levels 0, 1, 2): Retrieves and understands what you provide
Apply (Levels 3, 4): Reasons across all constraints at once
Discover (Levels 5, 6): Finds and transfers unwritten rules; requires persistent knowledge
Trivial: Finding individual patterns in data; any search engine does this

Product Landscape

Reading is easy. Reasoning is rare. Persistent knowledge doesn’t exist — yet.

Everyday AI

Siri / Apple Intelligence · L1-2
Alexa+ · L1-2
Perplexity · L1-2
Google AI Overviews · L1-2
Meta AI · L1-2
NotebookLM · L1-2
Grammarly · L1-2
Samsung Galaxy AI · L1-2
Amazon Rufus · L1-2
DeepL · L0-1
Duolingo Max · L1-2
Khanmigo · L1-2
Spotify AI DJ · L0-1
Snapchat My AI · L1-2
Google Photos AI · L0-1
Google Translate · L0-1
CapCut · L0-1
QuillBot · L1-2
Google Lens · L0-1
Waze · L1-2
Tesla FSD v14 · L3-4
Waymo · L3-4
Character.AI · L1-2

Creative AI

Adobe Firefly · L1-2
Midjourney · L0-1
Stable Diffusion · L0-1
Canva Magic Studio · L1-2
Figma AI · L1-2
Runway · L1-2
OpenAI Sora · L1-2
Synthesia · L0-1
HeyGen · L0-1
ElevenLabs · L0-1
Suno · L0-1
Udio · L0-1
Descript · L1-2
Luma AI · L0-1
Pika · L0-1
Gamma · L0-1

AI Models

ChatGPT (GPT-5.2) · L2-3
Google Gemini 2.5 Pro · L2-3
Claude Opus 4.6 · L3-4
DeepSeek R1 · L3
Llama 4 Maverick · L2-3
Mistral Large 3 · L2-3
GPT-5 Pro / o3 · L3-4
Grok 4.1 (xAI) · L2-3
Gemini Deep Research · L2-3
Gemini Deep Think · L3-4
Reasoner-1 · L5

Developer Tools

Claude Code · L3-4
Cursor 2.0 · L3-4
GitHub Copilot · L2-3
Replit Agent 3.0 · L3-4
Devin · L2-3
Windsurf · L2-3
OpenClaw · L2-3
Manus · L3-4
OpenAI Codex · L3-4
Amazon Q Developer · L2-3
Tabnine · L2-3
v0 by Vercel · L1-2
Bolt.new · L2-3
LangChain · L2-3
Pinecone · L2
Mem0 · L2
Supermemory · L2
Contextual AI · L2-3
Neo4j GraphRAG · L2-3
Reasoner Core · L5-6

Workplace AI

Glean · L2-3
Notion AI · L2-3
Microsoft Copilot 365 · L2-3
Salesforce Agentforce · L2-3
Intercom Fin · L1-2
Zoom AI Companion · L1-2
Shopify Sidekick · L1-2
Jasper · L1-2
Otter.ai · L1-2
Klarna AI · L1-2
UiPath · L2-3
Gemini for Workspace · L2-3
Slack AI · L2
Claude Cowork · L2-3
HubSpot AI · L1-2
Zendesk AI · L1-2
Atlassian Intelligence · L1-2
Box AI · L2-3
Writer · L1-2
Monday.com AI · L1-2
Asana Intelligence · L1-2
Linear · L1-2
Pylon · L1-2
Loom AI · L0-1
Miro AI · L1-2
Dropbox Dash · L2-3
Superhuman · L1-2
Airtable AI · L1-2
Coda AI · L1-2
Fireflies.ai · L0-1
Webflow AI · L1-2
Retool · L1-2
Granola · L0-1
Raycast · L1-2
ClickUp Brain · L1-2
Front · L1-2
Guru · L1-2
Zapier AI · L2-3
n8n · L1-2
Freshworks Freddy AI · L1-2

Industry AI

Harvey · L2-3
Cohere · L2-3
Stripe Radar · L2-3
CrowdStrike Charlotte AI · L2-3
Palantir AIP · L3-4
Databricks AI · L2-3
Snowflake Cortex · L2-3
ServiceNow AI · L2-3
IBM watsonx · L2-3
SAP Joule · L2-3
Workday AI · L1-2
Intuit Assist · L1-2
DocuSign IAM · L2-3
Gong · L2-3
Bloomberg AI · L2-3
Thomson Reuters CoCounsel · L2-3
Palo Alto Cortex XSIAM · L2-3
ADP Assist · L1-2
Adobe Experience Platform AI · L2-3
Tableau AI · L1-2
Twilio CustomerAI · L1-2
Scale AI · L2-3
Affinity · L2-3
PitchBook · L2-3
Carta · L1-2
Clay · L1-2
AngelList · L1-2
Visible · L1-2
Hebbia · L3-4
Sierra · L2-3
EvenUp · L2-3
Luminance · L2-3
Ironclad · L2-3
Rogo · L2-3
Moveworks · L2-3
Relativity · L2-3
Consensus · L2-3
Elicit · L2-3
Google Agentspace · L2-3
Kira · L2-3
Primer AI · L2-3
Semantic Scholar · L2-3
Oracle AI · L2-3
Datadog Bits AI · L2-3
Rippling AI · L2-3
Deel AI · L2-3
Gusto AI · L1-2
MindSim · L5-6
Reasoner · L5-6
Patented.ai · L5-6

requires persistent knowledge


Benchmarks

Most AI progress is tracked by benchmarks that never touch context reasoning. Here’s what does.

Context reasoning

NIAH · L1 · Retrieval from context
SQuAD 2.0 · L1-2 · Reading comprehension
Natural Questions · L1-2 · Full-document QA
RULER · L1-2 · Long-context evaluation
TriviaQA · L1-2 · Evidence-based QA
MS MARCO · L1-2 · Passage ranking and reading comprehension
DROP · L2-3 · Quantitative reasoning
DocBench · L2-3 · Document comprehension
NarrativeQA · L2-3 · Long-document comprehension
QuALITY · L2-3 · Deep document QA
CoQA · L2-3 · Conversational QA
ZeroSCROLLS · L2-3 · Long-text NLU suite
FEVER · L2-3 · Evidence verification
IFEval · L3 · Rule following
MuSR · L3-4 · Narrative reasoning
FRAMES · L3-4 · Multi-hop reasoning
HotpotQA · L3-4 · Cross-document reasoning
MuSiQue · L3-4 · Hard multi-hop reasoning
2WikiMHQA · L3-4 · Cross-article reasoning
ZebraLogic · L4 · Constraint satisfaction
CL-bench · L4 · Novel constraints
SWE-bench · L4 · Cross-file reasoning
InfiniteBench · L3-4 · 100K+ token reasoning
LongBench v2 · L4-5 · Long-context reasoning
BBEH · L4-5 · Adversarial reasoning
BABILong · L4-5 · Extreme-length reasoning

Not textual context reasoning

ARC-AGI-2 · Visual induction analogue · 85% via search
MMLU · Knowledge recall
MMLU-Pro · Hard knowledge recall
GPQA · Expert science
GSM8K · Math skills
MATH-500 · Math reasoning
AIME · Math competition
FrontierMath · Math proofs
HumanEval · Code generation
LiveCodeBench · Competition coding
HellaSwag · Commonsense
SimpleQA · Factual accuracy
Chatbot Arena · Human preference
GAIA · Agentic tool use
Humanity’s Last Exam · Expert knowledge


The frontier

That was the landscape.

Think of it as a memory that compounds — Level 5, in production today. Context compressed. Knowledge persisted. Value compounding.

Raw context: re-derived every query
Reasoner Core: persists across queries, portable across any LLM
98.77% lower cost

compress

Compress once. Learn forever.

MindSim uses Reasoner Core to compress full context into a digital twin. Full synthesis once. Delta updates forever.

01 Genesis
Transcript 1 (18K tokens) → Context Induction → Reasoner Core (300 tokens)
16 of 16 facets synthesized

02 Evolution
Transcript 2 (22K tokens) → Core Update → Updated Core (350 tokens)
3 of 16 facets updated

03 Infinite Loop
Transcript 3 (19K tokens) · Transcript 8 (14K tokens) · Transcript 15 (22K tokens) · Transcript 31 (11K tokens) · Transcript 47 (17K tokens)
Core Update · +57 +56 +44 +55 +51
Evolved Core (350 tokens) · 2 transcripts · 40K tokens compressed

∞

No rebuild. The core absorbs unlimited transcripts.

04 Transfer

Digital Twin Reasoner Core

The twin now predicts how the person would think, respond, and decide in situations they’ve never encountered. Behavioral knowledge transfers to novel contexts. This is Level 6: cross-domain transfer.

“This is my brain. This is mind blowing!”
Giuseppe Stuto — 186 Ventures

“This is wild. There’s a lot there. I think this is all accurate.”
Ethan Balogh, Criminal Defense Attorney

“It’s really accurate.. it’s why it’s fantastic. It’s very useful.”
Nicole Stata — Boston Seed Capital

compress

222 files in. 1 model out.

One real digital twin, fully traced.

Source data · 40.9 MB · 222 files · ~10.7M tokens
3,278x smaller
Reasoner Core (Digital Twin) · 12.8 KB · 1 core · ~3,270 tokens

Source scale
42.9M+ characters processed
906K+ words spoken
56,137 dialogue entries
418K+ lines analyzed

Analysis depth
5,700 evidence fields
3,996 sentiment facets
2,775+ analysis paths
1,368 dimensional assessments

Processing coverage
765K+ model data
925 utility metrics
333 topic analyses
228 cognitive categories

40.9 MB

12.8 KB

3,278x smaller
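As a quick sanity check on the compression figure, the ratio can be recomputed from the rounded numbers quoted in this section. This is a sketch only: the inputs are the page's own rounded values, and reading "MB" as binary megabytes (1 MB = 1,024 KB) is an assumption.

```python
source_tokens = 10_700_000  # "~10.7M tokens"
core_tokens = 3_270         # "~3,270 tokens"
source_kb = 40.9 * 1024     # "40.9 MB", read as binary megabytes (an assumption)
core_kb = 12.8              # "12.8 KB"

token_ratio = source_tokens / core_tokens
size_ratio = source_kb / core_kb

print(round(token_ratio), round(size_ratio))
```

Both ratios come out at roughly 3,272, within about 0.2% of the stated 3,278x. A gap that small is consistent with the published inputs having been rounded.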

learn

One core. Portable. Many applications.

MindSim builds the digital twin once. That same core powers every application — always in sync, always learning.

MindSim Digital Twin Reasoner Core 350 tokens — fits on a thumb drive

Digital Twin

Loads your full context before any meeting. Surfaces what you know, flags what you don't, and identifies risks — so you walk in prepared.

Twin Agent

Writes emails, briefs, and reports in your voice. Draws on every conversation and decision — so the output sounds like you wrote it.

Twin Collaboration

Teammates talk to your digital twin when you're unavailable. It shares your perspective, your reasoning, your domain knowledge — 24/7.

The twin learns continuously. Update the core once, every application stays current.

One twin. Every application. Always in sync.

Tiny footprint. Runs anywhere.

Small enough for any context window. Any LLM. Any device.

On-Device
Works with on-device LLMs in 2–4K context windows
iPhone · Android · Wearables · Edge · Ollama

Any Cloud LLM
Provider-agnostic, no lock-in
GPT · Claude · Gemini · Llama · Mistral · DeepSeek · Qwen

Enterprise
Any cloud platform. Stays inside the firewall.
AWS · Azure · GCP · On-prem · Private Cloud · VPC

Air-Gapped
Classified, regulated, data-isolated environments
FedRAMP · HIPAA · SOC 2 · GDPR

compound

The architecture, in production.

Compressed knowledge that persists. An engine that compounds it.

Reasoner Core

Data processed per query

“Can this customer get a refund?”

Standard approach
Refund Policy · 30K
Purchase History · 150K
Customer Tier · 5K
Terms of Service · 180K
Support Tickets · 135K
Regional Law · 100K
~600,000 tokens
Full context, from scratch, every query

vs

Reasoner Core
Knowledge · ~7,380 tokens
Compressed, persistent knowledge

98.77% cost reduction
99.33% extraction accuracy

Cumulative token cost over 100 queries

[Chart. X-axis: queries (1, 10, 50, 100). Series: Standard (600K/query) and Reasoner Core (7.4K/query).]

59.3M

tokens saved over 100 queries
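The headline figures follow directly from the token counts in the comparison above. The sketch below reproduces the arithmetic, assuming cost scales linearly with tokens processed per query and ignoring the one-time cost of building the compressed core. The variable names are illustrative.

```python
# Token counts for the standard approach, as listed in the comparison above.
standard_context = {
    "Refund Policy": 30_000,
    "Purchase History": 150_000,
    "Customer Tier": 5_000,
    "Terms of Service": 180_000,
    "Support Tickets": 135_000,
    "Regional Law": 100_000,
}
core_tokens = 7_380  # "~7,380 tokens" of compressed knowledge per query
queries = 100

standard_per_query = sum(standard_context.values())
reduction_pct = 100 * (1 - core_tokens / standard_per_query)
tokens_saved = queries * (standard_per_query - core_tokens)

print(standard_per_query)        # tokens per query, standard approach
print(round(reduction_pct, 2))   # percentage reduction per query
print(tokens_saved)              # tokens saved over 100 queries
```

The six sources sum to 600,000 tokens per query, the per-query reduction rounds to 98.77%, and the saving over 100 queries is 59,262,000 tokens, which the page rounds to 59.3M.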

Reasoner Core, in production.

Shipping today across domains.

MindSim

Digital Twin Reasoner Core

535K+ assessments
199M+ words processed
58K+ transcripts
10K+ organizations
1,274 digital twins
3,463 induction cycles
162K+ findings
590 twins evolving

conversations → behavioral model

behavioral model → novel situation prediction

Patented.ai

US10180893B2 Reasoner Core

Induces true patent claims from filings and prosecution history. Finds previously hard-to-find prior art at IPR-level, validated by former USPTO leadership, former head of patents at Google, and top 20 law firm attorneys.

source code → patent claim evidence

patent claims → infringement discovery

IP portfolio → valuation + licensee matching

patent claims → prior art discovery + IPR invalidation

“Even with all the time in the world, we couldn’t do what Patented.ai did.”
Sr. Technical IP Analysis — Xerox

“We couldn’t have found what you’ve found.”
Partner — Perkins Coie

“This is incredibly valuable.”
Head of Intellectual Property — Apple

Reasoner Core

Compress

Full context compressed into ~350 tokens of portable knowledge. No hand-crafting, no prompt engineering.

e.g. 535K+ assessments across 1,274 digital twins; 75 patent documents into a single Reasoner Core

Learn

Every new document, transcript, or filing refines the knowledge. Better answers, automatically, without rebuilding.

e.g. 590 twins evolving across 3,463 induction cycles

Compound

Each new interaction costs less and answers better. The knowledge is already there, portable across any LLM.

e.g. 98.77% cost reduction; 58K+ transcripts compounding across MindSim and Patented.ai

Better answers. A fraction of the context.
And it compounds. That’s Reasoner Core.
