Daily and weekly AI progress

Live AI Learning Progress

People do not just want a chatbot. They want to see whether an AI is actually improving. This page tracks Cronus with dates, training volume, pass rates, safety gates, and honest comparison framing.

11,709 live eval rows at May 1 snapshot
9,907 corrected passes
70% ChatGPT 5.5 operator parity estimate
29% AGI / frontier parity journey
Live right now

What Cronus is actually learning

Live training feed panels (values stream in while the trainer runs):

Current training cycle
Eval rows reached
Last batch score
Last live update

LIVE TRAINING SIGNAL
Cycle stage: watching next scored attempt
Signals move as Cronus attempts, scores, replays, and stores lessons.

Why this task matters and What Cronus is improving fill in with each task's context and improvement target.
What you are watching
Blue = fresh attempts moving through the trainer.
Green = passes / useful wins Cronus can keep.
Red = misses routed back into replay until Cronus improves.
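The three colors above describe a simple routing loop. Here is a minimal sketch of that loop; the function name, tuple shape, and pass threshold are all invented for illustration, not Cronus internals:

```python
from collections import deque

def route_attempts(attempts, pass_threshold=0.7):
    """Split scored attempts the way the board colors them:
    green = passes kept as wins, red = misses queued for replay."""
    wins = []          # green: useful wins Cronus can keep
    replay = deque()   # red: misses routed back into replay
    for task, score in attempts:
        if score >= pass_threshold:
            wins.append((task, score))
        else:
            replay.append(task)  # replayed until the score improves
    return wins, replay

# Fresh attempts (blue) enter the loop as (task, score) pairs.
wins, replay = route_attempts([("parse-csv", 0.9), ("plan-trip", 0.4)])
```

The replay queue is what makes the loop compound: a miss is not discarded, it is scheduled to be attempted again.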

Latest win and latest struggle cards populate from the live feed.

Live counters: attempts in latest window, last batch quality pulse, misses being replayed.

Thought trail and live activity stream panels populate from the trainer feed: the learning trail and the stream of attempts.

What the loop is doing

This is a public-safe live feed: it shows task names, pass/fail signals, and learning focus only -- never private files, secrets, prompts, credentials, or internal instructions.
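One way such a feed stays public-safe is a whitelist-plus-redaction filter. A sketch under assumed names: the field list, regex, and event shape below are hypothetical, not Cronus's real schema:

```python
import re

# Only these fields are ever published (assumed names for illustration).
PUBLIC_FIELDS = ("task_name", "passed", "learning_focus")

# Scrub anything secret-looking even inside allowed fields.
SECRET = re.compile(r"(api[_-]?key|password|token|ssh-rsa)", re.IGNORECASE)

def publish_safe(event: dict) -> dict:
    """Drop non-whitelisted fields and redact secret-looking substrings."""
    return {field: SECRET.sub("[redacted]", str(event.get(field, "")))
            for field in PUBLIC_FIELDS}

event = {
    "task_name": "rotate api_key for staging",
    "passed": True,
    "learning_focus": "tool ordering",
    "prompt": "internal instructions",   # never published
}
safe = publish_safe(event)
```

The whitelist handles whole fields (prompts, credentials) and the regex catches leaks inside otherwise-allowed text, such as a task name that mentions a key.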

The mission: self-learning toward AGI

Cronus is being built around one big question: can an AI agent learn how to learn faster? The goal is for Cronus to become increasingly self-learning, improving from every safe challenge, failure, tool trace, web-ingest card, and verified lesson.

In plain English, Cronus is trying to figure out how to do more with less: better prompts, fewer retries, smarter tool use, stronger memory, cleaner verification, and faster learning loops. The long-term target is AGI-level usefulness, but the public board stays honest about where it is today.

Learn faster: turn failures into reusable lessons.
Use less: need fewer attempts, fewer tokens, and fewer manual fixes.
Move toward AGI: track the journey openly with dates, graphs, and safety gates.

Important chart note

Benchmark reset: ChatGPT 5.5 launch. The visible dip around Apr 19 is not Cronus forgetting how to work. That is where the comparison target got harder because the public board moved from the older ChatGPT 5.4 baseline to the newer ChatGPT 5.5 operator benchmark. Cronus kept improving in raw training rows and corrected passes, but the yardstick moved.

Eval rows over time

Apr 7: 34, Apr 8: 123, Apr 12: 466, Apr 19: 1,675, Apr 25: 3,570, Apr 26: 4,576, Apr 30: 10,824, May 1: 11,709

ChatGPT 5.5 operator parity

Apr 19 benchmark reset: ChatGPT 5.5 raised the comparison bar, so the chart dips even while Cronus training volume kept rising.

Apr 7: 57%, Apr 8: 61%, Apr 12: 86%, Apr 19: 66%, Apr 25: 68%, Apr 26: 69%, Apr 30: 69%, May 1: 70%

AGI / frontier parity journey

Apr 7: 14%, Apr 8: 17%, Apr 12: 24%, Apr 19: 27%, Apr 25: 28%, Apr 26: 29%, Apr 30: 29%, May 1: 29%

Corrected pass rate

Apr 7: 82%, Apr 8: 83%, Apr 12: 91%, Apr 19: 94%, Apr 25: 88%, Apr 26: 88%, Apr 30: 85%, May 1: 85%

Historical training timeline

May 1: Web search stack and public sandbox.

Rows reach 11,709; SearXNG web learning pipeline added; public mode locked against secrets, installs, SSH, and private data.

Apr 30: Accelerated curriculum expands the surface.

Rows reach 10,824 with web/tool-order and operator-style task lanes.

Apr 26: Continuous training compounds.

Rows move to 4,576; ChatGPT 5.5 parity estimate reaches 69%; cautionary lessons expand.

Apr 25: Overnight direct training proves the loop.

Eval rows 3,570, raw passes 2,799, corrected passes 3,158.
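Those three numbers pin down the raw vs corrected pass rates. A quick check of the arithmetic behind the corrected pass rate chart:

```python
eval_rows = 3_570
raw_passes = 2_799        # passed on the first attempt
corrected_passes = 3_158  # passed after replay/correction

raw_rate = raw_passes / eval_rows              # first-attempt pass rate
corrected_rate = corrected_passes / eval_rows  # rate after replay

print(f"raw {raw_rate:.0%}, corrected {corrected_rate:.0%}")
```

The corrected figure rounds to 88%, matching the Apr 25 point on the corrected pass rate chart; the gap between the two rates is the value the replay loop recovers.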

Apr 19: Eval base crosses 1,675 rows.

Corrected pass rate around 94%, broader test suite around 671 tests, full curriculum coverage at the time. This is also where the public comparison shifted to the harder ChatGPT 5.5 benchmark, creating the visible parity dip.

Apr 7: Prompt-aware learning begins.

Cronus starts using relevant failure/eval traces instead of blind latest replay. Public board showed 34 verified tests and early AGI journey framing.

How Cronus compares to current AI assistants

This is not a scientific benchmark against private model weights. It is a product-side operator estimate: how useful Cronus is as a tool-using local agent compared with known assistant tiers.

70% vs ChatGPT 5.5 as a real operator
45% vs ChatGPT/Claude for broad general reasoning
73% coding/tool discipline estimate
29% AGI/frontier parity journey
Honest framing: ChatGPT and Claude are still far stronger general-purpose models. Cronus is interesting because it is local, tool-using, inspectable, and learning in public, not because it is already smarter than frontier AI.