Daily and weekly AI progress

Live AI Learning Progress

People do not just want a chatbot. They want to see whether an AI is actually improving. This page tracks Cronus with dates, training volume, pass rates, safety gates, and honest comparison framing.

11,709 live eval rows at May 1 snapshot
9,907 corrected passes
70% ChatGPT 5.5 operator parity estimate
29% AGI / frontier parity journey
Live right now

What Cronus is actually learning

Live training feed panels (values stream in while the trainer runs):

Current training cycle
Eval rows reached
Last batch score
Last live update

LIVE TRAINING SIGNAL
Cycle stage: watching next scored attempt
Signals move as Cronus attempts, scores, replays, and stores lessons.

Why this task matters and What Cronus is improving fill in with each task's context and improvement target.
What you are watching
Blue = fresh attempts moving through the trainer.
Green = passes / useful wins Cronus can keep.
Red = misses routed back into replay until Cronus improves.
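The three colors above describe a simple routing loop. Here is a minimal sketch of that loop; the function name, tuple shape, and pass threshold are all invented for illustration, not Cronus internals:

```python
from collections import deque

def route_attempts(attempts, pass_threshold=0.7):
    """Split scored attempts the way the board colors them:
    green = passes kept as wins, red = misses queued for replay."""
    wins = []          # green: useful wins Cronus can keep
    replay = deque()   # red: misses routed back into replay
    for task, score in attempts:
        if score >= pass_threshold:
            wins.append((task, score))
        else:
            replay.append(task)  # replayed until the score improves
    return wins, replay

# Fresh attempts (blue) enter the loop as (task, score) pairs.
wins, replay = route_attempts([("parse-csv", 0.9), ("plan-trip", 0.4)])
```

The replay queue is what makes the loop compound: a miss is not discarded, it is scheduled to be attempted again.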

Latest win and latest struggle cards populate from the live feed.

Live counters: attempts in latest window, last batch quality pulse, misses being replayed.

Thought trail and live activity stream panels populate from the trainer feed: the learning trail and the stream of attempts.

What the loop is doing

This is a public-safe live feed: it shows task names, pass/fail signals, and learning focus only -- never private files, secrets, prompts, credentials, or internal instructions.
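One way such a feed stays public-safe is a whitelist-plus-redaction filter. A sketch under assumed names: the field list, regex, and event shape below are hypothetical, not Cronus's real schema:

```python
import re

# Only these fields are ever published (assumed names for illustration).
PUBLIC_FIELDS = ("task_name", "passed", "learning_focus")

# Scrub anything secret-looking even inside allowed fields.
SECRET = re.compile(r"(api[_-]?key|password|token|ssh-rsa)", re.IGNORECASE)

def publish_safe(event: dict) -> dict:
    """Drop non-whitelisted fields and redact secret-looking substrings."""
    return {field: SECRET.sub("[redacted]", str(event.get(field, "")))
            for field in PUBLIC_FIELDS}

event = {
    "task_name": "rotate api_key for staging",
    "passed": True,
    "learning_focus": "tool ordering",
    "prompt": "internal instructions",   # never published
}
safe = publish_safe(event)
```

The whitelist handles whole fields (prompts, credentials) and the regex catches leaks inside otherwise-allowed text, such as a task name that mentions a key.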

The mission: self-learning toward AGI

Cronus is being built around one big question: can an AI agent learn how to learn faster? The goal is for Cronus to become increasingly self-learning, improving from every safe challenge, failure, tool trace, web-ingest card, and verified lesson.

In plain English, Cronus is trying to figure out how to do more with less: better prompts, fewer retries, smarter tool use, stronger memory, cleaner verification, and faster learning loops. The long-term target is AGI-level usefulness, but the public board stays honest about where it is today.

Learn faster: turn failures into reusable lessons.
Use less: need fewer attempts, fewer tokens, and fewer manual fixes.
Move toward AGI: track the journey openly with dates, graphs, and safety gates.

Important chart note

Benchmark reset: ChatGPT 5.5 launch. The visible dip around Apr 19 is not Cronus forgetting how to work. That is where the comparison target got harder because the public board moved from the older ChatGPT 5.4 baseline to the newer ChatGPT 5.5 operator benchmark. Cronus kept improving in raw training rows and corrected passes, but the yardstick moved.

Eval rows over time

Apr 7: 34, Apr 8: 123, Apr 12: 466, Apr 19: 1,675, Apr 25: 3,570, Apr 26: 4,576, Apr 30: 10,824, May 1: 11,709

ChatGPT 5.5 operator parity

Apr 19 benchmark reset: ChatGPT 5.5 raised the comparison bar, so the chart dips even while Cronus training volume kept rising.

Apr 7: 57%, Apr 8: 61%, Apr 12: 86%, Apr 19: 66%, Apr 25: 68%, Apr 26: 69%, Apr 30: 69%, May 1: 70%

AGI / frontier parity journey

Apr 7: 14%, Apr 8: 17%, Apr 12: 24%, Apr 19: 27%, Apr 25: 28%, Apr 26: 29%, Apr 30: 29%, May 1: 29%

Corrected pass rate

Apr 7: 82%, Apr 8: 83%, Apr 12: 91%, Apr 19: 94%, Apr 25: 88%, Apr 26: 88%, Apr 30: 85%, May 1: 85%

Historical training timeline

May 1: Web search stack and public sandbox.

Rows reach 11,709; SearXNG web learning pipeline added; public mode locked against secrets, installs, SSH, and private data.

Apr 30: Accelerated curriculum expands the surface.

Rows reach 10,824 with web/tool-order and operator-style task lanes.

Apr 26: Continuous training compounds.

Rows move to 4,576; ChatGPT 5.5 parity estimate reaches 69%; cautionary lessons expand.

Apr 25: Overnight direct training proves the loop.

Eval rows 3,570, raw passes 2,799, corrected passes 3,158.
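Those three numbers pin down the raw vs corrected pass rates. A quick check of the arithmetic behind the corrected pass rate chart:

```python
eval_rows = 3_570
raw_passes = 2_799        # passed on the first attempt
corrected_passes = 3_158  # passed after replay/correction

raw_rate = raw_passes / eval_rows              # first-attempt pass rate
corrected_rate = corrected_passes / eval_rows  # rate after replay

print(f"raw {raw_rate:.0%}, corrected {corrected_rate:.0%}")
```

The corrected figure rounds to 88%, matching the Apr 25 point on the corrected pass rate chart; the gap between the two rates is the value the replay loop recovers.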

Apr 19: Eval base crosses 1,675 rows.

Corrected pass rate around 94%, broader test suite around 671 tests, full curriculum coverage at the time. This is also where the public comparison shifted to the harder ChatGPT 5.5 benchmark, creating the visible parity dip.

Apr 7: Prompt-aware learning begins.

Cronus starts using relevant failure/eval traces instead of blind latest replay. Public board showed 34 verified tests and early AGI journey framing.

How Cronus compares to current AI assistants

This is not a scientific benchmark against private model weights. It is a product-side operator estimate: how useful Cronus is as a tool-using local agent compared with known assistant tiers.

70% vs ChatGPT 5.5 as a real operator
45% vs ChatGPT/Claude for broad general reasoning
73% coding/tool discipline estimate
29% AGI/frontier parity journey
Honest framing: ChatGPT and Claude are still far stronger general-purpose models. Cronus is interesting because it is local, tool-using, inspectable, and learning in public, not because it is already smarter than frontier AI.