Absolute Zero Reasoner. AI News – #2 May

3min.

19 May 2025

Artificial intelligence that invents its own tasks and learns to solve them, without needing a single byte of human-provided data? It sounds like a science fiction breakthrough, and it's called Absolute Zero Reasoner (AZR). This new AI model is generating as much excitement as it is raising questions. Is it a true revolution in how machines are trained, a step towards AI that learns like a human? Or is it "just" an incredibly clever automation of processes we already know?

What is Absolute Zero Reasoner – AZR?

Absolute Zero Reasoner (AZR), developed by researchers from Tsinghua University, the Beijing Institute for General Artificial Intelligence, and Pennsylvania State University, is an artificial intelligence model designed to develop reasoning abilities autonomously. Its defining feature is that it generates tasks for itself and then solves them. Importantly, AZR learns by having the correctness of its solutions verified by an objective, external mechanism (in this case, a code executor), so it does not require any human-prepared training data. The model is trained under a paradigm its authors call “Absolute Zero,” a form of RLVR (Reinforcement Learning with Verifiable Rewards) in which the reward comes from verifiable results of the model’s own work. Sounds complicated? Let me explain!

In short, AZR proposes its own tasks and solves them in a way that maximizes its learning progress, with no external, human-prepared data involved. It’s a bit like giving AI a sandbox and a shovel: it starts building increasingly complex castles, learning from every grain of sand.

How does AZR work?

At the heart of AZR is a clever mechanism in which the AI model plays two roles simultaneously:

  1. Proposer: generates new tasks or problems. Crucially, it is rewarded for creating challenges that are “just right” for the Solver: neither too easy (boring!) nor too hard (frustrating!). The goal is to find the sweet spot that drives the most learning.
  2. Solver: attempts to solve the tasks created by the Proposer.
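
To make the “just right” idea concrete, here is a minimal Python sketch of how a Proposer reward could be computed. It assumes the Solver tries each proposed task several times and that we only know whether each attempt succeeded. The function name `proposer_reward` and the exact formula are illustrative assumptions, not something stated in this article.

```python
def proposer_reward(solve_results):
    """Score a proposed task by how useful it is for learning.

    solve_results: list of booleans, one per Solver attempt on the task
    (True means the attempt was verified as correct).
    """
    if not solve_results:
        raise ValueError("need at least one Solver attempt")

    success_rate = sum(solve_results) / len(solve_results)

    # Always solved (too easy) or never solved (too hard):
    # the task carries no learning signal, so it earns nothing.
    if success_rate == 0.0 or success_rate == 1.0:
        return 0.0

    # Otherwise, tasks that are harder but still solvable earn more.
    return 1.0 - success_rate
```

Under these assumptions, a task solved in one of four attempts is worth more to the Proposer than a task solved in three of four, while a task solved every time is worth nothing.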

The entire system learns through interaction with an environment that provides verifiable feedback. In the case of AZR, this environment is the code executor. It can objectively check whether the generated code works correctly and produces the expected results. It’s a bit like a judge at a competition who fairly evaluates the performance.
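
As a rough illustration of the “judge,” the Python sketch below runs a small program and compares the real result with the result the Solver predicted. The helper names (`run_program`, `solver_reward`) and the all-or-nothing 1.0/0.0 reward are assumptions made for this example. A real system would also run untrusted code in an isolated sandbox rather than calling `exec` directly.

```python
def run_program(source, func_name, arg):
    """Execute `source` in a fresh namespace and call func_name(arg)."""
    namespace = {}
    exec(source, namespace)  # unsafe for untrusted code; fine for a toy demo
    return namespace[func_name](arg)


def solver_reward(source, func_name, arg, predicted_output):
    """Return 1.0 if the prediction matches what the code really produces."""
    try:
        actual_output = run_program(source, func_name, arg)
    except Exception:
        # Code that does not compile or crashes cannot confirm any answer.
        return 0.0
    return 1.0 if actual_output == predicted_output else 0.0


# Hypothetical task: what does f(3) return?
program = "def f(x):\n    return x * 2 + 1\n"
reward = solver_reward(program, "f", 3, 7)
```

The point of the design is that nobody has to label the answer by hand: executing the code is the label.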

This creates a loop of continuous improvement:

  • AI proposes a task.
  • The environment evaluates whether the task is “learning-worthy” (reward for the Proposer).
  • AI tries to solve the task.
  • The environment verifies the correctness of the solution (reward for the Solver).

And so on, in a loop, with the AI becoming better and better at inventing useful tasks and solving them, starting from absolute zero – hence the name.
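
The loop above can be sketched in a few lines of Python. Everything here is a toy stand-in: `propose`, `solve`, and `check` are hypothetical placeholders for the model and the code executor, and the way the task score is derived from repeated solve attempts is an assumption for illustration. No actual model training happens in this sketch.

```python
import random


def self_play_round(propose, solve, check, attempts=4):
    """Run one round: propose a task, attempt it several times, score both roles."""
    task = propose()
    # Each attempt is verified by the environment (True = correct).
    results = [check(task, solve(task)) for _ in range(attempts)]
    rate = sum(results) / attempts
    # Assumed task score: zero if the task is always or never solved.
    proposer_reward = 0.0 if rate in (0.0, 1.0) else 1.0 - rate
    solver_reward = rate
    return task, proposer_reward, solver_reward


def propose():
    # Hypothetical task: add two small integers.
    return (random.randint(0, 9), random.randint(0, 9))


def solve(task):
    # Hypothetical imperfect solver: sometimes off by one.
    a, b = task
    return a + b + random.choice([0, 1])


def check(task, answer):
    # Stand-in for the code executor: the correct answer is computable.
    return answer == sum(task)


task, proposer_reward, solver_reward = self_play_round(propose, solve, check)
```

In the real setting, both rewards would then be used to update the same model, which is what closes the loop.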

Figure: How AZR works. Source: https://www.researchgate.net/publication/391493002_Absolute_Zero_Reinforced_Self-play_Reasoning_with_Zero_Data

Why code? The universal language of AI

The creators of AZR focused on the domain of coding. Why?

  • Programming languages can describe almost any computational process. The ability to reason about code can translate to general logical thinking skills.
  • Code allows the creation of complex, structured problems.
  • The code executor provides clear, objective feedback – it either works or it doesn’t.

AZR learns three fundamental types of reasoning about code: deduction (predicting a program’s output from the program and its input), abduction (inferring an input from the program and its output), and induction (creating a program based on input/output examples).
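
The three modes are easier to see on a single toy task. In the Python sketch below, the program, the input, and the `check_*` helper names are all made up for illustration. Each helper shows how a code executor could verify one type of answer.

```python
def program(x):
    # Hypothetical task program: spread between largest and smallest value.
    return sorted(x)[-1] - sorted(x)[0]


task_input = [4, 9, 1]
task_output = program(task_input)


def check_deduction(predicted_output):
    """Deduction: given the program and input, predict the output."""
    return predicted_output == program(task_input)


def check_abduction(candidate_input):
    """Abduction: given the program and output, find an input that yields it.

    Any input producing the same output counts, not only the original one.
    """
    return program(candidate_input) == task_output


def check_induction(candidate_program, examples):
    """Induction: given input/output examples, write a program that fits them."""
    return all(candidate_program(i) == o for i, o in examples)
```

For instance, `check_abduction([0, 8])` passes even though `[0, 8]` is not the original input, because several different inputs can lead to the same output.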

Revolution or clever automation? What AZR changes (and what it doesn’t)

AZR’s results are impressive. On certain tasks, the model outperforms systems that were trained on huge, human-prepared datasets. It is particularly good at generalizing its skills to new, previously unseen domains. Sounds like a revolution, right?

However, some commentators temper the enthusiasm. They point out that although AZR is impressive, it does not eliminate the fundamental problems of large language models (LLMs). In their view, it is rather a very advanced automation of synthetic data generation: the AI does not learn to “think” in a completely new way but becomes a master at solving the specific tests and tasks it sets for itself. The concept of “self-play” is also not new.

Still, the fact that AZR achieves such good results in coding and math tasks while training entirely without external data is extremely interesting. This is where the potential paradigm shift lies: instead of flooding AI with tons of data, we allow it to explore and discover on its own.

Is AZR a step towards AI that learns like a human?

Let’s think about how humans learn. Sure, we use books and teachers (which resembles supervised learning in AI), but a huge part of our knowledge comes from interacting with the world, experimenting, setting challenges for ourselves, and learning from successes and failures. AZR tries to imitate this very process, driven by curiosity and exploration.

Is Absolute Zero Reasoner a true breakthrough and milestone? It is certainly a fascinating research direction, one that suggests the path to more autonomous and, perhaps, more “intelligent” artificial intelligence might run through self-driven discovery of knowledge. As always, the future will show how much these promising concepts will change the AI landscape. One thing is certain: it’s worth staying tuned, so keep up with AI news with Delante and subscribe to our newsletter!

Author
Maciej Jakubiec

SEO Specialist

A marketing graduate specializing in e-commerce from the University of Economics in Kraków – part of Delante’s SEO team since 2022. A firm believer in the importance of well-crafted content, and apart from being an SEO, a passionate music producer crafting sounds since his early teens.
