From: k1ito-tech by k1ito, about 2 months ago

# [2508.20722] rStar2-Agent: Agentic Reasoning Technical Report

URL: https://www.arxiv.org/abs/2508.20722

Captured: 2025/9/6 17:39:22


[Submitted on 28 Aug 2025]

rStar2-Agent: Agentic Reasoning Technical Report

Ning Shang, Yifei Liu, Yi Zhu, Li Lyna Zhang, Weijiang Xu, Xinyu Guan, Buze Zhang, Bingcheng Dong, Xudong Zhou, Bowen Zhang, Ying Xin, Ziming Miao, Scarlett Li, Fan Yang, Mao Yang

We introduce rStar2-Agent, a 14B math reasoning model trained with agentic reinforcement learning to achieve frontier-level performance. Going beyond current long chain-of-thought (CoT) reasoning, the model demonstrates advanced cognitive behaviors, such as thinking carefully before using Python coding tools and reflecting on code execution feedback to autonomously explore, verify, and refine intermediate steps in complex problem-solving. This capability is enabled by three key innovations that make agentic RL effective at scale: (i) an efficient RL infrastructure with a reliable Python code environment that supports high-throughput execution and mitigates high rollout costs, enabling training on limited GPU resources (64 MI300X GPUs); (ii) GRPO-RoC, an agentic RL algorithm with a Resample-on-Correct rollout strategy that addresses the inherent environment noise from coding tools, allowing the model to reason more effectively in a code environment; and (iii) an efficient agent training recipe that starts with non-reasoning SFT and progresses through multiple RL stages, yielding advanced cognitive abilities with minimal compute cost. As a result, rStar2-Agent boosts a pre-trained 14B model to state of the art in only 510 RL steps within one week, achieving average pass@1 scores of 80.6% on AIME24 and 69.8% on AIME25 and surpassing DeepSeek-R1 (671B) with significantly shorter responses. Beyond mathematics, rStar2-Agent-14B also demonstrates strong generalization to alignment, scientific reasoning, and agentic tool-use tasks. Code and training recipes are available at this https URL.
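The abstract names GRPO-RoC but does not spell out its mechanics. As a rough illustration only, the sketch below assumes the standard GRPO group-normalized advantage plus a simplified reading of Resample-on-Correct: oversample a group of rollouts for one prompt, downsample failed rollouts uniformly, and prefer correct rollouts whose code-tool calls failed less often. The `Rollout` fields, the weight `1 / (1 + tool_errors)`, and the proportional positive/negative split are hypothetical choices, not the paper's implementation, and the policy-gradient loss itself is omitted.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled trajectory for a prompt (hypothetical fields)."""
    correct: bool     # final answer matches the reference answer
    tool_errors: int  # number of code-tool calls that failed in this trajectory


def weighted_sample_without_replacement(items, weights, k, rng):
    """Draw k distinct items; each draw is proportional to the remaining weights."""
    items, weights = list(items), list(weights)
    chosen = []
    for _ in range(k):
        idx = rng.choices(range(len(items)), weights=weights, k=1)[0]
        chosen.append(items.pop(idx))
        weights.pop(idx)
    return chosen


def resample_on_correct(rollouts, group_size, rng):
    """Downsample an oversampled group to `group_size` (simplified, assumed RoC).

    Assumptions: keep roughly the positive/negative ratio of the oversampled group,
    sample failed rollouts uniformly, and favor correct rollouts with fewer tool errors.
    """
    if len(rollouts) < group_size:
        raise ValueError("need at least group_size rollouts to downsample")
    pos = [r for r in rollouts if r.correct]
    neg = [r for r in rollouts if not r.correct]
    n_pos = min(len(pos), round(group_size * len(pos) / len(rollouts)))
    n_neg = min(len(neg), group_size - n_pos)
    n_pos = group_size - n_neg  # top up with positives if negatives ran short
    # Hypothetical quality weight: cleaner correct trajectories are more likely kept.
    pos_weights = [1.0 / (1.0 + r.tool_errors) for r in pos]
    kept_pos = weighted_sample_without_replacement(pos, pos_weights, n_pos, rng)
    kept_neg = rng.sample(neg, n_neg)  # uniform, so diverse failure modes survive
    return kept_pos + kept_neg


def group_advantages(rollouts):
    """GRPO-style advantage: binary reward normalized by the group's mean and std."""
    rewards = [1.0 if r.correct else 0.0 for r in rollouts]
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        return [0.0] * len(rewards)  # all-correct or all-wrong groups give no signal
    return [(x - mean) / std for x in rewards]


# Usage: oversample 16 rollouts, keep a group of 8, then compute advantages.
rng = random.Random(0)
oversampled = [Rollout(correct=(i % 2 == 0), tool_errors=i % 3) for i in range(16)]
kept = resample_on_correct(oversampled, group_size=8, rng=rng)
adv = group_advantages(kept)
print(len(kept), sum(r.correct for r in kept))  # 8 4
```

In this sketch the asymmetry is the point: filtering only the correct trajectories by tool-error count avoids reinforcing answers that happened to be right despite noisy tool use, while sampling failures uniformly keeps the negative signal varied. The authors' released code is the authoritative reference for how GRPO-RoC actually scores and resamples rollouts.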
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2508.20722 [cs.CL] (or arXiv:2508.20722v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2508.20722
Submission history: from Li Lyna Zhang; [v1] Thu, 28 Aug 2025 12:45:25 UTC (1,217 KB)

See Also

Other scraps from "k1ito-tech"

diskcache

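Good choice 👍 diskcache is a library that stores its cache on disk, so you can cache large amounts of data without putting pressure on memory. Its API is also simple, and it is often used in web apps and for caching machine-learning preprocessing results. --- Installation bash pip inst...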

about 1 month ago
#python caching #diskcache +3

Best mcp server development sdk?

If by “MCP server” you mean a server implementing the Model Context Protocol (MCP) to allow LLMs / AI agents to interact with external tools/data sour...

about 1 month ago
#model context protocol #mcp sdk +3

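Daytona Sandbox: New Possibilities for Development Environments

Daytona Sandbox: New Possibilities for Development Environments. What is Daytona Sandbox? Daytona Sandbox is an innovative platform that lets developers instantly build and share development environments in the cloud. It removes the constraints of traditional local development environments and provides a unified development experience accessible from anywhere...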

about 2 months ago
#daytona #sandbox +3

E2B example in Python

A step-by-step E2B example in Python that shows stateful execution, installing packages, uploading a file, and running a quick SQLite query, all inside a s...

about 2 months ago
#e2b #python +3

# Agentic workflow patterns - AWS Prescriptive Guidance

Agentic workflow patterns integrate modular software agents with structured large language model (LLM) workflows, enabling autonomous reasoning and ac...

2 months ago
#aws #agentic ai +3

Amazon EC2 Single GPU P5 instances are now generally available

What's New at AWS - Cloud Innovation & News URL: https://aws.amazon.com/jp/about-aws/whats-new/2025/08/amazon-p5-single-gpu-instances-now-available/...

2 months ago
#AWS EC2 #NVIDIA H100 +3
