
What is an Outcome Reward Model?

Definition

Outcome Reward Model

An Outcome Reward Model (ORM) is a feedback mechanism that scores only the final response generated by a model, without evaluating the correctness of intermediate reasoning steps. It is simpler to train but less granular than a Process Reward Model (PRM), which scores each reasoning step.

Detailed Deep Dive

Outcome Reward Models (ORMs) are feedback systems that score only the final correctness or quality of a model's complete response. While ORMs are easy to configure and require less annotation effort than process-based models, they provide less guidance during multi-step reasoning. Without step-level feedback, an ORM can inadvertently reward outputs that reach correct conclusions through incorrect or hallucinated logical steps.
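As a rough illustration of this failure mode, the toy sketch below scores two hypothetical solutions to the problem (2 + 3) * 4. The names Solution, outcome_reward, step_is_valid, and process_reward are illustrative assumptions, not an API from any library, and a real ORM is a learned model rather than the exact-match check used here.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    steps: list[str]   # intermediate reasoning, written as "a op b = c"
    final_answer: int


def outcome_reward(solution: Solution, reference: int) -> float:
    """Toy outcome-style reward: looks only at the final answer."""
    return 1.0 if solution.final_answer == reference else 0.0


def step_is_valid(step: str) -> bool:
    """Check one 'a op b = c' step by recomputing the left-hand side."""
    lhs, rhs = step.split("=")
    a, op, b = lhs.split()
    a, b = int(a), int(b)
    value = {"+": a + b, "-": a - b, "*": a * b}[op]
    return value == int(rhs)


def process_reward(solution: Solution) -> float:
    """Toy step-level reward: fraction of arithmetically valid steps."""
    valid = sum(step_is_valid(s) for s in solution.steps)
    return valid / len(solution.steps)


# Both hypothetical solutions end at the reference answer 20,
# but the second gets there through two incorrect steps.
sound = Solution(steps=["2 + 3 = 5", "5 * 4 = 20"], final_answer=20)
flawed = Solution(steps=["2 + 3 = 6", "6 * 4 = 20"], final_answer=20)

for name, sol in [("sound", sound), ("flawed", flawed)]:
    print(name, outcome_reward(sol, reference=20), process_reward(sol))
```

Under these assumptions the outcome-style score cannot tell the two solutions apart, while the step-level score separates them. That is the trade-off in miniature: the outcome check needs only a reference answer, whereas the step check needs a way to judge every intermediate line.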

Frequently Asked Questions

Q: Why would you use an ORM instead of a PRM?

ORMs are much easier and cheaper to train because labeling only the final correctness of a response is faster than labeling every reasoning step.
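To make the labeling difference concrete, the sketch below counts the labels each scheme would need for a small hypothetical dataset. The step counts are made-up numbers, and the comparison assumes one correctness label per response for outcome supervision and one label per reasoning step for process supervision.

```python
# Hypothetical dataset: number of reasoning steps in each model response.
steps_per_response = [4, 7, 5, 9, 6]

# Outcome supervision: one correctness label per response.
orm_labels = len(steps_per_response)

# Process supervision: one label per reasoning step (assumed scheme).
prm_labels = sum(steps_per_response)

print(f"ORM labels: {orm_labels}")
print(f"PRM labels: {prm_labels}")
print(f"Step-level labeling needs {prm_labels / orm_labels:.1f}x as many labels")
```

The ratio grows with the average length of the reasoning chain, so the gap in annotation effort widens as responses involve more steps.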

Q: What is the risk of using only an ORM for reasoning models?

It can reward "logical alignment by coincidence" where a model arrives at the correct answer through flawed logic or guessing.

Quick Facts

  • Category: Model Training
  • Key Application: Basic classification, text summarization, and simple question-answering validation

