What is an Outcome Reward Model?
Outcome Reward Model
An Outcome Reward Model (ORM) is a feedback mechanism that scores only the final response generated by a model, without evaluating the correctness of intermediate reasoning steps. It is simpler to train but less granular than step-by-step reward models.
Detailed Deep Dive
Outcome Reward Models (ORMs) are feedback systems that score only the final correctness or quality of a model's complete response. ORMs are simple to set up and require less annotation effort than Process Reward Models (PRMs), which score each reasoning step, but they provide less guidance during multi-step reasoning: a single score for the whole response does not indicate which step went wrong. Without step-level feedback, ORMs can inadvertently reward outputs that reach correct conclusions through incorrect or hallucinated logical steps.
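To make the difference in granularity concrete, here is a minimal sketch in Python. The function names and the step labels are hypothetical, and the 0/1 scores stand in for what a learned reward model would predict. The sketch only shows the shape of the feedback, not how a real ORM is trained.

```python
from typing import List


def outcome_signal(num_steps: int, final_correct: bool) -> List[float]:
    """Outcome-only feedback: one scalar, shared by every step of the response."""
    reward = 1.0 if final_correct else 0.0
    return [reward] * num_steps


def process_signal(step_correct: List[bool]) -> List[float]:
    """Step-level feedback: each reasoning step receives its own score."""
    return [1.0 if ok else 0.0 for ok in step_correct]


# Hypothetical three-step response in which only the second step is wrong,
# so the final answer is also wrong.
step_correct = [True, False, True]
final_correct = False

outcome = outcome_signal(len(step_correct), final_correct)
process = process_signal(step_correct)

print(outcome)  # every step gets the same score
print(process)  # the faulty step is singled out
```

In this toy case the outcome-only signal assigns the same score to all three steps, while the step-level signal identifies the second step as the faulty one.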
Frequently Asked Questions
Q: Why would you use an ORM instead of a PRM?
ORMs are much easier and cheaper to train because labeling only the final correctness of a response is faster than labeling every reasoning step.
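A rough way to see the labeling gap is to count the annotations each approach needs. The sketch below assumes one label per response for outcome labeling and one label per reasoning step for step-level labeling. The step counts are made up for illustration, and the count ignores that a step label may also take longer to produce.

```python
from typing import List


def labels_needed(step_counts: List[int], per_step: bool) -> int:
    """Count annotations: one per step if per_step, otherwise one per response."""
    return sum(step_counts) if per_step else len(step_counts)


# Hypothetical dataset: three responses with 4, 7, and 5 reasoning steps.
step_counts = [4, 7, 5]

outcome_labels = labels_needed(step_counts, per_step=False)
step_labels = labels_needed(step_counts, per_step=True)

print(outcome_labels, step_labels)
```

For this made-up dataset, outcome labeling needs 3 annotations and step-level labeling needs 16.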
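Q: What is the risk of using only an ORM for reasoning models?
It can reward "logical alignment by coincidence," where a model arrives at the correct answer through flawed logic or guessing. Because the score depends only on the final answer, the flawed reasoning is reinforced along with the result.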
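A toy illustration of this risk, written as a sketch: the invalid "simplification" below deletes a digit shared by the numerator and denominator, which gives the right result for 16/64 only by coincidence. The functions are hypothetical, and the exact-match check stands in for a learned outcome reward.

```python
from fractions import Fraction


def cancel_shared_digit(num: int, den: int) -> Fraction:
    """Invalid 'simplification': delete one digit that appears in both numbers.

    Toy helper intended for two-digit inputs only.
    """
    n, d = str(num), str(den)
    for ch in n:
        if ch in d:
            return Fraction(int(n.replace(ch, "", 1)), int(d.replace(ch, "", 1)))
    return Fraction(num, den)


def outcome_reward(answer: Fraction, reference: Fraction) -> float:
    """Score only the final answer; the method that produced it is never checked."""
    return 1.0 if answer == reference else 0.0


# Same flawed method, two problems: correct by luck on the first, wrong on the second.
lucky = outcome_reward(cancel_shared_digit(16, 64), Fraction(16, 64))
unlucky = outcome_reward(cancel_shared_digit(13, 39), Fraction(13, 39))

print(lucky, unlucky)
```

The outcome-only score gives full reward to the first case even though the method is unsound, and nothing in the score distinguishes it from a correctly reasoned answer.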
Quick Facts
- Category: Model Training
- Key Application: Basic classification, text summarization, and simple question-answering validation