---
type: Glossary Term
title: RLHF (Reinforcement Learning from Human Feedback)
description: "The dominant method for aligning LLMs with human preferences. Humans rate model outputs, and the model is trained to prefer higher-rated answers. Can lead to Mo"
resource: "https://www.contextstudios.ai/glossary/rlhf"
category: engineering
language: en
timestamp: "2026-02-05T22:07:47.547Z"
---

# RLHF (Reinforcement Learning from Human Feedback)

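The dominant method for aligning LLMs with human preferences. Humans rate or rank model outputs, a reward model is typically trained on those judgments, and the LLM is then fine-tuned with reinforcement learning to produce answers that score higher. RLHF can lead to mode collapse, because 'typical' answers are systematically preferred and less common but valid responses become rarer.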
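The preference-learning step described above can be sketched in a few lines. This is a minimal illustration, not how any particular lab implements RLHF: it assumes a linear reward model over two hypothetical hand-made features (`is_polite`, `answers_question`) and uses the Bradley-Terry pairwise loss, a common choice for learning from "A is better than B" judgments. Production systems use a neural reward model instead, and then run a separate reinforcement learning step (often PPO) that is omitted here.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry loss: -log(sigmoid(r_chosen - r_rejected))."""
    margin = r_chosen - r_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def reward(weights, features) -> float:
    """Toy linear reward model (an assumption; real reward models are neural nets)."""
    return sum(w * x for w, x in zip(weights, features))

def train_reward_model(pairs, n_features, lr=0.1, epochs=200):
    """Fit weights so human-preferred outputs receive higher reward."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Gradient of -log(sigmoid(margin)) with respect to the margin.
            grad_scale = -(1.0 - sigmoid(margin))
            for i in range(n_features):
                weights[i] -= lr * grad_scale * (chosen[i] - rejected[i])
    return weights

# Hypothetical features per output: [is_polite, answers_question].
# Each pair is (output the rater preferred, output the rater rejected).
pairs = [
    ([1.0, 1.0], [0.0, 1.0]),
    ([1.0, 1.0], [1.0, 0.0]),
    ([0.0, 1.0], [0.0, 0.0]),
]

weights = train_reward_model(pairs, n_features=2)
print("learned weights:", weights)
```

After training, the reward model scores the preferred kind of output higher than the rejected kind. The mode collapse risk follows from the same mechanism: once the LLM is optimized against such a score, answers that resemble what raters usually preferred are reinforced, while unusual answers are pushed down.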

## Business Value

RLHF is a central part of how models like ChatGPT and Claude are tuned to be helpful and safe. Understanding its mechanics helps you predict model behavior and work around its limitations.

## Context Studios Perspective

RLHF is powerful but imperfect. We help clients understand where RLHF-induced behaviors help or hinder their use cases, and how to prompt around the limitations.
