Verifiers Library (Prime Intellect)¶
An open-source Python library by Prime Intellect for building reinforcement learning environments for LLM agents as distributable software artifacts. Introduced by stefano-fiorucci (AI Engineer 2026) as the primary tooling layer for rl-environment-engineering.
"Verifiers lets us focus on the task and the rewards rather than the infrastructure."
Design philosophy¶
Environments are Python packages — they can be pip install-ed, versioned, and shared like any other library. This fights environment fragmentation: the problem where RL training environments are locked to a specific training stack and impossible to reuse across teams or frameworks.
Core abstractions¶
Environment types¶
| Class | Use case |
|---|---|
| SingleTurnEnv | One prompt → one completion → reward (reverse text, factual QA) |
| MultiTurnEnv | Iterative model↔world exchange (games, double-check, dialogue) |
| ToolEnv | Model can call Python-defined tools, receive results, continue reasoning |
| MCPEnv | Auto-connects to MCP servers to expose tools |
| StatefulToolEnv | Per-rollout persistent state (DB connections, session IDs) |
All other environment types inherit from MultiTurnEnv, which implements the core rollout loop.
Key hooks¶
- load_environment() — entry point; loads the dataset, initialises parsers and reward functions, returns a configured env
- setup_state() — populates the per-rollout state dictionary
- env_response() — world logic: parses the action, updates state, returns the next message(s) or terminates
- @is_done decorator — stopping condition; checked after every turn
- Rubric — weighted collection of reward functions combined into a single scalar
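The hooks above can be sketched in plain Python. This is a simplified standalone illustration of the hook structure and the rollout loop they plug into, not the library's actual class hierarchy or signatures; EchoEnv and the scripted policy are toy stand-ins.

```python
# Minimal sketch of the hook structure; names mirror the Verifiers hooks,
# but this is an illustration, not the library's real API.

class EchoEnv:
    """Toy multi-turn env: the episode ends once the model says 'stop'."""

    def setup_state(self):
        # Per-rollout state dictionary, fresh for each episode.
        return {"turns": 0, "last_action": None}

    def env_response(self, state, action):
        # World logic: parse the action, update state, return next message.
        state["turns"] += 1
        state["last_action"] = action
        return f"env saw: {action}"

    def is_done(self, state):
        # Stopping condition, checked after every turn.
        return state["last_action"] == "stop" or state["turns"] >= 5


def rollout(env, policy):
    """Core loop shared by all env types: model acts, world responds."""
    state = env.setup_state()
    observation = "start"
    while True:
        action = policy(observation)
        observation = env.env_response(state, action)
        if env.is_done(state):
            return state


# A trivial scripted 'policy' standing in for an LLM.
actions = iter(["hello", "stop"])
final_state = rollout(EchoEnv(), lambda obs: next(actions))
# final_state["turns"] == 2
```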
Reward functions¶
Reward functions are defined as plain Python functions and combined via a Rubric. Fiorucci uses three: winner_reward_fn (weight 1.0), format_reward_fn (weight 0.2, regex-based), and invalid_move_penalty (a flat −0.1).
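The Rubric abstraction reduces to a weighted sum of plain functions. A minimal sketch using the weights above; the reward-function bodies are toy stand-ins (the real ones are task-specific), and only the weighting scheme is the point.

```python
import re

# Toy stand-ins for the three reward functions above.

def winner_reward_fn(completion):
    # 1.0 if the rollout was won, else 0.0 (placeholder logic).
    return 1.0 if "WIN" in completion else 0.0

def format_reward_fn(completion):
    # Regex check that the move is wrapped in <move>...</move> tags.
    return 1.0 if re.search(r"<move>.+</move>", completion) else 0.0

def invalid_move_penalty(completion):
    # Flat -0.1 penalty whenever an illegal move was flagged.
    return -0.1 if "INVALID" in completion else 0.0

# Rubric-style combination: weighted reward functions -> single scalar.
rubric = [
    (winner_reward_fn, 1.0),
    (format_reward_fn, 0.2),
    (invalid_move_penalty, 1.0),
]

def score(completion):
    return sum(weight * fn(completion) for fn, weight in rubric)

print(score("<move>e4</move> WIN"))   # 1.0*1.0 + 0.2*1.0 = 1.2
```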
Model serving¶
Verifiers abstracts model serving via OpenAI-compatible API endpoints — works with OpenAI, OpenRouter, or local models via vLLM/llama.cpp. Handles parallel rollouts automatically.
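Because every backend speaks the OpenAI chat-completions protocol, a rollout step is just a POST to /v1/chat/completions; only the base URL and model name change between OpenAI, OpenRouter, and a local vLLM or llama.cpp server. A sketch of what that request looks like (the payload fields follow the OpenAI spec; the chat_request helper itself is hypothetical, not a Verifiers function):

```python
def chat_request(base_url, model, messages, temperature=1.0):
    """Build an OpenAI-compatible chat-completions request.

    Works unchanged against api.openai.com, openrouter.ai, or a local
    vLLM / llama.cpp server, since they all expose the same endpoint.
    """
    return {
        "url": f"{base_url.rstrip('/')}/chat/completions",
        "json": {
            "model": model,
            "messages": messages,
            "temperature": temperature,
        },
    }

# Same code, different backend: only base_url and model differ.
local = chat_request(
    "http://localhost:8000/v1",          # hypothetical local vLLM server
    "qwen2.5-7b",                        # hypothetical model name
    [{"role": "user", "content": "reverse 'abc'"}],
)
# local["url"] == "http://localhost:8000/v1/chat/completions"
```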
Training integration¶
Integrates with: Prime RL (Fiorucci's primary training runner), TinkerRL, SkyRL. Includes its own SimpleTrainer. Supports GRPO and CISPO.
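GRPO's core idea is group-relative advantages: sample several rollouts per prompt and baseline each reward against the group's mean, normalised by the group's standard deviation. A minimal sketch of that computation, not tied to any trainer's actual API (CISPO differs in how it clips importance weights, not sketched here):

```python
from statistics import mean, pstdev

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages within one prompt's rollout group:
    (r - group_mean) / (group_std + eps)."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]

# Four rollouts of the same prompt, each scored by the rubric:
advs = grpo_advantages([1.2, 0.0, 1.0, -0.1])
# Rewards above the group mean get positive advantage, below get negative,
# so no separate learned value baseline is needed.
```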
Environments Hub¶
Paired with the Environments Hub — a community space for sharing RL environments publicly. Verifiers (tooling) + Hub (sharing) together aim to provide an open-source alternative as a market for closed-source environments emerges.
Cross-references¶
- rl-environment-engineering — the architectural pattern Verifiers implements
- rl-with-verifiable-rewards — the training paradigm Verifiers enables
- rl-curriculum-opponent-skill — curriculum implemented inside a Verifiers MultiTurnEnv
- harness-engineering — complementary framing from Lopopolo: harnesses for agent execution at inference time