Verifiers Library (Prime Intellect)¶
An open-source Python library by Prime Intellect for building reinforcement learning environments for LLM agents as distributable software artifacts. Introduced by stefano-fiorucci (AI Engineer 2026) as the primary tooling layer for rl-environment-engineering.
"Verifiers lets us focus on the task and the rewards rather than the infrastructure."
Design philosophy¶
Environments are Python packages — they can be pip install-ed, versioned, and shared like any other library. This fights environment fragmentation: the problem where RL training environments are locked to a specific training stack and impossible to reuse across teams or frameworks.
Core abstractions¶
Environment types¶
| Class | Use case |
|---|---|
| SingleTurnEnv | One prompt → one completion → reward (reverse text, factual QA) |
| MultiTurnEnv | Iterative model↔world exchange (games, double-check, dialogue) |
| ToolEnv | Model can call Python-defined tools, receive results, continue reasoning |
| MCPEnv | Auto-connects to MCP servers to expose tools |
| StatefulToolEnv | Per-rollout persistent state (DB connections, session IDs) |
All other environment types inherit from MultiTurnEnv, which implements the core rollout loop.
Key hooks¶
- load_environment() — entry point; loads the dataset, initialises parsers and reward functions, returns a configured env
- setup_state() — populates the per-rollout state dictionary
- env_response() — world logic: parses the action, updates state, returns the next message(s) or terminates
- @is_done decorator — stopping condition; checked after every turn
- Rubric — weighted collection of reward functions combined into a single scalar
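The hooks above can be sketched in plain Python. This is a simplified standalone illustration of the hook structure and the rollout loop they plug into, not the library's actual class hierarchy or signatures; EchoEnv and the scripted policy are toy stand-ins.

```python
# Minimal sketch of the hook structure; names mirror the Verifiers hooks,
# but this is an illustration, not the library's real API.

class EchoEnv:
    """Toy multi-turn env: the episode ends once the model says 'stop'."""

    def setup_state(self):
        # Per-rollout state dictionary, fresh for each episode.
        return {"turns": 0, "last_action": None}

    def env_response(self, state, action):
        # World logic: parse the action, update state, return next message.
        state["turns"] += 1
        state["last_action"] = action
        return f"env saw: {action}"

    def is_done(self, state):
        # Stopping condition, checked after every turn.
        return state["last_action"] == "stop" or state["turns"] >= 5


def rollout(env, policy):
    """Core loop shared by all env types: model acts, world responds."""
    state = env.setup_state()
    observation = "start"
    while True:
        action = policy(observation)
        observation = env.env_response(state, action)
        if env.is_done(state):
            return state


# A trivial scripted 'policy' standing in for an LLM.
actions = iter(["hello", "stop"])
final_state = rollout(EchoEnv(), lambda obs: next(actions))
# final_state["turns"] == 2
```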
Reward functions¶
Reward functions are defined as plain Python functions and combined via a Rubric. Fiorucci uses three: winner_reward_fn (weight 1.0), format_reward_fn (weight 0.2, regex-based), and invalid_move_penalty (a flat −0.1).
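The Rubric abstraction reduces to a weighted sum of plain functions. A minimal sketch using the weights above; the reward-function bodies are toy stand-ins (the real ones are task-specific), and only the weighting scheme is the point.

```python
import re

# Toy stand-ins for the three reward functions above.

def winner_reward_fn(completion):
    # 1.0 if the rollout was won, else 0.0 (placeholder logic).
    return 1.0 if "WIN" in completion else 0.0

def format_reward_fn(completion):
    # Regex check that the move is wrapped in <move>...</move> tags.
    return 1.0 if re.search(r"<move>.+</move>", completion) else 0.0

def invalid_move_penalty(completion):
    # Flat -0.1 penalty whenever an illegal move was flagged.
    return -0.1 if "INVALID" in completion else 0.0

# Rubric-style combination: weighted reward functions -> single scalar.
rubric = [
    (winner_reward_fn, 1.0),
    (format_reward_fn, 0.2),
    (invalid_move_penalty, 1.0),
]

def score(completion):
    return sum(weight * fn(completion) for fn, weight in rubric)

print(score("<move>e4</move> WIN"))   # 1.0*1.0 + 0.2*1.0 = 1.2
```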
Model serving¶
Verifiers abstracts model serving via OpenAI-compatible API endpoints — works with OpenAI, OpenRouter, or local models via vLLM/llama.cpp. Handles parallel rollouts automatically.
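Because every backend speaks the OpenAI chat-completions protocol, a rollout step is just a POST to /v1/chat/completions; only the base URL and model name change between OpenAI, OpenRouter, and a local vLLM or llama.cpp server. A sketch of what that request looks like (the payload fields follow the OpenAI spec; the chat_request helper itself is hypothetical, not a Verifiers function):

```python
def chat_request(base_url, model, messages, temperature=1.0):
    """Build an OpenAI-compatible chat-completions request.

    Works unchanged against api.openai.com, openrouter.ai, or a local
    vLLM / llama.cpp server, since they all expose the same endpoint.
    """
    return {
        "url": f"{base_url.rstrip('/')}/chat/completions",
        "json": {
            "model": model,
            "messages": messages,
            "temperature": temperature,
        },
    }

# Same code, different backend: only base_url and model differ.
local = chat_request(
    "http://localhost:8000/v1",          # hypothetical local vLLM server
    "qwen2.5-7b",                        # hypothetical model name
    [{"role": "user", "content": "reverse 'abc'"}],
)
# local["url"] == "http://localhost:8000/v1/chat/completions"
```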
Training integration¶
Integrates with: Prime RL (Fiorucci's primary training runner), TinkerRL, SkyRL. Includes its own SimpleTrainer. Supports GRPO and CISPO.
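GRPO's core idea is group-relative advantages: sample several rollouts per prompt and baseline each reward against the group's mean, normalised by the group's standard deviation. A minimal sketch of that computation, not tied to any trainer's actual API (CISPO differs in how it clips importance weights, not sketched here):

```python
from statistics import mean, pstdev

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages within one prompt's rollout group:
    (r - group_mean) / (group_std + eps)."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]

# Four rollouts of the same prompt, each scored by the rubric:
advs = grpo_advantages([1.2, 0.0, 1.0, -0.1])
# Rewards above the group mean get positive advantage, below get negative,
# so no separate learned value baseline is needed.
```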
Environments Hub¶
Paired with the Environments Hub — a community space for sharing RL environments publicly. Verifiers (tooling) + Hub (sharing) together aim to provide an open-source alternative as a market for closed-source environments emerges.
Cross-references¶
- rl-environment-engineering — the architectural pattern Verifiers implements
- rl-with-verifiable-rewards — the training paradigm Verifiers enables
- rl-curriculum-opponent-skill — curriculum implemented inside a Verifiers MultiTurnEnv
- harness-engineering — complementary framing from Lopopolo: harnesses for agent execution at inference time