Verifiers Library (Prime Intellect)

An open-source Python library by Prime Intellect for building reinforcement learning environments for LLM agents as distributable software artifacts. Presented by stefano-fiorucci (AI Engineer 2026) as the primary tooling layer for rl-environment-engineering.

"Verifiers lets us focus on the task and the rewards rather than the infrastructure."

Design philosophy

Environments are Python packages — they can be pip install-ed, versioned, and shared like any other library. This fights environment fragmentation: the problem where RL training environments are locked to a specific training stack and impossible to reuse across teams or frameworks.

Core abstractions

Environment types

Class            Use case
SingleTurnEnv    One prompt → one completion → reward (reverse text, factual QA)
MultiTurnEnv     Iterative model ↔ world exchange (games, double-check, dialogue)
ToolEnv          Model can call Python-defined tools, receive results, continue reasoning
MCPEnv           Auto-connects to MCP servers to expose tools
StatefulToolEnv  Per-rollout persistent state (DB connections, session IDs)

All other types inherit from MultiTurnEnv, which implements the core rollout loop.

Key hooks

  • load_environment() — entry point; loads dataset, initialises parsers and reward functions, returns configured env
  • setup_state() — populates per-rollout state dictionary
  • env_response() — world logic: parses action, updates state, returns next message(s) or terminates
  • @is_done decorator — stopping condition; checked after every turn
  • Rubric — weighted collection of reward functions combined into a single scalar
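The hook lifecycle above can be illustrated with a toy number-guessing world. The hook names mirror the list, but the class below is a self-contained sketch, not the verifiers API — in particular, `GuessEnv` and the `rollout` driver are hypothetical stand-ins for what MultiTurnEnv wires together internally.

```python
# Toy sketch of the hook lifecycle: setup_state -> env_response -> is_done.
# NOT the verifiers API; a self-contained illustration of the rollout loop
# those hooks plug into.

class GuessEnv:
    def setup_state(self) -> dict:
        # Populates the per-rollout state dictionary.
        return {"secret": 7, "turns": 0, "done": False}

    def env_response(self, state: dict, action: str) -> str:
        # World logic: parse the action, update state, reply or terminate.
        state["turns"] += 1
        guess = int(action)
        if guess == state["secret"]:
            state["done"] = True
            return "correct"
        return "higher" if guess < state["secret"] else "lower"

    def is_done(self, state: dict) -> bool:
        # Stopping condition, checked after every turn.
        return state["done"] or state["turns"] >= 5


def rollout(env: GuessEnv, policy) -> dict:
    # The core loop MultiTurnEnv implements for you: alternate policy
    # actions and environment responses until the stop condition fires.
    state = env.setup_state()
    observation = "guess a number 1-10"
    while not env.is_done(state):
        action = policy(observation)
        observation = env.env_response(state, action)
    return state
```

With a binary-search policy, this rollout terminates in four turns; the returned state dictionary is what the reward functions then score.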

Reward functions

Defined as plain Python functions. Fiorucci uses: winner_reward_fn (weight 1.0), format_reward_fn (weight 0.2, regex-based), invalid_move_penalty (flat −0.1).
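The function names and weights above come from the talk, but their bodies were not shown; the sketch below fills them in with illustrative guesses for a generic board-game environment (the `<move>` tag, the `state` keys, and the Rubric-style combiner are all assumptions, not Fiorucci's implementations).

```python
import re

def winner_reward_fn(completion: str, state: dict) -> float:
    # 1.0 if the model's side won the episode, else 0.0 (assumed semantics).
    return 1.0 if state.get("winner") == "model" else 0.0

def format_reward_fn(completion: str, state: dict) -> float:
    # Regex-based format check; the <move> tag is a hypothetical convention.
    return 1.0 if re.search(r"<move>.*?</move>", completion) else 0.0

def invalid_move_penalty(completion: str, state: dict) -> float:
    # Flat -0.1 whenever the episode recorded any illegal move.
    return -0.1 if state.get("invalid_moves", 0) > 0 else 0.0

def combined_reward(completion: str, state: dict) -> float:
    # Rubric-style weighted sum (weights 1.0 and 0.2 from the talk),
    # plus the flat penalty applied unweighted.
    weighted = [
        (1.0, winner_reward_fn),
        (0.2, format_reward_fn),
    ]
    score = sum(w * fn(completion, state) for w, fn in weighted)
    return score + invalid_move_penalty(completion, state)
```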

Model serving

Verifiers abstracts model serving via OpenAI-compatible API endpoints — works with OpenAI, OpenRouter, or local models via vLLM/llama.cpp. Handles parallel rollouts automatically.
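What "OpenAI-compatible" means in practice: the same chat-completions path and JSON body work against api.openai.com, OpenRouter, or a local vLLM/llama.cpp server, with only the base URL and API key changing. A minimal sketch (the helper function and model names are illustrative, not part of verifiers):

```python
import json

def chat_request(base_url: str, model: str, messages: list[dict]) -> tuple[str, str]:
    """Build the URL and JSON body for an OpenAI-compatible
    chat-completions call; only base_url varies per backend."""
    url = base_url.rstrip("/") + "/chat/completions"
    body = json.dumps({"model": model, "messages": messages})
    return url, body

# The same helper targets any backend:
openai_url, _ = chat_request("https://api.openai.com/v1", "gpt-4.1-mini", [])
vllm_url, _ = chat_request("http://localhost:8000/v1", "my-local-model", [])
```

Because every backend speaks this one wire format, the environment code never needs to know which model is on the other end, and parallel rollouts are just concurrent requests.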

Training integration

Integrates with: Prime RL (Fiorucci's primary training runner), TinkerRL, SkyRL. Includes its own SimpleTrainer. Supports GRPO and CISPO.
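GRPO's central trick is computing advantages relative to a group of rollouts sampled from the same prompt, rather than from a learned value function. A minimal conceptual sketch (not Prime RL's implementation; some variants use the population rather than sample standard deviation):

```python
from statistics import mean, stdev

def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    # GRPO: sample G completions per prompt, score each with the rubric,
    # then normalise each reward against the group's mean and std.
    # No value network needed -- the group itself is the baseline.
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For a group of rubric scores like [1.0, 0.0, 0.0, 1.0], the winners get positive advantages and the losers negative ones, summing to zero across the group.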

Environments Hub

Paired with the Environments Hub — a community space for sharing RL environments publicly. Together, Verifiers (tooling) and the Hub (sharing) aim to provide an open-source alternative as a commercial market for closed-source RL environments emerges.

Cross-references