Stefano Fiorucci
AI and software engineer. Works on AI orchestration at deepset, where he develops Haystack, an open-source LLM framework for building production-grade NLP and AI pipelines. Outside of work, he focuses on small language models, fine-tuning, and reinforcement learning.
Talk: Let LLMs Wander (AI Engineer 2026)
Fiorucci presented "Let LLMs Wander: Engineering RL Environments" at the AI Engineer conference (uploaded 2026-04-08, ~40 min). The talk covered:
- Mapping classic RL concepts (agent, environment, reward, trajectory) to the language model domain
- Introduction to Verifiers, an open-source Python library by Prime Intellect for building RL environments as distributable software artifacts
- A full experiment: training LFM-2 (Liquid AI) from weak tic-tac-toe play to master-level play, using an SFT warm-up followed by RL (GRPO/CISPO)
Key thesis: "We did not just show the model how to play. We gave it a space to play and guided it through rewards." This succinctly captures the shift from supervised-fine-tuning-sft (statistical imitation) to rl-with-verifiable-rewards (environment-driven exploration).
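To make the mapping from classic RL terms to this setting concrete, here is a minimal sketch of a tic-tac-toe environment with a verifiable reward. It is illustrative only: it does not use the Verifiers API and is not the code from the talk. All names (`TicTacToeEnv`, `reward`, `rollout`, `random_player`) are hypothetical, and the reward values (1.0 win, 0.5 draw, 0.0 loss) are an assumption. In a real setup the policy would be a language model whose text output is parsed into a move.

```python
import random
from dataclasses import dataclass, field

# All eight winning lines on a 3x3 board indexed 0..8, row by row.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that mark completes a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@dataclass
class TicTacToeEnv:
    """Environment: holds the state and enforces the rules."""
    board: list = field(default_factory=lambda: [" "] * 9)

    def legal_moves(self):
        return [i for i, cell in enumerate(self.board) if cell == " "]

    def done(self):
        return winner(self.board) is not None or not self.legal_moves()

    def step(self, move, mark):
        if move not in self.legal_moves():
            raise ValueError(f"illegal move: {move}")
        self.board[move] = mark


def reward(env, agent_mark="X"):
    """Verifiable reward: computed from the final board, not from a judge.

    Assumed values for this sketch: win 1.0, draw 0.5, loss 0.0.
    """
    w = winner(env.board)
    if w is None:
        return 0.5
    return 1.0 if w == agent_mark else 0.0


def random_player(env, rng):
    """Stand-in policy; an LLM-backed policy would replace this."""
    return rng.choice(env.legal_moves())


def rollout(policy, opponent, rng):
    """Play one game; return the agent's trajectory and final reward.

    The trajectory is the list of (state, action) pairs seen by the agent,
    which always plays 'X' and moves first in this sketch.
    """
    env = TicTacToeEnv()
    trajectory = []
    players = [("X", policy), ("O", opponent)]
    turn = 0
    while not env.done():
        mark, player = players[turn % 2]
        move = player(env, rng)
        if mark == "X":
            trajectory.append((tuple(env.board), move))
        env.step(move, mark)
        turn += 1
    return trajectory, reward(env)
```

Calling `rollout(random_player, random_player, random.Random(0))` yields one trajectory and a scalar reward. An RL trainer would sample many such rollouts per prompt and update the policy toward higher-reward trajectories; swapping the opponent for a stronger one is one way to ramp up difficulty.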
Key contributions to the wiki
- rl-environment-engineering — the main frame of the talk
- rl-with-verifiable-rewards — DeepSeek R1 paradigm explained
- llm-wandering — exploration vs exploitation for LLMs
- verifiers-library — the Verifiers open-source tool
- rl-curriculum-opponent-skill — curriculum via opponent difficulty ramping
- synthetic-sft-bootstrap — SFT warm-up before RL
See also
- faye-zhang — complementary post-training perspective (sub-agents for post-training pipelines)
- distill-to-small-task-model — related pattern: small models trained to beat large ones on specific tasks