RL Framework Integration#
Note
This page is still being filled in. TRL integration is covered below; torchforge and SkyRL integrations are planned.
Use OpenEnv with popular RL frameworks like TRL, torchforge, and SkyRL.
Overview#
OpenEnv environments are designed to integrate seamlessly with RL training frameworks. The standard step(), reset(), state() API makes it easy to use environments in training loops.
TRL Integration#
TRL (Transformer Reinforcement Learning) is the recommended framework for training language models with RL.
from trl import GRPOTrainer
from openenv import AutoEnv, AutoAction
env = AutoEnv.from_env("textarena")
TextAction = AutoAction.from_env("textarena")
# Use with TRL's GRPO trainer
trainer = GRPOTrainer(
model=model,
reward_model=reward_model,
# ... TRL config
)
See the Wordle with GRPO tutorial for a complete example.
Generic Training Loop#
For custom training setups:
from openenv import AutoEnv, AutoAction
env = AutoEnv.from_env("my-env")
Action = AutoAction.from_env("my-env")
with env.sync() as client:
for episode in range(num_episodes):
result = client.reset()
while not result.terminated:
# Get action from your policy
action = policy(result.observation)
# Take step
result = client.step(action)
# Update policy with reward
policy.update(result.reward)
Next Steps#
Reward Design - Design effective reward functions
Wordle with GRPO - Complete TRL example