RL Framework Integration

Note

This page is still being filled in. TRL integration is covered below; torchforge and SkyRL integrations are planned.

Use OpenEnv with popular RL frameworks like TRL, torchforge, and SkyRL.

Overview

OpenEnv environments expose a standard step(), reset(), state() API, which makes them straightforward to drive from an RL training loop.
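To make the shape of that API concrete, here is a minimal sketch using a toy in-memory environment. `CountdownEnv`, `StepResult`, and their field names are hypothetical stand-ins written for illustration, not OpenEnv classes; real environments define their own action and observation types.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    """Hypothetical result container; the field names are assumptions."""
    observation: int
    reward: float
    done: bool


class CountdownEnv:
    """Toy environment: count down from `start` to zero."""

    def __init__(self, start: int = 3) -> None:
        self.start = start
        self.remaining = start
        self.steps = 0

    def reset(self) -> StepResult:
        # Begin a new episode and return the first observation.
        self.remaining = self.start
        self.steps = 0
        return StepResult(observation=self.remaining, reward=0.0, done=False)

    def step(self, action: int) -> StepResult:
        # The action is the amount to subtract; the counter is clamped at zero.
        self.remaining = max(0, self.remaining - action)
        self.steps += 1
        done = self.remaining == 0
        return StepResult(self.remaining, 1.0 if done else 0.0, done)

    def state(self) -> dict:
        # Episode metadata, kept separate from what the agent observes.
        return {"step_count": self.steps, "remaining": self.remaining}


env = CountdownEnv(start=3)
result = env.reset()
while not result.done:
    result = env.step(1)
```

The loop only ever touches `reset()`, `step()`, and the returned result, which is why the same training code can be reused across environments that follow this shape.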

TRL Integration

TRL (Transformer Reinforcement Learning) is the recommended framework for training language models with RL.

from trl import GRPOTrainer
from openenv import AutoEnv, AutoAction

env = AutoEnv.from_env("textarena")
TextAction = AutoAction.from_env("textarena")

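# Use with TRL's GRPO trainer.
# `model` and `reward_funcs` must be defined beforehand; GRPOTrainer takes
# its reward functions (or reward models) through `reward_funcs`.
trainer = GRPOTrainer(
    model=model,
    reward_funcs=reward_funcs,
    # ... TRL config
)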

See the Wordle with GRPO tutorial for a complete example.
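For GRPO, TRL accepts custom reward functions: callables that receive the generated completions (plus any extra keyword arguments) and return one float per completion. The following is a minimal sketch of such a function, not the one used in the tutorial. The name `format_reward` and the bracketed five-letter guess format are assumptions chosen for illustration.

```python
import re

# Hypothetical output format: the model wraps its guess in square brackets.
GUESS_PATTERN = re.compile(r"\[([a-zA-Z]{5})\]")


def format_reward(completions, **kwargs):
    """Return 1.0 for each completion containing a bracketed five-letter guess."""
    rewards = []
    for completion in completions:
        # Plain-text completions are strings; conversational completions
        # are lists of message dicts with a "content" field.
        text = completion if isinstance(completion, str) else completion[0]["content"]
        rewards.append(1.0 if GUESS_PATTERN.search(text) else 0.0)
    return rewards


scores = format_reward(["I guess [crane]", "no guess here"])
```

A function like this can be passed to the trainer as one of its reward functions, typically alongside a reward derived from the environment's own feedback.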

Generic Training Loop

For custom training setups:

from openenv import AutoEnv, AutoAction

env = AutoEnv.from_env("my-env")
Action = AutoAction.from_env("my-env")

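# `policy` and `num_episodes` are placeholders for your own policy object
# and episode budget.
with env.sync() as client:
    for episode in range(num_episodes):
        result = client.reset()

        while not result.done:
            # Get an action from your policy (an instance of `Action`)
            action = policy(result.observation)

            # Take a step in the environment
            result = client.step(action)

            # Update the policy with the step reward
            policy.update(result.reward)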
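The loop above leaves the policy undefined. As a sketch of what a compatible policy object could look like, the following runs the same loop end to end against a stub. `StubClient`, `StepResult`, and `BanditPolicy` are hypothetical classes written for this example and their field names are assumptions; the stub stands in for a real environment client so the code runs without a server.

```python
import random
from dataclasses import dataclass


@dataclass
class StepResult:
    observation: int
    reward: float
    done: bool


class StubClient:
    """Stand-in client: 3-step episodes where action 1 earns reward 1.0."""

    def reset(self) -> StepResult:
        self.t = 0
        return StepResult(observation=self.t, reward=0.0, done=False)

    def step(self, action: int) -> StepResult:
        self.t += 1
        return StepResult(self.t, float(action == 1), self.t >= 3)


class BanditPolicy:
    """Epsilon-greedy policy over two actions with running-mean value estimates."""

    def __init__(self, epsilon: float = 0.1, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.values = [0.0, 0.0]
        self.counts = [0, 0]
        self.last_action = 0

    def __call__(self, observation: int) -> int:
        if self.rng.random() < self.epsilon:
            # Explore: pick an action uniformly at random.
            self.last_action = self.rng.randrange(2)
        else:
            # Exploit: pick the action with the highest estimated value.
            self.last_action = max(range(2), key=lambda a: self.values[a])
        return self.last_action

    def update(self, reward: float) -> None:
        # Incremental running mean of the reward for the last action taken.
        a = self.last_action
        self.counts[a] += 1
        self.values[a] += (reward - self.values[a]) / self.counts[a]


client = StubClient()
policy = BanditPolicy()
returns = []
for episode in range(50):
    result = client.reset()
    episode_return = 0.0
    while not result.done:
        result = client.step(policy(result.observation))
        policy.update(result.reward)
        episode_return += result.reward
    returns.append(episode_return)
```

The policy remembers its last action so that `update(reward)` can credit the right estimate, matching the call pattern of the loop above. A policy for a language model would replace the value table with generation and a gradient step.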

Next Steps

Reward Design - Design effective reward functions
Wordle with GRPO - Complete TRL example