Environment Anatomy

A deep dive into the structure of OpenEnv environments.

Components

Every OpenEnv environment consists of:

my_env/
├── openenv.yaml          # Manifest file
├── my_env/
│   ├── __init__.py
│   ├── client.py         # Client classes
│   ├── server.py         # Server/Environment
│   └── models.py         # Pydantic models
├── Dockerfile            # Container definition
├── pyproject.toml        # Package metadata
└── README.md             # Documentation

The Manifest (openenv.yaml)

name: my_env
version: 0.1.0
description: My custom environment

client:
  class_name: MyEnvClient
  module: my_env.client

action:
  class_name: MyAction
  module: my_env.models

observation:
  class_name: MyObservation
  module: my_env.models

default_image: my-env:latest
spec_version: 1
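
Each of the client, action, and observation entries pairs a module with a class_name, which is enough information to locate the class with a dynamic import. The following is a minimal sketch of that lookup, not OpenEnv's actual loader: resolve_entry is a hypothetical helper, the manifest is written out as the dict a YAML parser would produce, and the lookup is demonstrated on a standard-library class so the snippet runs without my_env installed.

```python
import importlib


def resolve_entry(entry: dict) -> type:
    """Hypothetical helper (not part of OpenEnv): import entry["module"]
    and return the attribute named by entry["class_name"]."""
    module = importlib.import_module(entry["module"])
    return getattr(module, entry["class_name"])


# The manifest above, as the dict a YAML parser would hand back.
manifest = {
    "name": "my_env",
    "client": {"class_name": "MyEnvClient", "module": "my_env.client"},
    "action": {"class_name": "MyAction", "module": "my_env.models"},
    "observation": {"class_name": "MyObservation", "module": "my_env.models"},
}

# Demonstration on a standard-library class, since my_env may not be installed.
demo_entry = {"class_name": "OrderedDict", "module": "collections"}
resolved = resolve_entry(demo_entry)
print(resolved.__name__)
```

With the package installed, resolve_entry(manifest["action"]) would return the MyAction class in the same way.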

Models (Pydantic)

Custom Action, Observation, and State types subclass the base classes from openenv.core.env_server.types rather than pydantic.BaseModel directly. The base Observation already carries done and reward fields, which step() populates; Action and State add metadata plumbing used by the server.

from openenv.core.env_server.types import Action, Observation, State


class MyAction(Action):
    command: str
    args: list[str] = []


class MyObservation(Observation):
    output: str
    success: bool


class MyState(State):
    history: list[str] = []

Environment Class

Environments subclass the abstract Environment[ActT, ObsT, StateT] base and implement reset, step, and the state property. Reward and termination are carried on the returned observation; step does not return a tuple.

from openenv.core.env_server.interfaces import Environment


class MyEnvironment(Environment[MyAction, MyObservation, MyState]):
    def reset(self, seed=None, episode_id=None, **kwargs) -> MyObservation:
        ...

    def step(self, action: MyAction, timeout_s=None, **kwargs) -> MyObservation:
        ...

    @property
    def state(self) -> MyState:
        ...

Server (FastAPI)

Use create_app from openenv.core.env_server to wrap the environment as a FastAPI application. Pass the environment class (used as a factory so each WebSocket session gets its own instance) along with the action and observation types:

from openenv.core.env_server import create_app

app = create_app(
    MyEnvironment,
    MyAction,
    MyObservation,
    env_name="my_env",
)

This is what the environment's server/app.py entry point typically does. See envs/echo_env/server/app.py for a minimal real example.
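
Passing the class rather than an instance is what allows one environment per session: the server can call it whenever a new connection arrives. The sketch below illustrates that factory pattern in isolation. It is not how create_app is implemented; SessionRegistry and CounterEnv are hypothetical names made up for this example.

```python
from typing import Callable, Dict


class CounterEnv:
    """Hypothetical stand-in environment that holds per-episode state."""

    def __init__(self) -> None:
        self.steps = 0

    def step(self) -> int:
        self.steps += 1
        return self.steps


class SessionRegistry:
    """Hypothetical registry: calls the factory once per new session id."""

    def __init__(self, env_factory: Callable[[], CounterEnv]) -> None:
        self._env_factory = env_factory
        self._sessions: Dict[str, CounterEnv] = {}

    def get(self, session_id: str) -> CounterEnv:
        # Create the environment lazily, the first time a session is seen.
        if session_id not in self._sessions:
            self._sessions[session_id] = self._env_factory()
        return self._sessions[session_id]


registry = SessionRegistry(CounterEnv)  # the class itself acts as the factory
registry.get("a").step()
registry.get("a").step()
registry.get("b").step()
```

Because each session id maps to its own instance, steps taken in session "a" do not affect the state seen by session "b". Had a single instance been passed instead, both sessions would share one counter.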

Rewards via the Rubric

Rewards are computed inside the environment, not by external code. The base Environment accepts an optional rubric on __init__: pass it to super().__init__(rubric=...), call self._reset_rubric() from reset, and call self._apply_rubric(action, observation) from step (or _apply_rubric_async from step_async). The Rubrics tutorial covers the composable API end-to-end.
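
The three calls named above can be sketched as follows. This is a hedged illustration, not the library's code: the Environment class here is a stand-in that models only the rubric hooks, and it assumes that _apply_rubric returns the reward as a float and that a rubric is a callable with a reset() method. LengthRubric and the dict-based observation are hypothetical; check the real base class and the Rubrics tutorial for the actual contracts.

```python
from typing import Any, Optional


class Environment:
    """Stand-in for the OpenEnv base class. Assumption: only the rubric
    hooks are modeled, and their bodies are guesses at the behavior."""

    def __init__(self, rubric: Optional[Any] = None) -> None:
        self.rubric = rubric

    def _reset_rubric(self) -> None:
        if self.rubric is not None:
            self.rubric.reset()

    def _apply_rubric(self, action: Any, observation: Any) -> float:
        if self.rubric is None:
            return 0.0
        return self.rubric(action, observation)


class LengthRubric:
    """Hypothetical rubric: shorter outputs score higher."""

    def __init__(self) -> None:
        self.calls = 0

    def reset(self) -> None:
        self.calls = 0  # per-episode bookkeeping

    def __call__(self, action: Any, observation: dict) -> float:
        self.calls += 1
        return 1.0 / (1 + len(observation["output"]))


class MyEnvironment(Environment):
    def __init__(self) -> None:
        super().__init__(rubric=LengthRubric())  # hand the rubric to the base

    def reset(self) -> dict:
        self._reset_rubric()  # clear rubric state at episode start
        return {"output": "", "reward": None, "done": False}

    def step(self, action: str) -> dict:
        observation = {"output": action.upper(), "reward": None, "done": False}
        # The reward is computed inside the environment and stored on the observation.
        observation["reward"] = self._apply_rubric(action, observation)
        return observation


env = MyEnvironment()
env.reset()
obs = env.step("abc")
```

Keeping the scoring in a separate rubric object means the environment's step logic does not change when the reward definition does.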

Next Steps