Normalize validation (in-loop)
When serge produces a fix through the tasks flow, the LLM proposes a logical patch — it does not run the target repo’s build commands. For repos with a consistency gate (regenerated files, formatting, lint), that means the opened PR can fail repo-consistency CI even when the fix itself is correct, forcing a follow-up commit.
Normalize validation closes that gap by making the repo’s own normalizer
(e.g. make style && make fix-repo) part of the LLM loop. When the model
emits a patch, serge applies it to the worktree and runs the normalizer. If the
normalizer rejects the patch, that failure is fed straight back to the model
as a new turn in the same conversation, so it can correct the patch — exactly
the feedback a human contributor would get from CI, but before the PR is opened.
When the normalizer passes, the worktree already holds the applied + normalized
result and serge commits it as one clean patch. The PR is conformant the
moment it opens.
This is opt-in, per deployment/repo. When TASK_NORMALIZE_COMMAND is unset,
validation is skipped entirely and serge stays repo-agnostic, exactly as before.
How it runs
prepare_task():                                        ← the agentic LLM loop
  loop:
    LLM emits a candidate patch
    reset worktree → apply patch → run normalizer      ← the verification gate
      ├─ patch won't apply     → feed `git apply` error back → LLM revises
      ├─ normalizer exits != 0 → feed its output back → LLM revises
      └─ normalizer passes     → accept; worktree = applied + normalized
publish_task():
  stage the prepared worktree → commit → open/update PR   ← Git Data API
The model gets up to TASK_NORMALIZE_MAX_RETRIES corrective re-prompts (default
2, so 3 patch attempts in total). Normalize validation applies to both new_pr
and existing_pr modes. A typical normalizer rejection the model learns to fix
is “you edited the auto-generated modeling_x.py instead of its modular_x.py
source”; the normalizer’s own error text carries that signal.
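The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not serge's actual code: `propose_patch`, `apply_patch`, and `run_normalizer` are hypothetical callables standing in for the LLM call, the reset-and-`git apply` step, and the sandboxed command.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# (ok, output): ok is True on success; output carries the error text otherwise.
StepResult = Tuple[bool, str]


@dataclass
class LoopResult:
    patch: str        # last patch the model produced
    normalized: bool  # True only if the normalizer passed on that patch
    attempts: int     # number of patches the model was asked for


def normalize_loop(
    propose_patch: Callable[[Optional[str]], str],  # hypothetical: feedback -> patch
    apply_patch: Callable[[str], StepResult],       # hypothetical: reset worktree + apply
    run_normalizer: Callable[[], StepResult],       # hypothetical: sandboxed command
    max_retries: int = 2,                           # mirrors TASK_NORMALIZE_MAX_RETRIES
) -> LoopResult:
    feedback: Optional[str] = None
    patch = ""
    # One initial attempt plus max_retries corrective re-prompts.
    for attempt in range(1, max_retries + 2):
        patch = propose_patch(feedback)
        ok, output = apply_patch(patch)
        if ok:
            ok, output = run_normalizer()
            if ok:
                return LoopResult(patch, True, attempt)
        # Either step failed: hand its error text back to the model.
        feedback = output
    return LoopResult(patch, False, max_retries + 1)
```

With the default of 2 retries, the sketch asks for at most 3 patches. A result with `normalized=False` corresponds to the retries-exhausted case, where the last patch is still kept.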
What happens when validation can’t finish
The loop never costs you the fix:
- Retries exhausted (the model couldn’t satisfy the normalizer): serge falls back to committing the model’s last patch raw — the same PR you’d get today, which the repo’s CI then flags. Better an imperfect PR than a lost fix.
- Sandbox unavailable / timeout (infrastructure, not the model’s fault): the applied patch is accepted un-normalized rather than blaming the model.
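One way to picture the two fallbacks is as a small classification step after each normalizer run. The sketch below is illustrative only: `SandboxUnavailable` and the returned action names are hypothetical, and the source does not say how serge represents these outcomes internally.

```python
import subprocess
from typing import Callable, Tuple


class SandboxUnavailable(Exception):
    """Hypothetical error: the sandbox backend could not be started."""


def classify(run_normalizer: Callable[[], Tuple[bool, str]],
             retries_left: int) -> str:
    """Map one normalizer run to an action name (names are illustrative)."""
    try:
        ok, _output = run_normalizer()
    except (SandboxUnavailable, subprocess.TimeoutExpired):
        # Infrastructure problem: keep the applied patch, do not blame the model.
        return "accept_unnormalized"
    if ok:
        return "accept_normalized"
    # The normalizer rejected the patch: retry if allowed, else commit it raw.
    return "retry" if retries_left > 0 else "commit_raw"
```

In every branch the fix survives; only the normalized state of the committed patch differs.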
No injection surface
The normalize command is operator/repo configuration, never request-supplied.
The /tasks request cannot name a command — so there is nothing to allowlist
and no command-injection vector to contain. The OIDC repository claim
authorizes which repo serge acts on; the operator decides what command runs.
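The separation between request data and operator configuration might look like the following sketch. The function names and the `command` payload field are hypothetical, and splitting the value with `shlex` is an assumption about how the setting could be parsed, not a statement of how serge does it.

```python
import shlex
from typing import List, Mapping, Optional


def normalize_argv(env: Mapping[str, str]) -> Optional[List[str]]:
    """Return the operator-configured command as argv, or None when unset."""
    raw = env.get("TASK_NORMALIZE_COMMAND", "").strip()
    if not raw:
        return None  # validation is skipped entirely
    return shlex.split(raw)  # assumption: shell-style splitting of the value


def handle_task_request(payload: Mapping[str, object],
                        env: Mapping[str, str]) -> Optional[List[str]]:
    # The payload is never consulted for a command. Even if a caller sends a
    # "command" field (hypothetical name), it has no effect on what runs.
    return normalize_argv(env)
```

Because the command is derived only from `env`, there is no code path through which request input reaches the process arguments.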
Sandbox backends
The normalizer runs arbitrary repo build code, so it executes network-isolated,
with the worktree as the only writable path and no serge secrets in its
environment. TASK_SANDBOX_BACKEND selects how:
| Backend | Isolation | When to use |
|---|---|---|
| docker | Throwaway container: --network none, read-only rootfs, --cap-drop ALL, no-new-privileges, pids cap, runs as serge’s uid:gid. | The portable default. Works on any host with a Docker daemon; no Kubernetes required. |
| kubernetes | One-shot Job in a locked-down namespace (non-root, no-privileged, deny-all egress), worktree on a shared RWX volume. | k8s deployments that want pod-level isolation. (Implemented in Phase 1.) |
| bwrap | bubblewrap over serge’s own venv (--unshare-net). | Dev/test only; viable just when the command needs no deps beyond serge’s. |
| auto | docker when an image is set and the docker CLI is present, else bwrap. | Convenient default for mixed environments. |
Kubernetes is never mandatory: a classic Docker deployment is a first-class,
fully-isolated backend. Pick docker and you need nothing else.
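The `auto` rule from the table can be expressed as a short resolver. This is a sketch of the stated rule only; `resolve_backend` is a hypothetical name, and the `which` parameter exists purely so the check for the docker CLI can be swapped out in tests.

```python
import shutil
from typing import Callable, Optional


def resolve_backend(
    configured: str,
    image: Optional[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Resolve a TASK_SANDBOX_BACKEND value to a concrete backend name."""
    if configured != "auto":
        return configured  # explicit choice wins
    # auto: docker when an image is set and the docker CLI is present.
    if image and which("docker") is not None:
        return "docker"
    return "bwrap"
```

For example, `resolve_backend("auto", None)` resolves to `bwrap` regardless of whether Docker is installed, because no image is set.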
Setting up the Docker image
The docker backend runs the command in a throwaway container built from an
image with the target repo’s toolchain baked in. serge runs it roughly as:
docker run --rm --init --network none --read-only \
--tmpfs /tmp --cap-drop ALL --security-opt no-new-privileges \
--pids-limit 512 --user <serge-uid>:<serge-gid> \
--volume <worktree>:<worktree>:rw --workdir <worktree> \
<TASK_NORMALIZE_IMAGE> bash -lc 'make style && make fix-repo'
- Write a Dockerfile that installs the toolchain the normalizer needs, and nothing it doesn’t (no torch / model deps for make fix-repo). See docker/Dockerfile.task-runner for a worked transformers example (just the [quality] extra).
- Build and tag it, pinning versions so the normalizer produces the same output the repo’s own CI would:

      docker build -f docker/Dockerfile.task-runner \
        --build-arg TRANSFORMERS_REF=main \
        -t serge/transformers-quality:latest .

  Build it wherever serge runs (it must be present in the local Docker daemon: serge never pulls at normalize time, and the container has no network). Rebuild when the repo bumps its pinned tool versions.
- Point serge at it via TASK_NORMALIZE_IMAGE.
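For readers who want to reproduce the `docker run` invocation shown earlier in this section, the sketch below assembles the same flags as an argv list. `docker_argv` is a hypothetical helper, not part of serge; the flags are the ones listed in the command above.

```python
from typing import List


def docker_argv(image: str, worktree: str, command: List[str],
                uid: int, gid: int, pids_limit: int = 512) -> List[str]:
    """Build the argv for a locked-down, network-less throwaway container."""
    return [
        "docker", "run", "--rm", "--init",
        "--network", "none",                     # no network inside the sandbox
        "--read-only",                           # read-only root filesystem
        "--tmpfs", "/tmp",                       # scratch space only
        "--cap-drop", "ALL",
        "--security-opt", "no-new-privileges",
        "--pids-limit", str(pids_limit),
        "--user", f"{uid}:{gid}",                # run as serge's uid:gid
        "--volume", f"{worktree}:{worktree}:rw", # the only writable path
        "--workdir", worktree,
        image, *command,
    ]
```

Passing the list to `subprocess.run(argv, timeout=...)` avoids shell quoting on the host; the inner `bash -lc '...'` string is still interpreted by the shell inside the container.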
Constraints on the image / command
- Deps must be baked in. The container has no network; pip install at run time will fail. Install everything at build time.
- No .pyc writes to site-packages: the rootfs is read-only. serge sets PYTHONDONTWRITEBYTECODE=1; informational.
- The command must not need git history or remotes. The checkout is a detached git worktree whose gitdir lives outside the bind mount, and there is no network, so commands that fetch/diff against origin won’t work. File-based normalizers (make fix-repo, ruff, codegen) are fine: serge does the staging/diffing itself, on the host, after the command runs.
The bwrap fallback
If you don’t set an image (or set TASK_SANDBOX_BACKEND=bwrap), the command
runs under bubblewrap using serge’s own venv — viable only when the command
needs no dependencies beyond what serge already has installed. make fix-repo
for transformers needs the [quality] toolchain, so use the docker backend
for it.
Configuration
See configuration for the full env var table. The minimum for transformers:
TASK_NORMALIZE_COMMAND=bash -lc 'make style && make fix-repo'
TASK_NORMALIZE_IMAGE=serge/transformers-quality:latest
TASK_SANDBOX_BACKEND=docker
The per-repo write opt-in (task_write_enabled on the repo’s provider config)
is the same as for the LLM task flow — the hook commits through the same path.
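As an illustration of how these variables fit together, the following sketch loads them into a small config object. `NormalizeConfig` and `load_config` are hypothetical names; the default of 2 retries comes from this page, while the `auto` backend default is an assumption (check the configuration page for the real one).

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class NormalizeConfig:
    command: Optional[str]
    image: Optional[str]
    backend: str
    max_retries: int

    @property
    def enabled(self) -> bool:
        # Unset command means validation is skipped entirely.
        return self.command is not None


def load_config(env: Mapping[str, str]) -> NormalizeConfig:
    return NormalizeConfig(
        command=env.get("TASK_NORMALIZE_COMMAND") or None,
        image=env.get("TASK_NORMALIZE_IMAGE") or None,
        backend=env.get("TASK_SANDBOX_BACKEND", "auto"),  # assumed default
        max_retries=int(env.get("TASK_NORMALIZE_MAX_RETRIES", "2")),
    )
```

Calling `load_config(os.environ)` with none of the variables set yields a config whose `enabled` property is false, matching the opt-in behavior described at the top of this page.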
Guiding the model
The model is always told to fix the root cause and to use suppressions
(# noqa, # type: ignore) only as a last resort. You can add more policy two
ways, both repo-agnostic:
- TASK_NORMALIZE_GUIDANCE: operator free-text appended to the system prompt and to the normalize-failure feedback (e.g. “never add a new dependency to satisfy a check”).
- The repo’s conventions file (REVIEW_RULES_PATH, default .ai/review-rules.md; can point at AGENTS.md): read from the checked-out branch and injected into the patch-writing prompt, so the model knows the repo’s rules (e.g. “edit modular_*.py, never the generated modeling_*.py”) before it writes the patch, not just after the normalizer rejects it.
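The two guidance sources could be combined into a prompt roughly as follows. This is a sketch under assumptions: `BASE_POLICY` paraphrases the built-in instruction rather than quoting serge's prompt, and the section labels and function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Paraphrase of the always-on instruction, not serge's literal prompt text.
BASE_POLICY = (
    "Fix the root cause. Use suppressions (# noqa, # type: ignore) "
    "only as a last resort."
)


def read_rules(checkout: Path, rules_path: str = ".ai/review-rules.md") -> Optional[str]:
    """Read the conventions file from the checked-out branch, if present."""
    path = checkout / rules_path
    return path.read_text(encoding="utf-8") if path.is_file() else None


def build_system_prompt(checkout: Path, guidance: Optional[str],
                        rules_path: str = ".ai/review-rules.md") -> str:
    parts = [BASE_POLICY]
    rules = read_rules(checkout, rules_path)
    if rules:
        parts.append("Repository conventions:\n" + rules.strip())
    if guidance:  # value of TASK_NORMALIZE_GUIDANCE, if the operator set one
        parts.append("Operator guidance:\n" + guidance.strip())
    return "\n\n".join(parts)
```

Both additions are optional: with no conventions file and no operator guidance, the prompt reduces to the base policy.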
Security model
Same trust boundary as the LLM task flow, with stronger isolation for running the repo’s build code:
- No secrets in the sandbox. The command gets a scrubbed env; the docker container additionally has no network, a read-only rootfs, all capabilities dropped, no-new-privileges, and a pids cap.
- serge owns the git write. The command only mutates a throwaway worktree; serge commits via the Git Data API. The installation token never enters the container.
- Operator-controlled command and image. Both TASK_NORMALIZE_COMMAND and TASK_NORMALIZE_IMAGE are set by the operator, never by the caller.
- The result is a PR a human reviews before merge.
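A scrubbed environment is commonly built from an allowlist rather than by deleting known secrets. The sketch below shows that approach; `SAFE_VARS` is a hypothetical allowlist chosen for illustration, and the source does not specify which variables serge actually passes through.

```python
from typing import Dict, Mapping

# Hypothetical allowlist: only variables a normalizer plausibly needs.
SAFE_VARS = ("PATH", "HOME", "LANG", "LC_ALL")


def scrubbed_env(parent: Mapping[str, str]) -> Dict[str, str]:
    """Copy allowlisted variables only; tokens and API keys are never copied."""
    env = {k: parent[k] for k in SAFE_VARS if k in parent}
    # The rootfs is read-only, so tell Python not to write .pyc files.
    env["PYTHONDONTWRITEBYTECODE"] = "1"
    return env
```

An allowlist fails closed: a secret added to the host environment later stays out of the sandbox without any change to this code.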