LLM Providers
serge talks to OpenAI-compatible chat completion endpoints. The
endpoint should support:
POST {base}/chat/completions
GET {base}/models
The reviewer asks for JSON output during full reviews. If a provider ignores JSON mode, the reviewer attempts to extract JSON from the returned text.
Common Bases
| Provider | Base URL |
|---|---|
| OpenAI | https://api.openai.com/v1 |
| Hugging Face Router | https://router.huggingface.co/v1 |
| Local vLLM/TGI/LM Studio | your local /v1 endpoint |
| Custom | any compatible endpoint |
The web app has built-in provider choices for Hugging Face, OpenAI, Anthropic, and custom endpoints. Custom provider configs must include an API base URL.
Model Selection
Set LLM_MODEL or the Action input llm_model to choose a model explicitly.
If omitted, the reviewer asks the endpoint for /models and uses the first
returned model.
In the web app, model selection follows this order:
- model entered on the review form;
- provider config default model;
- provider-specific static default, if any;
- provider auto-discovery.
Streaming
LLM_STREAM=true consumes streaming SSE responses. Streaming is useful for the
web app because the UI can show tokens, reasoning chunks, tools, and progress
live.
The Action defaults streaming off; server env defaults streaming on.
Reasoning Models
For models that spend completion tokens on reasoning before emitting JSON,
increase LLM_MAX_TOKENS. If a provider supports a reasoning_effort field,
set LLM_REASONING_EFFORT to pass it through.
Hugging Face Billing Header
LLM_BILL_TO sends X-HF-Bill-To for Hugging Face Router requests. Use it
when your Hugging Face token’s Inference Providers permission is scoped to an
organization.