Developer Documentation
Who this is for: contributors and advanced users looking for development-oriented docs.
Areas
- Architecture.
- Runtime internals.
- Evaluation.
- Tracing.
- Drivers.
- Extension points.
- Callback and interaction routing.
LLM Usage Accounting
Evaluator LLM accounting is trace-driven.
- Drivers normalize provider-specific usage at the model boundary into input_tokens, input_cached_tokens, output_tokens, output_cached_tokens, and total_tokens.
- llm.response events carry both normalized usage and provider-native raw_usage.
- runtime.agent_finished publishes usage_self and usage_inclusive.
- runtime.session_finished publishes usage_session_totals.
- Evaluator backend state aggregates from the trace once, then reuses that summary for websocket results, API responses, CLI output, and the web UI.
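The normalization step can be sketched as follows. This is a minimal illustration, not the framework's driver code: it assumes an OpenAI-style usage payload (prompt_tokens, completion_tokens, total_tokens, and prompt_tokens_details.cached_tokens) as the provider-native shape, and NormalizedUsage, normalize_openai_style_usage, and llm_response_event are hypothetical names. Whether input_tokens should include the cached portion is also an assumption here (OpenAI reports cached tokens as a subset of prompt_tokens).

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Mapping


@dataclass(frozen=True)
class NormalizedUsage:
    # Field names follow the normalized schema listed above.
    input_tokens: int = 0
    input_cached_tokens: int = 0
    output_tokens: int = 0
    output_cached_tokens: int = 0  # stays 0 until a provider exposes an output-cache field
    total_tokens: int = 0


def normalize_openai_style_usage(raw_usage: Mapping[str, Any]) -> NormalizedUsage:
    """Hypothetical normalizer for an OpenAI-style usage payload."""
    prompt = int(raw_usage.get("prompt_tokens", 0))
    completion = int(raw_usage.get("completion_tokens", 0))
    # prompt_tokens_details may be missing or None depending on the provider response.
    details = raw_usage.get("prompt_tokens_details") or {}
    return NormalizedUsage(
        input_tokens=prompt,
        input_cached_tokens=int(details.get("cached_tokens", 0)),
        output_tokens=completion,
        total_tokens=int(raw_usage.get("total_tokens", prompt + completion)),
    )


def llm_response_event(raw_usage: Mapping[str, Any]) -> Dict[str, Any]:
    """Hypothetical event shape: normalized usage next to the untouched provider payload."""
    return {
        "type": "llm.response",
        "usage": asdict(normalize_openai_style_usage(raw_usage)),
        "raw_usage": dict(raw_usage),
    }
```

Keeping raw_usage alongside usage means nothing provider-specific is lost if the normalized schema later grows a field.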
Semantics:
- self means tokens spent by that specific agent run only.
- inclusive means the agent run plus all descendant sub-agent runs.
- session means the full run session and is available immediately after execution, independent of scoring.
- output_cached_tokens remains 0 until a provider exposes a real output-cache field.
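The relationship between self, inclusive, and session totals can be sketched as a single pass over the trace followed by a tree walk. This assumes a hypothetical flattened record of (run_id, parent_run_id, total_tokens) per llm.response event and tracks only total_tokens for brevity; summarize_usage is an illustrative name, not the evaluator's backend API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical flattened trace record: (run_id, parent_run_id, total_tokens)
# for one llm.response event attributed to an agent run.
TraceRecord = Tuple[str, Optional[str], int]


def summarize_usage(
    records: Iterable[TraceRecord],
) -> Tuple[Dict[str, int], Dict[str, int], int]:
    """Return (self per run, inclusive per run, session total) from one trace pass."""
    self_usage: Dict[str, int] = {}
    children: Dict[str, List[str]] = defaultdict(list)
    parent_of: Dict[str, Optional[str]] = {}

    for run_id, parent_id, tokens in records:
        self_usage[run_id] = self_usage.get(run_id, 0) + tokens
        if run_id not in parent_of:
            parent_of[run_id] = parent_id
            if parent_id is not None:
                children[parent_id].append(run_id)
                self_usage.setdefault(parent_id, 0)  # a parent may never call the LLM itself

    inclusive: Dict[str, int] = {}

    def visit(run_id: str) -> int:
        # inclusive = own tokens + inclusive tokens of every descendant run
        if run_id not in inclusive:
            inclusive[run_id] = self_usage[run_id] + sum(
                visit(child) for child in children.get(run_id, [])
            )
        return inclusive[run_id]

    for run_id in list(self_usage):
        visit(run_id)

    # session = every run in the trace; no scoring step is needed to compute it
    return self_usage, inclusive, sum(self_usage.values())
```

Because the summary is computed once from the trace, every consumer (websocket, API, CLI, web UI) can read the same three numbers instead of recomputing them.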
Model Validation
Structured model-output validation is extensible at runtime.
- Driver parsing still strictly enforces the one-JSON-object contract for structured modes.
- AgentHost owns a model validation chain, so error rewriting and extra validation rules live outside agent.py.
- Exception validators can rewrite model-call failures into clearer messages while preserving strict failure behavior.
- Response validators can reject parsed ModelResponse objects before AgentDecision conversion.
The default exception validator rewrites the common JSONDecodeError: Extra data failure into a clearer message: the model returned more than one JSON value in a single structured response, which usually means it emitted multiple decisions in one turn.
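The "Extra data" case comes straight from Python's json module: json.loads parses the first value and then raises JSONDecodeError with msg "Extra data" when anything but whitespace follows. The sketch below shows one way such a rewrite could work while still failing hard; StructuredOutputError, rewrite_extra_data_error, and parse_one_json_object are hypothetical names, not the framework's default validator.

```python
import json
from typing import Optional


class StructuredOutputError(ValueError):
    """Hypothetical error type for rejected structured model output."""


def rewrite_extra_data_error(exc: BaseException) -> Optional[BaseException]:
    """Return a clearer replacement for json's 'Extra data' error, else None."""
    if isinstance(exc, json.JSONDecodeError) and exc.msg == "Extra data":
        # exc.pos points at the first non-whitespace character after the first value.
        return StructuredOutputError(
            "The model returned more than one JSON value in a single structured "
            f"response (second value starts at char {exc.pos}); this usually means "
            "it emitted multiple decisions in one turn."
        )
    return None


def parse_one_json_object(text: str) -> dict:
    try:
        value = json.loads(text)
    except json.JSONDecodeError as exc:
        replacement = rewrite_extra_data_error(exc)
        if replacement is not None:
            raise replacement from exc  # still a hard failure, just better worded
        raise
    if not isinstance(value, dict):
        raise StructuredOutputError("Expected exactly one JSON object.")
    return value
```

Chaining with raise ... from exc keeps the original decoder error available as __cause__ for debugging.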
Runtime extension points:
- host.register_model_exception_validator(...)
- host.register_model_response_validator(...)
These validators are code-level extension points, not config-driven plugins.
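The two registration hooks can be pictured as a small ordered chain. The sketch below is a toy: ToyValidationChain and ModelResponse (a stand-in dataclass) are hypothetical, and the validator signatures and the "first replacement wins" ordering are assumptions rather than AgentHost's documented contract. The two example rules (only_known_actions, explain_timeouts) and the action names are likewise illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ModelResponse:
    """Hypothetical stand-in for a parsed structured model response."""
    action: str
    payload: dict = field(default_factory=dict)


ExceptionValidator = Callable[[BaseException], Optional[BaseException]]
ResponseValidator = Callable[[ModelResponse], None]


class ToyValidationChain:
    """Illustrative chain; the real one is owned by AgentHost and may differ."""

    def __init__(self) -> None:
        self._exception_validators: List[ExceptionValidator] = []
        self._response_validators: List[ResponseValidator] = []

    def register_model_exception_validator(self, fn: ExceptionValidator) -> None:
        self._exception_validators.append(fn)

    def register_model_response_validator(self, fn: ResponseValidator) -> None:
        self._response_validators.append(fn)

    def rewrite_exception(self, exc: BaseException) -> BaseException:
        # First validator returning a replacement wins; the caller still raises it.
        for fn in self._exception_validators:
            replacement = fn(exc)
            if replacement is not None:
                return replacement
        return exc

    def check_response(self, response: ModelResponse) -> ModelResponse:
        # Any validator can reject by raising before AgentDecision conversion happens.
        for fn in self._response_validators:
            fn(response)
        return response


# Hypothetical rules registered at runtime.
def only_known_actions(resp: ModelResponse) -> None:
    if resp.action not in {"respond", "call_subagent"}:
        raise ValueError(f"unsupported action: {resp.action!r}")


def explain_timeouts(exc: BaseException) -> Optional[BaseException]:
    if isinstance(exc, TimeoutError):
        return RuntimeError(f"model call timed out before returning a response: {exc}")
    return None


chain = ToyValidationChain()
chain.register_model_response_validator(only_known_actions)
chain.register_model_exception_validator(explain_timeouts)
```

Returning a replacement exception (instead of swallowing it) is what keeps failure behavior strict: the message improves, but the call still fails.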
Programmatic Workflow Execution
The framework now supports deterministic, code-driven orchestration without entering the parent agent’s LLM decision loop.
The public surface is agent-owned:
- Agent.execute_programmatic_workflow(...)
- ProgrammaticWorkflow
- WorkflowCallSubagentStep
- WorkflowCallSubagentsStep
- WorkflowBranchStep
- WorkflowReturnStep
- WorkflowRaiseStep
The important design point is parity with native orchestration. Programmatic workflow steps reuse the same parent-owned subagent path as call_subagent and call_subagents, so they still produce:
- parent-side runtime.audit.named_event records such as subagent_call, subagent_result, subagent_batch_started, and subagent_batch_finished
- parent hook history around single-child calls
- transcript updates like <subagent_call>, <subagent_result>, and <subagent_results>
- the same callback routing and batch resume behavior already implemented in host.call_subagent(...) and host.call_subagent_batch(...)
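The parity argument boils down to "one code path writes the trace". The toy below illustrates that idea only: ToyHost and ToyCallSubagentStep are hypothetical, the transcript tag contents are invented for illustration, and hooks, callback routing, and batch resume are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Child = Callable[[str], str]


@dataclass
class ToyHost:
    """Hypothetical stand-in for the host's parent-owned subagent path."""
    children: Dict[str, Child]
    audit: List[Tuple[str, str]] = field(default_factory=list)
    transcript: List[str] = field(default_factory=list)

    def call_subagent(self, name: str, task: str) -> str:
        # The only place single-child audit events and transcript tags are written.
        self.audit.append(("subagent_call", name))
        self.transcript.append(f"<subagent_call>{name}: {task}</subagent_call>")
        result = self.children[name](task)
        self.audit.append(("subagent_result", name))
        self.transcript.append(f"<subagent_result>{result}</subagent_result>")
        return result

    def call_subagent_batch(self, calls: List[Tuple[str, str]]) -> List[str]:
        # Batches get started/finished events and one combined transcript entry.
        names = ",".join(name for name, _ in calls)
        self.audit.append(("subagent_batch_started", names))
        results = [self.children[name](task) for name, task in calls]
        self.audit.append(("subagent_batch_finished", names))
        self.transcript.append("<subagent_results>" + " | ".join(results) + "</subagent_results>")
        return results


@dataclass
class ToyCallSubagentStep:
    """Hypothetical workflow step: delegates to the host instead of calling the child directly."""
    name: str
    task: str

    def run(self, host: ToyHost) -> str:
        return host.call_subagent(self.name, self.task)


native = ToyHost({"echo": str.upper})
native.call_subagent("echo", "hi")

programmatic = ToyHost({"echo": str.upper})
ToyCallSubagentStep("echo", "hi").run(programmatic)

# Same path, same trace: the workflow step is indistinguishable from a native call.
assert native.audit == programmatic.audit
assert native.transcript == programmatic.transcript
```

If a workflow step invoked the child directly, none of these records would appear, which is exactly the divergence the shared path avoids.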
The first iteration is intentionally small and extensible:
- branching is Python-driven, using callables against ProgrammaticWorkflowState
- workflow return values are normalized into AgentResult
- step outputs are stored in ProgrammaticWorkflowState.step_results
- there is no separate expression language or persistence format yet
This makes AgentBehavior.before_run(...) a supported place to run deterministic controller logic while still preserving the framework’s native trace and callback semantics.
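To make the step model above concrete, here is a toy interpreter for call, branch, return, and raise steps. It is a sketch under stated assumptions, not Agent.execute_programmatic_workflow: ToyState, ToyResult, CallStep, BranchStep, ReturnStep, RaiseStep, and execute are hypothetical names whose fields were chosen for illustration, the call_subagent callable stands in for the host's shared subagent path, and returning an empty result when no ReturnStep runs is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Union


@dataclass
class ToyState:
    """Stand-in for ProgrammaticWorkflowState: inputs plus outputs keyed by step id."""
    inputs: Dict[str, Any]
    step_results: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ToyResult:
    """Stand-in for AgentResult; every return value is normalized into this shape."""
    output: Any


@dataclass
class CallStep:
    step_id: str
    agent: str
    task: Callable[[ToyState], str]


@dataclass
class BranchStep:
    condition: Callable[[ToyState], bool]  # plain Python, no expression language
    then: List["Step"]
    otherwise: List["Step"] = field(default_factory=list)


@dataclass
class ReturnStep:
    value: Callable[[ToyState], Any]


@dataclass
class RaiseStep:
    message: str


Step = Union[CallStep, BranchStep, ReturnStep, RaiseStep]
_NO_RETURN = object()  # sentinel: the step list finished without a ReturnStep


def _run(steps: List[Step], state: ToyState, call_subagent: Callable[[str, str], Any]) -> Any:
    for step in steps:
        if isinstance(step, CallStep):
            state.step_results[step.step_id] = call_subagent(step.agent, step.task(state))
        elif isinstance(step, BranchStep):
            branch = step.then if step.condition(state) else step.otherwise
            value = _run(branch, state, call_subagent)
            if value is not _NO_RETURN:
                return value
        elif isinstance(step, ReturnStep):
            return step.value(state)
        elif isinstance(step, RaiseStep):
            raise RuntimeError(step.message)
        else:
            raise TypeError(f"unknown workflow step: {step!r}")
    return _NO_RETURN


def execute(steps: List[Step], inputs: Dict[str, Any],
            call_subagent: Callable[[str, str], Any]) -> ToyResult:
    value = _run(steps, ToyState(inputs=inputs), call_subagent)
    if value is _NO_RETURN:
        value = None  # assumption: no explicit return yields an empty result
    return value if isinstance(value, ToyResult) else ToyResult(output=value)


def fake_call(agent: str, task: str) -> str:
    return f"{agent}:{task}"


workflow = [
    CallStep("classify", "triage", lambda s: s.inputs["ticket"]),
    BranchStep(
        condition=lambda s: "refund" in s.step_results["classify"],
        then=[
            CallStep("refund", "billing", lambda s: s.inputs["ticket"]),
            ReturnStep(lambda s: s.step_results["refund"]),
        ],
        otherwise=[RaiseStep("no handler for this ticket")],
    ),
]

result = execute(workflow, {"ticket": "refund please"}, fake_call)
# result.output == "billing:refund please"
```

Using a sentinel rather than an exception for early return keeps ReturnStep and RaiseStep clearly separate: only RaiseStep surfaces as a failure.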
For the full developer guide, examples, and authoring guidance, see Programmatic Workflow Agents.
Evaluator Agent Model Overrides
agent_framework_evaluator now supports run-scoped model overrides for the agent under test. This is separate from DEFAULT_EVAL_MODEL, which still controls the evaluator/scoring LLM.
Two scopes are supported:
- root_only: only the tested/top-level agent uses the selected override model
- all_agents: every agent invoked during that run uses the selected override model
The important runtime detail is where the override is applied:
- root_only is applied to the root invocation clone in AgentHost.run_agent(...), so cached agent definitions are not mutated for later runs.
- all_agents is applied at agent-load time through the host/registry path, so it supersedes .env DEFAULT_MODEL, .env AGENT_MODELS, and adjacent runtime.json model declarations for that host instance.
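The difference between "clone at run time" and "override at load time" can be sketched with an immutable definition and dataclasses.replace. This is a minimal illustration assuming a hypothetical ToyHost and AgentDef; the model field simply holds whatever the normal .env / runtime.json resolution produced, and no particular precedence among those sources is implied.

```python
from dataclasses import dataclass, replace
from typing import Dict, Literal, Optional

Scope = Literal["root_only", "all_agents"]


@dataclass(frozen=True)
class AgentDef:
    name: str
    model: str  # result of the normal .env / runtime.json model resolution


class ToyHost:
    """Hypothetical host illustrating where each override scope is applied."""

    def __init__(self, configured: Dict[str, AgentDef], override: Optional[str] = None,
                 scope: Scope = "root_only") -> None:
        self._configured = configured
        self._cache: Dict[str, AgentDef] = {}
        self.override = override
        self.scope = scope

    def load_agent(self, name: str) -> AgentDef:
        if name not in self._cache:
            agent = self._configured[name]
            if self.override and self.scope == "all_agents":
                # Load time: supersedes the configured model for this host instance.
                agent = replace(agent, model=self.override)
            self._cache[name] = agent
        return self._cache[name]

    def run_agent(self, name: str) -> AgentDef:
        agent = self.load_agent(name)
        if self.override and self.scope == "root_only":
            # Clone only the root invocation; the cached definition stays untouched.
            agent = replace(agent, model=self.override)
        return agent  # a real host would execute the agent here
```

Sub-agents are loaded through load_agent, so under root_only they keep their configured models, while under all_agents they pick up the override automatically.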
Evaluator surfaces:
- Web UI: free-text model field with completions from .env DEFAULT_MODEL, left empty by default
- CLI: --agent-model-override and --agent-model-override-scope {root_only,all_agents}
- Initializers: DEFAULT_AGENT_MODEL_OVERRIDE / get_default_agent_model_override() and DEFAULT_AGENT_MODEL_OVERRIDE_SCOPE / get_default_agent_model_override_scope()