Coding Agents

Coding agent targets let you evaluate AI coding assistants and CLI-based agents. These targets require a judge_target, which is used to run LLM-based evaluators.

Prompt format

Agent providers receive a structured prompt document with two sections: a preread block listing files the agent must read, and the user query containing the eval input.

File handling

When an eval test includes type: file inputs, agent providers do not receive the file content inline. Instead, they receive:

A preread block with file:// URIs pointing to absolute paths on disk
The user query with <file: path="..."> reference tags

The agent is expected to read the files itself using its filesystem tools.

This differs from LLM providers, which receive file content embedded directly in the prompt as XML:

<file path="src/example.ts">
// file content is inlined here
</file>
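The contrast between the two delivery modes can be sketched in Python. This is only an illustration of the formats described above, not AgentV's implementation; the function names `render_for_agent` and `render_for_llm` are hypothetical.

```python
from pathlib import Path


def render_for_agent(path: str, base_dir: str) -> tuple[str, str]:
    """Return (preread URI, reference tag) for a file input.

    The file content is never read here: the agent is expected to open
    the file itself with its filesystem tools.
    """
    absolute = (Path(base_dir) / path).resolve()
    return absolute.as_uri(), f'<file: path="{path}">'


def render_for_llm(path: str, content: str) -> str:
    """Return the file content embedded inline as XML, as an LLM provider receives it."""
    return f'<file path="{path}">\n{content}\n</file>'
```

In the first case the prompt stays small and the agent's file-reading behavior is exercised as part of the eval; in the second, the model sees the content directly.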

Example prompt

Given an eval with a guideline file and a file input:

input:
  - role: user
    content:
      - type: file
        value: ./src/example.ts
      - type: text
        value: Review this code

The agent receives a prompt like:

Read all guideline files:
* [guidelines.md](file:///abs/path/guidelines.md).

Read all input files:
* [example.ts](file:///abs/path/src/example.ts).

If any file is missing, fail with ERROR: missing-file <filename> and stop.
Then apply system_instructions on the user query below.

[[ ## user_query ## ]]
<file: path="./src/example.ts">
Review this code

The preread block instructs the agent to read both guideline and input files before processing the query. If a system_prompt is configured on the target, it is passed separately via the provider SDK (not in the prompt document).
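For a custom agent that has to interpret this prompt itself, the following sketch shows one way to pull out the preread paths and the user query. It assumes the link format and the `[[ ## user_query ## ]]` marker shown in the example above and POSIX-style paths; the helper names `split_prompt` and `missing_file_error` are hypothetical and not part of AgentV.

```python
import re
from pathlib import Path
from urllib.parse import unquote, urlparse

USER_QUERY_MARKER = "[[ ## user_query ## ]]"
# Matches links such as [example.ts](file:///abs/path/src/example.ts)
FILE_LINK = re.compile(r"\[[^\]]*\]\((file://[^)]+)\)")


def split_prompt(prompt: str) -> tuple[list[str], str]:
    """Return (paths listed in the preread block, user query text)."""
    preread, _, query = prompt.partition(USER_QUERY_MARKER)
    paths = [unquote(urlparse(uri).path) for uri in FILE_LINK.findall(preread)]
    return paths, query.strip()


def missing_file_error(paths: list[str]):
    """Return the error line the prompt asks for, or None if every file exists."""
    for path in paths:
        if not Path(path).is_file():
            return f"ERROR: missing-file {Path(path).name}"
    return None
```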

Claude

targets:
  - name: claude_agent
    provider: claude
    workspace_template: ./workspace-templates/my-project
    judge_target: azure-base

Field	Required	Description
`workspace_template`	No	Path to workspace template directory
`cwd`	No	Working directory (mutually exclusive with workspace_template)
`judge_target`	Yes	LLM target for evaluation
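The two rules in this table (judge_target is required; cwd and workspace_template are mutually exclusive) recur for the other agent providers below. As a rough sketch, they could be checked on a target mapping loaded from YAML like this. The function `validate_agent_target` is hypothetical and is not AgentV's own validation logic.

```python
def validate_agent_target(target: dict) -> list[str]:
    """Return a list of problems found in a coding agent target mapping."""
    problems = []
    if not target.get("judge_target"):
        problems.append("judge_target is required")
    if "cwd" in target and "workspace_template" in target:
        problems.append("cwd and workspace_template are mutually exclusive")
    return problems
```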

Codex CLI

targets:
  - name: codex_target
    provider: codex
    workspace_template: ./workspace-templates/my-project
    judge_target: azure-base

Field	Required	Description
`workspace_template`	No	Path to workspace template directory
`cwd`	No	Working directory (mutually exclusive with workspace_template)
`judge_target`	Yes	LLM target for evaluation

Copilot CLI

targets:
  - name: copilot
    provider: copilot
    model: gpt-5-mini
    judge_target: azure-base

Field	Required	Description
`model`	No	Model to use (defaults to copilot’s default)
`workspace_template`	No	Path to workspace template directory
`cwd`	No	Working directory (mutually exclusive with workspace_template)
`judge_target`	Yes	LLM target for evaluation

Pi Coding Agent

targets:
  - name: pi_target
    provider: pi-coding-agent
    workspace_template: ./workspace-templates/my-project
    judge_target: azure-base

Field	Required	Description
`workspace_template`	No	Path to workspace template directory
`cwd`	No	Working directory (mutually exclusive with workspace_template)
`judge_target`	Yes	LLM target for evaluation

VS Code

targets:
  - name: vscode_dev
    provider: vscode
    workspace_template: ${{ WORKSPACE_PATH }}
    judge_target: azure-base

Field	Required	Description
`executable`	No	Path to VS Code binary. Supports `${{ ENV_VAR }}` syntax or literal paths. Defaults to `code` (or `code-insiders` for the insiders provider).
`workspace_template`	No	Path to workspace template directory or `.code-workspace` file
`judge_target`	Yes	LLM target for evaluation

Using a custom executable path:

targets:
  - name: vscode_dev
    provider: vscode
    executable: ${{ VSCODE_CMD }}
    workspace_template: ${{ WORKSPACE_PATH }}
    judge_target: azure-base

VS Code Insiders

targets:
  - name: vscode_insiders
    provider: vscode-insiders
    workspace_template: ${{ WORKSPACE_PATH }}
    judge_target: azure-base

Same configuration as VS Code.

Custom CLI Agent

Evaluate any command-line agent:

targets:
  - name: local_agent
    provider: cli
    command: 'python agent.py --prompt-file {PROMPT_FILE} --output {OUTPUT_FILE}'
    workspace_template: ./workspace-templates/my-project
    judge_target: azure-base

Field	Required	Description
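`command`	Yes	Command to run. `{PROMPT}` is inline prompt text and `{PROMPT_FILE}` is a temp file path containing the prompt. `{OUTPUT_FILE}`, used in the example above, is the path of the file the command writes its response to.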
`workspace_template`	No	Path to workspace template directory
`cwd`	No	Working directory (mutually exclusive with workspace_template)
`judge_target`	Yes	LLM target for evaluation
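A minimal sketch of what the `agent.py` in the example command might look like is shown below. It assumes the command passes the prompt path via `--prompt-file` and expects the response to be written to the path given by `--output`, and that a plain-text response is acceptable; the `respond` function is a placeholder for real agent logic.

```python
import argparse
from pathlib import Path


def respond(prompt: str) -> str:
    """Placeholder for the agent's real logic; it only reports the prompt size."""
    return f"Received a prompt of {len(prompt)} characters."


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Hypothetical CLI agent")
    parser.add_argument("--prompt-file", required=True, help="file containing the prompt")
    parser.add_argument("--output", required=True, help="file to write the response to")
    args = parser.parse_args(argv)

    # Read the prompt prepared by the harness and write the response back.
    prompt = Path(args.prompt_file).read_text(encoding="utf-8")
    Path(args.output).write_text(respond(prompt), encoding="utf-8")
    return 0
```

To run it as a script, end the file with `raise SystemExit(main())`. Keeping `main` callable with an explicit argument list also makes the agent easy to test without spawning a process.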

Mock Provider

For testing the evaluation harness without calling real providers:

targets:
  - name: mock_target
    provider: mock

Known limitations

VS Code

The VS Code provider uses a subagent file-messaging architecture: AgentV provisions pre-configured VS Code workspace directories (subagents) and dispatches each request by writing a prompt file, and the AI agent writes its response to a file. Lock files control concurrency.

Per-target worker limit: VS Code evals run with 1 worker per target because the provider requires window focus to dispatch requests. When multiple targets are configured (e.g., vscode + copilot), they run concurrently; the single-worker limit applies only within each VS Code target. Subagents are provisioned automatically if needed.
Windows only: VS Code is not available on Linux CI. E2E testing must be done on a Windows machine.
.code-workspace support: When your eval uses workspace.template with a .code-workspace file, the template folders are opened in the VS Code window alongside the subagent directory.
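To make the file-messaging idea concrete, here is a generic sketch of lock-file-based dispatch. It is not AgentV's code: the file names `subagent.lock` and `request.md` and all function names are assumptions chosen for illustration. The lock is taken by creating a file atomically, so only one request can claim a subagent directory at a time.

```python
import os
from pathlib import Path

LOCK_NAME = "subagent.lock"    # hypothetical file name
REQUEST_NAME = "request.md"    # hypothetical file name


def try_acquire(subagent_dir: Path) -> bool:
    """Claim a subagent directory by creating its lock file atomically."""
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(subagent_dir / LOCK_NAME, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another request already holds this subagent
    os.close(fd)
    return True


def dispatch(subagent_dir: Path, prompt: str) -> bool:
    """Write the prompt file for a free subagent; return False if it is busy."""
    if not try_acquire(subagent_dir):
        return False
    (subagent_dir / REQUEST_NAME).write_text(prompt, encoding="utf-8")
    return True


def release(subagent_dir: Path) -> None:
    """Free the subagent once its response has been collected."""
    (subagent_dir / LOCK_NAME).unlink(missing_ok=True)
```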

Copilot CLI

MCP OAuth token expiration: If your Copilot CLI has MCP servers configured with OAuth authentication, expired tokens will block eval execution. The CLI attempts to re-authenticate through a browser OAuth flow, which cannot complete in non-interactive mode, so the eval hangs indefinitely. Before running evals, either re-authenticate your MCP servers manually (copilot → /mcp) or remove MCP servers with expired tokens. See copilot-cli#1797 and copilot-cli#1491 for upstream tracking.
Windows shell shim vs process spawn: On Windows, copilot -h may work in PowerShell while AgentV still fails with spawn copilot ENOENT. The shell can execute the copilot.ps1/copilot.bat shims, but AgentV launches a subprocess and needs a directly spawnable executable path. If this occurs, set an explicit executable on the target (for example, via an environment variable):

targets:
  - name: copilot
    provider: copilot
    executable: ${{ COPILOT_EXE }}
    judge_target: azure-base

Use a native binary path for COPILOT_EXE (for example copilot.exe from @github/copilot-win32-x64).
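Before setting COPILOT_EXE, a quick check like the following can flag a path that points at a shell shim rather than a native binary. This is a heuristic sketch based only on the file extension; the function name is hypothetical, and `.cmd` is included as an assumption alongside the `.ps1` and `.bat` shims mentioned above.

```python
from pathlib import PureWindowsPath

# .ps1 and .bat come from the description above; .cmd is an added assumption.
SHELL_SHIM_SUFFIXES = {".ps1", ".bat", ".cmd"}


def looks_like_shell_shim(executable: str) -> bool:
    """Return True if the path's extension suggests a shell script, not a native binary."""
    return PureWindowsPath(executable).suffix.lower() in SHELL_SHIM_SUFFIXES
```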

Claude Code

Run evals externally: Run agentv evals from outside Claude Code. Running agentv eval with the claude target from within a Claude Code session can cause unintended behavior, because the spawned Claude agent may interfere with the parent session.
ANTHROPIC_API_KEY overrides subscription auth: Claude Code loads .env from the working directory on startup. If your .env contains ANTHROPIC_API_KEY, the spawned Claude Code process will use that API key instead of your Claude subscription (Max/Pro). If the API key has insufficient credits, evals will fail with “Credit balance is too low”. To use subscription auth, remove ANTHROPIC_API_KEY from your .env file.
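A small preflight check can catch this before an eval run. The sketch below scans .env-style text for a non-empty ANTHROPIC_API_KEY assignment. It uses deliberately simplified parsing (no quoting or multi-line values), and the function name `env_defines_key` is hypothetical.

```python
def env_defines_key(env_text: str, key: str = "ANTHROPIC_API_KEY") -> bool:
    """Return True if .env-style text assigns a non-empty value to key."""
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        name, sep, value = line.partition("=")
        if sep and name.strip() == key and value.strip():
            return True
    return False
```

For example, calling it on the contents of the working directory's .env file and warning when it returns True makes the auth mode explicit before any Claude process is spawned.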