Local Fleet Setup
This tutorial walks through setting up PanCode with local inference engines across multiple machines. By the end, you will have a working multi-machine fleet with Ollama, LM Studio, and llama.cpp.
Prerequisites
Section titled “Prerequisites”- PanCode installed (Installation Guide)
- At least one local inference engine installed
- Network access between machines (if using multiple)
Step 1: Install Local Engines
Section titled “Step 1: Install Local Engines”Install one or more of these inference engines on your machines.
Ollama
Section titled “Ollama”# Install Ollamacurl -fsSL https://ollama.com/install.sh | sh
# Pull a modelollama pull llama3.2
# Ollama serves on port 11434 by defaultollama serveLM Studio
Section titled “LM Studio”Download from lmstudio.ai. Load a model and start the local server. LM Studio serves on port 1234 by default and provides an OpenAI-compatible API.
llama.cpp (llama-server)
Section titled “llama.cpp (llama-server)”# Build llama.cppgit clone https://github.com/ggerganov/llama.cppcd llama.cpp && make
# Start the server with a model./llama-server -m models/codestral.gguf --port 8080llama-server provides a REST API on port 8080 by default.
Step 2: Configure Machines
Section titled “Step 2: Configure Machines”If your engines run on machines other than localhost, set PANCODE_LOCAL_MACHINES in your .env file:
PANCODE_LOCAL_MACHINES=mini=192.168.86.141,dynamo=192.168.86.143Format: name=address pairs separated by commas. PanCode always scans localhost in addition to any machines listed here.
With 3 machines and 3 engine types, PanCode probes 9 endpoints at boot.
Step 3: Configure Models
Section titled “Step 3: Configure Models”Set model environment variables in your .env file:
PANCODE_MODEL=localhost-ollama/llama3.2PANCODE_WORKER_MODEL=dynamo-ollama/codellamaPANCODE_SCOUT_MODEL=localhost-ollama/llama3.2Model references use the format provider/model-id where the provider is machine-engine.
Choosing Models by Role
Section titled “Choosing Models by Role”| Role | Recommended Characteristics |
|---|---|
Orchestrator (PANCODE_MODEL) | Strong reasoning, large context window |
Worker (PANCODE_WORKER_MODEL) | Good at code generation, moderate context |
Scout (PANCODE_SCOUT_MODEL) | Fast, small, good at following simple instructions |
Step 4: Create a Preset
Section titled “Step 4: Create a Preset”Edit ~/.pancode/panpresets.yaml to define a preset for your fleet:
local: description: "Local inference via homelab engines" model: localhost-ollama/llama3.2 workerModel: dynamo-ollama/codellama scoutModel: localhost-ollama/llama3.2 reasoning: medium safety: auto-edit
local-max: description: "Local fleet with high reasoning" model: mini-lmstudio/qwen2.5-coder-32b workerModel: dynamo-ollama/codellama scoutModel: localhost-ollama/llama3.2 reasoning: high safety: auto-editPanCode seeds this file on first run. After that, it never overwrites your edits.
Step 5: Boot PanCode
Section titled “Step 5: Boot PanCode”pancode --preset localExpected output:
[pancode] Preset: local (Local inference via homelab engines)Starting PanCode session "pancode-a3f2b1"...PanCode enters the interactive shell after boot completes.
Step 6: Verify the Fleet
Section titled “Step 6: Verify the Fleet”Check Health
Section titled “Check Health”/doctorExpected output:
Health Report: 7 pass, 0 warn, 0 fail
[OK] runtime-dir Runtime directory exists and is writable [OK] orphan-workers No active worker processes [OK] stale-runs No stale runs detected [OK] provider-health 3 provider(s) tracked, all healthy or degraded ...Check Models
Section titled “Check Models”/modelsLists all discovered models with their provider, model ID, and capabilities.
Check Runtimes
Section titled “Check Runtimes”/runtimesShows registered runtimes (Pi native plus any discovered CLI agents).
Check Boot Performance
Section titled “Check Boot Performance”/perfDisplays timing for each boot phase. Phases exceeding 500ms are flagged.
Step 7: Run a Dispatch
Section titled “Step 7: Run a Dispatch”Switch to Build mode (Shift+Tab until you see Build), then ask PanCode to perform a task:
You: "Dispatch the scout agent to explore the project structure and report the main directories"PanCode dispatches a worker subprocess running the scout agent. The worker explores the codebase and returns findings to the orchestrator.
Monitor the dispatch:
/runs # View dispatch history/metrics # View aggregate statistics/cost # View per-run cost breakdownUnderstanding Discovery
Section titled “Understanding Discovery”Cold Boot
Section titled “Cold Boot”On the first run or with --rediscover, PanCode performs cold discovery:
- Probes each machine + engine combination (tiered timeouts: 500ms, then 1000ms)
- Lists models from responding endpoints
- Matches models against the knowledge base for capability metadata
- Writes results to
~/.pancode/model-cache.yamland~/.pancode/panproviders.yaml
Average cold boot: ~1150ms (varies with number of machines and endpoints).
Warm Boot
Section titled “Warm Boot”On subsequent boots, PanCode reads from ~/.pancode/model-cache.yaml with zero network I/O.
Average warm boot: ~120ms.
After the shell is interactive, PanCode runs a background refresh. If the model count changes, a message appears on stderr. Changes take effect on the next boot.
Forcing Rediscovery
Section titled “Forcing Rediscovery”pancode --rediscoverIgnores the cache and performs full cold discovery. Useful after adding a new engine or machine.
Troubleshooting
Section titled “Troubleshooting”Engine Not Discovered
Section titled “Engine Not Discovered”- Verify the engine is running:
curl http://localhost:11434/api/tags(Ollama) - Check the port matches the expected default
- For remote machines, verify network connectivity:
ping 192.168.86.141 - Run
pancode --rediscoverto force a fresh scan
Model Not Available
Section titled “Model Not Available”- Verify the model is loaded in the engine:
ollama list - Check
/modelsinside PanCode for the full model list - Model references are case-sensitive and must match exactly
Slow Boot
Section titled “Slow Boot”If boot exceeds the 3-second budget:
- Run
/perfto identify the slow phase - Phase 3 (discovery) is usually the bottleneck on cold boot
- Reduce the number of dead endpoints (machines that are off)
- Use warm boot (do not pass
--rediscoverunless needed)
See Also
Section titled “See Also”- Providers Guide: Provider architecture and model resolution
- Configuration Guide: Environment variables and presets
- Multi-Agent Dispatch Tutorial: Dispatch patterns