Why I built a dedicated workspace for AI coding agents
Running one AI coding agent in a terminal is straightforward. Running three or four in parallel on the same codebase is where things start to fall apart. They step on each other's files, you lose track of which agent is waiting for input, and there's no isolation between tasks. I built Canopy to fix that.
Canopy is a review-first workspace: you delegate tasks to agents, each isolated in its own git worktree, and review the results before merging. This post covers the three core architectural decisions behind it: worktrees for file isolation, PTY heuristics for agent state detection, and a multi-process Electron security model. All of it is open source and MIT licensed.
The problem with terminals
AI coding agents like Claude Code, Gemini CLI, Codex, and OpenCode all run in your terminal. That's fine when you're running one at a time. But this isn't a single-tool workflow anymore: the 2025 Stack Overflow survey found that 84% of developers use or plan to use AI tools, JetBrains put that number at 85%, and The Pragmatic Engineer's survey of nearly a thousand engineers found that 70% juggle two to four AI tools simultaneously. Once you're delegating multiple tasks in parallel, terminal tabs don't cut it anymore.
Three agents working on the same repo means three sets of uncommitted changes colliding in the same working directory. You alt-tab between terminals trying to remember which one finished and which one is waiting for approval. There's no overview, no isolation, and no coordination layer.
People are working around this. The common pattern is tmux splits or multiple terminal windows, each running a separate agent in its own git worktree. Addy Osmani described spinning up fresh worktrees so "multiple AI coding sessions in parallel on the same repo" don't interfere. Mike Mason documented the same approach: "git worktrees with 3-4 Claude instances on different tasks, cycling through to check progress." It works, but it's manual. You're running `git worktree add` by hand, naming branches, remembering which terminal has which task, and as Simon Willison noted, "the natural bottleneck on all of this is how fast I can review the results."
The tooling around this hasn't caught up. VS Code's integrated terminal gives you panels, but those panels share a single working directory. tmux and modern terminals like Warp or Kitty are excellent at managing sessions, but they have no concept of agent state. They can't tell you whether Claude is thinking, waiting for input, or done. You're still eyeballing output streams.
Why Electron (and not something else)
The obvious question. Yes, Electron. Here's my thinking.
Canopy isn't a fork of VS Code. I looked at that path early on, but VS Code is roughly 500,000 lines of TypeScript across four strict dependency layers, with a proprietary DI system wiring together five separate process types. The terminal subsystem alone injects around 30 services (the editor, the extension host, the theme engine, workspace context) just to render a shell. You can't extract it without dragging in the entire workbench.
Under all of that, the terminal is really just a wrapper around xterm.js and node-pty, both standalone npm packages. Canopy uses them directly. No Monaco editor, no extension host, no language server plumbing in between.
There's also the maintenance side. The VS Code team now merges over 100 commits per day into a codebase of hundreds of thousands of lines of TypeScript. Cursor maintains a dedicated team just for rebasing upstream changes, and Windsurf faces the same burden. Both still lagged months behind upstream. In October 2025, OX Security found that both forks were stuck on VS Code 1.99 while upstream had moved to 1.103+, inheriting 94+ known Chromium vulnerabilities in the process. Cursor rebuilt their interface from scratch with Cursor 3, but the underlying editor is still VS Code 1.105.1 with no rebase planned. The fork tax doesn't go away just because you redesign the surface.
xterm.js and node-pty are both battle-tested and run natively in Electron. The broader AI SDK ecosystem (Anthropic, OpenAI, MCP tooling) is JavaScript-first too. Tauri would give you a smaller download (~3-15MB versus ~150MB) and roughly half the idle memory, but rebuilding the entire backend in Rust and losing the JavaScript AI SDK ecosystem is a steep price for those gains.
Tauri uses platform WebViews: WebKit on macOS, WebView2 on Windows, WebKitGTK on Linux. For standard UI that's fine, but for terminal emulation it's a problem. xterm.js relies heavily on Canvas and WebGL APIs that behave differently across these engines. WebKitGTK on Linux is particularly rough for WebGL performance. Consistent cross-platform terminal rendering needs a consistent rendering engine.
The tradeoff: Canopy uses more RAM than a native app (~150-300MB idle) and starts slower than a TUI. For a tool you open once and leave running all day while it manages your agents, that's an acceptable cost. If it's not acceptable for you, that's a legitimate reason to pass on Canopy. Worth noting: this is a workspace with an embedded terminal, not a standalone terminal emulator. Different category than something like Hyper.
That said, Canopy's memory budget gets serious attention. An adaptive resource profile system monitors memory pressure, battery state, and worktree count, then scales polling intervals, WebGL budgets, and hibernation thresholds across all processes. Backgrounded projects get their process priority lowered and polling paused. Terminal scrollback multipliers were tuned to save ~50-60MB across a dozen open terminals. The terminal data path uses adaptive batching with watermark-based backpressure to keep the renderer smooth during heavy output. Each project gets its own WebContentsView with a dedicated V8 context, which dropped project switch time from 500-1500ms to under 16ms.
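The watermark-based backpressure idea can be sketched in a few lines. This is an illustrative toy, not Canopy's actual implementation: the class name, thresholds, and the `paused` flag (which would drive `pty.pause()`/`pty.resume()` in a real data path) are all assumptions.

```typescript
// Sketch of watermark-based backpressure on a PTY -> renderer data path.
// Chunks are coalesced into one batch per flush; when buffered bytes cross
// the high watermark the source is paused, and it resumes after draining.
class OutputBatcher {
  private buffer: string[] = [];
  private buffered = 0;
  public paused = false; // a real implementation would call pty.pause()/resume()

  constructor(
    private readonly highWater: number,
    private readonly lowWater: number,
    private readonly send: (batch: string) => void,
  ) {}

  // Called for every PTY data event.
  push(chunk: string): void {
    this.buffer.push(chunk);
    this.buffered += chunk.length;
    if (this.buffered >= this.highWater) this.paused = true;
  }

  // Called on an adaptive interval (e.g. once per animation frame).
  flush(): void {
    if (this.buffered === 0) return;
    this.send(this.buffer.join("")); // one IPC message instead of many
    this.buffer = [];
    this.buffered = 0;
    if (this.paused && this.buffered <= this.lowWater) this.paused = false;
  }
}
```

The point of the batching is that the renderer receives one message per frame instead of one per PTY data event, which is what keeps the UI smooth under heavy output.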
Worktree isolation
The core isolation mechanism is git worktree. Every worktree in Canopy gets its own isolated git working directory on disk. When you create a worktree through the UI, Canopy runs something equivalent to:
```shell
git worktree add -b feature/task-a ../worktrees/task-a main
```

This creates a new working directory with its own branch, checked out from main. Agents running in that worktree can only see and modify files in its working directory. Two agents in different worktrees on the same repo literally cannot touch each other's files. The isolation runs deep: each project gets its own dedicated workspace host process with per-view stores backed by dedicated MessagePorts, so there's no cross-project state contamination even at the IPC level.
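To make the mapping from UI action to git invocation concrete, here is a hypothetical sketch of how a task name could be turned into the worktree arguments. The `slugify` and `buildWorktreeArgs` names and the `feature/` prefix are illustrative assumptions, not Canopy's API.

```typescript
// Turn a human task name into a branch-safe slug, e.g. "Task A" -> "task-a".
function slugify(task: string): string {
  return task.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-|-$/g, "");
}

// Build the argv for `git worktree add -b <branch> <path> <base>`.
// The branch prefix and worktree directory layout are made-up conventions.
function buildWorktreeArgs(task: string, baseBranch = "main"): string[] {
  const slug = slugify(task);
  return ["worktree", "add", "-b", `feature/${slug}`, `../worktrees/${slug}`, baseBranch];
}
```

In a real app these arguments would be handed to something like `child_process.execFile("git", args)` rather than a shell, to avoid quoting issues.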
Git worktrees share the parent repository's object store and full history. The only additional disk cost is the working directory checkout itself. In a 1.7GB repository, ten parallel clones consume 17GB. Ten worktrees consume 6.7GB, and each one is created in under 100ms with zero network I/O. Branches created in any worktree are immediately visible from every other worktree and the main checkout. No extra remotes, no syncing. Worktrees have become the standard isolation primitive for parallel AI agent workflows, used by Claude Code, Augment, and others.
The design isn't without constraints, though. Git forbids checking out the same branch in two worktrees simultaneously, so Canopy enforces unique branches per panel. If you try to create a worktree for a branch that's already checked out elsewhere, it generates a suffixed alternative automatically. The branch-lock constraint is actually useful here: it makes it structurally impossible for two agents to commit to the same branch.
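The suffixing logic is simple enough to sketch. This is an assumed implementation, not Canopy's actual code: given the set of branches already checked out in other worktrees, pick the requested name if it's free, otherwise append the first free numeric suffix.

```typescript
// De-duplicate a branch name against branches checked out in other worktrees.
// Illustrative sketch; the real set would come from `git worktree list`.
function uniqueBranch(requested: string, checkedOut: Set<string>): string {
  if (!checkedOut.has(requested)) return requested;
  for (let n = 2; ; n++) {
    const candidate = `${requested}-${n}`;
    if (!checkedOut.has(candidate)) return candidate;
  }
}
```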
Each worktree also needs its own dependency installation. `node_modules`, virtual environments, and similar build artifacts aren't shared. Using pnpm or a package manager with a global store helps with disk space, but the install step is unavoidable.
The disk-sharing benefit disappears for repos with submodules. Each worktree gets its own copy of submodule content and needs `git submodule update --init` independently. For most repos this isn't relevant, but it's worth knowing if you work with large submodule trees.
State detection through the terminal
Some AI coding CLIs now offer structured output modes for headless pipelines: both Claude Code and Gemini CLI support `--output-format stream-json`. But these modes are designed for non-interactive, machine-to-machine execution. They don't emit lifecycle states like "thinking" or "waiting for input," and they assume the agent runs autonomously to completion with tool approvals auto-accepted.
Canopy takes a different approach: it wraps agents in their native interactive mode, preserving the CLI's own UI, approval flows, and multi-turn persistence. You don't need to change how you run your agents or give up manual approval of tool calls. In that interactive context, the only way to detect agent state is to watch the PTY output directly. Canopy does this with a detection pipeline running on a 50ms polling loop: per-agent regex patterns match against terminal output (after stripping ANSI sequences), CPU hysteresis catches silent inference periods where the terminal is quiet but the agent is still working, and prompt detection identifies approval prompts and shell returns across all agents.
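The pattern-matching half of that pipeline can be sketched as: strip ANSI escape sequences from the recent output, then run it through per-agent regexes in priority order. Everything below is illustrative; the patterns are invented for the example and real ones would be tuned per agent, with the CPU hysteresis handled separately.

```typescript
// Toy state classifier: strip ANSI, then match made-up patterns.
type AgentState = "working" | "waiting" | "idle";

// Matches CSI sequences (e.g. colors, cursor moves) and OSC titles.
const ANSI = /\x1b\[[0-9;?]*[A-Za-z]|\x1b\][^\x07]*\x07/g;

function stripAnsi(data: string): string {
  return data.replace(ANSI, "");
}

function classify(tail: string): AgentState {
  const clean = stripAnsi(tail);
  // Approval prompts win over everything else.
  if (/Do you want to proceed\?|\[y\/n\]/i.test(clean)) return "waiting";
  // Spinner/status text suggests the agent is still working.
  if (/Thinking|Working|…$/m.test(clean)) return "working";
  return "idle";
}
```

Ordering matters: an approval prompt often arrives wrapped in the same styled output as a status line, so the "waiting" check has to run first.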
Is it perfect? No. It's heuristic. Terminal resize events, spinner animations, and cosmetic redraws can all produce false signals, and Canopy handles each specifically. The whole pipeline eventually grew to 12 standalone subsystems because the original monolith hit 1,573 lines. It's fast enough and wrong rarely enough to be practical.
Process isolation
Running AI agents means running arbitrary code. If a prompt injection tricks an agent into executing a malicious script, the workspace itself becomes an attack surface. This isn't theoretical: in 2025, CVE-2025-53773 demonstrated how prompt injection in source code files could trick Copilot into modifying its own configuration to enable arbitrary shell execution (CVSS 7.8). The AIShellJack study tested 314 attack payloads against AI coding tools and achieved success rates as high as 84% for remote command execution. Canopy's security model is built around containment: preventing a rogue workspace script from breaking out of the sandbox.
Canopy splits work across four process types:
- Main process handles orchestration, git operations, and IPC routing
- PTY host runs as a sandboxed `UtilityProcess` that manages all terminal I/O, isolated from the main process with a 512MB memory limit
- Workspace host runs as a separate `UtilityProcess` per project for worktree state management
- Renderer handles the UI in a fully sandboxed process
Every IPC channel validates the sender's URL against a trusted origin allowlist before any handler fires. This is enforced globally, not per-handler, so a new IPC channel can't accidentally skip validation. Security-sensitive IPC payloads are validated with Zod schemas at the boundary. An environment variable filter prevents credentials from leaking into spawned PTY processes, and the git wrapper is hardened against malicious repo config to block RCE via `core.fsmonitor` and similar vectors.
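A minimal sketch of that sender check, assuming a custom `app://` scheme for the renderer (the origin list and function name are invented for illustration; in Electron the URL would come from something like `event.senderFrame.url`):

```typescript
// Reject IPC from any frame whose origin isn't explicitly trusted.
const TRUSTED_ORIGINS = new Set(["app://canopy"]);

function isTrustedSender(senderUrl: string): boolean {
  try {
    const { protocol, host } = new URL(senderUrl);
    return TRUSTED_ORIGINS.has(`${protocol}//${host}`);
  } catch {
    return false; // unparsable URL: fail closed
  }
}
```

Doing this once in a shared wrapper around `ipcMain.handle`, instead of inside each handler, is what makes the guarantee global rather than per-channel.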
Electron Fuses lock down the binary itself:
- `runAsNode: false` prevents the signed binary from being repurposed as a general Node.js interpreter via `ELECTRON_RUN_AS_NODE=1`
- `onlyLoadAppFromAsar: true` combined with `enableEmbeddedAsarIntegrityValidation: true` means the app can only load from a signed, integrity-checked ASAR archive, so you can't swap in a modified `app/` folder
- `enableNodeOptionsEnvironmentVariable: false` blocks debug attachment via environment variables
What Canopy is not
Canopy is not a traditional IDE. There's no code editor, no file tree, no IntelliSense. You can think of it as an Integrated Delegation Environment: you delegate tasks to agents, review what they produce, and decide what ships.
The natural follow-up: why not a VS Code extension? VS Code's Extension API is deliberately restrictive. Extensions can't manage terminal processes at the PTY level, can't orchestrate worktrees across panels, and can't alter core UI. Canopy sits alongside your editor, not inside it. The same way tmux doesn't compete with vim.
It's not an orchestration framework either. There's no agent-to-agent messaging, no task dependency graphs, no automatic retry if an agent task fails. You delegate tasks, monitor progress, review the output, and decide what to merge. The human stays in the loop.
The real bottleneck isn't agent throughput. It's your capacity to review and merge what they produce. Google's DORA 2024 report found that for every 25% increase in AI coding adoption, delivery stability dropped 7.2%, because review queues absorb the speed gains. Faros AI's study of 10,000+ developers confirmed it: high-AI-adoption teams merge 98% more PRs, but review time spikes 91% and organizational delivery metrics stay flat. LinearB's analysis of 8.1 million pull requests found that AI-generated PRs have a 32.7% acceptance rate versus 84.4% for human-written code. Generation is cheap. Review is the bottleneck.
Running four agents in parallel doesn't mean four times the output if you can't review four diffs simultaneously. What Canopy does is keep the agents organized and isolated so the review process isn't also fighting file conflicts and lost context.
And it's not tied to any single agent. Claude Code, Gemini CLI, Codex, OpenCode, or whatever ships next month. LLM capabilities leapfrog every few months. Any CLI tool that runs in a terminal works in Canopy without configuration or API keys. The workspace layer should outlive any individual model or provider.
What's next
Canopy is still early. The state detection heuristics need tuning for edge cases. The review workflow could be smoother. Cross-worktree diffs work but the UI is minimal.
What's on the roadmap: better notification controls for agent state changes and a more polished diff review experience. The full source is on GitHub, and the documentation covers everything from installation to advanced worktree management.
If you're already running AI coding agents in terminal tabs and want something purpose-built, give Canopy a go. It's free, MIT licensed, no account required, and designed to stay out of your way.