Voice Input
Speech-to-text dictation with real-time transcription and optional AI-powered text correction, integrated into every agent's input bar.
Overview
Voice Input lets you dictate commands and prompts directly into any agent's hybrid input bar. Audio is streamed live to Deepgram for real-time transcription, with text appearing in the input bar as you speak. An optional AI correction layer (powered by OpenAI) can clean up the raw transcript, fixing punctuation, technical terms, and filler words before you send the command.
Two things are required to get started: a Deepgram API key for transcription, and microphone permission at the OS level. AI text correction is optional and requires a separate OpenAI API key.
Prerequisites
- Deepgram API key for speech-to-text transcription. Sign up at console.deepgram.com and create an API key. Transcription costs depend on the model: Nova-3 runs at $0.0077/min, Nova-2 at $0.0043/min.
- Microphone access granted at the OS level (macOS, Windows, or Linux).
- OpenAI API key (optional) for AI text correction and file reference resolution. Only needed if you want the AI correction layer on top of the raw transcription.
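As a rough illustration of the pricing above, the following sketch estimates transcription cost from the quoted per-minute rates. The function name and model labels are illustrative and not part of Canopy; actual billing is determined by Deepgram.

```python
from decimal import Decimal

# Per-minute rates quoted in the prerequisites (USD).
RATES_PER_MINUTE = {
    "nova-3": Decimal("0.0077"),
    "nova-2": Decimal("0.0043"),
}


def estimate_cost(model: str, minutes: float) -> Decimal:
    """Return the estimated transcription cost in USD for `minutes` of audio."""
    return RATES_PER_MINUTE[model] * Decimal(str(minutes))


# One hour of dictation: about $0.46 on Nova-3 and about $0.26 on Nova-2.
print(estimate_cost("nova-3", 60))
print(estimate_cost("nova-2", 60))
```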
Setup
Enable Voice Input
- Open Settings > Integrations.
- In the Speech-to-Text section, toggle Voice Input on.
- Enter your Deepgram API key (starts with dg_). The key is validated when you save.
- Choose your language from the 10 supported options: English, Spanish, French, German, Japanese, Chinese, Korean, Portuguese, Italian, and Russian.
- Pick a transcription model. Nova-3 (default) is the latest and most accurate. Nova-2 is a stable, lower-cost alternative.
- Select a paragraph break strategy. Spoken commands (default, English only) lets you say "new paragraph" to insert paragraph breaks. Manual Enter requires you to press Enter instead. Non-English languages fall back to Manual Enter automatically.
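The fallback rule for paragraph breaks can be sketched as a small function. This is a minimal illustration, assuming hypothetical language codes ("en") and strategy names ("spoken", "manual"); Canopy's internal identifiers may differ.

```python
def effective_paragraph_strategy(language: str, selected: str) -> str:
    """Return the strategy that actually applies for the chosen language.

    Spoken commands are English only, so any other language falls back
    to Manual Enter regardless of what was selected.
    """
    if selected == "spoken" and language != "en":
        return "manual"
    return selected


print(effective_paragraph_strategy("en", "spoken"))  # spoken
print(effective_paragraph_strategy("de", "spoken"))  # manual
```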
Grant Microphone Permission
Canopy needs microphone access from your operating system before it can capture audio. The settings panel shows your current permission status and provides instructions specific to your OS.
macOS
Open System Settings > Privacy & Security > Microphone and enable Canopy. If permission hasn't been requested yet, Canopy shows a Request button that triggers the system dialog.
If permission was previously denied, click Open System Settings in the Canopy settings panel to go directly to the Microphone privacy settings. Toggle Canopy on, then click Re-check back in Canopy.
Windows
Open Windows Settings > Privacy & security > Microphone and make sure desktop app access is allowed. Canopy shows a Request button if permission hasn't been determined yet.
If access was denied, click Open System Settings in the Canopy settings panel to jump to the Microphone privacy page. Enable access, then click Re-check in Canopy.
Linux
Grant microphone access through your system audio settings. Linux doesn't support the in-app Request button, so you'll need to configure permissions through your desktop environment or audio manager (e.g. PipeWire, PulseAudio).
Once permissions are granted externally, click Re-check in the Canopy settings panel to confirm access.
Starting and Stopping Dictation
There are three ways to start voice dictation:
- Mic button in the hybrid input bar. Each agent panel has its own mic button. Click it to start recording for that panel.
- Keyboard shortcut: Cmd+Shift+V on macOS, Ctrl+Shift+V on Windows/Linux. Toggles voice dictation when the input bar does not have focus. (When the input bar is focused, the same shortcut pastes clipboard content as plain text instead.)
- Action palette: search for "Toggle Voice Dictation" in the action palette.
While recording, the mic button shows an animated ring that pulses with your audio level, and a square stop icon replaces the mic icon. To stop recording, click the button again, press the shortcut, or press Enter.
Text appears in the input bar as you speak. Interim (in-progress) transcription shows at reduced opacity, then solidifies once Deepgram confirms the final text. A session moves through these states in order: idle, connecting (up to 10 seconds), recording, finishing (up to 3 seconds to drain final text), and then back to idle.
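The session lifecycle described above can be modelled as a small state machine. This is a sketch only: the enum, the transition table, and the assumption that a timed-out phase returns the session to idle are illustrative, not Canopy's actual implementation.

```python
from enum import Enum
from typing import Optional


class VoiceState(Enum):
    IDLE = "idle"
    CONNECTING = "connecting"
    RECORDING = "recording"
    FINISHING = "finishing"


# Time limits stated in the text; idle and recording have no stated limit.
TIMEOUT_SECONDS = {
    VoiceState.CONNECTING: 10,
    VoiceState.FINISHING: 3,
}

# Normal progression of one dictation session.
NEXT_STATE = {
    VoiceState.IDLE: VoiceState.CONNECTING,
    VoiceState.CONNECTING: VoiceState.RECORDING,
    VoiceState.RECORDING: VoiceState.FINISHING,
    VoiceState.FINISHING: VoiceState.IDLE,
}


def advance(state: VoiceState) -> VoiceState:
    """Move to the next state in the normal session flow."""
    return NEXT_STATE[state]


def timeout_for(state: VoiceState) -> Optional[int]:
    """Return the time limit in seconds, or None if the state has none."""
    return TIMEOUT_SECONDS.get(state)


def on_timeout(state: VoiceState) -> VoiceState:
    """Assumption: a phase that exceeds its limit ends the session (idle)."""
    if state in TIMEOUT_SECONDS:
        return VoiceState.IDLE
    return state
```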
Toolbar Indicator
When a recording session is active, a mic icon appears in the global toolbar with a pulsing accent dot and elapsed time in M:SS format. The tooltip shows the project and worktree where recording is active. Clicking the toolbar indicator focuses the panel where recording is happening, but it doesn't stop the session.
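For reference, the M:SS elapsed-time format means unpadded minutes and zero-padded seconds. A minimal sketch of such a formatter (the function name is hypothetical):

```python
def format_elapsed(total_seconds: int) -> str:
    """Format a duration as M:SS, e.g. 75 seconds becomes "1:15"."""
    minutes, seconds = divmod(max(0, int(total_seconds)), 60)
    return f"{minutes}:{seconds:02d}"


print(format_elapsed(5))    # 0:05
print(format_elapsed(75))   # 1:15
print(format_elapsed(600))  # 10:00
```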
Submitting While Recording
Pressing Enter while recording or while AI corrections are still in flight triggers the wait-before-submit flow:
- Canopy stops the recording session.
- A spinner overlay appears on the input bar, which becomes read-only.
- Canopy waits up to 10 seconds for any pending AI corrections to settle.
- Once all corrections are complete (or the timeout is reached), the text is submitted automatically.
Press Escape to cancel the wait and keep the text in the input bar for further editing.
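The wait-before-submit flow above can be sketched with asyncio. This is an illustration under assumptions: pending corrections are modelled as asyncio tasks, and `wait_then_submit` and `submit` are hypothetical names rather than Canopy APIs.

```python
import asyncio
from typing import Callable, Iterable


async def wait_then_submit(
    corrections: Iterable[asyncio.Task],
    submit: Callable[[], None],
    timeout: float = 10.0,
) -> None:
    """Wait for pending corrections (up to `timeout` seconds), then submit."""
    tasks = list(corrections)
    if tasks:
        # asyncio.wait returns once every task is done or the timeout
        # elapses; it neither raises on timeout nor cancels unfinished tasks.
        await asyncio.wait(tasks, timeout=timeout)
    submit()


async def demo() -> list:
    submitted = []

    async def fake_correction(delay: float) -> None:
        await asyncio.sleep(delay)  # stands in for an in-flight AI correction

    tasks = [asyncio.create_task(fake_correction(0.01))]
    await wait_then_submit(tasks, lambda: submitted.append("sent"), timeout=1.0)
    return submitted


print(asyncio.run(demo()))
```

In this model, pressing Escape would correspond to cancelling the task that runs `wait_then_submit` before it reaches `submit()`, which leaves the text unsent.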
Paragraph Breaks
Spoken Commands (Default, English Only)
With the spoken-command strategy selected, you can say formatting commands while dictating:
- "New paragraph", "next paragraph", or "start a new paragraph" inserts a blank line (\n\n).
- "New line" or "line break" inserts a single newline (\n).
These commands are handled by Deepgram's Dictation mode and stripped from the transcript. You can also press Enter to commit the current paragraph manually at any time.
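To make the effect of these commands concrete, the sketch below applies the same phrase-to-break mapping to a plain string. It is purely illustrative: in Canopy the commands are handled by Deepgram's Dictation mode, not by local text replacement, and the function name is hypothetical.

```python
import re

# Phrase patterns and the break each one inserts, per the list above.
COMMANDS = [
    (r"start a new paragraph|new paragraph|next paragraph", "\n\n"),
    (r"new line|line break", "\n"),
]


def apply_spoken_commands(text: str) -> str:
    """Replace spoken formatting phrases with the corresponding breaks."""
    for pattern, replacement in COMMANDS:
        text = re.sub(
            rf"\s*\b(?:{pattern})\b[.,]?\s*",
            replacement,
            text,
            flags=re.IGNORECASE,
        )
    return text


print(repr(apply_spoken_commands("first point new paragraph second point new line third")))
```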
Manual Enter
With the manual strategy, paragraph breaks are inserted by pressing Enter only. Spoken formatting commands are disabled. Choose this if you dictate in a non-English language or prefer explicit control over paragraph breaks.
AI Text Correction
The AI correction layer reviews the raw transcription and fixes common issues: punctuation, filler words (um, uh), technical term spelling, and homophones. It uses a confidence-based system where Deepgram's per-word confidence scores determine how much AI intervention is needed.
- Words with confidence below 0.8 are flagged as uncertain and prioritised for correction.
- If all words in a segment have confidence above 0.85, the AI call is skipped entirely. Good transcription doesn't need fixing.
- Text currently being corrected shows a green dotted underline in the input bar, then resolves in place once the correction arrives.
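The two thresholds above interact: a word between 0.8 and 0.85 is not flagged as uncertain, but it still prevents the skip. A minimal sketch of that gating logic, assuming a hypothetical `Word` record holding a per-word confidence score:

```python
from dataclasses import dataclass
from typing import List

UNCERTAIN_BELOW = 0.80  # words under this are flagged as uncertain
SKIP_ABOVE = 0.85       # if every word exceeds this, skip the AI call


@dataclass
class Word:
    text: str
    confidence: float


def uncertain_words(words: List[Word]) -> List[str]:
    """Return the words that would be prioritised for correction."""
    return [w.text for w in words if w.confidence < UNCERTAIN_BELOW]


def should_skip_correction(words: List[Word]) -> bool:
    """True when every word in the segment is above the skip threshold."""
    return all(w.confidence > SKIP_ABOVE for w in words)


segment = [Word("deploy", 0.97), Word("the", 0.99), Word("kubectl", 0.62)]
print(should_skip_correction(segment))  # False
print(uncertain_words(segment))         # ['kubectl']
```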
Enable AI Correction
- In Settings > Integrations, scroll to the AI Text Correction section (visible once Voice Input is enabled).
- Toggle AI Text Correction on.
- Enter your OpenAI API key (starts with sk-).
- Choose a correction model. GPT-5 Mini (recommended) applies paragraph-level correction with higher quality. GPT-5 Nano is faster and lower cost, better suited for lower-latency correction. This is a single model selection that applies to all correction passes.
Custom Instructions
You can add project-specific correction rules in the Custom Instructions textarea. These are appended to the core correction prompt, so the AI applies them alongside its built-in rules. For example, you might write: "Always capitalise ProductName as one word" or "React component names should use PascalCase."
The Inspect core prompt toggle in settings lets you view the full base correction prompt (read-only). Your project name and custom dictionary terms are injected into this prompt automatically.
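One way to picture how these pieces combine is the sketch below. The section labels and layout are invented for illustration; the actual core prompt and its formatting are only visible through the Inspect core prompt toggle.

```python
from typing import List


def build_correction_prompt(
    core_prompt: str,
    project_name: str,
    dictionary_terms: List[str],
    custom_instructions: str = "",
) -> str:
    """Assemble a correction prompt: core rules, injected context, then
    the user's custom instructions appended last."""
    sections = [core_prompt.strip(), f"Project name: {project_name}"]
    if dictionary_terms:
        sections.append("Required terms: " + ", ".join(dictionary_terms))
    if custom_instructions.strip():
        sections.append(custom_instructions.strip())
    return "\n\n".join(sections)


prompt = build_correction_prompt(
    "Fix punctuation and remove filler words.",
    "Canopy",
    ["worktree", "PipeWire"],
    "React component names should use PascalCase.",
)
print(prompt)
```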
File Reference Resolution
When AI correction is enabled, Canopy can detect spoken file references and resolve them into @file links. Use natural phrases while dictating:
- "Link to the auth helper"
- "At file the button component"
- "Reference the user model"
- "Add file the config service"
- "Open the main layout"
Canopy's correction prompt detects these patterns and sends them to a file resolver, which searches the project file tree and uses AI to pick the best match. The resolved path appears as an @path/to/file reference that renders as a clickable file chip in the input bar.
If resolution fails, the text falls back to @?description so you can see what was being looked up and fix it manually.
This feature requires AI correction to be enabled and is toggled via Resolve file references in settings (on by default when your OpenAI key is configured).
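The resolve-or-fall-back behaviour can be sketched as follows. The resolver here is a deliberately naive keyword matcher standing in for Canopy's AI-assisted file resolver, and all function names are hypothetical.

```python
from typing import Callable, List, Optional


def naive_resolver(file_tree: List[str]) -> Callable[[str], Optional[str]]:
    """Stand-in resolver: pick the first path containing every keyword."""
    def resolve(description: str) -> Optional[str]:
        tokens = [t for t in description.lower().split() if t not in {"the", "a"}]
        for path in file_tree:
            if tokens and all(t in path.lower() for t in tokens):
                return path
        return None
    return resolve


def render_file_reference(
    description: str, resolve: Callable[[str], Optional[str]]
) -> str:
    """Return @path on success, or @?description when resolution fails."""
    path = resolve(description)
    return f"@{path}" if path else f"@?{description}"


resolve = naive_resolver(["src/auth/helper.ts", "src/models/user.ts"])
print(render_file_reference("the auth helper", resolve))  # @src/auth/helper.ts
print(render_file_reference("billing page", resolve))     # @?billing page
```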
Custom Dictionary
The custom dictionary lets you add up to 100 domain-specific terms that Canopy sends to Deepgram as recognition hints. These terms boost transcription accuracy for project names, product names, technical vocabulary, and abbreviations that the base model might not recognise.
Add terms in Settings > Integrations under the Speech-to-Text section. Each term appears as a removable pill. These same terms are also injected into the AI correction prompt, where they're treated as highest-priority required substitutions.
Your project name is automatically included in the recognition hints, so there's no need to add it manually.
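A minimal sketch of how the hint list might be assembled. It assumes the project name does not count toward the 100-term limit and that duplicates are dropped case-insensitively; neither detail is stated above, and the function name is hypothetical.

```python
from typing import List

MAX_TERMS = 100  # dictionary limit stated above


def recognition_hints(project_name: str, dictionary_terms: List[str]) -> List[str]:
    """Combine the project name and dictionary terms into one hint list."""
    if len(dictionary_terms) > MAX_TERMS:
        raise ValueError(f"custom dictionary is limited to {MAX_TERMS} terms")
    hints: List[str] = []
    seen = set()
    # The project name goes first so it is always included.
    for term in [project_name, *dictionary_terms]:
        cleaned = term.strip()
        key = cleaned.casefold()
        if key and key not in seen:
            seen.add(key)
            hints.append(cleaned)
    return hints


print(recognition_hints("Canopy", ["worktree", "canopy", " PipeWire "]))
```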
Settings Reference
All voice input settings are in Settings > Integrations.
Speech-to-Text
| Setting | Values / Notes |
|---|---|
| Voice Input | Enabled / Disabled (default: Disabled) |
| Deepgram API Key | dg_... prefix, validated on save |
| Language | English, Spanish, French, German, Japanese, Chinese, Korean, Portuguese, Italian, Russian |
| Transcription Model | Nova-3 ($0.0077/min, default) or Nova-2 ($0.0043/min) |
| Paragraph Breaks | Spoken commands (default, English only) or Manual Enter |
| Custom Dictionary | Up to 100 terms, sent as Deepgram keyterm hints |
AI Text Correction
This section is only visible when Voice Input is enabled.
| Setting | Values / Notes |
|---|---|
| AI Text Correction | Enabled / Disabled (default: Disabled) |
| OpenAI API Key | sk-... prefix, validated on save |
| Correction Model | GPT-5 Mini (recommended, full paragraph) or GPT-5 Nano (faster, lower cost) |
| Resolve File References | Enabled by default when OpenAI key is configured |
| Custom Instructions | Free-form textarea for project-specific correction rules |
Keyboard Shortcuts
| Action | Shortcut |
|---|---|
| Toggle voice dictation (global) | Cmd+Shift+V / Ctrl+Shift+V |
| Paste as plain text (when input bar has focus) | Cmd+Shift+V / Ctrl+Shift+V |
| Commit paragraph / submit | Enter |
| Cancel voice wait-submit | Escape |
The Cmd+Shift+V shortcut (Ctrl+Shift+V on Windows/Linux) is context-sensitive. It toggles voice dictation when the input bar does not have focus, and pastes clipboard content as plain text when it does. See Keyboard Shortcuts for the full reference.
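The context-sensitive behaviour amounts to a single branch on input focus. A small sketch, with hypothetical action names and `sys.platform`-style platform strings:

```python
def shortcut_label(platform: str) -> str:
    """Return the shortcut label for a sys.platform-style platform name."""
    return "Cmd+Shift+V" if platform == "darwin" else "Ctrl+Shift+V"


def shortcut_action(input_bar_focused: bool) -> str:
    """Pick the action the shared shortcut triggers in the current context."""
    return "paste_plain_text" if input_bar_focused else "toggle_voice_dictation"


print(shortcut_label("darwin"), "->", shortcut_action(input_bar_focused=False))
print(shortcut_label("linux"), "->", shortcut_action(input_bar_focused=True))
```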
Troubleshooting
Mic button is hidden
The microphone button only appears when Voice Input is fully configured. Check that:
- Voice Input is enabled in Settings > Integrations.
- A valid Deepgram API key has been entered.
- Microphone permission is granted at the OS level.
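The checklist above is effectively a three-way AND. The sketch below expresses it as code and reports which requirement is missing; the `VoiceSetup` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VoiceSetup:
    voice_input_enabled: bool
    deepgram_key_valid: bool
    mic_permission_granted: bool


def missing_requirements(setup: VoiceSetup) -> List[str]:
    """List the unmet requirements, in the order of the checklist."""
    missing = []
    if not setup.voice_input_enabled:
        missing.append("Enable Voice Input in Settings > Integrations")
    if not setup.deepgram_key_valid:
        missing.append("Enter a valid Deepgram API key")
    if not setup.mic_permission_granted:
        missing.append("Grant microphone permission at the OS level")
    return missing


def mic_button_visible(setup: VoiceSetup) -> bool:
    """The mic button shows only when nothing is missing."""
    return not missing_requirements(setup)


print(missing_requirements(VoiceSetup(True, True, False)))
```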
"Connection timed out"
Canopy couldn't reach Deepgram within 10 seconds. Verify your internet connection and check that your Deepgram API key is valid and has available credits.
"Invalid API key"
Shown when saving settings if the API key format is incorrect or the key is no longer valid. Re-enter a valid key (Deepgram keys start with dg_, OpenAI keys start with sk-).
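A quick local check of the key format can rule out paste errors before saving. This sketch only tests the prefixes mentioned above; it cannot tell whether a key is still active, which requires the provider's own validation. The function and provider labels are hypothetical.

```python
KEY_PREFIXES = {
    "deepgram": "dg_",
    "openai": "sk-",
}


def key_format_ok(provider: str, key: str) -> bool:
    """Check that the key has the expected prefix and is not just the prefix."""
    prefix = KEY_PREFIXES[provider]
    key = key.strip()
    return key.startswith(prefix) and len(key) > len(prefix)


print(key_format_ok("deepgram", "dg_example123"))  # True
print(key_format_ok("openai", "dg_example123"))    # False
```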
Spoken paragraph commands aren't working
Spoken commands (like "new paragraph") only work when:
- The language is set to English.
- The paragraph break strategy is set to Spoken commands.
AI correction not running
Check that AI Text Correction is toggled on and a valid OpenAI API key is entered. If transcription confidence is consistently high (all words above 0.85), the correction call is intentionally skipped because the raw transcription doesn't need fixing.
File references showing as @?
The @?description format means file resolution failed for that reference. This can happen if the project file tree doesn't contain a close match, or if the description was too vague for the AI to resolve. Try using more specific descriptions when dictating file references.