Skip to main content

Voice Input

Speech-to-text dictation with real-time transcription and optional AI-powered text correction, integrated into every agent's input bar.

Updated
Reviewed

Overview

Voice Input lets you dictate commands and prompts directly into any agent's hybrid input bar. Audio is streamed live to Deepgram for real-time transcription, with text appearing in the input bar as you speak. An optional AI correction layer (powered by OpenAI) can clean up the raw transcript, fixing punctuation, technical terms, and filler words before you send the command.

Two things are required to get started: a Deepgram API key for transcription, and microphone permission at the OS level. AI text correction is optional and requires a separate OpenAI API key.

Note
Voice Input is disabled by default. Enable it in Settings > Integrations under the Speech-to-Text section.

Prerequisites

  • Deepgram API key for speech-to-text transcription. Sign up at console.deepgram.com and create an API key. Transcription costs depend on the model: Nova-3 runs at $0.0077/min, Nova-2 at $0.0043/min.
  • Microphone access granted at the OS level (macOS, Windows, or Linux).
  • OpenAI API key (optional) for AI text correction and file reference resolution. Only needed if you want the AI correction layer on top of the raw transcription.

Setup

Enable Voice Input

  1. Open Settings > Integrations.
  2. In the Speech-to-Text section, toggle Voice Input on.
  3. Enter your Deepgram API key (starts with dg_). The key is validated when you save.
  4. Choose your language from the 10 supported options: English, Spanish, French, German, Japanese, Chinese, Korean, Portuguese, Italian, and Russian.
  5. Pick a transcription model. Nova-3 (default) is the latest and most accurate. Nova-2 is a stable, lower-cost alternative.
  6. Select a paragraph break strategy. Spoken commands (default, English only) lets you say "new paragraph" to insert paragraph breaks. Manual Enter requires you to press Enter instead. Non-English languages fall back to Manual Enter automatically.

Grant Microphone Permission

Canopy needs microphone access from your operating system before it can capture audio. The settings panel shows your current permission status and provides instructions specific to your OS.

Open System Settings > Privacy & Security > Microphone and enable Canopy. If permission hasn't been requested yet, Canopy shows a Request button that triggers the system dialog.

If permission was previously denied, click Open System Settings in the Canopy settings panel to go directly to the Microphone privacy settings. Toggle Canopy on, then click Re-check back in Canopy.

Starting and Stopping Dictation

There are three ways to start voice dictation:

  • Mic button in the hybrid input bar. Each agent panel has its own mic button. Click it to start recording for that panel.
  • Keyboard shortcut: Cmd+Shift+V on macOS, Ctrl+Shift+V on Windows/Linux. Toggles voice dictation when the input bar does not have focus. (When the input bar is focused, the same shortcut pastes clipboard content as plain text instead.)
  • Action palette: search for "Toggle Voice Dictation" in the action palette.

While recording, the mic button shows an animated ring that pulses with your audio level. A square stop icon replaces the mic icon. Click it again, press the shortcut, or press Enter to stop recording.

Text appears in the input bar as you speak. Interim (in-progress) transcription shows at reduced opacity, then solidifies once Deepgram confirms the final text. The session goes through five states: idle, connecting (up to 10 seconds), recording, finishing (up to 3 seconds to drain final text), and then back to idle.

Toolbar Indicator

When a recording session is active, a mic icon appears in the global toolbar with a pulsing accent dot and elapsed time in M:SS format. The tooltip shows which project and worktree is recording. Clicking the toolbar indicator focuses the panel where recording is happening, but it doesn't stop the session.

Tip
The Cmd+Shift+V shortcut is context-sensitive. When the input bar has focus, this same shortcut pastes clipboard content as plain text instead of toggling voice. To start recording from a focused input bar, use the mic button directly.

Submitting While Recording

Pressing Enter while recording or while AI corrections are still in flight triggers the wait-before-submit flow:

  1. Canopy stops the recording session.
  2. A spinner overlay appears on the input bar, which becomes read-only.
  3. Canopy waits up to 10 seconds for any pending AI corrections to settle.
  4. Once all corrections are complete (or the timeout is reached), the text is submitted automatically.

Press Escape to cancel the wait and keep the text in the input bar for further editing.

Note
If corrections don't settle within 10 seconds, Canopy submits the text anyway. This safety valve ensures that pressing Enter never blocks indefinitely.

Paragraph Breaks

Spoken Commands (Default, English Only)

With the spoken-command strategy selected, you can say formatting commands while dictating:

  • "New paragraph", "next paragraph", or "start a new paragraph" inserts a blank line (\n\n).
  • "New line" or "line break" inserts a single newline (\n).

These commands are handled by Deepgram's Dictation mode and stripped from the transcript. You can also press Enter to commit the current paragraph manually at any time.

Note
Spoken paragraph commands only work in English. For all other languages, the strategy automatically falls back to manual Enter, regardless of what's selected in settings.

Manual Enter

With the manual strategy, paragraph breaks are inserted by pressing Enter only. Spoken formatting commands are disabled. Choose this if you dictate in a non-English language or prefer explicit control over paragraph breaks.

AI Text Correction

The AI correction layer reviews the raw transcription and fixes common issues: punctuation, filler words (um, uh), technical term spelling, and homophones. It uses a confidence-based system where Deepgram's per-word confidence scores determine how much AI intervention is needed.

  • Words with confidence below 0.8 are flagged as uncertain and prioritised for correction.
  • If all words in a segment have confidence above 0.85, the AI call is skipped entirely. Good transcription doesn't need fixing.
  • Text currently being corrected shows a green dotted underline in the input bar, then resolves in place once the correction arrives.
Note
Prompt caching keeps AI correction costs minimal. The system prompt is structured so that only the user message changes per request, allowing the provider to cache the fixed portion.

Enable AI Correction

  1. In Settings > Integrations, scroll to the AI Text Correction section (visible once Voice Input is enabled).
  2. Toggle AI Text Correction on.
  3. Enter your OpenAI API key (starts with sk-).
  4. Choose a correction model. GPT-5 Mini (recommended) applies paragraph-level correction with higher quality. GPT-5 Nano is faster and lower cost, better suited for lower-latency correction. This is a single model selection that applies to all correction passes.

Custom Instructions

You can add project-specific correction rules in the Custom Instructions textarea. These are appended to the core correction prompt, so the AI applies them alongside its built-in rules. For example, you might write: "Always capitalise ProductName as one word" or "React component names should use PascalCase."

The Inspect core prompt toggle in settings lets you view the full base correction prompt (read-only). Your project name and custom dictionary terms are injected into this prompt automatically.

File Reference Resolution

When AI correction is enabled, Canopy can detect spoken file references and resolve them into @file links. Use natural phrases while dictating:

  • "Link to the auth helper"
  • "At file the button component"
  • "Reference the user model"
  • "Add file the config service"
  • "Open the main layout"

Canopy's correction prompt detects these patterns and sends them to a file resolver, which searches the project file tree and uses AI to pick the best match. The resolved path appears as an @path/to/file reference that renders as a clickable file chip in the input bar.

If resolution fails, the text falls back to @?description so you can see what was being looked up and fix it manually.

This feature requires AI correction to be enabled and is toggled via Resolve file references in settings (on by default when your OpenAI key is configured).

Tip
Use natural phrases like "link to the auth helper" or "at the Button component". You don't need exact file names or paths. Canopy resolves the description against the project file tree.

Custom Dictionary

The custom dictionary lets you add up to 100 domain-specific terms that Canopy sends to Deepgram as recognition hints. These terms boost transcription accuracy for project names, product names, technical vocabulary, and abbreviations that the base model might not recognize.

Add terms in Settings > Integrations under the Speech-to-Text section. Each term appears as a removable pill. These same terms are also injected into the AI correction prompt, where they're treated as highest-priority required substitutions.

Your project name is automatically included in the recognition hints, so there's no need to add it manually.

Tip
Add technical names, internal package names, and unusual abbreviations your team uses. This directly improves transcription accuracy for those terms, both at the Deepgram level and in the AI correction pass.

Settings Reference

All voice input settings are in Settings > Integrations.

Speech-to-Text

SettingValues / Notes
Voice InputEnabled / Disabled (default: Disabled)
Deepgram API Keydg_... prefix, validated on save
LanguageEnglish, Spanish, French, German, Japanese, Chinese, Korean, Portuguese, Italian, Russian
Transcription ModelNova-3 ($0.0077/min, default) or Nova-2 ($0.0043/min)
Paragraph BreaksSpoken commands (default, English only) or Manual Enter
Custom DictionaryUp to 100 terms, sent as Deepgram keyterm hints

AI Text Correction

This section is only visible when Voice Input is enabled.

SettingValues / Notes
AI Text CorrectionEnabled / Disabled (default: Disabled)
OpenAI API Keysk-... prefix, validated on save
Correction ModelGPT-5 Mini (recommended, full paragraph) or GPT-5 Nano (faster, lower cost)
Resolve File ReferencesEnabled by default when OpenAI key is configured
Custom InstructionsFree-form textarea for project-specific correction rules

Keyboard Shortcuts

ActionShortcut
Toggle voice dictation (global)Cmd+Shift+V / Ctrl+Shift+V
Paste as plain text (when input bar has focus)Cmd+Shift+V / Ctrl+Shift+V
Commit paragraph / submitEnter
Cancel voice wait-submitEscape

The Cmd+Shift+V shortcut is context-sensitive. It toggles voice dictation when the input bar does not have focus, and pastes clipboard content as plain text when it does. See Keyboard Shortcuts for the full reference.

Troubleshooting

Mic button is hidden

The microphone button only appears when Voice Input is fully configured. Check that:

  • Voice Input is enabled in Settings > Integrations.
  • A valid Deepgram API key has been entered.
  • Microphone permission is granted at the OS level.

"Connection timed out"

Canopy couldn't reach Deepgram within 10 seconds. Verify your internet connection and check that your Deepgram API key is valid and has available credits.

"Invalid API key"

Shown when saving settings if the API key format is incorrect or the key is no longer valid. Re-enter a valid key (Deepgram keys start with dg_, OpenAI keys start with sk-).

Spoken paragraph commands aren't working

Spoken commands (like "new paragraph") only work when:

  • The language is set to English.
  • The paragraph break strategy is set to Spoken commands.
Note
Non-English languages fall back to manual Enter automatically, even if the Spoken commands strategy is selected. This is a limitation of the underlying Deepgram Dictation mode.

AI correction not running

Check that AI Text Correction is toggled on and a valid OpenAI API key is entered. If transcription confidence is consistently high (all words above 0.85), the correction call is intentionally skipped because the raw transcription doesn't need fixing.

File references showing as @?

The @?description format means file resolution failed for that reference. This can happen if the project file tree doesn't contain a close match, or if the description was too vague for the AI to resolve. Try using more specific descriptions when dictating file references.