The music_generate tool lets the agent create music or audio through the
shared music-generation capability with configured providers — Google,
MiniMax, and workflow-configured ComfyUI today.
For session-backed agent runs, OpenClaw starts music generation as a background task, tracks it in the task ledger, then wakes the agent again when the track is ready so the agent can tell the user and attach the finished audio. In group/channel chats that use message-tool-only visible delivery, the agent relays the result through the message tool. If the completion agent writes only a private final reply, OpenClaw falls back to a direct channel send with the generated media. The completion wake explicitly warns the agent that normal final replies are private in those routes.
Quick start
The agent calls `music_generate` automatically. No tool
allow-listing needed.
For direct synchronous contexts without a session-backed agent run,
the built-in tool still falls back to inline generation and returns
the final media path in the tool result.
Example prompts:
Generate a cinematic piano track with soft strings and no vocals.
Generate an energetic chiptune loop about launching a rocket at sunrise.
Supported providers
| Provider | Default model | Reference inputs | Supported controls | Auth |
|---|---|---|---|---|
| ComfyUI | workflow | Up to 1 image | Workflow-defined music or audio | COMFY_API_KEY, COMFY_CLOUD_API_KEY |
| Google | lyria-3-clip-preview | Up to 10 images | lyrics, instrumental, format | GEMINI_API_KEY, GOOGLE_API_KEY |
| MiniMax | music-2.6 | None | lyrics, instrumental, durationSeconds, format=mp3 | MINIMAX_API_KEY or MiniMax OAuth |
Capability matrix
The explicit mode contract used by music_generate, contract tests, and the
shared live sweep:
| Provider | generate | edit | Edit limit | Shared live lanes |
|---|---|---|---|---|
| ComfyUI | ✓ | ✓ | 1 image | Not in the shared sweep; covered by extensions/comfy/comfy.live.test.ts |
| Google | ✓ | ✓ | 10 images | generate, edit |
| MiniMax | ✓ | — | None | generate |
Use action: "list" to inspect available shared providers and models at
runtime:
/tool music_generate action=list
Use action: "status" to inspect the active session-backed music task:
/tool music_generate action=status
Direct generation example:
/tool music_generate prompt="Dreamy lo-fi hip hop with vinyl texture and gentle rain" instrumental=true
Tool parameters
Async behavior
Session-backed music generation runs as a background task:
- Background task: `music_generate` creates a background task, returns a started/task response immediately, and posts the finished track later in a follow-up agent message.
- Duplicate prevention: while a task is `queued` or `running`, later `music_generate` calls in the same session return task status instead of starting another generation. Use `action: "status"` to check explicitly.
- Status lookup: `openclaw tasks list` or `openclaw tasks show <taskId>` inspects queued, running, and terminal status.
- Completion wake: OpenClaw injects an internal completion event back into the same session so the model can write the user-facing follow-up itself.
- Prompt hint: later user/manual turns in the same session get a small runtime hint when a music task is already in flight, so the model does not blindly call `music_generate` again.
- No-session fallback: direct/local contexts without a real agent session run inline and return the final audio result in the same turn.
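The duplicate-prevention and no-session rules above can be sketched roughly as follows. This is a minimal illustration, not OpenClaw's implementation: `MusicTask`, `ToolResponse`, `ledger`, and `requestMusic` are hypothetical names, and the in-memory array only stands in for the real task ledger.

```typescript
// Hypothetical types for illustration; not OpenClaw's real API.
type TaskState = "queued" | "running" | "succeeded" | "failed";

interface MusicTask {
  taskId: string;
  sessionId: string;
  state: TaskState;
}

type ToolResponse =
  | { kind: "inline" }
  | { kind: "started"; taskId: string }
  | { kind: "status"; taskId: string; state: TaskState };

// In-memory stand-in for the task ledger (assumption).
const ledger: MusicTask[] = [];

function isInFlight(task: MusicTask): boolean {
  return task.state === "queued" || task.state === "running";
}

function requestMusic(sessionId: string | undefined): ToolResponse {
  if (sessionId === undefined) {
    // No-session fallback: generation runs inline in the same turn.
    return { kind: "inline" };
  }
  // Duplicate prevention: an in-flight task in this session wins.
  const active = ledger.find((t) => t.sessionId === sessionId && isInFlight(t));
  if (active) {
    return { kind: "status", taskId: active.taskId, state: active.state };
  }
  // Otherwise record a new queued task and report it as started.
  const task: MusicTask = {
    taskId: `task-${ledger.length + 1}`,
    sessionId,
    state: "queued",
  };
  ledger.push(task);
  return { kind: "started", taskId: task.taskId };
}
```

In this sketch a second call in the same session returns a `status` response until the first task reaches a terminal state, after which a new call starts a fresh task.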
Task lifecycle
| State | Meaning |
|---|---|
| `queued` | Task created, waiting for the provider to accept it. |
| `running` | Provider is processing (typically 30 seconds to 3 minutes depending on provider and duration). |
| `succeeded` | Track ready; the agent wakes and posts it to the conversation. |
| `failed` | Provider error or timeout; the agent wakes with error details. |
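As a rough sketch, the four states in the table map naturally onto a TypeScript union. The state names come from the table; the helper names `isTerminal` and `nextStep` are assumptions made for illustration.

```typescript
// State names are taken from the lifecycle table; helper names are hypothetical.
type TaskState = "queued" | "running" | "succeeded" | "failed";

// Terminal states are the ones after which the agent is woken.
function isTerminal(state: TaskState): boolean {
  return state === "succeeded" || state === "failed";
}

// Summarize what happens next for each state. The switch covers every
// member of TaskState, so a newly added state must be handled explicitly.
function nextStep(state: TaskState): string {
  switch (state) {
    case "queued":
      return "wait for the provider to accept the task";
    case "running":
      return "wait for the provider to finish processing";
    case "succeeded":
      return "wake the agent to post the track";
    case "failed":
      return "wake the agent with error details";
  }
}
```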
Check status from the CLI:
openclaw tasks list
openclaw tasks show <taskId>
openclaw tasks cancel <taskId>
Configuration
Model selection
{
  agents: {
    defaults: {
      musicGenerationModel: {
        primary: "google/lyria-3-clip-preview",
        fallbacks: ["minimax/music-2.6"],
      },
    },
  },
}
Provider selection order
OpenClaw tries providers in this order:
1. `model` parameter from the tool call (if the agent specifies one).
2. `musicGenerationModel.primary` from config.
3. `musicGenerationModel.fallbacks` in order.
4. Auto-detection using auth-backed provider defaults only:
   - current default provider first;
   - remaining registered music-generation providers in provider-id order.
If a provider fails, the next candidate is tried automatically. If all fail, the error includes details from each attempt.
Set agents.defaults.mediaGenerationAutoProviderFallback: false to use only
explicit model, primary, and fallbacks entries.
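The selection order and failover behavior can be sketched like this. It is a minimal illustration under stated assumptions: `SelectionInput`, `buildCandidates`, and `generateWithFailover` are hypothetical names, the auto-detected list is assumed to arrive already ordered, de-duplicating repeated entries is an assumption, and the auto-fallback flag is treated as enabled unless set to `false`.

```typescript
// Hypothetical shapes for illustration; OpenClaw's internal types may differ.
interface MusicModelConfig {
  primary?: string;
  fallbacks?: string[];
}

interface SelectionInput {
  requestedModel?: string; // `model` parameter from the tool call
  config?: MusicModelConfig; // musicGenerationModel from config
  autoDetected: string[]; // auth-backed provider defaults, already ordered
  autoProviderFallback?: boolean; // mediaGenerationAutoProviderFallback
}

// Build the ordered candidate list, skipping empty and repeated entries.
function buildCandidates(input: SelectionInput): string[] {
  const ordered: Array<string | undefined> = [
    input.requestedModel,
    input.config?.primary,
    ...(input.config?.fallbacks ?? []),
    ...(input.autoProviderFallback === false ? [] : input.autoDetected),
  ];
  const seen = new Set<string>();
  const result: string[] = [];
  for (const ref of ordered) {
    if (ref && !seen.has(ref)) {
      seen.add(ref);
      result.push(ref);
    }
  }
  return result;
}

// Try each candidate in turn and collect per-attempt errors, so the
// final error can include details from every failed attempt.
async function generateWithFailover<T>(
  candidates: string[],
  attempt: (modelRef: string) => Promise<T>,
): Promise<T> {
  const errors: string[] = [];
  for (const ref of candidates) {
    try {
      return await attempt(ref);
    } catch (err) {
      errors.push(`${ref}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All music providers failed:\n${errors.join("\n")}`);
}
```

With `autoProviderFallback: false`, only the explicit `model`, `primary`, and `fallbacks` entries remain in the candidate list.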
Provider notes
Choosing the right path
- Shared provider-backed path when you want model selection, provider failover, and the built-in async task/status flow.
- Plugin path (ComfyUI) when you need a custom workflow graph or a provider that is not part of the shared bundled music capability.
If you are debugging ComfyUI-specific behavior, see [ComfyUI](/docs/openclaw-docs/providers/comfy). If you are debugging shared provider behavior, start with [Google (Gemini)](/docs/openclaw-docs/providers/google) or [MiniMax](/docs/openclaw-docs/providers/minimax).
Provider capability modes
The shared music-generation contract supports explicit mode declarations:
- `generate` for prompt-only generation.
- `edit` when the request includes one or more reference images.
New provider implementations should prefer explicit mode blocks:
capabilities: {
  generate: {
    maxTracks: 1,
    supportsLyrics: true,
    supportsFormat: true,
  },
  edit: {
    enabled: true,
    maxTracks: 1,
    maxInputImages: 1,
    supportsFormat: true,
  },
}
Legacy flat fields such as maxInputImages, supportsLyrics, and
supportsFormat are not enough to advertise edit support. Providers
should declare generate and edit explicitly so live tests, contract
tests, and the shared music_generate tool can validate mode support
deterministically.
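To show why explicit mode blocks make validation deterministic, here is a hedged sketch of how a caller might resolve the mode from the number of reference images. The interfaces mirror the fields in the capability block above; `resolveMode` and its error messages are hypothetical, not part of the shared contract.

```typescript
// Field names mirror the capability block above; the resolver is a hypothetical helper.
interface GenerateCaps {
  maxTracks: number;
  supportsLyrics?: boolean;
  supportsFormat?: boolean;
}

interface EditCaps {
  enabled: boolean;
  maxTracks: number;
  maxInputImages: number;
  supportsFormat?: boolean;
}

interface MusicCapabilities {
  generate?: GenerateCaps;
  edit?: EditCaps;
}

type Mode = "generate" | "edit";

// Prompt-only requests resolve to generate; requests with reference
// images resolve to edit, but only if the provider declares it.
function resolveMode(caps: MusicCapabilities, inputImageCount: number): Mode {
  if (inputImageCount === 0) {
    if (!caps.generate) {
      throw new Error("provider does not declare generate mode");
    }
    return "generate";
  }
  const edit = caps.edit;
  if (!edit || !edit.enabled) {
    throw new Error("provider does not declare edit mode");
  }
  if (inputImageCount > edit.maxInputImages) {
    throw new Error(
      `too many reference images: ${inputImageCount} > ${edit.maxInputImages}`,
    );
  }
  return "edit";
}
```

A provider that only sets legacy flat fields would have no `edit` block in this sketch, so any request with reference images is rejected up front rather than failing at the provider.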
Live tests
Opt-in live coverage for the shared bundled providers:
OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/music-generation-providers.live.test.ts
Repo wrapper:
pnpm test:live:media music
This live file loads missing provider env vars from ~/.profile, prefers
live/env API keys ahead of stored auth profiles by default, and runs both
generate and declared edit coverage when the provider enables edit
mode. Coverage today:
- `google`: `generate` plus `edit`
- `minimax`: `generate` only
- `comfy`: separate Comfy live coverage, not the shared provider sweep
Opt-in live coverage for the bundled ComfyUI music path:
OPENCLAW_LIVE_TEST=1 COMFY_LIVE_TEST=1 pnpm test:live -- extensions/comfy/comfy.live.test.ts
The Comfy live file also covers comfy image and video workflows when those sections are configured.
Related
- [Background tasks](/docs/openclaw-docs/automation/tasks): task tracking for detached `music_generate` runs
- [ComfyUI](/docs/openclaw-docs/providers/comfy)
- [Configuration reference](/docs/openclaw-docs/gateway/config-agents#agent-defaults): `musicGenerationModel` config
- [Google (Gemini)](/docs/openclaw-docs/providers/google)
- [MiniMax](/docs/openclaw-docs/providers/minimax)
- [Models](/docs/openclaw-docs/concepts/models): model configuration and failover
- [Tools overview](/docs/openclaw-docs/tools)