Lipsync is the simplest way to produce avatar video: pass a public audio URL
plus an avatar, and the platform downloads the audio, renders the whole clip in
one pass, stores the MP4, and returns its URL. There is no LLM, TTS, or streaming
involved, which makes it a good fit for backend jobs, batch pipelines, and
pre-rendered content.
Request
POST /api/v1/lipsync
| Field | Required | Description |
|---|---|---|
| audioUrl | yes | Public URL of the speech audio to lipsync. |
| avatarId | one of | An avatar you created (ava_…). |
| portraitUrl | one of | A portrait image to register on the fly. |
| backgroundId | no | Background id (defaults to plain). |
Provide either avatarId or portraitUrl.
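The field rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SDK: the names `LipsyncRequest` and `validateLipsyncRequest` are hypothetical, and the http(s) check on `audioUrl` is an assumption about what "public URL" means here.

```typescript
// Hypothetical type mirroring the request fields in the table above.
// It is illustrative only and is not exported by the SDK.
interface LipsyncRequest {
  audioUrl: string;
  avatarId?: string;
  portraitUrl?: string;
  backgroundId?: string;
}

// Returns a list of problems; an empty list means the body looks sendable.
function validateLipsyncRequest(req: LipsyncRequest): string[] {
  const problems: string[] = [];

  // audioUrl is required. Assumption: a public URL means http or https.
  try {
    const parsed = new URL(req.audioUrl);
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      problems.push("audioUrl must use http or https");
    }
  } catch {
    problems.push("audioUrl is not a valid URL");
  }

  // The table marks avatarId and portraitUrl as "one of".
  if (!req.avatarId && !req.portraitUrl) {
    problems.push("provide either avatarId or portraitUrl");
  }

  return problems;
}
```

The sketch does not reject a body that carries both `avatarId` and `portraitUrl`, because the behavior in that case is not stated here.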
Example
import { RealtimeAvatarClient } from "realtime-avatar";

const client = RealtimeAvatarClient.platform({
  apiKey: process.env.REALTIME_AVATAR_API_KEY!,
});

const result = await client.lipsync({
  audioUrl: "https://example.com/speech.mp3",
  avatarId: "ava_…",
});

console.log(result.url); // public MP4 URL
console.log(result.frames, result.fps, result.durationSeconds);
Response
{
  "url": "https://.../lipsync/lip_….mp4",
  "avatarId": "ava_…",
  "frames": 150,
  "fps": 25,
  "durationSeconds": 6.0
}
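In the sample response the numeric fields agree with each other: 150 frames at 25 fps is 6.0 seconds. Assuming that relationship (`durationSeconds = frames / fps`) holds in general, which this page does not state, a client could sanity-check a response as sketched below. The names `LipsyncResult` and `isDurationConsistent` and the default tolerance are hypothetical.

```typescript
// Shape of the response shown above (hypothetical type name).
interface LipsyncResult {
  url: string;
  avatarId: string;
  frames: number;
  fps: number;
  durationSeconds: number;
}

// Assumption: durationSeconds equals frames / fps, as in the sample response.
// The tolerance absorbs rounding in the reported duration.
function isDurationConsistent(r: LipsyncResult, toleranceSeconds = 0.05): boolean {
  if (r.fps <= 0) return false;
  return Math.abs(r.frames / r.fps - r.durationSeconds) <= toleranceSeconds;
}
```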
Render time scales with clip length, and rendering runs on a GPU, so a single
lipsync call can take several seconds. Keep request timeouts generous (the SDKs
default to a long timeout) and run batch jobs concurrently rather than one after
another.
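One way to run a batch concurrently without sending every request at once is a small worker pool. This is a sketch under assumptions, not an SDK feature: `mapWithConcurrency` and `Settled` are hypothetical names, and a suitable concurrency limit for this API is not documented here.

```typescript
// Per-item outcome, so one failed clip does not abort the whole batch.
type Settled<R> = { ok: true; value: R } | { ok: false; error: unknown };

// Runs `worker` over `items` with at most `limit` calls in flight at once.
// Results keep the order of `items`.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<Settled<R>[]> {
  const results: Settled<R>[] = new Array(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until none are left.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++; // no await between the check and the claim
      try {
        results[index] = { ok: true, value: await worker(items[index]) };
      } catch (error) {
        results[index] = { ok: false, error };
      }
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}
```

With the client from the example above, a batch might look like `mapWithConcurrency(audioUrls, 4, (audioUrl) => client.lipsync({ audioUrl, avatarId }))`, where `audioUrls`, `avatarId`, and the limit of 4 are placeholders to tune for your workload.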
Lipsync bills realtime seconds against your wallet, the same as streaming turns.
See Credits and billing.