A turn streams a live talking-avatar response: interleaved text, PCM audio,
and video frames, delivered as a compact binary application/x-avatar-mux body
so playback can start within a few hundred milliseconds.
A turn response is not JSON. Decode it with readAvatarMux or play it with
AvatarPlayer. Never call res.json() on a turn.
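When a turn is fetched directly rather than through the client, a small guard on the response's media type can catch an accidental JSON parse early. The following is a minimal sketch, not part of the SDK: `isAvatarMux` and `HasHeaders` are hypothetical names, and only the media type string comes from the text above.

```typescript
// Media type named in the text above.
const AVATAR_MUX_TYPE = "application/x-avatar-mux";

// Minimal structural type so the guard accepts a fetch Response
// or any object that exposes headers.get().
interface HasHeaders {
  headers: { get(name: string): string | null };
}

// Hypothetical helper (not part of the SDK): returns true when the body
// should be decoded as avatar-mux rather than parsed as JSON.
function isAvatarMux(res: HasHeaders): boolean {
  const contentType = res.headers.get("content-type") ?? "";
  // Drop parameters such as "; charset=..." and compare case-insensitively.
  const mediaType = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  return mediaType === AVATAR_MUX_TYPE;
}
```

A caller would check `isAvatarMux(res)` before choosing a decoder and treat any other media type as an error response.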
Prepare, then turn
Call prepare once per session to warm the avatar; then run turns.
import { RealtimeAvatarClient } from "realtime-avatar";

const client = RealtimeAvatarClient.platform({
  apiKey: process.env.REALTIME_AVATAR_API_KEY!,
});

await client.prepare({ avatar_id: "ava_…" });

// Speak literal text:
const stream = await client.turn({
  avatar_id: "ava_…",
  mode: "speak_text",
  text: "Hey chat, welcome to the stream!",
});

// Or run a persona reply from chat history:
// const stream = await client.turn({ avatar_id: "ava_…", mode: "chat", messages: [...] });
Consuming the stream
If you only need the events (no canvas), iterate them directly:
for await (const event of stream.events) {
  if (event.header.type === "text_delta") {
    // accumulate transcript
  } else if (event.header.type === "audio") {
    // PCM16 chunk
  } else if (event.header.type === "video") {
    // JPEG/I420 frames
  }
}
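If only the transcript is needed, the loop above can be folded into a helper. This is a sketch under stated assumptions: `header.type` comes from the loop above, but the `payload` field and its UTF-8 encoding are guesses made for illustration, and `collectTranscript` is a hypothetical name.

```typescript
// Assumed event shape: `header.type` appears in the loop above; the
// `payload` field and its UTF-8 encoding are assumptions.
interface MuxEvent {
  header: { type: string };
  payload: Uint8Array;
}

// Concatenates every text_delta payload into a single transcript string.
async function collectTranscript(events: AsyncIterable<MuxEvent>): Promise<string> {
  const decoder = new TextDecoder("utf-8");
  let transcript = "";
  for await (const event of events) {
    if (event.header.type === "text_delta") {
      // stream: true keeps a multi-byte character intact when it is
      // split across two chunks.
      transcript += decoder.decode(event.payload, { stream: true });
    }
    // audio and video events are skipped in this sketch
  }
  // Flush any bytes still buffered in the decoder.
  return transcript + decoder.decode();
}
```

Calling `await collectTranscript(stream.events)` would then resolve once the turn ends, provided the real events match the assumed shape.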
For real playback in a browser, use the AvatarPlayer.
Modes
| mode | Behavior |
|---|---|
| speak_text | Speaks text verbatim (no LLM). |
| chat | Runs the avatar’s persona over messages and speaks the reply. |
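
The two modes take different fields, which a discriminated union can make explicit at compile time. The types below are an illustrative sketch, not the SDK's own definitions: `TurnRequest`, `ChatMessage`, and `describeTurn` are hypothetical names, and the `role`/`content` message shape is an assumption, since the source only names a `messages` field.

```typescript
// Assumed message shape; the source only names a `messages` field.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// One variant per mode, so `text` and `messages` cannot be mixed up.
type TurnRequest =
  | { avatar_id: string; mode: "speak_text"; text: string }
  | { avatar_id: string; mode: "chat"; messages: ChatMessage[] };

// Summarizes a request; the switch narrows the union on `mode`.
function describeTurn(req: TurnRequest): string {
  switch (req.mode) {
    case "speak_text":
      return `speak ${req.text.length} characters verbatim`;
    case "chat":
      return `persona reply over ${req.messages.length} messages`;
  }
}
```

With types like these, passing `text` to a `chat` turn or `messages` to a `speak_text` turn fails type checking instead of failing at request time.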
Send an Idempotency-Key header on every turn. Reusing a key returns 409 by design,
so a retry that repeats the same key cannot run the same turn twice.
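
A retry loop built on this behavior might look like the sketch below. It makes several assumptions: `SendTurn` is a hypothetical caller-supplied transport that attaches the Idempotency-Key header and resolves with the HTTP status (the source does not show how the SDK sets headers), and the choice to retry only on network errors and 5xx responses is a design decision of this example, not documented behavior.

```typescript
// Hypothetical transport: sends one turn with the given Idempotency-Key
// and resolves with the HTTP status code.
type SendTurn = (idempotencyKey: string) => Promise<number>;

type TurnOutcome = "accepted" | "duplicate" | "failed";

// Sends one logical turn, reusing the same key on every attempt.
async function sendTurnOnce(
  send: SendTurn,
  idempotencyKey: string,
  maxAttempts = 3,
): Promise<TurnOutcome> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const status = await send(idempotencyKey);
      if (status >= 200 && status < 300) return "accepted";
      // 409: the key was already used, so an earlier attempt got through.
      if (status === 409) return "duplicate";
      // Other statuses below 500 are not retried in this sketch.
      if (status < 500) return "failed";
      // 5xx: fall through and retry with the same key.
    } catch {
      // Network error: the outcome is unknown, so retry with the same key.
    }
  }
  return "failed";
}
```

The key should be generated once per logical turn, for example as a UUID, and never shared between different turns.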