Realtime turns - Realtime Avatar

A turn streams a live talking-avatar response: interleaved text, PCM audio, and video frames, delivered as a compact binary application/x-avatar-mux body so playback can start within a few hundred milliseconds.

A turn response is not JSON. Decode it with readAvatarMux or play it with AvatarPlayer. Never call res.json() on a turn.
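One way to guard against accidentally parsing a turn as JSON is to check the media type before handing the body to a decoder. The sketch below assumes the response's Content-Type header carries application/x-avatar-mux (possibly with parameters); the helper name isAvatarMux is hypothetical and not part of the SDK.

```typescript
// Media type named in the text above.
const AVATAR_MUX = "application/x-avatar-mux";

// Hypothetical helper: returns true when a Content-Type header value
// names the avatar mux format, ignoring case and any parameters.
function isAvatarMux(contentType: string | null): boolean {
  if (!contentType) return false;
  const [mediaType = ""] = contentType.split(";");
  return mediaType.trim().toLowerCase() === AVATAR_MUX;
}
```

With a standard fetch Response, this could be called as isAvatarMux(res.headers.get("content-type")) before passing the body to readAvatarMux or AvatarPlayer.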

Prepare, then turn

Call prepare once per session to warm the avatar; then run turns.

import { RealtimeAvatarClient } from "realtime-avatar";

const client = RealtimeAvatarClient.platform({
  apiKey: process.env.REALTIME_AVATAR_API_KEY!,
});

await client.prepare({ avatar_id: "ava_…" });

// Speak literal text:
const stream = await client.turn({
  avatar_id: "ava_…",
  mode: "speak_text",
  text: "Hey chat, welcome to the stream!",
});

// Or run a persona reply from chat history:
// const stream = await client.turn({ avatar_id: "ava_…", mode: "chat", messages: [...] });

Consuming the stream

If you only need the events (no canvas), iterate them directly:

for await (const event of stream.events) {
  if (event.header.type === "text_delta") {
    // accumulate transcript
  } else if (event.header.type === "audio") {
    // PCM16 chunk
  } else if (event.header.type === "video") {
    // JPEG/I420 frames
  }
}
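The loop above can be fleshed out into a small collector. This is a minimal sketch under stated assumptions: only header.type comes from the documentation; the text and bytes payload fields, the MuxEvent type, and the summarizeTurn helper are hypothetical names used for illustration.

```typescript
// Hypothetical event shape. Only `header.type` is documented above;
// `text` and `bytes` are assumed payload fields.
interface MuxEvent {
  header: { type: string };
  text?: string;
  bytes?: Uint8Array;
}

interface TurnSummary {
  transcript: string;
  audioBytes: number;
  videoEvents: number;
}

// Drains an event stream, accumulating the transcript and counting media.
async function summarizeTurn(events: AsyncIterable<MuxEvent>): Promise<TurnSummary> {
  const summary: TurnSummary = { transcript: "", audioBytes: 0, videoEvents: 0 };
  for await (const event of events) {
    switch (event.header.type) {
      case "text_delta":
        summary.transcript += event.text ?? "";
        break;
      case "audio":
        summary.audioBytes += event.bytes?.byteLength ?? 0;
        break;
      case "video":
        summary.videoEvents += 1;
        break;
      // Unknown event types are ignored so new types do not break the loop.
    }
  }
  return summary;
}
```

If the real events match this shape, the collector could be called as await summarizeTurn(stream.events).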

For real playback in a browser, use the AvatarPlayer.

Modes

`mode`	Behavior
`speak_text`	Speaks `text` verbatim (no LLM).
`chat`	Runs the avatar’s persona over `messages` and speaks the reply.

Send an Idempotency-Key header on every turn. Reusing a key returns 409 by design, so a retry that carries the same key cannot run the same turn twice.
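A retry loop built on this behavior might look like the sketch below. It assumes a Node.js runtime and a hypothetical SendTurn transport function that attaches the Idempotency-Key header and resolves with the HTTP status; neither SendTurn nor sendTurnOnce is part of the SDK, and the choice to retry on 5xx and network errors is an assumption, not documented policy.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical transport: sends one turn request with the given
// Idempotency-Key header and resolves with the HTTP status code.
type SendTurn = (idempotencyKey: string) => Promise<number>;

type TurnOutcome = "accepted" | "duplicate" | "failed";

async function sendTurnOnce(send: SendTurn, maxAttempts = 3): Promise<TurnOutcome> {
  // One key per logical turn, reused unchanged on every retry.
  const key = randomUUID();
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const status = await send(key);
      // 409: the key was already used, so an earlier attempt got through.
      if (status === 409) return "duplicate";
      if (status >= 200 && status < 300) return "accepted";
      // Other 4xx responses are treated as non-retryable (assumption).
      if (status < 500) return "failed";
      // 5xx: fall through and retry with the same key.
    } catch {
      // Network error: retry with the same key.
    }
  }
  return "failed";
}
```

Generating the key once, outside the loop, is what makes the retry safe: a fresh key per attempt would look like a new turn each time.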
