A turn streams a live talking-avatar response: interleaved text, PCM audio, and video frames, delivered as a compact binary application/x-avatar-mux body so playback can start within a few hundred milliseconds.
A turn response is not JSON. Decode it with readAvatarMux or play it with AvatarPlayer. Never call res.json() on a turn.
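Because a turn body is binary, a defensive client can check the media type before it picks a decoder. The sketch below assumes you already have a standard fetch Response. isAvatarMux and assertAvatarMux are hypothetical helpers and are not part of the realtime-avatar SDK.

```typescript
// Hypothetical helpers (not part of the realtime-avatar SDK): fail fast when a
// response is not the binary avatar-mux body, instead of letting res.json()
// throw a confusing parse error on binary data.
const AVATAR_MUX_TYPE = "application/x-avatar-mux";

function isAvatarMux(res: Response): boolean {
  // Content-Type may carry parameters (e.g. "; v=1"), so compare only the
  // media type before the first ";", case-insensitively.
  const contentType = res.headers.get("content-type") ?? "";
  const mediaType = contentType.split(";")[0].trim().toLowerCase();
  return mediaType === AVATAR_MUX_TYPE;
}

function assertAvatarMux(res: Response): void {
  if (!isAvatarMux(res)) {
    throw new Error(
      `Expected ${AVATAR_MUX_TYPE}, got "${res.headers.get("content-type")}"`,
    );
  }
}
```

Once the check passes, hand the response to readAvatarMux or AvatarPlayer as described above.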

Prepare, then turn

Call prepare once per session to warm the avatar; then run turns.
```typescript
import { RealtimeAvatarClient } from "realtime-avatar";

const client = RealtimeAvatarClient.platform({
  apiKey: process.env.REALTIME_AVATAR_API_KEY!,
});

// Warm the avatar once per session before the first turn.
await client.prepare({ avatar_id: "ava_…" });

// Speak literal text:
const stream = await client.turn({
  avatar_id: "ava_…",
  mode: "speak_text",
  text: "Hey chat, welcome to the stream!",
});

// Or run a persona reply from chat history:
// const stream = await client.turn({ avatar_id: "ava_…", mode: "chat", messages: [...] });
```
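If several code paths can start turns, a small wrapper can enforce the prepare-once rule. This is a hypothetical sketch: AvatarClientLike and PreparedSession are illustrative names that assume only that the client exposes prepare and turn as used above. The wrapper warms each avatar_id once and makes every turn wait for that warm-up.

```typescript
// Hypothetical minimal client shape: only the two calls used above. Parameter
// and result types are kept loose because the SDK's real types aren't shown.
interface AvatarClientLike {
  prepare(params: { avatar_id: string }): Promise<unknown>;
  turn(params: { avatar_id: string } & Record<string, unknown>): Promise<unknown>;
}

// Prepares each avatar_id at most once per session; turns await the warm-up.
class PreparedSession {
  private readonly client: AvatarClientLike;
  private readonly warmups = new Map<string, Promise<unknown>>();

  constructor(client: AvatarClientLike) {
    this.client = client;
  }

  private ensurePrepared(avatarId: string): Promise<unknown> {
    const existing = this.warmups.get(avatarId);
    if (existing) return existing; // reuse the in-flight or finished warm-up
    const warm = this.client.prepare({ avatar_id: avatarId });
    this.warmups.set(avatarId, warm);
    // If warm-up fails, forget it so a later turn can retry prepare.
    warm.catch(() => {
      if (this.warmups.get(avatarId) === warm) this.warmups.delete(avatarId);
    });
    return warm;
  }

  async turn(params: { avatar_id: string } & Record<string, unknown>): Promise<unknown> {
    await this.ensurePrepared(params.avatar_id);
    return this.client.turn(params);
  }
}
```

Because the promise is stored before it settles, two turns that start at the same moment share a single prepare call.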

Consuming the stream

If you only need the events (no canvas), iterate them directly:
```typescript
for await (const event of stream.events) {
  if (event.header.type === "text_delta") {
    // accumulate transcript
  } else if (event.header.type === "audio") {
    // PCM16 chunk
  } else if (event.header.type === "video") {
    // JPEG/I420 frames
  }
}
```
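To make that loop do something concrete, such as building a transcript and estimating audio duration, you can fold the events into a summary. Only event.header.type comes from the example above. The payload field, UTF-8 text deltas, and mono PCM16 audio at a caller-supplied sample rate are assumptions, and summarizeTurn is a hypothetical helper. Duration follows from seconds = bytes / (2 × channels × sampleRate).

```typescript
// Hypothetical event shape: the docs show event.header.type; the payload
// field, its encoding, and the audio format below are assumptions.
type AvatarEventLike = {
  header: { type: string };
  payload: Uint8Array;
};

interface TurnSummary {
  transcript: string;
  audioBytes: number;
  audioSeconds: number;
  videoEvents: number;
}

// Drains an event stream, accumulating the transcript and counting media.
// Assumes text_delta payloads are UTF-8 and audio is mono PCM16 (2 bytes/sample).
async function summarizeTurn(
  events: AsyncIterable<AvatarEventLike>,
  sampleRateHz: number, // assumption: the caller knows the stream's audio rate
): Promise<TurnSummary> {
  const decoder = new TextDecoder("utf-8");
  let transcript = "";
  let audioBytes = 0;
  let videoEvents = 0;

  for await (const event of events) {
    switch (event.header.type) {
      case "text_delta":
        // stream: true keeps a multi-byte character split across deltas intact.
        transcript += decoder.decode(event.payload, { stream: true });
        break;
      case "audio":
        audioBytes += event.payload.byteLength;
        break;
      case "video":
        videoEvents += 1;
        break;
      // Other event types are ignored.
    }
  }
  transcript += decoder.decode(); // flush any buffered partial character

  const bytesPerSample = 2; // PCM16, mono
  return {
    transcript,
    audioBytes,
    audioSeconds: audioBytes / bytesPerSample / sampleRateHz,
    videoEvents,
  };
}
```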
For real playback in a browser, use the AvatarPlayer.

Modes

| Mode | Behavior |
|---|---|
| speak_text | Speaks text verbatim (no LLM). |
| chat | Runs the avatar's persona over messages and speaks the reply. |
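The two modes take different inputs, which maps naturally onto a TypeScript discriminated union on mode. This is a hypothetical sketch: TurnRequest, ChatMessage (including its role values), and validateTurn are illustrative and are not the SDK's published types.

```typescript
// Hypothetical request types mirroring the two modes; the message shape
// ({ role, content }) is an assumption, not the SDK's published type.
type ChatMessage = { role: "user" | "assistant"; content: string };

type TurnRequest =
  | { avatar_id: string; mode: "speak_text"; text: string }
  | { avatar_id: string; mode: "chat"; messages: ChatMessage[] };

// Returns a reason string if the request can't produce speech, else null.
function validateTurn(req: TurnRequest): string | null {
  switch (req.mode) {
    case "speak_text":
      // Text is spoken verbatim, so blank text would yield an empty turn.
      return req.text.trim() ? null : "speak_text needs non-empty text";
    case "chat":
      // The persona replies to the history, so require at least one message.
      return req.messages.length > 0 ? null : "chat needs at least one message";
    default: {
      // Compile-time exhaustiveness check: adding a mode makes this fail to type-check.
      const unreachable: never = req;
      return `unknown mode: ${JSON.stringify(unreachable)}`;
    }
  }
}
```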
Send an Idempotency-Key header on turns. Reusing a key returns 409 Conflict by design, so a retried request that carries the same key can never start a second, duplicate turn.
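A retry loop built on this contract generates one key per logical turn and reuses it on every retry. It treats 409 as "an earlier attempt already got through" rather than as a failure. The sketch below assumes a hypothetical send callback that sets the Idempotency-Key header and throws on network errors. sendWithIdempotencyKey and its outcome shape are illustrative, and crypto.randomUUID is the standard Web Crypto call.

```typescript
// Hypothetical transport result: just the HTTP status of the turn request.
type SendResult = { status: number };

type RetryOutcome =
  | { kind: "responded"; status: number; key: string }
  | { kind: "duplicate"; key: string }; // 409: an earlier attempt with this key got through

// Retries one logical turn under a single Idempotency-Key. `send` is a
// hypothetical callback that attaches the key as the Idempotency-Key header
// and throws on network failure.
async function sendWithIdempotencyKey(
  send: (idempotencyKey: string) => Promise<SendResult>,
  maxAttempts = 3,
  key: string = crypto.randomUUID(), // generated once, reused for every retry
): Promise<RetryOutcome> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const { status } = await send(key);
      if (status === 409) return { kind: "duplicate", key };
      return { kind: "responded", status, key };
    } catch (err) {
      // Network error: the server may or may not have seen the request, so
      // retry with the SAME key rather than minting a new one.
      lastError = err;
    }
  }
  throw lastError;
}
```

This section does not say how to retrieve the original turn's stream after a 409, so the sketch only reports the duplicate and leaves recovery to the caller.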