Source kinds
Portrait
A single image. The model animates the face from audio. Fast to register,
great for most use cases.
Source video
Real footage. The model lipsyncs against the actual frames for natural head
motion. The clip is preprocessed into a reusable .avtrv cache and
ping-pong looped for any audio length.
Preparation
prepare warms the avatar so the first turn is fast: it downloads and normalizes
the portrait (or loads the source-video cache) ahead of time.
- Streaming turns: call prepare once per session before the first turn.
- Lipsync: prepares implicitly; you don’t need a separate call.
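The call order for streaming turns can be sketched as below. This is a minimal illustration, not the platform's SDK: FakeAvatarClient, StreamingSession, start, and the method signatures are hypothetical stand-ins, and the fake client only records calls so the ordering is visible.

```python
from dataclasses import dataclass, field


@dataclass
class FakeAvatarClient:
    """Hypothetical stand-in for a real client; records calls instead of doing I/O."""
    calls: list = field(default_factory=list)

    def prepare(self, avatar_id: str) -> None:
        self.calls.append(("prepare", avatar_id))

    def turn(self, avatar_id: str, text: str) -> None:
        self.calls.append(("turn", avatar_id, text))


class StreamingSession:
    """Hypothetical wrapper that warms the avatar exactly once per session."""

    def __init__(self, client: FakeAvatarClient, avatar_id: str) -> None:
        self._client = client
        self._avatar_id = avatar_id
        self._prepared = False

    def start(self) -> None:
        # Warm the avatar ahead of time; repeated calls are no-ops.
        if not self._prepared:
            self._client.prepare(self._avatar_id)
            self._prepared = True

    def turn(self, text: str) -> None:
        # Safety net: prepare lazily if start() was skipped.
        self.start()
        self._client.turn(self._avatar_id, text)
```

Calling start() as soon as the session opens keeps the warm-up off the first turn's critical path; the guard inside turn() only covers the case where that call was forgotten.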
Voices
Each avatar can carry a default voice from the Cartesia voice list, or you can let the platform match one automatically at creation time. Override per turn with voice_id.
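The precedence described above (a per-turn voice_id wins over the avatar's default) can be sketched as a small helper. The function name resolve_voice_id and the example voice IDs are hypothetical and only illustrate the rule; the platform applies it server-side in whatever way it actually implements.

```python
from typing import Optional


def resolve_voice_id(
    avatar_default: Optional[str],
    turn_voice_id: Optional[str] = None,
) -> Optional[str]:
    """Return the voice to use for one turn.

    A voice_id passed with the turn overrides the avatar's default voice
    (whether that default was chosen explicitly or matched at creation time).
    """
    if turn_voice_id is not None:
        return turn_voice_id
    return avatar_default
```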
See Create avatars to register one.