Generation Modes
If you've started a blank project and aren't sure what to upload or which mode to use, this guide will help you figure out where to start.
The three main generation modes
Video Generation
Text to Video
You only need a written prompt; no assets are required. This works well when you have a clear idea of what you want but no reference images or footage yet.
Best models for this: Kling 3.0, Veo 3, Sora 2.

Image to Video
You upload a starting image and the model animates it into a video. This is the most common workflow for blank projects. Your starting image becomes the first frame of your video.
Best models for this: Kling 3.0 (recommended: best quality and most cost-effective), Seedance 2.0, Hailuo.

Motion Control / Reference to Video
You upload an existing video as a reference or starting point. Use this when you want to match the motion, style, or composition of an existing video.
Available on Kling O3 and Seedance 2.0.


What assets do I need to upload?
It depends on what you're trying to make:
Just a concept or scene: no assets needed; use text to video
A specific character or actor in a scene: upload a photo of your character; use image to video
A specific environment or setting: upload a background image; use image to video
Matching the motion of an existing video: upload a video; use motion control on Kling 3.0 or reference to video on Seedance 2.0
How to build a blank project from scratch
Find an inspiration image or video that matches the vibe you're going for and screenshot it
Go to Agent Mode in Ava Studio and attach the screenshot along with an image of the character you want in the scene. Ask it to generate a starting frame image using GPT-2 Image or Nano Banana 2
Repeat until you have starting frame images for each scene you want to build
For each scene, go back to Agent Mode, attach your starting frame image, describe what you want to happen in the video, and specify Kling 3.0 as your model
If you like the output, move to the next scene. If you want to extend it, click the extend icon on the video
Repeat for each scene until your full video is assembled
Which model should I use?
Kling 3.0: best for most use cases. Best quality, most affordable, and the recommended starting point.
Kling O3: best for image or video reference workflows. Gives you more control over character and environment.
Seedance 2.0: best for general image to video. Good quality, but output is limited to 720p and real human faces are not supported, so you may need a video upscaler.
Veo 3: best for realism-focused videos. High quality, but costly and more restricted.
Sora 2: best for realism-focused videos. Similar to Veo 3 and good for cinematic outputs.
Hailuo: best for text or image to video. A good option, but it is a newer model that is still being evaluated.
A note on start and end frames
Start and end frame controls let you define exactly how a scene begins and ends. Upload an image as your start frame to anchor the first moment of your video, and an image as your end frame to set the last moment. This feature is available on Veo 3. On some models, it may be intermittently unavailable.
Tips
If you want a character to keep a consistent look across multiple scenes, generate a character sheet first. In Agent Mode, prompt: "Create a character sheet of this character and use that as your reference image going forward."
For motion control, screenshot the first frame of the video whose movement you want to replicate, use Nano Banana 2 to create a starting image that matches that pose, then use Kling 3.0 Motion Control with the prompt: "same motion control as the video, no distortions"