Generation Modes

If you've started a blank project and aren't sure what to upload or which mode to use, this guide will help you figure out where to start.

The three main generation modes

Video Generation

Text to Video

You only need a written prompt; no assets are required. This is a good fit when you have a clear idea of what you want but no reference images or footage yet.

Best models for this: Kling 3.0, Veo 3, Sora 2.

Image to Video

You upload a starting image and the model animates it into a video. Your starting image becomes the first frame of the video. This is the most common workflow for blank projects.

Best models for this: Kling 3.0 (recommended: best quality and most cost-effective), Seedance 2.0, Hailuo.

Motion Control / Reference to Video

You upload an existing video as a reference or starting point. Use this mode when you want to match the motion, style, or composition of that video.

Available on Kling O3 and Seedance 2.0.


What assets do I need to upload?

It depends on what you're trying to make:

  • Just a concept or scene: no assets needed; use text to video

  • A specific character or actor in a scene: upload a photo of your character and use image to video

  • A specific environment or setting: upload a background image and use image to video

  • Matching the motion of an existing video: upload a video and use motion control on Kling 3.0, or reference to video on Seedance 2.0
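
The list above is effectively a lookup from goal to mode and upload. As a minimal sketch, it can be written down as a small table in Python. The goal keys and the `recommend` helper are hypothetical names chosen for illustration; this is not an Ava Studio API, only a restatement of the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    mode: str    # generation mode to pick
    upload: str  # asset to upload, or "nothing"


# Hypothetical goal keys; the values restate the list above.
RECOMMENDATIONS = {
    "concept": Recommendation("text to video", "nothing"),
    "character": Recommendation("image to video", "photo of your character"),
    "environment": Recommendation("image to video", "background image"),
    "match_motion": Recommendation(
        "motion control or reference to video", "existing video"
    ),
}


def recommend(goal: str) -> Recommendation:
    """Return the suggested mode and upload for a goal key."""
    try:
        return RECOMMENDATIONS[goal]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDATIONS))
        raise ValueError(f"unknown goal {goal!r}; expected one of: {known}") from None


if __name__ == "__main__":
    for goal in RECOMMENDATIONS:
        rec = recommend(goal)
        print(f"{goal}: upload {rec.upload}, use {rec.mode}")
```

If your goal does not fit any of the four keys, starting with image to video is a reasonable default, since the guide describes it as the most common workflow for blank projects.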


How to build a blank project from scratch

  1. Find an inspiration image or video that matches the vibe you're going for and screenshot it

  2. Go to Agent Mode in Ava Studio and attach the screenshot along with the character you want in the scene. Ask it to generate a starting frame image using GPT-2 Image or Nano Banana 2

  3. Repeat until you have starting frame images for each scene you want to build

  4. For each scene, go back to Agent Mode, attach your starting frame image, describe what you want to happen in the video, and specify Kling 3.0 as your model

  5. If you like the output, move to the next scene. If you want to extend it, click the extend icon on the video

  6. Repeat per scene until your full video is assembled


Which model should I use?

Model         Best for                            Notes
Kling 3.0     Most use cases                      Best quality, most affordable, recommended starting point
Kling O3      Image or video reference workflows  More control over character and environment
Seedance 2.0  General image to video              Good quality but limited to 720p, no real human faces, may need a video upscaler
Veo 3         Realism-focused videos              High quality but costly and more restricted
Sora 2        Realism-focused videos              Similar to Veo 3, good for cinematic outputs
Hailuo        Text or image to video              Good option, newer model still being evaluated
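
One way to use the model table is to rule out models whose notes conflict with what you need. The sketch below encodes only the caveats stated in the table; the `Model` type, the caveat strings, and the `candidates` helper are hypothetical names for illustration. A model with no caveats listed here simply has none mentioned in the table, which is not a guarantee that it has none.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Model(NamedTuple):
    name: str
    best_for: str
    caveats: FrozenSet[str]  # only caveats stated in the table


MODELS = [
    Model("Kling 3.0", "Most use cases", frozenset()),
    Model("Kling O3", "Image or video reference workflows", frozenset()),
    Model(
        "Seedance 2.0",
        "General image to video",
        frozenset({"720p only", "no real human faces"}),
    ),
    Model("Veo 3", "Realism-focused videos", frozenset({"costly", "more restricted"})),
    Model("Sora 2", "Realism-focused videos", frozenset()),
    Model("Hailuo", "Text or image to video", frozenset({"still being evaluated"})),
]


def candidates(avoid: Iterable[str]) -> List[str]:
    """Names of models with none of the caveats in `avoid`, in table order."""
    blocked = set(avoid)
    return [m.name for m in MODELS if not (m.caveats & blocked)]


if __name__ == "__main__":
    # Example: the scene needs a real human face and the budget is tight.
    print(candidates({"no real human faces", "costly"}))
```

Because the list keeps the table's order, Kling 3.0 comes first whenever it is not excluded, which matches the table's "recommended starting point" note.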


A note on start and end frames

Start and end frame controls let you define exactly how a scene begins and ends. Upload an image as your start frame to anchor the first moment of your video. This feature is available on Veo 3; on some models it may be intermittently unavailable.


Tips

  • If you want a character to keep a consistent look across multiple scenes, generate a character sheet first. In Agent Mode, prompt: "create a character sheet of this character", then use that sheet as your reference image going forward

  • For motion control, screenshot the first frame of the video whose movement you want to replicate, match the pose using Nano Banana 2, then use Kling 3.0 Motion Control with the prompt: "same motion control as the video, no distortions"
