There are tools to assist with prompts (nodes or workflows in ComfyUI?), but the WAN model itself also seems to have inherent randomness.
You are not doing anything "wrong." WAN is doing what it's designed to do: if you do not over-constrain it, it happily improvises cinematic chaos.
You can make it much more obedient, but you have to (1) understand where the randomness comes from, (2) prompt in a structured way, and (3) be careful with LoRAs.
I'll walk through:
- Why WAN 2.2 video feels random
- A stable base setup (before advanced tricks)
- A practical prompting framework just for WAN
- How LoRAs affect your scenes
- A concrete âdinner sceneâ recipe
- Extra tools and advanced tricks if you want to go further
1. Background: what WAN 2.2 is doing under the hood
WAN 2.2 is a modern video diffusion model: it takes text (and optionally an image or two) and denoises a block of video frames. It uses a Mixture of Experts (MoE) design with "high-noise" and "low-noise" experts, so different parts of the denoising trajectory are handled by different sub-models. (ComfyUI)
Key consequences:
- It is very good at complex, cinematic motion and multi-character interaction.
- It is also happy to "fill in the gaps" when the prompt is vague, letting the MoE pick whatever it thinks is a plausible movie-like beat. (Instasd)
For Image-to-Video (I2V) specifically:
- The start image anchors the first part of the clip.
- As denoising progresses, the model has more freedom. If you do not explicitly constrain motion, characters, and camera, it can invent new actions, extra people, or odd behavior. Official docs show that different workflows (T2V, I2V, TI2V, FLF2V) control this via start/last frames and motion modules. (ComfyUI)
Your "two guys jump in and everyone kisses" outcome is exactly what happens when WAN's "cinematic defaults" take over because the instructions weren't tight enough.
2. Why your results feel random
Several things stack together:
2.1. Under-specified prompts + MoE "filling in defaults"
Community prompt guides for Wan 2.2 keep repeating the same warning:
- Prompts that are too short or vague push the model to use its own cinematic defaults.
- Good Wan 2.2 prompts tend to be 80–120 words and explicitly describe shot type, motion, and constraints. (Instasd)
If you say something like:
"a romantic dinner, couple embracing, talking"
WAN fills in:
- How many people exactly?
- Do other people exist in the restaurant?
- Is the camera static, panning, dollying, cutting?
- Are there transitions or emotional beats (hug, kiss, stand up, walk)?
Because it is trained on cinematic material, "surprise third wheel" and "sudden kiss" are perfectly plausible continuations.
2.2. Few-step + CFG=1: negative prompts mostly do nothing
Many Wan 2.2 deployments (especially Lightning and Rapid AIO styles) run at 4–8 steps for speed. To make that work, they often fix CFG (classifier-free guidance) = 1. (GitHub)
For standard diffusion:
- CFG>1 lets you use a negative prompt list ("no kissing, no extra people") because the model compares "with prompt" vs. "without prompt".
For low-step, CFG=1 models:
- If CFG=1, classic negative prompts are effectively disabled. Blog posts about NAG (Normalized Attention Guidance) spell this out: CFG=1 stops the usual negative-prompt blending from working in a few-step setting. (ainvfx.com)
- Users on Reddit experimenting with Wan 2.2 T2V see exactly this: with CFG=1, changing the negative prompt often does nothing visible. (Reddit)
So, if you were relying on:
Positive: "romantic dinner"
Negative: "no extra people, no kissing"
there is a good chance the negative part wasn't actually constraining anything in a 4-step / CFG=1 setup.
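To see why, here is the classifier-free guidance mixing step written out as a tiny generic sketch (plain diffusion pseudocode, not Wan-specific): with a guidance scale of 1, the negative branch cancels out entirely.

```python
# Classifier-free guidance blends two denoising predictions each step:
#   pred = pred_neg + guidance_scale * (pred_pos - pred_neg)
def cfg_combine(pred_pos, pred_neg, guidance_scale):
    return pred_neg + guidance_scale * (pred_pos - pred_neg)

# guidance_scale = 1.0 -> pred_neg + (pred_pos - pred_neg) = pred_pos
#                         (the negative prompt contributes nothing)
# guidance_scale > 1.0 -> the result is pushed away from the negative prompt
```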
2.3. Seed + stochastic sampling
WAN is still a diffusion model:
- The seed initializes the noise; different seeds = different micro-actions even with identical prompts.
- If you do not fix the seed, "random weirdness" will vary every run; you cannot learn cause/effect reliably.
This is a general diffusion thing; Wan's official repo and the Diffusers integration both expose seed via a torch generator. (GitHub)
2.4. LoRAs absolutely can change motion and behavior
LoRA is a fine-tuning trick: instead of changing the whole model, you add small trainable matrices (adapters) into attention layers. This is widely used in Stable Diffusion and works similarly for Wan. (Hugging Face)
For Wan 2.2:
- People train character LoRAs (faces, outfits), style LoRAs, and even motion LoRAs. (YouTube)
- Motion LoRAs can bias how characters move (e.g., very animated body language, dynamic camera moves, orbital shots).
- Articles on Wan 2.x LoRA training explicitly mention splitting high-noise vs. low-noise LoRAs and note that Wan 2.1 LoRAs often transfer to Wan 2.2 because the architecture is similar. (Medium)
So yes: the LoRAs you load absolutely can be part of why people are suddenly hugging, kissing, or flailing.
There is even an official warning in the Wan2.2 Animate docs: for some tasks they do not recommend certain Wan 2.2 LoRAs because weight changes can cause unexpected behavior. (GitHub)
3. Build a stable base before you fight the randomness
Before tweaking prompts, lock down the setup so you can actually see what the prompt is doing.
3.1. Use a known-good workflow
If possible:
- Start from the official ComfyUI Wan2.2 I2V template or the ComfyUI Wan2.2 native workflows published by ComfyOrg. (ComfyUI)
- Or use a well-tested Wan 2.2 I2V/Lightning workflow from a tutorial (like the NextDiffusion fast I2V guide) and keep its sampler + step settings. (NextDiffusion)
This avoids weirdness from broken custom graphs.
3.2. Fix the seed
- Pick a seed (e.g., 123456) and keep it constant while you experiment with prompts.
- Only change the seed after you like the behavior and want variants.
Most Wan 2.2 Comfy workflows have a seed field on the sampler or Wan node; Diffusers examples pass a torch generator with .manual_seed(). (GitHub)
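In ComfyUI that just means setting the sampler's seed field and leaving its control mode on "fixed". If you generate through Diffusers instead, a minimal sketch looks like this (assuming `pipe` is an already-loaded Wan 2.2 I2V pipeline and `start_image`/`prompt` are yours):

```python
import torch

SEED = 123456  # keep constant while you iterate on prompts

# The generator makes the initial noise reproducible run-to-run.
generator = torch.Generator(device="cuda").manual_seed(SEED)

# Hypothetical call for illustration; argument names follow the usual
# Diffusers video-pipeline conventions and may differ in your setup.
result = pipe(
    image=start_image,
    prompt=prompt,
    num_inference_steps=6,
    guidance_scale=1.0,
    generator=generator,  # same seed + same prompt => same clip
)
```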
3.3. Start without LoRAs
- Do at least one full I2V run with no LoRAs.
- Then add one LoRA at a time and see how it changes motion and style.
Reddit LoRA threads for Wan 2.2 keep stressing that for character/motion LoRAs, you should keep values modest and be aware of interaction between high-noise and low-noise branches. (Reddit)
3.4. Use short test clips
Nine hours for a clip suggests:
- Too many frames
- Too high resolution
- Possibly non-Lightning model or additional heavy nodes
Official Wan 2.2 docs and tutorials often demo clips around 2–4 seconds (48–96 frames at ~24fps) and 480p–720p. (ComfyUI)
Use that scale for prompt testing; only go longer or higher-res for a final render.
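A quick sizing helper for test clips, for illustration only. The snap-to-4n+1 step reflects the frame counts many Wan workflows default to (e.g. 81); treat it as an assumption and check what your own workflow expects.

```python
# Rough test-clip sizing: seconds * fps, snapped to a 4n+1 frame count.
def frames_for(seconds: float, fps: int = 24) -> int:
    raw = round(seconds * fps)
    return 4 * round((raw - 1) / 4) + 1

print(frames_for(2))  # 49 frames  (~2 s quick test)
print(frames_for(3))  # 73 frames  (~3 s test clip)
print(frames_for(4))  # 97 frames  (~4 s, closer to a final render)
```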
4. A WAN-specific prompting framework
You want to over-specify the scene so the MoE has less freedom to invent twists. Prompt guides for Wan 2.2 all converge on similar ingredients: shot order, camera language, motion modifiers, aesthetics, and explicit spatial/temporal constraints. (Instasd)
A simple but powerful structure:
- Cast and count
- Setting and time
- Camera and framing
- Action timeline (what they do)
- Motion boundaries (what must not happen)
- Visual style & mood
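One easy way to make sure none of these six ingredients goes missing is to assemble the prompt from named parts. This is just string assembly for illustration, nothing Wan-specific:

```python
# Illustrative prompt assembler: one short block per ingredient,
# joined in the order the structure above recommends.
PROMPT_PARTS = {
    "cast_and_count":    "Exactly two people: one man and one woman in their 30s, "
                         "sitting at the same dining table. No other people anywhere.",
    "setting_and_time":  "Warm candlelit restaurant interior at night, blurred background.",
    "camera":            "Static camera, eye-level, medium shot from the waist up. "
                         "The camera does not move, no zoom, no pan.",
    "action_timeline":   "They lean slightly toward each other, talk quietly and smile.",
    "motion_boundaries": "They remain seated the entire time. Nobody enters or leaves "
                         "the frame. They do not kiss, they only talk.",
    "style_and_mood":    "Cinematic, natural warm lighting, soft shallow depth of field.",
}

prompt = " ".join(PROMPT_PARTS.values())
print(len(prompt.split()), "words")  # aim for roughly 80-120 words in total
```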
4.1. Cast and count
Make it painfully clear how many people there are.
Bad:
"a couple at dinner"
Better:
"exactly two people: one man and one woman, both in their 30s, sitting at the same dining table. No other people in the restaurant. No background characters."
WAN is trained on busy environments. If you do not forbid "background extras", it may add them.
4.2. Setting and time
Anchor the environment so the model does not shift locations mid-clip.
"Indoor restaurant, warm candlelit evening, wooden table, blurred background tables, no TV screens, no windows in frame."
The prompt frameworks for Wan 2.2 emphasize specifying time of day, type of lighting, and environment to keep coherence. (MimicPC)
4.3. Camera and framing
Wan 2.2 responds well to camera language ("medium shot", "static camera", "slow zoom out") and guides treat this as a first-class part of the prompt. (Instasd)
Examples:
- "static camera, eye-level, medium shot of both people from the waist up"
- "camera does not move, no zoom, no pan"
Or, if you do want movement:
- "slow cinematic dolly in towards the couple over three seconds"
4.4. Action timeline
Instead of one vague verb ("embracing and talking"), describe a tiny story:
"The man keeps his arm gently around the woman's shoulders. They lean slightly toward each other. They talk quietly and smile. They sometimes nod, but they remain seated the entire time. They do not stand up. They do not wave. They do not look off-screen."
Prompt guides for Wan 2.2 explicitly recommend ordering prompt content like:
Opening shot → Camera motion → Reveal / payoff (Instasd)
You can compress that into one sentence or a few short sentences.
4.5. Motion boundaries (very important for you)
Because negative prompts are weak or disabled at CFG=1, you must phrase "no X" as positive constraints.
Instead of:
"romantic dinner, no random people, no kissing"
Use:
"There are only these two people and nobody else ever enters the frame.
They do not kiss; they only talk and smile.
They remain seated at the table the whole time."
You are telling the model the allowed motion space.
Advanced: If you move away from CFG=1 or use special nodes like NAG or CFGlessNegativePrompt, you can reclaim some power from genuine negative prompts. NAG and related methods were designed specifically to bring back negative control in few-step/CFG=1 distillations like Wan2.2 Lightning. (ainvfx.com)
But even with those tricks, good positive constraints are still essential.
4.6. Visual style and mood
Keep style tags late in the prompt so they don't overshadow structure:
"cinematic, naturalistic lighting, soft shallow depth of field, subtle film grain, realistic skin tones"
Wan-specific prompt resources give big libraries of style, lens, and lighting tags; these are helpful after you lock down who-does-what-where. (MimicPC)
5. LoRAs: what they do to behavior and how to keep control
5.1. What a LoRA actually does
LoRA (Low-Rank Adaptation) injects small trainable matrices into the model so it can learn a new style, character, or behavior without retraining the whole network. (Hugging Face)
For Wan 2.2:
- Character LoRAs: keep faces/clothes consistent.
- Style LoRAs: push toward a particular art or cinematography style.
- Motion LoRAs: bias camera/character motion (e.g., orbital shots, dynamic handheld). (YouTube)
All of them change the probability of different actions and compositions.
5.2. How LoRAs can make your scenes weirder
- A character LoRA trained on scenes with lots of hugging or kissing may pull the model toward those behaviors, even when your prompt is neutral.
- A motion LoRA trained for "dynamic group dancing" might increase the chance of sudden entrances and exaggerated gestures.
- Reddit reports that some Wan 2.2 Lightning LoRAs significantly change motion, and users switch between I2V vs T2V LoRAs to fix motion quality. (Reddit)
So yes: your LoRAs can absolutely be contributing to "two guys jump in" and "everyone kisses".
5.3. Best practices for LoRAs on video
- Test on single images first. Generate a few still images with the LoRA at different strengths. Check that it is not adding unwanted extra characters or weird posing even in static mode.
- Use moderate strengths. Many Wan 2.2 LoRA guides suggest starting around 0.6–0.8 and only going higher if needed. (Reddit)
- Avoid stacking many LoRAs at full power. Use at most a character + a style LoRA to start; adding more can multiply odd behavior.
- Follow model-specific warnings. The official Wan2.2 Animate repo explicitly says not to use some Wan 2.2 LoRAs there because they cause "unexpected behavior." (GitHub) That is a strong sign you should be careful about LoRAs in general.
- Lock in your prompt first. Get a decent behavior without LoRAs, then add the LoRA on top. If something changes, you know it is the LoRA.
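If you render through Diffusers rather than ComfyUI, a hedged sketch of the "one LoRA at a moderate strength" idea could look like the following. The LoRA file path and adapter name are placeholders, and whether a given Wan 2.2 LoRA loads cleanly into your pipeline is something to verify, not a guarantee:

```python
# Baseline first: render once with no LoRA and a fixed seed.
# Then add exactly one LoRA at a moderate strength and compare.

pipe.load_lora_weights(
    "path/to/wan22_character_lora.safetensors",  # placeholder path
    adapter_name="character",
)
pipe.set_adapters(["character"], adapter_weights=[0.7])  # start around 0.6-0.8

# ... render with the same seed and prompt as the baseline ...

# Remove the LoRA again for a clean A/B comparison.
pipe.unload_lora_weights()
```

In ComfyUI the equivalent is a single LoRA loader node at a model strength around 0.6–0.8, which you can bypass to re-run the same seed without it.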
6. A concrete recipe for your "dinner scene"
Assume:
- Wan 2.2 I2V model or Rapid AIO that respects prompts similarly
- CFG=1, 4–8 steps (Lightning-style or Rapid AIO style)
- ~72 frames (~3 seconds at 24fps)
- 720p or 480p for testing
6.1. Structure the prompt
Example prompt (you can adapt words, but keep the structure):
"Exactly two people: one man and one woman in their 30s, sitting close together at the same dinner table.
Warm, cozy restaurant interior at night, candlelight on the table, blurred background, no TV screens, no other people anywhere in the scene.
Static camera, eye-level, medium shot showing both of them from the waist up. The camera does not move, no zoom, no pan.
The man keeps his arm gently around the woman's shoulders and they lean slightly toward each other. They talk quietly and smile. They sometimes nod and make small hand gestures, but they remain seated at the table the entire time. They do not stand up, they do not wave, and nobody ever enters or leaves the frame. They do not kiss, they simply talk and smile.
Cinematic, realistic style, natural warm lighting, soft shallow depth of field, subtle film grain, realistic skin tones."
Notice how it:
- Fixes count ("exactly two people", "no other people").
- Fixes camera ("static", "no zoom, no pan").
- Fixes motion ("remain seated", "do not stand up", "do not kiss").
- Uses multiple short sentences: Wan tends to handle that well. (Instasd)
6.2. Basic generation checklist
- Use a single, clean Wan 2.2 I2V workflow. (ComfyUI)
- Fix the seed.
- No LoRAs for first tests.
- 72 frames, 480p–720p, 4–8 steps, CFG=1. (GitHub)
- Render; if motion still looks too random, tighten the action and boundary sentences even more ("very small movements only", "no large gestures", etc.).
- Once behavior is good, add LoRAs slowly and watch for new oddities.
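To make the checklist concrete, here is a minimal Diffusers-style sketch of one test render. Treat the pipeline class, model ID, resolution, and frame count as assumptions for illustration; in ComfyUI the same values go into the official Wan 2.2 I2V template's loader and sampler nodes instead.

```python
import torch
from diffusers import WanImageToVideoPipeline  # assumed pipeline class for Wan I2V
from diffusers.utils import export_to_video, load_image

# Assumed checkpoint ID; substitute whatever Wan 2.2 I2V model you actually run.
pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.2-I2V-A14B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

start_image = load_image("dinner_start_frame.png")        # your I2V start frame
prompt = "Exactly two people: one man and one woman ..."  # full structured prompt from 6.1

video = pipe(
    image=start_image,
    prompt=prompt,
    height=480, width=832,   # small test resolution
    num_frames=73,           # ~3 s test clip; adjust to your workflow's expectations
    num_inference_steps=6,   # few-step Lightning-style range (4-8)
    guidance_scale=1.0,      # CFG=1: the negative prompt is effectively inert here
    generator=torch.Generator("cuda").manual_seed(123456),  # fixed seed
).frames[0]

export_to_video(video, "dinner_test.mp4", fps=24)
```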
7. Advanced knobs if you want even tighter control
These are optional, but they can help.
7.1. First–Last frame (FLF2V)
WAN officially supports First–Last Frame to Video: you give a start and an end image and it interpolates motion in between. ComfyUI has native FLF2V workflows and tutorials. (Notes)
Use it when:
- You want to constrain where characters end up, not just where they start.
- You want to avoid the model drifting into random poses by the end.
You can make the last frame only slightly different (e.g., same pose but slightly different expressions) to keep motion subtle.
7.2. Fun Camera Control
ComfyUI's Wan2.2 Fun Camera workflow adds discrete camera motion controls (pan, zoom, etc.), so your prompt focuses more on what people do, while the node handles how the camera moves. (ComfyUI)
You might prefer:
- "They sit and talk, camera slowly zooms in"
with the motion defined in the Fun Camera node, not in prose.
7.3. Negative guidance for CFG=1 (NAG, CFGlessNegativePrompt, etc.)
Because CFG=1 breaks classic negatives, some node packs implement attention-based negatives:
- NAG (Normalized Attention Guidance) and similar methods inject negative constraints directly into attention maps and work well with 4-step WAN-like models. (ainvfx.com)
- CFGlessNegativePrompt (ConDelta node) is specifically meant to use a negative prompt even when you keep CFG at 1 by modifying conditioning rather than relying on classic CFG. (RunComfy)
If your environment supports these, you can move some "no X" rules back into a true negative prompt. But this is an advanced step; do it only after you already get decent results from good positive prompts.
8. Curated resources if you want to go deeper
Grouped so you can bookmark them:
Official / semi-official WAN docs
- Wan2.2 GitHub repo – canonical CLI usage, tasks, and notes (T2V, I2V, TI2V). (GitHub)
- ComfyUI Wan2.2 official tutorials & native workflows – how to load I2V/T2V/FLF2V templates, plus explanations of frame length, resolution, and parameters. (ComfyUI)
Prompting guides
- Wan 2.2 Prompting Guides (Shot Order, Camera, Motion, Aesthetics) – frameworks and example prompts; emphasize 80–120 word prompts and structured camera language. (Instasd)
- Reddit Wan2.2 prompt builder threads – community-shared prompt builders and example prompts to copy and tweak. (Reddit)
LoRA background and WAN-specific LoRA info
- Hugging Face "Using LoRA for Stable Diffusion fine-tuning" – clear explanation of LoRA in simple terms. (Hugging Face)
- Wan 2.2 LoRA training tutorials (AI Toolkit, Musubi-tuner, etc.) – step-by-step guides to training style/character/motion LoRAs, with notes about high-noise vs low-noise experts. (YouTube)
Negative prompting and CFG=1
- NAG / MagCache blog – explains why few-step models set CFG=1 and how that disables classic negative prompts, plus how NAG fixes it. (ainvfx.com)
- CFGlessNegativePrompt node docs – how to use a negative prompt without relying on CFG. (RunComfy)
- Wan 2.2 CFG & negative prompt discussions – real user experiments showing negative prompts barely work when CFG=1 unless special tricks are used. (Reddit)