ByteDance suspended the global launch of Seedance 2.0 on Friday over copyright disputes with Disney and major Hollywood studios. The short version: Disney claims ByteDance pre-packaged copyrighted characters from Star Wars and Marvel as if they were public-domain clip art. A cease-and-desist in February. Legal teams scrambling. Engineers adding safeguards. A mid-March worldwide release shelved indefinitely.
That is the intellectual property story. It is worth knowing. But there is a more interesting story underneath it.
The invisible filmography
Every AI video model has a filmography. Not in the sense that it made films. In the sense that it watched them. Millions of hours of video data, scraped, curated, licensed, borrowed, or stolen, depending on who you ask and which model you are asking about. That filmography is the model's entire visual education.
When you type "golden hour backlight, warm tones, shallow depth of field" and the output feels correct, it is because the model saw thousands of frames that matched that description. When you type "Soviet realist documentary grain, handheld, Dziga Vertov" and get something that looks like a stock photo wearing a vintage filter, it is because the model probably never watched Vertov. Or watched so little of it that the signal drowned in the noise of ten million other clips.
The training data is not a footnote. It is the taste.
What Disney found
The Seedance case is dramatic because the training data included characters that audiences recognize instantly. Darth Vader is not ambiguous. Spider-Man is not a borderline case. When users generated viral videos of Tom Cruise and Brad Pitt in a fistfight, nobody needed a reverse-image search to identify the source material.
But every model carries the same inheritance in less visible form. The lighting setups that Veo defaults to were learned from someone's footage. The camera movements Runway reproduces were performed by someone operating a real camera. The composition biases that put subjects dead center of every frame are the statistical average of every deliberately composed shot in the training set.
The Seedance suspension made the invisible visible. Not because ByteDance did something fundamentally different from every other model provider. Because they did it with characters that have lawyers.
The curriculum you cannot see
Here is the practical problem for anyone prompting these models.
You do not know what they watched. No model provider publishes a complete training data manifest. You learn the boundaries of a model's visual education through experimentation. Type a prompt. See what comes back. If the output connects to your intent, the model saw something relevant. If it does not, you are referencing a lecture it never attended.
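If you want to run that experiment systematically instead of one prompt at a time, a few lines of Python will do it. This is a minimal sketch, assuming nothing about any provider: `generate` is a hypothetical stand-in for whatever text-to-video call you actually have, not any real SDK, and the probe list is just the vocabulary you want to test.

```python
# Probe harness: run a fixed style vocabulary through one model and keep
# the outputs labeled, so dead spots in its visual education show up side
# by side. `generate` is a hypothetical placeholder, not a real API call.
from pathlib import Path

STYLE_PROBES = [
    "golden hour backlight, warm tones, shallow depth of field",
    "Soviet realist documentary grain, handheld, Dziga Vertov",
    "chiaroscuro, single practical source, deep shadow",
]


def generate(prompt: str) -> bytes:
    """Hypothetical: wire in your provider's text-to-video endpoint."""
    raise NotImplementedError


def probe(out_dir: str = "probes") -> None:
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for i, prompt in enumerate(STYLE_PROBES):
        clip = generate(prompt)
        # Keep the prompt in the filename so the review stays honest.
        name = f"{i:02d}_{prompt[:40].replace(',', '').replace(' ', '_')}.mp4"
        (out / name).write_bytes(clip)


if __name__ == "__main__":
    probe()
```

The code is not the point. The discipline is: the same vocabulary, the same conditions, so that a failure tells you something about the model rather than about your typing.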
This is why the same cinematographic keyword produces wildly different results across models. "Chiaroscuro" might land on one model because its training included enough Caravaggio reproductions and film noir frames to build a real association. On another, the word produces vague high-contrast shadows because the training data linked it to thumbnails and style guides, not paintings and films.
It is also why model temperaments exist and why they matter more than benchmarks. When Veo produces polished, art-directed output regardless of what you asked for, that is a training data statement. Veo watched beautiful things. When Runway responds to raw, unglamorous prompting more faithfully than its competitors, that means Runway's training included enough unpolished footage that the model can locate it when asked. The personality is the curriculum.
Owned footage, owned aesthetics
Netflix's acquisition of InterPositive looks different in this context. InterPositive trains custom AI models on a production's own dailies. The model learns the specific visual logic of one film: its lighting, color palette, editorial rhythm. The training data is the production itself.
That is the cleanest answer to this entire problem. You know exactly what the model watched because you shot it. No copyright questions. No invisible curriculum. No surprise that the model generates something it was never supposed to have seen. The aesthetic inheritance is yours because the footage was yours.
For everyone else, working with general-purpose models, the question persists. Your prompt is a request addressed to a visual education you did not design and cannot audit.
What this means for prompting
Two practical takeaways.
First: when a prompt fails, the failure might not be your language. It might be the model's vocabulary. If the training data for a specific look is thin, no amount of prompt refinement will extract what the model cannot recognize. Switching models is sometimes more productive than rewriting prompts. Different filmographies, different visual vocabularies, different results from identical words. Kling's training clearly included dense physical-world footage, which is why its textures feel grounded. Sora's training leaned narrative, which is why it reads prompts like stage directions. WAN's training favored saturation and visual density, which is why sparse frames fight it. These are not bugs. They are biographies.
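The switching itself is mechanical. A sketch of the loop, with the same hedging as before: the backend map is left empty because the client functions depend entirely on which providers you use, and nothing here names a real API.

```python
# One prompt, several models: often a cheaper comparison than another
# round of prompt rewriting. Each backend callable is a hypothetical
# placeholder for whichever client function you actually have.
from pathlib import Path
from typing import Callable, Dict

PROMPT = "Soviet realist documentary grain, handheld, available light"

# Fill in with your own clients, e.g. {"kling": kling_generate, ...}
BACKENDS: Dict[str, Callable[[str], bytes]] = {}


def compare(out_dir: str = "model_compare") -> None:
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for name, generate in BACKENDS.items():
        try:
            (out / f"{name}.mp4").write_bytes(generate(PROMPT))
        except Exception as exc:
            # A refusal or failure is also information about the filmography.
            print(f"{name}: {exc}")


if __name__ == "__main__":
    compare()
```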
Second: reference images bypass the training data gap entirely. When you feed a model an image and ask it to animate, you are handing it the visual information directly instead of asking it to recall something from its education. This is why img2vid workflows keep turning out to be the answer to problems that text prompts cannot solve. The image carries what the model might not have learned. You stop asking the model to remember. You show it instead.
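Mechanically, the workflow is the same shape, except the look travels as pixels instead of words. Again a sketch under the same assumptions: `image_to_video` is a hypothetical stand-in for your provider's image-conditioned endpoint, and the filenames are made up.

```python
# img2vid as a bypass: hand the model the reference frame instead of
# asking it to recall one. The prompt then only has to carry motion.
# `image_to_video` is a hypothetical placeholder, not a real API call.
from pathlib import Path


def image_to_video(image: bytes, motion_prompt: str) -> bytes:
    """Hypothetical: wire in your provider's image-to-video endpoint."""
    raise NotImplementedError


def animate(reference: str, motion_prompt: str, out_path: str) -> None:
    frame = Path(reference).read_bytes()  # the look, carried directly
    Path(out_path).write_bytes(image_to_video(frame, motion_prompt))


if __name__ == "__main__":
    animate(
        "reference_frame.png",          # hypothetical filename
        "slow push-in, handheld sway",  # motion only; the frame carries style
        "shot_01.mp4",
    )
```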
The quiet variable
Training data was always the thing shaping every output that nobody discussed because nobody could see it. A copyright dispute involving the most recognizable fictional characters on the planet just pulled the curtain back.
Every model you prompt was educated by footage it watched. That footage was chosen by someone, for reasons you do not know, with biases you cannot measure. The prompt is your half of the conversation. The training data is the model's half. And the model will never tell you what it studied.
The best you can do is learn what each model responds to, notice where the vocabulary goes dead, and work within the filmography you have been given. Or hand the model a reference frame and skip the conversation entirely.
Bruce Belafonte is an AI filmmaker at Light Owl. He has never received a straight answer about what any model watched and has stopped expecting one.