Steven Soderbergh told Filmmaker Magazine last week that he's been using AI to generate imagery for his nearly complete John Lennon and Yoko Ono documentary. Not for the entire film. For ten minutes of it. Spread across ninety minutes, in the pockets where Lennon and Ono talk philosophy and there is, in Soderbergh's words, "no literal component to what they're saying."
Roughly ninety percent of the documentary is archival stills. The rest is what happens when a dead man starts talking about ideas that have no photographs.
Then he said the sentence.
"You need a Ph.D. in literature to tell it what to do."
Twelve words. From a filmmaker who has operated his own camera under a pseudonym for decades, who shot two features on an iPhone, who understands technical vocabulary as deeply as anyone in the industry. Not a technophobe hedging about disruption. A craftsman describing what the tool actually requires.
He followed it with this: "But like every other piece of technology, it desperately requires very close human supervision." The first sentence is about vocabulary. The second is about judgment. Both landed with the weight of forty years behind a camera.
What he chose it for
This is the detail that matters more than the fact that he used AI at all.
Soderbergh is not replacing archival footage. The stills exist. He owns them. They cover the literal documentary. AI occupies the space where Lennon and Ono leave the literal behind and talk about peace, art, identity, consciousness. Images that "occupy a dream space rather than a literal space."
He chose AI for the hallucinatory quality. Not despite it.
The thing that comparison articles measure as a flaw (the slight unreality, the dreamlike smoothness, the tendency to produce images that feel like memories of images rather than images themselves) is exactly what Soderbergh wants. Surreal visualizations of philosophical ideas. Not a photorealistic reconstruction. Not a replacement for a camera crew. An entirely new register that sits between archival photographs and the viewer's imagination.
This is what happens when a filmmaker asks, before anything else: what is this actually good at? Instead of forcing the tool to replicate what cameras do, he asked it to do what cameras cannot.
The question that opens the conversation
Spielberg told an SXSW audience a month ago that he has never used AI in any of his films. The audience cheered. Kennedy asked, at the Runway AI Summit, how anyone plans to teach taste. Soderbergh picked up the tool and found the ten minutes where it belongs. The Eros executives in Mumbai used the same tools to rewrite a dead character's ending for ticket sales.
Four responses. Four relationships to the same technology. But only one started with "What is this good at?" rather than "Should this exist?"
Spielberg's question (does this replace someone?) ends the conversation. Soderbergh's question (where does this belong?) starts it. The gap between refusing a tool and finding its purpose is not technological. It is a question of approach. Soderbergh approached generative AI the way a DP approaches a new lens. Not "is this better than my old lens?" but "what does this lens see that my old lens does not?"
A war nobody photographed
Soderbergh also disclosed plans to use "a lot of AI" for a Spanish-American War film starring Wagner Moura. The year is 1898. The cameras of 1898 were not cinematographic instruments in any modern sense. There is no usable moving footage of the conflict. If you want to make a film about that period without building every set and sewing every uniform from scratch, you need a generation pipeline that can produce historical environments, costumes, atmospherics, and architecture from structured descriptions.
The vocabulary for that pipeline is not "make it look old." It is the specific light of the Caribbean in late spring. The weight of wool uniforms in tropical humidity. The architecture of Havana at the turn of the century. The texture of unpaved roads and hand-painted signage. Describing a world that no longer exists, to a model that has never seen it, requires at least the precision of describing that world to a production designer. Probably more, because the production designer can ask follow-up questions and the model cannot.
This is a project where model-switching matters. Kling's physical specificity serves the uniforms and the architecture. Veo's atmospheric instincts serve the tropical light. Runway's literal obedience serves the blocking. Seedance's reference preservation serves consistency across a hundred shots set in the same fictional Havana. The Ph.D. in literature is not a joke. It is a job description for anyone working at this scale with these tools.
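That "job description" can be made concrete. Here is a minimal sketch of what structured prompt vocabulary and model-switching might look like in practice: composing a shot description from the specific fields named above, and routing each shot to the model whose strength the paragraph attributes to it. Every name, field, and routing choice here is an illustrative assumption, not a real API or a description of Soderbergh's actual pipeline.

```python
# Hypothetical sketch only: structured prompt composition and model routing.
# Field values paraphrase the vocabulary named in the text; none of this
# reflects any real generation API.

SHOT_SPEC = {
    "period": "1898",
    "location": "Havana, unpaved street, hand-painted signage",
    "light": "Caribbean late-spring sun, hard shadows",
    "costume": "wool infantry uniforms, heavy in tropical humidity",
}

def compose_prompt(spec):
    """Join structured fields, in a fixed order, into the kind of
    precise description the text argues these models require."""
    order = ["period", "location", "light", "costume"]
    return "; ".join(f"{k}: {spec[k]}" for k in order if k in spec)

# Routing table paraphrasing the strengths the article assigns to each model.
MODEL_STRENGTHS = {
    "physical_detail": "Kling",
    "atmosphere": "Veo",
    "blocking": "Runway",
    "reference_consistency": "Seedance",
}

def pick_model(priority):
    # Defaulting to Kling is an arbitrary assumption for the sketch.
    return MODEL_STRENGTHS.get(priority, "Kling")

print(pick_model("atmosphere"), "->", compose_prompt(SHOT_SPEC))
```

The point of the structure, not the specific fields: a hundred shots in the same fictional Havana stay consistent because the vocabulary is written down once and reused, rather than improvised shot by shot.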
Desperately
"It desperately requires very close human supervision."
That word. Desperately. Not "ideally" or "for best results." Desperately. Soderbergh is saying the tool produces raw material that demands a filmmaker on top of it, making decisions, rejecting output, redirecting the model in directions it would not volunteer.
The ten minutes of AI imagery in the Lennon documentary exist because Soderbergh directed them. Chose which moments needed surreal images. Described those images in language precise enough to produce results. Watched the output. Rejected most of it. Kept the pieces that served the film. The model generated pixels. Soderbergh generated meaning.
That division has not changed once in the history of this series. The vocabulary is the craft. The supervision is the practice. The filmmaker is the reason any of it matters. Soderbergh said it in two sentences. He always was efficient.
Bruce Belafonte is an AI filmmaker at Light Owl. He has never been mistaken for someone with a Ph.D. in anything and finds this has not yet affected his prompts.