Go to any AI video prompting guide and you will find the same advice within the first two paragraphs: specify your lens. "85mm at f/1.4." "Shot on a 24mm wide angle." "Zeiss Master Prime, T1.3." It sounds authoritative. It sounds like you know what you are doing. It sounds like the kind of thing a cinematographer would say on set.

The model does not care.

Not entirely, and not always. But far more often than you would expect. Lens specifications are the most over-specified, under-performing category of prompt language in AI video generation right now. People load their prompts with optical data because it feels professional, and the models nod politely and do whatever they were going to do anyway.

The numbers problem

Someone on Reddit ran a controlled test. Same prompt, same seed, same model. Varied the aperture from f/2.8 to f/22. Six images. They were identical. Changed the focal length from 14mm to 200mm. No visible difference. The background blur that should have been dramatically different at f/2.8 versus f/16 was the same blur in every output.

That was Midjourney v6.1 for images. Video models are no better. The reason is the same one that killed film stock names in the last article: the training data is not labeled at the optical level. Nobody uploads footage to the internet tagged "shot at 35mm f/2.0 on a Canon CN-E prime." They upload it tagged "portrait," "cinematic," "close-up." The metadata that cinematographers live by is invisible to the models that learned from their work.

So when you type "85mm f/1.4," you are speaking a language the model learned to ignore. Not because it is wrong. Because the model never had a reason to learn what those numbers mean optically. It knows what portraits look like. It does not know why they look that way.

What actually moves the image

"Shallow depth of field" works. Consistently, across every model, this produces visible background separation. The effect you wanted from "f/1.4" is right here, in plain language. "Deep focus" or "everything in sharp focus" works in the other direction. The models learned depth of field from the visual result, not from the aperture number that created it. "Shallow depth of field, subject isolated, soft background blur" will outperform "85mm f/1.4" every single time.

Shot type labels work. "Close-up," "medium shot," "wide shot," "extreme close-up." These carry more compositional information than any focal length number. A close-up implies a longer lens, tighter framing, more background compression. A wide shot implies environmental context, deeper staging, perspective spread. The models understand these because the training data is overwhelmingly labeled with shot types, not millimeters.

"Macro" works. This is specific enough as a visual category that the models produce genuinely distinct output. Extreme magnification, tiny subjects filling the frame, razor-thin focus plane. "Macro" is not a focal length. It is a visual world. The models get it.

Bokeh descriptions work. "Soft circular bokeh in the background" produces visible results in Runway and Kling. "Hexagonal bokeh" sometimes triggers in Runway (enough vintage lens imagery in the training data). "Swirly bokeh" occasionally produces Helios-style rendering. You are describing what the lens artifact looks like, not which lens made it. Same principle as color: describe the result.

Perspective distortion descriptions work. "Wide angle distortion, subject looming in foreground, environment stretching into background." That is what a 16mm lens does, described without the number. "Compressed perspective, background feels close to subject, flattened depth." That is what a 200mm lens does. The models respond to the visual phenomenon far better than the focal length that causes it.

What mostly floats

Specific millimeter values are decorative. The difference between "24mm" and "35mm" in a prompt is, functionally, nothing. Both will produce a wider shot. The difference between "85mm" and "135mm" is also nothing. Both will produce a tighter shot with more compression. The models bucket these into broad categories: wide, normal, tight. The gradation between 50mm and 65mm that a DP agonizes over on set does not exist in generated video.

F-stop numbers are ignored. f/1.4 does not produce shallower depth of field than f/4 in any model I have tested. The model does not simulate optical physics. It does not know that a wider aperture lets in more light and narrows the focus plane. If you want shallow focus, say "shallow depth of field." If you want deep focus, say "deep focus" or "everything sharp." The f-number is a prop.
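If you think in lens terms anyway, the translation is mechanical enough to sketch. Here is a minimal Python illustration of the mapping these two paragraphs imply: focal lengths collapse into the wide/normal/tight buckets the models actually distinguish, and f-numbers collapse into shallow or deep focus language. The thresholds and function names are illustrative assumptions, not anything a model vendor documents.

```python
# Illustrative sketch: collapse lens jargon into the descriptive language the
# models actually respond to. The bucket thresholds are rough assumptions
# (wide / normal / tight), not published behavior from any model vendor.

def describe_focal_length(mm: float) -> str:
    """Map a focal length to the broad framing bucket a model can use."""
    if mm <= 35:
        return "wide shot, environmental context, wide angle perspective"
    if mm <= 70:
        return "medium shot, natural perspective"
    return "close-up, tight framing, compressed perspective"


def describe_aperture(f_number: float) -> str:
    """Map an f-stop to depth of field language instead of a number."""
    if f_number <= 2.8:
        return "shallow depth of field, subject isolated, soft background blur"
    if f_number >= 8:
        return "deep focus, everything in sharp focus"
    return "moderate depth of field"


def lens_to_prompt(mm: float, f_number: float) -> str:
    """'85mm f/1.4' in, descriptive prompt language out."""
    return f"{describe_focal_length(mm)}, {describe_aperture(f_number)}"


print(lens_to_prompt(85, 1.4))
# close-up, tight framing, compressed perspective,
# shallow depth of field, subject isolated, soft background blur
```

The point of the sketch is the shape of the mapping, not the exact cutoffs: the numbers only matter insofar as they land you in one of a handful of visual categories the model already knows.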

Lens model names are noise. "Canon CN-E 85mm T1.3," "Cooke S4/i," "Zeiss Master Prime," "Panavision Primo." Beautiful glass, all of it. Meaningless in a prompt. The models have no dataset linking these names to their specific rendering characteristics. "Cooke" does not produce the famous Cooke glow. "Panavision" does not produce Panavision's distinctive skin rendering. You are invoking brands the model has not shopped at.

T-stops are even more invisible than f-stops. T-stops are a cinema-specific measurement that accounts for light transmission loss through the lens elements. If the model does not understand f-stops, it certainly does not understand the calibrated version of f-stops. This one is for the cinema DPs in the audience who thought the more precise term might help. It does not.

"Anamorphic" half-works. This is the interesting edge case. "Anamorphic" produces lens flares in most models. The horizontal streaks, the oval bokeh, the slight softness. Runway and Sora respond to it. Kling sometimes does. But the actual anamorphic squeeze (the 2x or 1.33x compression that creates the widescreen aspect ratio and distinctive spatial distortion) is inconsistent. You get the flares without the physics. Which, honestly, is what most people want when they say "anamorphic" anyway.

The model breakdown

Runway Gen-4 is the most responsive to depth of field language. "Shallow focus" and "deep focus" produce clearly different results. It also handles bokeh descriptions better than the others, occasionally rendering distinct bokeh shapes. Director Mode accepts focal length inputs, but the actual output difference between 35mm and 85mm there is more about framing than optical character.

Kling 3.0 has the best overall depth of field control. "Subject in sharp focus, background soft" produces reliable results. Its native 4K output means the depth separation is actually visible at pixel level, which was not always true in lower-resolution models. Kling also responds to "bokeh" as a standalone keyword more consistently than the others.

Veo already produces pleasant depth of field by default. Its trained-in aesthetic includes natural-looking focus falloff. Specifying depth of field sometimes fights the default rather than enhancing it. "Deep focus, everything sharp" is the more useful direction with Veo, because you are overriding a built-in preference rather than requesting one.

Sora prefers plain descriptions. "Only the subject is in focus, everything behind them is soft and blurred" outperforms any technical lens specification. Sora responds to physics descriptions rather than photography terminology. Tell it what the image looks like, not what equipment made it look that way.

Seedance 2.0 follows the same img2vid pattern noted for color: if your input frame has shallow depth of field, the generated video preserves it. Text-only depth of field control is middling. If you are working in Frame to Motion mode, let the reference image do the lens work.
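To keep those observations in one place, here is a small cheat sheet in the same sketch style: a plain Python dictionary of the depth of field phrasing suggested above for each model. The phrases come straight from the paragraphs in this section; treat them as starting points rather than guarantees, since behavior shifts between model versions.

```python
# Per-model depth of field phrasing, summarizing the observations above.
# Starting points, not guarantees; behavior shifts between model versions.
DOF_PHRASING = {
    "Runway Gen-4": "shallow focus, soft circular bokeh in the background",
    "Kling 3.0": "subject in sharp focus, background soft",
    "Veo": "deep focus, everything sharp",  # useful when overriding Veo's built-in falloff
    "Sora": "only the subject is in focus, everything behind them is soft and blurred",
    "Seedance 2.0": "use a reference frame that already has the depth of field you want",
}


def dof_hint(model: str) -> str:
    """Return the suggested depth of field phrasing for a model, with a safe default."""
    return DOF_PHRASING.get(model, "shallow depth of field, soft background blur")


print(dof_hint("Sora"))
# only the subject is in focus, everything behind them is soft and blurred
```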

Why this keeps happening

Four articles in. Four variations on the same thesis. Film stock names, color grades, camera movements, now lens specs. The pattern is always the same: technical jargon from the physical world does not translate reliably into generated video because the training data was not labeled with that jargon.

The models learned composition, not equipment. They learned what shallow focus looks like, not what f/1.4 means. They learned what a wide shot feels like, not what 24mm measures. A century of optical engineering, compressed into vibes.

This is not a criticism. It is the current state of the medium. And the people who produce the best AI video right now are the ones who stopped writing prompts that sound like camera rental invoices and started writing prompts that describe what they actually want to see.

CinePrompt gives you the real vocabulary. When you select 85mm, the prompt says 85mm. When you pick f/1.4, you get f/1.4. The tool is not dumbing anything down. It is built for where video generation is headed, not just where it is today. Models are already catching up on camera movement. Lens language is next. And when that day arrives, the prompts CinePrompt has been writing all along will already speak the language fluently. The point was never the number. It was always the look. But the number's day is coming.


Bruce Belafonte is an AI filmmaker at Light Owl. He owns four vintage lenses that he now uses exclusively as paperweights, and he is at peace with this.