The prompt
Split lighting is a diagnostic, not a mood. Two sources, opposed, each throwing a different color temperature across the same face with a hard line down the center. There is no ambient fill to soften mistakes, no motivated movement to distract from skin that goes waxy, no environmental complexity to share the blame. If the model cannot hold two distinct color planes on one surface, there is nothing else in the frame to save it.
The brief put a hotel night doorman in his late 50s in a narrow lobby at 2 a.m. Hard amber streetlight from camera-left, cool green elevator indicator from camera-right at three stops dimmer. Static camera, 85mm, shallow depth of field, center frame. One small action: his eyes lift toward the elevator, then settle back. The DP reference is Gordon Willis on The Godfather, specifically the office interiors where practicals do all the work and faces fall into controlled darkness instead of getting evenly lit like a school portrait.
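For readers who don't work in stops: each stop is a factor of two in light. "Three stops dimmer" therefore means the green elevator spill should deliver one-eighth the light of the amber key, an 8:1 ratio. Below is a minimal Python sketch of that arithmetic. The function name is illustrative and is not part of CinePrompt or any model's interface.

```python
def stops_to_ratio(stops: float) -> float:
    """Return the light ratio between two sources `stops` apart.

    Each photographic stop doubles (or halves) the light, so the
    ratio is 2 raised to the number of stops.
    """
    return 2.0 ** stops


# The brief: cool green spill three stops under the hard amber key.
key_to_fill = stops_to_ratio(3)
print(f"key:fill = {key_to_fill:.0f}:1")            # key:fill = 8:1
print(f"fill is {1 / key_to_fill:.1%} of the key")   # fill is 12.5% of the key
```

An 8:1 ratio is a strong falloff. That helps explain why a model might render the dim side as near-black ambient rather than as a distinctly colored plane.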
This is a four-model comparison. Same CinePrompt prompt fed to Kling V3 4K, Grok Imagine, Seedance 2.0, and HappyHorse 1.0. The question: which model can hold a hard warm/cool split on skin without averaging both sources into nothing?
The generations
Kling V3 4K
The split is real. Amber from the left cuts a sharp warm plane across the left half of the face, running straight down the bridge of the nose. Cool green from the right catches the opposite cheek, ear, and collar without bleeding into the warm side. The division holds for the full fifteen seconds with no drift, no averaging, no moment where both sources collapse into a neutral beige middle. This is the most precise sustained dual-source color separation in the prompt test series.
Skin holds up under both temperatures. Pores are visible on the forehead and cheeks, especially under the amber key. Silver stubble renders as individual hairs along the jawline and upper lip. Wrinkles around the eyes and nasolabial folds are deep and specific to the character, not stamped from a generic aging texture. At 3840x2160, the resolution claim is not dishonest. The detail you see in the still frames is the detail you get in motion.
The eye movement lands. His gaze lifts toward the elevator, holds for a beat, then settles back to center. It is the only prompted action, and it reads as a man checking whether someone is coming, then accepting that nobody is. The performance does not over-deliver. That restraint is worth noting because the previous Kling test (Coney Island) suffered from the model embellishing beyond the brief.
Background is dark and shallow. The revolving door is not explicitly visible, but the warm amber source from that direction is motivated and consistent. The elevator panel on the right glows green, faintly, as the source of the cool spill. Navy blazer and white shirt render with fabric texture and natural drape.
The one observable weakness: motion blur as he shifts his gaze does not ramp on and off the way real captured motion would. It appears and disappears in discrete steps rather than smoothly accelerating. At normal playback speed, this is barely perceptible. Frame-by-frame, it is the tell. Nothing else in this generation reads as synthetic.
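Here is why captured blur "ramps." With a real shutter, the blur streak in each frame is roughly the distance the subject travels while the shutter is open. When a movement eases in and out, the blur grows and shrinks smoothly with it. The following Python sketch illustrates that. It assumes a conventional 180-degree shutter at the 24 fps this critique cites, and it uses a hypothetical eased gaze shift. None of these values were measured from the generations.

```python
import math

FPS = 24                                    # frame rate cited in the critique
SHUTTER_ANGLE = 180.0                       # assumption: conventional 180-degree shutter
EXPOSURE = (SHUTTER_ANGLE / 360.0) / FPS    # 1/48 s of exposure per frame


def eased_position(t: float, duration: float, distance: float) -> float:
    """Position along a move that eases in and out (smoothstep curve)."""
    u = min(max(t / duration, 0.0), 1.0)
    return distance * (3 * u**2 - 2 * u**3)


def blur_per_frame(duration: float, distance: float) -> list[float]:
    """Approximate blur streak length for each frame of the move.

    Blur is the distance travelled while the shutter is open:
    position(open + exposure) - position(open).
    """
    frames = math.ceil(duration * FPS)
    blurs = []
    for i in range(frames + 1):
        t_open = i / FPS
        start = eased_position(t_open, duration, distance)
        end = eased_position(t_open + EXPOSURE, duration, distance)
        blurs.append(end - start)
    return blurs


# Hypothetical gaze shift: 20 units of travel over half a second.
for frame, blur in enumerate(blur_per_frame(0.5, 20.0)):
    print(f"frame {frame:2d}: blur {blur:5.2f}")
```

The printed values climb gradually from near zero, peak mid-move, and fall back to zero. Blur that switches on and off in steps, which is what the Kling generation shows frame by frame, lacks that curve.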
Grok Imagine
Second place, and close. The warm/cool split is present but softer. Where Kling drew a hard line down the nose, Grok Imagine feathers the transition across a wider band. The amber on the left is accurate, warm and slightly orange, matching a sodium-vapor streetlight. The green on the right is desaturated, leaning toward teal rather than the sickly green the prompt described. It reads as a cool ambient wash rather than a directional source hitting the face at three stops under. The concept is there. The ratio is not.
Skin texture is strong for 720p. Wrinkles are pronounced and well-modeled, crow's feet and brow furrows especially. Stubble is present but reads as a uniform texture rather than individual hairs. That is a resolution constraint, not a rendering failure. The face carries the age convincingly. Deep forehead lines, slightly sagging jowls, tired eyes that look like they have been open since the previous shift.
The eye movement is subtle but clean. Gaze lifts and returns without any visible distortion in the eyelid region. The generation notes flagged minor pixelation in the eyelids during eye movement. In the three extracted frames, that microartifact is not visible. At 720p playback, the motion between frames does not flow with the smoothness of real camera capture. There is a very slight staccato quality to the transitions. Not stuttering, not frame duplication. Just a thinness in the temporal interpolation that a trained eye catches.
Background is the most complete of the four. The revolving door is visible camera-left, dark and glass-paneled with warm light bleeding through. The elevator indicator on the right glows as a small green rectangle, correctly placed and correctly blurred in the shallow depth of field. The lobby feels like a place. The wardrobe is slightly off: a white crewneck tee under the blazer instead of the buttoned white shirt the prompt described. A doorman would not wear a crewneck on shift. Small detail, wrong detail.
At 720p, Grok Imagine is punching above its pixel count. The lighting concept reads correctly even where the precision falls short. The face and environment are coherent. The limitation is the softness of the split itself. A lighting test demands a hard division, and Grok Imagine delivered a soft one.
Seedance 2.0
The still frame is exceptional, and its split is the sharpest of all four models. Amber from the left is rich and saturated, almost golden. The cool side is a desaturated cyan-teal, dimmer and washed out, matching the three-stops-under instruction more accurately than any other generation. The division runs cleanly along the nose bridge and cheekbone with no bleed. Pores are visible. Stubble renders with individual hairs. The color grade looks like a Gordon Willis frame pulled from a 4K scan.
The eye movement is the most realistic of the four models. His gaze lifts toward the elevator and settles back with micro-movements that read as a real person shifting attention rather than a keyframed animation hitting two poses. The subtlety is impressive. Where Kling delivered the action cleanly and Grok delivered it smoothly, Seedance delivered it believably.
The problem is the known Seedance motion issue. Near-duplicate frames produce a stuttering effect where the video appears to micro-jitter rather than flow. The model's image rendering quality remains the highest in the series. Its temporal behavior remains its weakest attribute. Seedance 2.1 was announced today, and we will know in a month whether this frame-duplication problem has been resolved.
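Near-duplicate frames can be flagged mechanically. When consecutive frames barely differ in a shot where the subject is supposed to be moving, playback hitches. Below is a minimal stdlib Python sketch. It assumes the frames have already been decoded into flat lists of grayscale values. The decoding step, the function names, and the threshold are illustrative assumptions, not the method used to evaluate these generations.

```python
from typing import Sequence

Frame = Sequence[float]  # hypothetical: a flattened grayscale frame, values 0-255


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel change between two frames of equal size."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def near_duplicate_indices(frames: Sequence[Frame], threshold: float = 0.5) -> list[int]:
    """Return indices i where frame i barely differs from frame i - 1.

    `threshold` is an illustrative assumption, not a calibrated value;
    in real footage it depends on noise, grain and compression.
    """
    return [
        i
        for i in range(1, len(frames))
        if mean_abs_diff(frames[i - 1], frames[i]) < threshold
    ]


# Toy clip: a bright bar moving across four pixels, with one frame repeated.
clip = [
    [255, 0, 0, 0],
    [0, 255, 0, 0],
    [0, 255, 0, 0],   # near-duplicate of the previous frame: reads as a hitch
    [0, 0, 255, 0],
]
print(near_duplicate_indices(clip))  # [2]
```

A flagged index paired with a normal-sized change on the next frame is the signature of micro-jitter. The motion pauses for a frame, then resumes.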
The dark revolving door is visible camera-left as a crisscrossed metal frame, correctly out of focus. The elevator panel on the right shows a faint red indicator light at the top. Red, not green. The prompt specified cool green. The ambient spill landing on the face is teal-green, but the practical source itself rendered as the wrong color. The face composition, framing, and depth of field are textbook.
If you extracted a single still from each of the four generations and asked which one was the photograph, Seedance would win. The eye performance is convincing and the lighting is forensically accurate. The stuttering is what keeps it out of the top two.
HappyHorse 1.0
HappyHorse is the only model that defaulted to a Chinese-presenting actor. The prompt did not specify ethnicity, and three models generated Western-presenting subjects. HappyHorse went East Asian. This is a consistent behavior documented in the generation notes and worth flagging because it affects how the test results generalize. The skin texture, bone structure, and stubble pattern are all rendered through a different physiological template than the other three outputs.
The split itself is the weakest of the four. The amber from the left is present and warm, but it bleeds past the center of the face instead of stopping at a hard line. The green from the right barely registers. The cool spill reads as dim ambient rather than a directional practical hitting the face from a specific source. The elevator indicator rendered as red, not green, which means the model misidentified the color of its own light source and then failed to deliver the chromatic contrast that source was supposed to produce. The prompt asked for a visibly divided face with warm on one side and sickly green on the other. HappyHorse delivered a face that is mostly amber with a slightly dimmer side.
The major artifact arrives around the middle of the clip. The model inserted a glowing amber light directly into the subject's eyeball. Not a catchlight. Not a reflection. A self-luminous glow that makes the character look like an android powering up. It is the kind of failure that turns a cinematic close-up into unintentional science fiction. The glow dissipates by the end of the clip, but the damage to the take is permanent. A single broken frame in a fifteen-second lighting diagnostic invalidates the test.
Skin texture is surprisingly good outside the eye artifact. Pores, stubble, and wrinkles are visible and convincing. The shallow depth of field and 85mm compression read correctly. The revolving door is present camera-left with appropriate bokeh. The navy blazer and white shirt render cleanly. Framing is center and stable for the full duration.
HappyHorse's first test in the series (the fashion film courtyard) demonstrated the model's strengths: camera direction, environment building, rim lighting, fabric rendering. This test exposed the weaknesses. When the only thing in the frame is a face under two specific light sources, HappyHorse cannot hold the color separation, defaults to its own ethnic template, and produces a frame-breaking artifact that would require a complete regeneration on set.
The verdict
1. Kling V3 4K. The only model that held a hard warm/cool split for the full duration while delivering a natural eye movement and photorealistic skin at 4K. The motion blur tell is minor and barely perceptible at playback speed. This is the strongest lighting result in the prompt test series.
2. Grok Imagine. Landed the concept at 720p with the most complete environment of the four, including a visible revolving door and correctly placed elevator indicator. The split is soft rather than hard, which means the model understood the lighting instruction but could not execute the precision the prompt demanded. Close, and at one-third the vertical resolution (one-ninth the pixel count) of Kling's 4K output.
3. Seedance 2.0. The best single frame in the test, and the most believable eye performance of the four. The split lighting in the still is sharper and more chromatically accurate than any other model's output. The stuttering from near-duplicate frames is what keeps it from competing with Kling and Grok for the top spots. Seedance 2.1 was announced today. Worth revisiting.
4. HappyHorse 1.0. Soft split, wrong source color, ethnic default, and a frame-breaking artifact that inserted a glowing light into the subject's eye. The skin texture and framing are competent, but every other aspect of the lighting test was missed or broken.
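The verdict weighs Grok Imagine's 720p output against Kling's 4K. Resolution gaps are easy to misstate because linear resolution and pixel count scale differently. Here is a quick Python check. The 3840x2160 figure comes from the Kling section above. The 1280x720 figure is the standard 16:9 frame size for 720p and is assumed here for Grok Imagine.

```python
# Kling's 3840x2160 is stated in the review; 1280x720 is the standard 16:9
# frame size for 720p and is an assumption for Grok Imagine's output.
KLING_4K = (3840, 2160)
GROK_720P = (1280, 720)


def resolution_ratios(small: tuple[int, int], large: tuple[int, int]) -> tuple[float, float]:
    """Return (linear ratio, pixel-count ratio) of `small` relative to `large`."""
    linear = small[1] / large[1]                              # ratio of frame heights
    pixels = (small[0] * small[1]) / (large[0] * large[1])    # ratio of total pixels
    return linear, pixels


linear, pixels = resolution_ratios(GROK_720P, KLING_4K)
print(f"linear: 1/{1 / linear:.0f}, pixel count: 1/{1 / pixels:.0f}")
# linear: 1/3, pixel count: 1/9
```

Because the two frames share an aspect ratio, the pixel-count ratio is the square of the linear ratio.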
What separates the field is not the frame but the second. All four models can render convincing skin on a single still. Only Kling V3 4K maintained the dual-source lighting relationship while the subject moved, the motion blur accumulated, and the temporal logic of a real camera running at 24 frames per second had to be sustained. The lighting test was never about the first frame. It was about frame 360.
All four models were generated at fifteen seconds. For a static lighting diagnostic, five seconds would have been sufficient and would reduce the risk of the model inventing action that was not requested. Shorter durations also tend to produce higher per-frame quality. Worth testing in the next comparison.
The green source needs to be louder in the prompt. "Cool green elevator indicator spill, three stops dimmer" gives the model permission to render something barely visible. Next time: name the green explicitly as a practical fluorescent tube or LED panel with a specific Kelvin value (4500K green-shifted), and drop the "three stops dimmer" instruction. Let the model render both sources at comparable intensity, then describe the ratio as a visual outcome ("the green side is noticeably dimmer") rather than a technical specification the model cannot parse.
For HappyHorse specifically: adding "Western-presenting male, Northern European features" to the subject line would override the ethnic default. The model responds to explicit physical description when provided. It defaults to Chinese-presenting when left open. That is worth knowing for prompt design, not just for this test.
The single-action instruction ("eyes lift toward the elevator, then settle back") was the minimum viable motion. Only Kling executed it while holding the split. For the other three models, even "one small thing" was too much or too ambiguous. Consider removing the action entirely for lighting tests, or splitting the test: one generation for the static light, a second for the action under that light. Combining them asks the model to solve two problems simultaneously, and only one model could.
Video generation by Kit Mallory.
Critique by Bruce Belafonte.