Early last week, a model called HappyHorse-1.0 appeared on the Artificial Analysis Video Arena. No launch event. No technical blog. No company name attached. Just an anonymous entry in a blind evaluation that climbed to the top of both the text-to-video and image-to-video rankings within hours. It beat Seedance 2.0. It beat Kling 3.0. It beat every publicly identified model on the leaderboard by a margin that was not close.

Then it vanished. Pulled from the rankings. The website stayed up but the GitHub links pointed nowhere and the Hugging Face page said "coming soon." The number one AI video model in the world was something nobody could name, access, or verify.

This morning, Alibaba claimed it. HappyHorse was built by its new ATH Division, the company said. Internal testing. API coming April 30. The stock went up four percent on the news.

The numbers, for what they are worth

In the text-to-video category without audio, HappyHorse scored an Elo of 1,379. Seedance 2.0 scored 1,273. That is a 106-point gap, which translates to winning roughly 65 percent of blind head-to-head matchups. In image-to-video without audio, HappyHorse hit 1,411 and set a new record for the leaderboard. In both audio categories, Seedance held the edge by much thinner margins.
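The 65 percent figure falls out of the standard logistic Elo formula, which converts a rating gap into an expected head-to-head win rate. A quick check, using only the two scores quoted above:

```python
# Expected score of the higher-rated model in a single blind matchup,
# per the standard logistic Elo formula (400-point scale).
def elo_win_prob(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# HappyHorse (1,379) vs. Seedance 2.0 (1,273), text-to-video without audio:
p = elo_win_prob(1379, 1273)
print(f"{p:.1%}")  # prints "64.8%"
```

Note that this describes only the probability of winning a blind vote under the Elo model; it says nothing about what the voters were actually judging.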

The architecture claims are specific: a 40-layer single-stream self-attention transformer that jointly denoises text, image, and audio tokens in one sequence. Eight denoising steps with no classifier-free guidance. Six languages natively supported. None of this is verifiable because the weights do not exist in any public repository.
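For readers who want the claim made concrete: here is a toy sketch, in plain Python, of what "single-stream joint denoising" would mean structurally. The dimensions, token counts, and stand-in update rule are all hypothetical; with no public weights, paper, or code, there is nothing to check any of this against.

```python
import math
import random

# Toy illustration of the CLAIMED design: text, image, and audio tokens
# packed into one sequence and refined jointly over eight steps, with no
# classifier-free guidance branch. Every number here is a hypothetical
# stand-in, not HappyHorse's actual architecture.

random.seed(0)
D = 8  # hypothetical embedding width

def tokens(n):
    return [[random.gauss(0, 1) for _ in range(D)] for _ in range(n)]

# Single stream: modalities are concatenated, not routed separately.
seq = tokens(6) + tokens(10) + tokens(4)  # text + image + audio

def attend(x):
    """Stand-in for one self-attention pass over the joint sequence."""
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in x]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wi * k[d] for wi, k in zip(w, x)) / z
                    for d in range(D)])
    return out

# Eight refinement steps; one forward pass per step, no CFG pair.
for _ in range(8):
    seq = [[0.5 * a + 0.5 * b for a, b in zip(t, u)]
           for t, u in zip(seq, attend(seq))]

print(len(seq), len(seq[0]))  # prints "20 8"
```

The structural point is the single sequence: every modality attends to every other in one pass, rather than through separate per-modality streams.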

Artificial Analysis described the submission as "pseudonymous." That is a generous word for anonymous.

The ghost at the top

Article 30 in this series examined what arena leaderboards actually measure: first-impression visual impact from casual prompts, judged by anonymous voters in blind binary comparisons. They select for spectacle. They cannot measure prompt adherence, controllability, consistency, or vocabulary depth. The methodology has not changed. What changed is that the top-ranked model was also anonymous, inaccessible, and unverifiable.

An evaluation designed to remove model identity from the equation now had a winner with no identity to remove.

The arena always answered a narrow question: which output looks better at a glance? Now it answered that question about a model that exists as a website, a set of architecture claims, and screenshots of Elo scores. No API. No weights. No paper. A number one ranking attached to nothing a filmmaker can touch.

The previous concern was that leaderboards reward performing over listening. That concern assumed the model was at least available. HappyHorse is not available. The highest-ranked video model in the world is a stock tip disguised as a benchmark result.

The man behind the horse

Here is the detail that matters more than the Elo score. HappyHorse was built by a team led by Zhang Di, formerly a VP at Kuaishou and the head of Kling AI technology. He built Kling. Then he went to Alibaba. Then his new team built a model that topped the leaderboard his previous model occupied.

This series has described model temperaments as if they were properties of the architecture or the training data. In part, they are. They are also properties of the people who built the models. When this series documented Kling's physical-world grounding, its texture density at 4K, its preference for material specificity over atmospheric suggestion, it was documenting decisions made by a team with a specific philosophy about what video generation should prioritize. Zhang Di led that team.

He is now leading a different team at a different company building a different model. The temperament will not be the same. The priorities will not be the same. But the sensibility that produced Kling's particular strengths did not evaporate when he changed employers. It migrated.

Model temperaments are not permanent features of a brand name. They are expressions of the people doing the work. People move. Temperaments follow. Kling without the people who gave it its instincts is not the same Kling. HappyHorse with those people is not the same as Kling, but it is not entirely unrelated either.

The anonymous pre-release is a pattern now

In February, a mystery model appeared on OpenRouter under a pseudonym and turned out to be Z.ai stress-testing GLM-5. Now HappyHorse follows the same playbook. Drop anonymously onto a leaderboard. Let the internet speculate. Generate press coverage. Claim ownership when the stock price reflects the hype.

This is A/B testing as a marketing strategy. The leaderboard becomes the launch event. The anonymity generates more coverage than a press release ever could, because mystery is a better headline than specifications. "Unknown model tops rankings" travels further than "Alibaba releases new video model." Both sentences describe the same event. One got Bloomberg. The other would have gotten a blog post.

The evaluation system designed to be immune to brand bias just became the most effective brand marketing channel in the industry. The mechanism that hides the name is the mechanism that makes the name reveal newsworthy.

What this means for people who make things

For a filmmaker with a shot to build, the HappyHorse situation changes nothing today. You cannot use it. The API does not exist. The weights are unreleased. The Elo score describes how anonymous voters felt about output from prompts you did not write, evaluated against criteria you did not set. The model will arrive eventually, or it will not. Until then, it is a press release shaped like a benchmark.

What it does confirm is the supply-side convergence this series has tracked since article 45. Another model, another team, another lab. The supply of capable video generation is expanding faster than any individual filmmaker's ability to evaluate it. Thirty-plus models on LibTV. Thirty-plus on Adobe Firefly. Now a new entrant topping the leaderboard before it has a public-facing anything.

The number of models is not the constraint. It never was. The constraint is knowing what you want and being able to say it precisely enough that the model, whichever model, produces something that belongs to your vision rather than its training data average.

When HappyHorse opens its API on April 30, it will join a roster of models that CinePrompt's structured vocabulary already addresses. Six models today. Seven tomorrow. The same cinematographic controls, the same prompt architecture, the same attention to lens and light and movement and composition. The horse gets a name. The vocabulary stays the same.

The leaderboard as a market instrument

Alibaba's stock rose four percent on the HappyHorse reveal. The model has no public access, no published paper, no downloadable weights, and no pricing. What it has is a number on a leaderboard. That number moved a market capitalization measured in hundreds of billions of dollars.

This is not what evaluation systems were designed to do. Artificial Analysis built a blind comparison tool to help practitioners make informed decisions about which model to use. It is now a financial instrument that moves stock prices based on anonymous submissions from entities with a direct financial interest in the outcome.

Nobody is accusing anyone of gaming the system. The votes are blind and the methodology is sound for what it measures. But what it measures is aesthetic preference at thumbnail speed, and the participants now include entities whose primary audience is not filmmakers evaluating tools. It is analysts evaluating equity positions.

The arena was built for practitioners. The practitioners are now the smallest audience in the room.


Bruce Belafonte is an AI filmmaker at Light Owl. He once watched a stock price move on a benchmark score for a model nobody could download and briefly wondered if he had switched industries without noticing.