Illustration: a beam of light passing through toll gates labeled GPU, Host, Platform, and Credits, each gate taking a cut
In this guide
1. The markup you don't see
2. How a generation request is actually priced
3. The provider landscape
4. Real pricing: image and video models
5. The open-source option: what you can run locally
6. Three ways to pay less
7. How CinePrompt handles this
8. The market is deflating

1. The markup you don't see

The raw GPU compute for a short AI video clip can cost a fraction of what you actually pay. The difference is margin, stacked three or four layers deep, from the company that owns the GPU to the platform that sells you credits.

Credits are designed to make the per-generation cost hard to calculate. They are an abstraction layer between you and the actual price. The real question is how many hands your money passes through before it reaches a GPU.

This guide maps the full supply chain: where the compute actually lives, who marks it up and by how much, which models you can run locally for free, and how to pay the lowest rate for the best models available today. We update it monthly as pricing changes.

2. How a generation request is actually priced

Every AI video generation passes through a stack. Each layer adds cost.

The four layers of a generation request

Layer 1: The GPU. Raw silicon in a data center. An NVIDIA H100 rents for roughly $2.00 to $3.00 per hour on providers like RunPod, Lambda, and CoreWeave. The actual GPU time for a generation varies widely by model, resolution, and duration. This is the base cost of the compute, and it is a small fraction of what most creators end up paying.
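The Layer 1 arithmetic is simple: GPU cost per clip is the hourly rental rate prorated to the GPU time the clip consumes. A minimal sketch, assuming an H100 at $2.50 per hour (the midpoint of the range above) and a hypothetical 120 GPU-seconds per clip; real GPU time depends on model, resolution, and duration, and is not published for proprietary models.

```python
def gpu_cost_per_clip(hourly_rate_usd: float, gpu_seconds: float) -> float:
    """Raw Layer 1 compute cost: hourly rental rate prorated to the seconds used."""
    return hourly_rate_usd * gpu_seconds / 3600


# Hypothetical inputs: $2.50/hour H100, 120 GPU-seconds for one clip.
cost = gpu_cost_per_clip(2.50, 120)
print(f"${cost:.4f} per clip")  # $0.0833 per clip
```

This ignores idle capacity, cold starts, and serving engineering, which are part of what Layer 2 charges for.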

Layer 2: The inference host. Someone loads the model weights onto those GPUs, optimizes the serving pipeline, manages queuing, keeps endpoints warm, and handles the API. This is where fal.ai, Venice, and EvoLink sit. They rent or own the GPUs from Layer 1 and sell you per-request access. A five-second Seedance 2.0 clip costs $2.35 to $3.40 at this layer depending on the provider.

Layer 3: The platform. Consumer-facing apps that wrap the API in a polished interface, add account management, billing, and sometimes their own proprietary models. They buy inference from Layer 2 providers and sell it to you at a higher price. This layer is where the biggest markup lives.

Layer 4: The credit system. Many platforms don't show you a dollar amount per generation. They sell credits in bundles and charge credits per generation, with conversion rates that shift by model, resolution, and duration. When a platform tells you a generation costs "80 credits," the question you should ask is: how much did those 80 credits cost me in dollars?
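Answering that question takes two steps: divide the bundle price by the credits it contains, then multiply by the credits a generation consumes. A minimal sketch using the Runway Standard plan figures quoted under the pricing table in section 4 ($35 for 2,250 credits); the function name is illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP


def credits_to_dollars(credits: int, bundle_price: Decimal, bundle_credits: int) -> Decimal:
    """Convert a credit charge into dollars using the bundle's effective price per credit."""
    per_credit = bundle_price / bundle_credits
    return (per_credit * credits).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Runway Standard plan as quoted in this guide: $35/month for 2,250 credits.
print(credits_to_dollars(80, Decimal("35"), 2250))  # 1.24
```

`Decimal` avoids binary floating-point rounding on money. The sketch assumes every credit in the bundle gets used; unused monthly credits raise the real price per credit.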

Diagram: the four layers of AI video generation pricing, from GPU compute at the bottom to credit systems at the top

The markup between layers can be significant. The model weights are identical whether you access them through an API provider or through a consumer app. You're paying for the interface and the convenience of not needing an API key.

Two pricing models

Per-second. You pay a rate for each second of output video. This is the most transparent model and the standard among API providers. A ten-second clip at $0.10 per second costs $1.00. Simple.
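Per-second pricing reduces to one multiplication, which is what makes it easy to audit. A minimal sketch; the function name is illustrative, and the last two calls use the Seedance 2.0 rates for fal.ai and Venice from the table in section 4.

```python
from decimal import Decimal


def per_second_cost(rate_per_second: Decimal, seconds: int) -> Decimal:
    """Dollar cost of a clip billed per second of output video."""
    return rate_per_second * seconds


print(per_second_cost(Decimal("0.10"), 10))  # 1.00
print(per_second_cost(Decimal("0.68"), 5))   # 3.40  (Seedance 2.0 on fal.ai)
print(per_second_cost(Decimal("0.47"), 5))   # 2.35  (Seedance 2.0 on Venice)
```

The last two lines reproduce the $2.35 to $3.40 range quoted earlier for a five-second Seedance 2.0 clip at Layer 2.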

Credits. You buy a bundle of credits, then spend credits per generation. The credit cost varies by model, resolution, and duration. This is the least transparent model and the one most consumer platforms use. The answer to "how much did that cost in dollars?" is rarely easy to find.

3. The provider landscape

There are three ways to access AI generation models. Each one sits at a different point on the price-versus-convenience spectrum.

A. API-direct providers

You get an API key, you pay per request at published rates. No credit bundles, no conversion math.

fal.ai is the largest inference platform. Over 600 models, fast serving, polished SDKs. The benchmark that other providers price against. Used by Canva, Adobe, and Shopify under the hood. Has a playground for testing, though the interface is built for developers rather than day-to-day creative work.

Venice is the exception in this category. It has a full consumer app with a good interface for generation, plus a complete API with per-request pricing. It's the only provider where you can generate comfortably in a browser and also wire up programmatic access. Venice also adds a privacy layer, stripping your identity before forwarding requests. For open-source models, Venice runs inference on decentralized GPU networks. For proprietary models like Grok or Gen-4.5, Venice proxies to the original provider's API with your identity removed.

EvoLink routes to optimized GPU clusters, likely including Chinese compute infrastructure. Pricing on Chinese-origin models (Seedance, Wan) can be significantly lower than at Western providers. The tradeoff is a smaller model catalog and less mature documentation.

Others worth knowing: Atlas Cloud (vertically integrated, owns GPU clusters, competitive pricing), Replicate (1,000+ community models, Docker-based), WaveSpeed (exclusive models, high SLA), Novita AI (GPU infrastructure with inference API).

B. Consumer platforms

Credit-based pricing, polished UI, opaque per-generation costs.

Runway has evolved beyond its own Gen-4.5 model. It now offers Seedance, HappyHorse, Kling, and others as a multi-model marketplace. You still pay Runway's credit markup on models you could access directly for less. The value proposition is their editing UI and their proprietary model, not their pricing. Gen-4.5 is available on both Runway and Venice.

Higgsfield positions itself as the affordable option. Credit-based, heavy on marketing. They appear to be buying inference from Atlas Cloud (Atlas lists them as a customer) and reselling with margin. The "affordable" framing is relative to Runway, not relative to API-direct access.

Kling, Hailuo, and Pika all use credit systems that make per-generation costs difficult to calculate. That difficulty is intentional.

C. Self-hosted

You rent a cloud GPU on RunPod, Lambda, or Vast.ai and run models yourself, typically through ComfyUI.

This is the cheapest option at high volume for open-source models. But there is a critical distinction most people miss: for proprietary models like Seedance 2.0 and Kling V3, ComfyUI's "Partner Nodes" are API wrappers. Your request still goes to the cloud. You still pay the API price. ComfyUI does not make proprietary models cheaper. It only saves money on truly open-source models that you can download and run on your own hardware.

4. Real pricing: image and video models

Prices are per image for image models and per second for video models. For credit-based platforms (Runway, Higgsfield), we converted credits to approximate dollar values. All prices are as of May 2026.

Image Models (per image)

| Model | fal.ai | Venice | Atlas Cloud | EvoLink | Runway | Higgsfield |
| --- | --- | --- | --- | --- | --- | --- |
| Nano Banana Pro (4K) | $0.32 | $0.37 | $0.15 | $0.25 | 40 cr (~$0.62) | 4 cr (~$0.25) |
| GPT Image 2 (4K) | $0.42 | $0.84 | $0.41 | $0.36 | 41 cr (~$0.64) | 12 cr (~$0.75) |

Video Models (per second, with audio)

| Model | fal.ai | Venice | Atlas Cloud | EvoLink | Runway | Higgsfield |
| --- | --- | --- | --- | --- | --- | --- |
| Seedance 2.0 (1080p) | $0.68/s | $0.47/s | $0.46/s | $0.50/s | 40 cr/s (~$0.62/s) | 45-60 cr/s (~$0.56/s)† |
| Kling V3 (4K) | $0.42/s | $0.46/s | n/a | $0.40/s | 43 cr/s (~$0.67/s) | 6 cr/s (~$0.38/s)† |
| HappyHorse 1.0 (1080p) | $0.28/s | $0.26/s | $0.28/s | $0.22/s | 30 cr/s (~$0.47/s) | 8 cr/s (~$0.50/s)† |
| Grok Imagine Video (720p) | $0.07/s | $0.09/s | n/a | $0.02/s | n/a | 1.54 cr/s (~$0.10/s)† |
| Veo 3.1 (4K) | $0.35/s | $0.44/s | $0.60/s | $0.33/s | 40 cr/s (~$0.62/s)* | 11 cr/s (~$0.69/s)*† |

"n/a" = model not available on that provider. Runway dollar estimates based on $35/month Standard plan (2,250 credits, ~$0.0156/credit). †Higgsfield dollar estimates based on $1 = 16 credits (lowest purchase tier). Credit-per-generation rates on Higgsfield change frequently, with "discounts" that shift the effective cost without notice. The Seedance range (45-60 cr/s) reflects the spread between their listed "discounted" and "regular" credit costs. *Veo 3.1 available at 1080p only on Runway and Higgsfield. Google's Gemini Omni video model will be added when its API launches. All prices subject to change.

What to look for

The same model can cost two or more times as much depending on the provider: Nano Banana Pro runs from $0.15 per image on Atlas Cloud to about $0.62 on Runway. The model weights are identical. The output is identical. The price difference is pure margin at different layers of the stack.

Provider selection is the single biggest lever you have for reducing costs. Which model you choose matters for quality. Which provider you access it through matters for price.
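Once prices are in dollars, comparing providers is mechanical. A minimal sketch that finds the cheapest provider and the highest-to-lowest price spread for a few rows of the May 2026 table above (credit-based entries use the approximate dollar conversions, and "n/a" entries are omitted); the data structure is illustrative, and the numbers will go stale.

```python
# Approximate dollar prices from the May 2026 table (per image or per second).
PRICES = {
    "Nano Banana Pro (4K)": {"fal.ai": 0.32, "Venice": 0.37, "Atlas Cloud": 0.15,
                             "EvoLink": 0.25, "Runway": 0.62, "Higgsfield": 0.25},
    "Kling V3 (4K)": {"fal.ai": 0.42, "Venice": 0.46, "EvoLink": 0.40,
                      "Runway": 0.67, "Higgsfield": 0.38},
    "HappyHorse 1.0 (1080p)": {"fal.ai": 0.28, "Venice": 0.26, "Atlas Cloud": 0.28,
                               "EvoLink": 0.22, "Runway": 0.47, "Higgsfield": 0.50},
}


def cheapest_and_spread(prices: dict[str, float]) -> tuple[str, float, float]:
    """Return (cheapest provider, its price, ratio of highest to lowest price)."""
    provider = min(prices, key=prices.get)
    low, high = prices[provider], max(prices.values())
    return provider, low, high / low


for model, prices in PRICES.items():
    provider, low, spread = cheapest_and_spread(prices)
    print(f"{model}: cheapest {provider} at ${low:.2f}, spread {spread:.1f}x")
```

The cheapest entry is not always API-direct: on Kling V3 the Higgsfield estimate edges out EvoLink, which is why converting everything to dollars matters more than the label on the pricing model.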

5. The open-source option: what you can run locally

Before comparing API prices, it is worth knowing which models you can run for free on your own hardware and which ones require paying someone else no matter what.

| Model | Open source? | Run locally? | Min. VRAM |
| --- | --- | --- | --- |
| Wan 2.2 / 2.7 | Yes (Apache 2.0) | Yes | 24 GB |
| LTX 2.3 | Yes (Lightricks) | Yes | 12-24 GB |
| HappyHorse 1.0 | Claimed (murky) | Unclear | 24 GB+ |
| Seedance 2.0 | No (ByteDance) | No | API only |
| Kling V3 | No (Kuaishou) | No | API only |
| Veo 3.1 | No (Google) | No | API only |
| Gen-4.5 | No (Runway) | No | API only |
| Grok Imagine | No (xAI) | No | API only |

Two models stand out for local use. Wan 2.2 is the most popular open-source video model: open weights, an Apache 2.0 license, and strong community support in ComfyUI. LTX 2.3 from Lightricks is a 22-billion-parameter model that can be quantized to run on 12 GB VRAM cards, though efficient generation requires 24 GB of VRAM and 128 GB of system RAM.
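The quantization claim is easy to sanity-check: weight memory is roughly parameter count times bytes per parameter. A minimal sketch for a 22-billion-parameter model, counting weights only; activations, text encoders, the VAE, and framework overhead are ignored here, so real requirements are higher.

```python
# Approximate storage per parameter for common precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8/int8": 1.0, "4-bit": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_memory_gb(22e9, precision):.0f} GB")
# bf16: 44 GB
# fp8/int8: 22 GB
# 4-bit: 11 GB
```

At 4-bit the weights come to about 11 GB, consistent with squeezing onto a 12 GB card. At bf16 they would not fit on a 24 GB card either, so even a 4090 relies on quantization or on offloading weights to system RAM.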

The hardware cost is real. An NVIDIA RTX 4090 (24 GB VRAM) runs around $1,600. A five-second clip on LTX 2.3 takes forty-five minutes to an hour on local hardware. The same clip takes thirty to ninety seconds via API. Local generation is cheaper per clip at volume but dramatically slower.
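How much volume is "at volume"? A minimal break-even sketch, assuming the $1,600 RTX 4090 above, five-second clips, one hour per local clip, no electricity cost, and a hypothetical API comparison price of $0.22 per second (the lowest HappyHorse 1.0 rate in the pricing table, used only as a stand-in, since the local and API models differ).

```python
import math


def breakeven_clips(hardware_usd: float, api_rate_per_s: float, clip_seconds: float) -> int:
    """Clips needed before owned hardware beats paying per clip via API."""
    api_cost_per_clip = api_rate_per_s * clip_seconds
    return math.ceil(hardware_usd / api_cost_per_clip)


clips = breakeven_clips(1600, 0.22, 5)
days = clips / 24  # one clip per hour, running around the clock
print(clips, round(days))  # 1455 61
```

Under these assumptions the card pays for itself after about 1,455 clips, roughly two months of continuous generation: cheaper per clip eventually, but much slower.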

The quality gap is also real. Open-source models are improving fast, and many generation tasks can be completed with Wan or LTX. But the current frontier for prompt adherence, audio sync, and naturalism belongs to the proprietary models. Seedance 2.0 and Kling V3 produce output that Wan and LTX cannot match today. That gap is closing, but it has not closed.

Local generation makes sense for iteration, experimentation, and workflows where time is not a constraint. For client work on a deadline, API access to the best models is worth the cost.

6. Three ways to pay less

Illustration: three diverging paths, labeled BYOK (direct), Self-Host (winding but cheap), and Platform (smooth but toll-gated)

A. Use API-direct providers with your own key

Get an API key from fal.ai, Venice, or EvoLink. Pay per request at published rates. No credit conversion, no platform margin. This is the single biggest cost reduction available to most creators: you move from Layer 3/4 pricing to Layer 2 pricing, and the savings are significant.

The barrier is that most API providers have developer-oriented interfaces. Venice is the exception with a strong consumer app. For the rest, you need either comfort with an API playground or a front-end tool that connects to these providers on your behalf.

B. Self-host open-source models

Rent a cloud GPU on RunPod or Lambda ($2 to $4 per hour for an H100), install ComfyUI, download Wan 2.2 or LTX 2.3, and generate with no per-request fees. At high volume this is the cheapest path. If you are generating a hundred or more clips per day, the math favors self-hosting quickly.
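The same break-even logic applies to a rented GPU, which bills by the hour whether or not it is busy. A minimal sketch, assuming an H100 at $3 per hour kept on for 24 hours and a hypothetical API comparison price of $1.10 per five-second clip ($0.22 per second); it also checks how fast each clip must render for the GPU to reach that volume.

```python
import math


def daily_breakeven(gpu_rate_per_hour: float, hours: float, api_cost_per_clip: float) -> int:
    """Clips per day at which a rented GPU matches per-clip API spending."""
    return math.ceil(gpu_rate_per_hour * hours / api_cost_per_clip)


def max_minutes_per_clip(clips: int, hours: float) -> float:
    """Longest a single clip may take for the GPU to finish `clips` within `hours`."""
    return hours * 60 / clips


clips = daily_breakeven(3.0, 24, 1.10)
print(clips, round(max_minutes_per_clip(clips, 24), 1))  # 66 21.8
```

Under these assumptions, renting wins above about 66 clips a day, but only if each clip renders in under about 22 minutes. At the forty-five minutes per clip quoted earlier for local hardware, one GPU would top out at 32 clips a day, so throughput matters as much as the hourly rate.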

The tradeoffs: you are limited to open-source models (no Seedance, no Kling, no Veo). Generation is slow on local hardware. Setup and maintenance require technical comfort. This path makes sense for experimentation, batch work, and creators who are willing to trade time for cost savings.

C. Use consumer platforms strategically

Consumer platforms are not always the wrong choice. Gen-4.5 on Runway (or Venice) is a proprietary model worth paying for when the shot requires it. Kling's app has a usable free tier. Some platforms offer trial credits that can stretch further than an API key for light, occasional use.

The mistake is defaulting to a consumer platform for every generation. Use them for proprietary models and free tiers. Use API-direct providers for everything else.

7. How CinePrompt handles this

CinePrompt connects directly to inference APIs. You bring your own API key, you pay the provider at their published rate, and we add zero markup on your generations. The prompt intelligence is our product. The inference is yours.

We currently integrate fal.ai and Venice as our primary providers, with additional providers that we rotate based on current market conditions. When a provider offers better pricing, faster inference, or new models worth accessing, we add them. When a provider's quality or reliability drops, we swap them out. You don't need to track the provider landscape. We do that.

This matters because of a problem that the pricing tables above don't capture: most API providers have interfaces built for developers, not creators. fal.ai has a playground. Atlas Cloud has a basic testing UI. These work for verifying an integration, but they are not where you want to spend an afternoon creating.

CinePrompt is a cinematography-aware prompt builder with forty-plus fields for camera, lens, lighting, movement, sound, and subject. It works across any provider we integrate, which means you get a creative tool on top of whoever has the best pricing and models this month.

There is a cost argument here that goes beyond per-second rates. A well-constructed prompt that produces the shot you want on the first or second try costs less than a vague prompt that requires eight re-generations to get close. The cheapest generation is the one you don't have to redo.

8. The market is deflating

Every price in this guide will be lower in six months.

Per-token inference costs have dropped over 80 percent in the past year. H100 GPU rental pricing fell 64 to 75 percent in fourteen months. Alphabet, Amazon, Meta, and Microsoft are spending over $650 billion on new data center construction in 2026 alone. Nearly 100 gigawatts of new capacity is expected to come online between 2026 and 2030, doubling global data center capacity. The supply of compute is growing faster than demand can absorb it.

This is structural deflation, not a promotional cycle. More GPUs exist every quarter. Newer chips deliver more inference per watt. Open-source models keep closing the quality gap with proprietary ones. The floor drops continuously.

The question for creators is whether your platform passes those savings to you or pockets them. Per-second API providers have to lower prices because their competitors do. When Atlas Cloud drops Seedance pricing, fal.ai feels pressure to match. The market is transparent and competitive.

Credit-based platforms face no such pressure. A credit is worth whatever the platform says it is worth. When inference costs drop 30 percent, a credit-based platform can absorb the savings as margin without changing a single number on its pricing page. You would never know. The credit insulates the platform from having to compete on price.
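The mechanism is plain arithmetic: if the dollar price of a credit and the credits charged per generation stay fixed, every drop in the underlying inference cost becomes platform margin. A minimal sketch with hypothetical numbers: a platform charging $0.62 per second of video (the approximate Runway Seedance 2.0 figure from the pricing table) that buys inference at a hypothetical $0.46 per second. We do not know what any platform actually pays.

```python
def margin_per_second(price: float, cost: float) -> float:
    """Platform's gross margin per second of generated video."""
    return price - cost


price = 0.62                            # creator's price per second (credit price unchanged)
cost_before = 0.46                      # hypothetical wholesale inference cost
cost_after = cost_before * (1 - 0.30)   # inference gets 30 percent cheaper

print(f"margin before: ${margin_per_second(price, cost_before):.3f}/s")  # $0.160/s
print(f"margin after:  ${margin_per_second(price, cost_after):.3f}/s")   # $0.298/s
```

The creator's price is identical in both cases while the platform's margin nearly doubles. The $0.138-per-second reduction never shows up on the pricing page.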

BYOK on per-second APIs ensures you always pay the current market rate. As compute gets cheaper, your generations get cheaper automatically. No negotiation, no plan change, no asking for a discount. The price is the price and the price keeps falling.

The best investment is not chasing the cheapest API. It is building prompt craft that produces the result you want on the first try. A ten-cent-per-second rate that requires one attempt costs less than a five-cent-per-second rate that requires four. Prompt quality is the multiplier. Provider selection is the base. Get both right and the economics of AI filmmaking tilt further in your favor every month.
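The retry arithmetic is worth making explicit: the effective cost of one usable clip is rate times clip length times attempts. A minimal sketch, assuming five-second clips (the comparison above does not fix a length).

```python
from decimal import Decimal


def effective_cost(rate_per_second: Decimal, seconds: int, attempts: int) -> Decimal:
    """Total spend to get one usable clip, counting every discarded attempt."""
    return rate_per_second * seconds * attempts


one_try = effective_cost(Decimal("0.10"), 5, 1)
four_tries = effective_cost(Decimal("0.05"), 5, 4)
print(one_try, four_tries)  # 0.50 1.00
```

The nominally cheaper rate ends up costing twice as much. Clip length cancels out of the comparison, so the conclusion holds for any duration: a rate that needs four attempts must be under a quarter of the one-attempt rate to come out ahead.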

Frequently asked questions

What is the cheapest AI video generation API?
For top-tier models, EvoLink currently offers the lowest per-second rates on several models, including HappyHorse 1.0 ($0.22/s) and Grok Imagine Video ($0.02/s). Atlas Cloud and Venice are competitive on Seedance 2.0. The cheapest option depends on which model you need. The lowest API-direct rate usually undercuts credit-based consumer platforms for the same model, often by a wide margin, but not always: Higgsfield's estimated Kling V3 rate (~$0.38/s) is slightly below EvoLink's ($0.40/s).

Is ComfyUI cheaper for AI video generation?
Only for open-source models like Wan 2.2 and LTX 2.3 that you can download and run on your own GPU. For proprietary models like Seedance 2.0 and Kling V3, ComfyUI's Partner Nodes are API wrappers. Your request still goes to the cloud. You still pay the API price.

What does BYOK mean?
Bring Your Own Key. Instead of paying a platform's credit markup, you get an API key directly from an inference provider and pay their published per-second rate with no middleman.

Are credit-based platforms more expensive?
Generally yes. Our comparison shows Runway charges roughly $0.62 to $0.67 per second for Seedance 2.0, Kling V3, and Veo 3.1, while the cheapest API-direct rates for those models run from $0.33 to $0.46 per second. Credit-based platforms also have less incentive to pass along cost savings as compute gets cheaper.

Which AI video models can I run locally for free?
Wan 2.2/2.7 and LTX 2.3 are the main open-source options. Most top models including Seedance 2.0, Kling V3, Veo 3.1, and Gen-4.5 are proprietary and API-only. Local generation is slower (45 to 60 minutes per 5-second clip) but free at the point of use.

Will AI video generation get cheaper?
Yes. Per-token inference costs dropped over 80 percent in the past year. Over $650 billion in new data center construction is underway in 2026. Using API-direct providers with per-second pricing ensures you automatically benefit from falling costs.


This guide is updated monthly as provider pricing changes. Last updated May 2026.
