66 Gigawatts: Why AI Startups Should Care About Power

US data center power demand reached 31 GW in 2025. By 2027, Goldman Sachs expects that number to more than double to 66 GW.

That can sound like a hyperscaler problem. It is not only that. If you are building a robotics product, an AI agent, a vision pipeline, a voice interface, or any product where inference happens often, power demand eventually shows up as compute availability, latency choices, and unit economics.

The New Energy Demand Curve

Figure: US data center power demand more than doubles by 2027 (Goldman Sachs estimate, in gigawatts, 2025 to 2027).

The IEA measures energy consumption rather than power demand, but it points in the same direction globally: it projects that data center electricity consumption will more than double, from about 415 TWh in 2024 to about 945 TWh by 2030.
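Because gigawatts and terawatt-hours are different units, a rough conversion helps put the two estimates side by side. The sketch below is back-of-the-envelope arithmetic only. It assumes an 8,760 hour year and treats annual energy as if it were drawn at a constant rate, which real data centers do not do. The function names are illustrative.

```python
HOURS_PER_YEAR = 8760  # assumption: non-leap year, constant draw all year


def average_power_gw(energy_twh_per_year: float) -> float:
    """Average continuous power (GW) implied by annual energy use (TWh)."""
    gwh_per_year = energy_twh_per_year * 1000  # 1 TWh = 1,000 GWh
    return gwh_per_year / HOURS_PER_YEAR


def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


# Figures quoted in the article.
global_avg_2024_gw = average_power_gw(415)       # IEA, global, 2024
global_avg_2030_gw = average_power_gw(945)       # IEA, global, 2030
us_growth = annual_growth_rate(31, 66, 2)        # Goldman Sachs, US, 2025 to 2027
global_growth = annual_growth_rate(415, 945, 6)  # IEA, global, 2024 to 2030

print(f"Global average draw, 2024: {global_avg_2024_gw:.1f} GW")
print(f"Global average draw, 2030: {global_avg_2030_gw:.1f} GW")
print(f"Implied US growth per year: {us_growth:.1%}")
print(f"Implied global growth per year: {global_growth:.1%}")
```

Under these assumptions, the IEA figures work out to roughly 47 GW of average draw in 2024 and roughly 108 GW in 2030. Those are global averages rather than peak demand, so they are not directly comparable to the US estimate. They only show that both sources describe the same trend.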

The useful takeaway is simple: AI is making compute a physical constraint again. Models can improve quickly. Power connections, substations, transmission lines, and data center capacity move slowly.

That gap matters because startups build on top of cloud capacity. Hyperscalers do not usually pass daily grid volatility straight to your bill. They sign long-term power agreements and plan capacity years ahead. What reaches a startup first is more practical: which regions have enough powered capacity, how quickly new GPU instances become available, and whether latency or price forces you into a worse architecture.

Where This Touches a Startup

For a small software product with light AI usage, this may stay in the background. For a robotics or inference-heavy startup, it can become visible much earlier.

The GPU instance you want may be limited in the region closest to your users. The cheapest model in a prototype may become expensive when every customer action runs inference. A robotics product may need to decide which work happens on device, which happens at the edge, and which can wait for the cloud.

That is why the question is not only "which model is best?" It is also "where can we run it reliably, at the latency we need, at a cost that still works when usage grows?"

1. Treat inference as a product cost early

For robotics, computer vision, voice, monitoring, and agent workflows, inference is not just infrastructure. It is part of the product cost.

A demo can hide this. A pilot can hide this. Production usually cannot. If a robot checks a scene ten times a minute, or a workflow calls a large model for every user action, small cost differences compound quickly.

Track cost per task, not only monthly cloud spend. For example: cost per inspection, cost per robot hour, cost per support conversation, cost per generated report, or cost per thousand decisions. That turns model choice, batching, caching, and the move to smaller models into business decisions, not just engineering preferences.
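One way to track this is to log every inference call against the product task it served, then roll the costs up per task. The sketch below assumes that setup. The record fields, task names, and dollar amounts are hypothetical. Real per-call costs would come from your provider's billing data or your own GPU accounting.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceCall:
    task_type: str   # hypothetical label, e.g. "inspection" or "report"
    task_id: str     # one product-level task may trigger several calls
    cost_usd: float  # assumed to be known for each call


def cost_per_task(calls: list[InferenceCall]) -> dict[str, float]:
    """Average inference cost per completed task, grouped by task type."""
    total_cost: dict[str, float] = defaultdict(float)
    task_ids: dict[str, set[str]] = defaultdict(set)
    for call in calls:
        total_cost[call.task_type] += call.cost_usd
        task_ids[call.task_type].add(call.task_id)
    return {t: total_cost[t] / len(task_ids[t]) for t in total_cost}


def cost_per_robot_hour(checks_per_minute: float, cost_per_check_usd: float) -> float:
    """Hourly inference cost for a robot that checks a scene at a fixed rate."""
    return checks_per_minute * 60 * cost_per_check_usd


# Hypothetical call log: two inspections (one needed two calls) and one report.
calls = [
    InferenceCall("inspection", "insp-1", 0.002),
    InferenceCall("inspection", "insp-1", 0.004),
    InferenceCall("inspection", "insp-2", 0.006),
    InferenceCall("report", "rep-1", 0.05),
]

per_task = cost_per_task(calls)
hourly = cost_per_robot_hour(checks_per_minute=10, cost_per_check_usd=0.001)

for task_type, cost in per_task.items():
    print(f"{task_type}: ${cost:.4f} per task")
print(f"robot hour at ten checks a minute: ${hourly:.2f}")
```

Grouping by task rather than by call is the point of the exercise: a task that quietly fans out into several model calls shows up as a higher cost per task even when each individual call looks cheap.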

2. Design for latency and location choices

Some inference can wait. Some cannot.

A warehouse robot cannot always wait on a distant cloud region. A safety check, a perception loop, or a control decision may need to happen close to the device. Other work, like summarization, analytics, planning, or fleet learning, can often happen later in a cheaper region.

Separate these paths early. Put urgent inference on device or near the edge where it makes sense. Keep slower, higher-quality work in the cloud. This gives you more room to adapt if capacity or prices differ by region.
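A minimal sketch of that separation is a routing function that picks where a task runs based on its deadline. The tier names and latency figures below are placeholders, not measurements. You would replace them with numbers from your own devices and regions.

```python
from enum import Enum
from typing import Optional


class Placement(Enum):
    ON_DEVICE = "on_device"
    EDGE = "edge"
    CLOUD = "cloud"


# Hypothetical round-trip latency estimates in milliseconds.
TYPICAL_LATENCY_MS = {
    Placement.ON_DEVICE: 10,
    Placement.EDGE: 40,
    Placement.CLOUD: 250,
}

# Checked in this order: most flexible tier first, assuming that moving
# work away from the device leaves more room to shift region or provider.
PREFERENCE_ORDER = (Placement.CLOUD, Placement.EDGE, Placement.ON_DEVICE)


def choose_placement(deadline_ms: Optional[float], safety_critical: bool = False) -> Placement:
    """Pick where a task should run given its latency deadline.

    deadline_ms=None means the task is asynchronous and can wait.
    """
    if safety_critical:
        return Placement.ON_DEVICE  # never depend on a network hop
    if deadline_ms is None:
        return Placement.CLOUD
    for placement in PREFERENCE_ORDER:
        if TYPICAL_LATENCY_MS[placement] <= deadline_ms:
            return placement
    return Placement.ON_DEVICE  # nothing meets the deadline; stay local


print(choose_placement(20))                          # tight perception loop
print(choose_placement(100))                         # near real time
print(choose_placement(None))                        # fleet analytics
print(choose_placement(5000, safety_critical=True))  # safety check
```

Keeping the decision in one function means a capacity or price change in one region becomes a change to a table of numbers rather than a rewrite of every call site.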

3. Avoid being trapped by one model or one instance type

The riskiest architecture is one where the product only works with one large model, one GPU type, and one cloud region.

That may be fine for a prototype, but it is fragile for a company. Build a small evaluation set for your core tasks. Know which smaller model is "good enough." Keep a fallback path. Batch where latency allows. Cache repeated work. Make it possible to move some inference between providers or regions without rewriting the product.
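The fallback and caching ideas above can be sketched as an ordered list of backends tried in turn, with results cached so repeated work is not paid for twice. The backend functions here are stand-ins that simulate a constrained region and a smaller model. They do not call any real provider API, and all names are hypothetical.

```python
from typing import Callable, Sequence

Backend = tuple[str, Callable[[str], str]]


class AllBackendsFailed(RuntimeError):
    """Raised when every configured backend fails for a request."""


def run_with_fallback(
    prompt: str,
    backends: Sequence[Backend],
    cache: dict[str, tuple[str, str]],
) -> tuple[str, str]:
    """Return (backend_name, result), trying backends in order and caching results."""
    if prompt in cache:
        return cache[prompt]  # repeated work is served without inference
    errors = []
    for name, call in backends:
        try:
            result = call(prompt)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
            continue
        cache[prompt] = (name, result)
        return name, result
    raise AllBackendsFailed("; ".join(errors))


# Stand-in backends (hypothetical): the preferred one simulates a capacity error.
def large_model_primary_region(prompt: str) -> str:
    raise RuntimeError("no GPU capacity in region")


def small_model_secondary_region(prompt: str) -> str:
    return f"summary of: {prompt}"


BACKENDS: list[Backend] = [
    ("large_model_primary_region", large_model_primary_region),
    ("small_model_secondary_region", small_model_secondary_region),
]

cache: dict[str, tuple[str, str]] = {}
first = run_with_fallback("inspect pallet 7", BACKENDS, cache)
second = run_with_fallback("inspect pallet 7", BACKENDS, cache)  # cache hit
print(first)
print(second)
```

The order of the list is where the evaluation set earns its place: a smaller model only belongs in the fallback chain if it has already been checked as good enough for the task.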

The Bottom Line

Start with five practical questions:

Which product actions trigger inference?
What is the cost per action today?
Which actions are real time, near real time, or asynchronous?
Which tasks could run on a smaller model, on device, or at the edge?
What happens if your preferred GPU region is constrained or expensive?

For many startups, the answer will be simple: keep building, measure more carefully, and do not over-engineer yet. For robotics and inference-heavy products, the answer may change the architecture earlier than expected.

The next two years of AI will not be shaped only by better models. They will also be shaped by whether there is enough power and data center capacity to run them. The startups that feel this early will not necessarily be the biggest ones. They will be the ones whose product depends on running models again and again in the real world.