Baseten’s $300 million financing round isn’t just another oversized AI check; it’s a clear signal that the center of gravity in artificial intelligence has shifted, likely for good. With IVP, CapitalG, and NVIDIA anchoring the round, the company now sits at a $5 billion valuation after its third fundraise in a single year, a pace that would have sounded absurd not long ago but now feels strangely logical. What investors are buying into here is not a model, not a chatbot, not a demo, but the plumbing of the next AI decade: inference. That’s the key word, the one quietly replacing “training” in boardrooms, cloud roadmaps, and startup pitch decks. Training made the headlines; inference will make the money. Baseten is positioning itself as the default place where real AI products actually run.
For six years, Baseten has been building in relative silence, focusing on something that was deeply unglamorous until suddenly it wasn’t: making models run fast, reliably, and cheaply in production. Today, its infrastructure sits behind companies like Cursor, Notion, Abridge, Clay, and OpenEvidence, firms that define the new generation of AI-native software. These are not toy apps; they are workflow systems, medical documentation engines, developer copilots, and knowledge tools that live or die by latency, uptime, and cost predictability. The quotes from customers almost read like a quiet revolt against the old cloud model: performance matters, yes, but reliability, developer experience, and cost discipline matter more once AI is embedded in daily operations. That’s where Baseten has been winning, not by being flashy, but by being boring in exactly the right way.
The timing of this round tells the real story. Analysts now estimate that inference will account for roughly two-thirds of all AI compute by the end of 2026, up from about one-third in 2023, a reversal that mirrors what happened in cloud computing once applications matured. Training creates possibility; inference creates businesses. Once models are deployed into real workflows, every millisecond of latency, every dropped request, and every unexpected cost spike becomes existential. Baseten’s entire thesis is built around that reality: it offers companies the ability to run many models, own their IP, and control their infrastructure without being locked into proprietary runtimes or hyperscaler whims. It’s a subtle but powerful shift away from dependence on a few massive foundation model vendors toward an ecosystem of thousands of specialized, domain-specific models, each optimized for a particular job.
What makes this round especially telling is who is backing it. NVIDIA’s involvement isn’t decorative; it’s strategic. Inference-heavy workloads mean sustained GPU demand over time, not just one-off training runs, and platforms like Baseten turn that demand into a durable revenue stream. CapitalG and IVP, meanwhile, are betting that inference platforms will become as foundational to this era as AWS was to Web 2.0: invisible but indispensable. When Baseten’s CEO says inference is to this generation what cloud was to the last, it no longer sounds like marketing fluff; it sounds like a description of what is already happening under the surface. The AI boom is maturing, quietly, operationally, and somewhat unromantically, but this is exactly where the next trillion-dollar layer gets built. Baseten isn’t chasing the spotlight; it’s wiring the stage, and that may turn out to be the smartest place to stand.