@EricLawton @maco @aredridel @futurebird @david_chisnall we don't know exactly how much it costs for the closed models; they may be selling at a loss, breaking even, or making a slight profit on inference. But you can tell exactly how much inference costs with open-weights models: you can run them on your own hardware and measure the cost of the hardware and power. There's also a competitive landscape of providers offering to run them. And open-weights models now lag the closed models by only a few months.
If the market consolidates down to only one or two leading players, then yes, it's possible for them to put a squeeze on the market and jack up prices. But right now it's a highly competitive market with very little stickiness: it's easy to move to a different provider if the one you're using raises prices. Each of OpenAI, Anthropic, Google, and xAI is regularly releasing frontier models that leapfrog each other on various benchmarks, and the Chinese labs are only a few months behind, generally releasing open-weights models, which are much easier to measure and build on top of. There's very little moat right now other than sheer capacity for training and inference.
And if we do get a consolidation and squeeze, I would expect it to come from jacking up prices, not from generating too many tokens. Right now inference is highly capacity-constrained; the people I work with who use these models regularly hit capacity limits all the time. These companies can't build out capacity fast enough to meet demand, so if anything they're motivated to make inference more efficient right now.
I have a lot of problems with the whole LLM industry, and in many ways I feel it's being rushed out before we're truly ready for all of the consequences, but it is genuinely in high demand right now.