Some report burning through their whole monthly "AI credit" allotment in a single day.
Last month I received an MS Teams message from my boss, asking who might want a subscription to Copilot. I'd already seen that they were going to start charging on a per-token basis, and had already concluded that LLMs are not reliable enough for the things I might want to have someone else do for me, so I passed on the offer. I wonder how much my colleagues who did ask for a subscription are actually using theirs. I have spending authority for the R&D money, so I think I need to call my colleague in the lab and see what it's costing her this month.
It's odd that my AI costs have not changed at all. They are still stubbornly stuck at $0.00.
Once a model, or a software program, is open source, and people start using the software and creating forks, I have never seen the software program "fade away". At worst it just goes on archive.org.
I remember when I used to be able to buy RAM. What a time that was to be alive.
You could torrent them or use other distributed-download systems. If, in this hypothetical, the big guys are cracking down on open-weights models to help armor their monopolies, then it seems like torrents would be basically necessary.
Hugging Face is VC-backed to the tune of several hundred million dollars as well as an undisclosed amount from Amazon and Meta. Its revenue is in the tens of millions of dollars, i.e. it’s incredibly unprofitable. Model weights run from tens to hundreds of gigabytes per download. Serving that amount of data is expensive. Some guy with a Patreon isn’t going to be able to host these models if the big boys decide they don’t want to share any more.
For general purposes, I think gpt-oss-20b is good enough. It's good enough to write letters, proofread posts, and do the things that most people use GenAI for. It can do some programming as well. The frontier models are more powerful, but use much more energy.
which is that the hyperscalers might "plateau" the capabilities of free models they release, while the frontline cloud models keep improving
Citation Needed.
Sorry, but this framing is disingenuous. Before, there were no agents doing actual work, and now there are.
That’s the rub.
The previous plans were not built for what these things are capable of doing now (aka building apps end to end)
Workers will be replaced by AI, businesses will adopt it and it will become essential for their processes, and then the price will increase dramatically until it’s almost/just as expensive as it used to be when meatbags did the work. Except now all that money is being siphoned into a handful of megacorporations instead of going to countless millions of workers in the form of salaries.
This was always the plan. Like Uber burning VC cash to subsidise rides until the taxi industry died, then jacking up the prices to reach profitability.
The billionaires and future trillionaires are no longer content with selling you products and services. They’re now literally coming for your entire salary.
Unfortunately, and as Nvidia and Micron and their kin are about to find out, when the gold rush runs its course, all that sweet money made selling tools, whisky and whores can stop very suddenly.
Traditionally, the best way to make money in a gold rush is to sell tools.
Sadly, tokens are what is used, and one token is not equal to one word.
I love how all these models are using "tokens" to obscure the actual dollar amount. Pretty great marketing when nobody is talking about a query being X amount of cash, but instead using a somewhat inscrutable digital token that people can't easily quantify.
Actually the worst spot is when the subscription (effectively) goes away, you are charged for usage, and you are paying several thousand dollars a month, which might be enough to justify hiring someone to control usage and minimize LLM costs.
I've never even come close to using my Claude Max allocation, despite pretty significant development work, plus quite a bit of chat, but I've been eyeing a 5080 or 5090 to move my coding locally. I have ~$140 monthly spend on AI tools, and even if I have to keep the v0 sub (it's a next app/site visual design tool), my payback period is under a year, maybe a few months more if I'm factoring in the electricity cost.
That said, even though performance, in terms of throughput, might not be noticeably affected, Claude is being improved constantly, while a local model puts the onus on me for upgrades and additional training, which certainly isn't nothing. I'm obviously cost sensitive, but I'm working on client projects that will eventually hit production, so quality matters a lot as well, and I can pass some, if not all, of the cost on to the client (each client currently has a $50/mo software fee to cover the various web services, software licenses, plugins, etc., for their sites, so bumping that a few bucks is always an option).
I just don't want to be late adapting when the other shoe drops; the worst spot to be in would be having no GPU and nothing configured and receiving an email that my Claude sub is now going to be $500 or $1000 a month.
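For anyone wanting to run the same payback arithmetic on their own numbers, here is a minimal sketch. Only the ~$140 monthly spend comes from the post above; the hardware price and the electricity figure are hypothetical placeholders, not quoted prices, and `payback_months` is just an illustrative helper name.

```python
def payback_months(hardware_cost: float,
                   monthly_savings: float,
                   monthly_electricity: float = 0.0) -> float:
    """Months until a one-time hardware purchase is paid back by avoided subscription spend."""
    net_savings = monthly_savings - monthly_electricity
    if net_savings <= 0:
        # The purchase never pays for itself if running costs eat all the savings.
        raise ValueError("monthly savings must exceed monthly electricity cost")
    return hardware_cost / net_savings


# ASSUMPTION: $1,500 for the GPU is a hypothetical placeholder, not a real quote.
# $140/month is the AI-tool spend mentioned in the post.
without_power = payback_months(1500, 140)
# ASSUMPTION: $15/month of extra electricity is also hypothetical.
with_power = payback_months(1500, 140, monthly_electricity=15)

print(f"ignoring electricity: {without_power:.1f} months")
print(f"including electricity: {with_power:.1f} months")
```

If some subscriptions have to stay (the v0 sub, for example), subtract them from `monthly_savings` first; the payback period stretches accordingly.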
So ... some tokens are more equal than others?
Sadly, tokens are what is used, and one token is not equal to one word.
I suspect DeepSeek V4 is orders of magnitude cheaper in part because it's still a text-based model, whereas all the US frontier models are MLLMs. It's a model that doesn't use all the bells and whistles, while our companies run ever more expensive models. Chinese MLLMs are cheaper but not on the scale of DeepSeek's latest price cut--see Kimi or GLM 5. Many US companies may have reservations about using any of these companies though...
But is one actually cheaper or more expensive than the other to run, as opposed to the currently-charged price?
That's the issue at question, I think. If AI 1 raises prices then people go to AI 2; AI 2 raises prices and they go to 3; etc. Do they eventually run out of "cheap" AI?
(Someone up-thread mentioned DeepSeek being cheaper per token, but I have to wonder if they're trying the same playbook with AI as with rare earths.)
If you assume electricity costs $0.15 per kWh, running something that draws 1 kW for a month costs roughly $100. That's less power than 2x H100. If doing inference on GPT-5 takes 10 GPUs, that's about $500/month. And that's just the electricity to run the GPUs: no operating costs, no cooling costs, for a single inference point.
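The arithmetic above can be sketched as follows. The $0.15/kWh price and the 10-GPU count come from the post; the 30-day month and the per-GPU power draws are assumptions (0.5 kW is what the post's "2x H100 is more than 1 kW" rounding implies, and 0.7 kW is an assumed peak rating for a high-end H100 board).

```python
HOURS_PER_MONTH = 24 * 30  # ASSUMPTION: a 30-day month, i.e. 720 hours


def monthly_electricity_cost(power_kw: float, price_per_kwh: float = 0.15) -> float:
    """Dollar cost of running a constant load of `power_kw` for one month."""
    return power_kw * HOURS_PER_MONTH * price_per_kwh


# 1 kW running all month at $0.15/kWh: the "roughly $100" in the post.
one_kw = monthly_electricity_cost(1.0)

# Ten GPUs at the ~0.5 kW each implied by the post's rounding.
ten_gpus_low = monthly_electricity_cost(10 * 0.5)

# ASSUMPTION: ten GPUs at 0.7 kW each, an assumed peak board power.
ten_gpus_high = monthly_electricity_cost(10 * 0.7)

print(f"1 kW for a month:       ${one_kw:.0f}")
print(f"10 GPUs at 0.5 kW each: ${ten_gpus_low:.0f}")
print(f"10 GPUs at 0.7 kW each: ${ten_gpus_high:.0f}")
```

Either way the result lands in the same ballpark as the post's $500/month, and it still excludes cooling, hardware depreciation, and staffing.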
This is what is going to pop the bubble: current AI usage is so subsidized that unless someone develops much better small models or more efficient agents, most of the current uses are simply not economically viable.
The key term in the post you quoted is "Local LLMs". Cloud LLMs are "better" than local LLMs today, but local LLMs have matched the course and speed of cloud LLMs pretty well, and there is no reason to believe this trend will not continue.