Wanted: Advice from CS teachers
-
@aredridel @futurebird @EricLawton @david_chisnall I’ve heard pricing on these is based on “tokens,” which I understand is in the tokenization/lex/yacc sense. I think that’s based on the number of tokens output, not input.
When it has to make two or three tries at generating code that actually compiles, does each attempt get charged, or just one?
@maco @aredridel @futurebird @EricLawton @david_chisnall In general, they charge for both input tokens and output tokens, at different rates. For example, Claude Opus 4.5 charges $5/million input tokens, and $25/million output tokens.
In order for an LLM to keep track of the context of the conversation/coding session, you need to feed the whole conversation in as input again each time, so you end up paying the input token rate many times over.
However, there's also caching. Since you're going to be putting the same conversation prefix in over and over again, it can cache the results of processing that in its attention system. Some providers just do caching automatically and roll that all into their pricing structure, some let you explicitly control caching by paying for certain conversations to be cached for 5 minutes or an hour. So then, you pay once for the input and once for the caching, and then you can keep using that prefix and appending to it.
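To make that concrete, here's a rough Python sketch of how per-turn costs add up, using the Opus rates above. The cache-read rate and the token counts are made-up illustrative numbers, not any provider's actual pricing:

INPUT_RATE = 5.00 / 1_000_000       # $ per input token (rate quoted above)
OUTPUT_RATE = 25.00 / 1_000_000     # $ per output token (rate quoted above)
CACHE_READ_RATE = 0.50 / 1_000_000  # assumed discounted rate for a cached prefix (illustrative)

def session_cost(turns: list[int], system_prompt: int = 2_000, cached: bool = False) -> float:
    """Every turn resends the whole conversation so far as input,
    then pays the output rate for the new reply."""
    cost = 0.0
    context = system_prompt                    # tokens already in the conversation
    for reply_tokens in turns:
        prefix_rate = CACHE_READ_RATE if cached else INPUT_RATE
        cost += context * prefix_rate          # re-reading the conversation so far
        cost += reply_tokens * OUTPUT_RATE     # the new reply
        context += reply_tokens                # the reply joins the next turn's input
    return cost

turns = [800, 1_200, 600, 1_500]  # output tokens per turn (made up)
print(f"without caching: ${session_cost(turns):.4f}")
print(f"with caching:    ${session_cost(turns, cached=True):.4f}")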
If you're paying like this by the token (which you do if you're just using it as an API user), then yeah, if it gets it wrong, you have to pay all over again for the tokens to correct it.
However, the LLM companies generally offer special plans for their coding tools, where you pay a fixed rate between $20 and $200/month, where you have a certain guaranteed quota but can use more than it if there's spare capacity, which can allow you to use more tokens for a lower price than if you just paid by the token. But of course it's not guaranteed; you can run out of quota and need to wait in line if the servers are busy.
And their tools handle all of that caching, selecting different models for different kinds of tasks, running external tools for deterministic results, etc.
-
Wanted: Advice from CS teachers
When #teaching a group of students new to coding I've noticed that my students who are normally very good about not calling out during class will shout "it's not working!" the moment their code hits an error and fails to run. They want me to fix it right away. This makes for too many interruptions since I'm easy to nerd snipe in this way.
I think I need to let them know that fixing errors that keep the code from running is literally what I'm trying to teach.
@futurebird @quixoticgeek if you can put a student's screen on the projector, maybe walk each one in turn through "read the error aloud, ask what it might mean, see which line it's complaining about, read that line aloud," and so on?
Could help reinforce that these problems can be worked through. Also changes the social calculus from "I cast Summon Help From Authority Figure!" That part could go either way, but handled well, it could build camaraderie ("see, we all make mistakes")...
-
Wanted: Advice from CS teachers
When #teaching a group of students new to coding I've noticed that my students who are normally very good about not calling out during class will shout "it's not working!" the moment their code hits an error and fails to run. They want me to fix it right away. This makes for too many interruptions since I'm easy to nerd snipe in this way.
I think I need to let them know that fixing errors that keep the code from running is literally what I'm trying to teach.
@futurebird I will venture an answer - I haven't read all the replies to your post so I may be saying things that are already said.
I'm a UK-based secondary (aka high school) maths teacher who also teaches CS, including introducing students to programming (usually python).
Here are some thoughts on things to try.
1. Practise finding errors. Give them code with errors and ask them to find them. Set problems to generate particular error messages, e.g. can they write code where the mistake is on line 3 but the error message says line 33? (There's a small example of this after this post.)
2. Have a list of general prompts that you will say. If the error is on a different line, just say "Take a look at line N" then walk away to the next student. If they've named a variable badly, say "variable names can't have spaces". Make them do some work here.
3. Clearly delimit demonstration time and coding time. "Fingers off keyboards and mice" is a common phrase in my classrooms (and I will stop the entire demo if I hear clicking).
4. Make them keep notes. I use Google Colab so that they can interweave notes with code snippets (at least in the early days) to encourage this.
5. Partner up. It's often easier to find the error in someone else's code than in your own. Could even have a "walk around time" when all the students go and look at others' screens to both get ideas and see if they can spot errors.
That's what springs to mind on reading your thread. I hope some of it's useful! As with all advice on the internet - keep what helps and ignore what doesn't.
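To illustrate point 1 with a contrived Python snippet: the actual mistake is the typo on the second line, but the error message points at the line inside the loop.

# The real mistake: "totl" is a typo for "total".
totl = 0

for n in [3, 1, 4, 1, 5]:
    total = total + n    # NameError: name 'total' is not defined -- reported HERE

print(total)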
-
That's hard, and so is figuring out the precursor of both code and test cases: the requirements.
I remember going to the US in the early days of Obamacare, for one State's new system to support it.
We had various experts representing different interests and they disagreed over so many points that I told them I would get them a neutral negotiations facilitator to help them figure things out, because I couldn't help until they were much closer to agreement.
@EricLawton Oh my goodness yes this 100.
Stolen from somewhere: there are 2 hard problems in software, (1) human communication and (2) convincing people that human communication is important
-
"If debugging is the process of removing bugs, then programming must be the process of putting them in." - Dijkstra
@wakame @Catfish_Man @futurebird The stat I heard was that you introduce two bugs for every line of code you write.
-
@maco @aredridel @futurebird @EricLawton @david_chisnall In general, they charge for both input tokens and output tokens, at different rates. For example, Claude Opus 4.5 charges $5/million input tokens, and $25/million output tokens.
In order for an LLM to keep track of the context of the conversation/coding session, you need to feed the whole conversation in as input again each time, so you end up paying the input token rate many times over.
However, there's also caching. Since you're going to be putting the same conversation prefix in over and over again, it can cache the results of processing that in its attention system. Some providers just do caching automatically and roll that all into their pricing structure, some let you explicitly control caching by paying for certain conversations to be cached for 5 minutes or an hour. So then, you pay once for the input and once for the caching, and then you can keep using that prefix and appending to it.
If you're paying like this by the token (which you do if you're just using it as an API user), then yeah, if it gets it wrong, you have to pay all over again for the tokens to correct it.
However, the LLM companies generally offer special plans for their coding tools, where you pay a fixed rate between $20 and $200/month, where you have a certain guaranteed quota but can use more than it if there's spare capacity, which can allow you to use more tokens for a lower price than if you just paid by the token. But of course it's not guaranteed; you can run out of quota and need to wait in line if the servers are busy.
And their tools handle all of that caching, selecting different models for different kinds of tasks, running external tools for deterministic results, etc.
Once there are few enough expert human programmers left, the price will go up.
And, if I read you correctly, they don't guarantee output accuracy with respect to input tokens but charge extra to try again.
And if they charge per output token, that is incentive to generate filler, certainly not to optimize.
-
@maco @futurebird @EricLawton @david_chisnall That's complex and ever-changing, as business tuning is wont to do.
You usually pay for input and output tokens both, and thinking is part of that. But most people are using plans that give them some sort of semi-metered time-based access — five hours of time with the model and a token limit within that. It's a strange system.
Tokens are roughly in the lex/yacc sense, but they're a new thing for LLMs. They're not precise parser tokens with parts of speech, but they are roughly "words". Not exactly, since language is morphologically complex, and programming languages carry semantics in other granules, but the idea that they're words is not wrongheaded.
Others are going flat-fee (for example, z.ai's hosted GLM-4.7 is a flat fee per month, and quite low).
(Also, that one is interesting because the figures for what it costs to operate are quite public. The model is public and the hardware requirement is about $15,000, so you can do the math pretty easily to see what the capital costs would be. Also environmental! Like that's three high-end server GPUs, so a lot of heat, but to humanize it, "less than heating a house" amounts of energy by far.)
@aredridel @maco @futurebird @EricLawton @david_chisnall Tokens tend to be common short words or common sequences of letters. They're usually derived from byte-pair encoding (BPE) over a large corpus.
Basically: take a text, count all the pairs of adjacent characters, replace the most common pair with a new symbol, and repeat until you reach the desired vocab size.
As a result, LLMs don't know how to *spell* and are blind to how long words are, etc. Reversing letters, for instance, is a hard task.
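A toy version of that merge loop in Python, for illustration (real tokenizers train on bytes over enormous corpora, but the idea is the same):

from collections import Counter

def train_bpe(text: str, num_merges: int):
    """Toy byte-pair-encoding-style training: repeatedly merge the most
    frequent adjacent pair of symbols into a single new symbol."""
    symbols = list(text)              # start from individual characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append(a + b)
        merged, i = [], 0
        while i < len(symbols):       # rebuild the sequence with the new symbol
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (a, b):
                merged.append(a + b)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return merges, symbols

merges, tokens = train_bpe("the theory of the thing", num_merges=5)
print(merges)   # frequent pairs like ('t','h') become single multi-character tokens
print(tokens)   # the text, now spelled in those larger tokens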
-
Once there are few enough expert human programmers left, the price will go up.
And, if I read you correctly, they don't guarantee output accuracy with respect to input tokens but charge extra to try again.
And if they charge per output token, that is incentive to generate filler, certainly not to optimize.
@EricLawton There have been 500,000 tech layoffs in the last few years. We've got no shortage of skilled tech knowledge for hire. At the pace we're going, there's no chance of a dwindling supply of programmers in my lifetime.
-
@aredridel @EricLawton @david_chisnall @maco
I've had so many people say "it knows how to write code now" as if this is somehow ... new and different from generating text. As if there has been some foundational advancement and not just the same tool applied again.
@futurebird @aredridel @EricLawton @david_chisnall @maco They have been improving the models' ability to write code, probably faster than they're improving on almost any other ability. They can do this via what's called reinforcement learning with verifiable rewards (RLVR), since with code it's possible to verify whether the result is correct or not (whether it compiles, whether it passes a particular test or test suite, etc.).
So while the pre-training is based on just predicting the next token in existing code bases, they can then make it better and better at coding by giving it problems to solve (get this code to compile, fix this bug, implement this feature, etc.), checking whether it succeeded, and applying positive or negative reinforcement based on the result.
And this can scale fairly easily; you can come up with whole classes of problems, like "implement this feature in <language X>" and vary the language while using the same test suite, and now you can train it to write all of those languages better.
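A rough sketch of what a "verifiable reward" can look like for a coding task: actually run the candidate code against a test suite and turn pass/fail into a reward signal. The file layout and the pytest call here are illustrative choices, not any particular lab's training pipeline:

import subprocess
import tempfile
from pathlib import Path

def verifiable_reward(candidate_code: str, test_code: str) -> float:
    """Return 1.0 if the generated solution passes its tests, else 0.0.
    No human judgement involved -- that's what makes the reward 'verifiable'.
    Assumes pytest is installed in the environment."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "solution.py").write_text(candidate_code)
        Path(tmp, "test_solution.py").write_text(test_code)
        try:
            result = subprocess.run(
                ["python", "-m", "pytest", "-q", "test_solution.py"],
                cwd=tmp, capture_output=True, timeout=60,
            )
        except subprocess.TimeoutExpired:
            return 0.0                 # infinite loops etc. count as failure
    return 1.0 if result.returncode == 0 else 0.0

# During training (very roughly): sample a solution from the model for a task,
# score it with verifiable_reward(sample, task_tests), and nudge the model
# toward the samples that scored 1.0.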
So while there are also improvements in the tooling, the models themselves have been getting quite a bit better at both writing correct code on the first try, and also figuring out what went wrong and fixing it when it doesn't work on the first try.
In fact, there are now open weights models (models that you can download and run on your own hardware, though for the biggest ones you really need thousands to tens of thousands of dollars of hardware to run the full model) which are competitive with the top tier closed models from just 6 months ago or so on coding tasks, in large part because of how effective RLVR is.
-
Wanted: Advice from CS teachers
When #teaching a group of students new to coding I've noticed that my students who are normally very good about not calling out during class will shout "it's not working!" the moment their code hits an error and fails to run. They want me to fix it right away. This makes for too many interruptions since I'm easy to nerd snipe in this way.
I think I need to let them know that fixing errors that keep the code from running is literally what I'm trying to teach.
@futurebird I'd respond with a few key questions:
- In what way is it not working?
- Why do you think that is?
- If you can see errors, what do they tell you?
- How can you find out more about what is or is not happening?
And there's the all-important "What are your assumptions, and are they correct?"
-
"Now I'm curious about whether LLMs' code compiles and executes error-free on their first attempt."
At first it did not, but they have added a routine to run it through a compiler until it at least runs without syntax errors and probably produces output that seems like what you asked for, at least for a limited example of input.
This is a bolted on extra check, not some improvement in the base LLM.
But some people are acting like it does represent advances in the LLM.
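Roughly the kind of bolted-on retry loop being described, sketched in Python; generate_code here is a stand-in for whatever model call the tool makes, not a real API:

import subprocess
import tempfile
from pathlib import Path

def generate_code(prompt: str) -> str:
    """Placeholder for the LLM call -- not a real API."""
    raise NotImplementedError

def until_it_compiles(prompt: str, max_attempts: int = 3) -> str | None:
    """Regenerate until the compiler stops complaining.
    This only guarantees the code builds, not that it's correct."""
    feedback = ""
    for _ in range(max_attempts):
        source = generate_code(prompt + feedback)
        with tempfile.TemporaryDirectory() as tmp:
            src = Path(tmp, "main.c")
            src.write_text(source)
            build = subprocess.run(
                ["cc", "-Wall", "-o", str(Path(tmp, "main")), str(src)],
                capture_output=True, text=True,
            )
        if build.returncode == 0:
            return source
        # feed the compiler's complaints back into the next attempt
        feedback = "\n\nThe previous attempt failed to compile:\n" + build.stderr
    return None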
@futurebird @EricLawton @david_chisnall
there are certain languages (such as C) in which that would be a cruel trick; lots of code containing subtle undefined behavior bugs that don't show easily will compile without errors, and often without warnings as well. Not all undefined behavior is detectable at compile time.
-
Once there are few enough expert human programmers left, the price will go up.
And, if I read you correctly, they don't guarantee output accuracy with respect to input tokens but charge extra to try again.
And if they charge per output token, that is incentive to generate filler, certainly not to optimize.
@EricLawton @maco @aredridel @futurebird @david_chisnall we don't know exactly how much it costs for the closed models; they may be selling at a loss, breaking even, or making a slight profit on inference. But you can tell exactly how much inference costs with open weights models: you can run them on your own hardware and measure the cost of the hardware and power. And there's a competitive landscape of providers offering to run them. And open weights models are only lagging behind the closed models by a few months now.
If the market consolidates down to only one or two leading players, then yes, it's possible for them to put a squeeze on the market and jack up prices. But right now it's a highly competitive market with very little stickiness; it's very easy to move to a different provider if the one you're using jacks up prices. Right now OpenAI, Anthropic, Google, and xAI are each regularly releasing frontier models that leapfrog each other on various benchmarks, and the Chinese labs are only a few months behind and generally release open weight models, which are much easier to measure and build on top of. There's very little moat right now other than sheer capacity for training and inference.
And I would expect, if we do get a consolidation and squeeze, it would just be by jacking up prices, not by generating too many tokens. Right now inference is highly constrained; those people I work with who use these models regularly hit capacity limitations all the time. These companies can't build out capacity fast enough to meet demand, so if anything they're motivated to make things more efficient right now.
I have a lot of problems with the whole LLM industry, and I feel like in many ways it's being rushed out before we're truly ready for all of the consequences, but it is actually quite in demand right now.
-
@EricLawton There have been 500,000 tech layoffs in the last few years. We've got no shortage of skilled tech knowledge for hire. At the pace we're going, there's no chance of a dwindling supply of programmers in my lifetime.
If you haven't been coding for a few years, you won't be a skilled programmer. It won't take a lifetime to run out of them.
-
@raganwald
The best, most succinct, explanation of the difference here came from @pluralistic:
Coding makes things run well, software engineering makes things fail well.
All meaningful software fails over time as it interacts with the real world and the real world changes, so handling failure cases well is important.
Handling these cases involves expanding one's context window to take into account a lot of different factors.
For LLMs, a linear increase in the context window results in a quadratic increase in processing. And the unit economics of LLMs sucks already without squaring the costs.
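Concretely: standard self-attention compares every token in the context with every other token, so an n-token context costs on the order of n² comparisons. Going from a 10,000-token context to a 100,000-token one is roughly 100x the attention work, not 10x.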
Which is why AI, in its current incarnation, is fundamentally not capable of creating good software. (I've heavily paraphrased, so apologies if he reads this).
-
Example of the problem:
Me: "OK everyone. Next we'll make this into a function so we can simply call it each time-"
Student 1: "It won't work." (student who wouldn't interrupt like this normally)
Student 2: "Mine's broken too!"
Student 3: "It says error. I have the EXACT same thing as you but it's not working."
This makes me feel overloaded and grouchy. Too many questions at once. What I want them to do is wait until the explanation is done and ask when I'm walking around. #CSEdu
@futurebird Wait until you teach them the "let it crash" philosophy of software engineering.
-
Example of the problem:
Me: "OK everyone. Next we'll make this into a function so we can simply call it each time-"
Student 1: "It won't work." (student who wouldn't interrupt like this normally)
Student 2: "Mine's broken too!"
Student 3: "It says error. I have the EXACT same thing as you but it's not working."
This makes me feel overloaded and grouchy. Too many questions at once. What I want them to do is wait until the explanation is done and ask when I'm walking around. #CSEdu
@futurebird one recommendation, a rule that worked when I was learning programming and my teacher didn't like it when I interrupted her: if you've got an issue because you're ahead of or behind others, wait till the teacher is available. Till then, muck around, debug, try random things.
-
So Your Code Won't Run
1. There *is* an error in your code. It's probably just a typo. You can find it by looking for it in a calm, systematic way.
2. The error will make sense. It's not random. The computer does not "just hate you"
3. Read the error message. The error message *tries* to help you, but it's just a computer, so YOUR HUMAN INTELLIGENCE may be needed to find the real source of the error. (There's an example below.)
4. Every programmer makes errors. Great programmers can find and fix them.
1/
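A concrete example for point 3, in Python. Say the program is:

numbers = [1, 2, 3]
print(numbers[3])

Running it prints something like:

Traceback (most recent call last):
  File "lists.py", line 2, in <module>
    print(numbers[3])
IndexError: list index out of range

The message names the kind of error and the line it happened on, which is where to start reading. But it won't tell you that the only valid indexes are 0, 1, and 2; that's the part your human intelligence has to supply.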
@futurebird
> 2. The error will make sense. It's not random. The computer does not "just hate you"
learning to have a constant faith in this has gotten me through so much shit that might otherwise have caused me to physically break something and give up forever
psychologically it's like "if you keep the spear pointed at the horse you will be safer than if you broke rank and ran" - you know logically that is what it is, but every second of it is screaming at you to ignore that understanding, and in the end what you train for will win out
-
Yeah...
what I'm trying to convey is that there is a *reason* why the code isn't working and it will make sense in the context of the rules the got dang computer is trying to follow.
It might be annoying or silly, but it will "make sense"
@futurebird @mansr constantly grumbling the whole time you're fixing the problem about the idiots who design $THING like that can be a helpful coping mechanism for some
-
@futurebird @mansr ...this just goes back to my whole thing about if maybe younger people have more learned helplessness about everything because more of their lives is dictated by arbitrary rules imposed on them by [EDIT: the invisible, untouchable people in some office somewhere who dictate] their cultural environment rather than the non-arbitrary rules of the physical world
no matter how dumb the rules of a sportball game get, the ball *must* move in certain ways in response to certain actions
that's not the case in a video game