Wanted: Advice from CS teachers
-
"Now I'm curious about whether LLMs' code compiles and executes error-free on their first attempt."
At first it did not, but they have added a routine that runs the code through a compiler until it at least runs without syntax errors and probably produces output that looks like what you asked for, on a limited set of example inputs.
This is a bolted on extra check, not some improvement in the base LLM.
But some people are acting like it does represent advances in the LLM.
@futurebird @EricLawton @david_chisnall @maco Are they though? The only sensible way to evaluate it is as a system — nobody uses the raw LLM, it's always through layers of API, tokenization, and now models or at least separate "trains of thought" leveraged against each other to refine the output. Using the tooling to conform output is a good hack to keep the systems able to deal with new things by using new tools instead of needing new training.
And it's not exactly an extra check — it's embedded in a feedback loop.
-
@aredridel @EricLawton @david_chisnall @maco
I've had so many people say "it knows how to write code now" as if this is somehow ... new and different from generating text. As if there has been some foundational advancement and not just the same tool applied again.
-
@futurebird @EricLawton @david_chisnall @maco Yeah. And it really just is more and more precise force of the same sort. It does, however, end up at a qualitatively different place, with different impacts on the practice of programming work itself because of it.
-
@aredridel @futurebird @EricLawton @david_chisnall I’ve heard pricing on these is based on “tokens,” which I understand is in the tokenization/lex/yacc sense. I think that’s based on the number of tokens output, not input.
When it has to make two or three tries at generating code that actually compiles, does each attempt get charged, or just one?
-
@wakame @futurebird can confirm, I work on the standard library for a major programming language and my working assumption is “you can tell I’m writing a bug because my hands are moving”.
Which is why we have tens of thousands of tests and multiple code reviewers and elaborate compiler checking and teams of people dedicated to making sure everyone else’s code that uses my code still works and everyone “dogfoods” the changes and and and… stuff still slips through every once in a while.
@wakame @futurebird when I’m feeling extra dramatic I sometimes think of Diane Duane’s “So You Want To Be A Wizard”; we’re in a war against entropy itself, a sandcastle versus the tide, utterly unwinnable. And yet we show up every day and carve out a little place where something is Better for the people relying on our work. A bug fixed here, a tiny efficiency win multiplied by running a trillion times a day there.
“To these ends, in the practice of my Art, I will put aside fear for courage…”
-
@maco @aredridel @EricLawton @david_chisnall
I've only ever used free offers including a few I experienced in workshops, so I don't know about the pricing.
-
Wanted: Advice from CS teachers
When #teaching a group of students new to coding I've noticed that my students who are normally very good about not calling out during class will shout "it's not working!" the moment their code hits an error and fails to run. They want me to fix it right away. This makes for too many interruptions since I'm easy to nerd snipe in this way.
I think I need to let them know that fixing errors that keep the code from running is literally what I'm trying to teach.
treat it like a video game. each error is a life but you have to burn all your lives to get assistance. start with 5 lives?
-
And if you're paying for it, there is an implied warranty that you'll get what you paid for.
Oh well, disputes will be settled using lawyers with LLMs.
Which will further normalise the occupation of society by these corporate spokesbots.
-
@futurebird
I teach intro courses at a university, which could be a little different but probably not completely. We often do "live coding" in class, with either me or a student at the keyboard while we solve a problem. Regardless of whether it's me or a student "driving", there are always lots of errors to fix, so it's an opportunity to model error-fixing as a normal, expected, creative activity. I always thank students for pointing out my boo-boos, which are plentiful.
-
@maco @futurebird @EricLawton @david_chisnall That's complex and ever-changing, as business tuning is wont to do.
You usually pay for input and output tokens both, and thinking is part of that. But most people are using plans that give them some sort of semi-metered time-based access — five hours of time with the model and a token limit within that. It's a strange system.
Tokens are roughly in the lex/yacc sense, but they're a new thing, built for LLMs. They're not precise parser tokens with parts of speech, but they are roughly "words". Not exactly, since language is morphologically complex, and programming languages carry semantics in other granules, but the idea that they're words is not wrongheaded.
Others are going flat-fee (e.g., z.ai's hosted GLM-4.7 is a flat fee per month, and quite low).
(That one is also interesting because its cost-to-operate figures are quite public. The model is public and the hardware requirements come to about $15,000, so you can do the math pretty easily to see what the capital costs would be. Also environmental! That's 3 high-end server GPUs, so a lot of heat, but to humanize it, far "less than heating a house" amounts of energy.)
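To make that "do the math" concrete, here's a back-of-envelope sketch. Only the $15,000 hardware figure comes from the thread; the depreciation period, power draw, and electricity price are invented round numbers for illustration:

```python
# Back-of-envelope capital and energy costs for self-hosted inference.
# Only the hardware price is from the thread; everything else here is
# an illustrative assumption.

HARDWARE_COST = 15_000    # dollars (figure from the thread)
LIFESPAN_MONTHS = 36      # assumption: 3-year depreciation
POWER_KW = 2.0            # assumption: ~3 server GPUs under load
PRICE_PER_KWH = 0.15      # assumption: electricity price in $/kWh
HOURS_PER_MONTH = 730     # average hours in a month

capital_per_month = HARDWARE_COST / LIFESPAN_MONTHS
energy_per_month = POWER_KW * HOURS_PER_MONTH * PRICE_PER_KWH

print(f"capital:     ${capital_per_month:.0f}/month")   # capital:     $417/month
print(f"electricity: ${energy_per_month:.0f}/month")    # electricity: $219/month
```

The exact numbers don't matter much; the point is that once the model exists, running it is in "small business expense" territory, which is why the amortized training cost dominates the pricing question.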
-
@maco @aredridel @EricLawton @david_chisnall
I strongly suspect they are vastly undercharging and counting on building dependency and jacking up the prices later.
The free workshops seem to be all about that and didn't impress me much. But, I did get to play with the tech so I could better understand it which was worth it despite all of the sales pitch infused through the process.
-
@futurebird @aredridel @EricLawton @david_chisnall oh yes, I had the impression there was some of that going on. Some of the services did jack their prices up sometime a year or two ago; I remember there being sticker shock. I think people expected what was free at first to go to like $20/mo and it actually went much higher.
But I have no details. I haven’t used any of it.
-
@futurebird @maco @EricLawton @david_chisnall Oof. A sales pitch embedded in it sounds miiiiiserable.
As far as pricing ... man, it's hard to tell. The training of models is very expensive and energy-consuming. That has to be amortized somehow. But the actual running takes only a little more than 'home computer' levels of power (and cruddier models do run on home-computer-scale hardware).
-
@futurebird not a cs teacher, so feel free to disregard, but maybe you could split lectures up with students just taking notes during some examples, and following along with others? My favorite coding professor also often intentionally put common errors into the examples he was doing, then asked the class what needed to be done to fix them
-
@futurebird I remember reading that the current prices are only about 1% of what would be needed to cover the costs.
And that's not yet including the increased energy prices caused by the CO2 tax increases now coming. (Not that any of the big companies would pay this.)
-
I'm kind of shocked that functions are hard. Are they hard for students who understand functions in the context of mathematics?
@futurebird@sauropods.win Yes, I would say so. Functions in math are different from functions in code. Mathematical functions look more like lookup tables or dictionaries. One sticking point is the flow of control: a function has a block of instructions that are not executed at the point where they're written in the source code. This is really confusing for some people, especially if they've just been taught that computers go through a list of instructions one by one, executing each in sequence.
Add in functions that have side effects, functions that don't return a value (procedures), functions that trap the rest of the execution (continuations), etc., and you're well outside of what most people understand mathematical functions to be like. The mathematical sine function can't make a network connection or write to a file or...
You can sometimes suss this out by comparing a function to a dictionary (or similar lookup type data structure). Those don't involve changes in the flow of control, and students tend to grasp what they're doing much faster. Students who grasp dictionaries sometimes cannot transfer that understanding to functions because of the flow of control issue, I think, so it can be helpful to probe whether they understand one but not the other and try to figure out why.
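The dictionary-versus-function comparison above can be sketched in a few lines of Python (the Roman-numeral mapping is just an illustrative example):

```python
# The same mapping expressed twice: once as data, once as code.

# Mapping as data: built right here, in normal top-to-bottom execution.
roman = {1: "I", 2: "II", 3: "III"}

# Mapping as code: this body is NOT executed where it's written; it
# only runs when to_roman(...) is called, and control then jumps back.
def to_roman(n):
    return {1: "I", 2: "II", 3: "III"}[n]

print(roman[2])     # plain lookup, no change in the flow of control
print(to_roman(2))  # control jumps into the function body and returns

# Unlike a mathematical function, a code function can also have side
# effects -- e.g. writing to a file on the way to returning a value:
def to_roman_logged(n):
    with open("log.txt", "a") as f:  # side effect: touches the filesystem
        f.write(f"looked up {n}\n")
    return to_roman(n)
```

Both lookups print "II", but only the second one transfers control away from the point of use, which is exactly the sticking point described above.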
-
"If debugging is the process of removing bugs, then programming must be the process of putting them in." - Dijkstra
-
Example of the problem:
Me: "OK everyone. Next we'll make this into a function so we can simply call it each time-"
Student 1: "It won't work." (student who wouldn't interrupt like this normally)
Student 2: "Mine's broken too!"
Student 3: "It says error. I have the EXACT same thing as you but it's not working."
This makes me feel overloaded and grouchy. Too many questions at once. What I want them to do is wait until the explanation is done and ask when I'm walking around. #CSEdu
@futurebird Not much to add you haven't already thought of, but I agree with a lot of what you said and feel your frustration. If I need to keep going with the instruction like in your example (turning into a function next), I would tell the students with errors to shift to copying the new stuff down as notes, so they don't compound multiple errors if they continue to try coding along with me, ...
-
@futurebird ...then during the walk-around time, have them "ask three before me" so they can practice fixing each other's errors. I also like your idea (and should do more of it myself) of giving them code with errors as a warm up, and asking them to think about how to fix.
-
@maco @aredridel @futurebird @EricLawton @david_chisnall In general, they charge for both input tokens and output tokens, at different rates. For example, Claude Opus 4.5 charges $5/million input tokens, and $25/million output tokens.
In order for an LLM to keep track of the context of the conversation/coding session, you need to feed the whole conversation in as input again each time, so you end up paying the input token rate many times over.
However, there's also caching. Since you're going to be putting the same conversation prefix in over and over again, it can cache the results of processing that in its attention system. Some providers just do caching automatically and roll that all into their pricing structure, some let you explicitly control caching by paying for certain conversations to be cached for 5 minutes or an hour. So then, you pay once for the input and once for the caching, and then you can keep using that prefix and appending to it.
If you're paying like this by the token (which you do if you're just using it as an API user), then yeah, if it gets it wrong, you have to pay all over again for the tokens to correct it.
However, the LLM companies generally offer special plans for their coding tools: you pay a fixed rate between $20 and $200/month for a certain guaranteed quota, and you can use more than that if there's spare capacity, which can work out to more tokens for a lower price than paying by the token. But of course it's not guaranteed; you can run out of quota and need to wait in line if the servers are busy.
And their tools handle all of that caching, selecting different models for different kinds of tasks, running external tools for deterministic results, etc.
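The original retry question can be made concrete with some hedged arithmetic. The rates are the Opus 4.5 numbers quoted above; the turn sizes are made-up round numbers, and caching discounts are ignored entirely:

```python
# Illustrative per-token cost arithmetic. Rates from the post above:
# $5 per million input tokens, $25 per million output tokens.
INPUT_RATE = 5 / 1_000_000    # dollars per input token
OUTPUT_RATE = 25 / 1_000_000  # dollars per output token

def conversation_cost(turns):
    """Each turn re-sends the whole history as input, so total input
    cost grows faster than linearly with conversation length."""
    history = 0
    total = 0.0
    for prompt_tokens, reply_tokens in turns:
        history += prompt_tokens
        total += history * INPUT_RATE       # the full history is re-fed as input
        total += reply_tokens * OUTPUT_RATE
        history += reply_tokens             # the reply joins the history
    return total

# Three tries at getting code to compile: each retry pays for the
# whole transcript again, plus the new output (sizes are assumptions).
turns = [(2_000, 1_000)] * 3
print(f"${conversation_cost(turns):.4f}")   # $0.1500
```

So yes: without caching, every failed attempt is billed, and the failed attempts also inflate the input cost of every later turn.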