Wanted: Advice from CS teachers
-
@aredridel @futurebird @EricLawton @david_chisnall I’ve heard pricing on these is based on “tokens,” which I understand is in the tokenization/lex/yacc sense. I think that’s based on the number of tokens output, not input.
When it has to make two or three tries at generating code that actually compiles, does each attempt get charged, or just one?
@maco @aredridel @futurebird @EricLawton @david_chisnall In general, they charge for both input tokens and output tokens, at different rates. For example, Claude Opus 4.5 charges $5/million input tokens, and $25/million output tokens.
In order for an LLM to keep track of the context of the conversation/coding session, you need to feed the whole conversation in as input again each time, so you end up paying the input token rate many times over.
However, there's also caching. Since you're going to be putting the same conversation prefix in over and over again, it can cache the results of processing that in its attention system. Some providers just do caching automatically and roll that all into their pricing structure, some let you explicitly control caching by paying for certain conversations to be cached for 5 minutes or an hour. So then, you pay once for the input and once for the caching, and then you can keep using that prefix and appending to it.
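To make that concrete, here's a rough Python sketch of how per-turn costs add up, using the Opus rates above. The cache-read rate and the token counts are made-up illustrative numbers, not any provider's actual pricing:

INPUT_RATE = 5.00 / 1_000_000       # $ per input token (rate quoted above)
OUTPUT_RATE = 25.00 / 1_000_000     # $ per output token (rate quoted above)
CACHE_READ_RATE = 0.50 / 1_000_000  # assumed discounted rate for a cached prefix (illustrative)

def session_cost(turns: list[int], system_prompt: int = 2_000, cached: bool = False) -> float:
    """Every turn resends the whole conversation so far as input,
    then pays the output rate for the new reply."""
    cost = 0.0
    context = system_prompt                    # tokens already in the conversation
    for reply_tokens in turns:
        prefix_rate = CACHE_READ_RATE if cached else INPUT_RATE
        cost += context * prefix_rate          # re-reading the conversation so far
        cost += reply_tokens * OUTPUT_RATE     # the new reply
        context += reply_tokens                # the reply joins the next turn's input
    return cost

turns = [800, 1_200, 600, 1_500]  # output tokens per turn (made up)
print(f"without caching: ${session_cost(turns):.4f}")
print(f"with caching:    ${session_cost(turns, cached=True):.4f}")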
If you're paying like this by the token (which you do if you're just using it as an API user), then yeah, if it gets it wrong, you have to pay all over again for the tokens to correct it.
However, the LLM companies generally offer special plans for their coding tools, where you pay a fixed rate between $20 and $200/month, where you have a certain guaranteed quota but can use more than it if there's spare capacity, which can allow you to use more tokens for a lower price than if you just paid by the token. But of course it's not guaranteed; you can run out of quota and need to wait in line if the servers are busy.
And their tools handle all of that caching, selecting different models for different kinds of tasks, running external tools for deterministic results, etc.
-
Wanted: Advice from CS teachers
When #teaching a group of students new to coding I've noticed that my students who are normally very good about not calling out during class will shout "it's not working!" the moment their code hits an error and fails to run. They want me to fix it right away. This makes for too many interruptions since I'm easy to nerd snipe in this way.
I think I need to let them know that fixing errors that keep the code from running is literally what I'm trying to teach.
@futurebird @quixoticgeek if you can put a student's screen on the projector, maybe walk each one in turn through "read the error aloud, ask what it might mean, see which line it's complaining about, read that line aloud," and so on?
Could help reinforce that these problems can be worked through. Also changes the social calculus from "I cast Summon Help From Authority Figure!" That part could go either way, but handled well, it could build camaraderie ("see, we all make mistakes")...
-
Wanted: Advice from CS teachers
When #teaching a group of students new to coding I've noticed that my students who are normally very good about not calling out during class will shout "it's not working!" the moment their code hits an error and fails to run. They want me to fix it right away. This makes for too many interruptions since I'm easy to nerd snipe in this way.
I think I need to let them know that fixing errors that keep the code from running is literally what I'm trying to teach.
@futurebird I will venture an answer - I haven't read all the replies to your post so I may be saying things that are already said.
I'm a UK-based secondary (aka high school) maths teacher who also teaches CS, including introducing students to programming (usually python).
Here are some thoughts on things to try.
1. Practise finding errors. Give them code with errors and ask them to find them. Set problems to generate particular error messages, e.g. can they write code where the mistake is on line 3 but the error message says line 33? (There's a small example of this after this post.)
2. Have a list of general prompts that you will say. If the error is on a different line, just say "Take a look at line N" then walk away to the next student. If they've named a variable badly, say "variable names can't have spaces". Make them do some work here.
3. Clearly delimit demonstration time and coding time. "Fingers off keyboards and mice" is a common phrase in my classrooms (and I will stop the entire demo if I hear clicking).
4. Make them keep notes. I use Google Colab so that they can interweave notes with code snippets (at least in the early days) to encourage this.
5. Partner up. It's often easier to find the error in someone else's code than in your own. Could even have a "walk around time" when all the students go and look at others' screens to both get ideas and see if they can spot errors.
That's what springs to mind on reading your thread. I hope some of it's useful! As with all advice on the internet - keep what helps and ignore what doesn't.
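To illustrate point 1 with a contrived Python snippet: the actual mistake is the typo on the second line, but the error message points at the line inside the loop.

# The real mistake: "totl" is a typo for "total".
totl = 0

for n in [3, 1, 4, 1, 5]:
    total = total + n    # NameError: name 'total' is not defined -- reported HERE

print(total)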
-
That's hard, and so is figuring out the precursor of both code and test cases: the requirements.
I remember going to the US in the early days of Obamacare, for one State's new system to support it.
We had various experts representing different interests and they disagreed over so many points that I told them I would get them a neutral negotiations facilitator to help them figure things out, because I couldn't help until they were much closer to agreement.
@EricLawton Oh my goodness yes this 100.
Stolen from somewhere: there are 2 hard problems in software, (1) human communication and (2) convincing people that human communication is important
-
"If debugging is the process of removing bugs, then programming must be the process of putting them in." - Dijkstra
@wakame @Catfish_Man @futurebird The stat I heard was that you introduce two bugs for every line of code you write.
-
@maco @aredridel @futurebird @EricLawton @david_chisnall In general, they charge for both input tokens and output tokens, at different rates. For example, Claude Opus 4.5 charges $5/million input tokens, and $25/million output tokens.
In order for an LLM to keep track of the context of the conversation/coding session, you need to feed the whole conversation in as input again each time, so you end up paying the input token rate many times over.
However, there's also caching. Since you're going to be putting the same conversation prefix in over and over again, it can cache the results of processing that in its attention system. Some providers just do caching automatically and roll that all into their pricing structure, some let you explicitly control caching by paying for certain conversations to be cached for 5 minutes or an hour. So then, you pay once for the input and once for the caching, and then you can keep using that prefix and appending to it.
If you're paying like this by the token (which you do if you're just using it as an API user), then yeah, if it gets it wrong, you have to pay all over again for the tokens to correct it.
However, the LLM companies generally offer special plans for their coding tools, where you pay a fixed rate between $20 and $200/month, where you have a certain guaranteed quota but can use more than it if there's spare capacity, which can allow you to use more tokens for a lower price than if you just paid by the token. But of course it's not guaranteed; you can run out of quota and need to wait in line if the servers are busy.
And their tools handle all of that caching, selecting different models for different kinds of tasks, running external tools for deterministic results, etc.
Once there are few enough expert human programmers left, the price will go up.
And, if I read you correctly, they don't guarantee output accuracy with respect to input tokens but charge extra to try again.
And if they charge per output token, that is incentive to generate filler, certainly not to optimize.
-
@maco @futurebird @EricLawton @david_chisnall That's complex and ever-changing, as business tuning is wont to do.
You usually pay for input and output tokens both, and thinking is part of that. But most people are using plans that give them some sort of semi-metered time-based access — five hours of time with the model and a token limit within that. It's a strange system.
Tokens are roughly in the lex/yacc sense, but they're a new thing for LLMs. They're not precise parser tokens with parts of speech, but they are roughly "words". Not exactly, since language is morphologically complex, and programming languages carry semantics in other granules, but the idea that they're words is not wrongheaded.
Others are going flat-fee (for example, z.ai's hosted GLM-4.7 is a flat fee per month, and quite low).
(Also, that one is interesting because the figures for what it costs to operate are quite public. The model is public and the hardware requirement is about $15,000, so you can do the math pretty easily to see what the capital costs would be. Also environmental! Like that's three high-end server GPUs, so a lot of heat, but to humanize it, "less than heating a house" amounts of energy by far.)
@aredridel @maco @futurebird @EricLawton @david_chisnall Tokens tend to be common short words or common sequences of letters. They're usually derived from byte-pair encoding (BPE) over a large corpus.
Basically: take a text, count all the pairs of adjacent characters, replace the most common pair with a new symbol, and repeat until you reach the desired vocab size.
As a result, LLMs don't know how to *spell* and are blind to how long words are, etc. Reversing letters, for instance, is a hard task.
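A toy version of that merge loop in Python, for illustration (real tokenizers train on bytes over enormous corpora, but the idea is the same):

from collections import Counter

def train_bpe(text: str, num_merges: int):
    """Toy byte-pair-encoding-style training: repeatedly merge the most
    frequent adjacent pair of symbols into a single new symbol."""
    symbols = list(text)              # start from individual characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append(a + b)
        merged, i = [], 0
        while i < len(symbols):       # rebuild the sequence with the new symbol
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (a, b):
                merged.append(a + b)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return merges, symbols

merges, tokens = train_bpe("the theory of the thing", num_merges=5)
print(merges)   # frequent pairs like ('t','h') become single multi-character tokens
print(tokens)   # the text, now spelled in those larger tokens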
-
Once there are few enough expert human programmers left, the price will go up.
And, if I read you correctly, they don't guarantee output accuracy with respect to input tokens but charge extra to try again.
And if they charge per output token, that is incentive to generate filler, certainly not to optimize.
@EricLawton There have been 500,000 tech layoffs in the last few years. We've got no shortage of skilled tech knowledge for hire. At the pace we're going, there's no chance of a dwindling supply of programmers in my lifetime.
-
@aredridel @EricLawton @david_chisnall @maco
I've had so many people say "it knows how to write code now" as if this is somehow ... new and different from generating text. As if there has been some foundational advancement and not just the same tool applied again.
@futurebird @aredridel @EricLawton @david_chisnall @maco They have been improving the models' ability to write code, probably faster than they're improving on almost any other ability. They can do this via what's called reinforcement learning with verifiable rewards (RLVR), since with code it's possible to verify whether the result is correct or not (whether it compiles, whether it passes a particular test or test suite, etc.).
So while the pre-training is based on just predicting the next token in existing code bases, they can then make it better and better at coding by giving it problems to solve (get this code to compile, fix this bug, implement this feature, etc.), checking whether it succeeded, and applying positive or negative reinforcement based on the result.
And this can scale fairly easily; you can come up with whole classes of problems, like "implement this feature in <language X>" and vary the language while using the same test suite, and now you can train it to write all of those languages better.
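A rough sketch of what a "verifiable reward" can look like for a coding task: actually run the candidate code against a test suite and turn pass/fail into a reward signal. The file layout and the pytest call here are illustrative choices, not any particular lab's training pipeline:

import subprocess
import tempfile
from pathlib import Path

def verifiable_reward(candidate_code: str, test_code: str) -> float:
    """Return 1.0 if the generated solution passes its tests, else 0.0.
    No human judgement involved -- that's what makes the reward 'verifiable'.
    Assumes pytest is installed in the environment."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "solution.py").write_text(candidate_code)
        Path(tmp, "test_solution.py").write_text(test_code)
        try:
            result = subprocess.run(
                ["python", "-m", "pytest", "-q", "test_solution.py"],
                cwd=tmp, capture_output=True, timeout=60,
            )
        except subprocess.TimeoutExpired:
            return 0.0                 # infinite loops etc. count as failure
    return 1.0 if result.returncode == 0 else 0.0

# During training (very roughly): sample a solution from the model for a task,
# score it with verifiable_reward(sample, task_tests), and nudge the model
# toward the samples that scored 1.0.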
So while there are also improvements in the tooling, the models themselves have been getting quite a bit better at both writing correct code on the first try, and also figuring out what went wrong and fixing it when it doesn't work on the first try.
In fact, there are now open weights models (models that you can download and run on your own hardware, though for the biggest ones you really need thousands to tens of thousands of dollars of hardware to run the full model) which are competitive with the top tier closed models from just 6 months ago or so on coding tasks, in large part because of how effective RLVR is.
-
Wanted: Advice from CS teachers
When #teaching a group of students new to coding I've noticed that my students who are normally very good about not calling out during class will shout "it's not working!" the moment their code hits an error and fails to run. They want me to fix it right away. This makes for too many interruptions since I'm easy to nerd snipe in this way.
I think I need to let them know that fixing errors that keep the code from running is literally what I'm trying to teach.
@futurebird I'd respond with a few key questions:
- In what way is it not working?
- Why do you think that is?
- If you can see errors, what do they tell you?
- How can you find out more about what is or is not happening?
And there's the all-important "What are your assumptions, and are they correct?"
-
"Now I'm curious about whether LLMs' code compiles and executes error-free on their first attempt."
At first it did not, but they have added a routine to run it through a compiler until it at least runs without syntax errors and probably produces output that seems like what you asked for, at least for a limited example of input.
This is a bolted on extra check, not some improvement in the base LLM.
But some people are acting like it does represent advances in the LLM.
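Roughly the kind of bolted-on retry loop being described, sketched in Python; generate_code here is a stand-in for whatever model call the tool makes, not a real API:

import subprocess
import tempfile
from pathlib import Path

def generate_code(prompt: str) -> str:
    """Placeholder for the LLM call -- not a real API."""
    raise NotImplementedError

def until_it_compiles(prompt: str, max_attempts: int = 3) -> str | None:
    """Regenerate until the compiler stops complaining.
    This only guarantees the code builds, not that it's correct."""
    feedback = ""
    for _ in range(max_attempts):
        source = generate_code(prompt + feedback)
        with tempfile.TemporaryDirectory() as tmp:
            src = Path(tmp, "main.c")
            src.write_text(source)
            build = subprocess.run(
                ["cc", "-Wall", "-o", str(Path(tmp, "main")), str(src)],
                capture_output=True, text=True,
            )
        if build.returncode == 0:
            return source
        # feed the compiler's complaints back into the next attempt
        feedback = "\n\nThe previous attempt failed to compile:\n" + build.stderr
    return None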
@futurebird @EricLawton @david_chisnall
there are certain languages (such as C) in which that would be a cruel trick; lots of code containing subtle undefined behavior bugs that don't show easily will compile without errors, and often without warnings as well. Not all undefined behavior is detectable at compile time.
-
Once there are few enough expert human programmers left, the price will go up.
And, if I read you correctly, they don't guarantee output accuracy with respect to input tokens but charge extra to try again.
And if they charge per output token, that is incentive to generate filler, certainly not to optimize.
@EricLawton @maco @aredridel @futurebird @david_chisnall we don't know exactly how much it costs for the closed models; they may be selling at a loss, breaking even, or making a slight profit on inference. But you can tell exactly how much inference costs with open weights models: you can run them on your own hardware and measure the cost of the hardware and power. And there's a competitive landscape of providers offering to run them. And open weights models are only lagging behind the closed models by a few months now.
If the market consolidates down to only one or two leading players, then yes, it's possible for them to put a squeeze on the market and jack up prices. But right now it's a highly competitive market with very little stickiness; it's very easy to move to a different provider if the one you're using jacks up prices. Right now OpenAI, Anthropic, Google, and xAI are each regularly releasing frontier models that leapfrog each other on various benchmarks, and the Chinese labs are only a few months behind and generally release open weight models, which are much easier to measure and build on top of. There's very little moat right now other than sheer capacity for training and inference.
And I would expect, if we do get a consolidation and squeeze, it would just be by jacking up prices, not by generating too many tokens. Right now inference is highly constrained; those people I work with who use these models regularly hit capacity limitations all the time. These companies can't build out capacity fast enough to meet demand, so if anything they're motivated to make things more efficient right now.
I have a lot of problems with the whole LLM industry, and I feel like in many ways it's being rushed out before we're truly ready for all of the consequences, but it is actually quite in demand right now.
-
@EricLawton There have been 500,000 tech layoffs in the last few years. We've got no shortage of skilled tech knowledge for hire. At the pace we're going, there's no chance of a dwindling supply of programmers in my lifetime.
If you haven't been coding for a few years, you won't be a skilled programmer. It won't take a lifetime to run out of them.
-
@raganwald
The best, most succinct, explanation of the difference here came from @pluralistic:
Coding makes things run well, software engineering makes things fail well.
All meaningful software fails over time as it interacts with the real world and the real world changes, so handling failure cases well is important.
Handling these cases involves expanding one's context window to take into account a lot of different factors.
For LLMs, a linear increase in the context window results in a quadratic increase in processing. And the unit economics of LLMs sucks already without squaring the costs.
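Concretely: standard self-attention compares every token in the context with every other token, so an n-token context costs on the order of n² comparisons. Going from a 10,000-token context to a 100,000-token one is roughly 100x the attention work, not 10x.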
Which is why AI, in its current incarnation, is fundamentally not capable of creating good software. (I've heavily paraphrased, so apologies if he reads this).
-
Example of the problem:
Me: "OK everyone. Next we'll make this into a function so we can simply call it each time-"
Student 1: "It won't work." (student who wouldn't interrupt like this normally)
Student 2: "Mine's broken too!"
Student 3: "It says error. I have the EXACT same thing as you but it's not working."
This makes me feel overloaded and grouchy. Too many questions at once. What I want them to do is wait until the explanation is done and ask when I'm walking around. #CSEdu
@futurebird Wait until you teach them the "let it crash" philosophy of software engineering.
-
Example of the problem:
Me: "OK everyone. Next we'll make this into a function so we can simply call it each time-"
Student 1: "It won't work." (student who wouldn't interrupt like this normally)
Student 2: "Mine's broken too!"
Student 3: "It says error. I have the EXACT same thing as you but it's not working."
This makes me feel overloaded and grouchy. Too many questions at once. What I want them to do is wait until the explanation is done and ask when I'm walking around. #CSEdu
@futurebird one recommendation, a rule that worked when I was learning programming and my teacher didn't like it when I interrupted her: if you've got an issue because you're ahead of or behind others, wait till the teacher is available. Till then, muck around, debug, try random things.
-
So Your Code Won't Run
1. There *is* an error in your code. It's probably just a typo. You can find it by looking for it in a calm, systematic way.
2. The error will make sense. It's not random. The computer does not "just hate you"
3. Read the error message. The error message *tries* to help you, but it's just a computer, so YOUR HUMAN INTELLIGENCE may be needed to find the real source of the error. (There's an example below.)
4. Every programmer makes errors. Great programmers can find and fix them.
1/
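A concrete example for point 3, in Python. Say the program is:

numbers = [1, 2, 3]
print(numbers[3])

Running it prints something like:

Traceback (most recent call last):
  File "lists.py", line 2, in <module>
    print(numbers[3])
IndexError: list index out of range

The message names the kind of error and the line it happened on, which is where to start reading. But it won't tell you that the only valid indexes are 0, 1, and 2; that's the part your human intelligence has to supply.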
@futurebird
> 2. The error will make sense. It's not random. The computer does not "just hate you"
learning to have a constant faith in this has gotten me through so much shit that might otherwise have caused me to physically break something and give up forever
psychologically it's like "if you keep the spear pointed at the horse you will be safer than if you broke rank and ran" - you know logically that is what it is, but every second of it is screaming at you to ignore that understanding, and in the end what you train for will win out
-
Yeah...
what I'm trying to convey is that there is a *reason* why the code isn't working and it will make sense in the context of the rules the got dang computer is trying to follow.
It might be annoying or silly, but it will "make sense"
@futurebird @mansr constantly grumbling the whole time you're fixing the problem about the idiots who design $THING like that can be a helpful coping mechanism for some
-
@futurebird @mansr ...this just goes back to my whole thing about if maybe younger people have more learned helplessness about everything because more of their lives is dictated by arbitrary rules imposed on them by [EDIT: the invisible, untouchable people in some office somewhere who dictate] their cultural environment rather than the non-arbitrary rules of the physical world
no matter how dumb the rules of a sportball game get, the ball *must* move in certain ways in response to certain actions
that's not the case in a video game