@criffer @mos_8502 Part of the disgusting way commercial LLMs are trained is that they're not just hoovering up the Internet: there are armies of invisible, underpaid, and exploited people in the majority world who filter, categorize, label, and otherwise massage the training data. This has been happening in some capacity for a very long time for commercial ML-based services in general, e.g. image classification services, and not just since the recent explosion of LLMs.
When you realize the horrific stuff these people have to deal with on a daily basis in filtering the numerous awful bits of the Internet, and the fact that these poor workers get zero mental health support and take all that trauma back to their homes and communities, it's hard not to be absolutely disgusted by the entire industry.