As much as I loathe LLM "AI" built from hoards of stolen data, machine learning "AI" has become terrifically useful.
This past week I had 10 audio recorders set out in the forest and nearby grassland, all recording non-stop from Monday afternoon to Friday morning, as part of our recent university field ecology trip.
Today I downloaded all the files to a hard drive (156 GB of data) and then I set my little M1 MacBook Air to work, using the offline desktop BirdNET app to identify all of the birds in the recordings.
It took most of the day, and now I have a 42,284-row spreadsheet of birds detected.
It really feels like magic.
Here's a quick sorted list of all the bird detections whose species IDs have a confidence score >0.9.
Together with the students in the course, we'll later compare how birds have changed since we started doing this in 2020, and how the birds in the grassland differ from the forest.
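For anyone wanting to reproduce the "sorted list above a confidence threshold" step from a detections spreadsheet, here is a minimal sketch. It assumes the spreadsheet is exported as CSV with one row per detection and columns named `Common name` and `Confidence`. Those column names, the function name, and the sample rows are illustrative assumptions, not the actual file from the field trip.

```python
import csv
import io
from collections import Counter


def count_confident_detections(rows, threshold=0.9,
                               species_col="Common name",
                               confidence_col="Confidence"):
    """Count detections per species whose confidence is strictly above
    `threshold`, returned as (species, count) pairs, most frequent first.

    The column names are assumptions; adjust them to match your export.
    """
    counts = Counter(
        row[species_col]
        for row in rows
        if float(row[confidence_col]) > threshold
    )
    return counts.most_common()


# Tiny made-up sample standing in for the real spreadsheet.
SAMPLE_CSV = """Common name,Confidence
Tui,0.97
Tui,0.93
New Zealand Bellbird,0.95
Tui,0.42
New Zealand Bellbird,0.90
"""

if __name__ == "__main__":
    rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
    for species, n in count_confident_detections(rows):
        print(f"{species}: {n}")
```

To run it on a real export, replace the `io.StringIO(SAMPLE_CSV)` part with an opened file handle, for example `open("detections.csv", newline="")`, where the file name is again just a placeholder.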
@joncounts LLMs (combined with ASR models) are basically this but for text.
LLMs let you turn large and messy corpora of text, audio and images into a neat csv, which you can then analyze in any data science tool of choice. You can't just Excel your way through "how likely are right-wing newspapers to mention the race of a rapist, depending on what that race is." Not without a team of grad students doing the gruntwork at least. LLMs automate away all of that gruntwork, letting you answer research questions much faster.
You can't use plain ChatGPT for this; you need specialized tools.
Sure, LLMs hallucinate, just like human annotators do. This is why you (as a human) need to go through a sample of your corpus and figure out what your hallucination rate is. This is still much faster than annotating the entire corpus.
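The sample-checking idea can be sketched roughly as follows: draw a random sample of the machine-labelled items, check them by hand, and put an interval around the observed error rate. This is only an illustration. The function names are made up, and the Wilson score interval is one common choice that I'm assuming here, not something prescribed above.

```python
import math
import random


def draw_sample(items, k, seed=0):
    """Pick up to k items uniformly at random for manual checking.
    A fixed seed keeps the sample reproducible."""
    items = list(items)
    rng = random.Random(seed)
    return rng.sample(items, min(k, len(items)))


def wilson_interval(errors, n, z=1.96):
    """Approximate 95% Wilson score interval (for z=1.96) for the true
    error rate, given `errors` wrong labels found in `n` checked items."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = errors / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical numbers: 200 items checked by hand, 10 found to be wrong.
    low, high = wilson_interval(errors=10, n=200)
    print(f"observed rate: {10 / 200:.3f}, interval: {low:.3f} to {high:.3f}")
```

The interval narrows as you check more items, so you can decide how many to annotate by hand based on how precisely you need to know the error rate.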
-
@joncounts Which audio recorders did you use? I remember AudioMoth. Are those still a thing?
@tillmanreuter We use AR4s manufactured by the NZ Department of Conservation, which come in excellent weatherproofed cases, plus we’ve got a set of AudioMoths plugged into better microphones (the same ones that the AR4s use).
-
@joncounts How do you record continuously? I have tried this but I run into battery life issues.
@spacefinner The Department of Conservation AR4s use four AA batteries and can easily run longer than a week, although we program them to record at a lower sampling frequency at night to save power (since the nocturnal birds in NZ don’t sing at such high frequencies). I use the three-AA battery case for the AudioMoths, which runs for about two weeks (although only with lower-power microSD cards).
-
@miki Thanks. Yes, it’s all the hoovering up of training data sets without permission by the big LLM products that I object to (plus the massive power consumption needed to build and refine the models). The tech behind the models is pretty neat.
-
@joncounts Have you by chance captured the sound of a tree falling, or does it not make a sound when there's only a listening device around?
-
@joncounts
I set up a BirdNET-Pi in my garden. The number of false positives for species that could not possibly be there was ridiculous. My local record centre were not at all interested in the data either, because of its unreliability.
I discontinued that experiment.
-
This is the difference between agentic AI and generative AI... One is actually useful, the other is not.
Agentic AI is what you've used to do a specific task. It's also used to check scans for disease and tumours, predict weather patterns, assess structural integrity in engineering... to try and come up with new compounds for medicines, new chemical structures. It's used in physics and astronomy... It's also used to spy on citizens and invade their privacy.
Aside from the spying side of the equation, it's hard to make enough money from the rest to justify the amount of money being spent... and by "justify", I mean the money made is not enough to satisfy the greedy wealth-hoarding wankers behind them.
Generative AI is worthless... it's trying to replace every job, every human... because more profit can be made if you don't have to worry about paying wages and abiding by laws and regulations... Everything it does is slop.
-
@joncounts I do think it's revolutionary tech, which of course means you get corpos trying everything they can to turn it into a profit machine rather than something that improves the human condition.
But I think things are gonna settle down where it's no longer black magic and most people know to use it for the practical problems it excels at.
Of course its impact will not be all good and I'm deeply concerned about the future of art in popular culture.
-
@joncounts but at some point that's just what revolutionary tech does, it creates deep change and we are left to either lament what was lost or adapt to the new reality.
And I'm not concerned over the future of art itself because humans will always create and push the boundaries of aesthetic design, no matter what.
-
Such a good proof of the caveat "Technology is just a tool, it can be used for good or be used for evil." People decide which.
-
Would you please describe how you verified the results? I guess you do have some kind of verification process, and it would be interesting to read about it.