To keep #OpenStreetMap.org up and running while we're being deluged by scrapers, we've blocked 320,000+ primarily residential IPv4 addresses in the last 24 hours (+ 100,000 IPv6) involved in scraping.
-
@osm_tech Question: why do people scrape a server that makes the data freely available? And the final product is probably better structured, too. I don't see the point.
@JonSaenzAgirre It is a good question, and we don't know the answer either. Our planet data is so much easier to process and use.
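For anyone wondering what "easier to process" can look like in practice, here is a minimal sketch in Python. It assumes OSM XML input (nodes with `id`, `lat`, `lon` attributes and `<tag k="..." v="..."/>` children). The sample data and the `nodes_with_tag` helper are made up for illustration, and a full planet file is far too large to load into memory like this, so real use would probably want a streaming parser or a dedicated OSM library.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the OSM XML layout.
# Hypothetical data for illustration, not a real extract.
SAMPLE_OSM_XML = """<osm version="0.6">
  <node id="1" lat="51.5000" lon="-0.1000">
    <tag k="amenity" v="cafe"/>
    <tag k="name" v="Example Cafe"/>
  </node>
  <node id="2" lat="51.5010" lon="-0.1010"/>
  <node id="3" lat="51.5020" lon="-0.1020">
    <tag k="amenity" v="bench"/>
  </node>
</osm>
"""


def nodes_with_tag(xml_text, key, value):
    """Return (id, lat, lon, tags) for every node carrying key=value."""
    root = ET.fromstring(xml_text)
    matches = []
    for node in root.iter("node"):
        # Collect the node's key/value tags into a plain dict.
        tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
        if tags.get(key) == value:
            matches.append(
                (int(node.get("id")), float(node.get("lat")),
                 float(node.get("lon")), tags)
            )
    return matches


cafes = nodes_with_tag(SAMPLE_OSM_XML, "amenity", "cafe")
for node_id, lat, lon, tags in cafes:
    print(node_id, lat, lon, tags.get("name"))
```

The point is only that the downloaded data is already structured: every object comes with its coordinates and tags, with no HTML to pick apart.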
-
@osm_tech tHeN yOu jUsT neEd tO sCaLe
@utf_7 In this economy with RAM prices what they are?!?

-
To keep #OpenStreetMap.org up and running while we're being deluged by scrapers, we've blocked 320,000+ primarily residential IPv4 addresses in the last 24 hours (+ 100,000 IPv6) involved in scraping.
If you need OSM data, please don't scrape the website - use the official downloads at https://planet.openstreetmap.org

#AI #Bots #Abuse
@osm_tech@en.osm.town Could something like Anubis help you guys?
-
@osm_tech does coming from residential IPs mean that someone has baked a scraper into some popular tool that people don't realize is doing that?
@HunterZ@mastodon.sdf.org @osm_tech@en.osm.town lots of mobile/desktop apps, browser extensions, and even IoT devices are paid by "residential proxy" companies to prey on their users by selling said users' connections to AI scrapers https://www.spamhaus.org/resource-hub/compromised/lets-talk-about-the-danger-of-residential-proxy-networks/
-
@HunterZ @osm_tech this is actually quite common. Mobile advertising SDKs for games, background apps, etc include residential scraping proxy functionality that they can sell to the highest bidder, and then when scrapers want to avoid restrictions they can pay a fraction of a penny to send their requests via your phone. Millions of people use apps with this built in and have no idea. Most websites don't want to ban the residential scrapers because it can hurt growth.
@ryanprior @HunterZ @osm_tech I also see that scraping on my private webserver, and it forced me to make a whole bunch of content private. Yet the botnet still scrapes it and gets 404s now. Every single request comes from a different IP...
-
@osm_tech one day, if you'd like to switch to nginx, I can lend you a hand if you have a specific problem
-
@osm_tech I wonder if the culprit will ever come forward, apologise, and change their ways? Someone tasked these proxy scrapers with ridiculous requests.
Have they been targeting the main OSM API, the website interface designed for humans, or Overpass?
-
@olbohlen @HunterZ @osm_tech sad to hear that! It's wild though, you can sign up for a scraper proxy service in minutes. They're legal, inexpensive, and easy to use. Admins who assume that scrapers use their own machines, and that inauthentic traffic will come from just a few IP addresses, are sadly living in the past.
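To make the "every request from a different IP" point concrete, here is a rough Python sketch of how an admin might check how spread out their traffic is. The `summarize` and `network_key` helpers and the /24 and /64 prefix sizes are assumptions chosen for illustration, and the addresses are documentation-range examples. Grouping by prefix may not catch residential proxy pools either, since those are spread across many ISPs.

```python
import ipaddress
from collections import Counter


def network_key(ip_text, v4_prefix=24, v6_prefix=64):
    """Map an address to its enclosing network (prefix sizes are assumptions)."""
    ip = ipaddress.ip_address(ip_text)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get its network back.
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)


def summarize(client_ips):
    """Count requests per IP and per network for a list of client addresses."""
    per_ip = Counter(client_ips)
    per_net = Counter(str(network_key(ip)) for ip in client_ips)
    return {
        "requests": len(client_ips),
        "unique_ips": len(per_ip),
        "max_per_ip": max(per_ip.values(), default=0),
        "busiest_networks": per_net.most_common(3),
    }


# Hypothetical log excerpt: six requests, six different source addresses.
sample_ips = [
    "192.0.2.1", "192.0.2.2", "192.0.2.3",
    "198.51.100.9", "2001:db8::1", "2001:db8::2",
]
print(summarize(sample_ips))
```

When `max_per_ip` stays at 1 while `requests` keeps climbing, a classic per-IP request limit never triggers, which is the situation described above.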
-
@ryanprior @HunterZ @osm_tech sure I could, but I refuse to put my selfhosted stuff behind some new dependency...
-
Might be a good idea to become an OSMF member now or just donate some money.
Membership starts at £15/year.
https://supporting.openstreetmap.org/
-
@osm_tech @JonSaenzAgirre that's dumb AI, probably. No "I" at all...
-
@osm_tech sounds familiar. Last year I braved turning Cloudflare's "under attack" mode off for https://dnshistory.org/ and saw an extra 5 million requests/day (500k unique IPs) overloading things. It's still blocking >700k requests/day a month later...
-
@osm_tech and we can tell the scrapers are AI-built, because a cursory glance at the documentation on the "coders'" part would've prevented this problem.
-
@osm_tech Thank you. I'm a beginner who has just been doing toy projects and has barely any notion of what web scraping is, but I'm very happy to learn that your data can be downloaded.

-
@osm_tech Limit each IP to 14,400 bit/s modem speed for a month or so.
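A per-IP bandwidth cap like that could be sketched as a token bucket. This is only an illustration in Python, not how OSM handles it: the `PerIpThrottle` class, its parameters, and the one-second burst size are assumptions. As noted elsewhere in the thread, a cap per IP does little when every request arrives from a different address.

```python
import time

# 14,400 bit/s divided by 8 bits per byte is 1,800 bytes per second.
MODEM_BYTES_PER_SEC = 14_400 // 8


class PerIpThrottle:
    """Token bucket per client IP; tokens are bytes the client may still receive."""

    def __init__(self, rate=MODEM_BYTES_PER_SEC, burst=MODEM_BYTES_PER_SEC,
                 clock=time.monotonic):
        self.rate = rate      # bytes refilled per second
        self.burst = burst    # bucket capacity in bytes (assumed: one second's worth)
        self.clock = clock    # injectable so the logic can be tested without sleeping
        self._buckets = {}    # ip -> (tokens, timestamp of last update)

    def allow(self, ip, nbytes):
        """Return True and spend tokens if ip may receive nbytes right now."""
        now = self.clock()
        tokens, last = self._buckets.get(ip, (self.burst, now))
        # Refill for the elapsed time, never above the bucket capacity.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if nbytes <= tokens:
            self._buckets[ip] = (tokens - nbytes, now)
            return True
        self._buckets[ip] = (tokens, now)
        return False


throttle = PerIpThrottle()
print(throttle.allow("203.0.113.5", 1_000))  # within the initial burst
print(throttle.allow("203.0.113.5", 1_000))  # exceeds what is left in the bucket
```

A real deployment would also need to expire idle buckets, otherwise the dictionary grows with every new address seen.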

-
@JonSaenzAgirre @osm_tech
The scrapers are DUMB.
They are not curated, have only basic maintenance, and are built to gobble up ANYTHING textual they encounter, without respect, mercy or reason. They just collect meaningless data.
That’s the nature of the coveted LLMs: just statistics, no understanding, structure or meaning.
And greedy crooks in haste to make quick money just grab everything they can.
The AI bubble needs to pop really soon.