If you write about the messy reality behind "free" internet services: we're seeing #OpenStreetMap hammered by scrapers hiding behind residential proxy/embedded-SDK networks.
-
@osm_tech if you have anyone with a good microphone/audio setup, who's willing to speak on a podcast, I'd be happy to record something for one of the @latenightlinux shows.
@joeress @latenightlinux Sounds interesting. We'll follow up by email.
-
If you write about the messy reality behind "free" internet services: we're seeing #OpenStreetMap hammered by scrapers hiding behind residential proxy/embedded-SDK networks. We're a volunteer-run service and the costs are real. We'd love to talk to a journalist about what we're seeing + how we're responding. #AI #Bots #Abuse
@osm_tech Tell me more. You can reach me at sjvn01 <at> gmail.com
-
@BalooUriza We use fail2ban to handle some of this with custom rules, but eventually fail2ban becomes a bottleneck after 100,000 IP addresses.
@osm_tech @BalooUriza For IPv4, a bitmask of the entire address space is a viable "efficient" implementation of blocking. I wonder if there are tools that can do it that way rather than needing a gigantic list.
-
This gets ugly really fast, if you want to see the full extent: <https://alternativeto.net/software/netnut-proxy-network/> for a list of _known_ residential proxy-providers.
@AliveDevil @utf_7 @osm_tech So ridiculous that Google and Apple won't just permaban any developer embedding one of these "SDKs".
-
@osm_tech The proxy SDK providers need to be treated like the DDOS providers they are and prosecuted.
@InsertUser @osm_tech Pulling them from app stores and banning developers of the SDKs would be a good start. Save the criminal charges for after the damage control is done.
-
@pietervdvn Because that would involve a human using their brains or having a shred of conscience and those both go against the basic principles of the companies doing this.
@InsertUser @pietervdvn @osm_tech It goes against their whole ideology. The ideology says trust the machine to do what it copied from scraped Stack Overflow posts. If you try to intervene to make it do better, you're not trusting it.
-
@osm_tech @BalooUriza For IPv4, a bitmask of the entire address space is a viable "efficient" implementation of blocking. I wonder if there are tools that can do it that way rather than needing a gigantic list.
@osm_tech @BalooUriza Like, a bitmask of IPv4 space is several times smaller than a Chrome instance.

-
@AliveDevil @utf_7 @osm_tech So ridiculous that Google and Apple won't just permaban any developer embedding one of these "SDKs".
@dalias I'd wish for them to enforce policies, but they get Ad- and IAP-revenue, so why bother.
Also, these "Sdks" probably have kill-switches (or rather, delayed activation) built-in, to not immediately contact their C&C servers.
-
@dalias I'd wish for them to enforce policies, but they get Ad- and IAP-revenue, so why bother.
Also, these "Sdks" probably have kill-switches (or rather, delayed activation) built-in, to not immediately contact their C&C servers.
@AliveDevil Yes but they could still be banned when caught. A few devs getting banned would be a big deterrent for others to ship this malware.
The right *technical* defense, however, is not to allow apps arbitrary network access unless they're declared in the manifest as a "browser" or other "client software" that the user can use with any service they want (like IRC clients, mail clients, Mastodon clients, etc.).
Instead, the manifest should declare a single domain the app can contact, or multiple if the developer is willing to pay for more intensive vetting of them, and only allow network access to the declared domain(s).
-
If you write about the messy reality behind "free" internet services: we're seeing #OpenStreetMap hammered by scrapers hiding behind residential proxy/embedded-SDK networks. We're a volunteer-run service and the costs are real. We'd love to talk to a journalist about what we're seeing + how we're responding. #AI #Bots #Abuse
That's something for you @404mediaco, isn't it?
-
If you write about the messy reality behind "free" internet services: we're seeing #OpenStreetMap hammered by scrapers hiding behind residential proxy/embedded-SDK networks. We're a volunteer-run service and the costs are real. We'd love to talk to a journalist about what we're seeing + how we're responding. #AI #Bots #Abuse
Please contact me on Signal: DanArs.82
-
@LMieldazis @geerlingguy oooh do we get to show him our out-of-band (remote access) Raspberry Pi with dual power feeds, 4G modem and loads of serial connections? Saved our skin a good few times.
@osm_tech @LMieldazis would love to talk maps ops! I've seen many projects wrapping in map data and adding scripts to dl entire regions
-
If you write about the messy reality behind "free" internet services: we're seeing #OpenStreetMap hammered by scrapers hiding behind residential proxy/embedded-SDK networks. We're a volunteer-run service and the costs are real. We'd love to talk to a journalist about what we're seeing + how we're responding. #AI #Bots #Abuse
-
If you write about the messy reality behind "free" internet services: we're seeing #OpenStreetMap hammered by scrapers hiding behind residential proxy/embedded-SDK networks. We're a volunteer-run service and the costs are real. We'd love to talk to a journalist about what we're seeing + how we're responding. #AI #Bots #Abuse
@osm_tech you and everyone else…
-
If you write about the messy reality behind "free" internet services: we're seeing #OpenStreetMap hammered by scrapers hiding behind residential proxy/embedded-SDK networks. We're a volunteer-run service and the costs are real. We'd love to talk to a journalist about what we're seeing + how we're responding. #AI #Bots #Abuse
@osm_tech Pinging @GarretSidzaka as he might have some leads.
-
If you write about the messy reality behind "free" internet services: we're seeing #OpenStreetMap hammered by scrapers hiding behind residential proxy/embedded-SDK networks. We're a volunteer-run service and the costs are real. We'd love to talk to a journalist about what we're seeing + how we're responding. #AI #Bots #Abuse
@osm_tech Why not write the article yourself as a blog post? Would much rather hear the full version of your side of the story than a journo's interpretation of it.
-
If you write about the messy reality behind "free" internet services: we're seeing #OpenStreetMap hammered by scrapers hiding behind residential proxy/embedded-SDK networks. We're a volunteer-run service and the costs are real. We'd love to talk to a journalist about what we're seeing + how we're responding. #AI #Bots #Abuse
I feel for yall. These residential proxies and the sdk networks are the bane of my existence and I’m paid to deal with them.
-
If you write about the messy reality behind "free" internet services: we're seeing #OpenStreetMap hammered by scrapers hiding behind residential proxy/embedded-SDK networks. We're a volunteer-run service and the costs are real. We'd love to talk to a journalist about what we're seeing + how we're responding. #AI #Bots #Abuse
@osm_tech @jwildeboer recently wrote about these sdk-based services. His approach might be of use here - or at the very least, it might make useful reading: https://jan.wildeboer.net/2025/02/Blocking-Stealthy-Botnets/ and https://jan.wildeboer.net/2025/04/Web-is-Broken-Botnet-Part-2/
-
If you write about the messy reality behind "free" internet services: we're seeing #OpenStreetMap hammered by scrapers hiding behind residential proxy/embedded-SDK networks. We're a volunteer-run service and the costs are real. We'd love to talk to a journalist about what we're seeing + how we're responding. #AI #Bots #Abuse
@osm_tech Have you heard of Anubis by Xe Iaso? 🥺 Good luck, we need you!!
-
@osm_tech I wonder if there's a way to fail2ban requests coming in faster than typically found in human requests.
Cycling to new IPs is trivial, I ban a few thousand IPs and cidr ranges in my WAF, I’ll see 75% of them show up the next time the scraper hits. Then after that most don’t show up again and the next scrape comes from a mostly new set of IPs.
I’ve see A few instances where they will cycle IPs during the same scraping event if some of them are blocked.
I’ve got scrapers that will send every request from a unique IP.
There is a lot of money to be made right now offering hard to block scraping services or tools to enable them.