# Nailpit

Send malicious scrapers into an equally malicious tarpit with added rusty nails.

Nailpit is an exercise in offensive security, in which malicious actors (in this case, web scrapers) are targeted and have their resources wasted or attacked. The purpose is to use this against scrapers that *ignore* your `robots.txt` file and its `Disallow` directives, particularly those that try to scrape private, non-public sections of your websites or services. With enough volume, such scrapers can constitute an effective DoS attack. *This is bad*. This project therefore aims to be another tool for discouraging misbehaving scrapers from targeting your website, by inundating them with garbage and poisoned content.
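For context, this is the kind of widely honored `robots.txt` that well-behaved crawlers obey (a generic example, not a required configuration; the paths are placeholders):

```
User-agent: *
Disallow: /private/
Disallow: /admin/
```

Nailpit is aimed at the scrapers that fetch such disallowed paths anyway.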
## Performance
`nailpit` generates pages *extremely* quickly: with 4 worker threads it easily pumps out 450 MB/s on an AMD 7950X CPU, serving 64 kB pages at around 6,000 req/s. It spends at most 1-2 ms producing a full response, and the index/warning page hits 270k req/s. This is without rate-limiting or slow-loris'ing the responses.
The idea is to cheaply flood scrapers with garbage that would otherwise hit more expensive routes and applications that are easier to DoS. Alternatively, slow-loris'ing forces scrapers to slow down, reducing the load on the server by streaming the response gradually. Either way, this reduces both CPU and memory pressure on the server.
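The slow-loris idea can be sketched roughly like this (a hypothetical illustration, not nailpit's actual implementation): instead of sending the whole body at once, drip it out in small chunks with a pause between each, so the scraper's connection stays occupied while the server does almost no work.

```rust
use std::io::Write;
use std::thread;
use std::time::Duration;

// Drip `body` into `sink` in `chunk`-byte pieces, sleeping between writes.
// `sink` would be a TcpStream in a real server; any `Write` works here.
fn drip<W: Write>(sink: &mut W, body: &[u8], chunk: usize, delay: Duration) -> std::io::Result<()> {
    for piece in body.chunks(chunk) {
        sink.write_all(piece)?;
        sink.flush()?;
        thread::sleep(delay); // keep the client waiting between chunks
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    // Demo against an in-memory sink; the client still receives the full page,
    // just much more slowly than it would like.
    let mut out: Vec<u8> = Vec::new();
    drip(&mut out, b"<html>garbage page</html>", 4, Duration::from_millis(5))?;
    println!("{}", String::from_utf8_lossy(&out));
    Ok(())
}
```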
Rate-limiting has a spicy mode that attempts to disrupt scrapers further when they hit the limits. Do be warned that this mode might be seen as... naughty. All in all, `nailpit` is designed to *not* use too many resources, and Markov chains are orders of magnitude cheaper at generating slop than an LLM would be. Extra sloppy, yum yum.
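To see why Markov chains are so cheap, here is a minimal word-level sketch (hypothetical, not nailpit's actual generator): build a first-order transition table from a seed corpus, then walk it with a tiny xorshift PRNG. Generation is just hash lookups and an index, with no model weights involved.

```rust
use std::collections::HashMap;

// Map each word to the list of words observed immediately after it.
fn build_chain<'a>(corpus: &'a str) -> HashMap<&'a str, Vec<&'a str>> {
    let mut chain: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    let words: Vec<&str> = corpus.split_whitespace().collect();
    for pair in words.windows(2) {
        chain.entry(pair[0]).or_default().push(pair[1]);
    }
    chain
}

// Walk the chain from `start`, picking successors pseudo-randomly.
fn generate<'a>(chain: &HashMap<&'a str, Vec<&'a str>>, start: &'a str, max_words: usize, mut seed: u64) -> String {
    let mut cur = start;
    let mut out = vec![start];
    while out.len() < max_words {
        let Some(nexts) = chain.get(cur) else { break };
        // xorshift64: good enough for slop, zero dependencies.
        seed ^= seed << 13;
        seed ^= seed >> 7;
        seed ^= seed << 17;
        cur = nexts[(seed as usize) % nexts.len()];
        out.push(cur);
    }
    out.join(" ")
}

fn main() {
    let corpus = "the quick brown fox jumps over the lazy dog and the quick dog naps";
    let chain = build_chain(corpus);
    println!("{}", generate(&chain, "the", 12, 0x9E3779B97F4A7C15));
}
```

In practice the corpus would be large and the chain prebuilt once, so each page costs only the random walk.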
## Disclaimer
Nailpit is intended to be exposed not to the public, but only to bots and scrapers. Any link into the tarpit should be hidden from users, and the initial entry point for Nailpit is a disclaimer page as well. This project is not responsible for misconfigured deployments or their consequences. You are responsible for ensuring it is deployed correctly and employed only against agents that ignore widely used and accepted web standards such as `robots.txt`.
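One common way to hide the entry link from humans while leaving it reachable by scrapers is an invisible anchor (a generic sketch; the `/nailpit/` path is a placeholder, and it should also carry a matching `Disallow` rule in `robots.txt` so compliant crawlers stay out):

```html
<!-- Invisible to users; only scrapers that parse and follow every link,
     ignoring robots.txt and rel="nofollow", will walk into the tarpit. -->
<a href="/nailpit/" rel="nofollow" style="display:none" aria-hidden="true" tabindex="-1">archive</a>
```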