Nailpit#
Send malicious scrapers into an equally malicious tarpit with added rusty nails. Nailpit is an exercise in offensive security, in which malicious actors (in this case, web scrapers) are targeted and have their resources wasted/attacked. The purpose is to use this against scrapers that ignore one's robots.txt file and any Disallow directives, particularly ones that try to scrape private/non-public sections of one's websites/services. At enough volume, such scraping amounts to an effective DoS attack. This is bad. This project therefore aims to contribute another tool for discouraging misbehaving scrapers from targeting your website, by inundating them with garbage and poisoned content.
Performance#
nailpit generates pages extremely quickly: 4 worker threads can easily pump out 450 MB/s on an AMD 7950X CPU, sending 64kB pages at around 6K req/s. It spends 1-2ms at most to throw a full response back, with the index/warning page hitting 270k req/s. These figures are without rate-limiting or slow-loris'ing the responses.
The idea is to cheaply flood scrapers with garbage when they would otherwise hit more expensive routes/applications that are easier to DoS. Alternatively, you can slow-loris the scrapers: streaming the response gradually forces them to slow down, which reduces both CPU and memory pressure on the server.
Rate-limiting has a spicy mode that attempts to disrupt the scrapers further if they hit the limits. Do be warned that this mode might be seen as... naughty. But all in all, nailpit is designed to not use too many resources, and markov chains are orders of magnitude cheaper at generating slop than an LLM might be. Extra sloppy, yum yum.
Basically, Caddy server is likely to crap out before nailpit struggles to serve responses or eats up all available resources.
Disclaimer#
Nailpit is not intended to be exposed to the public, only to bots/scrapers. Any link into the tarpit should be hidden from users, and the initial entry point for Nailpit is itself a disclaimer page. This project is not responsible for misconfigured deployments or any consequences arising from them. You are responsible for ensuring this is deployed correctly and employed only against agents that ignore widely used and accepted web standards such as robots.txt.
Minimum CPU Support#
For x86_64/amd64, nailpit requires processors that qualify for at least the x86-64-v3 level (so supporting AVX2). For aarch64, it requires processors from the Cortex-A53 onwards (with NEON support, so a Raspberry Pi 3 B+ or newer). armv7 and RISCV64 builds are compiled without instruction optimisations.
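If you are unsure whether a Linux host meets these requirements, a rough check against /proc/cpuinfo can help. The sketch below is not part of nailpit: the has_all_flags helper is a hypothetical name, and the x86_64 branch only tests a subset of the x86-64-v3 feature set (AVX2, BMI1, BMI2, FMA, F16C, MOVBE, and LZCNT, which Linux reports as abm). On aarch64 it looks for asimd, which is how Linux reports NEON.

```shell
#!/bin/sh
# has_all_flags FLAGS_LINE FLAG...
# Succeeds only if every FLAG appears as a whole word in FLAGS_LINE.
# (Hypothetical helper for this sketch, not a nailpit command.)
has_all_flags() {
    line=" $1 "
    shift
    for flag in "$@"; do
        case "$line" in
            *" $flag "*) ;;        # flag present, keep checking
            *) return 1 ;;         # flag missing
        esac
    done
    return 0
}

case "$(uname -m)" in
    x86_64)
        # First "flags" line of /proc/cpuinfo, without the "flags :" prefix.
        flags=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2)
        if has_all_flags "$flags" avx2 bmi1 bmi2 fma f16c movbe abm; then
            echo "x86_64: looks like x86-64-v3 or newer"
        else
            echo "x86_64: missing some x86-64-v3 features (AVX2 and friends)"
        fi
        ;;
    aarch64)
        features=$(grep -m1 '^Features' /proc/cpuinfo 2>/dev/null | cut -d: -f2)
        if has_all_flags "$features" asimd; then
            echo "aarch64: NEON (asimd) is available"
        else
            echo "aarch64: NEON (asimd) not reported"
        fi
        ;;
    *)
        echo "$(uname -m): no CPU feature check applies here"
        ;;
esac
```

The match is done on whole words, so a CPU that reports avx but not avx2 is correctly treated as missing AVX2.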
How to use / deploy#
By default, nailpit won't work unless you provide at least some input data. In the directory you are running nailpit from, create an input directory and add a .txt file inside of it. Name it anything, whatever, like first.txt. In this file, add content/text, the more the better, as this is what the markov chain is trained on to generate output. So for example, add many paragraphs of lorem ipsum text to the file just to see it work. Once you have at least one .txt input, nailpit will be able to run. Multiple .txt files act as different markov chains, each one outputting text structured differently from the others, and each time a generated page is requested, chains are selected at random to produce content for that request. So if you want more varied/randomised content, you want not just very large files of pure text/content, but also many different files. Do keep in mind that the more content and files you use, the bigger the memory usage of the application, though this is kept in check with some memory optimisation techniques.
The input text files should be pure text. They should not be html or markdown or any other format, just text.
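The setup described above can be sketched as a few shell commands, run from the directory you intend to start nailpit in. The file name first.txt and the repeat count of 200 are arbitrary choices for illustration. Real deployments will want far more, and far more varied, plain text.

```shell
#!/bin/sh
set -eu

# nailpit reads its training text from ./input relative to where it runs.
mkdir -p input

# Start from an empty file, then append the same placeholder paragraph
# many times, just to have enough text to see generation work.
: > input/first.txt
i=0
while [ "$i" -lt 200 ]; do
    printf '%s\n\n' "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua." >> input/first.txt
    i=$((i + 1))
done

# Show the resulting file size in bytes.
wc -c input/first.txt
```

Adding further .txt files next to first.txt gives nailpit additional chains to pick from.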
Docker#
The easiest way to run nailpit is to run it in a docker container. This makes it fairly easy to deploy and ensures its running environment is consistent. This does add a bit more overhead, but realistically, not enough to really matter.
If building the image directly from this repo, just use docker build . -t nailpit, or if cross-compiling to a different platform like a Raspberry Pi 5, run docker build --platform=linux/arm64 . instead. The nailpit docker image supports the linux/amd64, linux/arm64, linux/arm/v7 and linux/riscv64 platforms. Running the image then becomes the following (with two of the available volumes mounted for user overrides):
docker run -v ./configuration/:/app/configuration -v ./input/:/app/input -p 3001:3001/tcp nailpit:latest
The socket nailpit listens on can be overridden with -e NAILPIT_SOCKET=0.0.0.0:3001, and it expects the full ip:port string. There are three volumes that can be configured: /app/configuration, which is where the default config file lives and where your override config file will live; /app/input, which is where the user's input files are located; and /app/templates, for user-provided template overrides.
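Putting the socket override and all three volumes together might look like the sketch below. The NAILPIT_DIR, PORT and NAILPIT_RUN variables are assumptions made for this example, not nailpit settings; only NAILPIT_SOCKET and the /app/... paths come from the description above. The script prints the assembled command and only executes it when NAILPIT_RUN=1 is set, so it is safe to try on a machine without docker.

```shell
#!/bin/sh
# Host directory holding configuration/, input/ and templates/ (assumed layout).
NAILPIT_DIR="${NAILPIT_DIR:-$PWD}"
# Example port; the -p mapping and NAILPIT_SOCKET must agree on it.
PORT="${PORT:-8080}"

# Assemble the docker invocation as positional parameters so that
# paths containing spaces stay intact.
set -- docker run \
    -e "NAILPIT_SOCKET=0.0.0.0:${PORT}" \
    -v "${NAILPIT_DIR}/configuration:/app/configuration" \
    -v "${NAILPIT_DIR}/input:/app/input" \
    -v "${NAILPIT_DIR}/templates:/app/templates" \
    -p "${PORT}:${PORT}/tcp" \
    nailpit:latest

# Always show the command; run it only on explicit request.
printf '%s\n' "$*"
if [ "${NAILPIT_RUN:-0}" = "1" ]; then
    "$@"
fi
```

Absolute host paths are used here instead of ./configuration/ so the command behaves the same regardless of how the installed docker version treats relative bind-mount paths.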
Docker Compose#
nailpit images are available from docker hub via docker.io/sachymetsu/nailpit:latest. Right now, only the latest tag is provided, but it should be stable enough. Using nailpit with docker compose can be done with the following example configuration:
services:
  nailpit:
    container_name: nailpit
    image: docker.io/sachymetsu/nailpit:latest
    restart: unless-stopped
    volumes:
      - /home/user/nailpit/configuration:/app/configuration
      - /home/user/nailpit/input:/app/input
      - /home/user/nailpit/templates:/app/templates
    network_mode: host
Prebuilt images are currently provided for the linux/amd64 and linux/arm64 platforms.
Configuration#
All of the configuration options are documented in the default config file found here. To create your own configuration, create a pit.toml file in the configuration folder and add just the configuration options you want to override.
How to contribute#
If you are interested in contributing to this project, check out the CONTRIBUTING document.
Code of Conduct#
If you are interested in contributing to this project, be sure to review the CODE OF CONDUCT.
License#
This project is licensed under AGPL 3.0.