## Anubis has the ability to let you import snippets of configuration into the main
## configuration file. This allows you to break up your config into smaller parts
## that get logically assembled into one big file.
##
## Of note, a bot rule can either have inline bot configuration or import a
## bot config snippet. You cannot do both in a single bot rule.
##
## Import paths are either prefixed with (data), which imports from the common/shared
## rules in the data folder of the Anubis source tree, or point to absolute/relative
## paths on your filesystem. If you don't have access to the Anubis source tree, check
## /usr/share/docs/anubis/data or the tarball you extracted Anubis from.
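##
## As a rough sketch of how an import fits together (the path, rule name, and
## regex here are hypothetical and not part of the Anubis data folder), a snippet
## file such as /etc/anubis/my-feeds.yaml is assumed to hold a plain list of
## inline bot rules:
##
##   - name: my-rss-feed
##     path_regex: ^/feed\.xml$
##     action: ALLOW
##
## and a bot rule in the main file would then pull it in with nothing but the
## import key, since inline fields and import cannot be mixed in one rule:
##
##   - import: /etc/anubis/my-feeds.yaml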

bots:
# Pathological bots to deny
- # This correlates to data/bots/ai-robots-txt.yaml in the source tree
  import: (data)/bots/ai-robots-txt.yaml
- import: (data)/bots/cloudflare-workers.yaml
- import: (data)/bots/headless-browsers.yaml
- import: (data)/bots/us-ai-scraper.yaml

# Search engines to allow
- import: (data)/crawlers/googlebot.yaml
- import: (data)/crawlers/bingbot.yaml
- import: (data)/crawlers/duckduckbot.yaml
- import: (data)/crawlers/qwantbot.yaml
- import: (data)/crawlers/internet-archive.yaml
- import: (data)/crawlers/kagibot.yaml
- import: (data)/crawlers/marginalia.yaml
- import: (data)/crawlers/mojeekbot.yaml

# Allow common "keeping the internet working" routes (well-known, favicon, robots.txt)
- import: (data)/common/keep-internet-working.yaml
- import: /xe/cfg/anubis/xesite-rss-feeds.yaml

# # Punish any bot with "bot" in the user-agent string
# # This is known to have a high false-positive rate, use at your own risk
# - name: generic-bot-catchall
#   user_agent_regex: (?i:bot|crawler)
#   action: CHALLENGE
#   challenge:
#     difficulty: 16  # impossible
#     report_as: 4    # lie to the operator
#     algorithm: slow # intentionally waste CPU cycles and time

# Generic catchall rule
- name: generic-browser
  # ">-" strips the trailing newline that ">" would leave at the end of the regex
  user_agent_regex: >-
    Mozilla|Opera
  action: CHALLENGE

dnsbl: false